riff

Voice agent platform that generates deterministic FSM-based voice flows from plain English business descriptions.

https://github.com/davidbmar/riff  ·  private  ·  shipped


What it is

riff is a Python voice agent framework that pairs Large Language Models (LLMs) for natural language understanding with declared Finite State Machines (FSMs) for strict workflow enforcement. Developers describe a business process in plain English, and riff generates a YAML-defined state graph that drives phone conversations such as ordering, scheduling, and intake. State transitions, slot validation, and guardrails are deterministic pure functions; the LLM handles only language generation and intent recognition within those bounds.
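
The division of labor described above can be illustrated with a minimal sketch. The graph layout, the state and intent names, and the `next_state` helper are hypothetical and are not taken from riff's code; the dict stands in for what a YAML flow file might declare.

```python
from typing import Optional

# Hypothetical flow graph: state -> {intent: next state}.
# riff declares its graphs in YAML; a plain dict stands in for that here.
FLOW = {
    "greeting": {"order_pizza": "collect_size", "ask_hours": "give_hours"},
    "collect_size": {"provide_size": "confirm"},
    "give_hours": {"order_pizza": "collect_size"},
    "confirm": {"affirm": "done", "deny": "collect_size"},
    "done": {},
}


def next_state(flow: dict, current: str, intent: Optional[str]) -> str:
    """Pure function: the same inputs always give the same next state.

    The intent is whatever the LLM recognized. If the graph declares no
    edge for that intent from the current state, the state is unchanged.
    """
    return flow[current].get(intent, current)
```

With this graph, `next_state(FLOW, "greeting", "order_pizza")` moves to `collect_size`, while an intent the graph does not declare for `greeting` (for example `affirm`) leaves the conversation where it is, whatever the model proposed.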

Features

Quickstart

# Enter the repository checkout
cd ~/src/riff
# Run the test suite quietly
.venv/bin/python3 -m pytest tests/ -q
# Start the web server
.venv/bin/python3 -m riff.web_server

Architecture

flowchart TD
    Caller[Caller] -->|Speaks| STT[STT]
    STT -->|Text| RunTurn[run_turn]
    subgraph CorePipeline [Core Pipeline]
        RunTurn -->|Context| Guardrails[Guardrails<br/>8 layers pure fns]
        RunTurn -->|Prompt| LLM[LLM Call<br/>Gemini/Gemma/Claude]
        RunTurn -->|State| StateMgr[State Manager<br/>Declared Graph]
    end
    Guardrails -->|Validated| RunTurn
    LLM -->|Response| RunTurn
    StateMgr -->|Next State| RunTurn
    RunTurn -->|Result| SlotExt[slot_extractor]
    RunTurn -->|Result| Eval[eval framework]
    RunTurn -->|Result| Logger[turn_logger JSONL]
    RunTurn -->|Audio| TTS[TTS]
    TTS -->|Hears| Caller

How it's built

The core engine uses a `StateManager` to enforce the transitions declared in YAML flow files. Speech is transcribed to text (STT) and passed to `run_turn()`, which consults the LLM (through adapters for Gemini, Gemma, or Claude) and the deterministic guardrails, then hands the response text to TTS. Key components include `slot_extractor` for deterministic data capture, `state_manager` for graph traversal, and an event bus for internal signaling. Non-deterministic LLM calls are kept apart from slot extraction and state transitions, which sit behind pure-function interfaces.
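
As an illustration of what pure-function slot capture and guardrail layers can look like, here is a small sketch. The slot vocabulary, the two guardrail layers, and every function name below are assumptions made for the example; riff's actual `slot_extractor` and its eight guardrail layers are not shown in this document.

```python
import re
from typing import Callable, Optional, Sequence

SIZES = ("small", "medium", "large")  # hypothetical slot vocabulary


def extract_size(text: str) -> Optional[str]:
    """Deterministic slot capture: whole-word match against a fixed list."""
    lowered = text.lower()
    for size in SIZES:
        if re.search(rf"\b{size}\b", lowered):
            return size
    return None


# A guardrail layer is a pure function: reply text in, reply text out.
Guardrail = Callable[[str], str]


def strip_markup(reply: str) -> str:
    """Remove markup characters that a TTS engine would read awkwardly."""
    return re.sub(r"[*_#`]", "", reply)


def cap_length(reply: str, limit: int = 200) -> str:
    """Keep spoken replies short by truncating at a character limit."""
    return reply if len(reply) <= limit else reply[:limit].rstrip()


def apply_guardrails(reply: str, layers: Sequence[Guardrail]) -> str:
    """Run the reply through each layer in order."""
    for layer in layers:
        reply = layer(reply)
    return reply
```

Because none of these functions call a model or touch shared state, they can be unit-tested with plain assertions, which is the property the design relies on.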

How it runs

sequenceDiagram
    participant C as Caller
    participant S as STT
    participant RT as run_turn()
    participant SM as StateManager
    participant L as LLM Adapter
    participant T as TTS
    
    C->>S: Speaks audio
    S->>RT: Transcribed text
    RT->>SM: Get current state & valid transitions
    SM-->>RT: State definition & guards
    RT->>L: Generate response based on state/context
    L-->>RT: Raw text response
    RT->>SM: Validate transition & extract slots
    SM-->>RT: Updated state & validated slots
    RT->>T: Synthesize audio response
    T->>C: Plays audio
    RT->>RT: Log turn result (JSONL)
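
The sequence above can be approximated in a few lines. This is a sketch under stated assumptions, not riff's `run_turn()`: the `Session` class, the `(intent, reply)` adapter contract, and the log record fields are all hypothetical, and STT/TTS are left out so the example stays text-only.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical adapter contract: (state, user_text) -> (intent, reply).
LLMAdapter = Callable[[str, str], Tuple[str, str]]


@dataclass
class Session:
    state: str
    log: List[str] = field(default_factory=list)  # one JSON line per turn


def run_turn(session: Session, user_text: str, flow: dict, llm: LLMAdapter) -> str:
    """One turn: ask the LLM, validate the transition, log the result."""
    intent, reply = llm(session.state, user_text)
    allowed = flow[session.state]
    # The LLM only proposes an intent; the declared graph decides the move.
    new_state = allowed.get(intent, session.state)
    record = {
        "from": session.state,
        "intent": intent,
        "accepted": intent in allowed,
        "to": new_state,
        "reply": reply,
    }
    session.log.append(json.dumps(record))  # JSONL-style turn record
    session.state = new_state
    return reply
```

A stub adapter such as `lambda state, text: ("book", "What time works?")` is enough to drive the loop in tests, so the transition logic can be exercised without a model in the loop.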

How to apply & reuse

Use riff to build customer service voice agents where hallucination prevention is critical. It suits scheduling services (plumbing, dental), retail orders (pizza, coffee), and complex intakes (apartment viewings). Developers define business logic in YAML or generate it through the `/api/flows/generate` endpoint, then integrate the web server or MCP server into their telephony infrastructure.
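
Whether a flow is written by hand or generated, it may be worth checking the graph before wiring it to telephony. The sketch below is one way to do that and is not part of riff as documented here: `lint_flow` is a hypothetical helper, and the `state -> {intent: target}` dict is an assumed in-memory form of a YAML flow.

```python
from collections import deque
from typing import List


def lint_flow(flow: dict, start: str) -> List[str]:
    """Return a list of problems; an empty list means the graph looks sound."""
    if start not in flow:
        return [f"start state '{start}' is not declared"]
    problems = []
    # Every transition must point at a declared state.
    for state, edges in flow.items():
        for intent, target in edges.items():
            if target not in flow:
                problems.append(f"{state} --{intent}--> {target}: undeclared target")
    # Breadth-first search from the start state to find unreachable states.
    seen, queue = {start}, deque([start])
    while queue:
        for target in flow[queue.popleft()].values():
            if target in flow and target not in seen:
                seen.add(target)
                queue.append(target)
    for state in flow:
        if state not in seen:
            problems.append(f"{state}: unreachable from {start}")
    return problems
```

Running a check like this in CI keeps a mistyped state name in a flow file from surfacing for the first time during a live call.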

At a glance

Capabilities: Plain-text to voice-flow generation  ·  Deterministic state machine enforcement  ·  Multi-provider LLM adaptation (Gemini, Gemma, Claude)  ·  Real-time session inspection and debugging  ·  A/B testing with synthetic scoring  ·  MCP server exposure for AI clients
Components: StateManager  ·  Flow Loader  ·  Session Context  ·  Turn Runner  ·  Guardrail Engine  ·  Slot Extractor  ·  Web Server  ·  MCP Server  ·  Event Bus
Tech: Python  ·  YAML  ·  Mermaid  ·  Pytest  ·  FastAPI/Stdlib HTTP  ·  Git
Depends on: Google Gemini API (optional)  ·  Local Gemma model (optional)  ·  Anthropic Claude API (optional)  ·  MCP CLI (optional)
Integrates with: Telephony systems via STT/TTS adapters  ·  Model Context Protocol (MCP) clients  ·  Browser-based voice UI (voice.html)
Patterns: LLM riffs, FSM keeps beat  ·  Pure function guardrails  ·  Deterministic slot extraction  ·  Weakref-finalize cache cleanup  ·  Grep-audit invariant tests  ·  Pre-registered canary tests
Reuse tags: voice-agent  ·  fsm-orchestration  ·  llm-guardrails  ·  python-framework  ·  deterministic-ai  ·  business-automation
