Riff · davidbmar.com

What it is

riff is a Python-based platform for building voice agents that combine Large Language Models (LLMs) for natural language understanding with Finite State Machines (FSMs) for strict workflow enforcement. Users describe a business process in plain text, and riff generates a YAML-defined state graph. During calls, the LLM handles conversational nuance while the FSM validates slots, enforces transitions, and keeps hallucinated values or actions from taking effect, so tasks like scheduling, ordering, or intake follow the declared flow.
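The division of labor described above can be sketched as a small transition check. This is only an illustration: the `FLOW` dictionary stands in for a YAML state graph, and its keys (`next`, `requires`), the state names, and the `can_transition` helper are all assumptions, not riff's actual schema or API.

```python
# Hypothetical state graph. In riff this would be loaded from YAML; the
# schema is not shown in this document, so every key here is an assumption.
FLOW = {
    "greeting": {"next": ["collect_service"], "requires": []},
    "collect_service": {"next": ["collect_time"], "requires": []},
    "collect_time": {"next": ["confirm"], "requires": ["service"]},
    "confirm": {"next": [], "requires": ["service", "time"]},
}


def can_transition(flow, current, proposed, slots):
    """Return (ok, reason) for a transition proposed by the LLM.

    Pure function: the result depends only on the arguments.
    """
    if proposed not in flow:
        return False, f"unknown state: {proposed}"
    if proposed not in flow[current]["next"]:
        return False, f"{current} -> {proposed} is not declared"
    # Slots that must already be filled before entering the proposed state.
    missing = [s for s in flow[proposed]["requires"] if not slots.get(s)]
    if missing:
        return False, "missing slots: " + ", ".join(missing)
    return True, "ok"
```

In this sketch, a proposal to jump from `greeting` straight to `confirm` is rejected because that edge is not declared, regardless of what the LLM said.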

Features

Generates voice agent flows from plain English business descriptions
Enforces deterministic state transitions using YAML-defined FSMs
Blocks hallucinated LLM output from taking effect via slot validation and guardrails
Supports multiple LLM backends including Gemini, Gemma, and Claude
Includes built-in web UI for real-time conversation monitoring
Provides MCP server integration for reusable agent blocks

Quickstart

cd ~/src/riff
.venv/bin/python3 -m pytest tests/ -q
.venv/bin/python3 -m riff.web_server
curl -X POST localhost:8765/api/flows/generate -H "Content-Type: application/json" -d '{"description":"Hair salon in Austin. Book cuts and colors."}'
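The same request can be made from Python once the web server from the previous command is running. The sketch below reuses the URL and payload from the curl command; the helper names are hypothetical, and the assumption that the endpoint replies with JSON is not confirmed by this document.

```python
import json
import urllib.request

# URL and payload shape copied from the curl command above.
GENERATE_URL = "http://localhost:8765/api/flows/generate"


def build_generate_request(description, url=GENERATE_URL):
    """Build (but do not send) the POST request for flow generation."""
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_flow(description, timeout=30):
    """Send the request to a running server.

    Assumption: the response body is JSON. Its schema is not documented
    here, so it is returned as-is after parsing.
    """
    request = build_generate_request(description)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate_flow("Hair salon in Austin. Book cuts and colors.")` mirrors the curl example; it raises `urllib.error.URLError` if nothing is listening on port 8765.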

Architecture

flowchart TD
    Caller[Caller]
    STT[Speech to Text]
    RunTurn[run_turn pipeline]
    Guardrails[Guardrails Pure Functions]
    LLMCall[LLM Adapter]
    StateManager[State Manager]
    SlotExtractor[Slot Extractor]
    EvalFramework[Evaluation Framework]
    TurnLogger[Turn Logger JSONL]
    TTS[Text to Speech]
    Caller --> STT
    STT --> RunTurn
    RunTurn --> Guardrails
    RunTurn --> LLMCall
    RunTurn --> StateManager
    Guardrails --> RunTurn
    LLMCall --> RunTurn
    StateManager --> RunTurn
    RunTurn --> SlotExtractor
    RunTurn --> EvalFramework
    RunTurn --> TurnLogger
    RunTurn --> TTS
    TTS --> Caller
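
The Turn Logger node in the diagram writes JSONL, that is, one JSON object per line. A minimal sketch of that format follows; the field names and both helper functions are assumptions for illustration and may not match what riff actually records.

```python
import io
import json
import time


def log_turn(stream, session_id, state, user_text, agent_text, slots):
    """Append one turn to a JSONL stream: one JSON object per line."""
    record = {
        "ts": time.time(),  # wall-clock timestamp in seconds
        "session_id": session_id,
        "state": state,
        "user_text": user_text,
        "agent_text": agent_text,
        "slots": slots,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_turns(stream):
    """Parse a JSONL stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Usage with an in-memory buffer; a real log would be a file opened with "a".
buffer = io.StringIO()
log_turn(buffer, "s1", "greeting", "hi", "Hello! How can I help?", {})
log_turn(buffer, "s1", "collect_service", "a cut", "What time?", {"service": "cut"})
buffer.seek(0)
turns = read_turns(buffer)
```

Because each line is a complete record, the log can be appended to during a call and tailed or filtered line by line afterwards.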

How it's built

The core engine is a `run_turn` pipeline: speech-to-text output passes through deterministic guardrails, an LLM adapter (Gemini, Gemma, or Claude), and a state manager. The state manager uses pure functions to validate each transition against the declared YAML graph, and slot extraction has deterministic fallbacks. Non-deterministic LLM calls are kept strictly separate from deterministic logic (guardrails, validators, state transitions). The package also includes a web server for the UI, an MCP server for tool integration, and a self-improvement loop that tracks metrics.
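
The separation described above can be sketched as a single turn function in which the LLM only proposes and deterministic code decides. Everything here is hypothetical: the `Session` class, the `TRANSITIONS` table, the regex-based `extract_time` fallback, and the `scripted_llm` stand-in are assumptions, not riff's real `run_turn` signature.

```python
import re
from dataclasses import dataclass, field

# Hypothetical flow: allowed next states per state. Not riff's real schema.
TRANSITIONS = {"ask_time": {"confirm"}, "confirm": set()}
TIME_PATTERN = re.compile(r"\b(\d{1,2})\s*(am|pm)\b", re.IGNORECASE)
RETRY_TEXT = "Sorry, what time works for you?"


@dataclass
class Session:
    state: str = "ask_time"
    slots: dict = field(default_factory=dict)


def extract_time(text):
    """Deterministic extractor: finds times like '3 pm' and normalizes them."""
    match = TIME_PATTERN.search(text)
    return f"{match.group(1)}{match.group(2).lower()}" if match else None


def run_turn(session, utterance, llm):
    """One turn: the LLM proposes, deterministic code decides."""
    proposal = llm(session.state, utterance)  # the only non-deterministic step
    # Validate the LLM's slot value; fall back to the raw utterance if needed.
    llm_time = str(proposal.get("slots", {}).get("time", ""))
    time_slot = extract_time(llm_time) or extract_time(utterance)
    if time_slot:
        session.slots["time"] = time_slot
    target = proposal.get("next_state")
    if target in TRANSITIONS[session.state] and "time" in session.slots:
        session.state = target
        return proposal.get("text", "")
    return RETRY_TEXT  # canned reply when the proposal is rejected


def scripted_llm(state, utterance):
    """Stand-in for an LLM adapter that always tries to confirm."""
    return {"next_state": "confirm", "text": "Booked.", "slots": {}}
```

With this sketch, `run_turn(Session(), "maybe tomorrow", scripted_llm)` returns the retry text and leaves the state unchanged, because the stand-in LLM tried to confirm before a time slot was filled.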

How it runs

sequenceDiagram
    participant Caller
    participant WebServer
    participant RunTurn
    participant LLMAdapter
    participant StateManager
    participant STT
    participant TTS
    Caller->>WebServer: Speaks audio
    WebServer->>STT: Convert audio to text
    STT->>RunTurn: Submit user utterance
    RunTurn->>StateManager: Get current state and constraints
    StateManager-->>RunTurn: Return valid transitions and slots
    RunTurn->>LLMAdapter: Generate response based on context
    LLMAdapter-->>RunTurn: Return proposed action and text
    RunTurn->>StateManager: Validate transition and slots
    StateManager-->>RunTurn: Confirm valid state change
    RunTurn->>TTS: Convert response text to audio
    TTS-->>WebServer: Return audio stream
    WebServer-->>Caller: Play audio response

How to apply & reuse

Use riff to rapidly prototype and deploy voice agents for specific business verticals (e.g., plumbing, dental clinics, retail). Define the business logic via natural language description to auto-generate the flow, or manually edit the YAML state graph for complex requirements. Integrate into existing telephony systems via the provided API endpoints or use the built-in web interface for testing and demonstration. Extend capabilities by registering custom tools and guards in Python modules that auto-load with the package.
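
One common way to register custom guards so that importing a module is enough to activate them is a decorator-based registry. The sketch below shows that general pattern only; the `GUARDS` dictionary, the `guard` decorator, and `run_guards` are hypothetical names, and riff's real registration API may look different.

```python
GUARDS = {}  # hypothetical registry, filled as modules are imported


def guard(name):
    """Decorator that registers a guard function under a name."""
    def decorator(func):
        if name in GUARDS:
            raise ValueError(f"guard already registered: {name}")
        GUARDS[name] = func
        return func
    return decorator


@guard("business_hours")
def business_hours(slots):
    """Pass only when the requested hour falls within 09:00-17:00."""
    hour = slots.get("hour")
    return isinstance(hour, int) and 9 <= hour < 17


def run_guards(names, slots):
    """Return the names of the guards that failed for these slots."""
    return [name for name in names if not GUARDS[name](slots)]
```

Because registration happens at import time, a package that imports its guard modules on load picks up new guards without any further wiring, which is the behavior the paragraph above describes.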

At a glance

Capabilities: Natural language to FSM flow generation, Real-time voice conversation handling, Deterministic slot filling and validation, Multi-LLM backend support, Self-improvement metric tracking, Model Context Protocol (MCP) integration

Components: run_turn pipeline, state_manager, LLM adapters, guardrails engine, slot extractor, web_server, MCP server, flow loader, session store, event bus

Tech: Python, YAML, Mermaid, JSONL, SQLite, WebSockets, HTTP API

Depends on: pytest, Google Gemini API, Gemma models, Claude API, MCP CLI, OrderedDict

Integrates with: Telephony systems, Web browsers, MCP clients, LLM providers

Patterns: Finite State Machine orchestration, Deterministic guardrails, Pure function state transitions, LLM riffing with FSM beating, Silent failure observability, LRU caching for previews

Reuse tags: voice-agent, fsm, llm-orchestration, python, yaml-config, deterministic-ai
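
The list above mentions both OrderedDict and LRU caching for previews. A minimal sketch of how an LRU cache is typically built on `collections.OrderedDict` follows; the `PreviewCache` class and its capacity are assumptions for illustration, not riff's actual cache.

```python
from collections import OrderedDict


class PreviewCache:
    """Small LRU cache: the least recently used entry is evicted first."""

    def __init__(self, max_items=2):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None, and mark the key as recent."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Store a value and evict the oldest entry if over capacity."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)


cache = PreviewCache(max_items=2)
cache.put("flow-a", "preview A")
cache.put("flow-b", "preview B")
cache.get("flow-a")               # touch flow-a so flow-b becomes the oldest
cache.put("flow-c", "preview C")  # evicts flow-b
```

`move_to_end` and `popitem(last=False)` are both constant-time on an `OrderedDict`, which is why it is a common base for small in-process LRU caches.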

⚠ Needs attention

unmerged_branch: agent/l5-style-and-mars-ota is 1 commit ahead of the default branch
unmerged_branch: agent/space-channel-link-only-watchdog is 1 commit ahead of the default branch
unmerged_branch: architect/flow-metrics-schema-2026-05-27 is 1 commit ahead of the default branch
unmerged_branch: codex-m3/ladder-semantic-ack-judge-20260528 is 1 commit ahead of the default branch
unmerged_branch: dependabot/npm_and_yarn/web/flow-editor/npm_and_yarn-3213b4e331 is 1 commit ahead of the default branch
unmerged_branch: feat/gemini-off-record-audio is 37 commits ahead of the default branch
unmerged_branch: feat/noc-briefing-integration is 1 commit ahead of the default branch
unmerged_branch: feat/tenant-message-handoff is 1 commit ahead of the default branch
unmerged_branch: release-mgr/property-ticket-readiness-2026-05-27 is 2 commits ahead of the default branch
unmerged_branch: review/metrics-audit-2026-05-27 is 2 commits ahead of the default branch
unmerged_branch: space-channel-open-line is 5 commits ahead of the default branch
unmerged_branch: wip/property-ticket-l2-real-routing-20260526-174019 is 3 commits ahead of the default branch
open_pr: PR #18: chore(deps-dev): Bump undici from 7.25.0 to 7.28.0 in /web/flow-editor in the npm_and_yarn group across 1 directory