FSM-generic (Voice OS) · davidbmar.com

What it is

FSM-generic is a Python-based framework for building voice-driven applications using finite state machines (FSMs). It separates the core runtime (kernel) from application logic (workflows), allowing developers to define conversational flows as JSONL files. The system handles session management, LLM orchestration, event broadcasting, and tool execution, supporting both voice (via Twilio/WebRTC) and text interfaces. It includes a Research Dashboard for managing intelligence-gathering missions and a Visual Editor for designing workflows.
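To make the JSONL idea concrete, here is a minimal sketch of loading a workflow with one state per line and following its transitions. The field names (`id`, `prompt`, `transitions`), the `"*"` wildcard, and the helper functions are illustrative assumptions, not the project's actual schema or API.

```python
import json

# Hypothetical JSONL workflow: one JSON object (state) per line.
# The schema here is an assumption made for illustration only.
WORKFLOW_JSONL = """\
{"id": "greet", "prompt": "Hi! Want to start a mission?", "transitions": {"yes": "collect_topic", "no": "done"}}
{"id": "collect_topic", "prompt": "What topic should I research?", "transitions": {"*": "done"}}
{"id": "done", "prompt": "Goodbye.", "transitions": {}}
"""


def load_workflow(text):
    """Parse JSONL text into a dict mapping state id -> state record."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        states[record["id"]] = record
    return states


def next_state(states, current, event):
    """Return the state reached from `current` on `event`.

    Falls back to the "*" wildcard, and stays in place when nothing matches.
    """
    transitions = states[current]["transitions"]
    target = transitions.get(event, transitions.get("*"))
    if target is None:
        return current
    if target not in states:
        raise ValueError(f"transition to undefined state: {target}")
    return target


states = load_workflow(WORKFLOW_JSONL)
state = next_state(states, "greet", "yes")
print(state, "->", states[state]["prompt"])
```

Keeping one state per line means a workflow can be diffed, appended to, and validated line by line, which is one plausible reason to prefer JSONL over a single nested JSON document.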

Features

Define conversational apps as JSONL workflow state machines
Built-in kernel for FSM runtime, session management, and event bus
Research Operations Dashboard for managing intelligence-gathering missions
Visual Workflow Editor for browser-based state machine design
Support for voice channels (Twilio, WebRTC) and text CLI demos
Extensible tool registry for custom actions like email and search

Quickstart

git clone --recurse-submodules https://github.com/davidbmar/FSM-generic
cd FSM-generic
./scripts/setup.sh
echo 'ANTHROPIC_API_KEY=sk-ant-your-key' > .env
./scripts/run.sh

Architecture

flowchart TD
    User[User/Voice Client] -->|WebRTC/Twilio| Gateway[Gateway/WebRTC Signaling]
    Gateway -->|Events| Server[FastAPI Server]
    Server -->|Routes| Shell[Shell Router]
    Shell -->|Manages| Session[VoiceSession FSM]
    Session -->|Uses| Kernel[Kernel Runtime]
    Kernel -->|Orchestrates| Engine[LLM Engine Submodule]
    Kernel -->|Publishes| EventBus[Event Bus]
    EventBus -->|Notifies| Services[System Services]
    Services --> Memory[Memory Service]
    Services --> Transcript[Transcript Service]
    Services --> Timer[Timer Service]
    Session -->|Invokes| Tools[Tool Registry]
    Tools --> AppTools[App Specific Tools]
    Server -->|Serves| Frontend[Web Frontends]
    Frontend --> Dashboard[Research Dashboard]
    Frontend --> Editor[Visual Editor]

How it's built

The backend is built with Python 3.11+ using FastAPI for the server interface. The core engine relies on a custom FSM runtime, an event bus for pub/sub communication, and a tool registry for extensible actions. LLM integration is handled via a submodule (engine-repo) supporting Anthropic Claude, OpenAI, or Ollama. Frontends are built with React, TypeScript, Vite, and Tailwind CSS. The architecture follows a layered model: Kernel (runtime/voice pipeline), System Services (memory/transcript), Shell (intent/workflow stack), and Apps (workflow definitions).
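The pub/sub communication described above can be pictured with a small in-process event bus. This is a sketch under stated assumptions: the `EventBus` class, its `subscribe`/`publish` methods, and the `"utterance"` topic are hypothetical names chosen for illustration, and the real kernel's bus may differ (for example, it may be asynchronous).

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Minimal synchronous pub/sub bus (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register `handler` to be called for every event on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Dict[str, Any]) -> int:
        """Deliver `payload` to all subscribers; return how many were notified."""
        # Copy the list so handlers can subscribe during delivery safely.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


# A stand-in "transcript service" that records every utterance it hears.
transcript: List[str] = []
bus = EventBus()
bus.subscribe("utterance", lambda event: transcript.append(event["text"]))

bus.publish("utterance", {"text": "hello"})
print(transcript)
```

The publisher never references the transcript service directly, which is what lets services such as memory, transcript, and timer be added or removed without touching the runtime that emits the events.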

How it runs

sequenceDiagram
    participant U as User
    participant F as Frontend (Dashboard/Editor)
    participant S as FastAPI Server
    participant SR as Shell Router
    participant VS as VoiceSession
    participant K as Kernel/Engine
    participant T as Tool Registry
    
    U->>F: Interacts with UI
    F->>S: API Request (e.g., start mission)
    S->>SR: Route Intent
    SR->>VS: Create/Resume Session
    VS->>K: Process State (LLM Call)
    K-->>VS: LLM Response & Action
    alt Action Required
        VS->>T: Execute Tool
        T-->>VS: Tool Result
    end
    VS->>S: Update Session State
    S-->>F: Return Response/Update
    F-->>U: Display Result
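
The middle of the sequence above (process state, optionally execute a tool, update the session) can be sketched as a single turn handler. Everything here is a stand-in: `fake_engine` replaces the real LLM call, and `Decision`, `Session`, `TOOLS`, and `web_search` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    """What the engine returns for one turn: a reply plus an optional tool call."""
    reply: str
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)


def fake_engine(state: str, user_text: str) -> Decision:
    """Stand-in for the LLM call; a real engine would query a model here."""
    if "search" in user_text.lower():
        return Decision(reply="Searching.", tool="web_search",
                        args={"query": user_text})
    return Decision(reply=f"[{state}] You said: {user_text}")


# Stub tool table; the real tool would call a search API.
TOOLS: Dict[str, Callable[..., str]] = {
    "web_search": lambda query: f"0 results for {query!r} (stub)",
}


@dataclass
class Session:
    state: str = "idle"
    history: List[tuple] = field(default_factory=list)

    def handle_turn(self, user_text: str) -> dict:
        """Run one turn: ask the engine, run a tool if requested, record it."""
        decision = fake_engine(self.state, user_text)
        result = None
        if decision.tool is not None:  # the "alt Action Required" branch
            result = TOOLS[decision.tool](**decision.args)
        self.history.append((user_text, decision.reply, result))
        return {"reply": decision.reply, "tool_result": result}


session = Session()
print(session.handle_turn("hello"))
print(session.handle_turn("search cats"))
```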

How to apply & reuse

Use this framework to build complex, multi-turn conversational agents where strict control over dialogue flow is required. It is suitable for creating research assistants, meeting scribes, or interactive voice response systems. Developers define workflows as state machines, register custom tools (e.g., email sending, web search), and deploy via the provided FastAPI server. The visual editor allows non-developers to inspect and modify workflow logic.
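
As a rough illustration of registering a custom tool, the sketch below uses a decorator-based registry. The `ToolRegistry` class, its `register` and `execute` methods, and the `send_email` stub are assumptions made for this example; the framework's actual registration interface may look different.

```python
from typing import Any, Callable, Dict


class ToolRegistry:
    """Maps tool names to callables (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable:
        """Decorator that registers the wrapped function under `name`."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._tools:
                raise ValueError(f"tool already registered: {name}")
            self._tools[name] = fn
            return fn
        return decorator

    def execute(self, name: str, **kwargs: Any) -> Any:
        """Look up a tool by name and call it with keyword arguments."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("send_email")
def send_email(to: str, subject: str) -> dict:
    # Stub: no email is actually sent; a real tool would call a mail service.
    return {"status": "queued", "to": to, "subject": subject}


print(registry.execute("send_email", to="a@example.com", subject="Hi"))
```

Looking tools up by name keeps workflow definitions declarative: a state can refer to `"send_email"` as a string while the implementation lives in application code.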

At a glance

Capabilities: State Machine Runtime, LLM Orchestration, Event Bus Pub/Sub, Session Management, Tool Execution, Voice Channel Integration, Workflow Visualization

Components: Kernel (FSM Runtime, Event Bus), Shell (Intent Classifier, Stack Manager), System Services (Memory, Timer, Transcript), Engine Submodule (LLM, STT, TTS), Research Dashboard (React), Visual Editor (TypeScript), Gateway (WebRTC Signaling)

Tech: Python 3.11+, FastAPI, React, TypeScript, Vite, Tailwind CSS, Node.js 18+, Pydantic

Depends on: Anthropic API Key, Git Submodules, Python Virtual Environment, Node Package Manager

Integrates with: Anthropic Claude, OpenAI, Ollama, Twilio, WebRTC, Tavily Search, Brave Search, Serper API

Patterns: Finite State Machine, Event-Driven Architecture, Plugin/Tool Registry, Layered Architecture, Pub/Sub Messaging

Reuse tags: voice-assistant, state-machine, llm-orchestration, conversational-ai, workflow-engine, fastapi, react-dashboard