A generic voice-first operating system framework that defines conversational apps as workflow state machines.
https://github.com/davidbmar/FSM-generic · private · shipped

FSM-generic is a Python-based framework for building voice-driven applications using finite state machines (FSMs). It separates the core runtime (kernel) from application logic (workflows), allowing developers to define conversational flows as JSONL files. The system handles session management, LLM orchestration, event broadcasting, and tool execution, supporting both voice (via Twilio/WebRTC) and text interfaces. It includes a Research Dashboard for managing intelligence-gathering missions and a Visual Editor for designing workflows.
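To make the "workflows as JSONL files" idea concrete, here is a minimal sketch of a workflow written one state per line and loaded into a lookup table. The field names (`id`, `prompt`, `transitions`) and the helper functions are hypothetical, chosen for illustration only; the framework's real schema may differ.

```python
import json

# Hypothetical workflow: one JSON object per line (JSONL). The field names
# "id", "prompt", and "transitions" are illustrative assumptions, not the
# framework's actual schema.
WORKFLOW_JSONL = """\
{"id": "greet", "prompt": "Ask what the user needs.", "transitions": {"research": "collect_topic", "bye": "done"}}
{"id": "collect_topic", "prompt": "Ask for the research topic.", "transitions": {"topic_given": "done"}}
{"id": "done", "prompt": "Say goodbye.", "transitions": {}}
"""


def load_workflow(text):
    """Parse JSONL text into a dict mapping state id -> state definition."""
    states = {}
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            state = json.loads(line)
            states[state["id"]] = state
    return states


def next_state(states, current, intent):
    """Return the state reached from `current` on `intent`; stay put if unknown."""
    return states[current]["transitions"].get(intent, current)


states = load_workflow(WORKFLOW_JSONL)
```

Keeping each state on its own line means a workflow can be diffed, appended to, or streamed line by line, which is one plausible reason to prefer JSONL over a single nested JSON document.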
git clone --recurse-submodules https://github.com/davidbmar/FSM-generic
cd FSM-generic
./scripts/setup.sh
echo 'ANTHROPIC_API_KEY=sk-ant-your-key' > .env
./scripts/run.sh
flowchart TD
User[User/Voice Client] -->|WebRTC/Twilio| Gateway[Gateway/WebRTC Signaling]
Gateway -->|Events| Server[FastAPI Server]
Server -->|Routes| Shell[Shell Router]
Shell -->|Manages| Session[VoiceSession FSM]
Session -->|Uses| Kernel[Kernel Runtime]
Kernel -->|Orchestrates| Engine[LLM Engine Submodule]
Kernel -->|Publishes| EventBus[Event Bus]
EventBus -->|Notifies| Services[System Services]
Services --> Memory[Memory Service]
Services --> Transcript[Transcript Service]
Services --> Timer[Timer Service]
Session -->|Invokes| Tools[Tool Registry]
Tools --> AppTools[App Specific Tools]
Server -->|Serves| Frontend[Web Frontends]
Frontend --> Dashboard[Research Dashboard]
Frontend --> Editor[Visual Editor]
The backend is built with Python 3.11+ using FastAPI for the server interface. The core engine relies on a custom FSM runtime, an event bus for pub/sub communication, and a tool registry for extensible actions. LLM integration is handled via a submodule (engine-repo) supporting Anthropic Claude, OpenAI, or Ollama. Frontends are built with React, TypeScript, Vite, and Tailwind CSS. The architecture follows a layered model: Kernel (runtime/voice pipeline), System Services (memory/transcript), Shell (intent/workflow stack), and Apps (workflow definitions).
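The pub/sub relationship between the kernel and the system services can be sketched as follows. This is a minimal synchronous illustration, not the framework's actual event bus: the `EventBus` class, its method names, and the `"utterance"` topic are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class EventBus:
    """Minimal synchronous pub/sub bus (illustrative, not the framework's class)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to be called for every event on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver `payload` to each subscriber of `topic`; return the count."""
        # .get() avoids creating an empty entry for topics nobody listens to.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


# A stand-in "transcript service" that records every utterance it is sent.
bus = EventBus()
transcript: List[str] = []
bus.subscribe("utterance", transcript.append)
delivered = bus.publish("utterance", "hello")
```

The publisher never references the transcript list directly, which is the decoupling the layered model relies on: services can be added or removed without touching the kernel.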
sequenceDiagram
participant U as User
participant F as Frontend (Dashboard/Editor)
participant S as FastAPI Server
participant SR as Shell Router
participant VS as VoiceSession
participant K as Kernel/Engine
participant T as Tool Registry
U->>F: Interacts with UI
F->>S: API Request (e.g., start mission)
S->>SR: Route Intent
SR->>VS: Create/Resume Session
VS->>K: Process State (LLM Call)
K-->>VS: LLM Response & Action
alt Action Required
VS->>T: Execute Tool
T-->>VS: Tool Result
end
VS->>S: Update Session State
S-->>F: Return Response/Update
F-->>U: Display Result
Use this framework to build complex, multi-turn conversational agents where strict control over dialogue flow is required. It is suitable for creating research assistants, meeting scribes, or interactive voice response systems. Developers define workflows as state machines, register custom tools (e.g., email sending, web search), and deploy via the provided FastAPI server. The visual editor allows non-developers to inspect and modify workflow logic.
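Registering a custom tool might look like the sketch below. The `ToolRegistry` class, the decorator-based registration, and the `web_search` stub are hypothetical and stand in for whatever interface the framework actually exposes.

```python
from typing import Any, Callable, Dict


class ToolRegistry:
    """Illustrative name -> callable registry (not the framework's real API)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Return a decorator that stores the decorated function under `name`."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn
        return decorator

    def execute(self, name: str, **kwargs: Any) -> Any:
        """Look up a tool by name and call it with keyword arguments."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("web_search")
def web_search(query: str) -> dict:
    # Stub: a real tool would call a search API here.
    return {"query": query, "results": []}


result = registry.execute("web_search", query="finite state machines")
```

Looking tools up by name lets a workflow state refer to an action as a plain string, so a workflow definition can stay declarative while the Python implementation lives elsewhere.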
✓ all on main — nothing unmerged.