A generic voice-first operating system framework that defines conversational apps as workflow state machines.
https://github.com/davidbmar/grassy-knoll · public · shipped
Grassy Knoll is a Python-based framework for building voice-driven applications. It models each conversation as a finite state machine (FSM): the kernel manages runtime execution, event broadcasting, and session state. Developers define apps in JSONL workflow files that specify states, LLM prompts, transitions, and tool invocations, so the conversation flow stays structured and deterministic while a large language model generates the responses.
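To make the workflow-as-FSM idea concrete, here is a minimal sketch of how a JSONL workflow could be loaded into a state table and stepped through. The field names (`state`, `prompt`, `tools`, `transitions`), the event names, and the helpers `load_workflow` and `next_state` are assumptions made for illustration; the actual Grassy Knoll schema and kernel API may differ.

```python
import json

# Hypothetical workflow: one JSON object per line.
# Field names are illustrative, not the framework's real schema.
WORKFLOW_JSONL = """
{"state": "greet", "prompt": "Greet the caller and ask for party size.", "transitions": {"got_party_size": "confirm"}}
{"state": "confirm", "prompt": "Read the booking back and ask for confirmation.", "tools": ["book_table"], "transitions": {"confirmed": "done", "rejected": "greet"}}
{"state": "done", "prompt": "Thank the caller and end the session.", "transitions": {}}
"""


def load_workflow(text):
    """Parse JSONL text into a dict keyed by state name."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        states[record["state"]] = record
    return states


def next_state(states, current, event):
    """Return the target state for an event, or stay put if nothing matches."""
    return states[current]["transitions"].get(event, current)


states = load_workflow(WORKFLOW_JSONL)
state = "greet"
for event in ["got_party_size", "rejected", "got_party_size", "confirmed"]:
    state = next_state(states, state, event)
print(state)
```

Because transitions are looked up in a table rather than chosen freely by the model, the same sequence of events always ends in the same state, which is what keeps the flow deterministic.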
git clone --recurse-submodules https://github.com/davidbmar/grassy-knoll
cd grassy-knoll
./scripts/setup.sh
echo 'ANTHROPIC_API_KEY=sk-ant-your-key' > .env
./scripts/run.sh
flowchart TD
User[User/Voice Input] --> Channel[Voice Channel]
Channel --> Shell[Shell Router]
Shell --> Session[VoiceSession FSM]
Session --> Kernel[Kernel Runtime]
Kernel --> Engine[Engine Submodule]
Engine --> LLM[Anthropic LLM]
Kernel --> EventBus[EventBus]
EventBus --> Services[System Services]
Services --> Memory[Memory Service]
Services --> Transcript[Transcript Service]
Session --> Tools[Tool Registry]
Tools --> AppTools[App Specific Tools]
The system is built in Python 3.11+ with a modular architecture. The core 'kernel' handles the FSM runtime, event bus, and session management. An external 'engine' submodule (linked via symlink) provides LLM orchestration (Anthropic), STT, and TTS capabilities. System services like memory, timers, and file I/O run as independent components communicating via an internal EventBus. Workflows are defined declaratively in JSONL, while custom logic is implemented as Python tools registered in a ToolRegistry.
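The two extension points described above, an internal event bus and a tool registry, can be sketched as follows. This is not the project's actual code: the class bodies, the method names (`subscribe`, `publish`, `register`, `call`), the `utterance` topic, and the `book_table` tool are all hypothetical and only show the general pattern.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Copy the list so handlers may subscribe during dispatch.
        for handler in list(self._subscribers[topic]):
            handler(payload)


class ToolRegistry:
    """Maps tool names to plain Python callables (illustrative only)."""

    def __init__(self):
        self._tools = {}

    def register(self, name):
        def decorator(func):
            self._tools[name] = func
            return func
        return decorator

    def call(self, name, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


bus = EventBus()
registry = ToolRegistry()

# A stand-in "transcript service": it just records every utterance event.
transcript = []
bus.subscribe("utterance", transcript.append)


@registry.register("book_table")
def book_table(party_size):
    """Hypothetical app-specific tool with a side effect stubbed out."""
    return {"status": "booked", "party_size": party_size}


bus.publish("utterance", {"speaker": "user", "text": "Table for four, please."})
result = registry.call("book_table", party_size=4)
```

Keeping services behind the bus means a component such as the transcript recorder never needs a direct reference to the kernel; it only reacts to the events it subscribed to.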
sequenceDiagram
participant U as User
participant S as ShellRouter
participant VS as VoiceSession
participant KB as Kernel/FSM
participant Eng as Engine (LLM)
participant TR as ToolRegistry
U->>S: Voice/Input Text
S->>VS: Route to Active Workflow
VS->>KB: Process Current State
KB->>Eng: Generate Response/Action
Eng-->>KB: LLM Output
alt Tool Call Required
KB->>TR: Execute Tool
TR-->>KB: Tool Result
KB->>VS: Update State/Transition
else Direct Response
KB->>VS: Update State/Transition
end
VS-->>S: Formatted Response
S-->>U: Output Audio/Text
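The turn shown in the sequence above can be approximated in a few lines of Python. This is a sketch under stated assumptions: `fake_engine` stands in for the real LLM call, and the output shape (`say`, `tool`, `args`, `event`), the `TOOLS` and `TRANSITIONS` tables, and `process_turn` are hypothetical names, not the framework's API.

```python
def fake_engine(state, user_text):
    """Stand-in for the LLM call; returns a dict describing the next action."""
    if state == "confirm" and "yes" in user_text.lower():
        return {
            "say": "Booked.",
            "tool": "book_table",
            "args": {"party_size": 4},
            "event": "confirmed",
        }
    return {"say": "How many people?", "tool": None, "args": {}, "event": None}


# Hypothetical tool table and transition table for a reservation workflow.
TOOLS = {"book_table": lambda party_size: f"table for {party_size}"}
TRANSITIONS = {("confirm", "confirmed"): "done"}


def process_turn(state, user_text, engine=fake_engine):
    """Run one turn: ask the engine, optionally call a tool, then transition."""
    output = engine(state, user_text)
    tool_result = None
    if output["tool"] is not None:
        # "Tool Call Required" branch of the diagram.
        tool_result = TOOLS[output["tool"]](**output["args"])
    # Both branches end by updating the state; stay put if no transition matches.
    new_state = TRANSITIONS.get((state, output["event"]), state)
    return new_state, output["say"], tool_result


print(process_turn("confirm", "Yes please"))
print(process_turn("greet", "hi"))
```

Passing the engine in as a parameter keeps the turn logic testable without a network call, since a canned function can replace the model.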
Use this framework to build structured voice assistants such as meeting scribes, reservation systems, or interactive briefing agents. It is suitable for developers who need more control than standard chat interfaces but less overhead than building a voice pipeline from scratch. Define your conversation flow in JSONL, implement specific side-effects (like sending emails or querying databases) as Python tools, and let the kernel handle the state transitions and LLM context management.