grassy-knoll

A generic voice-first operating system framework that defines conversational apps as workflow state machines.

https://github.com/davidbmar/grassy-knoll  ·  public  ·  shipped

What it is

Grassy Knoll is a Python-based framework for building voice-driven applications. It treats each conversation as a finite state machine (FSM): the kernel manages runtime execution, event broadcasting, and session state. Developers define apps in JSONL workflow files that specify states, LLM prompts, transitions, and tool invocations, so the conversation flow stays structured and predictable even though the responses themselves are generated by a large language model.
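
To make the workflow idea concrete, here is a minimal sketch of loading a JSONL workflow and following a transition. The field names (id, prompt, transitions, tools), the State class, and the load_workflow and next_state helpers are assumptions made for illustration; the repository's actual schema and loader may differ.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema: one JSON object per line, one line per state.
WORKFLOW_JSONL = """\
{"id": "greet", "prompt": "Greet the caller and ask for a date.", "transitions": {"date_given": "confirm"}}
{"id": "confirm", "prompt": "Read the date back and confirm.", "tools": ["book_table"], "transitions": {"yes": "done", "no": "greet"}}
{"id": "done", "prompt": "Say goodbye.", "transitions": {}}
"""


@dataclass
class State:
    id: str
    prompt: str
    transitions: dict[str, str]
    tools: list[str] = field(default_factory=list)


def load_workflow(text: str) -> dict[str, State]:
    """Parse JSONL text into a mapping of state id -> State."""
    states: dict[str, State] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        states[record["id"]] = State(
            id=record["id"],
            prompt=record["prompt"],
            transitions=record.get("transitions", {}),
            tools=record.get("tools", []),
        )
    # Reject workflows whose transitions point at undefined states.
    for state in states.values():
        for target in state.transitions.values():
            if target not in states:
                raise ValueError(f"{state.id!r} transitions to unknown state {target!r}")
    return states


def next_state(states: dict[str, State], current: str, event: str) -> str:
    """Return the target state for an event, staying put on unknown events."""
    return states[current].transitions.get(event, current)


workflow = load_workflow(WORKFLOW_JSONL)
```

In this sketch an unrecognized event leaves the session in its current state, which is one simple way to keep the flow predictable when the model produces something unexpected.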

Quickstart

git clone --recurse-submodules https://github.com/davidbmar/grassy-knoll
cd grassy-knoll
./scripts/setup.sh
echo 'ANTHROPIC_API_KEY=sk-ant-your-key' > .env
./scripts/run.sh

Architecture

flowchart TD
    User[User/Voice Input] --> Channel[Voice Channel]
    Channel --> Shell[Shell Router]
    Shell --> Session[VoiceSession FSM]
    Session --> Kernel[Kernel Runtime]
    Kernel --> Engine[Engine Submodule]
    Engine --> LLM[Anthropic LLM]
    Kernel --> EventBus[EventBus]
    EventBus --> Services[System Services]
    Services --> Memory[Memory Service]
    Services --> Transcript[Transcript Service]
    Session --> Tools[Tool Registry]
    Tools --> AppTools[App Specific Tools]

How it's built

The system is built in Python 3.11+ with a modular architecture. The core 'kernel' handles the FSM runtime, event bus, and session management. An external 'engine' submodule (linked via symlink) provides LLM orchestration (Anthropic), STT, and TTS capabilities. System services like memory, timers, and file I/O run as independent components communicating via an internal EventBus. Workflows are defined declaratively in JSONL, while custom logic is implemented as Python tools registered in a ToolRegistry.
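
The publish-subscribe and tool-registration ideas described above can be sketched as follows. The class names EventBus and ToolRegistry come from the text, but every method name, the "tool.finished" topic, and the add_minutes tool are hypothetical; this is not the project's actual API.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[None]]


class EventBus:
    """Minimal async publish-subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    async def publish(self, topic: str, payload: dict[str, Any]) -> None:
        # Deliver to each subscriber in registration order.
        for handler in self._subscribers.get(topic, []):
            await handler(payload)


class ToolRegistry:
    """Maps tool names to plain Python callables (illustrative only)."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


async def demo() -> list[str]:
    bus = EventBus()
    tools = ToolRegistry()
    log: list[str] = []

    # A stand-in for a system service that records tool results.
    async def transcript_service(event: dict[str, Any]) -> None:
        log.append(f"{event['tool']} -> {event['result']}")

    bus.subscribe("tool.finished", transcript_service)

    @tools.register("add_minutes")
    def add_minutes(start: int, minutes: int) -> int:
        return start + minutes

    result = tools.call("add_minutes", start=30, minutes=15)
    await bus.publish("tool.finished", {"tool": "add_minutes", "result": result})
    return log


transcript = asyncio.run(demo())
```

Keeping services behind a bus like this means a new service can be added by subscribing to a topic, without the code that publishes events knowing it exists.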

How it runs

sequenceDiagram
    participant U as User
    participant S as ShellRouter
    participant VS as VoiceSession
    participant KB as Kernel/FSM
    participant Eng as Engine (LLM)
    participant TR as ToolRegistry
    
    U->>S: Voice/Input Text
    S->>VS: Route to Active Workflow
    VS->>KB: Process Current State
    KB->>Eng: Generate Response/Action
    Eng-->>KB: LLM Output
    alt Tool Call Required
        KB->>TR: Execute Tool
        TR-->>KB: Tool Result
        KB->>VS: Update State/Transition
    else Direct Response
        KB->>VS: Update State/Transition
    end
    VS-->>S: Formatted Response
    S-->>U: Output Audio/Text
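
The turn sequence above can be sketched in plain Python. The engine is replaced by a stub so the example runs offline, and the names EngineOutput, TRANSITIONS, TOOLS, stub_engine, and run_turn are all hypothetical, chosen only to mirror the diagram's steps.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EngineOutput:
    text: str                        # what to say back to the user
    event: str                       # transition event for the FSM
    tool: Optional[str] = None       # tool to run, if any
    tool_args: Optional[dict] = None


# Hypothetical transition table: state -> {event -> next state}.
TRANSITIONS: dict[str, dict[str, str]] = {
    "collect": {"details_ready": "booked"},
    "booked": {},
}

# Hypothetical tool table standing in for a tool registry.
TOOLS: dict[str, Callable[..., str]] = {
    "book_table": lambda party_size: f"table for {party_size} reserved",
}


def stub_engine(state: str, user_text: str) -> EngineOutput:
    """Stand-in for the LLM call; a real engine would prompt a model here."""
    if state == "collect" and "table" in user_text:
        return EngineOutput("Booking now.", "details_ready", "book_table", {"party_size": 4})
    return EngineOutput("Could you repeat that?", "no_match")


def run_turn(
    state: str,
    user_text: str,
    engine: Callable[[str, str], EngineOutput] = stub_engine,
) -> tuple[str, str]:
    """Process one user turn and return (new_state, reply)."""
    output = engine(state, user_text)
    reply = output.text
    if output.tool is not None:
        # "Tool Call Required" branch: run the tool and fold in its result.
        result = TOOLS[output.tool](**(output.tool_args or {}))
        reply = f"{reply} ({result})"
    # Both branches end by applying the transition; unknown events keep the state.
    new_state = TRANSITIONS[state].get(output.event, state)
    return new_state, reply


state, reply = run_turn("collect", "a table for four please")
```

Passing the engine in as a parameter is a design choice of this sketch: it lets tests swap the model for a deterministic stub, in the spirit of the dependency injection pattern the project lists.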

How to apply & reuse

Use this framework to build structured voice assistants such as meeting scribes, reservation systems, or interactive briefing agents. It suits developers who need more control than a standard chat interface offers but want less overhead than building a voice pipeline from scratch. Define your conversation flow in JSONL, implement side effects (such as sending emails or querying databases) as Python tools, and let the kernel handle state transitions and LLM context management.

At a glance

Capabilities: State machine management, LLM orchestration, Event broadcasting, Session persistence, Tool invocation, Workflow loading
Components: Kernel, Shell Router, VoiceSession, EventBus, ToolRegistry, System Services, Engine Submodule
Tech: Python 3.11+, Anthropic API, JSONL, AsyncIO, Pytest
Depends on: anthropic, python-dotenv, pytest, faster-whisper (optional), piper-tts (optional)
Integrates with: Anthropic Claude, Twilio (via channels), WebRTC (via channels), SMTP (via tools)
Patterns: Finite State Machine, Publish-Subscribe, Command Pattern (Tools), Dependency Injection, Module Symlinking
Reuse tags: voice-assistant, conversational-ai, state-machine, llm-framework, python-library

Repo hygiene

✓ all on main — nothing unmerged.