iPhone Streaming Plus Finite State Machine

Mac-hosted Python voice assistant streaming TTS to iPhone Safari via WebRTC with a hybrid FSM workflow engine for complex queries.

https://github.com/davidbmar/iphone-streaming-plus-Finite-State-Machine  ·  public  ·  shipped

iPhone Streaming Plus Finite State Machine screenshot

What it is

A local-first voice agent that runs on macOS and connects to iPhone Safari. It uses Whisper for STT, routes queries via regex to either a fast LLM path or a multi-step Finite State Machine (Research, Deep Dive, Fact Check), and streams Piper TTS audio back to the phone using WebRTC.

Features

Quickstart

git clone https://github.com/davidbmar/iphone-streaming-plus-Finite-State-Machine.git
cd iphone-streaming-plus-Finite-State-Machine
pip install -r requirements.txt
python main.py

Architecture

flowchart TD
    subgraph Mac_Host["Mac Host"]
        Engine["Engine\nWorkflowRunner\nKeyword Router\nFSM Executor"]
        Gateway["Gateway\naiohttp Server :8080\nWebSocket Signaling\nRTCPeerConnection"]
        TTS["Piper TTS (ONNX)"]
        STT["Whisper STT"]
        LLM["LLM Provider\n(Ollama/Claude/OpenAI)"]
    end
    subgraph iPhone["iPhone Safari"]
        UI["Voice Agent UI\nHold-to-Talk\nWorkflow Debugger"]
    end
    UI -->|Audio/Mic Input| Gateway
    Gateway -->|Audio Data| STT
    STT -->|Text| Engine
    Engine -->|Query| LLM
    LLM -->|Response Text| Engine
    Engine -->|Text| TTS
    TTS -->|Audio Chunks| Gateway
    Gateway -->|WebRTC Audio Track| UI
    Engine -->|State Updates| Gateway
    Gateway -->|Debug Info| UI

How it's built

Python backend using aiohttp for signaling and WebRTC peer connections. The core logic includes a keyword router, an FSM executor for complex workflows, and adapters for Whisper (STT), Piper (TTS), and multiple LLM providers (Ollama, Claude, OpenAI). The frontend is a mobile-optimized web UI with hold-to-talk controls and a workflow debugger.

How it runs

sequenceDiagram
    participant User as iPhone User
    participant UI as Safari UI
    participant GW as Mac Gateway (aiohttp)
    participant Eng as Engine (FSM/Router)
    participant LLM as LLM Provider
    participant TTS as Piper TTS
    
    User->>UI: Hold to Talk (Mic Input)
    UI->>GW: Send Audio via WebSocket
    GW->>Eng: Forward Audio Data
    Eng->>Eng: Whisper STT Transcription
    Eng->>Eng: Keyword Router Decision
    alt Complex Query
        Eng->>LLM: FSM State Prompt (e.g., Initial Lookup)
        LLM-->>Eng: Search Query/Reasoning
        Eng->>Eng: Execute Tool (Web Search)
        Eng->>LLM: Next FSM State (Synthesize)
        LLM-->>Eng: Final Answer Text
    else Simple Query
        Eng->>LLM: Direct Chat Completion
        LLM-->>Eng: Response Text
    end
    Eng->>TTS: Generate Audio from Text
    TTS-->>Eng: Audio Chunks (PCM)
    Eng->>GW: Stream Audio Chunks
    GW->>UI: WebRTC Audio Track
    UI->>User: Play Response Audio

How to apply & reuse

Deploy on a Mac connected to the same network as your iPhone. Use it for privacy-focused voice assistance, local LLM experimentation, or as a template for building stateful voice agents with WebRTC audio streaming.

At a glance

CapabilitiesVoice InteractionWebRTC StreamingFinite State Machine ExecutionLocal LLM IntegrationReal-time TranscriptionText-to-Speech Synthesis
Componentsengine/workflow.pyengine/adapter.pyengine/conversation.pyengine/fast_path.pyengine/input_filter.pyengine/llm.pygateway/server.py
TechPythonWebRTCaiohttpWhisperPiper TTSOllamaMermaid
Depends onmacOS HostPython 3.10+Ollama (optional)Anthropic API Key (optional)OpenAI API Key (optional)
Integrates withiPhone SafariOllamaClaude APIOpenAI APIWeb Search Tools
PatternsFinite State MachineKeyword RoutingClient-Server WebRTCSliding Window ContextFast Path Optimization
Reuse tagsvoice-agentwebrtc-audiofinite-state-machinelocal-llmpython-backendios-web-app

⚠ Needs attention