Mac-hosted Python voice assistant streaming TTS to iPhone Safari via WebRTC with a hybrid FSM workflow engine for complex queries.
https://github.com/davidbmar/iphone-streaming-plus-Finite-State-Machine · public · shipped

A local-first voice agent that runs on macOS and connects to iPhone Safari. It transcribes speech with Whisper (STT), then uses a regex-based keyword router to send each query down one of two paths: a fast single-call LLM path, or a multi-step Finite State Machine workflow (Research, Deep Dive, Fact Check). The response is synthesized with Piper TTS and streamed back to the phone over WebRTC.
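The routing step can be sketched as follows. This is a minimal illustration rather than the project's code: the `Route` enum, the `ROUTES` table, the `route_query` function, and every trigger phrase are hypothetical names and patterns chosen for the example.

```python
import re
from enum import Enum


class Route(Enum):
    FAST = "fast"              # single LLM call
    RESEARCH = "research"      # FSM workflow
    DEEP_DIVE = "deep_dive"    # FSM workflow
    FACT_CHECK = "fact_check"  # FSM workflow


# Hypothetical trigger patterns (assumption); the real router's patterns may differ.
# Order matters: the first matching pattern wins.
ROUTES = [
    (re.compile(r"\b(fact[- ]check|is it true|verify)\b", re.I), Route.FACT_CHECK),
    (re.compile(r"\b(deep dive|in depth|explain in detail)\b", re.I), Route.DEEP_DIVE),
    (re.compile(r"\b(research|look up|search for|latest)\b", re.I), Route.RESEARCH),
]


def route_query(text: str) -> Route:
    """Return the workflow for a transcribed query, defaulting to the fast path."""
    for pattern, route in ROUTES:
        if pattern.search(text):
            return route
    return Route.FAST
```

With these example patterns, `route_query("What time is it?")` falls through to `Route.FAST`, while `route_query("Can you fact-check that claim?")` selects `Route.FACT_CHECK`. Defaulting to the fast path keeps latency low for queries that match no workflow keyword.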
git clone https://github.com/davidbmar/iphone-streaming-plus-Finite-State-Machine.git
cd iphone-streaming-plus-Finite-State-Machine
pip install -r requirements.txt
python main.py
flowchart TD
subgraph Mac_Host["Mac Host"]
Engine["Engine\nWorkflowRunner\nKeyword Router\nFSM Executor"]
Gateway["Gateway\naiohttp Server :8080\nWebSocket Signaling\nRTCPeerConnection"]
TTS["Piper TTS (ONNX)"]
STT["Whisper STT"]
LLM["LLM Provider\n(Ollama/Claude/OpenAI)"]
end
subgraph iPhone["iPhone Safari"]
UI["Voice Agent UI\nHold-to-Talk\nWorkflow Debugger"]
end
UI -->|Audio/Mic Input| Gateway
Gateway -->|Audio Data| STT
STT -->|Text| Engine
Engine -->|Query| LLM
LLM -->|Response Text| Engine
Engine -->|Text| TTS
TTS -->|Audio Chunks| Gateway
Gateway -->|WebRTC Audio Track| UI
Engine -->|State Updates| Gateway
Gateway -->|Debug Info| UI
The Python backend runs an aiohttp server on port 8080 that handles WebSocket signaling and hosts the WebRTC peer connection. The core logic consists of a keyword router, an FSM executor for complex workflows, and adapters for Whisper (STT), Piper (TTS), and multiple LLM providers (Ollama, Claude, OpenAI). The frontend is a mobile-optimized web UI with hold-to-talk controls and a workflow debugger.
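One way to structure the LLM provider adapters is a small shared interface with a name-based lookup. The sketch below is an assumption about the shape of such a layer, not the repository's actual API: `LLMProvider`, `EchoProvider`, `PROVIDERS`, and `get_provider` are hypothetical, and the stand-in provider makes no network calls.

```python
from typing import Dict, Protocol


class LLMProvider(Protocol):
    """Hypothetical interface that every provider adapter would satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in adapter for illustration; a real one would call Ollama, Claude, or OpenAI."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        # Echo the prompt back, tagged with the provider name.
        return f"[{self.name}] {prompt}"


# Registry keyed by lowercase provider name (names taken from the description above).
PROVIDERS: Dict[str, LLMProvider] = {
    name: EchoProvider(name) for name in ("ollama", "claude", "openai")
}


def get_provider(name: str) -> LLMProvider:
    """Look up an adapter by name, case-insensitively."""
    try:
        return PROVIDERS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None
```

Because the engine would depend only on `complete`, swapping a local model for a hosted one becomes a configuration change rather than a code change.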
sequenceDiagram
participant User as iPhone User
participant UI as Safari UI
participant GW as Mac Gateway (aiohttp)
participant Eng as Engine (FSM/Router)
participant LLM as LLM Provider
participant TTS as Piper TTS
User->>UI: Hold to Talk (Mic Input)
UI->>GW: Send Audio via WebSocket
GW->>Eng: Forward Audio Data
Eng->>Eng: Whisper STT Transcription
Eng->>Eng: Keyword Router Decision
alt Complex Query
Eng->>LLM: FSM State Prompt (e.g., Initial Lookup)
LLM-->>Eng: Search Query/Reasoning
Eng->>Eng: Execute Tool (Web Search)
Eng->>LLM: Next FSM State (Synthesize)
LLM-->>Eng: Final Answer Text
else Simple Query
Eng->>LLM: Direct Chat Completion
LLM-->>Eng: Response Text
end
Eng->>TTS: Generate Audio from Text
TTS-->>Eng: Audio Chunks (PCM)
Eng->>GW: Stream Audio Chunks
GW->>UI: WebRTC Audio Track
UI->>User: Play Response Audio
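The complex-query branch of the sequence above (initial lookup, tool call, synthesis) can be sketched as a small state machine. This is an illustrative sketch under stated assumptions: `Context`, `run_fsm`, `make_research_states`, the state names, and the `llm` and `search` callables are all hypothetical and stand in for the project's real executor, LLM adapter, and web search tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Context:
    """Data carried between states."""
    query: str
    notes: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # visited states, in order


# A handler does one state's work and returns the next state's name, or None to stop.
Handler = Callable[[Context], Optional[str]]


def run_fsm(states: Dict[str, Handler], start: str, ctx: Context,
            max_steps: int = 10) -> Context:
    """Run handlers until one returns None; max_steps guards against cycles."""
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            return ctx
        ctx.trace.append(current)
        current = states[current](ctx)
    if current is not None:
        raise RuntimeError("FSM exceeded max_steps")
    return ctx


def make_research_states(llm: Callable[[str], str],
                         search: Callable[[str], str]) -> Dict[str, Handler]:
    """Build a three-state research workflow from injected llm and search callables."""

    def initial_lookup(ctx: Context) -> Optional[str]:
        ctx.notes["search_query"] = llm(f"Write a search query for: {ctx.query}")
        return "web_search"

    def web_search(ctx: Context) -> Optional[str]:
        ctx.notes["results"] = search(ctx.notes["search_query"])
        return "synthesize"

    def synthesize(ctx: Context) -> Optional[str]:
        ctx.notes["answer"] = llm(
            f"Answer {ctx.query!r} using: {ctx.notes['results']}"
        )
        return None  # terminal state

    return {
        "initial_lookup": initial_lookup,
        "web_search": web_search,
        "synthesize": synthesize,
    }
```

Calling `run_fsm(make_research_states(llm, search), "initial_lookup", Context("some question"))` visits the three states in order and leaves the final text in `ctx.notes["answer"]`. A record like `ctx.trace` is the kind of data a workflow debugger could display as state updates.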
Deploy on a Mac connected to the same network as your iPhone. Use it for privacy-focused voice assistance, local LLM experimentation, or as a template for building stateful voice agents with WebRTC audio streaming.