Stream generated audio from a Mac host to an iPhone browser via WebRTC and TURN, featuring a voice agent loop with STT, LLM, and TTS.
https://github.com/davidbmar/iphone-webrtc-TURN-speaker-streaming-machost-iphonebrowser · public · shipped

A real-time voice agent system that runs on a Mac host and streams audio responses to an iPhone browser. It captures microphone audio in the browser and sends it to the host over WebRTC. The host transcribes it with Whisper, processes it with an LLM (Ollama, Claude, or OpenAI), synthesizes speech with Piper TTS, and streams the audio back to the iPhone speaker. Media is relayed through TURN servers for NAT traversal, so the connection also works over cellular networks, where direct peer-to-peer UDP is often blocked.
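The loop above can be sketched as a single turn function. This is a minimal sketch, not the project's code: the `VoiceTurnPipeline` class and its three callables are hypothetical stand-ins for the faster-whisper, LLM, and Piper integrations, and it assumes one turn consumes one buffered utterance of PCM audio.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Hypothetical stage signatures; the real project wires faster-whisper,
# an LLM client, and Piper behind its own interfaces.
Transcriber = Callable[[bytes], str]            # PCM in, transcript out
Responder = Callable[[str], str]                # user text in, reply text out
Synthesizer = Callable[[str], Iterable[bytes]]  # reply text in, PCM chunks out


@dataclass
class VoiceTurnPipeline:
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def run_turn(self, pcm: bytes) -> Iterator[bytes]:
        """Run one STT -> LLM -> TTS turn, yielding audio chunks as they are produced."""
        text = self.transcribe(pcm).strip()
        if not text:
            # Nothing intelligible was said; skip the LLM and TTS calls entirely.
            return
        reply = self.respond(text)
        # Yield chunks one by one so the gateway can start sending before TTS finishes.
        yield from self.synthesize(reply)
```

In a real gateway each stage would likely run off the event loop (for example in a thread pool), since Whisper and Piper inference are CPU-bound.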
pip install -r requirements.txt
cp .env.example .env
python3 -m gateway.server
open http://localhost:8080
flowchart TD
subgraph Mac_Host["Mac Host"]
Engine["Engine\n(Whisper/LLM/Piper)"]
Gateway["Gateway\n(aiohttp :8080)\nWebSocket + RTCPeerConnection"]
Engine -->|Audio/Data| Gateway
end
subgraph iPhone["iPhone Browser"]
App["web/app.js\nRTCPeerConnection\ngetUserMedia"]
end
Gateway <-->|WebRTC UDP\n(via TURN)| App
Gateway <-->|WebSocket Signaling| App
TURN["TURN Server\n(Twilio)"]
App <-->|Relay| TURN
TURN <-->|Relay| Gateway
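The `ice_servers` list that the gateway hands to the browser tells the iPhone which STUN and TURN servers to use. The sketch below builds that list in the browser's `RTCIceServer` shape (`urls`, `username`, `credential`). The environment variable names (`STUN_URL`, `TURN_URLS`, `TURN_USERNAME`, `TURN_CREDENTIAL`) and the `build_ice_servers` helper are hypothetical, not taken from the project's `.env.example`; with Twilio, the credentials would normally be short-lived values fetched from its API rather than static settings.

```python
from typing import Dict, List, Mapping, Union

IceServer = Dict[str, Union[str, List[str]]]


def build_ice_servers(env: Mapping[str, str]) -> List[IceServer]:
    """Build an RTCIceServer-style list for the hello_ack message.

    All variable names read here are hypothetical placeholders.
    """
    servers: List[IceServer] = []
    stun = env.get("STUN_URL", "").strip()
    if stun:
        # STUN only discovers the public address; it needs no credentials.
        servers.append({"urls": stun})
    turn_urls = [u.strip() for u in env.get("TURN_URLS", "").split(",") if u.strip()]
    if turn_urls:
        username = env.get("TURN_USERNAME", "")
        credential = env.get("TURN_CREDENTIAL", "")
        if not (username and credential):
            raise ValueError("TURN_URLS is set but TURN_USERNAME/TURN_CREDENTIAL are missing")
        # TURN relays the media itself when direct UDP is blocked, e.g. behind cellular NAT.
        servers.append({"urls": turn_urls, "username": username, "credential": credential})
    return servers
```

Called as `build_ice_servers(os.environ)`, it returns an empty list when nothing is configured, in which case the browser falls back to host candidates only.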
The backend is a Python aiohttp server that acts as both signaling gateway and media engine. It uses faster-whisper for STT, an LLM provider (Ollama, Claude, or OpenAI) to generate replies, and Piper for TTS. The frontend is a TypeScript/JavaScript web app that handles WebRTC peer connections, microphone access, and UI state. Client and server communicate over a WebSocket for signaling and over WebRTC data and audio channels for media.
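Before Piper's output can go out on a WebRTC audio track, it has to be cut into fixed-duration frames; WebRTC audio is commonly packetized in 20 ms frames, and Opus in WebRTC runs at 48 kHz. The sketch below shows only the framing step and assumes mono, signed 16-bit PCM that has already been resampled (Piper voices often output 22050 Hz). The `pcm_frames` helper is hypothetical and is not the project's implementation.

```python
from typing import Iterator


def pcm_frames(pcm: bytes, sample_rate: int = 48000, frame_ms: int = 20,
               channels: int = 1, sample_width: int = 2) -> Iterator[bytes]:
    """Split interleaved signed PCM into fixed-duration frames, zero-padding the last one."""
    samples_per_frame = sample_rate * frame_ms // 1000   # 960 samples at 48 kHz / 20 ms
    frame_bytes = samples_per_frame * channels * sample_width
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if len(frame) < frame_bytes:
            # Zero is silence for signed PCM, so padding keeps every frame the same duration.
            frame += b"\x00" * (frame_bytes - len(frame))
        yield frame
```

Fixed-size frames let the sender pace output at a steady 20 ms cadence, which keeps the browser's jitter buffer from underrunning mid-sentence.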
sequenceDiagram
participant Client as iPhone Browser
participant Server as Mac Gateway
participant Engine as Voice Engine
participant LLM as LLM Provider
Client->>Server: WebSocket hello {token}
Server-->>Client: hello_ack {voices, ice_servers}
Client->>Server: webrtc_offer {sdp}
Server->>Server: Create RTCPeerConnection
Server-->>Client: webrtc_answer {sdp}
Note over Client,Server: Media Stream Established
Client->>Server: mic_start (Audio Track)
Server->>Engine: Buffer Audio
Engine->>Engine: Whisper STT
Engine->>LLM: Transcribed Text
LLM-->>Engine: Reply Text
Engine->>Engine: Piper TTS
Engine->>Server: Audio Chunks
Server->>Client: WebRTC Audio Track
Client->>Client: Play Speaker
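The signaling half of the sequence above can be sketched as a stateless message handler. Assumptions: messages are JSON objects with a `type` field, the `error` reply is invented for illustration, and `make_answer` is a hypothetical callback standing in for the gateway's RTCPeerConnection offer/answer step. A real handler would also keep per-connection state, for example rejecting an offer that arrives before a successful `hello`.

```python
import hmac
import json
from typing import Callable, List, Optional

# Hypothetical: the real gateway builds an RTCPeerConnection and returns its SDP answer.
AnswerFn = Callable[[str], str]


def handle_signaling(raw: str, *, expected_token: str, voices: List[str],
                     ice_servers: List[dict], make_answer: AnswerFn) -> Optional[str]:
    """Map one inbound WebSocket text message to its JSON reply, or None if no reply is due."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "hello":
        token = str(msg.get("token", "")).encode()
        # Constant-time comparison avoids leaking the token through response timing.
        if not hmac.compare_digest(token, expected_token.encode()):
            return json.dumps({"type": "error", "reason": "bad_token"})
        return json.dumps({"type": "hello_ack", "voices": voices, "ice_servers": ice_servers})
    if kind == "webrtc_offer":
        return json.dumps({"type": "webrtc_answer", "sdp": make_answer(msg["sdp"])})
    # mic_start and other control messages produce no direct reply in this sketch.
    return None
```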
Use this project to build low-latency voice assistants that work in mobile browsers without native apps. It serves as a reference for WebRTC audio streaming, TURN relay configuration for cellular networks, and integrating local LLMs served by Ollama with real-time speech pipelines.