iphone-webrtc-TURN-speaker-streaming-machost-iphonebrowser

Stream generated audio from a Mac host to an iPhone browser via WebRTC and TURN, featuring a voice agent loop with STT, LLM, and TTS.

https://github.com/davidbmar/iphone-webrtc-TURN-speaker-streaming-machost-iphonebrowser  ·  public  ·  shipped

What it is

A real-time voice agent that runs on a Mac host and streams spoken responses to an iPhone browser. It captures microphone input over WebRTC, transcribes it with Whisper, passes the transcript to an LLM (Ollama, Claude, or OpenAI), synthesizes the reply with Piper TTS, and streams the audio back to the iPhone speaker. TURN relays handle NAT traversal, so the connection also works over cellular networks.
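The loop described above can be sketched as a single function that chains the three stages. This is a minimal illustration, not the project's actual code: `run_turn`, `VoiceTurn`, and the three callables (`transcribe`, `generate`, `synthesize`) are hypothetical names standing in for Whisper, the LLM, and Piper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """Result of one pass through the agent loop (hypothetical container)."""
    transcript: str
    reply: str
    audio: bytes


def run_turn(
    mic_audio: bytes,
    transcribe: Callable[[bytes], str],   # stand-in for Whisper STT
    generate: Callable[[str], str],       # stand-in for the LLM call
    synthesize: Callable[[str], bytes],   # stand-in for Piper TTS
) -> VoiceTurn:
    """Run one STT -> LLM -> TTS pass over a buffer of microphone audio."""
    transcript = transcribe(mic_audio).strip()
    if not transcript:
        # Nothing intelligible was said, so skip the LLM and TTS stages.
        return VoiceTurn(transcript="", reply="", audio=b"")
    reply = generate(transcript)
    return VoiceTurn(transcript=transcript, reply=reply, audio=synthesize(reply))
```

Passing the stages in as callables keeps the sketch independent of any one provider, which mirrors the idea of swapping Ollama, Claude, or OpenAI behind one interface.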

Quickstart

pip install -r requirements.txt
cp .env.example .env
python3 -m gateway.server
open http://localhost:8080

Architecture

flowchart TD
    subgraph Mac_Host["Mac Host"]
        Engine["Engine\n(Whisper/LLM/Piper)"]
        Gateway["Gateway\n(aiohttp :8080)\nWebSocket + RTCPeerConnection"]
        Engine -->|Audio/Data| Gateway
    end
    subgraph iPhone["iPhone Browser"]
        App["web/app.js\nRTCPeerConnection\ngetUserMedia"]
    end
    Gateway <-->|"WebRTC UDP\n(via TURN)"| App
    Gateway <-->|WebSocket Signaling| App
    TURN["TURN Server\n(Twilio)"]
    App <-->|Relay| TURN
    TURN <-->|Relay| Gateway

How it's built

The backend is a Python aiohttp server that acts as both signaling gateway and media engine. It uses faster-whisper for STT, an LLM adapter (Ollama, Claude, or OpenAI) for reasoning, and Piper for TTS. The frontend is a TypeScript/JavaScript web app that handles the WebRTC peer connection, microphone access, and UI state. Client and server exchange signaling messages over WebSocket and carry media over WebRTC audio and data channels.
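One detail that sits between TTS output and a WebRTC audio track is framing: audio tracks are usually fed in short fixed-size frames. The sketch below shows one way synthesized PCM could be cut into 20 ms frames. The function name `frame_pcm` and the defaults (48 kHz, 16-bit, mono) are assumptions for illustration; the project's real sample rate and framing may differ, and any resampling from the TTS output rate is out of scope here.

```python
def frame_pcm(
    pcm: bytes,
    sample_rate: int = 48000,  # assumed track rate, not taken from the project
    frame_ms: int = 20,        # assumed frame duration
    sample_width: int = 2,     # bytes per sample (16-bit PCM)
    channels: int = 1,
) -> list[bytes]:
    """Split raw PCM into equal-length frames, zero-padding the last one."""
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    frames = []
    for start in range(0, len(pcm), frame_bytes):
        chunk = pcm[start:start + frame_bytes]
        # Pad a short final chunk with silence so every frame has equal length.
        frames.append(chunk.ljust(frame_bytes, b"\x00"))
    return frames
```

With the defaults, each frame holds 960 samples (1920 bytes), so a 2000-byte buffer yields two frames, the second mostly silence.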

How it runs

sequenceDiagram
    participant Client as iPhone Browser
    participant Server as Mac Gateway
    participant Engine as Voice Engine
    participant LLM as LLM Provider
    
    Client->>Server: WebSocket hello {token}
    Server-->>Client: hello_ack {voices, ice_servers}
    
    Client->>Server: webrtc_offer {sdp}
    Server->>Server: Create RTCPeerConnection
    Server-->>Client: webrtc_answer {sdp}
    
    Note over Client,Server: Media Stream Established
    
    Client->>Server: mic_start (Audio Track)
    Server->>Engine: Buffer Audio
    Engine->>Engine: Whisper STT
    Engine->>LLM: Transcribed Text
    LLM-->>Engine: Reply Text
    Engine->>Engine: Piper TTS
    Engine->>Server: Audio Chunks
    Server->>Client: WebRTC Audio Track
    Client->>Client: Play Speaker
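
The signaling half of the sequence above can be sketched as a small message dispatcher. The message names and payload fields (`hello`, `hello_ack`, `webrtc_offer`, `webrtc_answer`, `token`, `voices`, `ice_servers`, `sdp`) come from the diagram; everything else is assumed for illustration, including the `type` key, the `error` reply shape, and the `make_answer` hook that would wrap the server-side RTCPeerConnection.

```python
import json
from typing import Callable


def handle_signal(
    raw: str,
    expected_token: str,
    ice_servers: list,
    voices: list,
    make_answer: Callable[[str], str],  # hypothetical: offer SDP in, answer SDP out
) -> dict:
    """Map one incoming signaling message to its reply."""
    msg = json.loads(raw)
    kind = msg.get("type")  # assumed envelope field
    if kind == "hello":
        if msg.get("token") != expected_token:
            # Assumed error shape; the real protocol may differ.
            return {"type": "error", "reason": "bad_token"}
        return {"type": "hello_ack", "voices": voices, "ice_servers": ice_servers}
    if kind == "webrtc_offer":
        return {"type": "webrtc_answer", "sdp": make_answer(msg["sdp"])}
    return {"type": "error", "reason": "unknown_type"}
```

In a server, each WebSocket text message would pass through a function like this and the returned dict would be serialized and sent back to the client.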

How to apply & reuse

Use this project to build low-latency voice assistants that work on mobile browsers without native apps. It serves as a reference for handling WebRTC audio streaming, TURN relay configuration for cellular networks, and integrating local LLMs like Ollama with real-time speech pipelines.
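
As a starting point for the TURN relay configuration, the sketch below builds an `ice_servers` list in the shape browsers accept for `RTCPeerConnection` (`urls`, `username`, `credential`). The helper name `build_ice_servers`, the host `turn.example.com`, and the ports are placeholders and assumptions. A real deployment would use the URLs and short-lived credentials issued by its TURN provider.

```python
def build_ice_servers(turn_host: str, username: str, credential: str) -> list[dict]:
    """Return a STUN entry plus TURN entries over UDP, TCP, and TLS."""
    return [
        {"urls": [f"stun:{turn_host}:3478"]},
        {
            "urls": [
                f"turn:{turn_host}:3478?transport=udp",
                f"turn:{turn_host}:3478?transport=tcp",
                # TLS on 443 is an assumed fallback for networks that block other ports.
                f"turns:{turn_host}:443?transport=tcp",
            ],
            "username": username,
            "credential": credential,
        },
    ]


# Example with placeholder values only.
servers = build_ice_servers("turn.example.com", "user", "secret")
```

Offering TCP and TLS variants alongside UDP gives the client a fallback on carrier networks that restrict UDP.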

At a glance

Capabilities: Speech-to-Text, Text-to-Speech, LLM Integration, WebRTC Streaming, NAT Traversal
Components: aiohttp Gateway, Whisper STT, Piper TTS, LLM Adapter, Web Frontend
Tech: Python, TypeScript, WebRTC, WebSocket, aiohttp, faster-whisper, Piper
Depends on: Ollama, Twilio TURN, Cloudflared, Node.js, Python 3.9+
Integrates with: Anthropic Claude, OpenAI GPT, Ollama, Tavily Search, Brave Search
Patterns: Signaling Server, Media Relay, Voice Agent Loop, Edge AI
Reuse tags: voice-assistant, webrtc-audio, local-llm, mobile-browser, turn-server
