A real-time speech-to-text server and client ecosystem using OpenAI's Whisper with WebSocket streaming.
https://github.com/davidbmar/WhisperLive · public · shipped

WhisperLive is a Python server that exposes a WebSocket API for real-time audio transcription. It supports multiple inference backends (Faster-Whisper, TensorRT, OpenVINO) so the same server can be matched to different hardware. The project includes browser extensions (Chrome and Firefox) and Python clients that capture microphone or tab audio, convert it into 16 kHz mono chunks, and stream those chunks to the server, which returns text with low latency.
bash scripts/setup.sh
pip install whisper-live
python3 run_server.py --port 9090 --backend faster_whisper
flowchart TD
Client[Browser/Python Client] -->|WebSocket Binary Audio| Server[WhisperLive Server]
Server --> Backend{Inference Backend}
Backend -->|CPU| FW[Faster-Whisper]
Backend -->|GPU NVIDIA| TRT[TensorRT-LLM]
Backend -->|Intel HW| OV[OpenVINO]
FW --> Text[Transcribed Text]
TRT --> Text
OV --> Text
Text -->|WebSocket JSON| Client
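The branching in the flowchart above can be sketched as a small selection function. This is an illustrative sketch only: the `Hardware` class, `choose_backend`, and `server_command` are hypothetical helpers, and while `faster_whisper` appears in the run command above, the `tensorrt` and `openvino` flag values are assumed spellings that should be checked against the project's own documentation.

```python
from dataclasses import dataclass

# Backend identifiers. "faster_whisper" appears in the run command above;
# "tensorrt" and "openvino" are assumed spellings used for illustration only.
FASTER_WHISPER = "faster_whisper"
TENSORRT = "tensorrt"
OPENVINO = "openvino"


@dataclass(frozen=True)
class Hardware:
    """Hypothetical description of the host machine."""
    has_nvidia_gpu: bool = False
    has_intel_accelerator: bool = False


def choose_backend(hw: Hardware) -> str:
    """Mirror the flowchart: NVIDIA -> TensorRT, Intel -> OpenVINO, else CPU."""
    if hw.has_nvidia_gpu:
        return TENSORRT
    if hw.has_intel_accelerator:
        return OPENVINO
    return FASTER_WHISPER


def server_command(hw: Hardware, port: int = 9090) -> list[str]:
    """Build an argv list in the style of the run command shown earlier."""
    return ["python3", "run_server.py", "--port", str(port),
            "--backend", choose_backend(hw)]
```

Calling `server_command(Hardware())` on a machine with no accelerator yields the same `--backend faster_whisper` invocation as the run command shown earlier.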
The core server is written in Python, using `websockets` for communication and `pyaudio`/`numpy` for audio handling. It integrates with ASR engines such as `faster-whisper`, `tensorrt_llm`, or `openvino`. The browser clients are JavaScript extensions that use the Web Audio API and AudioWorklets to resample and buffer audio on the client side before sending it as binary data over WebSockets.
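To make the client-side preprocessing concrete, here is a dependency-free Python sketch of the same idea: downmix to mono, resample to 16 kHz, and split the result into binary chunks. It is not the extension's AudioWorklet code. It uses plain linear interpolation with no anti-aliasing filter, and the little-endian float32 wire format and the 4096-sample chunk size are assumptions made for illustration.

```python
import struct

TARGET_RATE = 16_000  # Hz, the rate named in the description


def downmix_to_mono(frames: list[tuple[float, ...]]) -> list[float]:
    """Average all channels of each frame into one sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples: list[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> list[float]:
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def to_float32_bytes(samples: list[float]) -> bytes:
    """Pack samples as little-endian float32, clipped to [-1, 1] (assumed format)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}f", *clipped)


def chunk_bytes(payload: bytes, chunk_samples: int = 4096):
    """Yield fixed-size binary chunks (4 bytes per float32 sample)."""
    step = chunk_samples * 4
    for start in range(0, len(payload), step):
        yield payload[start:start + step]
```

One second of 48 kHz stereo input passed through `downmix_to_mono`, `resample_linear`, and `to_float32_bytes` produces 16,000 samples, or 64,000 bytes, which `chunk_bytes` then splits for sending. A production client would normally low-pass filter before downsampling to avoid aliasing.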
sequenceDiagram
participant C as Client (JS/Python)
participant S as WhisperLive Server
participant B as ASR Backend
C->>S: Connect WebSocket
S-->>C: Connection Established
loop Audio Stream
C->>C: Capture & Resample to 16kHz Mono
C->>S: Send Audio Chunk (Binary)
S->>B: Process Chunk
B-->>S: Return Transcript Segment
S-->>C: Send Transcript (JSON)
end
C->>S: Close Connection
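The message order in the sequence diagram above can be simulated without a network. The sketch below replaces the WebSocket with two in-memory `asyncio` queues and the ASR backend with a stub, so it shows only the order of messages. The JSON field names (`status`, `text`) and the use of `None` to signal a closed connection are hypothetical and do not describe WhisperLive's actual message schema.

```python
import asyncio
import json


async def fake_backend(chunk: bytes) -> str:
    """Stand-in for an ASR backend; reports the chunk size instead of text."""
    return f"<{len(chunk)} bytes of audio>"


async def server(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Answer each binary chunk with a JSON message until the client closes."""
    await outbox.put(json.dumps({"status": "connected"}))
    while True:
        chunk = await inbox.get()
        if chunk is None:      # None models the client closing the connection
            break
        text = await fake_backend(chunk)
        await outbox.put(json.dumps({"text": text}))


async def client(chunks: list[bytes]) -> list[dict]:
    """Stream chunks to the server and collect every JSON reply."""
    to_server, to_client = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(server(to_server, to_client))
    replies = [json.loads(await to_client.get())]   # connection established
    for chunk in chunks:
        await to_server.put(chunk)                   # send audio chunk (binary)
        replies.append(json.loads(await to_client.get()))
    await to_server.put(None)                        # close connection
    await task
    return replies


def run_demo() -> list[dict]:
    """Run the exchange with two small dummy chunks."""
    return asyncio.run(client([b"\x00" * 8, b"\x00" * 4]))
```

This sketch waits for one reply per chunk to keep the ordering easy to follow. A real streaming client would more likely send and receive concurrently, since transcript segments need not arrive one per chunk.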
Deploy the server on a machine with hardware acceleration (an NVIDIA GPU for TensorRT, Intel hardware for OpenVINO) or on a CPU-only machine (Faster-Whisper). Connect the browser extensions to transcribe meetings, lectures, or media playback directly in the browser, or integrate the Python client into desktop applications that need live captioning.
✓ all on main — nothing unmerged.