whisperlive-salad · davidbmar.com

What it is

A Python WebSocket server that receives audio chunks streamed from clients (browser extensions or scripts) and transcribes them with OpenAI's Whisper model. It supports multiple inference backends, including Faster-Whisper, TensorRT, and OpenVINO, enabling low-latency transcription on GPU infrastructure such as Salad Cloud.

Features

Real-time transcription via WebSocket streaming
Supports Faster-Whisper, TensorRT, and OpenVINO backends
Chrome and Firefox browser extensions for capturing tab or microphone audio
Voice Activity Detection (VAD) to skip silent audio and reduce processing load
Configurable client limits and connection timeouts
Single-model mode for reduced memory usage and latency
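VAD gating means discarding frames that contain no speech before they reach the model. The sketch below uses a plain RMS energy threshold as a stand-in. It is not the detector the server actually uses, and the helper names, the 30 ms frame size, and the 0.01 threshold are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(samples: Sequence[float], frame_size: int = 480,
                  threshold: float = 0.01) -> List[int]:
    """Return start indices of frames whose RMS energy exceeds `threshold`.

    480 samples is 30 ms at 16 kHz. Both defaults are illustrative, not tuned.
    Only these frames would be forwarded to the inference backend.
    """
    starts = []
    for start in range(0, len(samples), frame_size):
        if frame_rms(samples[start:start + frame_size]) > threshold:
            starts.append(start)
    return starts
```

A fixed energy threshold is easily fooled by background noise, which is why production systems use a trained VAD model instead. The point here is only where the gate sits: between audio capture and inference.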

Quickstart

bash scripts/setup.sh
pip install whisper-live
python3 run_server.py --port 9090 --backend faster_whisper
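Once the server is listening on port 9090, a custom client has to announce its options and then stream 16 kHz mono audio. The sketch below prepares those payloads. The JSON field names, the `small` model name, the 100 ms chunk size, and the float32 little-endian wire format are assumptions to check against your server version; the helper functions are hypothetical.

```python
import json
import struct
import uuid
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # 16 kHz mono, as in the "How it runs" diagram


def config_message(language: str = "en", model: str = "small",
                   use_vad: bool = True) -> str:
    """Hypothetical first (text) message describing the client's options.

    Field names are an assumption, not a confirmed server protocol.
    """
    return json.dumps({
        "uid": str(uuid.uuid4()),  # lets the server tell clients apart
        "language": language,
        "task": "transcribe",
        "model": model,
        "use_vad": use_vad,
    })


def pcm16_to_float32_bytes(pcm: Sequence[int]) -> bytes:
    """Scale signed 16-bit samples to [-1, 1) and pack as little-endian float32."""
    return struct.pack(f"<{len(pcm)}f", *(s / 32768.0 for s in pcm))


def chunk_samples(pcm: Sequence[int], chunk_ms: int = 100) -> Iterator[List[int]]:
    """Yield consecutive chunks of `chunk_ms` milliseconds (the last may be shorter)."""
    step = SAMPLE_RATE * chunk_ms // 1000
    for start in range(0, len(pcm), step):
        yield list(pcm[start:start + step])
```

A client would then open `ws://localhost:9090`, send `config_message()` as a text frame, and send `pcm16_to_float32_bytes(chunk)` for each chunk as a binary frame, for example with the `websockets` package.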

Architecture

flowchart TD
    Client[Browser Extension/Client] -->|WebSocket Audio Chunks| Server[WhisperLive Server]
    Server -->|Load Balance| Backend[Inference Backend]
    Backend -->|Faster-Whisper| FW[Faster-Whisper Engine]
    Backend -->|TensorRT| TRT[TensorRT Engine]
    Backend -->|OpenVINO| OV[OpenVINO Engine]
    FW -->|Text Segments| Server
    TRT -->|Text Segments| Server
    OV -->|Text Segments| Server
    Server -->|Transcribed Text| Client
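The diagram's three interchangeable engines behind one server follow a plugin pattern: the server picks an engine by name and talks to it through a single interface. A minimal registry sketch is shown below. `faster_whisper` comes from the Quickstart's `--backend` flag; the `tensorrt` and `openvino` names, the `Backend` protocol, and the stub classes are hypothetical placeholders rather than real engines.

```python
from typing import Callable, Dict, List, Protocol


class Backend(Protocol):
    """Anything that turns a chunk of audio bytes into text segments."""

    def transcribe(self, audio: bytes) -> List[str]: ...


# Hypothetical plugin registry: backend name -> factory.
_REGISTRY: Dict[str, Callable[[], Backend]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that adds a backend to the registry under `name`."""
    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls
    return decorator


class _StubEngine:
    """Placeholder engine; a real plugin would wrap an inference library."""
    name = "stub"

    def transcribe(self, audio: bytes) -> List[str]:
        return [f"{self.name}: received {len(audio)} bytes"]


@register("faster_whisper")
class FasterWhisperStub(_StubEngine):
    name = "faster_whisper"


@register("tensorrt")
class TensorRTStub(_StubEngine):
    name = "tensorrt"


@register("openvino")
class OpenVINOStub(_StubEngine):
    name = "openvino"


def create_backend(name: str) -> Backend:
    """Instantiate the backend selected by name (e.g. from a CLI flag)."""
    factory = _REGISTRY.get(name)
    if factory is None:
        raise ValueError(f"unknown backend {name!r}; choose from {sorted(_REGISTRY)}")
    return factory()
```

Adding an engine means registering one more class; the server code that calls `create_backend(name).transcribe(...)` stays the same.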

How it's built

Built with Python using the `websockets` library for communication. It integrates `faster-whisper` for efficient CPU/GPU inference, `tensorrt` for NVIDIA GPU acceleration, and `openvino` for Intel hardware. The client-side logic is implemented as Chrome/Firefox extensions using AudioWorklets to capture and preprocess microphone/tab audio before sending it via WebSocket.
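The extensions do their preprocessing in JavaScript inside an AudioWorklet. The Python sketch below only illustrates what that step must accomplish: downmix to mono and resample to the 16 kHz rate the server receives. The 48 kHz source rate is an assumption (browser audio contexts commonly run at 44.1 or 48 kHz), and the function names are hypothetical.

```python
from typing import List, Sequence


def to_mono(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Average two channels sample-by-sample into one."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = 16_000) -> List[float]:
    """Resample by linear interpolation between neighbouring samples.

    No anti-aliasing filter is applied, so downsampling can alias.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    ratio = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

Linear interpolation keeps the example short; a production pipeline would low-pass filter before decimating to avoid aliasing artifacts that can hurt recognition.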

How it runs

sequenceDiagram
    participant C as Client (Extension)
    participant S as WhisperLive Server
    participant B as Inference Backend
    C->>S: Connect WebSocket
    S->>C: Connection Accepted
    loop Audio Stream
        C->>S: Send Audio Chunk (16kHz Mono)
        S->>B: Forward Audio Data
        B->>B: Run Whisper Inference
        B-->>S: Return Transcribed Segment
        S-->>C: Send JSON Result
    end
    C->>S: Close Connection

How to apply & reuse

Deploy the server in a GPU-enabled Salad Cloud container. Connect browser extensions or custom clients to the server's WebSocket endpoint to transcribe live meetings, lectures, or voice notes in real time.
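When several clients share one GPU container, the server has to refuse or end sessions it cannot serve, which is the job of the configurable client limits and connection timeouts listed under Features. A minimal sketch of that bookkeeping follows; the `ClientLimiter` class, its defaults, and the injected clock are hypothetical, not the server's actual implementation.

```python
import time
from typing import Callable, Dict


class ClientLimiter:
    """Hypothetical admission control: cap concurrent clients and session length."""

    def __init__(self, max_clients: int = 4, max_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_clients = max_clients
        self.max_seconds = max_seconds
        self._clock = clock
        self._started: Dict[str, float] = {}  # uid -> connection start time

    def try_admit(self, uid: str) -> bool:
        """Admit `uid` if a slot is free; return False when the server is full."""
        if uid in self._started:
            return True
        if len(self._started) >= self.max_clients:
            return False
        self._started[uid] = self._clock()
        return True

    def expired(self, uid: str) -> bool:
        """True once `uid` has been connected longer than `max_seconds`."""
        start = self._started.get(uid)
        return start is not None and self._clock() - start > self.max_seconds

    def release(self, uid: str) -> None:
        """Free the slot held by `uid` (no-op if it was never admitted)."""
        self._started.pop(uid, None)
```

A server would call `try_admit` on connect, check `expired` while streaming, and call `release` on disconnect. Injecting the clock keeps the timeout logic testable without waiting in real time.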

At a glance

Capabilities: Live audio transcription, File-based transcription, Multi-client support, GPU acceleration, Voice Activity Detection

Components: run_server.py, AudioPreProcessor (JS), Chrome Extension, Firefox Extension, WebSocket Handler

Tech: Python, WebSockets, Faster-Whisper, TensorRT, OpenVINO, JavaScript, AudioWorklet

Depends on: PyAudio, NumPy, OpenAI Whisper, CUDA (for TensorRT), Intel OpenVINO Runtime

Integrates with: Salad Cloud, Google Chrome, Mozilla Firefox, NVIDIA GPUs, Intel CPUs/GPUs

Patterns: Client-Server, Streaming, Producer-Consumer, Plugin Architecture

Reuse tags: speech-to-text, real-time, gpu-accelerated, websocket, whisper