whisperlive-runpod

Deployable real-time speech-to-text service using OpenAI Whisper on RunPod GPU infrastructure.

https://github.com/davidbmar/whisperlive-runpod  ·  public  ·  shipped

What it is

A deployment wrapper and configuration toolkit for running Collabora's WhisperLive on RunPod. It provides scripts to build optimized Docker images in two variants (a slim image, or a full image that adds speaker diarization), push them to a registry, and deploy them to RunPod GPU instances. The deployed service exposes a WebSocket API for real-time audio transcription and HTTP endpoints for health monitoring.
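The WebSocket message format is defined by WhisperLive, not by this repo. Here is a minimal sketch, assuming the message shapes used by WhisperLive's bundled Python client: the client first sends a JSON options message, and the server then replies with a `SERVER_READY` message followed by messages carrying `segments` lists. The option and field names below are assumptions; check them against the WhisperLive version baked into your image.

```python
import json
import uuid


def build_handshake(language="en", model="small.en", use_vad=True, uid=None):
    """Return the JSON options message a client sends right after connecting.

    Field names are assumed from WhisperLive's bundled client; verify them
    against the WhisperLive version in the deployed image.
    """
    return json.dumps({
        "uid": uid or str(uuid.uuid4()),  # per-client session id
        "language": language,
        "task": "transcribe",
        "model": model,
        "use_vad": use_vad,
    })


def parse_server_message(raw):
    """Classify one server message as ("ready" | "segments" | "other", payload)."""
    msg = json.loads(raw)
    if msg.get("message") == "SERVER_READY":
        return "ready", msg
    if "segments" in msg:
        # Each segment is assumed to carry start/end/text fields.
        return "segments", [seg.get("text", "") for seg in msg["segments"]]
    return "other", msg
```

A client would send `build_handshake()` as a text frame immediately after the connection opens, wait for the `"ready"` result, and then stream binary audio frames.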

Quickstart

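./scripts/000-questions.sh                      # answer the setup questions
./scripts/200-build-image-local.sh --slim       # build the slim image locally
export DOCKER_PASSWORD='your-docker-hub-token'  # Docker Hub access token used by the push step
./scripts/205-push-to-registry.sh --slim        # push the slim image to the registry
./scripts/210-deploy-to-runpod.sh               # deploy the image to a RunPod GPU instance
./scripts/215-test-runpod-health.sh             # check the deployed service's health endpoint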
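After deployment, clients reach the pod through RunPod's HTTP proxy. This sketch assumes RunPod's `<pod-id>-<port>.proxy.runpod.net` hostname scheme and a hypothetical `/health` path on the port-9999 service. It builds the proxy URLs and polls until the health check answers HTTP 200, which is roughly what a post-deploy check needs to do. It is not a transcription of 215-test-runpod-health.sh, whose exact logic is not shown here.

```python
import time
import urllib.error
import urllib.request


def proxy_url(pod_id, port, scheme="https"):
    """Build a RunPod proxy URL (assumes the <pod-id>-<port>.proxy.runpod.net scheme)."""
    return f"{scheme}://{pod_id}-{port}.proxy.runpod.net"


def wait_until_healthy(url, timeout_s=300.0, interval_s=5.0):
    """Poll url until it returns HTTP 200; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # pod still booting or service not up yet; retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, with a hypothetical pod ID, `wait_until_healthy(proxy_url("abc123xyz", 9999) + "/health")` blocks until the service is up. A WebSocket client would then connect to `proxy_url("abc123xyz", 9090, scheme="wss")`, which matches the WSS:443 to WS:9090 hop in the architecture diagram.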

Architecture

flowchart TD
    Client[Client App] -->|WSS:443| Proxy[RunPod Proxy]
    Proxy -->|WS:9090| Server[WhisperLive Server]
    Server -->|Load Model| Whisper[OpenAI Whisper]
    Server -->|GPU Compute| GPU[NVIDIA GPU]
    Monitor[Monitoring System] -->|HTTP:9999| Health[Health Check Service]
    Health -->|Query Status| Server
    Health -->|nvidia-smi| GPU
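The diagram shows the health service doing two things: querying the WhisperLive server and calling `nvidia-smi`. The repo ships its own healthcheck.py. The following is an independent minimal sketch of that pattern, not that file. It assumes the server listens on port 9090 on the same host. A TCP probe only shows that the server is accepting connections, not that a model has finished loading.

```python
import json
import shutil
import socket
import subprocess
from http.server import BaseHTTPRequestHandler


def server_port_open(host="127.0.0.1", port=9090, timeout=1.0):
    """True if something accepts TCP connections on the WhisperLive port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def gpu_summary():
    """Return one dict per GPU from nvidia-smi, or None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
        rows = []
        for line in out.strip().splitlines():
            # rsplit keeps any comma inside the GPU name intact
            name, used, total = (f.strip() for f in line.rsplit(",", 2))
            rows.append({"name": name,
                         "memory_used_mib": int(used),
                         "memory_total_mib": int(total)})
        return rows
    except (subprocess.SubprocessError, OSError, ValueError):
        return None


def health_status(server_port=9090):
    """Combine the server probe and GPU query into one JSON-ready dict."""
    ready = server_port_open(port=server_port)
    return {"status": "ok" if ready else "starting",
            "server_ready": ready,
            "gpus": gpu_summary()}


class HealthHandler(BaseHTTPRequestHandler):
    """Serve health_status() as JSON: 200 when ready, 503 while starting."""

    def do_GET(self):
        status = health_status()
        body = json.dumps(status).encode("utf-8")
        self.send_response(200 if status["server_ready"] else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it on the monitoring port, run `HTTPServer(("0.0.0.0", 9999), HealthHandler).serve_forever()` (with `HTTPServer` imported from `http.server`). Returning 503 while starting lets simple pollers treat any non-200 response as "not ready yet".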

How it's built

A Python server wraps the WhisperLive library and is packaged as a Docker image. Numbered shell scripts handle deployment in an infrastructure-as-code style: building the image, pushing it to a registry, and deploying it through the RunPod API. The server supports multiple transcription backends (faster_whisper and tensorrt), and a separate health-check microservice runs on port 9999.
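Backend choice is typically made when the container starts. This sketch builds the server command from environment variables. `WHISPER_BACKEND` and `WHISPER_PORT` are hypothetical variable names introduced for illustration. The `--port` and `--backend` flags follow our understanding of WhisperLive's run_server.py and should be confirmed with `python3 run_server.py --help`.

```python
import os

# The two backends named in this project's description.
SUPPORTED_BACKENDS = ("faster_whisper", "tensorrt")


def server_command(env=None):
    """Build the run_server.py argv from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    backend = env.get("WHISPER_BACKEND", "faster_whisper")  # hypothetical variable
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unsupported backend {backend!r}; expected one of {SUPPORTED_BACKENDS}")
    port = int(env.get("WHISPER_PORT", "9090"))  # hypothetical variable
    # Flag names assumed from WhisperLive's run_server.py; confirm with --help.
    return ["python3", "run_server.py", "--port", str(port), "--backend", backend]
```

In a container entrypoint, `cmd = server_command(); os.execvp(cmd[0], cmd)` would replace the Python process with the server. Validating the backend up front makes a typo fail at startup rather than after a slow image pull.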

How it runs

sequenceDiagram
    participant C as Client
    participant P as RunPod Proxy
    participant S as WhisperLive Server
    participant H as Health Service
    participant W as Whisper Model

    Note over C, H: Deployment Phase
    H->>S: Check readiness
    S->>W: Load Model (small.en)
    W-->>S: Model Loaded
    S-->>H: Ready

    Note over C, W: Transcription Phase
    C->>P: Connect WebSocket (wss://...)
    P->>S: Forward Connection
    S-->>C: Connection Accepted
    loop Audio Stream
        C->>S: Send Audio Chunk (100ms)
        S->>W: Transcribe Chunk
        W-->>S: Text Result
        S-->>C: Send Transcription
    end
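The loop above streams audio in 100 ms chunks. This is a minimal sketch of that framing. It assumes 16 kHz mono 16-bit PCM input converted to little-endian float32 samples, which is the format we understand WhisperLive's bundled client to send. At 16 kHz, each 100 ms chunk holds 1,600 samples, which is 6,400 bytes as float32.

```python
import array
import sys

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
CHUNK_MS = 100        # matches the 100 ms chunks in the sequence diagram


def pcm16_to_float32_bytes(pcm: bytes) -> bytes:
    """Convert little-endian 16-bit PCM to little-endian float32 in [-1, 1)."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian; convert to native order
    floats = array.array("f", (s / 32768.0 for s in samples))
    if sys.byteorder == "big":
        floats.byteswap()   # emit little-endian bytes on the wire
    return floats.tobytes()


def chunk_audio(pcm: bytes, sample_rate=SAMPLE_RATE, chunk_ms=CHUNK_MS):
    """Yield one float32 payload per chunk_ms of 16-bit mono PCM."""
    pcm = pcm[:len(pcm) - len(pcm) % 2]  # drop a dangling half-sample byte
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * 2  # 2 bytes per int16 sample
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm16_to_float32_bytes(pcm[start:start + bytes_per_chunk])
```

Each yielded payload would be sent as one binary WebSocket frame. Smaller chunks lower latency at the cost of more frames per second. The final chunk may be shorter than 100 ms.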

How to apply & reuse

Use this project to spin up a cost-effective transcription endpoint on cloud GPUs without managing a Kubernetes cluster. It suits applications that need low-latency speech-to-text and would rather rent GPU capacity than own it, paying for GPU time only while a RunPod pod (or serverless worker) is running.

At a glance

Capabilities: Real-time transcription, Speaker diarization (full image), Multi-language support, Health monitoring, GPU acceleration
Components: run_server.py, run_client.py, healthcheck.py, deployment scripts, Dockerfile
Tech: Python, Docker, WebSockets, RunPod API, Whisper, Faster-Whisper
Depends on: RunPod account, Docker Hub account, NVIDIA GPU, Collabora WhisperLive
Integrates with: RunPod platform, Docker registry, WebSocket clients
Patterns: Microservices, Containerization, Infrastructure as Code, Health Check pattern
Reuse tags: speech-to-text, gpu-deployment, runpod, whisper, real-time

Repo hygiene

✓ all on main — nothing unmerged.