whisper-runpod · davidbmar.com

What it is

A Docker image that runs a Faster Whisper speech-to-text API server and is designed for deployment on RunPod.io. It includes utility scripts that fetch audio files from AWS S3 and submit them to the local API endpoint for transcription.

Features

Runs Faster Whisper model for efficient speech-to-text conversion
Exposes OpenAI-compatible transcription API on port 8000
Includes shell scripts for automated S3 audio retrieval and processing
Designed for seamless deployment on RunPod serverless GPU infrastructure
Monitors and restarts the transcription service via entrypoint logic

Quickstart

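# Build the image and push it to a registry that RunPod can pull from
docker build -t yourusername/whisper-runpod:latest .
docker push yourusername/whisper-runpod:latest
# Run locally, publishing the API on port 8000
docker run -p 8000:8000 yourusername/whisper-runpod:latest
# From a second terminal, submit an audio file for transcription
curl http://localhost:8000/v1/audio/transcriptions -F "file=@your-audio-file.mp3" -F "language=en"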
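
The container may need some time to load the model before it accepts requests, so a script that runs these steps back to back can hit a refused connection. Below is a minimal sketch of a readiness wait. It assumes the server answers any HTTP request on port 8000 once it is up; `wait_for_api` is a hypothetical helper and is not part of the repository.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_api URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_api() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
    while [ "$i" -le "$attempts" ]; do
        # Without --fail, curl exits 0 on any HTTP response (even a 404),
        # so success here only means that the server is listening.
        if curl --silent --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_for_api http://localhost:8000/ 30 2 && curl ...` would delay the transcription request until the port responds, or give up after roughly a minute.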

Architecture

flowchart TD
    User[User/Client] -->|HTTP POST| API[Faster Whisper API :8000]
    Script[S3 Transcribe Script] -->|aws s3 cp| S3[(AWS S3 Bucket)]
    S3 -->|Audio File| Script
    Script -->|curl POST| API
    API -->|Process| Model[Faster Whisper Engine]
    Model -->|Text Output| API
    API -->|JSON Response| User
    API -->|JSON Response| Script
    Entrypoint[entrypoint.sh] -->|Manages| API

How it's built

The project uses a Dockerfile to build an image containing the Faster Whisper engine and its dependencies. An entrypoint script manages the server lifecycle, monitoring the API and restarting it if it stops. Shell scripts handle external interactions: they download audio from S3 with the AWS CLI and send HTTP requests to the transcription endpoint.
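
The lifecycle management described above can be approximated with a restart loop. This is a sketch of the general pattern, not the contents of the repository's `entrypoint.sh`; the function name `run_with_restarts` and its arguments are hypothetical.

```shell
# Hypothetical supervisor: rerun a command each time it exits with an error,
# giving up after MAX_RESTARTS restarts.
# Usage: run_with_restarts MAX_RESTARTS DELAY_SECONDS COMMAND [ARGS...]
run_with_restarts() {
    local max_restarts="$1" delay="$2" restarts=0 status
    shift 2
    while :; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            return 0    # clean exit: stop supervising
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restarts (last exit code $status)" >&2
            return "$status"
        fi
        restarts=$((restarts + 1))
        echo "exit code $status; restart $restarts/$max_restarts in ${delay}s" >&2
        sleep "$delay"
    done
}
```

An entrypoint could call it as `run_with_restarts 5 3 start-server`, where `start-server` stands in for whatever command actually launches the API. Capping the restarts is a design choice in this sketch: it lets the container exit with the failing status instead of looping forever on a broken configuration.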

How it runs

sequenceDiagram
    participant U as User/Script
    participant S as S3 Bucket
    participant A as API Server (Port 8000)
    participant M as Faster Whisper Model
    
    alt S3 Workflow
        U->>S: aws s3 cp (Download Audio)
        S-->>U: Return Audio File
        U->>A: POST /v1/audio/transcriptions (File + Language)
    else Direct Workflow
        U->>A: POST /v1/audio/transcriptions (Local File)
    end
    
    A->>M: Process Audio Stream
    M-->>A: Return Transcribed Text
    A-->>U: JSON Response with Text

How to apply & reuse

Deploy the built Docker image to a RunPod instance or run it locally. Use the provided shell scripts to automate the workflow of fetching remote audio files from S3 buckets and posting them to the running container's API for text extraction.
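
The fetch-and-submit workflow can be sketched as a single shell function. This is an illustration under stated assumptions, not the repository's `test_transcribe_by_fasterWhisperAPI_fromS3.sh`: the function name `transcribe_from_s3`, the default language, and the default base URL are hypothetical, and the bucket in the usage note is a placeholder.

```shell
# Hypothetical helper: download one audio object from S3 and post it to the
# transcription endpoint. Prints the server's JSON response on stdout.
# Usage: transcribe_from_s3 S3_URI [LANGUAGE] [API_BASE_URL]
transcribe_from_s3() {
    local s3_uri="$1" language="${2:-en}" api="${3:-http://localhost:8000}"
    local workdir audio status=0
    workdir=$(mktemp -d) || return 1
    audio="$workdir/$(basename "$s3_uri")"

    # aws s3 cp needs valid AWS credentials in the environment or config.
    aws s3 cp "$s3_uri" "$audio" --only-show-errors || status=$?

    if [ "$status" -eq 0 ]; then
        # --fail makes curl exit non-zero on HTTP error responses.
        curl --silent --show-error --fail \
            "$api/v1/audio/transcriptions" \
            -F "file=@$audio" \
            -F "language=$language" || status=$?
    fi

    # Remove the temporary download whether or not the request succeeded.
    rm -rf "$workdir"
    return "$status"
}
```

A call such as `transcribe_from_s3 s3://example-bucket/clip.mp3 en` prints the JSON response. If the response follows the OpenAI transcription format with a top-level `text` field, and `jq` is installed, piping the output through `jq -r .text` would leave only the transcript.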

At a glance

Capabilities: Speech-to-Text Transcription, S3 Integration, API Service Hosting, Containerized Deployment

Components: entrypoint.sh, test_transcribe_by_fasterWhisperAPI_fromS3.sh, Dockerfile

Tech: Docker, Shell Scripting, Faster Whisper, Python, AWS CLI

Depends on: Docker Runtime, AWS Credentials, RunPod Account (Optional)

Integrates with: AWS S3, RunPod.io, OpenAI-compatible Clients

Patterns: Microservice, Worker Pattern, Sidecar Scripting

Reuse tags: speech-to-text, whisper, runpod, docker, s3, transcription