transcription-sqs-spot-s3

Scalable AWS-based audio transcription system using SQS queues, EC2 Spot instances, and WhisperX/Voxtral models.

https://github.com/davidbmar/transcription-sqs-spot-s3  ·  public  ·  shipped

What it is

A production-ready infrastructure-as-code solution for transcribing audio files stored in S3. It uses an SQS queue to manage job distribution to EC2 Spot instances (GPU or CPU) running containerized or native workers. The system supports automatic scaling, cost optimization via Spot instances, and dead-letter queue handling for failed jobs.
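To make the dead-letter handling concrete, here is a minimal Python sketch of the SQS queue attribute that routes repeatedly failing messages to a dead-letter queue. The repository configures its queues through Bash scripts, so this is an illustration only. The helper name `redrive_attributes`, the example ARN, and the default retry limit of 3 are assumptions, not values taken from the project.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the SQS queue attributes that attach a dead-letter queue.

    After a message has been received `max_receive_count` times without
    being deleted, SQS moves it to the queue identified by `dlq_arn`.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    # SQS expects the policy as a JSON string under the "RedrivePolicy" key.
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical ARN, for illustration only.
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:transcription-dlq")
```

With boto3, a dictionary like `attrs` can be passed as the `Attributes` argument of `create_queue` or `set_queue_attributes` on the main job queue.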

Features

Quickstart

git clone https://github.com/davidbmar/transcription-sqs-spot-s3.git
cd transcription-sqs-spot-s3
./scripts/step-000-setup-configuration.sh
./scripts/step-010-setup-iam-permissions.sh
./scripts/step-020-create-sqs-resources.sh
./scripts/step-060-choose-deployment-path.sh

Architecture

flowchart TD
    User[User/Application] -->|Upload Audio| S3In[(S3 Input Bucket)]
    User -->|Send Job Metadata| SQS[SQS Queue]
    SQS -->|Poll Jobs| Worker[EC2 Spot Worker]
    Worker -->|Download Audio| S3In
    Worker -->|Process| Model[WhisperX/Voxtral Model]
    Worker -->|Upload Transcript| S3Out[(S3 Output Bucket)]
    SQS -->|Failed Messages| DLQ[SQS Dead Letter Queue]
    subgraph AWS Cloud
        S3In
        SQS
        Worker
        S3Out
        DLQ
    end
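
The client side of the diagram above (upload the audio, then send the job metadata) might look like the following Python sketch. The function name `submit_job` and the message schema with `bucket` and `key` fields are assumptions made for illustration; the project's actual message format may differ. The `s3` and `sqs` arguments are expected to behave like boto3 clients.

```python
import json


def submit_job(s3, sqs, audio_path, input_bucket, queue_url, key):
    """Upload one audio file and enqueue a job that points at it."""
    # Upload first, so a worker never receives a job for a missing object.
    s3.upload_file(audio_path, input_bucket, key)
    # Assumed message schema: the worker only needs to know where the audio is.
    body = json.dumps({"bucket": input_bucket, "key": key})
    response = sqs.send_message(QueueUrl=queue_url, MessageBody=body)
    return response["MessageId"]
```

In real use the clients would come from `boto3.client("s3")` and `boto3.client("sqs")`. Passing them in as parameters keeps the function easy to exercise with stand-in objects.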

How it's built

The system is orchestrated via Bash scripts that configure AWS resources (IAM, SQS, ECR) and deploy workers. Workers are Python applications using FastAPI, PyTorch, and Hugging Face Transformers (Whisper or Voxtral). Deployment supports two paths: traditional EC2 user-data installation or Docker containers pushed to Amazon ECR.

How it runs

sequenceDiagram
    participant Client as Client App
    participant S3 as S3 Bucket
    participant SQS as SQS Queue
    participant Worker as EC2 Worker
    participant Model as AI Model

    Client->>S3: Upload Audio File
    Client->>SQS: Send Message (S3 Path)
    loop Polling
        Worker->>SQS: Receive Message
    end
    Worker->>S3: Download Audio File
    Worker->>Model: Transcribe Audio
    Model-->>Worker: Return Text
    Worker->>S3: Upload Transcript JSON
    Worker->>SQS: Delete Message
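
The sequence above can be sketched as a single worker iteration in Python. This is not the project's worker code. The function name `process_one`, the `bucket`/`key` message schema, the `.json` output key, and the `transcribe` callable (a stand-in for the WhisperX or Voxtral call) are all assumptions. The `sqs` and `s3` arguments are expected to behave like boto3 clients.

```python
import json
import os
import tempfile


def process_one(sqs, s3, transcribe, queue_url, output_bucket):
    """Handle at most one job; return True if a message was processed."""
    # Long polling: wait up to 20 s for a message instead of busy-looping.
    response = sqs.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    messages = response.get("Messages", [])
    if not messages:
        return False

    message = messages[0]
    job = json.loads(message["Body"])  # assumed schema: {"bucket": ..., "key": ...}

    with tempfile.TemporaryDirectory() as workdir:
        local_path = os.path.join(workdir, os.path.basename(job["key"]))
        s3.download_file(job["bucket"], job["key"], local_path)
        text = transcribe(local_path)  # stand-in for the WhisperX/Voxtral call

    s3.put_object(
        Bucket=output_bucket,
        Key=job["key"] + ".json",
        Body=json.dumps({"source": job["key"], "text": text}).encode("utf-8"),
    )
    # Delete only after the transcript is stored. If anything above raises,
    # the message becomes visible again and is retried; once the queue's
    # receive limit is exceeded, SQS moves it to the dead-letter queue.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

Deleting the message last is the design choice that matters here: a Spot interruption or a crash partway through leaves the message in the queue, so another worker can pick it up after the visibility timeout expires.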

How to apply & reuse

Use this project to build a cost-effective, scalable transcription backend for applications requiring high-volume audio processing. It is suitable for podcast transcription, meeting notes, or media archival where latency is secondary to cost and reliability.

At a glance

Capabilities: Audio-to-text transcription, Batch job processing, Cloud infrastructure automation, GPU-accelerated inference, Cost-optimized computing
Components: Bash Setup Scripts, Python FastAPI Workers, Dockerfiles, SQS Queue Configuration, IAM Policy Definitions
Tech: Python, Bash, AWS EC2, AWS SQS, AWS S3, Docker, PyTorch, FastAPI, WhisperX, Voxtral
Depends on: AWS CLI, Docker Engine, Git, Python 3.8+, NVIDIA Drivers (for GPU path)
Integrates with: Amazon ECR, Amazon CloudWatch, Hugging Face Hub
Patterns: Worker Queue Pattern, Spot Instance Optimization, Containerized Deployment, Infrastructure as Code (Scripted)
Reuse tags: aws, transcription, sqs, spot-instances, whisper, docker, gpu
