Smart Transcription Router

What it is

A serverless routing layer for audio transcription requests. It health-checks a GPU-backed FastAPI server: if the server is available, requests go to it directly for low-latency results; if it is down or busy, they fall back to an SQS queue for deferred batch processing. The split optimizes for both cost and reliability.
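
The routing decision described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual Lambda code: the function names, the `"direct"`/`"queue"` labels, the 2-second timeout, and the assumption that the server exposes a health URL returning HTTP 200 are all hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def is_server_healthy(health_url, timeout_s=2.0):
    """Return True only if the (assumed) health endpoint answers HTTP 200 in time."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Refused connection, timeout, non-2xx response, or malformed URL:
        # treat all of them as "not healthy" so the caller falls back to SQS.
        return False


def choose_route(health_url, probe=is_server_healthy):
    """Pick the real-time path when the probe succeeds, otherwise the batch queue."""
    return "direct" if probe(health_url) else "queue"
```

Passing the probe as a parameter keeps the decision testable without a running server; a busy server could be modelled the same way by having the probe return False.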

Features

Intelligent routing between real-time HTTP and batch SQS based on health checks
Exponential backoff retry logic (1s, 2s, 4s) for transient failures
Idempotent processing to skip already transcribed chunks
Automatic session-level transcript combination upon chunk completion
Cost optimization by auto-terminating GPU instances when idle
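
The backoff schedule in the feature list (1s, 2s, 4s) could look roughly like the sketch below. It assumes one initial attempt followed by one retry per delay, which the source does not spell out; `TransientError` and `call_with_backoff` are hypothetical names used only for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def call_with_backoff(operation, delays=(1.0, 2.0, 4.0), sleep=time.sleep):
    """Run operation(); on TransientError wait 1s, 2s, then 4s between retries.

    Assumes one initial attempt plus one retry per delay (four attempts total).
    The last TransientError is re-raised once the delays are used up.
    """
    for attempt in range(len(delays) + 1):
        try:
            return operation()
        except TransientError:
            if attempt == len(delays):
                raise  # retries exhausted; the caller can fall back to SQS
            sleep(delays[attempt])
```

The `sleep` parameter is injectable so the schedule can be checked in tests without actually waiting.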

Quickstart

./scripts/step-000-setup-configuration.sh
./scripts/step-001-validate-configuration.sh
./scripts/step-010-setup-iam-permissions.sh
./scripts/step-011-validate-iam-permissions.sh
./scripts/step-020-create-sqs-resources.sh
./scripts/step-021-validate-sqs-resources.sh
./scripts/step-340-deploy-lambda-router.sh
./scripts/step-341-configure-eventbridge-trigger.sh
./scripts/step-342-test-lambda-router.sh

Architecture

flowchart TD
    A[Audio Upload] -->|EventBridge| B(Lambda Router)
    B -->|Health Check| C{FastAPI Healthy?}
    C -->|Yes| D[FastAPI Server GPU]
    C -->|No| E[SQS Queue]
    E -->|Scheduled Trigger| F[Batch Worker GPU]
    D --> G[(S3 Storage)]
    F --> G

How it's built

The routing logic runs on AWS Lambda (Python/Shell), with EventBridge for event ingestion and SQS for queuing. The compute layer consists of Dockerized FastAPI servers running WhisperX or Voxtral models on GPU instances. Infrastructure is managed with shell scripts that call the AWS CLI.

How it runs

sequenceDiagram
    participant User as Audio Source
    participant EB as EventBridge
    participant Lambda as Lambda Router
    participant API as FastAPI Server
    participant SQS as SQS Queue
    participant Worker as Batch Worker
    
    User->>EB: Upload Audio File
    EB->>Lambda: Trigger Event
    Lambda->>Lambda: Check Idempotency
    alt Already Transcribed
        Lambda-->>User: Skip Processing
    else Not Transcribed
        Lambda->>API: Health Check / Transcribe Request
        alt Server Healthy
            API->>API: Process with WhisperX/Voxtral
            API-->>Lambda: Return Transcript
            Lambda-->>User: Success
        else Server Unhealthy/Fail
            Lambda->>Lambda: Retry with Backoff
            alt Retries Exhausted
                Lambda->>SQS: Send Message
                SQS-->>Lambda: Acknowledge
                Note over Worker: Scheduled Trigger
                Worker->>SQS: Receive Message
                Worker->>Worker: Spin up GPU & Process
                Worker->>SQS: Delete Message
            end
        end
    end
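
The sequence above can be condensed into a small handler sketch. It is an illustration under assumptions, not the deployed Lambda: `handle_chunk` and its three callables are hypothetical, and `transcribe_direct` is assumed to wrap its own health check and retries, raising once they are exhausted.

```python
def handle_chunk(chunk_key, transcript_exists, transcribe_direct, enqueue):
    """Route one audio chunk, mirroring the branches of the sequence diagram.

    transcript_exists(key) -> bool   idempotency check (e.g. a transcript lookup)
    transcribe_direct(key)           real-time path; raises when retries run out
    enqueue(key)                     fallback path (e.g. an SQS send)
    """
    if transcript_exists(chunk_key):
        return "skipped"  # already transcribed, nothing to do
    try:
        transcribe_direct(chunk_key)
    except Exception:
        # Any failure on the direct path defers the chunk to batch processing.
        enqueue(chunk_key)
        return "queued"
    return "transcribed"
```

Returning a status string per chunk makes each branch easy to log and to assert on in a test like the one run by `step-342-test-lambda-router.sh`, whatever that script checks in practice.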

How to apply & reuse

Deploy the core SQS-only router first to establish reliable batch processing. Optionally add the FastAPI GPU instances for real-time capabilities. Configure environment variables for AWS region, ECR URIs, and SQS queue URLs, then run the provided setup scripts in sequence.
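
As a rough sketch of the configuration step, the settings named above could be read and validated like this. The variable names `AWS_REGION`, `ECR_URI`, and `SQS_QUEUE_URL` are assumptions chosen for illustration (the project's scripts may use different names, and may accept several URIs or queue URLs), as is the `RouterConfig` type.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    aws_region: str
    ecr_uri: str
    sqs_queue_url: str


# Field name -> environment variable name (hypothetical names).
REQUIRED = {
    "aws_region": "AWS_REGION",
    "ecr_uri": "ECR_URI",
    "sqs_queue_url": "SQS_QUEUE_URL",
}


def load_config(env=None):
    """Build a RouterConfig from the environment, failing fast on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED.values() if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return RouterConfig(**{field: env[name] for field, name in REQUIRED.items()})
```

Failing fast on a missing value surfaces configuration mistakes before any deployment step runs.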

At a glance

Capabilities: Real-time transcription, Batch processing, Health-based routing, Retry management, Session assembly

Components: Lambda Router, FastAPI Server, SQS Queue, Batch Worker, EventBridge Trigger

Tech: Python, Shell, FastAPI, AWS Lambda, Docker, WhisperX, Voxtral

Depends on: AWS CLI, Docker, GPU Instances, S3 Bucket, ECR Repository

Integrates with: Amazon S3, Amazon SQS, Amazon EventBridge, Amazon CloudWatch

Patterns: Circuit Breaker, Dead Letter Queue, Batch Processing, Serverless Routing, Idempotency Key

Reuse tags: aws-serverless, hybrid-cloud, audio-processing, gpu-optimization, event-driven