Hybrid AWS architecture that routes audio transcription between a real-time GPU FastAPI server and batch SQS processing, based on server health.
https://github.com/davidbmar/smart-transcription-router · public · shipped

A serverless routing layer that directs audio transcription requests based on server health. It first checks whether a GPU-backed FastAPI server is available; if so, it sends the request there directly for low-latency results. If the server is down or busy, the request falls back to an SQS queue for deferred batch processing. This keeps latency low while the GPU server is up, and keeps cost and reliability in check when it is not.
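The routing decision described above can be sketched in a few lines of Python. This is a minimal sketch, not the repository's code: the `/health` path, the function names, and the injected `transcribe` and `enqueue` callables are all assumptions made for illustration.

```python
import urllib.request
from typing import Callable, Tuple


def fastapi_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET {base_url}/health answers 200 within `timeout`.

    The /health path is an assumed endpoint, not taken from the project.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Any failure (refused, timed out, malformed URL) counts as unhealthy.
        return False


def route(
    audio_key: str,
    is_healthy: Callable[[], bool],
    transcribe: Callable[[str], str],
    enqueue: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the real-time path first; fall back to the queue otherwise.

    Returns ("realtime", transcript) or ("queued", message_id).
    """
    if is_healthy():
        try:
            return "realtime", transcribe(audio_key)
        except Exception:
            pass  # server accepted the health check but failed the request
    return "queued", enqueue(audio_key)
```

In a Lambda handler, `enqueue` would typically wrap an SQS `send_message` call and `is_healthy` would be `lambda: fastapi_is_healthy(url)`. Passing them in as callables keeps the decision logic testable without AWS access.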
./scripts/step-000-setup-configuration.sh
./scripts/step-001-validate-configuration.sh
./scripts/step-010-setup-iam-permissions.sh
./scripts/step-011-validate-iam-permissions.sh
./scripts/step-020-create-sqs-resources.sh
./scripts/step-021-validate-sqs-resources.sh
./scripts/step-340-deploy-lambda-router.sh
./scripts/step-341-configure-eventbridge-trigger.sh
./scripts/step-342-test-lambda-router.sh
flowchart TD
    A[Audio Upload] -->|EventBridge| B(Lambda Router)
    B -->|Health Check| C{FastAPI Healthy?}
    C -->|Yes| D[FastAPI Server GPU]
    C -->|No| E[SQS Queue]
    E -->|Scheduled Trigger| F[Batch Worker GPU]
    D --> G[(S3 Storage)]
    F --> G
Built with AWS Lambda (Python/Shell) for the routing logic, EventBridge for event ingestion, and SQS for queuing. The compute layer consists of Dockerized FastAPI servers running WhisperX or Voxtral models on GPU instances. Infrastructure is managed with shell scripts that call the AWS CLI.
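The batch side of this stack (queue, scheduled worker, storage) follows the usual SQS consumer pattern: a message is deleted only after it has been processed successfully. Here is a rough sketch of that loop. The `receive`, `process`, and `delete` callables and the `(receipt_handle, audio_key)` message shape are hypothetical stand-ins, not the project's actual worker interface.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (receipt_handle, audio_key); an assumed shape


def drain_queue(
    receive: Callable[[], List[Message]],
    process: Callable[[str], None],
    delete: Callable[[str], None],
    max_batches: int = 10,
) -> int:
    """Process queued audio until the queue is empty or max_batches is hit.

    Returns the number of messages processed and deleted.
    """
    done = 0
    for _ in range(max_batches):
        batch = receive()
        if not batch:
            break  # nothing left to do; the worker can shut down
        for handle, audio_key in batch:
            try:
                process(audio_key)
            except Exception:
                # Leave the message in place so the queue can redeliver it
                # once its visibility timeout expires.
                continue
            delete(handle)
            done += 1
    return done
```

Deleting only after success means a crashed or failed transcription is retried on a later run rather than silently lost.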
sequenceDiagram
    participant User as Audio Source
    participant EB as EventBridge
    participant Lambda as Lambda Router
    participant API as FastAPI Server
    participant SQS as SQS Queue
    participant Worker as Batch Worker
    User->>EB: Upload Audio File
    EB->>Lambda: Trigger Event
    Lambda->>Lambda: Check Idempotency
    alt Already Transcribed
        Lambda-->>User: Skip Processing
    else Not Transcribed
        Lambda->>API: Health Check / Transcribe Request
        alt Server Healthy
            API->>API: Process with WhisperX/Voxtral
            API-->>Lambda: Return Transcript
            Lambda-->>User: Success
        else Server Unhealthy/Fail
            Lambda->>Lambda: Retry with Backoff
            alt Retries Exhausted
                Lambda->>SQS: Send Message
                SQS-->>Lambda: Acknowledge
                Note over Worker: Scheduled Trigger
                Worker->>SQS: Receive Message
                Worker->>Worker: Spin up GPU & Process
                Worker->>SQS: Delete Message
            end
        end
    end
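The sequence above combines three steps: an idempotency check, retries with backoff, and a queue fallback once retries are exhausted. A compact sketch of that control flow follows. The function name, the exponential delay schedule, and the default of three attempts are illustrative assumptions; the project's real retry parameters are not stated here.

```python
import time
from typing import Callable


def handle_event(
    audio_key: str,
    already_transcribed: Callable[[str], bool],
    transcribe: Callable[[str], object],
    enqueue: Callable[[str], object],
    max_attempts: int = 3,      # assumed default
    base_delay: float = 0.5,    # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "skipped", "transcribed", or "queued" for one upload event."""
    # Idempotency: do nothing if a transcript already exists for this key.
    if already_transcribed(audio_key):
        return "skipped"

    for attempt in range(max_attempts):
        try:
            transcribe(audio_key)
            return "transcribed"
        except Exception:
            # Exponential backoff between attempts: base, 2*base, 4*base, ...
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)

    # Retries exhausted: hand the job to the batch path.
    enqueue(audio_key)
    return "queued"
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.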
Deploy the core SQS-only router first to establish reliable batch processing. Optionally add the FastAPI GPU instances for real-time capabilities. Configure environment variables for AWS region, ECR URIs, and SQS queue URLs, then run the provided setup scripts in sequence.
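Because the FastAPI tier is optional, configuration loading can treat the queue settings as required and the real-time settings as optional. The sketch below illustrates that split. The variable names `QUEUE_URL`, `ECR_URI`, and `FASTAPI_URL` are hypothetical; check the setup scripts for the names the project actually uses.

```python
from typing import Mapping, NamedTuple, Optional


class RouterConfig(NamedTuple):
    region: str
    queue_url: str
    ecr_uri: Optional[str]      # only needed once GPU images are deployed
    fastapi_url: Optional[str]  # absent in an SQS-only deployment


def load_config(env: Mapping[str, str]) -> RouterConfig:
    """Build a config from an environment mapping such as os.environ.

    Variable names other than AWS_REGION are assumptions for illustration.
    """
    missing = [name for name in ("AWS_REGION", "QUEUE_URL") if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return RouterConfig(
        region=env["AWS_REGION"],
        queue_url=env["QUEUE_URL"],
        ecr_uri=env.get("ECR_URI") or None,
        fastapi_url=env.get("FASTAPI_URL") or None,
    )
```

With `fastapi_url` set to `None`, a router can skip the health check entirely and enqueue every request, which matches the SQS-only first stage.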