transcription-end-to-end-docs

Documentation and integration guide for a serverless transcription pipeline using AWS EventBridge, Cognito, and Lambda.

https://github.com/davidbmar/transcription-end-to-end-docs  ·  public  ·  shipped

What it is

A meta-repository containing documentation and setup instructions for integrating three AWS serverless components: a smart transcription router, a Cognito-authenticated S3/CloudFront web server, and an EventBridge-based orchestration layer. The orchestration layer is the hub that decouples the frontend, transcription, and search microservices, which communicate only through validated, versioned events.

Features

Quickstart

git clone https://github.com/davidbmar/smart-transcription-router.git
git clone https://github.com/davidbmar/cognito-lambda-s3-webserver-cloudfront.git
git clone https://github.com/davidbmar/eventbridge-orchestrator.git
cd eventbridge-orchestrator
# Run step-000 to set up the .env file
./step-000.sh
# Run subsequent step-xxx scripts as documented in that repo

Architecture

flowchart TD
    User[User] -->|Upload Audio| CF[CloudFront]
    CF -->|Auth & Serve| Cognito[Cognito]
    CF -->|Store| S3[S3 Bucket]
    S3 -->|Trigger| Router[Smart Transcription Router Lambda]
    Router -->|Publish Event| EB[EventBridge Bus]
    EB -->|Validate Schema| SR[Schema Registry]
    EB -->|Route| Transcriber[Transcription Service]
    EB -->|Route| Search[Search Service]
    EB -->|Log| Logger[Event Logger Lambda]
    EB -->|Fail| DLQ[SQS Dead-Letter Queue]
    DLQ -->|Process| DLQProcessor[DLQ Processor Lambda]
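
The "Route" edges in the flowchart correspond to EventBridge rules, each of which pairs an event pattern with a target. As a rough illustration of how a pattern selects events, the sketch below implements a simplified matcher in plain Python. It handles only exact-value lists and nested objects, whereas real EventBridge patterns support more operators. The rule names and `source` strings are hypothetical, and routing `TranscriptionCompleted` to the search service is an assumption; only the `AudioUploaded` and `TranscriptionCompleted` event names come from this document.

```python
from __future__ import annotations

from typing import Any


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Return True if every field named in the pattern matches the event.

    Simplified model: a leaf is a list of allowed exact values and a nested
    dict recurses into the corresponding nested field of the event.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical rules: names and source strings are illustrative only.
RULES = {
    "to-transcriber": {
        "source": ["transcription.router"],
        "detail-type": ["AudioUploaded"],
    },
    "to-search": {
        "source": ["transcription.service"],
        "detail-type": ["TranscriptionCompleted"],
    },
    "to-logger": {
        "detail-type": ["AudioUploaded", "TranscriptionCompleted"],
    },
}


def targets_for(event: dict[str, Any]) -> list[str]:
    """Return the sorted names of all rules whose pattern matches the event."""
    return sorted(name for name, pattern in RULES.items() if matches(pattern, event))
```

With these assumed rules, an `AudioUploaded` event from `transcription.router` matches both `to-logger` and `to-transcriber`, which mirrors the fan-out to the logger and the transcription service in the diagram.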

How it's built

The system is built using Terraform for infrastructure provisioning, AWS EventBridge for event routing and schema validation, AWS Lambda for compute (including logging and DLQ processing), Amazon S3 for storage, Amazon CloudFront for distribution, and Amazon Cognito for authentication. JSON schemas are registered in the EventBridge Schema Registry so that producers and consumers agree on the structure and types of each event.
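
To make the schema idea concrete, here is a minimal sketch of checking an event's detail against a schema before publishing. The schema and its field names (`version`, `bucket`, `key`, `sizeBytes`) are assumptions for illustration, not the schemas registered by the orchestrator. The hand-rolled validator covers only `required`, `properties`, and a few `type` values, and a full JSON Schema library would normally take its place.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema for an AudioUploaded event detail; field names are assumed.
AUDIO_UPLOADED_V1 = {
    "type": "object",
    "required": ["version", "bucket", "key"],
    "properties": {
        "version": {"type": "string"},
        "bucket": {"type": "string"},
        "key": {"type": "string"},
        "sizeBytes": {"type": "integer"},
    },
}

# Mapping from the JSON Schema type names used above to Python types.
_TYPES = {"string": str, "integer": int, "object": dict}


def validate(detail: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the detail conforms.

    Supports only flat objects with `required` and typed `properties`.
    """
    if not isinstance(detail, dict):
        return ["detail must be an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in detail:
            errors.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name in detail:
            value = detail[name]
            expected = _TYPES[rule["type"]]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                errors.append(f"{name} must be of type {rule['type']}")
    return errors
```

Returning a list of problems rather than raising on the first one lets a producer log every violation at once, which tends to help when a schema version changes.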

How it runs

sequenceDiagram
    participant U as User
    participant S3 as S3 Bucket
    participant R as Transcription Router
    participant EB as EventBridge Bus
    participant T as Transcription Service
    participant L as Event Logger
    U->>S3: Upload Audio File
    S3->>R: Trigger Notification
    R->>EB: Publish AudioUploaded Event
    EB->>L: Forward Event for Logging
    L-->>EB: Acknowledge
    EB->>T: Route Event based on Rule
    T->>T: Process Transcription
    T->>EB: Publish TranscriptionCompleted Event
    EB->>L: Forward Completion Event
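
The first three steps of the sequence (S3 notification, router, `AudioUploaded` event) could look roughly like the following Lambda handler. This is a sketch under assumptions, not the router's actual code: the bus name, the `source` string, and the detail fields are hypothetical. The S3 notification layout and the `put_events` entry fields (`Source`, `DetailType`, `Detail`, `EventBusName`) follow the standard AWS formats.

```python
import json
from urllib.parse import unquote_plus

# Assumed values: the real bus name and source come from the orchestrator's config.
EVENT_BUS_NAME = "transcription-bus"
EVENT_SOURCE = "transcription.router"


def build_entries(s3_event: dict) -> list:
    """Turn an S3 notification into a list of EventBridge PutEvents entries."""
    entries = []
    for record in s3_event.get("Records", []):
        s3 = record["s3"]
        detail = {
            "version": "1.0",  # hypothetical event schema version
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 notifications.
            "key": unquote_plus(s3["object"]["key"]),
        }
        entries.append(
            {
                "Source": EVENT_SOURCE,
                "DetailType": "AudioUploaded",
                "Detail": json.dumps(detail),
                "EventBusName": EVENT_BUS_NAME,
            }
        )
    return entries


def handler(event, context):
    """Lambda entry point: publish one AudioUploaded event per uploaded object."""
    import boto3  # imported lazily so build_entries can be exercised without AWS

    entries = build_entries(event)
    if entries:
        # put_events accepts at most 10 entries per call; larger batches
        # would need chunking, which this sketch omits.
        response = boto3.client("events").put_events(Entries=entries)
        if response.get("FailedEntryCount", 0):
            raise RuntimeError("some events were not accepted by EventBridge")
    return {"published": len(entries)}
```

Keeping `build_entries` free of AWS calls means the mapping from upload to event can be unit-tested locally with a sample notification.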

How to apply & reuse

Clone the three dependent repositories into a common root directory. Run the step-by-step setup scripts in the eventbridge-orchestrator repository, starting with environment configuration (.env) and proceeding through infrastructure deployment. Make sure an S3 bucket for audio storage exists and that all three projects are configured to use it.
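
Before running the setup scripts, it can help to confirm that all three repositories sit side by side. The helper below is a small sketch of that check. The function name is made up for illustration, and the repository names are taken from the Quickstart.

```python
from pathlib import Path

# Repository names as listed in the Quickstart.
REPOS = [
    "smart-transcription-router",
    "cognito-lambda-s3-webserver-cloudfront",
    "eventbridge-orchestrator",
]


def missing_repos(root: Path) -> list:
    """Return the expected repositories that are not directories under root."""
    return [name for name in REPOS if not (root / name).is_dir()]
```

Calling `missing_repos(Path.cwd())` from the root directory returns an empty list once all three clones are in place.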

At a glance

Capabilities: Event-driven architecture, Schema validation, Infrastructure as Code, Serverless compute, Secure content delivery, Error handling via DLQ
Components: Smart Transcription Router, Cognito/Lambda/S3/CloudFront Webserver, EventBridge Orchestrator, Event Logger Lambda, DLQ Processor Lambda, Terraform Modules
Tech: AWS EventBridge, AWS Lambda, Amazon S3, Amazon CloudFront, Amazon Cognito, Amazon SQS, Terraform, JSON Schema
Depends on: smart-transcription-router, cognito-lambda-s3-webserver-cloudfront, eventbridge-orchestrator
Integrates with: AWS Schema Registry, External Transcription APIs, Search Indexing Services
Patterns: Event-Driven Architecture, Pub/Sub, Dead-Letter Queue, Infrastructure as Code, Microservices
Reuse tags: serverless, aws, eventbridge, terraform, transcription, orchestration

Repo hygiene

✓ all on main — nothing unmerged.