A Python microservice that consumes audio-transcription messages from AWS SQS and compiles them into a sorted CSV script.
https://github.com/davidbmar/Audio2ScriptViewer · public · shipped
Audio2ScriptViewer is a backend utility that consumes transcription output, most likely from an upstream speech-to-text service. It polls an AWS SQS queue for JSON messages that contain a filename and its transcribed text, normalizes line breaks in the text, sorts the entries by a numeric key extracted from each filename, and appends them to a local CSV file. It is designed as one stage of a larger media processing pipeline but can also run independently as a worker service.
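The message-handling step described above can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes each message body is a JSON object with hypothetical `filename` and `text` fields, and that the sort key is the last run of digits in the filename's stem (for example, `chunk_0012.wav` gives 12). Both the field names and the key rule are assumptions.

```python
import json
import re
from pathlib import PurePosixPath
from typing import Tuple

_DIGITS = re.compile(r"\d+")


def extract_key(filename: str) -> int:
    """Return the last run of digits in the file's stem as an integer.

    Assumption: upstream names chunks like 'episode1_chunk_0012.wav', where
    the final number is the chunk's position in the recording.
    """
    stem = PurePosixPath(filename).stem  # drops directories and the extension
    digits = _DIGITS.findall(stem)
    if not digits:
        raise ValueError(f"no numeric key in {filename!r}")
    return int(digits[-1])


def parse_message(body: str) -> Tuple[int, str, str]:
    """Decode one SQS message body into (key, filename, text).

    The field names 'filename' and 'text' are hypothetical.
    """
    payload = json.loads(body)
    filename = payload["filename"]
    return extract_key(filename), filename, payload["text"]
```

Sorting by the extracted integer matters because a plain string sort places `chunk_10.wav` before `chunk_2.wav`, while `sorted(names, key=extract_key)` puts them in numeric order. Taking the stem first keeps digits such as the `3` in `.mp3` out of the key.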
terraform init
terraform apply
python3 audio2Script.py --run-once
flowchart TD
A[Upstream Transcriber] -->|JSON Message| B(AWS SQS Queue)
B -->|Poll| C[Audio2ScriptViewer]
C -->|Process & Clean| D[Local Storage]
D -->|Write| E[(output.csv)]
subgraph Infrastructure
B
C
end
The core logic is written in Python, using the boto3 library for AWS interactions and argparse for CLI configuration, so the worker can run either in a continuous polling loop or in a single-pass mode (--run-once). Text cleaning uses regular expressions to normalize newlines and carriage returns. The infrastructure is defined in Terraform, which suggests an AWS-native deployment built around SQS queues and possibly other services, such as S3 for storage or EC2 for hosting the script.
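A polling loop with the two run modes mentioned above might look like the sketch below. Only `--run-once` comes from the project; the `--queue-url` and `--interval` flags, the `poll` function, and the choice to treat an empty long-poll response as the end of a single pass are hypothetical. The boto3 calls used (`client("sqs")`, `receive_message`, `delete_message`) are standard SQS client methods, and `boto3` is imported inside `main()` so that `poll()` can be exercised with a stand-in client.

```python
import argparse
import time
from typing import Any, Callable, List


def poll(sqs: Any, queue_url: str, handle: Callable[[List[str]], None],
         run_once: bool, interval: float) -> int:
    """Fetch message batches, pass their bodies to `handle`, then delete them.

    `sqs` is a boto3 SQS client or any object with the same two methods.
    Returns the total number of messages processed.
    """
    processed = 0
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
            WaitTimeSeconds=5,       # long polling queries all SQS servers
        )
        messages = resp.get("Messages", [])
        if messages:
            handle([m["Body"] for m in messages])
            # Delete only after the batch is handled; on a crash the messages
            # become visible again once their visibility timeout expires.
            for m in messages:
                sqs.delete_message(QueueUrl=queue_url,
                                   ReceiptHandle=m["ReceiptHandle"])
            processed += len(messages)
            continue  # more may be waiting, so receive again immediately
        if run_once:
            return processed  # assumption: an empty response ends the pass
        time.sleep(interval)  # continuous mode: idle before the next poll


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="SQS transcript poller (sketch)")
    parser.add_argument("--queue-url", required=True)            # hypothetical flag
    parser.add_argument("--interval", type=float, default=10.0)  # hypothetical flag
    parser.add_argument("--run-once", action="store_true")       # flag from the project's run command
    args = parser.parse_args(argv)

    import boto3  # lazy import: poll() itself does not need boto3

    sqs = boto3.client("sqs")
    total = poll(sqs, args.queue_url,
                 lambda bodies: print(f"received {len(bodies)} messages"),
                 args.run_once, args.interval)
    print(f"processed {total} messages")
```

Calling `main(["--queue-url", url, "--run-once"])` would drain the queue once and exit. Deleting messages only after `handle` returns trades possible duplicate processing after a crash for never silently losing a transcript.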
sequenceDiagram
participant T as Transcriber
participant Q as AWS SQS
participant S as Audio2ScriptViewer
participant F as Local CSV File
T->>Q: Send Message (filename, text)
loop Every X Seconds or Once
S->>Q: Receive Messages
Q-->>S: Return Message Batch
S->>S: Parse JSON & Extract Key
S->>S: Clean Text (regex)
S->>S: Sort Messages by Key
S->>F: Append Rows to CSV
end
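The clean, sort, and append steps in the diagram could be implemented along these lines. This sketch assumes records arrive as `(key, filename, text)` tuples, that cleaning means collapsing any run of `\r` or `\n` characters into a single space, and that the CSV uses a hypothetical `key,filename,text` header; the source does not confirm any of these details.

```python
import csv
import re
from pathlib import Path
from typing import Iterable, Tuple

# Assumed normalization: any run of CR/LF characters becomes one space.
_NEWLINES = re.compile(r"[\r\n]+")


def clean_text(text: str) -> str:
    return _NEWLINES.sub(" ", text).strip()


def append_batch(records: Iterable[Tuple[int, str, str]], csv_path: str) -> int:
    """Sort (key, filename, text) records by key and append them as CSV rows.

    Returns the number of rows written. The column layout is hypothetical.
    """
    rows = sorted(records, key=lambda r: r[0])  # stable numeric sort on key
    path = Path(csv_path)
    new_file = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if new_file:
            writer.writerow(["key", "filename", "text"])  # header only once
        for key, filename, text in rows:
            writer.writerow([key, filename, clean_text(text)])
    return len(rows)
```

Because each batch is sorted and then appended, ordering is guaranteed only within a batch. If the messages for one recording arrive across several batches, a downstream reader may need to re-sort the whole file by the `key` column.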
This tool fits media production workflows in which audio files are transcribed asynchronously. It serves as the aggregation layer that turns scattered transcription events into a single script document (CSV), ordered by the numeric key in each filename, for review by editors or further processing by downstream NLP tasks.