A web-based audio recording client that streams sound to a Python Flask server for real-time transcription and processing.
https://github.com/davidbmar/audio_client_server · public · shipped
This project implements a client-server architecture in which a web browser captures audio through the MediaRecorder API and sends it to a backend server. The server, built with Flask, receives the audio stream, optionally stores it (e.g., in S3), and performs real-time speech-to-text transcription. Protected endpoints are authenticated through Auth0, and the system supports both lightweight local processing and cloud-integrated workflows.
```mermaid
flowchart TD
    Client[Web Browser] -->|Audio Stream| Server[Flask Server]
    Server -->|Validate Token| Auth0[Auth0 Service]
    Server -->|Transcribe| STT[Speech Engine]
    Server -->|Store| S3[(S3 Bucket)]
    Server -->|Response| Client
```
The backend is a Python Flask application organized into blueprints for modularity. It validates Auth0 access tokens with PyJWT and guards routes according to the permissions those tokens carry. The frontend captures audio with the browser's standard MediaRecorder API. Services such as message handling and security validation are supplied through dependency injection.
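As an illustration of how a blueprint, a permission guard, and an injected validator might fit together, here is a minimal sketch. It is not taken from the repository: the names `audio_bp`, `requires_permission`, `create_app`, the `/api/audio` route, the `write:audio` permission, and the `TOKEN_VALIDATOR` config key are all hypothetical. The sketch also assumes the token validator returns the decoded claims as a dict with a `permissions` list, which is how Auth0 access tokens look when RBAC permissions are enabled for the API.

```python
from functools import wraps

from flask import Blueprint, Flask, current_app, g, jsonify, request

# Hypothetical blueprint grouping the audio routes.
audio_bp = Blueprint("audio", __name__)


def requires_permission(permission):
    """Reject the request unless the bearer token carries `permission`."""
    def decorator(view):
        @wraps(view)
        def wrapper(*args, **kwargs):
            header = request.headers.get("Authorization", "")
            scheme, _, token = header.partition(" ")
            if scheme.lower() != "bearer" or not token:
                return jsonify(error="missing bearer token"), 401
            # The validator is injected via app.config (an assumption of this
            # sketch), so tests can swap in a fake and avoid calling Auth0.
            validate = current_app.config["TOKEN_VALIDATOR"]
            try:
                claims = validate(token)
            except Exception:
                # A PyJWT-backed validator would raise jwt.InvalidTokenError here.
                return jsonify(error="invalid token"), 401
            if permission not in claims.get("permissions", []):
                return jsonify(error="insufficient permissions"), 403
            g.claims = claims  # make the claims available to the view
            return view(*args, **kwargs)
        return wrapper
    return decorator


@audio_bp.route("/api/audio", methods=["POST"])
@requires_permission("write:audio")
def upload_audio():
    chunk = request.get_data()  # raw bytes of the posted audio chunk
    return jsonify(received_bytes=len(chunk))


def create_app(token_validator):
    """Application factory: the validator is passed in, not hard-coded."""
    app = Flask(__name__)
    app.config["TOKEN_VALIDATOR"] = token_validator
    app.register_blueprint(audio_bp)
    return app
```

In a real deployment the injected validator would typically wrap PyJWT's `jwt.decode`, passing the Auth0 signing key together with `algorithms=["RS256"]` and the expected `audience` and `issuer`. In tests it can be any callable that returns a claims dict or raises.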
```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant F as Flask Server
    participant A as Auth0
    participant T as Transcriber
    U->>B: Start Recording
    B->>F: POST Audio Chunk
    F->>A: Validate JWT
    A-->>F: Valid Token
    F->>T: Process Audio
    T-->>F: Text Result
    F-->>B: Return Transcription
    B-->>U: Display Text
```
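The per-chunk sequence above can be sketched as a plain Python function with its collaborators injected. This is a framework-free illustration under stated assumptions, not the project's code: `ChunkResult`, `handle_chunk`, and the callables `validate_token`, `transcribe`, and `store` are hypothetical names, and the claims dict is assumed to carry a `sub` (user id) entry.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChunkResult:
    status: int                  # HTTP-style status code
    text: Optional[str] = None   # transcription, when successful
    error: Optional[str] = None  # error message, when rejected


def handle_chunk(
    token: str,
    audio: bytes,
    validate_token: Callable[[str], dict],
    transcribe: Callable[[bytes], str],
    store: Optional[Callable[[str, bytes], None]] = None,
) -> ChunkResult:
    """Validate the token, optionally store the chunk, then transcribe it."""
    try:
        claims = validate_token(token)  # "Validate JWT" step
    except Exception:
        return ChunkResult(status=401, error="invalid token")

    if store is not None:
        store(claims["sub"], audio)     # optional storage (e.g., S3)

    text = transcribe(audio)            # "Process Audio" step
    return ChunkResult(status=200, text=text)  # "Return Transcription"
```

Because every service arrives as an argument, the flow can be exercised without Auth0, a speech engine, or S3 by passing simple stand-in functions.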
Deploy this system to create secure, voice-enabled web applications. Use cases include voice notes, real-time meeting transcription, or accessibility tools where users speak into their browser and receive immediate text feedback. The Auth0 integration makes it suitable for multi-user environments requiring role-based access control.