A lightweight Node.js HTTP server that proxies audio files to the OpenAI Whisper API for transcription.
https://github.com/davidbmar/openai_transcribe · public · shipped
This project provides two distinct Node.js server implementations for transcribing audio using OpenAI's Whisper model. It acts as a backend proxy, accepting audio data via HTTP POST requests and forwarding it to the OpenAI API. The 'turn_based' version handles discrete audio file uploads, while the 'streaming_based' version attempts to handle continuous audio streams by buffering chunks before sending them for transcription.
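The chunk buffering described for the 'streaming_based' version can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `ChunkBuffer` class, its byte threshold, and the `onFlush` callback are all hypothetical names chosen for the example.

```javascript
// Hypothetical helper: the class name, threshold, and callback are
// illustrative assumptions, not taken from the streaming_based server.
class ChunkBuffer {
  constructor(flushBytes, onFlush) {
    this.flushBytes = flushBytes; // flush once this many bytes are buffered
    this.onFlush = onFlush;       // receives one concatenated Buffer per flush
    this.chunks = [];
    this.size = 0;
  }

  // Add one incoming chunk; flush when the threshold is reached.
  push(chunk) {
    this.chunks.push(chunk);
    this.size += chunk.length;
    if (this.size >= this.flushBytes) this.flush();
  }

  // Hand everything buffered so far to the callback and reset.
  flush() {
    if (this.size === 0) return; // nothing buffered, nothing to send
    const audio = Buffer.concat(this.chunks, this.size);
    this.chunks = [];
    this.size = 0;
    this.onFlush(audio);
  }
}
```

A server could call `push` for every incoming data event and `flush` once more when the stream ends, so that a short final segment is still transcribed.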
npm install
export OPENAI_API_KEY=your_api_key
node turn_based_transcribe/server.js
flowchart TD
Client[Web Client] -->|POST /audio| Server[Node.js HTTP Server]
Server -->|Check Env| Env[OPENAI_API_KEY]
Server -->|POST FormData| OpenAI[OpenAI Whisper API]
OpenAI -->|JSON Transcript| Server
Server -->|JSON Response| Client
The application is built on built-in Node.js modules (http, fs, path) plus a few third-party dependencies, such as axios and form-data, for the turn-based variant. Configuration comes from environment variables; OPENAI_API_KEY is required. The servers set basic CORS headers so that web clients on other origins can call them.
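As a rough sketch of how the CORS headers and the OPENAI_API_KEY check might fit together in a plain `http` server: the header values, status codes, and the names `CORS_HEADERS`, `handleRequest`, and `createServer` are assumptions made for illustration, and the transcription step is left out.

```javascript
const http = require("node:http");

// Permissive CORS headers; the exact values the project sends are an assumption.
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

// Hypothetical handler: answers preflight requests and refuses to
// proceed when the API key is missing from the environment.
function handleRequest(req, res, env = process.env) {
  if (req.method === "OPTIONS") {
    res.writeHead(204, CORS_HEADERS); // CORS preflight, no body
    res.end();
    return;
  }
  const jsonHeaders = { ...CORS_HEADERS, "Content-Type": "application/json" };
  if (!env.OPENAI_API_KEY) {
    res.writeHead(500, jsonHeaders);
    res.end(JSON.stringify({ error: "OPENAI_API_KEY is not set" }));
    return;
  }
  if (req.method === "POST" && req.url === "/audio") {
    // Transcription would happen here; this sketch only acknowledges the request.
    res.writeHead(200, jsonHeaders);
    res.end(JSON.stringify({ status: "accepted" }));
    return;
  }
  res.writeHead(404, CORS_HEADERS);
  res.end();
}

// Create (but do not start) a server; call .listen(port) to run it.
function createServer(env = process.env) {
  return http.createServer((req, res) => handleRequest(req, res, env));
}
```

Calling `createServer().listen(3000)` would start it; port 3000 is an arbitrary choice here, not a value taken from the project.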
sequenceDiagram
participant C as Client
participant S as Node.js Server
participant O as OpenAI API
C->>S: POST /audio (Audio Data)
S->>S: Validate OPENAI_API_KEY
S->>S: Construct FormData
S->>O: POST /v1/audio/transcriptions
O-->>S: JSON { text: "..." }
S-->>C: 200 OK (Transcription Text)
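The forwarding step in the sequence above might look roughly like this. Note the swap: the project is described as using axios and form-data, whereas this sketch uses the `fetch`, `FormData`, and `Blob` globals built into Node.js 18+ so that it needs no dependencies. The helper names `buildTranscriptionRequest` and `transcribe` are hypothetical, and `whisper-1` is assumed as the model identifier.

```javascript
// Assumes Node.js 18+ for the global fetch, FormData, and Blob.
const OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the multipart request without sending it (hypothetical helper).
function buildTranscriptionRequest(audioBuffer, filename, apiKey) {
  const form = new FormData();
  form.append("file", new Blob([audioBuffer]), filename);
  form.append("model", "whisper-1"); // assumed model name
  return {
    url: OPENAI_URL,
    options: {
      method: "POST",
      // No Content-Type here: fetch adds the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Send the audio and return the transcript text from the JSON response.
async function transcribe(audioBuffer, filename, apiKey) {
  const { url, options } = buildTranscriptionRequest(audioBuffer, filename, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`OpenAI API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.text;
}
```

Keeping request construction separate from the network call makes the multipart fields easy to inspect without contacting the API, and the key never leaves the server process.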
Use this project as a simple backend service to add speech-to-text capabilities to web applications without exposing your OpenAI API key to the client-side code. It is suitable for prototypes or internal tools where a full-featured media processing pipeline is not required.
✓ all on main — nothing unmerged.