Open Source Sound Generator LLMs

What it is

A full-stack application with a React frontend and a FastAPI backend that lets users generate audio with, compare, and benchmark open-source sound-effect, music, and speech models. It streams generation progress over Server-Sent Events (SSE), keeps results in an LRU cache so repeated prompts return instantly, and detects the available hardware to run on MPS (Apple Silicon), CUDA, or CPU.
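
The hardware detection mentioned above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names pick_device and detect_device are hypothetical, and the preference order (CUDA, then MPS, then CPU) is an assumption. It falls back to CPU when PyTorch is not installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device string. The CUDA > MPS > CPU order is an assumption."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available accelerators; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function (pick_device) makes the choice easy to test without any GPU present.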

Features

Supports 9 models across sound effects, music, and speech categories
Real-time SSE streaming for generation steps and model loading progress
Automated benchmarking script with Real-Time Factor (RTF) metrics
LRU generation cache for instant retrieval of identical prompts
Apple Silicon MPS optimization with float32 precision and pre-warming
Interactive waveform visualization and side-by-side comparison table
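
As an illustration of the LRU generation cache listed above, here is a minimal sketch built on collections.OrderedDict. Only the name GenerationCache comes from this project; the key fields (model_id, prompt, duration_s), the max_entries default, and the method names are assumptions made for the example.

```python
import hashlib
from collections import OrderedDict
from typing import Optional


class GenerationCache:
    """Tiny LRU store mapping a request key to WAV bytes (illustrative only)."""

    def __init__(self, max_entries: int = 32) -> None:
        self._max_entries = max_entries
        self._store: "OrderedDict[str, bytes]" = OrderedDict()

    @staticmethod
    def make_key(model_id: str, prompt: str, duration_s: float) -> str:
        # Hypothetical key: identical model + prompt + duration hash to the same key.
        raw = f"{model_id}|{prompt}|{duration_s}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: str, wav_bytes: bytes) -> None:
        self._store[key] = wav_bytes
        self._store.move_to_end(key)
        while len(self._store) > self._max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
```

A cache hit skips model loading and inference entirely, which is why identical prompts can be served immediately.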

Quickstart

git clone https://github.com/davidbmar/opensource_sound_generator_llms.git
cd opensource_sound_generator_llms
make install
make start

Architecture

flowchart TD
    subgraph Client["Browser (React/Vite :5173)"]
        UI["UI Components\n(Model Selector, Prompt Input)"]
        Player["Audio Player & Waveform"]
        Table["Compare Table"]
    end

    subgraph Server["Python Backend (FastAPI :8000)"]
        API["API Routes\n(/api/generate-stream, /api/models)"]
        MM["ModelManager\n(Asyncio Lock, One-at-a-time)"]
        Cache["GenerationCache\n(LRU Store)"]
        Gen["Generators\n(AudioLDM, MusicGen, Bark, etc.)"]
        Utils["Audio Utils\n(Numpy to WAV)"]
    end

    UI -->|HTTP/SSE| API
    API -->|Manage Lifecycle| MM
    API -->|Check/Store| Cache
    MM -->|Load/Unload| Gen
    Gen -->|Raw Audio| Utils
    Utils -->|WAV Bytes| Cache
    Cache -->|WAV Bytes| API
    API -->|SSE Progress| UI
    API -->|WAV Binary| Player
    Player -->|Display Data| Table

How it's built

The backend uses FastAPI with PyTorch, Diffusers, and Transformers libraries to manage model loading and inference. It employs an asyncio lock to ensure only one model resides in memory at a time, using gc.collect() for cleanup. The frontend is built with React and Vite, consuming SSE streams for progress updates and fetching binary WAV data separately to avoid proxy limitations. Audio processing relies on NumPy and SoundFile for format conversion.
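
The one-model-at-a-time behavior described above can be sketched as follows. This is an assumption-laden illustration rather than the project's ModelManager: the loaders mapping, the run method, and its signature are hypothetical, and real code would likely also release accelerator memory when unloading.

```python
import asyncio
import gc
from typing import Callable, Dict, Optional


class ModelManager:
    """Keeps at most one model in memory, guarded by an asyncio lock (sketch)."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders  # hypothetical: model id -> function that loads it
        self._lock = asyncio.Lock()
        self._current_id: Optional[str] = None
        self._current_model: Optional[object] = None

    async def run(self, model_id: str, infer: Callable[[object], object]) -> object:
        # Holding the lock for load and inference serializes all generations.
        async with self._lock:
            if self._current_id != model_id:
                # Drop the previous model before loading the next one.
                self._current_model = None
                self._current_id = None
                gc.collect()
                # Loading is blocking work, so run it in a worker thread.
                self._current_model = await asyncio.to_thread(self._loaders[model_id])
                self._current_id = model_id
            return await asyncio.to_thread(infer, self._current_model)
```

Running the blocking load and inference calls through asyncio.to_thread keeps the event loop free to serve other requests, such as progress streams, while a model is busy.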

How it runs

sequenceDiagram
    participant User as Browser UI
    participant API as FastAPI Endpoint
    participant Manager as ModelManager
    participant Model as Loaded Model
    participant Cache as GenerationCache

    User->>API: POST /api/generate-stream
    API->>Cache: Check cache key
    alt Cache Hit
        Cache-->>API: Return cached WAV bytes
        API-->>User: SSE: complete (instant)
    else Cache Miss
        API->>Manager: Acquire lock & get model
        Manager->>Model: Ensure loaded (warm-up if needed)
        loop Inference Steps
            Model-->>API: Yield progress (step N/total)
            API-->>User: SSE: progress event
        end
        Model-->>API: Return raw audio array
        API->>Cache: Store WAV bytes
        API-->>User: SSE: complete (audio_id)
        User->>API: GET /api/audio/{id}
        API-->>User: Binary WAV data
    end
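
The progress and complete events in the sequence above could be produced by an async generator like the one below. This is a sketch under stated assumptions: the payload fields (step, total, audio_id) and the function names are hypothetical, and asyncio.sleep(0) stands in for a real inference step. Only the event/data line framing follows the Server-Sent Events wire format.

```python
import asyncio
import json
from typing import AsyncIterator, Dict


def format_sse(event: str, data: Dict[str, object]) -> str:
    """Serialize one SSE message: an event name, a JSON data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def generation_events(total_steps: int, audio_id: str) -> AsyncIterator[str]:
    """Yield one progress event per step, then a final complete event."""
    for step in range(1, total_steps + 1):
        await asyncio.sleep(0)  # stand-in for one inference step
        yield format_sse("progress", {"step": step, "total": total_steps})
    # The client would then fetch the WAV separately via GET /api/audio/{id}.
    yield format_sse("complete", {"audio_id": audio_id})
```

In a FastAPI route, such a generator can be wrapped in StreamingResponse with media_type="text/event-stream". Sending only an audio_id in the final event keeps the binary WAV out of the event stream, matching the separate fetch shown in the diagram.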

How to apply & reuse

Use this project to evaluate the quality, speed, and resource consumption of different open-source audio models before integrating them into production pipelines. It serves as a reference implementation for handling large model swaps, streaming inference progress, and managing GPU/MPS memory constraints in local AI applications.
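
When comparing model speed, the Real-Time Factor listed under Features is commonly computed as generation time divided by the duration of the generated audio, so values below 1.0 mean faster than real time. The sketch below assumes that definition; the project's benchmark script may define or measure it differently, and the benchmark helper and its callback signature are hypothetical.

```python
import time
from typing import Callable, Tuple


def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / length of audio produced (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def benchmark(generate: Callable[[], int], sample_rate: int) -> Tuple[float, float]:
    """Time one generation call; `generate` returns the number of samples produced."""
    start = time.perf_counter()
    num_samples = generate()
    elapsed = time.perf_counter() - start
    return elapsed, real_time_factor(elapsed, num_samples / sample_rate)
```

For example, a model that takes 2 seconds to produce 4 seconds of audio has an RTF of 0.5 under this definition.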

At a glance

Capabilities: Text-to-Audio Generation, Text-to-Music Generation, Text-to-Speech Synthesis, Real-time Progress Streaming, Automated Performance Benchmarking, Local Model Management

Components: FastAPI Backend, React Frontend, ModelManager Service, GenerationCache Store, Benchmark Script, Audio Utilities Module

Tech: Python 3.13+, FastAPI, React, Vite, PyTorch, Diffusers, Transformers, NumPy, SoundFile

Depends on: Node.js 20+, PyTorch, Hugging Face Hub, FFmpeg (implicit via soundfile), Make

Integrates with: Hugging Face Model Hub, Apple Metal Performance Shaders (MPS), NVIDIA CUDA

Patterns: Server-Sent Events (SSE), LRU Caching, Singleton Model Loading, Lazy Initialization, Resource Locking

Reuse tags: audio-generation, llm-benchmarking, fastapi-react, mps-optimization, sse-streaming, local-ai

⚠ Needs attention

unmerged_branch: dependabot/npm_and_yarn/frontend/npm_and_yarn-790455a2e9 is 1 commit ahead of the default branch
unmerged_branch: dependabot/npm_and_yarn/frontend/rollup-4.61.1 is 1 commit ahead of the default branch
open_pr: PR #2: Bump rollup from 4.57.1 to 4.61.1 in /frontend
open_pr: PR #1: Bump the npm_and_yarn group across 1 directory with 4 updates