This project is a browser-based text-to-speech system that runs entirely on the client. It uses VITS neural TTS models executed through ONNX Runtime's WebAssembly backend, so no server-side API calls are needed. Playback is streamed: audio starts as soon as the first sentence is synthesized, while the remaining sentences are still being generated. Generation is spread across cores with WebAssembly threads, and browser caching provides offline support.
Features
100% client-side processing using WebAssembly and ONNX
Streaming playback: audio starts while remaining text generates
Multi-core parallel generation using WebAssembly threads
The application is built with React 18 and TypeScript, bundled with Vite. The core inference engine relies on @diffusionstudio/vits-web, which wraps ONNX Runtime Web. Audio playback is managed via standard HTML5 Audio APIs, while state management for chunk generation and playback status is handled by a custom TTSEngine class and the useTTS React hook. UI components are styled with Tailwind CSS and shadcn/ui.
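To make the chunk state tracking described above concrete, here is a minimal sketch of an in-order playback queue. It is not the project's TTSEngine: the class name `ChunkQueue`, the status names, and the method names are all hypothetical, chosen only for illustration. The point it shows is that chunks may become ready in any order when generated in parallel, while playback still has to consume them in sentence order.

```typescript
// Hypothetical status names; the real TTSEngine's states are not given in the source.
type ChunkStatus = "pending" | "ready" | "played";

interface Chunk<A> {
  index: number;
  text: string;
  status: ChunkStatus;
  audio?: A; // A is whatever the synthesizer returns (e.g. an audio Blob)
}

class ChunkQueue<A> {
  private chunks: Chunk<A>[];
  private cursor = 0; // index of the next chunk that should be played

  constructor(sentences: string[]) {
    this.chunks = sentences.map(
      (text, index): Chunk<A> => ({ index, text, status: "pending" }),
    );
  }

  // Record generated audio for a chunk; chunks may finish out of order.
  markReady(index: number, audio: A): void {
    const chunk = this.chunks[index];
    if (!chunk) throw new RangeError(`no chunk at index ${index}`);
    chunk.audio = audio;
    chunk.status = "ready";
  }

  // Return the next chunk in sentence order, or undefined if it is not ready yet.
  takeNextPlayable(): Chunk<A> | undefined {
    const next = this.chunks[this.cursor];
    if (!next || next.status !== "ready") return undefined;
    next.status = "played";
    this.cursor += 1;
    return next;
  }

  get finished(): boolean {
    return this.cursor >= this.chunks.length;
  }
}
```

A player would call `takeNextPlayable()` whenever a chunk becomes ready or the current audio ends, and start playback only when it returns a chunk.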
How it runs
sequenceDiagram
    participant U as User
    participant H as useTTS Hook
    participant E as TTSEngine
    participant W as VITS WASM
    participant A as Audio Element
    U->>H: speak(text)
    H->>E: processText(text)
    E->>E: splitIntoSentences()
    loop For each sentence
        E->>W: generateAudio(sentence)
        W->>W: ONNX Inference
        W-->>E: return Audio Blob
        E->>E: updateChunkStatus(ready)
    end
    E->>A: play(firstChunk)
    A-->>H: onplay/onended events
    H-->>U: update UI state
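The flow in the diagram can be sketched roughly as follows. The names `splitIntoSentences` and `processText` come from the diagram, but their signatures here are assumptions, and `synthesize` is a hypothetical stand-in for the VITS WASM call. The splitter is deliberately naive (it would break "3.14" into two pieces); the project's actual splitting rules are not described in the source.

```typescript
// Naive splitter: a sentence is a run of text followed by any ., ! or ? characters.
function splitIntoSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hypothetical stand-in for the inference call; A is the audio type it yields.
type Synthesize<A> = (sentence: string) => Promise<A>;

// Generates audio sentence by sentence and reports each chunk as soon as it is
// ready, so the caller can start playing chunk 0 while later chunks generate.
async function processText<A>(
  text: string,
  synthesize: Synthesize<A>,
  onChunkReady: (index: number, audio: A) => void,
): Promise<number> {
  const sentences = splitIntoSentences(text);
  let index = 0;
  for (const sentence of sentences) {
    const audio = await synthesize(sentence);
    onChunkReady(index, audio);
    index += 1;
  }
  return sentences.length; // number of chunks produced
}
```

In this sketch, streaming comes from the per-chunk callback: the caller does not wait for `processText` to resolve before starting playback.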
How to apply & reuse
Integrate the useTTS hook into any React component to add local voice synthesis. This suits privacy-sensitive applications that must work offline, accessibility tools that need natural-sounding voices without cloud dependencies, and educational platforms that want to reduce API costs. Voice ID, speed, and concurrency limit can be customized through the hook's options.
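As an illustration of what a concurrency limit means for chunk generation, the sketch below runs a worker over a list of sentences with at most `limit` calls in flight. The `TTSOptions` shape and its field names are hypothetical (the hook's real option keys are not listed here), and `mapWithConcurrency` is an illustrative helper, not part of the project.

```typescript
// Hypothetical option shape; the real useTTS option names may differ.
interface TTSOptions {
  voiceId: string;
  speed: number;
  maxConcurrency: number;
}

// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in input order regardless of completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor: index of the next unclaimed item

  // Each lane repeatedly claims the next item until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i] as T, i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let k = 0; k < laneCount; k++) lanes.push(runLane());
  await Promise.all(lanes);
  return results;
}

// Example values only; the voice ID is an assumed placeholder.
const exampleOptions: TTSOptions = {
  voiceId: "en_US-hfc_female-medium",
  speed: 1.0,
  maxConcurrency: 2,
};
```

A caller could pass `exampleOptions.maxConcurrency` as `limit` and a synthesis function as `worker`; a lower limit trades throughput for less memory and CPU pressure on the device.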
At a glance
Capabilities: Neural TTS Inference, WebAssembly Execution, Streaming Audio Playback, Offline Model Caching, Multi-threaded Processing
Components: useTTS Hook, TTSEngine Class, VITS WebAssembly Module, Sentence Splitter, Audio Player Manager