RealtimeSTT

A robust, low-latency Python library for realtime speech-to-text with VAD, wake-word activation, and instant transcription.

Kolja Beigel · Since 2023-08-29

Detailed Introduction

RealtimeSTT is a speech-to-text library built for realtime applications that need accurate transcription at low latency. It runs inference locally, with optional GPU acceleration, and supports multiple voice activity detection (VAD) strategies and wake-word activation, which makes it suitable for voice assistants, live captioning, and other interactive systems. The project is community-driven and prioritizes usability and realtime performance.

Main Features

Low-latency realtime transcription, with the option to pair a small model for realtime partial results with a larger model for the final transcript.
Multiple VAD approaches (WebRTCVAD, SileroVAD) for improved detection in noisy environments.
Optional wake-word support (Porcupine / OpenWakeWord) with callback and event hooks.
Command-line tools and a Python SDK for easy integration into existing applications.
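
The features above can be combined in a few lines of Python. The following is a minimal sketch, not the project's reference usage: the class name `AudioToTextRecorder` and the keyword arguments are taken from the project's README as best recalled and should be verified against the installed version, the numeric values are illustrative assumptions, and `build_recorder_kwargs` and `main` are hypothetical helper names introduced here.

```python
def build_recorder_kwargs(on_partial):
    """Collect keyword arguments for RealtimeSTT's AudioToTextRecorder.

    Parameter names are assumed from the project README; the values
    are illustrative, not recommended defaults.
    """
    return {
        "model": "small",                       # larger model for the final transcript
        "enable_realtime_transcription": True,  # emit partial results while speaking
        "realtime_model_type": "tiny",          # small model for the partial results
        "on_realtime_transcription_update": on_partial,  # callback for partial text
        "silero_sensitivity": 0.4,              # Silero VAD sensitivity (assumed 0..1)
        "webrtc_sensitivity": 2,                # WebRTC VAD aggressiveness (assumed 0..3)
        "post_speech_silence_duration": 0.6,    # seconds of silence that end an utterance
    }


def main():
    # Imported lazily so the configuration helper above can be used
    # and inspected without the library or a microphone.
    from RealtimeSTT import AudioToTextRecorder

    kwargs = build_recorder_kwargs(lambda text: print("partial:", text))
    recorder = AudioToTextRecorder(**kwargs)
    try:
        while True:
            # Blocks until the VAD detects the end of an utterance.
            print("final:", recorder.text())
    except KeyboardInterrupt:
        recorder.shutdown()
```

Calling `main()` requires the library and a working microphone. Wake-word activation is left out of the sketch because its setup depends on which backend (Porcupine or OpenWakeWord) is installed.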

Use Cases

RealtimeSTT fits voice assistants, live meeting captions, realtime voice input, live-stream subtitles, and any other interactive system that requires immediate text feedback. It can run locally to preserve privacy, or on GPU-equipped servers for higher-accuracy realtime transcription.

Technical Features

The project combines modern models (e.g., faster-whisper) with a multi-stage VAD pipeline, and supports CUDA acceleration, batched processing of streaming audio, and callback-based APIs. Configuration options tune realtime batch sizes, post-speech silence thresholds, and beam search parameters to balance latency against accuracy.
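
To make the latency trade-off behind a post-speech silence threshold concrete, here is a toy sketch in pure Python. It is not RealtimeSTT's implementation: the function name `segment_utterances`, the 30 ms frame length, and the per-frame speech flags (which a real VAD would supply) are all assumptions made for illustration.

```python
def segment_utterances(frame_is_speech, frame_ms=30, post_speech_silence_ms=600):
    """Split per-frame speech flags into utterances.

    Returns (start_frame, end_frame) pairs with an exclusive end. An
    utterance is closed once the silence that follows speech has lasted
    at least post_speech_silence_ms.
    """
    silence_frames_needed = max(1, post_speech_silence_ms // frame_ms)
    segments = []
    start = None      # index of the first frame of the open utterance
    silent_run = 0    # consecutive silent frames since the last speech frame

    for i, is_speech in enumerate(frame_is_speech):
        if is_speech:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= silence_frames_needed:
                # Close the utterance just after its last speech frame.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0

    if start is not None:
        # The input ended while an utterance was still open.
        segments.append((start, len(frame_is_speech) - silent_run))
    return segments


frames = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
short_wait = segment_utterances(frames, post_speech_silence_ms=30)
long_wait = segment_utterances(frames, post_speech_silence_ms=60)
```

With the shorter threshold the one-frame pause splits the first phrase into two utterances, so results arrive sooner but mid-sentence pauses cut the text. The longer threshold keeps the phrase together at the cost of waiting longer before the final transcript is produced.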