whisper.cpp

A high-performance local implementation of OpenAI's Whisper for on-device speech recognition, with broad platform and backend support.

Author: ggml-org

Since: 2022-09-25

GitHub

Introduction

whisper.cpp is a lightweight C/C++ reimplementation of OpenAI’s Whisper focused on efficient on-device inference. It runs across a wide range of platforms (from Raspberry Pi to Apple Silicon) and supports multiple acceleration backends.

Key Features

Pure C/C++ implementation with minimal runtime dependencies for easy integration.
Multiple acceleration backends (Vulkan, CUDA, Core ML, OpenVINO, and MUSA for Moore Threads GPUs), plus quantized model support to reduce memory usage.
Rich examples (CLI, stream, wasm, bench, server) and language bindings (Rust, JS, Java, etc.).

Use Cases

Local speech-to-text and offline voice assistants for privacy-sensitive applications.
ASR on resource-constrained devices or large-scale offline batch transcription.
Research and engineering experiments: benchmarking, quantization studies, and backend comparisons.

Technical Highlights

Uses ggml-format model weights with optional integer quantization (4-bit and 5-bit variants such as Q4_0 and Q5_0) and mixed F16/F32 precision, trading some transcription quality for lower memory use and faster inference.
Provides a C-style API and many bindings, Docker/CMake build flows, and prebuilt artifacts (XCFramework) for easy adoption.
Released under the MIT license and actively maintained by its community, with extensive platform support and continuous integration.
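
The quality-versus-memory trade-off from quantization can be made concrete with a rough size estimate. The sketch below is a back-of-the-envelope calculation, not part of whisper.cpp: it assumes the commonly documented ggml block layouts (32 weights per block, with an fp16 scale per block) and treats every weight as quantized, which real model files do not do, so actual files will be somewhat larger. The function names and the 74-million parameter count are illustrative assumptions.

```python
# Rough estimate of weight storage for ggml-style block quantization.
# Assumption: each block holds 32 weights plus per-block metadata.
BLOCK_WEIGHTS = 32

# Assumed bytes per 32-weight block:
#   q4_0: 2-byte fp16 scale + 16 bytes of packed 4-bit values
#   q5_0: 2-byte fp16 scale + 4 bytes of high bits + 16 bytes of packed low bits
#   q8_0: 2-byte fp16 scale + 32 one-byte values
BLOCK_BYTES = {"q4_0": 18, "q5_0": 22, "q8_0": 34}


def bits_per_weight(fmt: str) -> float:
    """Return the average storage cost of one weight, in bits."""
    if fmt == "f32":
        return 32.0
    if fmt == "f16":
        return 16.0
    return BLOCK_BYTES[fmt] * 8 / BLOCK_WEIGHTS


def estimate_weight_mib(n_params: int, fmt: str) -> float:
    """Estimate weight storage in MiB if all n_params used format fmt."""
    return n_params * bits_per_weight(fmt) / 8 / (1024 ** 2)


if __name__ == "__main__":
    n_params = 74_000_000  # hypothetical model size, for illustration only
    for fmt in ("f16", "q8_0", "q5_0", "q4_0"):
        mib = estimate_weight_mib(n_params, fmt)
        print(f"{fmt:>5}: {bits_per_weight(fmt):4.1f} bits/weight, ~{mib:.0f} MiB")
```

Under these assumptions a 5-bit format costs about 5.5 bits per weight once the per-block scale is included, roughly a third of F16 storage, which is why quantized models fit on smaller devices.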

Related Resources

ggml

llama.cpp

Pixeltable