Overview
Whisper is a Transformer-based sequence-to-sequence model trained on a large, diverse corpus of multitask speech data. It performs multilingual speech recognition, speech translation, and language identification, and exposes both a CLI and a Python API for integration.
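As a quick orientation, here is a minimal Python sketch of the transcription workflow. It assumes the package is importable as `whisper` and uses the upstream `whisper.load_model` / `model.transcribe` entry points with an illustrative `"base"` checkpoint and example file names; verify the exact names against the installed version.

```python
import whisper

# Load a pre-trained checkpoint by name; smaller checkpoints trade accuracy for speed.
model = whisper.load_model("base")

# Transcribe an audio file; the result dict carries the full text and timestamped segments.
result = model.transcribe("audio.mp3")
print(result["text"])

# Translate non-English speech directly into English text.
translated = model.transcribe("speech_de.mp3", task="translate")
print(translated["text"])
```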
Core Features
- Multilingual speech recognition and speech translation into English, across model sizes from tiny to large-v3.
- CLI and Python interfaces, pre-trained checkpoints, model cards, and example notebooks for quick onboarding.
- Portable PyTorch implementation that runs on CPU or GPU across common hardware and environments (see the sketch after this list).
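As referenced above, a short sketch of model-size and device selection. It assumes the upstream `whisper.load_model` signature accepts a `device` argument and that the loaded model exposes its architecture dimensions as `model.dims`; the checkpoint names are illustrative.

```python
import torch
import whisper

# Run on a CUDA GPU when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Pick a checkpoint by speed/accuracy trade-off: "tiny" is fastest, "large-v3" most accurate.
model = whisper.load_model("tiny", device=device)

# Inspect the architecture dimensions of the loaded checkpoint.
print(model.dims)
```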
Use Cases
- Transcription and subtitle generation (see the SRT sketch after this list), cross-language speech translation, and voice data annotation.
- Media processing, meeting summarization, and voice-driven interfaces.
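For the subtitle-generation use case, a hedged sketch that builds a SubRip (.srt) file directly from the timestamped segments returned by `model.transcribe`; the `to_srt_timestamp` helper and file names are illustrative, and the upstream CLI can produce the same output through its own writers.

```python
import whisper

def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("small")
result = model.transcribe("interview.mp3")  # segments carry start/end times in seconds

# Emit a simple SubRip file from the timestamped segments.
with open("interview.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n")
        f.write(f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n")
        f.write(seg["text"].strip() + "\n\n")
```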
Technical Highlights
- Encoder-decoder Transformer architecture with log-Mel spectrogram preprocessing, language detection, and decoding utilities (sketched below).
- MIT-licensed, open-source codebase with extensive examples, benchmarks, and community support.
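To ground the points above, a lower-level sketch of the spectrogram preprocessing, language detection, and decoding path. It relies on the `load_audio`, `pad_or_trim`, `log_mel_spectrogram`, `detect_language`, `DecodingOptions`, and `decode` names from the upstream project; treat exact signatures and defaults as version-dependent.

```python
import whisper

model = whisper.load_model("base")

# Load audio and pad/trim it to the model's 30-second input window.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and move it to the model's device.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Identify the spoken language from the spectrogram.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode with default options and print the recognized text.
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```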