Whisper

Whisper is OpenAI's general-purpose speech recognition model supporting multilingual transcription, translation, and language identification.

OpenAI · Since 2022-09-16

Loading score...

GitHub Website

Overview

Whisper is a Transformer-based sequence-to-sequence model trained on diverse speech tasks. It enables high-quality multilingual speech recognition, translation, and language identification, and provides both CLI and Python APIs for integration.

Core Features

Multilingual speech recognition and optional translation across multiple model sizes (tiny → large-v3).
CLI and Python interfaces, pre-trained models, model cards, and example notebooks for quick onboarding.
Portable implementation with support across common hardware and environments.

Use Cases

Transcription and subtitle generation, cross-language speech translation, and voice data annotation.
Media processing, meeting summarization, and voice-driven interfaces.

Technical Highlights

Transformer sequence-to-sequence architecture with mel-spectrogram preprocessing and decoding utilities.
MIT licensed, open-source codebase with extensive examples, benchmarks, and community support.

Core Content

Core Content

Technology

Technology

More

More

AI Infrastructure

AI Infrastructure

Explore

Explore

Connect

Connect

Quick Links

Quick Links

LinkedIn

LinkedIn

Follow on X

Follow on X

Whisper

Overview

Core Features

Use Cases

Technical Highlights

Score Breakdown

Related Resources

Codex

gpt-oss

OpenAI Agents (Python)