MockingBird

An open-source voice cloning and real-time speech generation toolkit that can clone a speaker from a short sample and synthesize arbitrary speech.

Author: babysor

Since: 2021-08-07

Visit Website

Detailed Introduction

MockingBird is an open-source voice cloning and real-time speech generation toolkit designed for research and engineering use. It aims to clone a target speaker’s voice from very short audio samples (e.g., 5 seconds) and synthesize arbitrary text in that voice. Implemented in PyTorch, MockingBird includes preprocessing, training, and inference components, a demo toolbox, and optional web server interfaces for quick experimentation and local deployment.

Main Features

Fast cloning: build a speaker representation from short audio clips and produce synthetic speech in a similar timbre.
Cross-platform and hardware tested: runs on Windows and Linux, with notes for running on Apple M1; tested with GPUs like Tesla T4 and GTX series.
Tooling and demos: includes training scripts, a demo toolbox, and a web.py web server for remote invocation.
Community-shared models and comparisons for quality evaluation.

Use Cases

Voice cloning research and experiments to validate speaker modeling methods.
Prototyping speech products or demos that require sample-specific voices.
Educational projects for learning TTS pipelines, vocoders, and model training workflows.

Technical Features

Modular PyTorch codebase with encoder, synthesizer, and vocoder components for easy replacement and extension.
Options to reuse pretrained encoders/vocoders while training the synthesizer to reduce time-to-results.
Platform-specific setup guides (including Rosetta-based workarounds for M1 macOS) and extensive README/Wiki documentation.

MockingBird

Detailed Introduction

Main Features

Use Cases

Technical Features

Related Resources

Pixeltable

CoTyle

TOON