CosyVoice

Multilingual streaming text-to-speech (TTS) toolkit with zero-shot voice cloning and low-latency, high-quality speech generation.

Author: FunAudioLLM

Added Date: 2025-09-13

Open Source Since: 2024-07-03


Introduction

CosyVoice is a multilingual streaming text-to-speech (TTS) toolkit that supports zero-shot voice cloning, low-latency streaming synthesis, and cross-lingual generation. It can be deployed for both online and offline use.

Key Features

Supports speech synthesis for Chinese, English, Japanese, Korean, and various dialects
Zero-shot voice cloning and cross-language synthesis capabilities
Provides training, inference, and Docker deployment examples

Use Cases

Voice assistants, podcast dubbing, virtual characters, and content creation
Online services requiring low-latency, high-quality TTS
Research and model fine-tuning scenarios

Technical Highlights

Offers streaming inference and optimization paths such as Triton and TensorRT
Provides a range of pretrained models and demo pages; released under the Apache-2.0 license
Supports vLLM integration and GPU-accelerated deployment
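
To illustrate why streaming inference matters for latency, here is a minimal sketch of the chunked-generation pattern. It does not use the CosyVoice API: `fake_streaming_tts`, `consume`, the sample rate, and the chunk duration are all hypothetical stand-ins chosen for illustration. The point is only that a consumer can start playback after the first chunk instead of waiting for the full waveform.

```python
from typing import Dict, Iterator, List

SAMPLE_RATE = 16_000   # assumed sample rate, for illustration only
CHUNK_SECONDS = 0.5    # assumed duration of each streamed chunk


def fake_streaming_tts(text: str) -> Iterator[List[float]]:
    """Hypothetical stand-in for a streaming TTS engine.

    Yields one chunk of silent audio per word instead of returning
    the whole waveform at once.
    """
    chunk_len = int(SAMPLE_RATE * CHUNK_SECONDS)
    for _word in text.split():
        yield [0.0] * chunk_len


def consume(text: str) -> Dict[str, int]:
    """Consume chunks as they arrive and report how many samples had to be
    generated before playback could begin, versus the total."""
    samples_before_playback = 0
    total_samples = 0
    chunks = 0
    for i, chunk in enumerate(fake_streaming_tts(text)):
        if i == 0:
            # Playback can start as soon as the first chunk is available.
            samples_before_playback = len(chunk)
        total_samples += len(chunk)
        chunks += 1
    return {
        "chunks": chunks,
        "samples_before_playback": samples_before_playback,
        "total_samples": total_samples,
    }


stats = consume("hello streaming speech synthesis")
print(stats)
```

In a non-streaming setup, the listener would wait for all of `total_samples` to be generated; with streaming, the wait is bounded by the first chunk, regardless of how long the input text is.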
