CosyVoice

Multilingual, high-quality streaming TTS / speech generation toolkit supporting zero-shot cloning and low-latency generation.

FunAudioLLM · Since 2024-07-03



Introduction

CosyVoice is a multilingual streaming text-to-speech (TTS) generation library supporting zero-shot voice cloning, low-latency streaming synthesis, and cross-language generation. It is suitable for both online and offline deployment.

Key Features

Supports speech synthesis for Chinese, English, Japanese, Korean, and various dialects
Zero-shot voice cloning and cross-language synthesis capabilities
Provides training, inference, and Docker deployment examples

Use Cases

Voice assistants, podcast dubbing, virtual characters, and content creation
Online services requiring low-latency, high-quality TTS
Research and model fine-tuning scenarios

Technical Highlights

Offers streaming inference and optimization paths such as Triton and TensorRT
Provides a range of models and demo pages; released under the Apache-2.0 license
Supports vLLM integration and GPU-accelerated deployment
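
The streaming and low-latency points above can be illustrated with a minimal consumer-side sketch. It does not use CosyVoice's API: `fake_streaming_tts` is a hypothetical stand-in generator that yields silent audio chunks, and the sample rate and chunk length are assumed values. The idea it shows is that a streaming consumer can act on the first chunk as soon as it arrives instead of waiting for the whole utterance.

```python
import time
from typing import Iterator, List, Tuple

SAMPLE_RATE = 16000  # assumed sample rate, not taken from CosyVoice


def fake_streaming_tts(text: str, chunk_ms: int = 200) -> Iterator[List[float]]:
    """Hypothetical stand-in for a streaming TTS engine.

    Yields one silent chunk per word so the consumer logic can be
    exercised without a real model.
    """
    samples_per_chunk = SAMPLE_RATE * chunk_ms // 1000
    for _word in text.split():
        yield [0.0] * samples_per_chunk


def consume_stream(chunks: Iterator[List[float]]) -> Tuple[float, int, float]:
    """Drain a chunk stream.

    Returns (seconds until the first chunk, chunk count, seconds of audio).
    """
    start = time.perf_counter()
    first_chunk_latency = None
    count = 0
    total_samples = 0
    for chunk in chunks:
        if first_chunk_latency is None:
            # A real player would begin playback at this point.
            first_chunk_latency = time.perf_counter() - start
        count += 1
        total_samples += len(chunk)
    if first_chunk_latency is None:
        first_chunk_latency = float("nan")  # empty stream
    return first_chunk_latency, count, total_samples / SAMPLE_RATE
```

With a real engine, the generator would be replaced by whatever streaming interface the toolkit exposes; the time-to-first-chunk measurement is the figure that matters for interactive use.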

Related Resources

AutoSubs

Axolotl

Cactus