CosyVoice

Multilingual, high-quality streaming TTS / speech generation toolkit supporting zero-shot cloning and low-latency generation.

Introduction

CosyVoice is a multilingual streaming text-to-speech (TTS) generation library supporting zero-shot voice cloning, low-latency streaming synthesis, and cross-language generation. It is suitable for both online and offline deployment.

Key Features

  • Supports speech synthesis for Chinese, English, Japanese, Korean, and various dialects
  • Zero-shot voice cloning and cross-language synthesis capabilities
  • Provides training, inference, and Docker deployment examples

Use Cases

  • Voice assistants, podcast dubbing, virtual characters, and content creation
  • Online services requiring low-latency, high-quality TTS
  • Research and model fine-tuning scenarios

Technical Highlights

  • Offers streaming inference and optimization paths such as Triton and TensorRT
  • Provides multiple pretrained models and demo pages, released under the Apache-2.0 license
  • Supports vLLM integration and GPU-accelerated deployment
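
To make the idea of streaming inference concrete, the following is a minimal sketch of how a client might consume audio chunk by chunk instead of waiting for a whole utterance. It does not use the CosyVoice API: `fake_stream_tts`, `split_text`, `play_stream`, the 16 kHz sample rate, and the fixed samples-per-character figure are all hypothetical stand-ins chosen for illustration, and the "audio" is silent PCM.

```python
from typing import Iterable, Iterator, List, Tuple

SAMPLE_RATE = 16_000      # assumed sample rate, for this sketch only
SAMPLES_PER_CHAR = 800    # assumed fixed speaking rate (0.05 s per character)


def split_text(text: str, max_chars: int = 20) -> List[str]:
    """Greedily split text into word-aligned segments of at most max_chars."""
    segments: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = word
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def fake_stream_tts(text: str) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS model.

    Yields one chunk of silent 16-bit mono PCM per text segment, so a
    caller can start playback as soon as the first chunk is available.
    """
    for segment in split_text(text):
        n_samples = len(segment) * SAMPLES_PER_CHAR
        yield b"\x00\x00" * n_samples


def play_stream(chunks: Iterable[bytes]) -> Tuple[int, float]:
    """Consume chunks as they arrive; return (chunk count, audio seconds)."""
    count = 0
    total_bytes = 0
    for chunk in chunks:
        count += 1                 # a real client would send this chunk to the audio device
        total_bytes += len(chunk)
    return count, total_bytes / 2 / SAMPLE_RATE  # 2 bytes per 16-bit sample


if __name__ == "__main__":
    n_chunks, seconds = play_stream(fake_stream_tts("hello world this is a streaming test"))
    print(n_chunks, seconds)
```

With a real streaming model in place of the stand-in, the same consumer loop applies: perceived latency is governed by the time to the first chunk rather than by the length of the full utterance.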

Resource Info
Author: FunAudioLLM
Added Date: 2025-09-13
Tags: OSS Utility Project TTS