Detailed Introduction
VibeVoiceFusion is a full-stack web application for multi-speaker voice synthesis built on the VibeVoice architecture (autoregressive + diffusion). It uses a Qwen backbone with acoustic and semantic encoders to process reference audio, generates speech tokens autoregressively, and refines waveforms with a DPM-Solver diffusion head. The project offers a web UI and CLI, bilingual interface, and project management features suitable for local deployment and research experiments.
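The synthesis flow described above (encode reference audio, generate tokens autoregressively, refine with diffusion) can be sketched as a minimal pipeline. This is an illustrative stand-in only: the function and class names (`encode_reference`, `generate_speech_tokens`, `diffuse_waveform`, `SpeakerEmbedding`) are hypothetical and do not reflect the project's actual API, and the bodies are dummy computations that just show how data moves between the stages.

```python
from dataclasses import dataclass

@dataclass
class SpeakerEmbedding:
    acoustic: list[float]   # would come from the acoustic encoder
    semantic: list[float]   # would come from the semantic encoder

def encode_reference(reference_audio: list[float]) -> SpeakerEmbedding:
    """Stand-in for the acoustic + semantic encoders over a reference clip."""
    mean = sum(reference_audio) / len(reference_audio)
    return SpeakerEmbedding(acoustic=[mean], semantic=[mean * 0.5])

def generate_speech_tokens(text: str, speaker: SpeakerEmbedding) -> list[int]:
    """Stand-in for autoregressive speech-token generation by the backbone."""
    return [ord(c) % 256 for c in text]  # one dummy token per character

def diffuse_waveform(tokens: list[int], steps: int = 4) -> list[float]:
    """Stand-in for diffusion refinement: iterate a 'denoising' update."""
    wave = [float(t) for t in tokens]
    for _ in range(steps):
        wave = [x * 0.5 for x in wave]  # dummy refinement step
    return wave

speaker = encode_reference([0.1, 0.2, 0.3])
tokens = generate_speech_tokens("Hi", speaker)
audio = diffuse_waveform(tokens)
```

The point is only the shape of the data flow: a per-speaker embedding conditions token generation, and the diffusion head turns discrete tokens into a waveform.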
Main Features
- Complete web application: project and speaker management, dialog editor, generation history and live preview.
- Multi-speaker synthesis: supports dialogs with two to four or more speakers and voice cloning from reference samples.
- VRAM optimizations: dynamic layer offloading and Float8 quantization cut memory usage roughly in half.
- Deployment-ready: Docker multi-stage builds, automatic model download and build scripts for local installation.
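The layer-offloading feature above can be illustrated with a toy simulation: keep only a sliding window of transformer layers resident on the GPU and park the rest in CPU RAM, evicting the oldest layer as the forward pass advances. The profile names match the ones listed later (Balanced/Aggressive/Extreme), but the window sizes and the `Layer` bookkeeping here are hypothetical, not the project's actual settings or code.

```python
# Hypothetical window sizes: number of layers kept resident on the GPU.
PROFILES = {"Balanced": 16, "Aggressive": 8, "Extreme": 2}

class Layer:
    def __init__(self, idx: int):
        self.idx = idx
        self.device = "cpu"  # all layers start offloaded to CPU RAM

def run_forward(layers: list[Layer], window: int) -> None:
    """Walk the layers in order, loading each to the GPU just before use
    and evicting the layer that falls out of the sliding window."""
    for i, layer in enumerate(layers):
        layer.device = "cuda"         # load the layer we are about to run
        evict = i - window
        if evict >= 0:
            layers[evict].device = "cpu"  # evict the oldest resident layer

layers = [Layer(i) for i in range(24)]
run_forward(layers, PROFILES["Extreme"])
resident = sum(1 for layer in layers if layer.device == "cuda")
```

With the "Extreme" profile, at most two layers ever occupy GPU memory at once, which is the trade the real feature makes: less VRAM in exchange for PCIe transfer time on every pass.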
Use Cases
Suitable for podcast production, dubbing, dialog content creation, and research prototypes. Creators can generate multi-speaker audio locally or on private servers; teams can manage sessions and export WAV files. Researchers can compare performance and audio quality across precision and offloading strategies via the CLI.
Technical Features
- Model architecture: Qwen backbone + VAE acoustic tokenizer + diffusion generation head.
- Memory strategies: dynamic layer offloading (Balanced/Aggressive/Extreme) and Float8 (E4M3FN) quantization to cut VRAM roughly in half.
- Compatibility: backend in Python/Flask with PyTorch; frontend in Next.js and TailwindCSS; supports CUDA, MPS, and CPU devices.
- Responsible use: the project targets research and development; obtain explicit consent from speakers before cloning their voices to avoid misuse.
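The "roughly in half" claim for Float8 quantization follows directly from byte widths: bfloat16 weights take 2 bytes per parameter, while Float8 (E4M3FN) takes 1. A back-of-the-envelope estimate, where the 7B parameter count is purely illustrative and not the actual size of the model's Qwen backbone:

```python
# Illustrative parameter count; the real backbone size may differ.
params = 7_000_000_000

bf16_gib = params * 2 / 2**30   # bfloat16: 2 bytes per parameter
fp8_gib = params * 1 / 2**30    # Float8 E4M3FN: 1 byte per parameter

print(f"bf16: {bf16_gib:.1f} GiB, fp8: {fp8_gib:.1f} GiB")
```

Weight storage halves exactly; total VRAM savings are "roughly" half because activations, the KV cache, and CUDA overhead are not quantized the same way.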