Overview
Wan2.2 is an open family of video generation models covering text-to-video, image-to-video, text-image-to-video, and speech-to-video tasks. It introduces a Mixture-of-Experts (MoE) architecture and a high-compression VAE to balance output quality against inference cost.
Key Features
- MoE architecture: raises total model capacity through specialized experts, while only part of the parameters is active at any given step.
- Multimodal support: text-, image-, and audio-conditioned video pipelines, plus animation and replacement modules.
- Rich ecosystem: released weights, inference code, ComfyUI and Diffusers integrations, and online demos.
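The MoE idea listed above can be illustrated with a minimal sketch. It assumes a two-expert split in which the denoising timestep decides which expert runs; the expert functions, the `boundary` value, and all names here are hypothetical stand-ins for illustration, not the project's actual code or API.

```python
from typing import Callable, List, Tuple

Latent = List[float]
Expert = Callable[[Latent, float], Latent]


def high_noise_expert(latent: Latent, t: float) -> Latent:
    # Stand-in for a network specialized in early, very noisy steps.
    return [x * 0.5 for x in latent]


def low_noise_expert(latent: Latent, t: float) -> Latent:
    # Stand-in for a network specialized in late, detail-refining steps.
    return [x * 0.9 for x in latent]


def select_expert(t: float, boundary: float) -> Tuple[str, Expert]:
    # Hypothetical routing rule: exactly one expert runs per step,
    # chosen by the noise level of that step.
    if t >= boundary:
        return "high_noise", high_noise_expert
    return "low_noise", low_noise_expert


def denoise(
    latent: Latent, timesteps: List[float], boundary: float = 0.875
) -> Tuple[Latent, List[str]]:
    # Walk the schedule from noisy to clean, recording which expert ran.
    used: List[str] = []
    for t in timesteps:
        name, expert = select_expert(t, boundary)
        latent = expert(latent, t)
        used.append(name)
    return latent, used


if __name__ == "__main__":
    _, used = denoise([1.0, -2.0], [1.0, 0.9, 0.5, 0.1])
    print(used)
```

Because only one expert is evaluated per step in this sketch, the per-step compute matches a single expert even though the combined parameter count is larger.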
Use Cases
- Film and short-video content generation and stylistic editing.
- Research and benchmarking for video generation and MoE/compression strategies.
- Prototyping and demos via Hugging Face Spaces or self-hosted services.
Technical Characteristics
- High-compression VAE and MoE design for efficient high-resolution video generation.
- Multiple inference modes (single-GPU and multi-GPU, the latter using FSDP + DeepSpeed) and model conversion tooling.
- Apache-2.0 license, active maintenance, and an accompanying academic publication.
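To make the effect of a high-compression VAE concrete, the following sketch computes the latent grid size for a given clip. The stride values (4 in time, 16 in each spatial dimension) and the convention that the first frame is encoded on its own are assumptions chosen for illustration; check the released model configuration for the actual values. The function names are hypothetical.

```python
from typing import Tuple


def latent_shape(
    frames: int, height: int, width: int, t_stride: int = 4, s_stride: int = 16
) -> Tuple[int, int, int]:
    # Assumed convention: the first frame is encoded alone, and every
    # following group of t_stride frames maps to one latent frame.
    if (frames - 1) % t_stride != 0:
        raise ValueError("frames must equal k * t_stride + 1")
    if height % s_stride != 0 or width % s_stride != 0:
        raise ValueError("height and width must be multiples of s_stride")
    return ((frames - 1) // t_stride + 1, height // s_stride, width // s_stride)


def position_reduction(
    frames: int, height: int, width: int, t_stride: int = 4, s_stride: int = 16
) -> float:
    # Ratio of pixel positions to latent positions (channel counts ignored).
    t, h, w = latent_shape(frames, height, width, t_stride, s_stride)
    return (frames * height * width) / (t * h * w)


if __name__ == "__main__":
    print(latent_shape(121, 704, 1280))
    print(position_reduction(121, 704, 1280))
```

Under these assumed strides, the downstream generator operates on far fewer positions than the pixel grid contains, which is where the efficiency gain at high resolution comes from.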