Wan2.2

Wan2.2 is a family of large-scale video generation models that support text/image/audio-to-video tasks, featuring MoE and high-compression VAE designs.

Author: Alibaba

Added Date: 2025-09-23

Open Source Since: 2025-07-28


Overview

Wan2.2 is an open family of high-quality video generation models supporting text-to-video, image-to-video, text-image-to-video, and speech-to-video tasks. It introduces a Mixture-of-Experts (MoE) architecture and a high-compression VAE to balance quality and efficiency.

Key Features

MoE architecture: increases effective model capacity through specialized experts.
Multimodal support: text-, image-, and audio-to-video pipelines, plus animation and replacement modules.
Rich ecosystem: released weights, inference code, ComfyUI and Diffusers integrations, and online demos.
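
To make the MoE idea above concrete, here is a minimal, generic sketch of hard top-1 expert routing in plain Python. It is not Wan2.2's actual routing logic, which this page does not describe; the names `route_top1`, `make_scaling_expert`, and `gate_weights` are hypothetical and exist only for illustration. The point it shows is that total capacity grows with the number of experts while each input only runs through one of them.

```python
import math
from typing import Callable, List, Sequence, Tuple

# An "expert" is any function mapping an input vector to an output vector.
Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_scaling_expert(scale: float) -> Expert:
    """Hypothetical toy expert: multiplies every input value by `scale`."""
    return lambda x: [scale * v for v in x]


def route_top1(
    x: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
) -> Tuple[int, List[float]]:
    """Pick the single highest-scoring expert for `x` and run only that one.

    gate_weights holds one vector per expert; the gate score is its dot
    product with the input. This is an illustrative assumption, not the
    gating used by any specific model.
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    return chosen, experts[chosen](x)


# Two toy experts; only one is evaluated per input.
experts = [make_scaling_expert(2.0), make_scaling_expert(-1.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

index_a, out_a = route_top1([3.0, 1.0], gate_weights, experts)
index_b, out_b = route_top1([1.0, 5.0], gate_weights, experts)
```

With these toy values, the first input is routed to expert 0 and the second to expert 1, so adding experts increases the parameters available overall without increasing the work done per input.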

Use Cases

Film and short-video content generation and stylistic editing.
Research and benchmarking for video generation and MoE/compression strategies.
Prototyping and demos via Hugging Face Spaces or self-hosted services.

Technical Characteristics

High-compression VAE and MoE design for efficient high-resolution video generation.
Multiple inference modes (single-GPU, and multi-GPU with FSDP + DeepSpeed), plus model conversion tooling.
Released under the Apache-2.0 license, actively maintained, and accompanied by an academic publication.
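
As a rough illustration of why a high-compression VAE matters for efficiency, the sketch below computes how many latent positions a video occupies after temporal and spatial downsampling. The function names and the downsampling factors passed in are hypothetical inputs, not values stated on this page, and real video VAEs may treat the first frame specially or change the channel count, which this arithmetic ignores.

```python
import math
from typing import Tuple


def latent_shape(
    frames: int, height: int, width: int, t_factor: int, s_factor: int
) -> Tuple[int, int, int]:
    """Latent grid size after downsampling time by t_factor and space by s_factor.

    Both factors are caller-supplied assumptions, not documented model values.
    """
    return (
        math.ceil(frames / t_factor),
        math.ceil(height / s_factor),
        math.ceil(width / s_factor),
    )


def position_reduction(
    frames: int, height: int, width: int, t_factor: int, s_factor: int
) -> float:
    """Ratio of pixel positions to latent positions (channels ignored)."""
    lt, lh, lw = latent_shape(frames, height, width, t_factor, s_factor)
    return (frames * height * width) / (lt * lh * lw)


# Example with assumed factors: 4x in time, 16x in each spatial dimension.
shape = latent_shape(80, 704, 1280, t_factor=4, s_factor=16)
ratio = position_reduction(80, 704, 1280, t_factor=4, s_factor=16)
```

Under these assumed factors, the generation model operates on roughly a thousand times fewer positions than the raw pixel grid, which is where the efficiency gain comes from.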

Related Resources

Spring AI Alibaba

Qwen3-VL

ROLL