Overview
Wan2.2 is an open-source suite of large-scale video generative models covering text-to-video (T2V), image-to-video (I2V), text-image-to-video (TI2V), and speech-to-video (S2V) tasks. It introduces a Mixture-of-Experts (MoE) architecture and a high-compression VAE for efficient 720P video generation, and releases inference code and model weights for research and deployment via ModelScope, Hugging Face, or self-hosted environments.
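For orientation, here is a minimal text-to-video sketch using the Hugging Face diffusers integration. The checkpoint id and the generation parameters below are illustrative assumptions, not a quote of the project's official launch commands; consult the model cards for exact ids and recommended settings.

```python
# Minimal text-to-video sketch via the diffusers integration.
# The checkpoint id and parameters are illustrative assumptions.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

frames = pipe(
    prompt="A cinematic shot of a lighthouse at dusk, waves crashing below",
    num_frames=81,           # roughly 5 seconds at 16 fps; adjust to taste
    guidance_scale=4.0,
    num_inference_steps=40,
).frames[0]

export_to_video(frames, "lighthouse.mp4", fps=16)
```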
Key Features
- Multi-modal support: T2V, I2V, TI2V, S2V.
- MoE architecture for increased capacity at controllable inference cost (see the routing sketch after this list).
- Range of model sizes and a high-compression Wan2.2-VAE for practical 720P generation.
- Broad ecosystem integrations and demos (Hugging Face, ModelScope, ComfyUI).
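To make the "controllable inference cost" point concrete, the following is a conceptual sketch (all names hypothetical) of one way a diffusion MoE can keep per-step cost flat: route each denoising step to a single expert based on the current noise level, so active parameters per step stay close to a single dense model even as total capacity grows.

```python
# Conceptual sketch of timestep-routed MoE denoising (names hypothetical).
# Total capacity is the sum of both experts, but only one runs per step,
# so per-step compute stays close to that of a single dense model.
import torch
import torch.nn as nn

class TimestepMoE(nn.Module):
    def __init__(self, high_noise_expert: nn.Module, low_noise_expert: nn.Module,
                 boundary: float = 0.9):
        super().__init__()
        self.high_noise_expert = high_noise_expert  # early, high-noise steps
        self.low_noise_expert = low_noise_expert    # late, low-noise steps
        self.boundary = boundary  # normalized timestep at which experts switch

    def forward(self, latents: torch.Tensor, t: float, **cond) -> torch.Tensor:
        # Route the entire step to one expert based on the noise level.
        expert = self.high_noise_expert if t >= self.boundary else self.low_noise_expert
        return expert(latents, t, **cond)
```

Routing on the denoising timestep rather than per token keeps the dispatch logic trivial and avoids load-balancing losses; the tradeoff is that both experts must be resident in, or swappable into, memory.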
Use Cases
- Cinematic short video prototyping and content creation.
- Automated animation and character replacement workflows.
- Research and education for model scaling, optimization, and training techniques.
Technical Highlights
- Architecture: MoE combined with a high-compression VAE to trade quality against speed (a latent-sizing sketch follows this list).
- Data: Large-scale multi-modal datasets with curated aesthetic labels to improve visual fidelity.
- Deployment: Examples for single-GPU and distributed inference (FSDP, DeepSpeed, offload) with performance benchmarks; a single-GPU offload sketch follows this list.
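For intuition on why a high-compression VAE matters at 720P, here is a back-of-envelope latent-size calculation. The 4x16x16 compression factor and the first-frame handling are assumptions based on common causal video-VAE conventions, not a quote of the released VAE's exact specification.

```python
# Back-of-envelope latent sizing under an assumed VAE compression ratio.
# The 4x16x16 T/H/W factor below is an assumption for illustration only.
T, H, W = 121, 720, 1280   # frames and resolution of the input clip
ct, ch, cw = 4, 16, 16     # assumed temporal/spatial compression factors

# Common causal-VAE convention: the first frame is kept, the remaining
# frames are compressed temporally by a factor of ct.
lt = (T - 1) // ct + 1
lh, lw = H // ch, W // cw
print(f"latent grid: {lt} x {lh} x {lw} = {lt * lh * lw:,} positions")
print(f"pixel grid:  {T} x {H} x {W} = {T * H * W:,} positions")
```

Under these assumed factors, a 121-frame 720P clip shrinks from about 111.5 million pixel positions to about 111.6 thousand latent positions per channel, roughly a 1000x reduction in the sequence the diffusion backbone must attend over.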
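As one concrete example of memory-constrained single-GPU deployment, diffusers-style CPU offload can be enabled as below. Treat this as a sketch of the general pattern, with an assumed checkpoint id, rather than the repository's official launch script.

```python
# Sketch of single-GPU inference with CPU offload (diffusers-style API).
# Offload trades generation speed for lower peak VRAM usage.
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # move submodules to GPU only while they run

video = pipe(prompt="A paper boat drifting down a rain-soaked street").frames[0]
```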