
Wan2.2

Wan2.2: An open-source large-scale video generative model suite supporting multi-modal tasks (T2V/I2V/TI2V/S2V) with efficient inference.

Overview

Wan2.2 is an open-source suite of large-scale video generative models covering text-to-video (T2V), image-to-video (I2V), text-image-to-video (TI2V), and speech-to-video (S2V) tasks. It introduces a Mixture-of-Experts (MoE) architecture and a high-compression VAE for efficient 720P video generation, and releases inference code and model weights to support research and deployment via ModelScope, Hugging Face, or self-hosted environments.

Key Features

  • Multi-modal support: T2V, I2V, TI2V, S2V.
  • MoE architecture for increased capacity with controllable inference cost.
  • Range of model sizes and a high-compression Wan2.2-VAE for practical 720P generation.
  • Broad ecosystem integrations and demos (Hugging Face, ModelScope, ComfyUI).
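To make the "high-compression VAE" claim concrete, here is a toy calculation of the latent-grid size a video VAE would produce under an assumed 4×16×16 (time × height × width) compression ratio. The ratio, frame count, and exact divisibility handling are assumptions for illustration; real video codecs often treat the first frame specially, so this sketches the arithmetic rather than the exact Wan2.2-VAE layout.

```python
def latent_shape(frames, height, width, ct=4, ch=16, cw=16):
    """Toy estimate of the latent grid a video VAE produces.

    Assumes a (time, height, width) compression ratio of ct x ch x cw.
    This is a rough sketch of the downsampling arithmetic, not the
    actual Wan2.2-VAE implementation.
    """
    return (frames // ct, height // ch, width // cw)

# A hypothetical 121-frame 720P (1280x720) clip under 4x16x16 compression:
print(latent_shape(121, 720, 1280))  # -> (30, 45, 80)
```

The point of the compression is visible in the numbers: the diffusion backbone attends over a 30×45×80 latent grid instead of 121×720×1280 pixels, which is what makes 720P generation tractable.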

Use Cases

  • Cinematic short video prototyping and content creation.
  • Automated animation and character replacement workflows.
  • Research and education for model scaling, optimization, and training techniques.

Technical Highlights

  • Architecture: MoE combined with high-compression VAE for quality-speed tradeoffs.
  • Data: Large-scale multi-modal datasets with curated aesthetic labels to improve visual fidelity.
  • Deployment: Examples for single-GPU and distributed inference (FSDP, DeepSpeed, offload) with performance benchmarks.
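The MoE design described above reportedly splits denoising between two expert models by noise level, with one expert handling early, high-noise timesteps and another handling later, low-noise ones, so only one expert's parameters are active at each step (capacity grows, per-step cost does not). A minimal sketch of that routing idea, with hypothetical stand-in expert functions and an assumed normalized-timestep boundary (the real switch point is derived from the noise schedule, not this value):

```python
def make_moe_denoiser(high_noise_expert, low_noise_expert, boundary=0.9):
    """Route each denoising step to one of two experts by noise level.

    `boundary` is an assumed threshold on a normalized timestep t in
    [0, 1] (1 = pure noise). Only the selected expert runs, so each
    step pays the cost of a single model despite doubled capacity.
    """
    def denoise(latent, t):
        expert = high_noise_expert if t >= boundary else low_noise_expert
        return expert(latent, t)
    return denoise

# Toy experts that tag which model ran (stand-ins for the real networks).
hi = lambda latent, t: ("high-noise-expert", latent)
lo = lambda latent, t: ("low-noise-expert", latent)

step = make_moe_denoiser(hi, lo)
print(step("z", 0.95))  # -> ('high-noise-expert', 'z')
print(step("z", 0.30))  # -> ('low-noise-expert', 'z')
```

This illustrates only the routing structure; the actual experts are full diffusion backbones sharing the same latent space.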


Resource Info

  • Author: Alibaba
  • Added: 2025-09-24
  • Tags: OSS, Image Generation, Data