InfiniteTalk is an open-source sparse-frame video dubbing framework that generates long-form videos synchronized with input audio, aligning lip motion, head movement, body pose, and facial expressions. It supports both image-to-video and video-to-video modes and provides demos, model weights, and a technical report (2025 release). It is released under the Apache-2.0 license; use it in accordance with applicable safety and ethical guidelines.
Key Features
- Sparse-frame dubbing: synchronizes not only lip motion but also head movement, body pose, and facial expressions with the audio, improving naturalness over lip-only dubbing.
- Infinite-length generation: supports extended or effectively unlimited video duration for long-form content (see the chunked-generation sketch after this list).
- Multi-mode input: image-to-video and video-to-video pipelines, with Gradio demos and a ComfyUI branch.
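To make the infinite-length claim concrete, below is a minimal sketch of one common way to achieve unbounded duration: generate overlapping, audio-aligned chunks and carry a few context frames forward so identity and motion stay coherent across seams. The function names, chunk sizes, and the `generate_chunk` callback are illustrative assumptions, not InfiniteTalk's actual API.

```python
import numpy as np
from typing import Callable, List

def generate_long_video(
    audio: np.ndarray,             # full audio track, shape (num_samples,)
    generate_chunk: Callable,      # hypothetical per-chunk generator
    frames_per_chunk: int = 81,    # assumed chunk length in frames
    context_frames: int = 9,       # assumed overlap carried between chunks
    fps: int = 25,
    sample_rate: int = 16000,
) -> List:
    """Sketch: stitch an arbitrarily long video from overlapping chunks."""
    samples_per_frame = sample_rate // fps
    total_frames = len(audio) // samples_per_frame
    step = frames_per_chunk - context_frames
    video_frames: List = []
    prev_context: List = []        # last few frames of the previous chunk
    start = 0
    while start < total_frames:
        end = min(start + frames_per_chunk, total_frames)
        window = audio[start * samples_per_frame : end * samples_per_frame]
        # Condition on the overlap so the seam between chunks is coherent.
        chunk = generate_chunk(window, context=prev_context)
        # Drop the re-generated overlap frames (except on the first chunk).
        video_frames.extend(chunk if not prev_context else chunk[context_frames:])
        prev_context = chunk[-context_frames:]
        start += step
    return video_frames
```

Because each chunk sees only a bounded audio window plus a small context, memory use stays constant regardless of total video length.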
Use Cases
- Media research: prototyping long-form dubbing, identity preservation, and camera-motion handling for extended videos.
- Content prototyping: building research prototypes of long-form dubbed content (respect copyright and ethical guidelines).
- Academic benchmarks: evaluating lip-sync accuracy and long-context generation stability (a minimal evaluation-loop sketch follows this list).
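For the benchmarking use case, a minimal evaluation loop might look like the sketch below. Here `sync_score` is a hypothetical stand-in for a real audio-visual sync scorer (for example, a SyncNet-style confidence model) and is not part of the InfiniteTalk codebase.

```python
from statistics import mean, stdev

def evaluate_long_video(clips, sync_score):
    """clips: list of (video_frames, audio_window) pairs cut from one video.

    sync_score is a hypothetical callable returning a per-clip sync score;
    rising variance across clips suggests long-context instability.
    """
    scores = [sync_score(frames, audio) for frames, audio in clips]
    return {
        "mean_sync": mean(scores),   # average lip-sync accuracy
        "sync_std": stdev(scores),   # drift over time indicates instability
        "worst_clip": min(range(len(scores)), key=scores.__getitem__),
    }
```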
Technical Highlights
- Audio-conditioned visual generation: pretrained audio encoders and released model weights drive consistent identity and motion (see the feature-extraction sketch after this list).
- Acceleration and quantization strategies (TeaCache, int8/fp8 quantization, LoRA) reduce inference cost (a quantization sketch also follows).
- Flexible inference modes: streaming, clip-based, low-VRAM, and multi-GPU inference, plus Gradio-based demo serving.
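As an illustration of the first highlight, the snippet below extracts frame-level features from a pretrained wav2vec 2.0 encoder via Hugging Face Transformers, the kind of representation typically used to condition a video generator on audio. The checkpoint name and the alignment step are assumptions for illustration, not InfiniteTalk's exact pipeline.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Checkpoint chosen for illustration; InfiniteTalk may ship its own encoder.
CKPT = "facebook/wav2vec2-base-960h"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CKPT)
encoder = Wav2Vec2Model.from_pretrained(CKPT)

def audio_features(waveform, sample_rate=16000):
    """Return (1, time_steps, hidden_dim) features for conditioning."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs)
    # These features are later resampled/aligned to the video frame rate.
    return out.last_hidden_state
```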
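Of the listed cost-reduction strategies, dynamic int8 quantization is the easiest to demonstrate generically; the sketch below applies PyTorch's built-in `quantize_dynamic` to a toy model. The TeaCache, fp8, and LoRA paths are project-specific and not reproduced here.

```python
import torch

# Toy model standing in for a large transformer block.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

# Store Linear weights in int8 and dequantize on the fly; this trims memory
# and often speeds up CPU inference at a small accuracy cost.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)
```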