InfiniteTalk is an open-source sparse-frame video dubbing framework that generates long-form videos synchronized with input audio, aligning lip motion, head movement, body pose, and facial expressions. It supports both image-to-video and video-to-video modes and provides demos, model weights, and a technical report (2025 release). It is released under the Apache-2.0 license; use it in accordance with applicable safety and ethical guidelines.
Key Features
- Sparse-frame dubbing: synchronizes not only lip motion but also head movement, body pose, and facial expressions with the audio, improving naturalness over lip-only dubbing.
- Infinite-length generation: supports extended or effectively unlimited video duration for long-form content (see the chunked-generation sketch after this list).
- Multi-mode input: image-to-video and video-to-video pipelines, with Gradio demos and a ComfyUI branch.
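To make the infinite-length claim concrete, below is a minimal sketch of one common way to achieve unbounded duration: generate overlapping, audio-aligned chunks and carry a few context frames forward so identity and motion stay coherent across seams. The function names, chunk sizes, and the `generate_chunk` callback are illustrative assumptions, not InfiniteTalk's actual API.

```python
import numpy as np
from typing import Callable, List

def generate_long_video(
    audio: np.ndarray,             # full audio track, shape (num_samples,)
    generate_chunk: Callable,      # hypothetical per-chunk generator
    frames_per_chunk: int = 81,    # assumed chunk length in frames
    context_frames: int = 9,       # assumed overlap carried between chunks
    fps: int = 25,
    sample_rate: int = 16000,
) -> List:
    """Sketch: stitch an arbitrarily long video from overlapping chunks."""
    samples_per_frame = sample_rate // fps
    total_frames = len(audio) // samples_per_frame
    step = frames_per_chunk - context_frames
    video_frames: List = []
    prev_context: List = []        # last few frames of the previous chunk
    start = 0
    while start < total_frames:
        end = min(start + frames_per_chunk, total_frames)
        window = audio[start * samples_per_frame : end * samples_per_frame]
        # Condition on the overlap so the seam between chunks is coherent.
        chunk = generate_chunk(window, context=prev_context)
        # Drop the re-generated overlap frames (except on the first chunk).
        video_frames.extend(chunk if not prev_context else chunk[context_frames:])
        prev_context = chunk[-context_frames:]
        start += step
    return video_frames
```

Because each chunk sees only a bounded audio window plus a small context, memory use stays constant regardless of total video length.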
Use Cases
- Media research: prototyping long-form dubbing, identity preservation, and camera-motion handling for extended videos.
- Content prototyping: building research prototypes of long-form dubbed content (respect copyright and ethical guidelines).
- Academic benchmarks: evaluating lip-sync accuracy and long-context generation stability (a minimal evaluation-loop sketch follows this list).
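For the benchmarking use case, a minimal evaluation loop might look like the sketch below. Here `sync_score` is a hypothetical stand-in for a real audio-visual sync scorer (for example, a SyncNet-style confidence model) and is not part of the InfiniteTalk codebase.

```python
from statistics import mean, stdev

def evaluate_long_video(clips, sync_score):
    """clips: list of (video_frames, audio_window) pairs cut from one video.

    sync_score is a hypothetical callable returning a per-clip sync score;
    rising variance across clips suggests long-context instability.
    """
    scores = [sync_score(frames, audio) for frames, audio in clips]
    return {
        "mean_sync": mean(scores),   # average lip-sync accuracy
        "sync_std": stdev(scores),   # drift over time indicates instability
        "worst_clip": min(range(len(scores)), key=scores.__getitem__),
    }
```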
Technical Highlights
- Audio-conditioned visual generation: pretrained audio encoders and released model weights drive consistent identity and motion (see the feature-extraction sketch after this list).
- Acceleration and quantization strategies (TeaCache, int8/fp8 quantization, LoRA) reduce inference cost (a quantization sketch also follows).
- Flexible inference modes: streaming, clip-based, low-VRAM, and multi-GPU inference, plus Gradio-based demo serving.
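As an illustration of the first highlight, the snippet below extracts frame-level features from a pretrained wav2vec 2.0 encoder via Hugging Face Transformers, the kind of representation typically used to condition a video generator on audio. The checkpoint name and the alignment step are assumptions for illustration, not InfiniteTalk's exact pipeline.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Checkpoint chosen for illustration; InfiniteTalk may ship its own encoder.
CKPT = "facebook/wav2vec2-base-960h"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(CKPT)
encoder = Wav2Vec2Model.from_pretrained(CKPT)

def audio_features(waveform, sample_rate=16000):
    """Return (1, time_steps, hidden_dim) features for conditioning."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**inputs)
    # These features are later resampled/aligned to the video frame rate.
    return out.last_hidden_state
```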
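Of the listed cost-reduction strategies, dynamic int8 quantization is the easiest to demonstrate generically; the sketch below applies PyTorch's built-in `quantize_dynamic` to a toy model. The TeaCache, fp8, and LoRA paths are project-specific and not reproduced here.

```python
import torch

# Toy model standing in for a large transformer block.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

# Store Linear weights in int8 and dequantize on the fly; this trims memory
# and often speeds up CPU inference at a small accuracy cost.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)
```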