A curated list of AI tools and resources for developers; see the AI Resources collection.

xLLM

xLLM is an open-source framework for vision-language models, providing tools and documentation for training and inference.

Detailed Introduction

xLLM offers training, fine-tuning, and inference tooling for vision-language models, with documentation and examples that help research and engineering teams build multimodal systems.

Main Features

  • Supports joint training and inference pipelines for vision-language tasks (see the usage sketch after this list).
  • Provides multimodal data processing and evaluation tools.
  • Comprehensive documentation on ReadTheDocs and example code to support engineering adoption.
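The snippet below is a minimal sketch of what invoking such a pipeline might look like. It is illustrative only: the package name `xllm`, the `AutoVLM` class, the `from_pretrained` and `generate` calls, the checkpoint identifier, and the image path are all hypothetical assumptions, not xLLM's documented API; consult the project's ReadTheDocs documentation for the real interface.

```python
# Minimal VQA inference sketch. NOTE: every name below (the xllm package,
# AutoVLM, from_pretrained, generate, the checkpoint id, the image path)
# is a hypothetical assumption for illustration, not xLLM's actual API.
from PIL import Image

import xllm  # assumed package name

# Load a pretrained vision-language model (hypothetical identifier).
model = xllm.AutoVLM.from_pretrained("xllm/vlm-base")

# Ask a question about a single image.
image = Image.open("street_scene.jpg")
answer = model.generate(image=image, prompt="How many cars are visible?")
print(answer)
```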

Use Cases

Suitable for research and product teams building visual question answering, image captioning, and multimodal retrieval systems.

Technical Features

Focuses on multimodal feature fusion and cross-modal alignment, offering extensible model components and strategies for large-scale training and fine-tuning.
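As background on the cross-modal alignment idea, the sketch below shows the symmetric contrastive (InfoNCE) objective commonly used to align image and text embeddings, in the style popularized by CLIP. This is a generic PyTorch illustration of the technique, not code from xLLM's codebase; the function name and temperature default are illustrative choices.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors where row i of each tensor
    comes from the same image-text pair.
    """
    # L2-normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average the image-to-text and text-to-image cross-entropy terms.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

Minimizing this loss pulls each image embedding toward its paired text embedding and pushes it away from the other texts in the batch, which is what produces the shared embedding space used for tasks like multimodal retrieval.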

Resource Info
🎨 Multimodal 🏗️ Model 🏋️ Training 🌱 Open Source