Overview
Xinference (Xorbits Inference) is a model serving and inference framework for language, speech, and multimodal models. It supports heterogeneous backends, distributed deployment, and provides OpenAI-compatible RESTful APIs for easy integration.
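Because the HTTP surface follows OpenAI's request schema, a client only needs to build a standard chat-completions payload. The sketch below constructs such a payload offline; the port 9997 and the model UID are illustrative assumptions, so check your own deployment for the actual values:

```python
import json

# Target is the OpenAI-compatible chat endpoint; host and port are
# assumptions (Xinference commonly defaults to 9997, but verify locally).
BASE_URL = "http://localhost:9997/v1/chat/completions"

# Standard OpenAI-style chat payload; "my-model-uid" is a placeholder for
# whatever UID your deployment assigns to a launched model.
payload = {
    "model": "my-model-uid",
    "messages": [{"role": "user", "content": "Hello, Xinference!"}],
    "stream": False,
}
body = json.dumps(payload)
# POST `body` to BASE_URL with any HTTP client, e.g.:
#   curl -s -X POST "$BASE_URL" -H "Content-Type: application/json" -d "$body"
```

Because the schema matches OpenAI's, existing OpenAI SDKs can usually be pointed at the server by overriding only the base URL.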
Key Features
- Support for various inference engines (vLLM, GGML, TensorRT) and efficient use of heterogeneous hardware.
- OpenAI-compatible REST API, plus RPC, CLI, and WebUI interfaces, with streaming and function-calling support.
- Built-in support for cluster and distributed deployments, with Docker and Helm charts for production setups.
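Streaming responses in OpenAI-compatible APIs arrive as server-sent events, one `data:` line per token delta. The following network-free sketch decodes such a stream; the chunk layout mirrors OpenAI's streaming schema, and the sample lines are fabricated for illustration:

```python
import json

def parse_sse_chunks(raw: str) -> str:
    """Collect the text deltas from OpenAI-style streaming lines
    ('data: {...}' terminated by 'data: [DONE]'). Pure parsing, no I/O."""
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # sentinel that ends the stream
            break
        chunk = json.loads(data)
        # Each chunk carries an incremental piece of the reply.
        parts.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(parts)

# Fabricated two-chunk stream for demonstration.
stream = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n'
    'data: [DONE]\n'
)
print(parse_sse_chunks(stream))  # prints "Hello"
```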
Use Cases
- Self-hosted LLM services to control cost and privacy.
- Enterprise-grade model serving with multi-node, high-throughput requirements.
- Rapid prototyping and experiments via Colab, Docker, or Kubernetes.
Technical Highlights
- Modular architecture with backend plugins and custom model adapters.
- Deep integrations with third-party ecosystems (LangChain, LlamaIndex, Dify) for building RAG and agent workflows.
- Comprehensive docs and examples on ReadTheDocs to accelerate adoption and production migration.
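The modular, plugin-style backend design can be sketched as a registry of engine adapters that share a single generate() contract, so the serving layer stays engine-agnostic. This is a simplified illustration of the pattern, not Xinference's actual internal API:

```python
from abc import ABC, abstractmethod

class InferenceBackend(ABC):
    """Illustrative adapter interface: each engine (vLLM, llama.cpp, ...)
    would implement the same contract. Not Xinference's real internals."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 128) -> str: ...

class EchoBackend(InferenceBackend):
    # Toy backend used only to show how the registry dispatches.
    def generate(self, prompt: str, max_tokens: int = 128) -> str:
        return prompt[:max_tokens]

# Plugin registry: new engines are added by registering another class.
BACKENDS = {"echo": EchoBackend}

def serve(backend_name: str, prompt: str) -> str:
    backend = BACKENDS[backend_name]()  # select the engine by name
    return backend.generate(prompt)

print(serve("echo", "hello"))  # prints "hello"
```

Under this shape, the rest of the stack (REST layer, scheduler, WebUI) never touches engine-specific code directly, which is what makes swapping or adding backends cheap.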