
FastChat

An open-source platform for training, serving, and evaluating conversational large language models, offering distributed serving, a Web GUI, and OpenAI-compatible APIs.

Introduction

FastChat is an open platform for training, serving, and evaluating conversational LLMs. It provides training code, a distributed multi-model serving system, a Gradio-based web UI, and OpenAI-compatible RESTful APIs, supporting many model weights and acceleration backends.

Key features

  • Support for various models (Vicuna, LongChat, FastChat-T5) and automatic Hugging Face weight downloads.
  • Distributed architecture (controller, model workers, web server) for high-throughput serving.
  • Support for acceleration and quantization strategies (ExLlama, GPTQ, AWQ, 8-bit) and platform-specific guides (Metal, XPU, Ascend).
  • Built-in evaluation and benchmarking tools (MT-bench, Chatbot Arena) for human preference collection and model comparison.
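The distributed architecture above is brought up one process at a time. A minimal sketch using FastChat's documented entry points (the model path is illustrative; each command runs in its own shell):

```shell
# Start the controller, which tracks registered model workers.
python3 -m fastchat.serve.controller

# Start a model worker: it loads the weights and registers itself
# with the controller (model path is an example from the docs).
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5

# Serve the Gradio web UI on top of the controller.
python3 -m fastchat.serve.gradio_web_server
```

Multiple workers serving different models can register with the same controller, which routes each request to a worker hosting the requested model.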

Use cases

  • Deploy a private inference service compatible with OpenAI APIs for internal use.
  • Run large-scale model evaluation, benchmarking, and Chatbot Arena experiments.
  • Use as a reference implementation for training and inference pipelines with LoRA and SkyPilot integrations.
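As a sketch of the OpenAI-compatible deployment use case, the snippet below sends a chat-completion request to a locally running `fastchat.serve.openai_api_server`. The port, model name, and the `chat`/`build_payload` helpers are illustrative assumptions, not part of FastChat's API:

```python
import json
import urllib.request

# Default address of FastChat's OpenAI-compatible API server (assumption:
# started with `python3 -m fastchat.serve.openai_api_server --port 8000`).
API_BASE = "http://localhost:8000/v1"

def build_payload(prompt: str, model: str = "vicuna-7b-v1.5") -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, model: str = "vicuna-7b-v1.5") -> str:
    """POST the request to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={
            "Content-Type": "application/json",
            # The local server does not check the key; any value works.
            "Authorization": "Bearer EMPTY",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes match the OpenAI API, existing OpenAI client code can usually be pointed at the local server just by changing the base URL.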

Technical details

  • Python-first codebase leveraging PyTorch/Transformers, installable via pip or from source.
  • Offers both a CLI and RESTful APIs for inference, enabling drop-in replacement of OpenAI endpoints.
  • Comprehensive documentation covering installation, weight management, serving, evaluation, and fine-tuning.
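For a quick start without the full distributed stack, FastChat's documented pip package and CLI entry point cover single-machine inference (the model path is an example; weights are fetched from Hugging Face on first use):

```shell
# Install FastChat with the serving and web-UI extras.
pip3 install "fschat[model_worker,webui]"

# Chat with a model interactively in the terminal.
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5
```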


Resource Info

  • Author: lm-sys
  • Added: 2025-09-30
  • Open source since: 2023-03-19
  • Tags: Benchmark, Open Source