Introduction
FastChat is an open platform for training, serving, and evaluating chatbots built on large language models (LLMs). It provides training code, a distributed multi-model serving system, a Gradio-based web UI, and OpenAI-compatible RESTful APIs, and it supports a wide range of model weights and acceleration backends.
Key features
- Support for various models (Vicuna, LongChat, FastChat-T5) and automatic Hugging Face weight downloads.
- Distributed architecture (controller, model workers, web server) for high-throughput serving; a minimal launch sketch follows this list.
- Support for acceleration and quantization strategies (ExLlama, GPTQ, AWQ, 8-bit) and platform-specific guides (Metal, XPU, Ascend).
- Built-in evaluation and benchmarking tools (MT-bench, Chatbot Arena) for human preference collection and model comparison.
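A minimal sketch of how the distributed serving pieces fit together, assuming a machine that can host the lmsys/vicuna-7b-v1.5 weights. The module entry points, ports, and flags follow FastChat's documented defaults; the fixed sleeps are crude placeholders for real readiness checks:

```python
import subprocess
import sys
import time

# Controller: tracks registered model workers and routes requests.
controller = subprocess.Popen(
    [sys.executable, "-m", "fastchat.serve.controller",
     "--host", "0.0.0.0", "--port", "21001"]
)
time.sleep(5)  # crude wait; poll the port in a real deployment

# Model worker: loads the weights and registers with the controller.
worker = subprocess.Popen(
    [sys.executable, "-m", "fastchat.serve.model_worker",
     "--model-path", "lmsys/vicuna-7b-v1.5",
     "--controller-address", "http://localhost:21001"]
)
time.sleep(30)  # model loading can take a while

# OpenAI-compatible API server: exposes /v1/chat/completions and friends.
api_server = subprocess.Popen(
    [sys.executable, "-m", "fastchat.serve.openai_api_server",
     "--controller-address", "http://localhost:21001",
     "--host", "0.0.0.0", "--port", "8000"]
)

api_server.wait()
```

Because the controller mediates between workers and front ends, additional model workers can be started on other GPUs or hosts and registered against the same controller address to scale throughput.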
Use cases
- Deploy a private inference service compatible with OpenAI APIs for internal use; a client sketch follows this list.
- Run large-scale model evaluation, benchmarking, and Chatbot Arena experiments.
- Use as a reference implementation for training and inference pipelines with LoRA and SkyPilot integrations.
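To illustrate the drop-in OpenAI compatibility, the sketch below points the standard openai Python client (v1+) at a locally running FastChat API server. The base URL, port, and model name are assumptions matching the launch sketch above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local FastChat API server.
# FastChat does not require a real key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # must match the model served by the worker
    messages=[{"role": "user", "content": "Summarize FastChat in one sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Because only the base URL changes, existing OpenAI-based tooling can typically be repointed at the private service without further code changes.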
Technical details
- Python-first codebase leveraging PyTorch/Transformers, installable via pip or from source.
- Offers both a CLI and programmatic APIs for inference, enabling drop-in replacement of OpenAI endpoints; a small prompt-construction sketch follows this list.
- Comprehensive documentation covering installation, weight management, serving, evaluation, and fine-tuning.
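As a small example of the Python API side, the following uses FastChat's conversation templates to build a Vicuna-style prompt. The template name vicuna_v1.1 comes from FastChat's template registry; treat it as an assumption when targeting other models:

```python
from fastchat.conversation import get_conv_template

# Fetch a copy of the Vicuna conversation template from the registry.
conv = get_conv_template("vicuna_v1.1")

# roles is typically ("USER", "ASSISTANT") for this template.
conv.append_message(conv.roles[0], "What is the capital of France?")
conv.append_message(conv.roles[1], None)  # leave the assistant slot open

# Render the full prompt string in the model's expected chat format.
prompt = conv.get_prompt()
print(prompt)
```

The resulting string can then be passed to a Hugging Face generate call or to the serving stack described above, keeping prompt formatting consistent across the CLI, the web UI, and the APIs.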