Introduction
FastChat is an open platform for training, serving, and evaluating chatbots built on large language models (LLMs). It provides training code, a distributed multi-model serving system, a Gradio-based web UI, and OpenAI-compatible RESTful APIs, and it supports a wide range of model weights and acceleration backends.
Key features
- Support for various models (Vicuna, LongChat, FastChat-T5) and automatic Hugging Face weight downloads.
- Distributed architecture (controller, model workers, web server) for high-throughput serving; a minimal launch sketch follows this list.
- Support for acceleration and quantization strategies (ExLlama, GPTQ, AWQ, 8-bit) and platform-specific guides (Metal, XPU, Ascend).
- Built-in evaluation and benchmarking tools (MT-bench, Chatbot Arena) for human preference collection and model comparison.
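A minimal sketch of how the distributed serving pieces fit together, assuming a machine that can host the lmsys/vicuna-7b-v1.5 weights. The module entry points, ports, and flags follow FastChat's documented defaults; the fixed sleeps are crude placeholders for real readiness checks:

```python
import subprocess
import sys
import time

# Controller: tracks registered model workers and routes requests.
controller = subprocess.Popen(
    [sys.executable, "-m", "fastchat.serve.controller",
     "--host", "0.0.0.0", "--port", "21001"]
)
time.sleep(5)  # crude wait; poll the port in a real deployment

# Model worker: loads the weights and registers with the controller.
worker = subprocess.Popen(
    [sys.executable, "-m", "fastchat.serve.model_worker",
     "--model-path", "lmsys/vicuna-7b-v1.5",
     "--controller-address", "http://localhost:21001"]
)
time.sleep(30)  # model loading can take a while

# OpenAI-compatible API server: exposes /v1/chat/completions and friends.
api_server = subprocess.Popen(
    [sys.executable, "-m", "fastchat.serve.openai_api_server",
     "--controller-address", "http://localhost:21001",
     "--host", "0.0.0.0", "--port", "8000"]
)

api_server.wait()
```

Because the controller mediates between workers and front ends, additional model workers can be started on other GPUs or hosts and registered against the same controller address to scale throughput.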
Use cases
- Deploy a private inference service compatible with OpenAI APIs for internal use; a client sketch follows this list.
- Run large-scale model evaluation, benchmarking, and Chatbot Arena experiments.
- Use as a reference implementation for training and inference pipelines with LoRA and SkyPilot integrations.
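To illustrate the drop-in OpenAI compatibility, the sketch below points the standard openai Python client (v1+) at a locally running FastChat API server. The base URL, port, and model name are assumptions matching the launch sketch above:

```python
from openai import OpenAI

# Point the standard OpenAI client at the local FastChat API server.
# FastChat does not require a real key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # must match the model served by the worker
    messages=[{"role": "user", "content": "Summarize FastChat in one sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```

Because only the base URL changes, existing OpenAI-based tooling can typically be repointed at the private service without further code changes.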
Technical details
- Python-first codebase leveraging PyTorch/Transformers, installable via pip or from source.
- Offers both a CLI and programmatic APIs for inference, enabling drop-in replacement of OpenAI endpoints; a small prompt-construction sketch follows this list.
- Comprehensive documentation covering installation, weight management, serving, evaluation, and fine-tuning.
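As a small example of the Python API side, the following uses FastChat's conversation templates to build a Vicuna-style prompt. The template name vicuna_v1.1 comes from FastChat's template registry; treat it as an assumption when targeting other models:

```python
from fastchat.conversation import get_conv_template

# Fetch a copy of the Vicuna conversation template from the registry.
conv = get_conv_template("vicuna_v1.1")

# roles is typically ("USER", "ASSISTANT") for this template.
conv.append_message(conv.roles[0], "What is the capital of France?")
conv.append_message(conv.roles[1], None)  # leave the assistant slot open

# Render the full prompt string in the model's expected chat format.
prompt = conv.get_prompt()
print(prompt)
```

The resulting string can then be passed to a Hugging Face generate call or to the serving stack described above, keeping prompt formatting consistent across the CLI, the web UI, and the APIs.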