RouteLLM is an open-source framework for serving and evaluating LLM routers: it sends simpler queries to cheaper models and harder queries to stronger ones, keeping response quality close to the strong model's while cutting inference cost substantially. The project includes server components, evaluation tooling, and pretrained routers.
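For a feel of the SDK, a minimal routed request might look like the sketch below. The Controller interface follows the project's documented usage, while the API keys, model names, and the 0.11593 threshold embedded in the model string are illustrative:

```python
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"      # placeholder; set real keys
os.environ["ANYSCALE_API_KEY"] = "esecret_XXXX"  # provider for the weak model

# Route between a strong and a weak model using the matrix-factorization router.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# The model string encodes the router name and its calibrated threshold:
# queries whose predicted strong-model win rate exceeds the threshold
# go to the strong model; everything else goes to the weak model.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```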
Key features
- Router suite: multiple built-in routers, including matrix factorization (mf), similarity-weighted ranking (sw_ranking), and a BERT classifier (bert), plus an extension point for custom routing strategies (see the sketch after this list).
- Evaluation framework: tools to evaluate router performance on benchmarks (MT Bench, MMLU, GSM8K) and visualize results.
- OpenAI-compatible server: expose the routers behind an OpenAI-compatible API, so existing clients integrate without code changes.
- Local & remote model support: route to local models or cloud providers, with threshold calibration to control the fraction of queries sent to the strong model (and therefore cost).
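Custom routing strategies follow the same contract as the built-in routers: score a prompt with the predicted win rate of the strong model, then compare that score against a calibrated threshold. The toy router below illustrates the pattern; the class and method names are this sketch's own, not the library's actual interface:

```python
from dataclasses import dataclass


@dataclass
class KeywordRouter:
    """Toy router: scores prompts by crude surface-complexity cues."""

    hard_markers: tuple = ("prove", "derive", "optimize", "debug")

    def strong_win_rate(self, prompt: str) -> float:
        # Hypothetical scoring: "hard" keywords and prompt length push the
        # score toward the strong model. Real routers learn this from data.
        hits = sum(marker in prompt.lower() for marker in self.hard_markers)
        length_score = min(len(prompt) / 2000, 0.5)
        return min(hits * 0.5 + length_score, 1.0)


def route(router: KeywordRouter, prompt: str, threshold: float = 0.5) -> str:
    """Pick the strong model only when its predicted win rate clears the
    threshold; lowering the threshold trades cost for quality."""
    return "strong" if router.strong_win_rate(prompt) >= threshold else "weak"


router = KeywordRouter()
print(route(router, "What is the capital of France?"))          # -> weak
print(route(router, "Prove that gradient descent converges."))  # -> strong
```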
Use cases
- Reduce inference costs in multi-model deployments while preserving response quality.
- Research and compare routing strategies across benchmarks.
- Replace single-model clients with a router service to optimize cost/performance tradeoffs, as sketched below.
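As a sketch of that drop-in pattern: once the OpenAI-compatible server is running, an existing OpenAI client only needs its base URL and model name changed. The port and router model string below are assumptions, not fixed defaults:

```python
from openai import OpenAI

# Point a standard OpenAI client at the local RouteLLM server instead of
# the OpenAI API. Port and router model name are illustrative.
client = OpenAI(
    base_url="http://localhost:6060/v1",
    api_key="not-needed-locally",  # placeholder; a local server typically ignores it
)

response = client.chat.completions.create(
    # Router name plus calibrated threshold, in place of a single model id.
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)
print(response.choices[0].message.content)
```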
Technical details
- Implementation: Python-based, organized around controllers plus server and evaluation scripts; examples and benchmarks are included.
- Deployment: provides a Python SDK and server mode; installable via pip or runnable from source.
- License: Apache-2.0; the project is actively maintained and accompanied by a research paper and benchmark data.