RouteLLM is an open-source framework for serving and evaluating LLM routers: it sends simpler queries to cheaper models and harder queries to stronger ones, keeping response quality close to the strong model's while cutting inference cost substantially. The project includes server components, evaluation tooling, and pretrained routers.
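For a feel of the SDK, a minimal routed request might look like the sketch below. The Controller interface follows the project's documented usage, while the API keys, model names, and the 0.11593 threshold embedded in the model string are illustrative:

```python
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-XXXXXX"      # placeholder; set real keys
os.environ["ANYSCALE_API_KEY"] = "esecret_XXXX"  # provider for the weak model

# Route between a strong and a weak model using the matrix-factorization router.
client = Controller(
    routers=["mf"],
    strong_model="gpt-4-1106-preview",
    weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)

# The model string encodes the router name and its calibrated threshold:
# queries whose predicted strong-model win rate exceeds the threshold
# go to the strong model; everything else goes to the weak model.
response = client.chat.completions.create(
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```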
Key features
- Router suite: multiple built-in routers, including matrix factorization (mf), similarity-weighted ranking (sw_ranking), and a BERT classifier (bert), plus an extension point for custom routing strategies (see the sketch after this list).
- Evaluation framework: tools to evaluate router performance on benchmarks (MT Bench, MMLU, GSM8K) and visualize results.
- OpenAI-compatible server: expose the routers behind an OpenAI-compatible API, so existing clients integrate without code changes.
- Local & remote model support: route to local models or cloud providers, with threshold calibration to control the fraction of queries sent to the strong model (and therefore cost).
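Custom routing strategies follow the same contract as the built-in routers: score a prompt with the predicted win rate of the strong model, then compare that score against a calibrated threshold. The toy router below illustrates the pattern; the class and method names are this sketch's own, not the library's actual interface:

```python
from dataclasses import dataclass


@dataclass
class KeywordRouter:
    """Toy router: scores prompts by crude surface-complexity cues."""

    hard_markers: tuple = ("prove", "derive", "optimize", "debug")

    def strong_win_rate(self, prompt: str) -> float:
        # Hypothetical scoring: "hard" keywords and prompt length push the
        # score toward the strong model. Real routers learn this from data.
        hits = sum(marker in prompt.lower() for marker in self.hard_markers)
        length_score = min(len(prompt) / 2000, 0.5)
        return min(hits * 0.5 + length_score, 1.0)


def route(router: KeywordRouter, prompt: str, threshold: float = 0.5) -> str:
    """Pick the strong model only when its predicted win rate clears the
    threshold; lowering the threshold trades cost for quality."""
    return "strong" if router.strong_win_rate(prompt) >= threshold else "weak"


router = KeywordRouter()
print(route(router, "What is the capital of France?"))          # -> weak
print(route(router, "Prove that gradient descent converges."))  # -> strong
```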
Use cases
- Reduce inference costs in multi-model deployments while preserving response quality.
- Research and compare routing strategies across benchmarks.
- Replace single-model clients with a router service to optimize cost/performance tradeoffs, as sketched below.
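As a sketch of that drop-in pattern: once the OpenAI-compatible server is running, an existing OpenAI client only needs its base URL and model name changed. The port and router model string below are assumptions, not fixed defaults:

```python
from openai import OpenAI

# Point a standard OpenAI client at the local RouteLLM server instead of
# the OpenAI API. Port and router model name are illustrative.
client = OpenAI(
    base_url="http://localhost:6060/v1",
    api_key="not-needed-locally",  # placeholder; a local server typically ignores it
)

response = client.chat.completions.create(
    # Router name plus calibrated threshold, in place of a single model id.
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}],
)
print(response.choices[0].message.content)
```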
Technical details
- Implementation: Python-based, organized around controllers plus server and evaluation scripts; examples and benchmarks are included.
- Deployment: provides a Python SDK and server mode; installable via pip or runnable from source.
- License: Apache-2.0; the project is actively maintained and accompanied by a research paper and benchmark data.