OpenLLM

OpenLLM (by BentoML) simplifies self-hosting LLMs by providing CLI tools, an OpenAI-compatible server, built-in chat UI, and integrations with various inference backends.

Author: BentoML

Since: 2023-04-19

Visit Website GitHub

Overview

OpenLLM is an open-source toolkit maintained by BentoML to simplify self-hosting large language models. It offers CLI and Python APIs, an OpenAI-compatible model server (openllm serve), built-in web UI, and integrations with inference backends and cloud deployments.

Key features

One-command model serving: openllm serve <model> launches a service exposing OpenAI-compatible APIs and a web chat UI.
Broad model support: adapters and model repositories for many open-source LLMs (Llama, Mistral, Qwen, Gemma, etc.).
Deployment options: Docker, Kubernetes, and BentoML/BentoCloud integrations for production deployments.

Use cases

Quickly self-host models locally for experimentation or production.
Provide an audit-friendly, monitorable inference service for teams.
Integrate custom model repositories for organization-specific models.

Technical notes

Python-based with CLI and SDK; integrates with vLLM, BentoML and other inference tooling.
Does not store model weights; gated models require HF_TOKEN and appropriate access.
Apache-2.0 licensed with active community and detailed documentation.

OpenLLM

Overview

Key features

Use cases

Technical notes

Resource Info

Related Resources

BentoML

gtr — Git Worktree Runner

Katana