Introduction
KServe is a Kubernetes-native model inference platform that provides standardized Custom Resource Definitions (CRDs) and data-plane inference protocols to run scalable predictive and generative AI workloads in production.
Key Features
- Standardized Inference CRDs and APIs that simplify model deployment and lifecycle management (see the minimal manifest after this list).
- Autoscaling, including GPU autoscaling and scale-to-zero, plus high-density model loading via ModelMesh (autoscaling sketch below).
- Advanced deployment patterns: canary rollouts, and multi-model pipelines and ensembles via InferenceGraph (sketched after this list).
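As a sketch of the CRD-driven workflow, the manifest below declares a single-model InferenceService in the style of the KServe quickstart docs; the service name and storageUri are illustrative placeholders.

```yaml
# Minimal InferenceService: KServe resolves the model format to a
# serving runtime and exposes it behind the standard inference protocol.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                  # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/iris   # illustrative path
```

Applying this with `kubectl apply -f` covers the whole lifecycle declaratively, which is what makes the CRD approach fit naturally into GitOps and CI/CD workflows.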
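Autoscaling is configured on the same resource. A hedged sketch, assuming the Serverless (Knative) deployment mode, where minReplicas set to 0 enables scale-to-zero:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-autoscale        # illustrative name
spec:
  predictor:
    minReplicas: 0            # scale to zero when idle (Serverless mode)
    maxReplicas: 5
    scaleMetric: concurrency  # scale on in-flight requests per replica
    scaleTarget: 10           # target value for the scaling metric
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/iris   # illustrative path
      resources:
        limits:
          nvidia.com/gpu: "1" # each replica holds one GPU, so replica
                              # autoscaling doubles as GPU autoscaling
```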
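For the advanced patterns, canaryTrafficPercent splits traffic between the previous ready revision and a newly applied spec, and an InferenceGraph fans a request out across multiple services. Both sketches below use hypothetical model names and storage paths:

```yaml
# Canary rollout: 10% of traffic goes to the newly applied spec,
# the remaining 90% stays on the previous ready revision.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                  # illustrative name
spec:
  predictor:
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/iris-v2   # new model version
---
# InferenceGraph ensemble: fan one request out to two InferenceServices
# and combine their responses.
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: iris-ensemble                 # illustrative name
spec:
  nodes:
    root:
      routerType: Ensemble
      steps:
        - serviceName: model-a        # hypothetical InferenceService names
        - serviceName: model-b
```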
Use Cases
- Declaratively deploy and manage online inference services on Kubernetes, serving real-time traffic as well as batched requests.
- Provide a unified ingress and routing layer for multi-framework, multi-model deployments (see the example after this list).
- Integrate GenAI/LLM inference and MCP scenarios with observability and governance tooling.
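To illustrate the unified routing layer, the sketch below deploys two different frameworks through the same CRD; each service is exposed under the cluster's ingress domain and speaks the same standardized inference protocol, so clients need no per-framework integration. Names and paths are placeholders:

```yaml
# Two frameworks, one API surface: both are plain InferenceServices.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-sklearn                 # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/fraud/sklearn    # illustrative path
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: fraud-xgboost                 # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      storageUri: gs://my-bucket/models/fraud/xgboost    # illustrative path
```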
Technical Highlights
- Extends Kubernetes via CRDs, so inference services plug directly into existing k8s toolchains and CI/CD pipelines.
- Integrates with ModelMesh for intelligent routing, resource reuse, and high-density serving.
- Supports multiple deployment modes (Knative Serverless, raw Kubernetes Deployments, ModelMesh) to match different scale and latency requirements (see the annotation sketch below).
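The deployment mode is selected per service via an annotation. A sketch, assuming the cluster has the matching components installed (Knative for Serverless, the ModelMesh controller for ModelMesh):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-raw              # illustrative name
  annotations:
    # One of: Serverless (default, requires Knative), RawDeployment
    # (plain k8s Deployment/Service/HPA), ModelMesh (high-density serving).
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://my-bucket/models/sklearn/iris     # illustrative path
```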