Overview
Knative Serving is a Kubernetes-native platform for running serverless containers. It provides a request-driven execution model, automatic scaling (including scale-to-zero), and traffic routing. Applications are managed as immutable revisions, enabling zero-downtime deployments, traffic splitting, and easy rollback in cloud-native environments.
Key features
- Request-driven autoscaling, including scale-to-zero, to eliminate idle resource cost.
- Request-driven traffic routing, versioned revisions and zero-downtime deployments.
- Pluggable Kubernetes networking layers (Kourier, Istio, Contour, etc.).
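The features above are configured declaratively on a Knative Service. As a minimal sketch (the service name and container image below are placeholders, not from the source), scaling bounds are set with annotations on the revision template:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                        # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # "0" permits scale-to-zero; max-scale caps replica count.
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: ghcr.io/example/hello:latest   # placeholder image
```

Applying this manifest creates a new revision; Knative then scales pods between 0 and 10 based on incoming request load.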
Use cases
Knative Serving suits event-driven microservices, short-lived jobs, HTTP/gRPC inference services, and online services that need frequent releases and traffic splitting. For ML/AI, it can host model inference containers with on-demand autoscaling to balance latency and cost.
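Traffic splitting between revisions is expressed in the Service's `traffic` block. A sketch of a 90/10 canary rollout for an inference service (names, image, and revision identifier are illustrative assumptions):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: model-inference              # hypothetical service name
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/example/model:v2   # placeholder image
  traffic:
    - revisionName: model-inference-00001   # pinned prior revision
      percent: 90
    - latestRevision: true                  # canary the newest revision
      percent: 10
```

Shifting `percent` values (or removing the pinned entry) promotes the canary; restoring the old split is an equally simple edit, which is what makes rollback cheap.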
Technical details
Implemented in Go, Knative Serving focuses on availability, observability, and tight Kubernetes integration. The Activator buffers requests for revisions scaled to zero, the Autoscaler computes replica counts from concurrency or request-rate metrics, and the Queue-Proxy sidecar enforces per-pod concurrency limits and reports request metrics. Multiple networking plugins and extension points are supported.
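The concurrency control handled by the Autoscaler and Queue-Proxy is driven by per-revision settings. A fragment of a Service spec sketching this (the specific values are illustrative assumptions):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Soft target the Autoscaler aims for per pod.
        autoscaling.knative.dev/target: "7"
        autoscaling.knative.dev/metric: "concurrency"
    spec:
      # Hard cap on in-flight requests per pod,
      # enforced by the Queue-Proxy sidecar.
      containerConcurrency: 10
```

Setting the soft target below the hard cap gives the Autoscaler headroom to add pods before the Queue-Proxy starts queuing requests at the limit.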