Overview
Knative Serving is a Kubernetes-native platform for running serverless containers. It provides a request-driven execution model, automatic scaling (including scale-to-zero), and traffic routing. Applications are managed as immutable revisions, enabling zero-downtime deployments, traffic splitting, and easy rollback in cloud-native environments.
Key features
- Request-driven autoscaling, including scale-to-zero, to eliminate idle resource cost.
- Request-driven traffic routing, versioned revisions and zero-downtime deployments.
- Pluggable Kubernetes networking layers (Kourier, Istio, Contour, etc.).
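The features above are configured declaratively on a Knative Service. As a minimal sketch (the service name and container image below are placeholders, not from the source), scaling bounds are set with annotations on the revision template:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                        # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # "0" permits scale-to-zero; max-scale caps replica count.
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "10"
    spec:
      containers:
        - image: ghcr.io/example/hello:latest   # placeholder image
```

Applying this manifest creates a new revision; Knative then scales pods between 0 and 10 based on incoming request load.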
Use cases
Knative Serving suits event-driven microservices, short-lived jobs, HTTP/gRPC inference services, and online services that need frequent releases and traffic splitting. For ML/AI, it can host model inference containers with on-demand autoscaling to balance latency and cost.
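Traffic splitting between revisions is expressed in the Service's `traffic` block. A sketch of a 90/10 canary rollout for an inference service (names, image, and revision identifier are illustrative assumptions):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: model-inference              # hypothetical service name
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/example/model:v2   # placeholder image
  traffic:
    - revisionName: model-inference-00001   # pinned prior revision
      percent: 90
    - latestRevision: true                  # canary the newest revision
      percent: 10
```

Shifting `percent` values (or removing the pinned entry) promotes the canary; restoring the old split is an equally simple edit, which is what makes rollback cheap.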
Technical details
Implemented in Go, Knative Serving focuses on availability, observability, and tight Kubernetes integration. The Activator buffers requests for revisions scaled to zero, the Autoscaler computes replica counts from concurrency or request-rate metrics, and the Queue-Proxy sidecar enforces per-pod concurrency limits and reports request metrics. Multiple networking plugins and extension points are supported.
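The concurrency control handled by the Autoscaler and Queue-Proxy is driven by per-revision settings. A fragment of a Service spec sketching this (the specific values are illustrative assumptions):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Soft target the Autoscaler aims for per pod.
        autoscaling.knative.dev/target: "7"
        autoscaling.knative.dev/metric: "concurrency"
    spec:
      # Hard cap on in-flight requests per pod,
      # enforced by the Queue-Proxy sidecar.
      containerConcurrency: 10
```

Setting the soft target below the hard cap gives the Autoscaler headroom to add pods before the Queue-Proxy starts queuing requests at the limit.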