Inference | Jimmy Song

KAITO and KubeFleet: CNCF Is Reshaping AI Inference Infrastructure

Nov 8, 2025 • AI Engineering

CNCF is standardizing AI inference infrastructure for scalable deployment in multi-cluster Kubernetes environments through KAITO and KubeFleet.

AI Engineering

Building Efficient LLM Inference with the Cloud Native Quartet: KServe, vLLM, llm-d, and WG Serving

Nov 8, 2025 • AI Engineering

Essential reading for cloud native and AI-native architects: how KServe, vLLM, llm-d, and WG Serving form the cloud native ‘quartet’ for large model inference, their roles, synergy, and ecosystem trends.

Cloud Native

The Impact of Istio 1.28 on LLM Inference Infrastructure

Nov 7, 2025 • Cloud Native

Deep dive into Istio 1.28: how InferencePool, Ambient Multicluster, nftables, and dual-stack enhance observability, reliability, and high-concurrency networking for LLM inference infrastructure.

AI Engineering

The Natural Fit Between AI Inference and Kubernetes

Nov 5, 2025 • AI Engineering

Explore why Kubernetes is the ideal runtime for AI inference: it delivers elastic, cost-efficient, low-latency model serving with GPU-aware autoscaling, versioning, and observability.