AI Infra Industry Trends

Stage	Period	Core Problem	Direction
Scarcity	2023-2024	Availability	Procurement, cloud
Utilization	2025-2026	Efficiency	Virtualization, scheduling
Governance	2026+	Management	Cost control, observability

Project	Focus	Maturity	Highlight
HAMi	GPU virt & scheduling	CNCF Sandbox	Only OSS project focused on heterogeneous AI device scheduling
Volcano	Batch scheduling	CNCF Incubating	Training scheduling
KubeRay	Ray management	Ray ecosystem (not a CNCF project)	Distributed training
GPU Operator	Device management	Production	NVIDIA lock-in
KubeVirt	GPU VM passthrough	CNCF Incubating	Whole-card approach

Dimension	Training	Inference	Agent
Occupancy	Long exclusive	High-freq bursts	Intermittent spikes
Core Metric	Throughput	Latency (TTFT/TPS)	Elasticity
Memory	Very high	Medium	Low-Medium
Scheduling	Topology, exclusive	Shared, Binpack	On-demand, elastic
Bottleneck	Interconnect	Latency & utilization	Cold start & cost
GPU Virt Value	Medium	Very High	Very High
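
The inference column names TTFT and TPS as its core metrics. As a rough sketch of how these could be computed from per-token timestamps: the `RequestTrace` structure and the sample numbers below are hypothetical, and TPS is assumed here to mean output tokens per second during decode, which is one common definition among several.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Timestamps in seconds for one inference request (hypothetical structure)."""
    sent_at: float
    token_times: List[float]  # arrival time of each generated token


def ttft(trace: RequestTrace) -> float:
    """Time to first token: delay between sending the request and the first token."""
    return trace.token_times[0] - trace.sent_at


def tps(trace: RequestTrace) -> float:
    """Decode throughput: tokens after the first, divided by the time they took."""
    decode_time = trace.token_times[-1] - trace.token_times[0]
    if decode_time <= 0:
        return 0.0  # a single-token reply has no decode phase to measure
    return (len(trace.token_times) - 1) / decode_time


# Made-up trace: first token after 0.5 s, then one token every 0.25 s.
trace = RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5])
print(f"TTFT = {ttft(trace):.2f} s, TPS = {tps(trace):.1f} tokens/s")
```

Training jobs would typically be judged on aggregate throughput instead, which is why the table lists a different core metric per workload.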

Project	Domain	Notes
HAMi	GPU virt scheduling	CNCF Sandbox, heterogeneous AI device scheduling
vLLM	Inference engine	PagedAttention
SGLang	Inference framework	Structured generation
Ray	Distributed compute	Unified training & inference
Volcano	Batch scheduling	K8s-native
Ollama	Local inference	Simplified deployment
KubeRay	Ray on K8s	Training cluster mgmt
DRA	K8s Dynamic Resource Allocation	Native GPU scheduling

speaker-notes: Quick intro. I'm Jimmy Song, CNCF Ambassador. Over a decade in cloud-native infra, now focused on AI Infrastructure.

speaker-notes: Four topics today. These are questions practitioners keep asking.

speaker-notes: Topic one. Bottlenecks have been shifting — a fascinating trend.

speaker-notes: 2023-2024: compute scarcity. 2025-2026: efficiency. Bottleneck moved from "having GPUs" to "using GPUs well."

speaker-notes: Three layers: hardware (memory wall), scheduling (stock K8s can't allocate GPUs at fine granularity), application (tidal effects, co-location).

speaker-notes: Bottlenecks evolving from point problems to system problems. Previously just buy GPUs. Now optimize the full chain.

speaker-notes: Topic two — CPU, GPU, scheduling in production.

speaker-notes: Misconception: AI only needs GPUs. CPUs handle data pipelines, API serving, business logic. GPU is an accelerator, not a replacement.

speaker-notes: Scheduling bridges apps and hardware. K8s native device plugin: whole-GPU only. This spawned GPU virtualization tech.
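
To make the whole-GPU limitation concrete, here is a minimal sketch comparing whole-device allocation with packing fractional requests onto shared GPUs. The pod sizes are hypothetical, and the first-fit-decreasing heuristic is a generic bin-packing technique chosen for illustration, not the algorithm of HAMi or any particular scheduler.

```python
from typing import List


def gpus_needed_whole(requests: List[float]) -> int:
    """Whole-device allocation: every pod gets its own GPU, however small its need."""
    return len(requests)


def gpus_needed_binpack(requests: List[float], capacity: float = 1.0) -> int:
    """First-fit-decreasing packing of fractional requests (each <= capacity)."""
    free: List[float] = []  # remaining capacity of each GPU already in use
    for req in sorted(requests, reverse=True):
        for i, room in enumerate(free):
            if req <= room + 1e-9:  # small tolerance for float rounding
                free[i] = room - req
                break
        else:
            free.append(capacity - req)  # no GPU had room: open a new one
    return len(free)


# Hypothetical inference pods, each needing a fraction of one GPU.
pods = [0.5, 0.25, 0.25, 0.5, 0.125, 0.125, 0.25]
print("whole-GPU:", gpus_needed_whole(pods), "shared:", gpus_needed_binpack(pods))
```

With these made-up sizes the fractional requests sum to two GPUs' worth of capacity, so sharing needs far fewer devices than one GPU per pod. Real schedulers also have to account for memory isolation, compute contention, and topology, which this sketch ignores.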

speaker-notes: HAMi: CNCF Sandbox, the only OSS project focused on heterogeneous AI device scheduling. GPU virtualization on K8s.

speaker-notes: Topic three — cloud-native and OSS scheduling evolution.

speaker-notes: CNCF founded CNAI WG in 2024. AI Infra becomes first-class citizen in cloud-native.

speaker-notes: OSS landscape: HAMi, Volcano, KubeRay form a complementary ecosystem. China contributes strongly here.

speaker-notes: CNAI WG covers GPU scheduling, model serving, data pipelines, observability, security.

speaker-notes: Topic four — compute demand across training, inference, Agent.

speaker-notes: Training: long-running, exclusive, communication-heavy. A 70B model can hold 4-8 A100s (often far more) exclusively for weeks.

speaker-notes: Inference: high-freq, latency-sensitive, tidal. Day busy, night idle. A top beneficiary of GPU virtualization.
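
The cost of the tidal pattern can be estimated with simple arithmetic. The sketch below assumes capacity is provisioned statically for the daily peak; the hourly demand figures are invented for illustration and are not measurements.

```python
from typing import List


def static_utilization(hourly_demand: List[float]) -> float:
    """Average utilization when capacity is fixed at the daily peak."""
    peak = max(hourly_demand)
    if peak == 0:
        return 0.0  # no demand at all, so nothing is being used
    return sum(hourly_demand) / (peak * len(hourly_demand))


# Hypothetical day: 8 GPUs of demand for 12 busy hours, 2 GPUs for 12 quiet hours.
demand = [8.0] * 12 + [2.0] * 12
print(f"Utilization at peak provisioning: {static_utilization(demand):.1%}")
```

The gap between this figure and 100% is the capacity that sharing or elastic scheduling could in principle hand to other workloads during quiet hours.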

speaker-notes: Agent: long-running, mostly idle, occasional bursts. More "online service" than "batch compute." New scheduling paradigm needed.

speaker-notes: Side-by-side comparison. Training=throughput, inference=latency, Agent=elasticity. No one-size-fits-all.

speaker-notes: Summary time.

speaker-notes: Four core takeaways.

speaker-notes: Trends to watch. Industry changes fast.

speaker-notes: Backup slide.

AI Infra Industry Trends

From Compute Bottlenecks to Ecosystem Evolution

About Me

Background

Focus Areas

Today's Topics

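01 Evolution of AI Infra Bottlenecks

02 CPU / GPU / Scheduling in Production

03 Cloud Native × Open-Source Scheduling

04 Training / Inference / Agent Demands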

01 Evolution of AI Infra Bottlenecks

Bottleneck Migration

2023-2024: Compute Scarcity

2025-2026: Efficiency Crisis

Three Layers of Bottlenecks

Hardware

Scheduling

Application

Bottleneck Evolution Trend

02 CPU / GPU / Scheduling in Production

CPU vs GPU: Complementary, Not Replacement

CPU's Role

GPU's Role

Scheduling: The Underestimated Layer

K8s Native Limits

What's Needed

GPU Virtualization: HAMi

Core Capabilities

Production Results (Public)

03 Cloud Native × Open-Source Scheduling

From Cloud Native to AI Native

Cloud Native 1.0

Transition

AI Native

GPU Scheduling OSS Landscape

CNCF CNAI Ecosystem

Focus Areas

Milestones

04 Training / Inference / Agent Demands

Training: Compute-Intensive

Resource Profile

Infra Requirements

Inference: Latency-Sensitive

Resource Profile

Optimization

Agent: Event-Driven (New Paradigm)

Resource Profile

New Infra Needs

Three Scenarios Compared

Summary & Outlook

Four Core Takeaways

Bottlenecks Migrating

Scheduling Is Key

OSS Accelerating

Agents Reshape Demands

Trends to Watch

Thank You

Appendix: OSS Projects to Watch