speaker-notes: Quick intro. I'm Jimmy Song, CNCF Ambassador. Over a decade in cloud-native infra, now focused on AI Infrastructure.
speaker-notes: Four topics today. These are questions practitioners keep asking.
speaker-notes: Topic one. Bottlenecks have been shifting — a fascinating trend.
speaker-notes: 2023-2024: compute scarcity. 2025-2026: efficiency. Bottleneck moved from "having GPUs" to "using GPUs well."
speaker-notes: Three layers: hardware (memory wall), scheduling (native K8s can't allocate GPUs at fine granularity), application (tidal load, co-location).
speaker-notes: Bottlenecks evolving from point problems to system problems. Previously the fix was to buy more GPUs. Now the full chain has to be optimized.
speaker-notes: Topic two — CPU, GPU, scheduling in production.
speaker-notes: Misconception: AI only needs GPUs. CPUs still handle the data pipeline, API layer, and business logic. The GPU is an accelerator, not a replacement for the CPU.
speaker-notes: Scheduling bridges apps and hardware. K8s native device plugin: whole-GPU only. This spawned GPU virtualization tech.
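speaker-notes: Optional illustration for this slide. A minimal sketch of why whole-GPU allocation wastes capacity, assuming a hypothetical set of small workloads that each need only a fraction of one GPU. The demand values and the first-fit-decreasing packing are illustrative assumptions, not how any particular scheduler or virtualization layer works.

```python
import math


def gpus_needed_whole(demands):
    """Whole-device allocation: every workload is rounded up to an integer GPU count."""
    return sum(math.ceil(d) for d in demands)


def gpus_needed_shared(demands):
    """Fractional sharing, approximated with first-fit-decreasing bin packing.

    Each GPU has capacity 1.0; each demand is assumed to be <= 1.0.
    """
    free = []  # remaining capacity on each GPU opened so far
    for d in sorted(demands, reverse=True):
        for i, cap in enumerate(free):
            if cap >= d - 1e-9:  # small tolerance for float rounding
                free[i] = cap - d
                break
        else:
            # No existing GPU has room, so open a new one.
            free.append(1.0 - d)
    return len(free)


# Hypothetical workloads, expressed as fractions of a single GPU.
demands = [0.25, 0.5, 0.25, 0.5, 0.25, 0.25]

whole = gpus_needed_whole(demands)
shared = gpus_needed_shared(demands)
print(f"whole-GPU allocation: {whole} GPUs, shared allocation: {shared} GPUs")
```

speaker-notes: With these assumed demands, whole-GPU allocation reserves six devices while packing fits the same work onto two. That gap is the motivation for GPU virtualization.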
speaker-notes: HAMi: CNCF sandbox project, an OSS project focused on heterogeneous AI device scheduling. Provides GPU virtualization on K8s.
speaker-notes: Topic three — cloud-native and OSS scheduling evolution.
speaker-notes: CNCF founded CNAI WG in 2024. AI Infra becomes a first-class citizen in cloud-native.
speaker-notes: OSS landscape: HAMi, Volcano, KubeRay form a complementary ecosystem. China contributes strongly here.
speaker-notes: CNAI WG covers GPU scheduling, model serving, data pipelines, observability, security.
speaker-notes: Topic four — compute demand across training, inference, Agent.
speaker-notes: Training: long-running, exclusive, communication-heavy. 70B model needs 4-8 A100s for weeks.
speaker-notes: Inference: high-frequency, latency-sensitive, tidal. Busy by day, idle at night. Inference is the biggest beneficiary of GPU virtualization.
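speaker-notes: Optional illustration for this slide. A rough sketch of the tidal effect, assuming a hypothetical 24-hour load profile (12 busy hours, 12 quiet hours) and a cluster statically sized for peak. The numbers are made up for illustration and are not measurements.

```python
def avg_utilization(hourly_load, provisioned_gpus):
    """Average GPU utilization over the period for a fixed-size cluster."""
    used = sum(min(load, provisioned_gpus) for load in hourly_load)
    return used / (provisioned_gpus * len(hourly_load))


# Hypothetical profile: 12 daytime hours needing 8 GPUs, 12 night hours needing 1.
hourly_load = [8.0] * 12 + [1.0] * 12

peak = max(hourly_load)  # static provisioning sized for the daytime peak
util = avg_utilization(hourly_load, peak)
idle_gpu_hours = peak * len(hourly_load) - sum(hourly_load)

print(f"utilization at peak sizing: {util:.1%}, idle GPU-hours per day: {idle_gpu_hours:.0f}")
```

speaker-notes: Under these assumptions, just over half of the provisioned GPU-hours do useful work. Sharing or reclaiming the night-time capacity is where virtualization and elastic scheduling pay off.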
speaker-notes: Agent: long-running, mostly idle, occasional bursts. More "online service" than "batch compute." New scheduling paradigm needed.
speaker-notes: Side-by-side comparison. Training optimizes for throughput, inference for latency, Agent for elasticity. No one-size-fits-all scheduling strategy.
speaker-notes: Summary time.
speaker-notes: Four core takeaways.
speaker-notes: Trends to watch. Industry changes fast.
speaker-notes: Backup slide.