AI Infra Industry Trends

From Compute Bottlenecks to Ecosystem Evolution

Jimmy Song · CNCF Ambassador · Cloud Native Community Founder

June 2026

AI Infra Industry Trends | Jimmy Song | 2026.06

About Me

Background

  • CNCF Ambassador, Cloud Native Community founder
  • 10+ years in cloud-native infrastructure
  • Focused on AI-Native Infra & GPU virtualization
  • AI Infra consulting for multiple enterprises

Focus Areas

  • GPU scheduling & virtualization
  • Cloud Native × AI convergence
  • Open-source ecosystem governance
  • AI Agent infrastructure

Today's Topics

01

Are AI Infra bottlenecks shifting?

Compute, memory, networking, scheduling...

02

Real roles of CPU / GPU / Scheduling

How each layer fits in production

03

Cloud Native × OSS scheduling evolution

Kubernetes to GPU scheduling — the arc

04

Training / Inference / Agent demands

Resource traits across scenarios


01 Evolution of AI Infra Bottlenecks


Bottleneck Migration

2023-2024: Compute Scarcity

  • Bottleneck: GPU supply shortage
  • LLM training demand explosion
  • H100s nearly impossible to acquire
  • Compute = competitive edge

2025-2026: Efficiency Crisis

  • Bottleneck: low GPU utilization
  • Global average < 30%
  • Severe memory waste
  • Missing scheduling strategies

Core shift: "How to get compute" → "How to use compute well." A paradigm transition the whole industry is undergoing.

Three Layers of Bottlenecks

🔩 Hardware

  • Memory wall: params > VRAM
  • Interconnect: multi-GPU bottleneck
  • Fragmentation: N chip vendors

📊 Scheduling

  • K8s too coarse: whole-GPU only
  • No GPU awareness
  • No multi-tenant isolation

🔄 Application

  • Tidal effects: day peak, night idle
  • Co-location conflicts
  • Insufficient elasticity
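
The tidal effect and the case for co-location can be put into rough numbers. The sketch below is illustrative only: the hourly load profile and the batch backlog are made-up assumptions, not measurements, and `average_utilization` and `colocate` are hypothetical helper names.

```python
def average_utilization(hourly_load):
    """Mean utilization over the profile, with each hour capped at 1.0."""
    return sum(min(1.0, load) for load in hourly_load) / len(hourly_load)


def colocate(online_load, batch_backlog_hours):
    """Fill idle capacity with batch work, hour by hour, until the backlog runs out.

    Returns the combined hourly load and the backlog that did not fit.
    """
    combined = []
    remaining = batch_backlog_hours
    for load in online_load:
        idle = max(0.0, 1.0 - load)
        used = min(idle, remaining)
        remaining -= used
        combined.append(load + used)
    return combined, remaining


# Assumed tidal profile: quiet at night (0.1), busy during the day (0.6).
online = [0.1] * 8 + [0.6] * 12 + [0.1] * 4

# Assumed backlog of batch work, in cluster-hours.
combined, leftover = colocate(online, batch_backlog_hours=6.0)

print(round(average_utilization(online), 2))
print(round(average_utilization(combined), 2))
```

Online traffic alone leaves most of the night idle; filling those hours with batch work raises the average without touching the daytime peak. Real co-location also has to handle interference and preemption, which this sketch ignores.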

Bottleneck Evolution Trend

| Stage       | Period    | Core Problem | Direction                   |
|-------------|-----------|--------------|-----------------------------|
| Famine      | 2023-2024 | Availability | Procurement, cloud          |
| Utilization | 2025-2026 | Efficiency   | Virtualization, scheduling  |
| Governance  | 2026+     | Management   | Cost control, observability |

Verdict: over the next 2-3 years, compute governance becomes the core challenge: cost attribution, auditing, quota management, cross-team coordination.
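
Cost attribution and quota management can start as simple bookkeeping over GPU-hours. A minimal sketch, assuming usage is already exported as per-job records; the `UsageRecord` type, team names, price, and quotas are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    team: str
    gpu_hours: float  # GPU-hours consumed by one job


def total_hours(records):
    """Sum GPU-hours per team."""
    hours = defaultdict(float)
    for record in records:
        hours[record.team] += record.gpu_hours
    return dict(hours)


def attribute_costs(records, price_per_gpu_hour):
    """Convert each team's GPU-hours into cost at a flat assumed price."""
    return {team: h * price_per_gpu_hour for team, h in total_hours(records).items()}


def over_quota(records, quotas):
    """Return the teams whose usage exceeds their GPU-hour quota, sorted by name."""
    usage = total_hours(records)
    return sorted(team for team, h in usage.items() if h > quotas.get(team, 0.0))


records = [
    UsageRecord("search", 100.0),
    UsageRecord("search", 50.0),
    UsageRecord("ads", 30.0),
]
print(attribute_costs(records, price_per_gpu_hour=2.0))
print(over_quota(records, quotas={"search": 120.0, "ads": 40.0}))
```

In practice the hard part is producing trustworthy usage records for shared GPUs, which is where observability and auditing come in.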

02 CPU / GPU / Scheduling in Production


CPU vs GPU: Complementary, Not Replacement

CPU's Role

  • Data pipeline: ETL, preprocessing
  • Control plane: API gateway, routing
  • Lightweight inference: small models, Agent dispatch
  • Caching: vector search, KV Cache

GPU's Role

  • Training: large-scale matrix ops
  • Inference: compute-heavy LLM portion
  • Parallel compute: batch, embeddings
  • Specific loads: long-sequence processing

Key insight: Production CPU:GPU ratio is typically 4:1 ~ 8:1. Both indispensable.


Scheduling: The Underestimated Layer

K8s Native Limits

  • Device Plugin = whole-GPU only
  • No memory/compute awareness
  • No GPU overcommit
  • No multi-tenant isolation

What's Needed

  • Fine-grained slicing
  • Topology awareness (NUMA, NVLink)
  • Priority preemption
  • Training/inference co-location
  • Elastic scaling

Field observation: after buying a GPU cluster, the first problem teams hit is "how to partition the GPUs," not anything about models. Scheduling is the invisible battlefield.
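
To make "fine-grained slicing" concrete, here is a toy admission check for sharing one GPU by memory and compute share. It is bookkeeping only and an assumption-laden sketch: the `SharedGPU` class is hypothetical, and real enforcement needs driver-level or library-level isolation that this code does not provide.

```python
class SharedGPU:
    """Tracks how much of one GPU's memory and compute is still unallocated."""

    def __init__(self, name, memory_mib):
        self.name = name
        self.free_memory_mib = memory_mib
        self.free_core_pct = 100
        self.tenants = {}  # task name -> (memory_mib, core_pct)

    def try_allocate(self, task, memory_mib, core_pct):
        """Admit the task only if both its memory and compute share fit."""
        if task in self.tenants:
            return False
        if memory_mib > self.free_memory_mib or core_pct > self.free_core_pct:
            return False
        self.free_memory_mib -= memory_mib
        self.free_core_pct -= core_pct
        self.tenants[task] = (memory_mib, core_pct)
        return True

    def release(self, task):
        """Return a task's slice to the free pool."""
        memory_mib, core_pct = self.tenants.pop(task)
        self.free_memory_mib += memory_mib
        self.free_core_pct += core_pct


gpu = SharedGPU("gpu0", memory_mib=81920)
print(gpu.try_allocate("model-a", memory_mib=40960, core_pct=50))
print(gpu.try_allocate("model-b", memory_mib=30000, core_pct=30))
print(gpu.try_allocate("model-c", memory_mib=20000, core_pct=10))
```

The third request is refused because memory is exhausted even though compute share remains, which is why a slicing scheduler has to track both dimensions.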

GPU Virtualization: HAMi

Core Capabilities

  • Compute & memory isolation: multi-task sharing
  • 10+ chip types: NVIDIA, Ascend, Cambricon...
  • Turbo mode: near-native performance
  • Elastic memory: auto-expand on OOM
  • Rich scheduling: Binpack / Spread / Priority

Production Results (Public)

  • Top bank: GPU utilization 20% → 70%
  • Jishi Zhisuan: per-card revenue +3.15x
  • Autonomous driving: utilization +200%
  • 200+ enterprise deployments

HAMi — CNCF Sandbox, 340+ contributors, 15+ countries.
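
The difference between Binpack and Spread is easiest to see in a few lines. This is a generic toy version of the two policies, not HAMi's implementation; `pick_gpu` and the free-memory numbers are hypothetical.

```python
def pick_gpu(free_mib_by_gpu, request_mib, policy):
    """Choose a GPU for a memory request under a placement policy.

    binpack: the fitting GPU with the least free memory (fill cards up first).
    spread:  the fitting GPU with the most free memory (balance the load).
    Returns None when no GPU can fit the request.
    """
    candidates = {g: free for g, free in free_mib_by_gpu.items() if free >= request_mib}
    if not candidates:
        return None
    if policy == "binpack":
        return min(candidates, key=lambda g: (candidates[g], g))
    if policy == "spread":
        return min(candidates, key=lambda g: (-candidates[g], g))
    raise ValueError(f"unknown policy: {policy}")


# Assumed free memory per GPU, in MiB.
free = {"gpu0": 8000, "gpu1": 30000, "gpu2": 80000}

print(pick_gpu(free, 6000, "binpack"))
print(pick_gpu(free, 6000, "spread"))
```

Binpack keeps whole cards free for large jobs, while Spread reduces contention between neighbors; which one fits depends on the workload mix.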


03 Cloud Native × Open-Source Scheduling


From Cloud Native to AI Native

Cloud Native 1.0

(2015-2022)

  • Containers & microservices
  • K8s becomes standard
  • Elasticity & observability
  • Application-centric

Transition

(2023-2024)

  • GPU = first-class resource
  • AI workload needs
  • Scheduling extensions
  • Device awareness

AI Native

(2025+)

  • Compute-centric scheduling
  • GPU virtualization standard
  • Uncertainty = default
  • Compute-centric

Trend: AI Native adds a compute governance layer on top of cloud-native. K8s stays, scheduling upgrades from "app scheduling" to "compute scheduling."

GPU Scheduling OSS Landscape

| Project      | Focus                 | Maturity        | Highlight                       |
|--------------|-----------------------|-----------------|---------------------------------|
| HAMi         | GPU virt & scheduling | CNCF Sandbox    | Only AI heterogeneous scheduler |
| Volcano      | Batch scheduling      | CNCF Incubating | Training scheduling             |
| KubeRay      | Ray management        | CNCF Incubating | Distributed training            |
| GPU Operator | Device management     | Production      | NVIDIA lock-in                  |
| KubeVirt     | GPU VM passthrough    | CNCF Incubating | Whole-card approach             |

Observation: GPU scheduling moves from "NVIDIA proprietary" toward open, multi-cloud, multi-chip standardization.

CNCF CNAI Ecosystem

Focus Areas

  • Compute scheduling: GPU/NPU unified
  • Model serving: inference standardization
  • Data pipelines: training data flow
  • Observability: GPU usage monitoring
  • Security: isolation & protection

Milestones

  • CNAI Working Group founded (2024)
  • HAMi → CNCF Sandbox
  • GPU scheduling in CNAI Landscape
  • K8s 1.31+ DRA enhancement
  • Multi-chip = community consensus

CNAI Landscape: 100+ projects covering training, inference, scheduling, observability.


04 Training / Inference / Agent Demands


Training: Compute-Intensive

Resource Profile

  • Long exclusive: days ~ weeks
  • High memory: 70B needs 4-8 A100s
  • Communication-heavy: AllReduce
  • Throughput-first

Infra Requirements

  • High-speed interconnect (NVLink/IB)
  • Topology-aware scheduling
  • Checkpoint management
  • Multi-tenant isolation

Trend: training is scaling to thousand-GPU clusters, while small and medium model training becomes more widespread and no longer depends on H100-class hardware.
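
How many cards a 70B model needs depends heavily on what has to live in GPU memory. A rough sketch of the arithmetic, assuming 80 GB A100s and ignoring activations and framework overhead. The bytes-per-parameter values are common rules of thumb rather than numbers from this talk: 2 bytes for FP16/BF16 weights alone, and about 16 bytes for mixed-precision Adam (weights, gradients, FP32 master copy, two optimizer moments).

```python
import math


def gpus_needed(params_billion, bytes_per_param, gpu_memory_gb=80):
    """Estimate memory in GB and the minimum GPU count to hold it.

    params_billion * bytes_per_param gives GB directly
    (1e9 parameters * bytes per parameter / 1e9 bytes per GB).
    """
    total_gb = params_billion * bytes_per_param
    return total_gb, math.ceil(total_gb / gpu_memory_gb)


# Lower bound: FP16/BF16 weights only.
print(gpus_needed(70, 2))

# Upper bound under the assumed 16 bytes/param for mixed-precision Adam.
print(gpus_needed(70, 16))
```

Under these assumptions the 4-8 card range sits between the two bounds; where a given job lands depends on sharding, offloading, and how much optimizer state it keeps.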

Inference: Latency-Sensitive

Resource Profile

  • High-freq short bursts: ms-level
  • Tidal effects: day peak, night idle
  • Memory fragmentation
  • Elasticity needed

Optimization

  • GPU sharing: multi-model per card
  • Memory overcommit: hot/cold split
  • Dynamic batching
  • Quantization: FP8/INT4
  • Speculative decoding

Inference = biggest beneficiary of GPU virtualization. Utilization: 20% → 70%+.
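
Dynamic batching trades a small wait for better GPU occupancy. The sketch below groups request arrival times into batches using two assumed knobs, a maximum batch size and a maximum wait; `dynamic_batches` is a hypothetical helper, not the API of any inference engine.

```python
def dynamic_batches(arrivals_ms, max_batch_size, max_wait_ms):
    """Group request arrival times (in ms) into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_wait_ms after the first request in the batch.
    """
    batches = []
    current = []
    for t in sorted(arrivals_ms):
        if current and (len(current) >= max_batch_size or t - current[0] > max_wait_ms):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Assumed arrival pattern: a small trickle, then a burst.
arrivals = [0, 2, 3, 20, 21, 22, 23, 24]
print(dynamic_batches(arrivals, max_batch_size=4, max_wait_ms=5))
```

During the trickle the wait limit closes a partial batch to protect latency; during the burst the size limit closes full batches to protect throughput.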


Agent: Event-Driven (New Paradigm)

Resource Profile

  • Long-lived, intermittent calls
  • Long chains: plan → tool → reflect
  • Unpredictable compute
  • Cold-start sensitive

New Infra Needs

  • On-demand GPU: use when needed
  • Hybrid inference + tool execution
  • State persistence
  • MCP ecosystem
  • Cost governance

Verdict: Agent workloads drive a new scheduling paradigm: "tasks drive resource flow," not "allocate and wait."
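
The trade-off behind on-demand GPU for agents is cold-start latency versus held GPU time. A minimal simulation under stated assumptions: calls are sequential, the GPU is released after a fixed idle timeout, and re-attaching costs a fixed cold start. The `OnDemandGPU` class and all timings are hypothetical.

```python
class OnDemandGPU:
    """Attach a GPU when an agent step needs it, release it after an idle timeout."""

    def __init__(self, idle_timeout_s, cold_start_s):
        self.idle_timeout_s = idle_timeout_s
        self.cold_start_s = cold_start_s
        self.lease_start = None  # when the current lease began
        self.warm_until = None   # when the current lease expires if unused
        self.cold_starts = 0
        self.held_s = 0.0        # total seconds of completed leases

    def _close_lease(self):
        if self.lease_start is not None:
            self.held_s += self.warm_until - self.lease_start
            self.lease_start = None

    def call(self, now_s, duration_s):
        """Run one GPU step at time now_s; return the latency the agent sees."""
        latency = duration_s
        if self.lease_start is None or now_s > self.warm_until:
            # The GPU was released (or never attached): pay the cold start.
            self._close_lease()
            self.cold_starts += 1
            self.lease_start = now_s
            latency += self.cold_start_s
        self.warm_until = now_s + latency + self.idle_timeout_s
        return latency

    def total_held_s(self):
        """Close any open lease and return the total GPU seconds held."""
        self._close_lease()
        return self.held_s


gpu = OnDemandGPU(idle_timeout_s=30, cold_start_s=10)
latencies = [gpu.call(0, 5), gpu.call(20, 5), gpu.call(300, 5)]
print(latencies, gpu.cold_starts, gpu.total_held_s())
```

The second call reuses the warm GPU, while the third arrives long after the timeout and pays the cold start again. A longer timeout cuts cold starts but holds the GPU idle for longer, which is the cost governance question in miniature.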

Three Scenarios Compared

| Dimension      | Training            | Inference             | Agent               |
|----------------|---------------------|-----------------------|---------------------|
| Occupancy      | Long exclusive      | High-freq bursts      | Intermittent spikes |
| Core Metric    | Throughput          | Latency (TTFT/TPS)    | Elasticity          |
| Memory         | Very high           | Medium                | Low-Medium          |
| Scheduling     | Topology, exclusive | Shared, Binpack       | On-demand, elastic  |
| Bottleneck     | Interconnect        | Latency & utilization | Cold start & cost   |
| GPU Virt Value | Medium              | Very High             | Very High           |

Summary & Outlook


Four Core Takeaways

1️⃣ Bottlenecks Migrating

Scarcity → inefficiency → governance chaos. Compute governance is next.

2️⃣ Scheduling Is Key

GPU virtualization = highest ROI today. Use existing GPUs better.

3️⃣ OSS Accelerating

Proprietary → open, multi-cloud, multi-chip. China's community leads.

4️⃣ Agents Reshape Demands

New scheduling paradigm. On-demand GPU flow = standard capability.

Trends to Watch

  • Small models rising: distillation & quantization lower barriers; GPU goes from luxury to utility
  • Multimodal convergence: unified vision/speech/text — new memory & bandwidth demands
  • Edge inference: on-device AI chips mature, cloud → edge migration
  • Compute marketization: GPU cloud pricing drops, trading markets emerge
  • Open models catching up: Llama, Qwen, DeepSeek narrow the gap with closed models

Bottom line: AI Infra competition shifts from "who has the most GPUs" to "who uses GPUs best." Efficiency, governance, ecosystem win.

Thank You

Jimmy Song · CNCF Ambassador

jimmysong.io · Cloud Native Community

Personal views based on public info and frontline observations


Appendix: OSS Projects to Watch

| Project | Domain                | Notes                               |
|---------|-----------------------|-------------------------------------|
| HAMi    | GPU virt scheduling   | CNCF Sandbox, AI heterogeneous only |
| vLLM    | Inference engine      | PagedAttention                      |
| SGLang  | Inference framework   | Structured generation               |
| Ray     | Distributed compute   | Unified training & inference        |
| Volcano | Batch scheduling      | K8s-native                          |
| Ollama  | Local inference       | Simplified deployment               |
| KubeRay | Ray on K8s            | Training cluster mgmt               |
| DRA     | K8s dynamic resources | Native GPU scheduling               |

speaker-notes: Quick intro. I'm Jimmy Song, CNCF Ambassador. Over a decade in cloud-native infra, now focused on AI Infrastructure.

speaker-notes: Four topics today. These are questions practitioners keep asking.

speaker-notes: Topic one. Bottlenecks have been shifting — a fascinating trend.

speaker-notes: 2023-2024: compute scarcity. 2025-2026: efficiency. Bottleneck moved from "having GPUs" to "using GPUs well."

speaker-notes: Three layers: hardware (memory wall), scheduling (K8s can't do fine-grained GPU), application (tidal effects, co-location).

speaker-notes: Bottlenecks evolving from point problems to system problems. Previously just buy GPUs. Now optimize the full chain.

speaker-notes: Topic two — CPU, GPU, scheduling in production.

speaker-notes: Misconception: AI only needs GPUs. CPUs handle data pipeline, API, business logic. GPU is an accelerator, not a replacement.

speaker-notes: Scheduling bridges apps and hardware. K8s native device plugin: whole-GPU only. This spawned GPU virtualization tech.

speaker-notes: HAMi: CNCF sandbox, the only OSS project focused on AI heterogeneous scheduling. GPU virtualization on K8s.

speaker-notes: Topic three — cloud-native and OSS scheduling evolution.

speaker-notes: CNCF founded CNAI WG in 2024. AI Infra becomes first-class citizen in cloud-native.

speaker-notes: OSS landscape: HAMi, Volcano, KubeRay form a complementary ecosystem. China contributes strongly here.

speaker-notes: CNAI WG covers GPU scheduling, model serving, data pipelines, observability, security.

speaker-notes: Topic four — compute demand across training, inference, Agent.

speaker-notes: Training: long-running, exclusive, communication-heavy. 70B model needs 4-8 A100s for weeks.

speaker-notes: Inference: high-freq, latency-sensitive, tidal. Day busy, night idle. GPU virtualization biggest beneficiary.

speaker-notes: Agent: long-running, mostly idle, occasional bursts. More "online service" than "batch compute." New scheduling paradigm needed.

speaker-notes: Side-by-side comparison. Training=throughput, inference=latency, Agent=elasticity. No one-size-fits-all.

speaker-notes: Summary time.

speaker-notes: Four core takeaways.

speaker-notes: Trends to watch. Industry changes fast.

speaker-notes: Backup slide.