Today marks my first day at KubeCon Europe 2026. The most striking feeling is: the world is vast, but this community is truly small.

Old Friends, New Cycle
At the Maintainer Summit, I ran into many familiar faces: colleagues from Ant Group, friends from Tetrate, and people I’ve known for nearly a decade. Together, we’ve journeyed from the early days of Kubernetes, Service Mesh, and cloud native infrastructure to where we are today.
In a sense, this generation has fully experienced:
- The rise of Kubernetes
- The standardization of Cloud Native
- The microservices and service mesh boom
- And now, the era of AI Infrastructure
This isn’t about “new people entering the field,” but rather:
The same group stepping into a new technology cycle.
What Is the Maintainer Summit Discussing?
If you ask:
What is the Kubernetes community most concerned about right now?
Today’s answer is very clear:
👉 How to run AI workloads better on Kubernetes

Many topics at the Maintainer Summit revolved around:
- Scheduling models for LLM / AI workloads
- GPU / accelerator resource management
- Integrating inference systems with Kubernetes
- Redefining the roles of data plane vs. control plane
- How observability tools like OTel monitor AI workloads
In other words:
Kubernetes hasn’t been replaced by AI; it’s actively “absorbing” AI.
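To make that concrete: today, most AI workloads still reach GPUs through the device-plugin model, at whole-device granularity. A minimal sketch (the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed; the image is a hypothetical placeholder):

```yaml
# A minimal inference Pod under the classic device-plugin model:
# the GPU is requested as an opaque, whole-device counter.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  containers:
  - name: worker
    image: registry.example.com/llm-worker:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1  # whole device only: no partitioning, no sharing
```

Much of the discussion above is, in one way or another, about moving beyond this whole-device granularity.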
Key Signal: GPUs Are Becoming the “Infrastructure Layer”
Today, at the Maintainer Summit in Amsterdam, I had in-depth discussions with the CNCF TOC, Red Hat, and the vLLM community about GPU resource management and LLM serving integration on Kubernetes, and we explored potential collaboration between vLLM and HAMi.
The core question was:
How should GPUs be “platformized”?
Some consensus is already clear:
- GPUs are no longer just devices
- They are now a schedulable, partitionable, and shareable resource layer
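Upstream, the clearest expression of this consensus is Dynamic Resource Allocation (DRA), which models devices as claimable, structured resources rather than opaque counters. A rough sketch, assuming the v1beta1 DRA API (Kubernetes 1.32+) and a vendor DRA driver publishing a `gpu.nvidia.com` device class; exact API versions and names vary by release and driver:

```yaml
# A ResourceClaim describes "what kind of device I need" and lets the
# scheduler decide which physical device satisfies it.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: llm-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com  # assumed class from the vendor's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: llm-worker
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: llm-gpu
  containers:
  - name: worker
    image: registry.example.com/llm-worker:latest  # hypothetical image
    resources:
      claims:
      - name: gpu  # consume the device bound to the claim
```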

Behind this is a major paradigm shift:
| Past | Now |
|---|---|
| GPU = Node resource | GPU = Infrastructure layer |
| Exclusive use | Multi-tenant sharing |
| Static binding | Dynamic scheduling |
| Managed within frameworks | Unified management at the platform layer |
This is exactly what we’ve been working on in HAMi.
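For instance, HAMi lets a Pod request a slice of a GPU rather than a whole device, so several Pods can share one card. A sketch using HAMi’s documented resource names (exact keys and units depend on your HAMi version and configuration):

```yaml
# Fractional GPU request via HAMi: multiple Pods like this one can be
# scheduled onto (and isolated on) the same physical GPU.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/llm-worker:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1        # number of virtual GPUs
        nvidia.com/gpumem: 3000  # device memory cap, in MB
        nvidia.com/gpucores: 30  # roughly 30% of the device's compute
```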
HAMi: From “Project” to “Reference Pattern”
Another interesting change today:
HAMi is no longer just a “community project”; it’s becoming:
A reference pattern for AI infrastructure

This is reflected in several ways:
- Invited to present at the Maintainer Summit
- Participating in CNCF TOC discussions
- Involved in incubation review demos
- Exploring joint content with the vLLM community (even discussing a joint blog 👀)
Especially in conversations with Red Hat and vLLM, a clear trend emerged:
GPU resource management and LLM serving are becoming coupled
That is:
- Upper layer: vLLM / inference frameworks
- Lower layer: GPU scheduling / sharing
A new “interface layer” is gradually forming.
This is a direction worth betting on.
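The coupling is already visible in practice: deploying vLLM on Kubernetes puts the serving layer and the GPU resource layer into a single spec. A minimal sketch (image tag, model name, and flags are illustrative, not a recommendation):

```yaml
# vLLM's OpenAI-compatible server as a Deployment: the inference
# framework (upper layer) and GPU allocation (lower layer) meet in
# one Pod template -- the emerging "interface layer".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest  # illustrative tag
        args: ["--model", "Qwen/Qwen2.5-7B-Instruct"]  # illustrative model
        ports:
        - containerPort: 8000  # vLLM's default HTTP port
        resources:
          limits:
            nvidia.com/gpu: 1  # or fractional resources under HAMi
```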

A Caveat: The AI Infra Startup Boom Hasn’t Really Begun
At the same time, I have a somewhat “counterintuitive” observation:
We haven’t yet seen a large wave of AI Infra (K8s-focused) startups.
Most of the companies I saw today fall into a few buckets:
- Pivots from CI/CD, Service Mesh, or Gateway products
- Traditional cloud vendors extending into AI
- Teams working on models, agents, or even lower-level tech
But startups truly focused on:
“Making AI workloads run better on Kubernetes”
are still rare at this layer.
This could mean two things:
1) This Layer Isn’t Fully Formed Yet
Currently, most activity is at:
- The model layer (LLM / foundation models)
- The application layer (Agent / Copilot)
But not at:
- The scheduling layer
- The resource layer
- The runtime layer
2) Or, the Barrier to Entry Is Very High
Because at its core, this is:
The intersection of Cloud Native × GPU × AI workload
It’s not just “wrapping AI,” but a fundamental re-architecture at the infrastructure level.
My Take
If we break down the AI technology stack:

    Agent / Application
            ↓
    LLM Serving (vLLM, etc.)
            ↓
    AI Runtime / Scheduling
            ↓
    GPU Resource Layer
            ↓
    Hardware
Most innovation today is concentrated in:
- The top two layers (Agent / LLM)
But the real long-term moat lies in:
- The middle two layers (Runtime + Resource Layer)
And Kubernetes is very likely to remain:
The default platform for this middle layer
Summary
Today’s takeaway:
Kubernetes is not obsolete; it’s being redefined.
And our generation is shifting from:
“Cloud Native Builders”
to:
“AI Infrastructure Builders”
More to come tomorrow.
