Today marks my first day at KubeCon Europe 2026. The most striking feeling is: the world is vast, but this community is truly small.

Old Friends, New Cycle
At the Maintainer Summit, I ran into many familiar faces: colleagues from Ant Group, friends from Tetrate, and people I’ve known for nearly a decade. Together, we’ve journeyed from the early days of Kubernetes, Service Mesh, and cloud native infrastructure to where we are today.
In a sense, this generation has fully experienced:
- The rise of Kubernetes
- The standardization of Cloud Native
- The microservices and service mesh boom
- And now, the era of AI Infrastructure
This isn’t about “new people entering the field,” but rather:
The same group stepping into a new technology cycle.
What Is the Maintainer Summit Discussing?
If you ask:
What is the Kubernetes community most concerned about right now?
Today’s answer is very clear:
👉 How to run AI workloads better on Kubernetes

Many topics at the Maintainer Summit revolved around:
- Scheduling models for LLM / AI workloads
- GPU / accelerator resource management
- Integrating inference systems with Kubernetes
- Redefining the roles of data plane vs. control plane
- How observability tools like OTel monitor AI workloads
In other words:
Kubernetes hasn’t been replaced by AI; it’s actively “absorbing” AI.
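To make that concrete: today, most AI workloads still reach GPUs through the device-plugin model, at whole-device granularity. A minimal sketch (the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed; the image is a hypothetical placeholder):

```yaml
# A minimal inference Pod under the classic device-plugin model:
# the GPU is requested as an opaque, whole-device counter.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  containers:
  - name: worker
    image: registry.example.com/llm-worker:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1  # whole device only: no partitioning, no sharing
```

Much of the discussion above is, in one way or another, about moving beyond this whole-device granularity.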
Key Signal: GPUs Are Becoming the “Infrastructure Layer”
Today, at the Maintainer Summit in Amsterdam, I had in-depth discussions with the CNCF TOC, Red Hat, and the vLLM community about GPU resource management and LLM serving integration on Kubernetes, and we explored potential collaboration between vLLM and HAMi.
The core question was:
How should GPUs be “platformized”?
Some consensus is already clear:
- GPUs are no longer just devices
- They are now a schedulable, partitionable, and shareable resource layer
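Upstream, the clearest expression of this consensus is Dynamic Resource Allocation (DRA), which models devices as claimable, structured resources rather than opaque counters. A rough sketch, assuming the v1beta1 DRA API (Kubernetes 1.32+) and a vendor DRA driver publishing a `gpu.nvidia.com` device class; exact API versions and names vary by release and driver:

```yaml
# A ResourceClaim describes "what kind of device I need" and lets the
# scheduler decide which physical device satisfies it.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: llm-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com  # assumed class from the vendor's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: llm-worker
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: llm-gpu
  containers:
  - name: worker
    image: registry.example.com/llm-worker:latest  # hypothetical image
    resources:
      claims:
      - name: gpu  # consume the device bound to the claim
```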

Behind this is a major paradigm shift:
| Past | Now |
|---|---|
| GPU = Node resource | GPU = Infrastructure layer |
| Exclusive use | Multi-tenant sharing |
| Static binding | Dynamic scheduling |
| Managed within frameworks | Unified management at the platform layer |
This is exactly what we’ve been working on in HAMi.
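For instance, HAMi lets a Pod request a slice of a GPU rather than a whole device, so several Pods can share one card. A sketch using HAMi’s documented resource names (exact keys and units depend on your HAMi version and configuration):

```yaml
# Fractional GPU request via HAMi: multiple Pods like this one can be
# scheduled onto (and isolated on) the same physical GPU.
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/llm-worker:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1        # number of virtual GPUs
        nvidia.com/gpumem: 3000  # device memory cap, in MB
        nvidia.com/gpucores: 30  # roughly 30% of the device's compute
```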
HAMi: From “Project” to “Reference Pattern”
Another interesting change today:
HAMi is no longer just a “community project”; it’s becoming:
A reference pattern for AI infrastructure

This is reflected in several ways:
- Invited to present at the Maintainer Summit
- Participating in CNCF TOC discussions
- Involved in incubation review demos
- Exploring joint content with the vLLM community (even discussing a joint blog 👀)
Especially in conversations with Red Hat and vLLM, a clear trend emerged:
GPU resource management and LLM serving are becoming coupled
That is:
- Upper layer: vLLM / inference frameworks
- Lower layer: GPU scheduling / sharing
A new “interface layer” is gradually forming.
This is a direction worth betting on.
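The coupling is already visible in practice: deploying vLLM on Kubernetes puts the serving layer and the GPU resource layer into a single spec. A minimal sketch (image tag, model name, and flags are illustrative, not a recommendation):

```yaml
# vLLM's OpenAI-compatible server as a Deployment: the inference
# framework (upper layer) and GPU allocation (lower layer) meet in
# one Pod template -- the emerging "interface layer".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest  # illustrative tag
        args: ["--model", "Qwen/Qwen2.5-7B-Instruct"]  # illustrative model
        ports:
        - containerPort: 8000  # vLLM's default HTTP port
        resources:
          limits:
            nvidia.com/gpu: 1  # or fractional resources under HAMi
```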

A Caveat: The AI Infra Startup Boom Hasn’t Really Begun
At the same time, I have a somewhat “counterintuitive” observation:
We haven’t yet seen a large wave of AI Infra (K8s-focused) startups.
Most of the companies I saw today fall into a few buckets:
- Pivots from CI/CD, Service Mesh, or Gateway products
- Traditional cloud vendors extending into AI
- Teams working on models, agents, or even lower-level tech
But startups truly focused on:
“Making AI workloads run better on Kubernetes”
are still rare at this layer.
This could mean two things:
1) This Layer Isn’t Fully Formed Yet
Currently, most activity is at:
- The model layer (LLM / foundation models)
- The application layer (Agent / Copilot)
But not at:
- The scheduling layer
- The resource layer
- The runtime layer
2) Or, the Barrier to Entry Is Very High
Because at its core, this is:
The intersection of Cloud Native × GPU × AI workload
It’s not just “wrapping AI,” but a fundamental re-architecture at the infrastructure level.
My Take
If we break down the AI technology stack:

    Agent / Application
            ↓
    LLM Serving (vLLM, etc.)
            ↓
    AI Runtime / Scheduling
            ↓
    GPU Resource Layer
            ↓
    Hardware
Most innovation today is concentrated in:
- The top two layers (Agent / LLM)
But the real long-term moat lies in:
- The middle two layers (Runtime + Resource Layer)
And Kubernetes is very likely to remain:
The default platform for this middle layer
Summary
Today’s takeaway:
Kubernetes is not obsolete; it’s being redefined.
And our generation is shifting from:
“Cloud Native Builders”
to:
“AI Infrastructure Builders”
More to come tomorrow.
