The second half of cloud native isn’t about being replaced by AI but about being rewritten by it. The future of platform engineering will revolve around models and agents, reshaping both the tech stack and the developer experience.
Since I first encountered Docker and Kubernetes in 2015, I’ve followed the cloud native journey: from writing Deployments in YAML, to exploring Service Mesh and observability, and in recent years, focusing on AI Infra and AI Native platforms. Looking back from 2025, the years 2015–2025 can be seen as the “first half” of cloud native. With KubeCon / CloudNativeCon NA 2025 as the marker, the industry is collectively entering the “second half”: the era of AI Native platform engineering.
This article reviews the past decade of cloud native and, drawing on KubeCon NA 2025, outlines the key turning points and technical coordinates for the next ten years.
2015–2025: The “First Half” of Cloud Native
Over the past decade, cloud native technology themes have evolved through three main stages, outlined below.
The first stage, 2015–2017, focused on containerization and orchestration standardization.
- Docker realized the engineering dream of “build once, run anywhere”
- Kubernetes won the orchestration wars and became the de facto standard
- CNCF was founded, with Prometheus, Envoy, and other projects joining
- Enterprises focused on migrating applications to Kubernetes
Typical tasks during this phase involved moving Java services from VMs onto containers and Kubernetes, with an emphasis on understanding Deployments, Services, and Ingress.
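As a concrete reminder of what “typical” meant then, here is a minimal Deployment plus Service sketch; the service name, image, and ports are placeholders:

```yaml
# Minimal stage-one workload: a stateless service on Kubernetes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
        - name: orders
          image: registry.example.com/orders:1.0.0 # placeholder image
          ports:
            - containerPort: 8080
---
# ClusterIP Service fronting the Deployment.
apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector:
    app: orders
  ports:
    - port: 80
      targetPort: 8080
```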
The second stage, 2018–2020, saw complexity shift from “deployment” to “communication” and “operations”.
- Service Mesh (Istio / Linkerd / Consul) addressed east-west traffic management
- The observability trio (Logs / Metrics / Traces) became a default expectation
- Multi-cluster and multi-region practices matured
- Enterprises focused on managing large microservice systems
During this period, I spent significant time researching Istio, service mesh, and traffic management, and authored books on Kubernetes and Istio. The focus shifted to system stability, observability, and reliability.
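A taste of stage-two traffic management: an Istio VirtualService splitting traffic between two versions of a service. A minimal sketch; the host and subset names are placeholders, and the subsets themselves would be defined in a companion DestinationRule:

```yaml
# Route 90% of traffic to v1 and 10% to v2 of the reviews service.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
          weight: 90
        - destination:
            host: reviews
            subset: v2
          weight: 10
```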
The third stage, 2021–2025, is defined by Platform Engineering and GitOps.
As microservices and tools proliferated, platform complexity began to overwhelm developers, making Platform Engineering a key industry term.
- GitOps (Argo CD / Flux) drove declarative delivery processes (see the sketch after this list)
- Internal Developer Platforms (IDPs) became a priority for large enterprises
- “Platform as a product” philosophy spread
- FinOps, cost management, and compliance auditing became platform concerns
- DevOps evolved from “tool practice” to “organizational + platform capability”
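To make “declarative delivery” concrete, here is a minimal Argo CD Application sketch; the repository URL, path, and destination namespace are placeholders:

```yaml
# Argo CD watches the Git repo and keeps the cluster in sync with it.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-configs.git # placeholder repo
    targetRevision: main
    path: apps/my-service
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true    # delete resources removed from Git
      selfHeal: true # revert manual drift in the cluster
```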
My takeaway: simply giving developers a pile of tools isn’t enough. End-to-end delivery paths and stable abstraction layers are needed so developers can focus on business, not tool integration.
The table below summarizes the main features of each “first half” stage.
| Stage | Core Challenge | Key Tech Stack | Typical Issues |
|---|---|---|---|
| 2015–2017 Orchestration | Migrating from VM to containers | Docker, Kubernetes, CNI | Reliable deployment, rolling upgrades |
| 2018–2020 Mesh | Microservice scale, complex communication & observability | Istio/Linkerd, Prometheus, Jaeger | Troubleshooting, fragmented observability |
| 2021–2025 Platform | Tool sprawl, declining developer experience | GitOps, IDP, FinOps, Policy-as-Code | Developer fatigue, platform team overload |
KubeCon NA 2025: Signals of Cloud Native’s “Second Half”
The main theme of KubeCon NA 2025 is no longer “how to use Kubernetes well,” but how to reconstruct Kubernetes and the cloud native ecosystem into AI Native platforms for the AI era.
Key signals from KubeCon NA 2025 include:
- CNCF released the Certified Kubernetes AI Conformance Program
- Dynamic Resource Allocation (DRA) entered mainstream discussions
- Model Runtime / Agent Runtime projects became conference hotspots
- Vendors focused on AI SRE, AI-assisted development, AI security, and supply chain governance
- Speakers like Alex Zenla openly stated that Kubernetes’ underlying structure needs rethinking
Together, these mark a clear dividing line: cloud native has officially entered its “second half.”
First Half vs Second Half: Shifting the Cloud Native Narrative
If 2015–2025 was the “first half,” then 2025–2035 is likely the “second half.” The table below compares their core differences across platform objects, goals, abstraction layers, and more.
| Dimension | First Half (2015–2025) | Second Half (2025–2035, AI Native) |
|---|---|---|
| Core Objects | Containers, Pods, Microservices | Models, inference tasks, Agents, data pipelines |
| Platform Goals | Stable application delivery | Efficient, continuous AI workload & agent orchestration |
| Abstraction Layers | Deployment / Service / Ingress / Job | Model / Endpoint / Graph / Policy / Agent |
| Resource Scheduling | CPU / Memory / Node | GPU / TPU / ASIC / KV Cache / Bandwidth / Power |
| Engineering Focus | DevOps / GitOps / Platform Engineering 1.0 | AI Native Platform Engineering / AI SRE |
| Security & Compliance | Image security, CVE, supply chain SBOM | Model security, data security, AI supply chain & “hallucination dependencies” |
| Runtime Forms | Container + VM + Serverless | Container + WASM + Nix + Agent Runtime |
From a developer’s perspective, the most direct change is: future platforms will no longer treat “services” as first-class citizens, but will center on “models + agents.”
Example: Technical Layers of an AI Native Platform
To clarify the structure of an AI Native platform, it helps to think in three layers: L0 is Kubernetes and the underlying infrastructure, L1 is the Model Runtime, Agent Runtime, and heterogeneous resource scheduling layer, and L2 is the platform engineering layer on top. Historically, cloud native focused on L0 + L2 (Kubernetes + platform engineering), but in the AI Native era, L1 becomes the new battleground.
Key Change 1: From “Container-Centric” to “Model-Centric”
In the first half, cloud native’s main object was the application process, with containers as packaging. The second half requires handling:
- Model version management and canary releases
- Balancing inference performance, latency, and cost
- Multi-model composition, routing, A/B testing
- Relationships between models, data, features, and vector indexes
At KubeCon NA 2025, CNCF’s AI Conformance Program signaled an intent to standardize model workloads so they can be managed as routinely as Deployments. Platform engineering will gain new abstractions: not just “deploying services,” but “deploying model capabilities.”
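One existing shape for such an abstraction is KServe’s InferenceService, which already treats a model as a declarative object with built-in canary rollout. A minimal sketch, assuming a recent KServe release with the Hugging Face serving runtime installed; the model name and storage URI are placeholders:

```yaml
# Declare a model endpoint; the platform handles revisions and traffic shifting.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: chat-model
spec:
  predictor:
    canaryTrafficPercent: 10 # send 10% of traffic to the newest revision
    model:
      modelFormat:
        name: huggingface
      storageUri: s3://models/chat-model/v2 # placeholder model location
```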
Key Change 2: DRA and the Golden Window for Heterogeneous Resource Scheduling
Previously, writing a Deployment meant focusing on CPU and memory. Now, GPU inference, training, and Agent Runtime scenarios demand more than static quotas.
Dynamic Resource Allocation (DRA) brings (see the sketch after this list):
- Pluggable resource types (GPU/TPU/FPGA/ASIC)
- Topology-aware, NUMA, and memory fragmentation scheduling
- Binding inference requests to compute allocation for fine-grained QoS
- Cost optimization and power control in scheduling decisions
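A shape sketch of what this looks like in manifests, assuming the resource.k8s.io/v1beta1 API (Kubernetes v1.32) and a vendor-provided DRA driver; the device class name and image are placeholders, and field names may still shift between versions:

```yaml
# Claim template: each Pod referencing it gets its own device allocation.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
        - name: gpu
          deviceClassName: gpu.nvidia.com # published by the vendor's DRA driver
---
# The Pod references the claim instead of a static resource quota.
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  resourceClaims:
    - name: gpu
      resourceClaimTemplateName: single-gpu
  containers:
    - name: server
      image: registry.example.com/llm-server:latest # placeholder image
      resources:
        claims:
          - name: gpu # bind the container to the claimed device
```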
This is the most significant “resource perspective” upgrade since Kubernetes’ inception. The scheduler is no longer just a cluster component, but the AI platform’s policy engine.
Key Change 3: Agent Runtime as the New Generation of Runtime
KubeCon showcased several representative projects:
- Edera: Minimal, verifiable runtime redesign
- Flox: Nix-based “uncontained” runtime environment
- Golem: WASM-based large-scale agent orchestration
The consensus: AI agents aren’t suited to traditional container runtime models. Agents have these traits:
- Strong statefulness: context, memory, sessions
- High concurrency but fine granularity: massive lightweight tasks
- Extremely sensitive to latency and cold starts
- Need to resume after failure
Next-gen runtimes focus on reliably executing, tracking the state of, and auditing “hundreds of thousands of agents,” not just “spinning up more Pods.”
Key Change 4: AI SRE and AI Security
At KubeCon NA 2025, security and operations topics were amplified by AI:
- Software supply chain attacks and CVEs continue to rise
- LLM-assisted coding introduces “hallucination dependencies” and “vibecoded vulnerabilities”
- AI-driven artifact scanning, dependency auditing, and license analysis
- “AI SRE” is now a formal product category
Traditional cloud native already emphasized security and SRE, but now must address model weights, datasets, vector stores, and agent workflows. AI Native platform engineering must answer:
- Are code and dependencies secure?
- Are models and data trustworthy?
- Are agent behaviors controllable?
This will drive deep integration of Policy-as-Code, MCP (Model Context Protocol), graph permission systems, and AI.
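As one illustration of the Policy-as-Code side, here is a minimal Kyverno sketch that restricts workloads to an approved registry; the policy name and registry are placeholders, and the same mechanism extends naturally to model images and other AI supply chain rules:

```yaml
# Reject Pods whose images come from outside the approved registry.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: allowed-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from the approved registry."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*" # placeholder registry prefix
```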
Key Change 5: Open Source Participation Becomes a Baseline
In interviews, platform engineering leaders noted:
- Hiring increasingly values upstream contributions to Kubernetes and related projects
- Open source involvement shortens ramp-up time
- New AI Native projects (Model Runtime, Agent Runtime, Scheduler) are also open source
For career growth, contributing to AI Native open source projects will become a basic requirement for platform engineering and AI Infra roles—not just a resume bonus.
The Contours of Cloud Native’s “Second Half”
The table below summarizes the technical focus of the “second half” and how it differs in essence from the first, marking the key coordinates of AI Native platform engineering.
| Direction | Technical Focus | Essential Difference from First Half |
|---|---|---|
| AI Native Platform | Models/Agents as first-class citizens, unified abstraction & governance | Objects shift from services to models & inference |
| Resource Scheduling | DRA, heterogeneous compute, topology awareness, power & cost | From static quotas to dynamic, policy-driven |
| Runtime | Container + WASM + Nix + Agent Runtime | From “process containerization” to “execution graph containerization” |
| Platform Engineering | IDP + AI SRE + Security + Cost + Compliance | From toolset to “autonomous platform” |
| Security & Supply Chain | LLM dependencies, model weights, datasets, vector store governance | Protection expands from images to “all AI engineering assets” |
| Open Source & Ecosystem | AI Infra / Model Runtime / Agent Runtime upstream collaboration | Not just “using open source,” but “building the future in open source” |
Summary
Over the past decade, cloud native evolved from container orchestration to Platform Engineering 1.0. With KubeCon NA 2025 as a milestone, the industry is now systematically bringing AI into its technology and organizational stacks:
- Kubernetes is no longer just “infrastructure for microservices,” but “runtime for AI workloads”
- Platform Engineering is no longer just “tool integration,” but “autonomous platforms for models and agents”
- Security, SRE, runtime, scheduling, and networking will all be reimagined under AI
For me, the past ten years were about “making applications more stable in the cloud native world.” The next ten will focus on “making AI better, safer, and more controllable in the cloud native world.” This is, in my view, the opening whistle for cloud native’s “second half.”