The Second Half of Cloud Native: The Era of AI Native Platform Engineering Has Arrived

The second half of cloud native isn’t about being replaced by AI but about being rewritten by it. The future of platform engineering will revolve around models and agents, reshaping the tech stack and developer experience.

Since I first encountered Docker and Kubernetes in 2015, I’ve followed the cloud native journey: from writing Deployments in YAML, to exploring Service Mesh and observability, to focusing in recent years on AI Infra and AI Native platforms. Looking back from 2025, the decade from 2015 to 2025 can be seen as the “first half” of cloud native. With KubeCon / CloudNativeCon NA 2025 as the marker, the industry is collectively entering the “second half”: the era of AI Native platform engineering.

This article reviews the past decade of cloud native and, drawing on KubeCon NA 2025, outlines the key turning points and technical coordinates for the next ten years.

2015–2025: The “First Half” of Cloud Native

Over the past decade, cloud native technology themes have evolved through three main stages. The following flowchart illustrates the progression.

Figure 1: Cloud Native Decade Technology Evolution Flow

The first stage, 2015–2017, focused on containerization and orchestration standardization.

Docker realized the engineering dream of “build once, run anywhere”
Kubernetes won the orchestration wars and became the de facto standard
CNCF was founded, with Prometheus, Envoy, and other projects joining
Enterprises focused on migrating applications to Kubernetes

Typical tasks during this phase involved moving Java services from VMs to containers and K8s, emphasizing understanding of Deployment, Service, and Ingress.

The second stage, 2018–2020, saw complexity shift from “deployment” to “communication” and “operations”.

Service Mesh (Istio / Linkerd / Consul) addressed east-west traffic management
The observability trio (Logs / Metrics / Traces) became default configurations
Multi-cluster and multi-region practices matured
Enterprises focused on managing large microservice systems

During this period, I spent significant time researching Istio, service mesh, and traffic management, and wrote books on Kubernetes and Istio. The focus shifted to system stability, observability, and reliability.

The third stage, 2021–2025, is defined by Platform Engineering and GitOps.

As microservices and tools proliferated, platform complexity began to overwhelm developers, making Platform Engineering a key industry term.

GitOps (Argo CD / Flux) drove declarative delivery processes
Internal Developer Platforms (IDP) became priorities for large enterprises
“Platform as a product” philosophy spread
FinOps, cost management, and compliance auditing became platform concerns
DevOps evolved from “tool practice” to “organizational + platform capability”

My takeaway: simply giving developers a pile of tools isn’t enough. End-to-end delivery paths and stable abstraction layers are needed so developers can focus on business, not tool integration.

The table below summarizes the main features of each “first half” stage.

Stage	Core Challenge	Key Tech Stack	Typical Issues
2015–2017 Orchestration	Migrating from VM to containers	Docker, Kubernetes, CNI	Reliable deployment, rolling upgrades
2018–2020 Mesh	Microservice scale, complex communication & observability	Istio/Linkerd, Prometheus, Jaeger	Troubleshooting, fragmented observability
2021–2025 Platform	Tool sprawl, declining developer experience	GitOps, IDP, FinOps, Policy-as-Code	Developer fatigue, platform team overload

Table 1: Cloud Native First Half Stage Features

KubeCon NA 2025: Signals of Cloud Native’s “Second Half”

The main theme of KubeCon NA 2025 was no longer “how to use Kubernetes well,” but how to rebuild Kubernetes and the cloud native ecosystem into platforms for the AI era.

Key signals from KubeCon NA 2025 include:

CNCF released the Certified Kubernetes AI Conformance Program
Dynamic Resource Allocation (DRA) entered mainstream discussions
Model Runtime / Agent Runtime projects became conference hotspots
Vendors focused on AI SRE, AI-assisted development, AI security, and supply chain governance
Speakers like Alex Zenla openly stated that Kubernetes’ underlying structure needs rethinking

Together, these mark a clear dividing line: cloud native has officially entered its “second half.”

First Half vs Second Half: Shifting the Cloud Native Narrative

If 2015–2025 is the “first half,” then 2025–2035 is likely the “second half.” The table below compares their core differences.

It highlights changes in platform objects, goals, abstraction layers, and more.

Dimension	First Half (2015–2025)	Second Half (2025–2035, AI Native)
Core Objects	Containers, Pods, Microservices	Models, inference tasks, Agents, data pipelines
Platform Goals	Stable application delivery	Efficient, continuous AI workload & agent orchestration
Abstraction Layers	Deployment / Service / Ingress / Job	Model / Endpoint / Graph / Policy / Agent
Resource Scheduling	CPU / Memory / Node	GPU / TPU / ASIC / KV Cache / Bandwidth / Power
Engineering Focus	DevOps / GitOps / Platform Engineering 1.0	AI Native Platform Engineering / AI SRE
Security & Compliance	Image security, CVE, supply chain SBOM	Model security, data security, AI supply chain & “hallucination dependencies”
Runtime Forms	Container + VM + Serverless	Container + WASM + Nix + Agent Runtime

Table 2: Core Differences: First vs Second Half of Cloud Native

From a developer’s perspective, the most direct change is: future platforms will no longer treat “services” as first-class citizens, but will center on “models + agents.”

Example: Technical Layers of an AI Native Platform

To clarify the structure of an AI Native platform, the following layered diagram shows the relationships between technical levels.

Figure 2: AI Native Platform Layering Diagram

Historically, cloud native focused on L0 + L2 (Kubernetes + platform engineering), but in the AI Native era, L1 (Model Runtime, Agent Runtime, heterogeneous resource scheduling) becomes the new battleground.

Key Change 1: From “Container-Centric” to “Model-Centric”

In the first half, cloud native’s main object was the application process, with containers as packaging. The second half requires handling:

Model version management and canary releases
Balancing inference performance, latency, and cost
Multi-model composition, routing, A/B testing
Relationships between models, data, features, and vector indexes

At KubeCon NA 2025, CNCF’s AI Conformance Program signaled an effort to standardize how model workloads run on Kubernetes, so that they can be managed much like Deployments. Platform engineering will gain new abstractions: not just “deploying services,” but “deploying model capabilities.”
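To make “model version management and canary releases” concrete, here is a minimal Python sketch of weighted routing between two model versions. It is an illustration under my own assumptions, not part of any CNCF program or Kubernetes API: the names ModelVersion and pick_version, the model names, and the 90/10 split are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str    # hypothetical model identifier, e.g. "chat-v1"
    weight: int  # relative share of traffic


def pick_version(request_id: str, versions: list[ModelVersion]) -> str:
    """Deterministically map a request to a model version by weight."""
    total = sum(v.weight for v in versions)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the request ID so the same ID always lands on the same version.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for v in versions:
        upper += v.weight
        if bucket < upper:
            return v.name
    raise AssertionError("unreachable: bucket is always below total")


if __name__ == "__main__":
    # Assumed rollout: 90% of requests stay on v1, 10% go to the canary.
    rollout = [ModelVersion("chat-v1", 90), ModelVersion("chat-v2", 10)]
    counts = {"chat-v1": 0, "chat-v2": 0}
    for i in range(10_000):
        counts[pick_version(f"req-{i}", rollout)] += 1
    print(counts)
```

Hashing the request ID, rather than drawing a random number, keeps routing sticky: a retried request reaches the same model version, which makes canary metrics easier to compare.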

Key Change 2: DRA and the Golden Window for Heterogeneous Resource Scheduling

Previously, writing a Deployment meant focusing on CPU and memory. Now, GPU inference, training, and Agent Runtime scenarios demand more than static quotas.

Dynamic Resource Allocation (DRA) brings:

Pluggable resource types (GPU/TPU/FPGA/ASIC)
Topology-aware scheduling that accounts for NUMA locality and memory fragmentation
Binding inference requests to compute allocation for fine-grained QoS
Cost optimization and power control in scheduling decisions

This is the most significant “resource perspective” upgrade since Kubernetes’ inception. The scheduler is no longer just a cluster component, but the AI platform’s policy engine.
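The idea of the scheduler as a policy engine can be sketched in a few lines of Python. This is a toy model, not the Kubernetes DRA API: Device, Claim, score, and allocate are hypothetical names, and the weights for power, locality, and fit are arbitrary assumptions chosen only to show how cost, power, topology, and fragmentation can enter one decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    name: str
    kind: str             # e.g. "gpu" or "tpu"
    numa_node: int
    free_mem_gib: int
    watts: float          # assumed power draw under load
    cost_per_hour: float  # assumed price, arbitrary currency


@dataclass(frozen=True)
class Claim:
    kind: str
    mem_gib: int
    preferred_numa: Optional[int] = None


def score(device: Device, claim: Claim, power_weight: float = 0.01) -> float:
    """Lower is better: cost plus a power penalty, minus a locality bonus."""
    s = device.cost_per_hour + power_weight * device.watts
    if claim.preferred_numa is not None and device.numa_node == claim.preferred_numa:
        s -= 1.0  # hypothetical bonus for NUMA locality
    # Penalize leftover memory so tight fits win and fragmentation stays low.
    s += 0.05 * (device.free_mem_gib - claim.mem_gib)
    return s


def allocate(devices: list[Device], claim: Claim) -> Optional[Device]:
    """Filter devices that can satisfy the claim, then pick the best score."""
    feasible = [
        d for d in devices
        if d.kind == claim.kind and d.free_mem_gib >= claim.mem_gib
    ]
    if not feasible:
        return None
    # Break score ties by name so the result is deterministic.
    return min(feasible, key=lambda d: (score(d, claim), d.name))


if __name__ == "__main__":
    devices = [
        Device("gpu-a", "gpu", 0, 80, 700.0, 4.0),
        Device("gpu-b", "gpu", 1, 24, 300.0, 2.0),
        Device("tpu-a", "tpu", 0, 32, 200.0, 3.0),
    ]
    chosen = allocate(devices, Claim("gpu", 24, preferred_numa=1))
    print(chosen.name if chosen else "unschedulable")  # prints "gpu-b"
```

The filter-then-score split mirrors how schedulers are commonly structured: hard constraints decide feasibility, and policy (cost, power, locality) only ranks what is feasible.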

Key Change 3: Agent Runtime as the New Generation of Runtime

KubeCon showcased several representative projects:

Edera: Minimal, verifiable runtime redesign
Flox: Nix-based “uncontained” runtime environment
Golem: WASM-based large-scale agent orchestration

The consensus: AI agents aren’t suited to traditional container runtime models. Agents have these traits:

Strong statefulness: context, memory, sessions
High concurrency but fine granularity: massive numbers of lightweight tasks
Extremely sensitive to latency and cold starts
Need to resume after failure

Next-gen runtimes focus on reliably executing “hundreds of thousands of agents,” managing their state, and auditing their behavior, not just “spinning up more Pods.”
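The “stateful” and “resume after failure” traits can be illustrated with a small Python sketch. It does not reflect how Edera, Flox, or Golem work internally; CheckpointedAgent, its JSON file layout, and the fail_at parameter are hypothetical and exist only to show checkpoint-and-resume in its simplest form.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class CheckpointedAgent:
    """Toy agent that persists its state after every completed step."""

    def __init__(self, agent_id: str, store: Path):
        self.path = store / f"{agent_id}.json"
        if self.path.exists():
            # Resume: reload memory and the index of the next step to run.
            self.state = json.loads(self.path.read_text())
        else:
            self.state = {"next_step": 0, "memory": []}

    def _save(self) -> None:
        # Write to a temp file, then rename over the checkpoint, so a crash
        # mid-write does not leave a truncated checkpoint behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state))
        tmp.replace(self.path)

    def run(self, steps: list[str], fail_at: Optional[int] = None) -> list[str]:
        """Run remaining steps; fail_at simulates a crash before that step."""
        for i in range(self.state["next_step"], len(steps)):
            if i == fail_at:
                raise RuntimeError(f"simulated crash before step {i}")
            self.state["memory"].append(f"done:{steps[i]}")
            self.state["next_step"] = i + 1
            self._save()
        return self.state["memory"]


if __name__ == "__main__":
    steps = ["plan", "search", "summarize"]
    with tempfile.TemporaryDirectory() as d:
        store = Path(d)
        try:
            CheckpointedAgent("agent-1", store).run(steps, fail_at=2)
        except RuntimeError as err:
            print("crashed:", err)
        # A fresh instance reloads the checkpoint and skips finished steps.
        print(CheckpointedAgent("agent-1", store).run(steps))
```

A real agent runtime would also have to handle concurrent writers, non-idempotent steps, and audit logs; the sketch only shows why per-step state persistence matters for agents in a way it rarely does for stateless Pods.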

Key Change 4: AI SRE and AI Security

At KubeCon NA 2025, security and operations topics were amplified by AI:

Software supply chain attacks and CVEs continue to rise
LLM-assisted coding introduces “hallucination dependencies” and “vibecoded vulnerabilities”
AI-driven artifact scanning, dependency auditing, and license analysis
“AI SRE” is now a formal product category

Traditional cloud native already emphasized security and SRE, but now must address model weights, datasets, vector stores, and agent workflows. AI Native platform engineering must answer:

Are code and dependencies secure?
Are models and data trustworthy?
Are agent behaviors controllable?

This will drive deep integration of Policy-as-Code, MCP, graph-based permission systems, and AI.
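As one small example of what a policy check against “hallucination dependencies” might look like, the Python sketch below audits a list of requested package names against an approved set. The function audit_dependencies, the allowlist contents, and the 0.8 similarity cutoff are my own assumptions for illustration; a production policy would consult a real registry and a real policy engine.

```python
import difflib


def audit_dependencies(requested: list[str], approved: set[str]) -> dict[str, list[str]]:
    """Sort requested package names into allowed, suspicious, and unknown."""
    report: dict[str, list[str]] = {"allowed": [], "suspicious": [], "unknown": []}
    candidates = sorted(approved)
    for name in requested:
        key = name.strip().lower()
        if key in approved:
            report["allowed"].append(key)
        elif difflib.get_close_matches(key, candidates, n=1, cutoff=0.8):
            # Near-miss of an approved name: possible typo or squatted package.
            report["suspicious"].append(key)
        else:
            # Not close to anything approved: possibly an invented package.
            report["unknown"].append(key)
    return report


if __name__ == "__main__":
    # Assumed allowlist; in practice this would come from a curated registry.
    approved = {"requests", "numpy", "pyyaml"}
    # "reqeusts" and "fastjsonx" are made-up names for the demo.
    print(audit_dependencies(["requests", "reqeusts", "fastjsonx"], approved))
```

Separating “suspicious” from “unknown” lets a platform apply different policies: block near-misses outright, and route genuinely unknown names to human review.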

Key Change 5: Open Source Participation Becomes a Baseline

In interviews, platform engineering leaders noted:

Hiring increasingly values upstream contributions to Kubernetes and related projects
Open source involvement shortens ramp-up time
New AI Native projects (Model Runtime, Agent Runtime, Scheduler) are also open source

For career growth, contributing to AI Native open source projects will become a basic requirement for platform engineering and AI Infra roles—not just a resume bonus.

The Contours of Cloud Native’s “Second Half”

The table below summarizes the technical focus and essential differences of the “second half.”

It highlights the key coordinates of AI Native platform engineering.

Direction	Technical Focus	Essential Difference from First Half
AI Native Platform	Models/Agents as first-class citizens, unified abstraction & governance	Objects shift from services to models & inference
Resource Scheduling	DRA, heterogeneous compute, topology awareness, power & cost	From static quotas to dynamic, policy-driven
Runtime	Container + WASM + Nix + Agent Runtime	From “process containerization” to “execution graph containerization”
Platform Engineering	IDP + AI SRE + Security + Cost + Compliance	From toolset to “autonomous platform”
Security & Supply Chain	LLM dependencies, model weights, datasets, vector store governance	Protection expands from images to “all AI engineering assets”
Open Source & Ecosystem	AI Infra / Model Runtime / Agent Runtime upstream collaboration	Not just “using open source,” but “building the future in open source”

Table 3: Cloud Native Second Half Technical Coordinates

Summary

Over the past decade, cloud native evolved from container orchestration to platform engineering 1.0. With KubeCon NA 2025 as a milestone, the industry is now systematically bringing AI into its cloud native technology and organizational stacks:

Kubernetes is no longer just “infrastructure for microservices,” but “runtime for AI workloads”
Platform Engineering is no longer just “tool integration,” but “autonomous platforms for models and agents”
Security, SRE, runtime, scheduling, and networking will all be reimagined under AI

For me, the past ten years were about “making applications more stable in the cloud native world.” The next ten will focus on “making AI better, safer, and more controllable in the cloud native world.” This is, in my view, the opening whistle for cloud native’s “second half.”