
The Second Half of Cloud Native: The Era of AI Native Platform Engineering Has Arrived

A review of a decade of cloud native evolution and a look ahead at AI Native platform engineering: its technical layers, key changes, and why KubeCon NA 2025 signals a new era.

The second half of cloud native isn’t about being replaced by AI, but being rewritten by it. The future of platform engineering will revolve around models and agents, reshaping the tech stack and developer experience.

Since I first encountered Docker and Kubernetes in 2015, I’ve followed the cloud native journey: from writing Deployments in YAML, to exploring Service Mesh and observability, and in recent years, focusing on AI Infra and AI Native platforms. Looking back from 2025, the years 2015–2025 can be seen as the “first half” of cloud native. Marked by KubeCon + CloudNativeCon NA 2025, the industry is collectively entering the “second half”: the era of AI Native platform engineering.

This article reviews the past decade of cloud native, and, combined with KubeCon NA 2025, outlines key turning points and the technical coordinates for the next ten years.

2015–2025: The “First Half” of Cloud Native

Over the past decade, cloud native technology themes have evolved through three main stages. The following flowchart illustrates the progression.

Figure 1: Cloud Native Decade Technology Evolution Flow

The first stage, 2015–2017, focused on containerization and orchestration standardization.

  • Docker realized the engineering dream of “build once, run anywhere”
  • Kubernetes won the orchestration wars and became the de facto standard
  • CNCF was founded, with Prometheus, Envoy, and other projects joining
  • Enterprises focused on migrating applications to Kubernetes

Typical tasks during this phase involved moving Java services from VMs to containers and K8s, emphasizing understanding of Deployment, Service, and Ingress.
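For readers who joined later, a minimal manifest pair of the kind we wrote constantly in that era looks roughly like this (names, image, and ports are placeholders):

```yaml
# Minimal Deployment + Service of the 2015–2017 era; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders
  template:
    metadata:
      labels:
        app: orders
    spec:
      containers:
      - name: orders
        image: registry.example.com/orders:1.0.0   # placeholder image
        ports:
        - containerPort: 8080
---
# ClusterIP Service exposing the Deployment inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: orders
spec:
  selector:
    app: orders
  ports:
  - port: 80
    targetPort: 8080
```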

The second stage, 2018–2020, saw complexity shift from “deployment” to “communication” and “operations”.

  • Service Mesh (Istio / Linkerd / Consul) addressed east-west traffic management
  • The observability trio (Logs / Metrics / Traces) became default configurations
  • Multi-cluster and multi-region practices matured
  • Enterprises focused on managing large microservice systems

During this period, I spent significant time researching Istio, service mesh, and traffic management, and authored Kubernetes and Istio books. The focus shifted to system stability, observability, and reliability.

The third stage, 2021–2025, is defined by Platform Engineering and GitOps.

As microservices and tools proliferated, platform complexity began to overwhelm developers, making Platform Engineering a key industry term.

  • GitOps (Argo CD / Flux) drove declarative delivery processes
  • Internal Developer Platforms (IDP) became priorities for large enterprises
  • “Platform as a product” philosophy spread
  • FinOps, cost management, and compliance auditing became platform concerns
  • DevOps evolved from “tool practice” to “organizational + platform capability”

My takeaway: simply giving developers a pile of tools isn’t enough. End-to-end delivery paths and stable abstraction layers are needed so developers can focus on business, not tool integration.
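To make the GitOps pattern concrete, here is a minimal Argo CD Application that keeps a cluster in sync with a Git path; the repository URL, path, and namespaces are placeholders:

```yaml
# Minimal Argo CD Application; repo URL, path, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/platform-config   # placeholder repo
    targetRevision: main
    path: apps/orders
  destination:
    server: https://kubernetes.default.svc
    namespace: orders
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```

With automated sync, Argo CD both prunes resources deleted from Git and reverts manual drift, which is what makes the delivery path declarative end to end.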

The table below summarizes the main features of each “first half” stage.

| Stage | Core Challenge | Key Tech Stack | Typical Issues |
|-------|----------------|----------------|----------------|
| 2015–2017: Orchestration | Migrating from VMs to containers | Docker, Kubernetes, CNI | Reliable deployment, rolling upgrades |
| 2018–2020: Mesh | Microservice scale, complex communication & observability | Istio/Linkerd, Prometheus, Jaeger | Troubleshooting, fragmented observability |
| 2021–2025: Platform | Tool sprawl, declining developer experience | GitOps, IDP, FinOps, Policy-as-Code | Developer fatigue, platform team overload |

Table 1: Cloud Native First Half Stage Features

KubeCon NA 2025: Signals of Cloud Native’s “Second Half”

The main theme of KubeCon NA 2025 was no longer “how to use Kubernetes well,” but how to reconstruct Kubernetes and the cloud native ecosystem into AI Native platforms for the AI era.

Key signals from KubeCon NA 2025 include:

  • CNCF released the Certified Kubernetes AI Conformance Program
  • Dynamic Resource Allocation (DRA) entered mainstream discussions
  • Model Runtime / Agent Runtime projects became conference hotspots
  • Vendors focused on AI SRE, AI-assisted development, AI security, and supply chain governance
  • Speakers like Alex Zenla openly stated that Kubernetes’ underlying structure needs rethinking

Together, these mark a clear dividing line: cloud native has officially entered its “second half.”

First Half vs Second Half: Shifting the Cloud Native Narrative

If 2015–2025 was the “first half,” then 2025–2035 will likely be the “second half.” The table below compares their core differences across platform objects, goals, abstraction layers, and more.

| Dimension | First Half (2015–2025) | Second Half (2025–2035, AI Native) |
|-----------|------------------------|-------------------------------------|
| Core objects | Containers, Pods, microservices | Models, inference tasks, agents, data pipelines |
| Platform goals | Stable application delivery | Efficient, continuous orchestration of AI workloads & agents |
| Abstraction layers | Deployment / Service / Ingress / Job | Model / Endpoint / Graph / Policy / Agent |
| Resource scheduling | CPU / memory / node | GPU / TPU / ASIC / KV cache / bandwidth / power |
| Engineering focus | DevOps / GitOps / Platform Engineering 1.0 | AI Native platform engineering / AI SRE |
| Security & compliance | Image security, CVEs, supply chain SBOM | Model security, data security, AI supply chain & “hallucination dependencies” |
| Runtime forms | Container + VM + Serverless | Container + WASM + Nix + Agent Runtime |

Table 2: Core Differences: First vs Second Half of Cloud Native

From a developer’s perspective, the most direct change is that future platforms will no longer treat “services” as first-class citizens, but will center on “models + agents.”

Example: Technical Layers of an AI Native Platform

To clarify the structure of an AI Native platform, the following layered diagram shows the relationships between technical levels.

Figure 2: AI Native Platform Layering Diagram

Historically, cloud native focused on L0 + L2 (Kubernetes + platform engineering), but in the AI Native era, L1 (Model Runtime, Agent Runtime, heterogeneous resource scheduling) becomes the new battleground.

Key Change 1: From “Container-Centric” to “Model-Centric”

In the first half, cloud native’s main object was the application process, with containers as packaging. The second half requires handling:

  • Model version management and canary releases
  • Balancing inference performance, latency, and cost
  • Multi-model composition, routing, A/B testing
  • Relationships between models, data, features, and vector indexes

At KubeCon NA 2025, CNCF introduced the AI Conformance Program, which aims to standardize model workloads so they can be managed as predictably as Deployments. Platform engineering will gain new abstractions: not just “deploying services,” but “deploying model capabilities.”
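To illustrate what “deploying model capabilities” could mean in practice, here is a purely hypothetical manifest; the Model kind, API group, and every field below are invented for this sketch and are not part of the conformance program or any shipping API:

```yaml
# Hypothetical "Model" resource; the API group, kind, and all fields are
# invented to illustrate a model-centric abstraction, not a real CNCF API.
apiVersion: serving.example.com/v1alpha1
kind: Model
metadata:
  name: support-assistant
spec:
  source: "hf://example-org/support-llm:v3"   # model artifact reference (placeholder)
  runtime: vllm                               # inference engine (placeholder)
  resources:
    gpu: 1
  rollout:
    strategy: Canary
    canaryWeight: 10      # route 10% of inference traffic to the new version
  routing:
    abTest:
      baseline: support-assistant-v2   # compare against the previous model version
```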

Key Change 2: DRA and the Golden Window for Heterogeneous Resource Scheduling

Previously, writing a Deployment meant focusing on CPU and memory. Now, GPU inference, training, and Agent Runtime scenarios demand more than static quotas.

Dynamic Resource Allocation (DRA) brings:

  • Pluggable resource types (GPU/TPU/FPGA/ASIC)
  • Topology-aware, NUMA, and memory fragmentation scheduling
  • Binding inference requests to compute allocation for fine-grained QoS
  • Cost optimization and power control in scheduling decisions

This is the most significant “resource perspective” upgrade since Kubernetes’ inception. The scheduler is no longer just a cluster component, but the AI platform’s policy engine.
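As a sketch of how this looks in practice, the manifests below request a GPU through a DRA claim instead of a static quota; the API version has changed across Kubernetes releases and the device class name comes from whichever DRA driver is installed, so treat the exact fields as illustrative:

```yaml
# DRA sketch: a ResourceClaimTemplate requesting one GPU, consumed by a Deployment.
# API version and device class name depend on the Kubernetes release and DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: inference-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com   # provided by the installed DRA driver
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: inference-gpu
      containers:
      - name: server
        image: registry.example.com/llm-server:latest   # placeholder image
        resources:
          claims:
          - name: gpu   # bind this container to the allocated device
```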

Key Change 3: Agent Runtime as the New Generation of Runtime

KubeCon showcased several representative projects:

  • Edera: Minimal, verifiable runtime redesign
  • Flox: Nix-based “uncontained” runtime environment
  • Golem: WASM-based large-scale agent orchestration

The consensus: AI agents aren’t suited to traditional container runtime models. Agents have these traits:

  • Strong statefulness: context, memory, sessions
  • High concurrency but fine granularity: massive lightweight tasks
  • Extremely sensitive to latency and cold starts
  • Need to resume after failure

Next-gen runtimes focus on reliably executing, managing state, and auditing “hundreds of thousands of agents,” not just “spinning up more Pods.”
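Since these projects do not yet share a common API, the manifest below is purely hypothetical: every kind, group, and field is invented, only to show the knobs an agent runtime needs that a Pod spec lacks (externalized state, checkpointing, cold-start mitigation, auditing):

```yaml
# Hypothetical agent workload spec; kind, group, and fields are invented
# to illustrate agent-runtime concerns, not any real project's API.
apiVersion: agents.example.com/v1alpha1
kind: AgentPool
metadata:
  name: support-agents
spec:
  maxConcurrentAgents: 100000   # massive lightweight tasks, not heavyweight Pods
  state:
    memoryBackend: "redis://agent-state.example.svc:6379"   # externalized context/sessions
    checkpointInterval: 30s                                 # enables resume after failure
  coldStart:
    snapshot: true     # restore from a pre-warmed snapshot to cut startup latency
  audit:
    logDecisions: true # record agent actions for later review
```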

Key Change 4: AI SRE and AI Security

At KubeCon NA 2025, security and operations topics were amplified by AI:

  • Software supply chain attacks and CVEs continue to rise
  • LLM-assisted coding introduces “hallucination dependencies” and “vibecoded vulnerabilities”
  • AI-driven artifact scanning, dependency auditing, and license analysis
  • “AI SRE” is now a formal product category

Traditional cloud native already emphasized security and SRE, but now must address model weights, datasets, vector stores, and agent workflows. AI Native platform engineering must answer:

  1. Are code and dependencies secure?
  2. Are models and data trustworthy?
  3. Are agent behaviors controllable?

This will drive deep integration of Policy-as-Code, MCP, graph permission systems, and AI.
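One plausible shape for that integration, sketched here with Kyverno (the label and annotation keys are made-up conventions, not an established standard): a Policy-as-Code rule that rejects model-serving Pods lacking a declared, verified model source.

```yaml
# Policy-as-Code sketch using Kyverno; the label and annotation keys below
# are hypothetical conventions invented for this example.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-verified-model-source
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-model-source
    match:
      any:
      - resources:
          kinds:
          - Pod
          selector:
            matchLabels:
              workload-type: model-serving   # hypothetical label convention
    validate:
      message: "Model-serving Pods must declare a verified model source."
      pattern:
        metadata:
          annotations:
            models.example.com/verified-source: "?*"   # any non-empty value
```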

Key Change 5: Open Source Participation Becomes a Baseline

In interviews, platform engineering leaders noted:

  • Hiring increasingly values upstream contributions to Kubernetes and related projects
  • Open source involvement shortens ramp-up time
  • New AI Native projects (Model Runtime, Agent Runtime, Scheduler) are also open source

For career growth, contributing to AI Native open source projects will become a basic requirement for platform engineering and AI Infra roles, not just a resume bonus.

The Contours of Cloud Native’s “Second Half”

The table below summarizes the technical focus of the “second half” and how each direction differs in essence from the first half, mapping the key coordinates of AI Native platform engineering.

| Direction | Technical Focus | Essential Difference from First Half |
|-----------|-----------------|--------------------------------------|
| AI Native platform | Models/agents as first-class citizens; unified abstraction & governance | Objects shift from services to models & inference |
| Resource scheduling | DRA, heterogeneous compute, topology awareness, power & cost | From static quotas to dynamic, policy-driven scheduling |
| Runtime | Container + WASM + Nix + Agent Runtime | From “process containerization” to “execution graph containerization” |
| Platform engineering | IDP + AI SRE + security + cost + compliance | From toolset to “autonomous platform” |
| Security & supply chain | LLM dependencies, model weights, datasets, vector store governance | Protection expands from images to “all AI engineering assets” |
| Open source & ecosystem | AI Infra / Model Runtime / Agent Runtime upstream collaboration | Not just “using open source,” but “building the future in open source” |

Table 3: Cloud Native Second Half Technical Coordinates

Summary

Over the past decade, cloud native evolved from container orchestration to platform engineering 1.0. With KubeCon NA 2025 as a milestone, the industry is now systematically bringing AI into the cloud native technology and organizational stack:

  • Kubernetes is no longer just “infrastructure for microservices,” but “runtime for AI workloads”
  • Platform Engineering is no longer just “tool integration,” but “autonomous platforms for models and agents”
  • Security, SRE, runtime, scheduling, and networking will all be reimagined under AI

For me, the past ten years were about “making applications more stable in the cloud native world.” The next ten will focus on “making AI better, safer, and more controllable in the cloud native world.” This is, in my view, the opening whistle for cloud native’s “second half.”
