Why Start with Compute Governance, Not API Design

Compute and governance boundaries are the true foundation of AI-native infrastructure architecture.

The previous chapter presented a “Three Planes + One Closed Loop” reference architecture. This chapter focuses on a core CTO/CEO-level question:

How should AI-native infrastructure be layered? What belongs in the “control plane” of APIs/Agents, what belongs in the “execution plane” of runtime, and what must be pushed down to the “governance plane (compute and economic constraints)”?

This question is critical because, over the past year, many platform companies “pivoting to AI” have fallen into a common trap: treating AI as a change in API shape rather than a change in system constraints. When a system shifts from serving requests to executing model behavior (multi-step Agent actions with side effects), what determines system boundaries is usually not the elegance of the API design but whether compute, context, and economic constraints are institutionalized as enforceable governance boundaries.

The core argument of this chapter can be summarized as:

AI-native infrastructure must be designed starting from “Consequence” rather than stacking capabilities from “Intent”; the control plane is responsible for expressing intent, but the governance plane is responsible for bounding consequences.

The Purpose of Layering: Engineering the Binding Between “Intent” and “Resource Consequences”

In AI-native infrastructure, mechanisms like MCP, Agents, and Tool Calling enhance system capabilities while also introducing higher risks. These risks are not an abstract “uncontrollability” but a concrete engineering problem, namely consequences that cannot be budgeted:

Path explosion in behavior, long contexts, and multi-round reasoning bring long-tail resource consumption;
The same “intent” can lead to orders-of-magnitude differences in tokens, GPU time, and network/storage pressure;
Without closed governance loops, systems drift toward runaway cost and risk even as they become more capable.
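
To make the second point concrete, here is a minimal arithmetic sketch in Python. It assumes a simplified agent loop in which every step re-sends the whole accumulated context as input; the function name and the token figures (a 1,000-token starting context, 500 tokens added per step) are illustrative assumptions, not measurements.

```python
def total_prompt_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total input tokens when every step re-sends the accumulated context.

    Step i (0-based) sends the base context plus everything the previous
    i steps appended (tool results, intermediate reasoning, and so on).
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return sum(base_context + i * added_per_step for i in range(steps))


# Same "intent", two execution paths (all numbers are assumed for illustration).
single_call = total_prompt_tokens(steps=1, base_context=1000, added_per_step=500)
agent_run = total_prompt_tokens(steps=10, base_context=1000, added_per_step=500)
amplification = agent_run / single_call

print(single_call, agent_run, amplification)
```

Under these assumptions a ten-step run consumes 32,500 input tokens against 1,000 for a single call, which is the kind of spread a governance layer has to be able to budget for.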

Therefore, the fundamental purpose of layering is not abstract aesthetics, but meeting one hard constraint:

Ensure each layer can translate upper-layer “intent” into executable plans and produce measurable, attributable, and constrainable resource consequences.

In other words, layering is not about making architecture diagrams clearer, but about encoding “who expresses intent, who executes, and who bears consequences” into system structure.
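
One way to picture the measurable and attributable part of that goal is a ledger that tags every unit of consumption with the intent and tenant that caused it. The sketch below is hypothetical: `UsageRecord`, `ConsequenceLedger`, and all field names are invented for illustration and do not correspond to any specific product or API.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One measured resource consequence, tagged with who and what caused it."""
    tenant: str        # who bears the cost
    intent_id: str     # which upper-layer intent triggered the work
    tokens: int        # measured token consumption
    gpu_seconds: float # measured accelerator time


class ConsequenceLedger:
    """Append-only store that can aggregate consumption along any tag."""

    def __init__(self) -> None:
        self._records: list[UsageRecord] = []

    def record(self, rec: UsageRecord) -> None:
        self._records.append(rec)

    def totals_by(self, key: str) -> dict[str, dict[str, float]]:
        """Sum tokens and GPU seconds grouped by 'tenant' or 'intent_id'."""
        totals: dict[str, dict[str, float]] = defaultdict(
            lambda: {"tokens": 0, "gpu_seconds": 0.0}
        )
        for rec in self._records:
            bucket = totals[getattr(rec, key)]
            bucket["tokens"] += rec.tokens
            bucket["gpu_seconds"] += rec.gpu_seconds
        return dict(totals)


ledger = ConsequenceLedger()
ledger.record(UsageRecord("team-a", "summarize-report", tokens=1200, gpu_seconds=1.5))
ledger.record(UsageRecord("team-a", "plan-migration", tokens=9000, gpu_seconds=0.5))
ledger.record(UsageRecord("team-b", "summarize-report", tokens=800, gpu_seconds=1.0))

by_tenant = ledger.totals_by("tenant")
by_intent = ledger.totals_by("intent_id")
```

Grouping the same records by tenant or by intent answers two different questions: who bears the consequence, and which intent produced it.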

AI-Native Infrastructure Five-Layer Structure and “Three Planes” Mapping

To help understand the layering logic, the diagram below refines the “Three Planes” architecture from the previous chapter, proposing a more actionable “five-layer structure”:

Figure 1: Layered governance relationship from intent to consequence

Top two layers = Intent Plane (the control plane, where intent is expressed)
Middle two layers = Execution Plane
Bottom layer = Governance Plane

Below is a detailed expansion of the five-layer architecture, showing the primary responsibilities and typical capabilities of each layer:

Figure 2: Five-layer architecture diagram

It is important to note that MCP belongs to Layer 4 (Intent and Orchestration Layer), not Layer 1. The reason is that MCP primarily defines “how capabilities are exposed to models/Agents and how they are invoked,” addressing control plane consistency and composability, but does not directly take responsibility for “how the resource consequences of capability invocations are metered, constrained, and attributed.”

MCP/Agent is the “New Control Plane,” But Must Be Constrained by the Governance Layer

MCP/Agent is called the “new control plane” because it moves system “decisions” from static code to dynamic processes:

“Tool catalogs + schemas + invocations” form a composable capability surface;
Agents complete tasks by selecting tools, invoking tools, and iterating reasoning;
“Policy” is no longer just in code branches but expressed as routing, priorities, budgets, and compliance intent.

However, one infrastructure stance needs emphasis, because it is the foundation of this chapter:

MCP/Agent can express intent, but what makes a system AI-native is that intent is translated into governable execution plans, then metered and constrained within economically viable boundaries.

This statement aims to correct two common misconceptions:

Control plane is not the starting point: Treating MCP/Agent as “the entry point for AI platform upgrades” easily leads systems down a “capability-first” path;
Governance plane is the baseline: When compute and tokens become capacity units, any unconstrained “intent expression” will leak as cost, latency, or risk.

Therefore, system layering should be clear: Layer 4 is responsible for “expression,” Layers 1/2/3 are responsible for “fulfillment and bearing consequences,” and the governance loop is responsible for “correction.”
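
The split between expression and bounding can be sketched as an agent loop that must charge a budget before each step. This is a minimal illustration under stated assumptions: `TokenBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names, and per-step token costs are passed in as known numbers, whereas a real system would have to estimate them before execution and reconcile afterwards.

```python
from dataclasses import dataclass
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the limit."""


@dataclass
class TokenBudget:
    """Governance-side object: it knows nothing about intent, only about limits."""
    limit: int
    spent: int = 0

    def charge(self, tokens: int) -> None:
        # Refuse the charge up front so the limit is never overshot.
        if self.spent + tokens > self.limit:
            raise BudgetExceeded(
                f"charge of {tokens} would exceed limit {self.limit} "
                f"(already spent {self.spent})"
            )
        self.spent += tokens


def run_agent(step_costs: Iterable[int], budget: TokenBudget) -> int:
    """Control-plane loop: runs steps until done or until the budget refuses one.

    Returns the number of steps that were allowed to run.
    """
    completed = 0
    for cost in step_costs:
        try:
            budget.charge(cost)
        except BudgetExceeded:
            break  # the governance layer, not the agent, ends the run
        completed += 1
    return completed


budget = TokenBudget(limit=1000)
steps_run = run_agent([400, 400, 400], budget)
print(steps_run, budget.spent)
```

The agent decides what to attempt; the budget object decides whether the attempt may proceed. Keeping those two decisions in separate objects is the point of the layering.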

“Context” Is Rising to a New Infrastructure Layer

In traditional cloud-native systems, request state is mostly short-lived and is managed in the application layer. Infrastructure typically only handles “computation and networking” without needing to understand the economic value of “request context.”

AI-native infrastructure is different. Long-context, multi-turn dialogue, and multi-agent reasoning mean inference state often survives across requests and directly determines throughput and cost. In particular, KV cache and context reuse are evolving from “performance optimization techniques” to “platform capacity structures.”

This can be summarized as an infrastructure law:

When a state asset (context/state) becomes a determining variable of system cost and throughput, it rises from an application detail to an infrastructure layer.

This trend is already visible in the industry: inference context and KV reuse are being explicitly treated as infrastructure-layer capabilities. The same treatment is likely to extend to distributed KV, parameter caching, inference routing state, Agent memory, and other “state assets.”
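
A rough sizing calculation shows why KV cache behaves like a capacity structure. The sketch assumes a standard decoder-only transformer that stores one key and one value vector per layer, per KV head, per token; the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values), the 32,768-token context, and the 40 GiB memory budget are illustrative assumptions rather than figures from any particular deployment.

```python
GIB = 1024 ** 3


def kv_cache_bytes_per_token(
    num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_element: int
) -> int:
    """Bytes of KV cache one token occupies: a key and a value per layer and KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element


# Assumed model shape and deployment numbers (hypothetical, for illustration only).
per_token = kv_cache_bytes_per_token(
    num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_element=2
)
context_length = 32_768
per_sequence = per_token * context_length

kv_memory_budget = 40 * GIB
max_full_sequences = kv_memory_budget // per_sequence

print(per_token, per_sequence / GIB, max_full_sequences)
```

Under these assumptions each token holds 128 KiB of cache, a full-length sequence holds 4 GiB, and the assumed budget fits only ten such sequences at once, so context length directly caps concurrency.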

The Foundation of AI-Native Infrastructure: Reference Designs and Delivery Systems

AI-native infrastructure is far more than “buying a few GPUs.” Compared to traditional internet services, AI workloads have three characteristics that make the “foundation” more engineered and productized:

Stronger topology dependencies: Network fabric, interconnects, storage tiers, and GPU affinity determine available throughput;
Harder scarcity constraints: GPU and token throughput boundaries are less “elastic” than CPU/memory;
Higher delivery complexity: Multi-cluster, multi-tenant, multi-model/multi-framework coexistence means only “replicable delivery” can scale.

Therefore, AI Infra is not just a component list; it must include system capabilities for scalable delivery and repeatable operation:

Reference Designs (validated designs)

Codify “correct topology and ratios” into reusable solutions.

Automated Delivery

Institutionalize deployment, upgrade, scaling, rollback, and capacity planning.

Governance Implementation

Make budgeting, isolation, metering, and auditing default capabilities rather than after-the-fact patches.

From a CTO/CEO perspective, this means: what you purchase is not “hardware” but a “delivery system for predictable capacity.”

“Layered Responsibility Boundaries” from a CTO/CEO Perspective

To help teams align internally on who is responsible for what and what failure costs, the table below maps technical layers to organizational responsibilities. The aim is to avoid the situation where platform teams build only control planes and no one owns the consequence boundaries.

Layer	Typical Capabilities	Primary Owner (Recommended)	Cost of Failure
Layer 5 Business Interface	SLA, product experience, business goals	Product / Business	Customer experience and revenue impact
Layer 4 Intent/Orchestration (MCP/Agent)	Capability catalogs, workflow, policy expression	App / Platform / AI Eng	Behavior runaway, tool abuse
Layer 3 Execution (Runtime)	Serving, batching, routing, caching policies	AI Platform / Infra	Insufficient throughput, latency jitter
Layer 2 Context/State	KV/cache/context tier	Infra + AI Platform	Token cost spike, throughput collapse
Layer 1 Compute/Governance	Quotas, isolation, topology scheduling, metering	Infra / FinOps / SRE	Budget explosion, resource contention, incident spillover

Table 1: AI-Native Infrastructure Layer and Organizational Responsibility Mapping

As the table shows, the organizational challenge of AI-native is not “whether we have agents” but “whether inter-layer closed loops are established.” When model-driven behavior amplifies consequences, organizations must institutionalize governance mechanisms as platform capabilities: executable budgets, explainable consequences, attributable anomalies, and rewritable policies. This is the true meaning of “starting from compute governance” rather than “starting from API design.”

Conclusion

The layered design of AI-native infrastructure centers on engineering the binding between “intent” and “resource consequences.” The control plane is responsible for expressing intent, while the governance plane is responsible for bounding consequences. Only by institutionalizing governance mechanisms as platform capabilities can we ensure cost, risk, and capacity remain controllable while enhancing capabilities. As context, state assets, and other new variables become infrastructure, AI Infra delivery systems will continue to evolve, becoming the foundation for sustainable enterprise innovation.

Created on Jan 18, 2026. Updated on Jan 18, 2026.