<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Jimmy Song – Jimmy Song's Blog</title><link>https://jimmysong.io/</link><description>Recent content in Jimmy Song's Blog on Jimmy Song</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>Jimmy Song</managingEditor><webMaster>Jimmy Song</webMaster><follow_challenge><feedId>51621818828612637</feedId><userId>59800919738273792</userId></follow_challenge><lastBuildDate>Fri, 13 Feb 2026 14:32:46 +0800</lastBuildDate><atom:link href="https://jimmysong.io/index.xml" rel="self" type="application/rss+xml"/><item><title>Core Model Overview</title><link>https://jimmysong.io/book/ai-infra-dao/model-overview/</link><pubDate>Tue, 10 Feb 2026 13:56:12 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/model-overview/</guid><description>A four-layer model (Yin-Yang, Five Elements, Yun, Qi) for understanding AI infrastructure as an evolving organic system</description><content:encoded>
&lt;p&gt;The &lt;strong&gt;Yin-Yang - Five Elements - Yun - Qi Model&lt;/strong&gt; views AI Infrastructure as an organic whole, revealing its operational mechanisms from four dimensions. Each layer addresses a different fundamental question:&lt;/p&gt;
&lt;h2 id="four-layer-model"&gt;Four-Layer Model&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Focus Question&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yin-Yang&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State Layer&lt;/td&gt;
&lt;td&gt;The system&amp;rsquo;s internal tension structure of unified opposites, revealing how paired forces such as expansion vs. constraint and innovation vs. governance coexist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Five Elements&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Role Layer&lt;/td&gt;
&lt;td&gt;Five basic role elements in the system and their collaborative relationships, breaking down complex infrastructure into data, models, compute, platforms, and hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yun&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Time Layer&lt;/td&gt;
&lt;td&gt;The development stage the system is in and its cyclical patterns, describing the evolution cycle from exploration to platformization, then scaling and rebalancing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flow Layer&lt;/td&gt;
&lt;td&gt;The effective &amp;ldquo;field&amp;rdquo; of flow within the system, characterizing the transmission and feedback of signals and resources and reflecting the overall smoothness of operation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Four-Layer Model
&lt;/figcaption&gt;
&lt;h2 id="model-interactions"&gt;Model Interactions&lt;/h2&gt;
&lt;p&gt;The four-layer model is not isolated but an interconnected organic whole:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The tension of &lt;strong&gt;Yin-Yang&lt;/strong&gt; permeates the dynamic balance of &lt;strong&gt;Five Elements&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The development of &lt;strong&gt;Five Elements&lt;/strong&gt; roles is constrained by their &lt;strong&gt;Yun&lt;/strong&gt; stage&lt;/li&gt;
&lt;li&gt;The flow of &lt;strong&gt;Qi&lt;/strong&gt; connects the above elements into a self-adaptive cyclic system&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The overview diagram below illustrates each layer of the model and their interactions:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/model-overview/aa67f442a69fb3a6e2b135543dffc745.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/model-overview/aa67f442a69fb3a6e2b135543dffc745.svg" alt="Figure 1: AI Infrastructure ‘Yin-Yang - Five Elements - Yun - Qi’ Model Overview. The Yin-Yang layer embodies the system’s internal tension and unity of opposites, the Five Elements layer defines core role elements, the Yun layer describes system stage cycles, and Qi as a flow element permeates and drives the entire system." data-caption="Figure 1: AI Infrastructure ‘Yin-Yang - Five Elements - Yun - Qi’ Model Overview. The Yin-Yang layer embodies the system’s internal tension and unity of opposites, the Five Elements layer defines core role elements, the Yun layer describes system stage cycles, and Qi as a flow element permeates and drives the entire system."
width="652"
height="1031"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Infrastructure ‘Yin-Yang - Five Elements - Yun - Qi’ Model Overview. The Yin-Yang layer embodies the system’s internal tension and unity of opposites, the Five Elements layer defines core role elements, the Yun layer describes system stage cycles, and Qi as a flow element permeates and drives the entire system.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="model-application-value"&gt;Model Application Value&lt;/h2&gt;
&lt;p&gt;This four-layer model provides a unique perspective for the design, operations, and governance of AI Infrastructure:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Holistic Cognitive Framework&lt;/strong&gt;: Transcend the limitations of single technical metrics to grasp system state as a whole&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Balance Thinking&lt;/strong&gt;: Understand unity of opposites relationships and avoid extremes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evolutionary Stage Awareness&lt;/strong&gt;: Grasp the system&amp;rsquo;s development stage and act in accordance with the situation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flow Insights&lt;/strong&gt;: Focus on energy flow within the system to anticipate problems&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Next, we will examine each layer&amp;rsquo;s meaning, engineering mapping, and operating mechanism.&lt;/p&gt;</content:encoded></item><item><title>The Yin-Yang Layer: Dynamic Balance of System States</title><link>https://jimmysong.io/book/ai-infra-dao/yin-yang/</link><pubDate>Tue, 10 Feb 2026 13:56:33 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/yin-yang/</guid><description>Understanding system tensions: expansion vs. constraint, innovation vs. governance, speed vs. stability in AI infrastructure</description><content:encoded>
&lt;p&gt;&lt;strong&gt;Yin-Yang&lt;/strong&gt; is originally a fundamental concept in Chinese philosophy, representing two opposing yet interdependent forces present in all things in the universe. Everything in the world can be classified as either Yin or Yang, and their continuous movement and change generate the various transformations we observe. In the context of systems, Yin-Yang represents the unity of opposites through &lt;strong&gt;tension&lt;/strong&gt;—a pair of attributes or tendencies that pull against yet depend on each other.&lt;/p&gt;
&lt;h2 id="three-typical-pairs-of-yin-yang-tensions"&gt;Three Typical Pairs of Yin-Yang Tensions&lt;/h2&gt;
&lt;p&gt;In AI infrastructure, we identify three typical pairs of &lt;strong&gt;Yin-Yang tensions&lt;/strong&gt;:&lt;/p&gt;
&lt;h2 id="expansion--constraint"&gt;Expansion ↔ Constraint&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Expansion ↔ Constraint&lt;/strong&gt;: The tension between &lt;strong&gt;growth&lt;/strong&gt; trends and &lt;strong&gt;limiting&lt;/strong&gt; forces.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Yang (Expansion)&lt;/strong&gt;: System expansion speed, such as continuously adding tasks and scaling resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yin (Constraint)&lt;/strong&gt;: Limiting forces, such as cost controls, regulatory constraints, and hardware limits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;System expansion speed and constraint intensity always coexist. For example, continuously adding tasks and scaling resources in GPU clusters (the Yang of expansion) is constrained by costs, regulations, or hardware limits (the Yin of constraint).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Imbalance manifestations&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pursuing expansion without regard for constraints → Resource contention and crashes&lt;/li&gt;
&lt;li&gt;Excessive constraint → Stifling system vitality&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="innovation--governance"&gt;Innovation ↔ Governance&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Innovation ↔ Governance&lt;/strong&gt;: The tension between &lt;strong&gt;creative&lt;/strong&gt; capability and &lt;strong&gt;control&lt;/strong&gt; requirements.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Yang (Innovation)&lt;/strong&gt;: Technical innovation, introduction of new features&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yin (Governance)&lt;/strong&gt;: Security reviews, rule-making&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The faster technical innovation progresses, the more easily governance gaps are exposed. For example, introducing new Agent features (innovation, Yang) may outpace security reviews and rule-making (governance, Yin), leading to potential risks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Imbalance manifestations&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Innovation outpaces governance → Potential security risks&lt;/li&gt;
&lt;li&gt;Excessively strict governance → Slowing innovation momentum&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="speed--stability"&gt;Speed ↔ Stability&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Speed ↔ Stability&lt;/strong&gt;: The tension between &lt;strong&gt;performance&lt;/strong&gt; advancement and &lt;strong&gt;reliable&lt;/strong&gt; operation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Yang (Speed)&lt;/strong&gt;: Performance improvements, increased throughput&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yin (Stability)&lt;/strong&gt;: Reliable operation, system stability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When we pursue speed improvements single-mindedly, the cost to stability will eventually manifest. For example, pushing GPU utilization to the limit during model training (speed, Yang) easily leads to more frequent failures or delays (decline in stability, Yin).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Imbalance manifestations&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extreme pursuit of speed → Decline in stability&lt;/li&gt;
&lt;li&gt;Excessive conservatism → Performance waste&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-art-of-yinyang-balance"&gt;The Art of Yin–Yang Balance&lt;/h2&gt;
&lt;p&gt;The Yin–Yang poles described above are not simple trade-offs where you choose one and sacrifice the other, but inherent relationships of unity of opposites within systems. Yin and Yang are opposed yet complementary; neither can be dispensed with:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Expansion without constraints is difficult to sustain; constraints without expansion lose their meaning&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As the ancient saying goes, &amp;ldquo;One Yin and one Yang constitute the Way&amp;rdquo; (一阴一阳之谓道). Balancing Yin and Yang is the &amp;ldquo;Way&amp;rdquo; of healthy system operation. For architects, the key lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Insight into dominant tensions&lt;/strong&gt;: Determine which pair of tensions is currently dominant&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Introducing the opposite&lt;/strong&gt;: Introduce the complementary side at the right time to restore balance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic adjustment&lt;/strong&gt;: Adjust as the system&amp;rsquo;s environment and stage change&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-cases"&gt;Practical Cases&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Case: GPU Cluster Expansion&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When the cluster is in a state of rapid expansion (Yang exuberant, Yin deficient):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Add scheduling policies and resource quotas (supplement Yin)&lt;/li&gt;
&lt;li&gt;✓ Establish cost control mechanisms (supplement Yin)&lt;/li&gt;
&lt;li&gt;✗ Do not pursue expansion speed single-mindedly&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Case: Agent Feature Innovation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When introducing new Agent features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Simultaneously establish monitoring and sandboxing mechanisms (supplement Yin)&lt;/li&gt;
&lt;li&gt;✓ Improve security review processes (supplement Yin)&lt;/li&gt;
&lt;li&gt;✗ Do not let innovation outpace governance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Case: Model Training Performance Optimization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When optimizing model training performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Simultaneously strengthen fault tolerance mechanisms and testing (supplement the Yin of stability)&lt;/li&gt;
&lt;li&gt;✓ Set performance baselines and rollback mechanisms (supplement Yin)&lt;/li&gt;
&lt;li&gt;✗ Do not infinitely compress fault tolerance time&lt;/li&gt;
&lt;/ul&gt;
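&lt;p&gt;The checks in these cases can be sketched as a simple guard that refuses an expansion (Yang) request whenever a constraint (Yin) signal is unhealthy. This is an illustrative sketch only; the &lt;code&gt;ClusterState&lt;/code&gt; fields and all thresholds are hypothetical, not taken from any real scheduler:&lt;/p&gt;

```python
# Hypothetical "supplement Yin" guard: before approving an expansion (Yang)
# request, check the constraint (Yin) signals named in the cases above.
# All field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ClusterState:
    monthly_cost: float     # current spend
    cost_budget: float      # cost-control ceiling (Yin)
    gpu_utilization: float  # 0.0 - 1.0
    failure_rate: float     # recent job failure ratio

def approve_expansion(state: ClusterState, requested_gpus: int) -> tuple[bool, str]:
    """Reject expansion whenever a Yin-side signal indicates imbalance."""
    if state.monthly_cost >= state.cost_budget:
        return False, "cost ceiling reached: supplement Yin before expanding"
    if state.failure_rate > 0.05:
        return False, "stability declining: strengthen fault tolerance first"
    if state.gpu_utilization < 0.6:
        return False, "existing capacity underused: expansion not justified"
    return True, f"approved: add {requested_gpus} GPUs"

ok, reason = approve_expansion(
    ClusterState(monthly_cost=80_000, cost_budget=100_000,
                 gpu_utilization=0.85, failure_rate=0.01),
    requested_gpus=64,
)
print(ok, reason)  # True approved: add 64 GPUs
```

&lt;p&gt;The point of the sketch is the shape of the decision, not the numbers: every Yang-side action passes through an explicit Yin-side gate.&lt;/p&gt;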
&lt;h2 id="dynamic-transformation-of-yinyang-states"&gt;Dynamic Transformation of Yin–Yang States&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s important to note that Yin–Yang states are not static; they transform dynamically with the system&amp;rsquo;s environment and stage.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The same capability may transform from an advantage to a risk at different stages&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For example, a &amp;ldquo;rapid development&amp;rdquo; strategy that drives rapid iteration during the startup stage, if applied without restraint during the scaling stage, can instead become a major threat to stability.&lt;/p&gt;
&lt;p&gt;The analysis of the Yin–Yang layer reminds us to constantly pay attention to the ebb and flow of these opposing forces, and to keep the system in a state of elastic tension through adjustments, rather than snapping or becoming slack and ineffective.&lt;/p&gt;</content:encoded></item><item><title>Five Elements Layer: Classification and Collaboration of System Roles</title><link>https://jimmysong.io/book/ai-infra-dao/five-elements/</link><pubDate>Tue, 10 Feb 2026 13:56:06 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/five-elements/</guid><description>Five system roles: data, models, compute, platforms, and hardware—how they interact and balance in AI infrastructure</description><content:encoded>
&lt;p&gt;&lt;strong&gt;Five Elements (Wǔxíng, also translated as Five Phases) theory&lt;/strong&gt; divides everything in the world into five basic elements: Wood, Fire, Earth, Metal, and Water. Each element represents a fundamental attribute or functional role, with the five elements generating and overcoming one another in an endless cycle.&lt;/p&gt;
&lt;p&gt;In AI infrastructure, we use &amp;ldquo;Five Elements&amp;rdquo; to characterize the system&amp;rsquo;s five core elements and their responsibilities:&lt;/p&gt;
&lt;h2 id="engineering-mapping-of-five-elements"&gt;Engineering Mapping of Five Elements&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Five Elements&lt;/th&gt;
&lt;th&gt;Symbol&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Engineering Correspondence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Water&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🌊&lt;/td&gt;
&lt;td&gt;Flow and containment&lt;/td&gt;
&lt;td&gt;Data flow and quality: data pipelines, data assets, and quality control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wood&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🌲&lt;/td&gt;
&lt;td&gt;Growth and creation&lt;/td&gt;
&lt;td&gt;Model growth and capability expansion: model architecture iteration, parameter scale expansion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fire&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🔥&lt;/td&gt;
&lt;td&gt;Energy and execution&lt;/td&gt;
&lt;td&gt;Compute conversion and work efficiency: GPU/TPU computing, job scheduling efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Earth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🏔️&lt;/td&gt;
&lt;td&gt;Support and stability&lt;/td&gt;
&lt;td&gt;Platform support and orchestration governance: distributed coordination, middleware, scheduling systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚙️&lt;/td&gt;
&lt;td&gt;Strength and standardization&lt;/td&gt;
&lt;td&gt;Hardware constraints and physical boundaries: GPU/CPU performance, storage capacity, network bandwidth&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Engineering Mapping of Five Elements
&lt;/figcaption&gt;
&lt;h2 id="water--data-flow-and-quality"&gt;Water – Data Flow and Quality&lt;/h2&gt;
&lt;p&gt;Corresponds to &lt;strong&gt;data pipelines, data assets, and quality control&lt;/strong&gt; in the system.&lt;/p&gt;
&lt;p&gt;Water symbolizes flow and containment, analogous to the circulation and nourishing role of data in the system, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training data acquisition&lt;/li&gt;
&lt;li&gt;Real-time data input&lt;/li&gt;
&lt;li&gt;Feedback signal transmission&lt;/li&gt;
&lt;li&gt;Data cleaning and quality assurance&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="wood--model-growth-and-capability-expansion"&gt;Wood – Model Growth and Capability Expansion&lt;/h2&gt;
&lt;p&gt;Corresponds to &lt;strong&gt;the evolution and growth of machine learning models and algorithms&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Wood represents growth and creation, mapped to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model architecture iteration&lt;/li&gt;
&lt;li&gt;Parameter scale expansion&lt;/li&gt;
&lt;li&gt;Cultivation of new capabilities&lt;/li&gt;
&lt;li&gt;Algorithm optimization and improvement&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="fire--compute-conversion-and-work-efficiency"&gt;Fire – Compute Conversion and Work Efficiency&lt;/h2&gt;
&lt;p&gt;Corresponds to &lt;strong&gt;computing processes and the utilization of compute resources&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Fire symbolizes energy and execution, reflected as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using GPU/TPU and other compute resources for calculation&lt;/li&gt;
&lt;li&gt;Converting electrical energy into model training and inference work&lt;/li&gt;
&lt;li&gt;Parallel computing capability&lt;/li&gt;
&lt;li&gt;Job scheduling efficiency&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="earth--platform-support-and-orchestration-governance"&gt;Earth – Platform Support and Orchestration Governance&lt;/h2&gt;
&lt;p&gt;Corresponds to &lt;strong&gt;the support and governance capabilities of the platform layer&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Earth represents support and stability, analogous to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Infrastructure platform support for upper-layer applications&lt;/li&gt;
&lt;li&gt;Distributed system coordination and orchestration&lt;/li&gt;
&lt;li&gt;Middleware services&lt;/li&gt;
&lt;li&gt;Scheduling systems and policy management&lt;/li&gt;
&lt;li&gt;Permission systems, service quality assurance&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="metal--hardware-constraints-and-physical-boundaries"&gt;Metal – Hardware Constraints and Physical Boundaries&lt;/h2&gt;
&lt;p&gt;Corresponds to &lt;strong&gt;underlying hardware and system hard limits&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Metal represents strength and standardization, mapped to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPU/CPU hardware performance&lt;/li&gt;
&lt;li&gt;Storage capacity&lt;/li&gt;
&lt;li&gt;Network bandwidth&lt;/li&gt;
&lt;li&gt;Physical conditions and hard rules (power consumption, safety specifications, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="five-elements-generation-relationships"&gt;Five Elements Generation Relationships&lt;/h2&gt;
&lt;p&gt;The Five Elements form a &lt;strong&gt;positive cycle&lt;/strong&gt; through &amp;ldquo;generation&amp;rdquo; relationships:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Data (Water) nourishes model growth (Wood), model requirements stimulate compute investment (Fire), compute development drives platform consolidation (Earth), platform capabilities push the boundaries of hardware (Metal), and hardware progress in turn supports greater data acquisition (Water)&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/five-elements/8c375bf76fabf035907532a52a97107e.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/five-elements/8c375bf76fabf035907532a52a97107e.svg" alt="Figure 1: Five Elements generation relationship diagram. Water generates Wood, Wood generates Fire, Fire generates Earth, Earth generates Metal, Metal generates Water, representing the mutually reinforcing cycle between data, models, compute, platforms, and hardware." data-caption="Figure 1: Five Elements generation relationship diagram. Water generates Wood, Wood generates Fire, Fire generates Earth, Earth generates Metal, Metal generates Water, representing the mutually reinforcing cycle between data, models, compute, platforms, and hardware."
width="1040"
height="217"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Five Elements generation relationship diagram. Water generates Wood, Wood generates Fire, Fire generates Earth, Earth generates Metal, Metal generates Water, representing the mutually reinforcing cycle between data, models, compute, platforms, and hardware.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="five-elements-overcoming-relationships"&gt;Five Elements Overcoming Relationships&lt;/h2&gt;
&lt;p&gt;At the same time, &lt;strong&gt;overcoming&lt;/strong&gt; relationships also exist among the Five Elements: when one element grows too strong or unbalanced, it suppresses or weakens another:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Wood overcomes Earth&lt;/strong&gt;: Excessive model expansion increases the burden on the platform (Earth), potentially even crushing the existing architecture&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Earth overcomes Water&lt;/strong&gt;: Overly heavy platforms and rules will hinder the free flow of data (Water)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Water overcomes Fire&lt;/strong&gt;: Data bottlenecks will limit the performance of compute&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fire overcomes Metal&lt;/strong&gt;: Excessive compute demand may break through hardware (Metal) limits&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metal overcomes Wood&lt;/strong&gt;: Strict hardware and rule limitations will curb the expansion of models (Wood)&lt;/li&gt;
&lt;/ul&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/five-elements/a97764daf5c160b53c8b0426159cc16f.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/five-elements/a97764daf5c160b53c8b0426159cc16f.svg" alt="Figure 2: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element." data-caption="Figure 2: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element."
width="908"
height="333"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/five-elements/b7ac6707c367b63143436e20483fc83c.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/five-elements/b7ac6707c367b63143436e20483fc83c.svg" alt="Figure 3: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element." data-caption="Figure 3: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element."
width="729"
height="273"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Five Elements generation and overcoming relationship diagram. Dashed arrows indicate overcoming relationships, reflecting the system’s internal checks and balances mechanism: any element becoming excessively strong will constrain another element.&lt;/figcaption&gt;
&lt;/figure&gt;
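&lt;p&gt;The two cycles can be written down directly as lookup tables, which makes the &amp;ldquo;an excess of X suppresses Y&amp;rdquo; reading mechanical. A minimal sketch, carrying only the element names from the relationships above:&lt;/p&gt;

```python
# The generation and overcoming cycles above, encoded as plain lookup
# tables. A reasoning aid for the model, not an engineering tool.
GENERATES = {"Water": "Wood", "Wood": "Fire", "Fire": "Earth",
             "Earth": "Metal", "Metal": "Water"}
OVERCOMES = {"Wood": "Earth", "Earth": "Water", "Water": "Fire",
             "Fire": "Metal", "Metal": "Wood"}

def suppressed_by_excess(element: str) -> str:
    """Which element an overly strong `element` suppresses."""
    return OVERCOMES[element]

# Excessive model expansion (Wood) burdens the platform (Earth):
print(suppressed_by_excess("Wood"))  # Earth
```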
&lt;h2 id="five-elements-balance-diagnosis"&gt;Five Elements Balance Diagnosis&lt;/h2&gt;
&lt;p&gt;Through the Five Elements model, engineering teams can systematically check the &lt;strong&gt;role completeness and balance&lt;/strong&gt; of infrastructure.&lt;/p&gt;
&lt;h2 id="common-imbalance-patterns"&gt;Common Imbalance Patterns&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Imbalance Pattern&lt;/th&gt;
&lt;th&gt;Manifestation&lt;/th&gt;
&lt;th&gt;Consequence&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Strong Wood, Weak Water&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Focus on model algorithm iteration, neglect data quality&lt;/td&gt;
&lt;td&gt;Model performance hits bottlenecks&lt;/td&gt;
&lt;td&gt;Strengthen data pipelines and quality control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Strong Metal, Weak Earth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stack hardware, insufficient platform governance capability&lt;/td&gt;
&lt;td&gt;Poor resource utilization, lack of vitality&lt;/td&gt;
&lt;td&gt;Improve platform governance and scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vigorous Fire, Broken Wood&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large compute investment, models can&amp;rsquo;t keep up&lt;/td&gt;
&lt;td&gt;Resource waste&lt;/td&gt;
&lt;td&gt;Optimize model architecture, improve compute utilization efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Common Imbalance Patterns
&lt;/figcaption&gt;
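&lt;p&gt;A hedged sketch of how a team might turn Table 2 into an automated check: score each element&amp;rsquo;s health from 0 to 1 and flag the listed patterns. The scores and the 0.7/0.3 thresholds are assumptions for illustration, not a prescribed metric:&lt;/p&gt;

```python
# Illustrative diagnostic over the imbalance patterns in Table 2.
# Health scores (0-1 per element) and thresholds are hypothetical.
PATTERNS = [
    ("Strong Wood, Weak Water", "Wood", "Water",
     "strengthen data pipelines and quality control"),
    ("Strong Metal, Weak Earth", "Metal", "Earth",
     "improve platform governance and scheduling"),
    ("Vigorous Fire, Broken Wood", "Fire", "Wood",
     "optimize model architecture, improve compute utilization"),
]

def diagnose(scores: dict[str, float], strong: float = 0.7, weak: float = 0.3) -> list[str]:
    """Return a finding for each pattern whose strong/weak pair matches."""
    findings = []
    for name, hi, lo, fix in PATTERNS:
        if scores[hi] >= strong and scores[lo] <= weak:
            findings.append(f"{name}: {fix}")
    return findings

report = diagnose({"Water": 0.2, "Wood": 0.9, "Fire": 0.5,
                   "Earth": 0.6, "Metal": 0.5})
print(report)
```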
&lt;h2 id="balance-principles"&gt;Balance Principles&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Successful large-scale systems require coordinated cooperation of all five elements&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Let each of the five elements fulfill its own role&lt;/li&gt;
&lt;li&gt;Keep generation primary and overcoming secondary&lt;/li&gt;
&lt;li&gt;Prevent any element from expanding or shrinking excessively&lt;/li&gt;
&lt;li&gt;Regularly check the balance of the Five Elements&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Only when the five elements each fulfill their roles and reinforce one another, while no element expands or shrinks excessively, can the entire system maintain &lt;strong&gt;robustness and evolutionary capability&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Yun (运)&lt;/strong&gt; here refers to the developmental stages and temporal rhythms a system passes through, which can be understood as the lifecycle, or &amp;ldquo;fortune,&amp;rdquo; of infrastructure.&lt;/p&gt;
&lt;p&gt;Large-scale infrastructure is not static but evolves cyclically through the &lt;strong&gt;Exploration Period&lt;/strong&gt;, &lt;strong&gt;Platform Period&lt;/strong&gt;, &lt;strong&gt;Scale Period&lt;/strong&gt;, and &lt;strong&gt;Rebalancing Period&lt;/strong&gt;, with each stage having its primary contradictions and tasks.&lt;/p&gt;
&lt;p&gt;Below are the four evolutionary stages.&lt;/p&gt;
&lt;h2 id="exploration-period-initial-stage"&gt;Exploration Period (Initial Stage)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Characteristics&lt;/strong&gt;: High variance, low structure, rapid trial and error&lt;/p&gt;
&lt;p&gt;At this stage, new technologies and requirements emerge constantly, system architecture is loose, and diverse experiments coexist.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary Tasks&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Explore effective paths&lt;/li&gt;
&lt;li&gt;Rapidly validate model and functional directions&lt;/li&gt;
&lt;li&gt;Collect data and preliminary stability signals&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Five Elements Characteristics&lt;/strong&gt;: &lt;strong&gt;Wood and Fire in Command&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model innovation (Wood) and computing experimentation (Fire) are core drivers&lt;/li&gt;
&lt;li&gt;Expansion (Yang) outweighs constraints (Yin)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Architecture Strategy&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Tolerate some chaos&lt;/li&gt;
&lt;li&gt;✓ Encourage innovation and iteration&lt;/li&gt;
&lt;li&gt;✓ Focus on collecting data and preliminary stability signals&lt;/li&gt;
&lt;li&gt;✗ Don&amp;rsquo;t prematurely introduce heavy processes and restrictions&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="platform-period-growth-stage"&gt;Platform Period (Growth Stage)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Characteristics&lt;/strong&gt;: Standardization emerges, interfaces and processes converge&lt;/p&gt;
&lt;p&gt;After exploration, the system enters a stage of integration and regulation, beginning to establish unified platforms, standard interfaces, and governance processes, consolidating scattered results into platform capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary Tasks&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Establish unified platforms&lt;/li&gt;
&lt;li&gt;Define standard interfaces&lt;/li&gt;
&lt;li&gt;Consolidate governance processes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Five Elements Characteristics&lt;/strong&gt;: &lt;strong&gt;Fire Generates Earth&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Successful practices in computing and functionality (Fire) give rise to platform support requirements (Earth)&lt;/li&gt;
&lt;li&gt;Governance and standards gradually strengthen&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Architecture Strategy&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Extract common requirements&lt;/li&gt;
&lt;li&gt;✓ Build support platforms (Yin increases)&lt;/li&gt;
&lt;li&gt;✓ Lay the foundation for next-stage scaling&lt;/li&gt;
&lt;li&gt;✗ Don&amp;rsquo;t remain in disordered exploration&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="scale-period-mature-stage"&gt;Scale Period (Mature Stage)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Characteristics&lt;/strong&gt;: Efficiency, throughput, and cost become the main battlefield&lt;/p&gt;
&lt;p&gt;The system is deployed at scale, and focus shifts to optimizing efficiency and costs, improving throughput and reliability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary Tasks&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Optimize efficiency&lt;/li&gt;
&lt;li&gt;Improve throughput&lt;/li&gt;
&lt;li&gt;Reduce costs&lt;/li&gt;
&lt;li&gt;Ensure reliability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Five Elements Characteristics&lt;/strong&gt;: &lt;strong&gt;Heavy Earth Breaks Wood&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Platforms (Earth) and hard constraints begin to dominate&lt;/li&gt;
&lt;li&gt;Overly idealistic model expansion (Wood) will encounter setbacks from realistic conditions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Architecture Strategy&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Strengthen monitoring and automated operations&lt;/li&gt;
&lt;li&gt;✓ Control overly strong &amp;ldquo;Yang&amp;rdquo; through governance means&lt;/li&gt;
&lt;li&gt;✓ Ensure robust system operation&lt;/li&gt;
&lt;li&gt;✗ Don&amp;rsquo;t continue with startup-era casual practices&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="rebalancingsubstitution-period-renewal-stage"&gt;Rebalancing/Substitution Period (Renewal Stage)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Characteristics&lt;/strong&gt;: Old structures are corrected or replaced by new structures&lt;/p&gt;
&lt;p&gt;When the previous stage&amp;rsquo;s patterns reach their limits, the system either enters self-correction by introducing new elements to rebalance, or gets disrupted and replaced by a new paradigm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary Tasks&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduce new elements to rebalance&lt;/li&gt;
&lt;li&gt;Or accept substitution by a new paradigm&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Five Elements Characteristics&lt;/strong&gt;: &lt;strong&gt;Metal and Water Rise Again&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Suppressed hardware/rule innovations (Metal) and new data potentials (Water) rise again&lt;/li&gt;
&lt;li&gt;Driving system transformation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Architecture Strategy&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Be forward-looking, dare to break through&lt;/li&gt;
&lt;li&gt;✓ Transition smoothly, avoid severe volatility&lt;/li&gt;
&lt;li&gt;✗ Don&amp;rsquo;t cling to the status quo&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="evolutionary-cycle"&gt;Evolutionary Cycle&lt;/h2&gt;
&lt;p&gt;The above stages form a cycle (&lt;strong&gt;↻&lt;/strong&gt;): the endpoint of each stage is also the starting point of the next.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/yun/f0b211635d09b8bc69e8540c27060c22.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/yun/f0b211635d09b8bc69e8540c27060c22.svg" alt="Figure 1: The “Yun” cycle of AI infrastructure evolution. Systems start from the exploration period, undergo platform period standardization, enter the scale period for efficiency optimization, and ultimately move toward a new cycle of rebalancing or substitution.*" data-caption="Figure 1: The “Yun” cycle of AI infrastructure evolution. Systems start from the exploration period, undergo platform period standardization, enter the scale period for efficiency optimization, and ultimately move toward a new cycle of rebalancing or substitution.*"
width="1692"
height="282"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: The “Yun” cycle of AI infrastructure evolution. Systems start from the exploration period, undergo platform period standardization, enter the scale period for efficiency optimization, and ultimately move toward a new cycle of rebalancing or substitution.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-art-of-following-the-momentum"&gt;The Art of Following the Momentum&lt;/h2&gt;
&lt;p&gt;A mature infrastructure organization should be able to determine its current stage based on internal and external signals and adjust its strategy accordingly.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If stage transitions are ignored or excessively rushed, the system will experience disturbances or even crises&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="error-examples"&gt;Error Examples&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Erroneous Behavior&lt;/th&gt;
&lt;th&gt;Manifestation&lt;/th&gt;
&lt;th&gt;Consequence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pulling Up Seedlings to Help Them Grow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managing systems still in exploration period as scaled systems, prematurely suppressing change&lt;/td&gt;
&lt;td&gt;Stifling innovation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Going Against the Momentum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Remaining in disordered exploration when it&amp;rsquo;s time to enter the platform period&lt;/td&gt;
&lt;td&gt;Missing the window for structured growth and creating hidden risks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Clinging to the Status Quo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unwilling to change when rebalancing period is needed&lt;/td&gt;
&lt;td&gt;System rigidity and aging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Development Stage Characteristics
&lt;/figcaption&gt;
&lt;h2 id="stage-assessment-checklist"&gt;Stage Assessment Checklist&lt;/h2&gt;
&lt;p&gt;Through the &amp;ldquo;Yun&amp;rdquo; layer perspective, teams can examine the current macro stage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Are we validating new concepts or expanding our achievements?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; What is the system&amp;rsquo;s primary contradiction?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; When might the next stage arrive?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Does our strategy align with the current stage?&lt;/li&gt;
&lt;/ul&gt;
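&lt;p&gt;As an illustrative sketch, the checklist above can be expressed as a small decision helper. The signal names, the priority order, and the stage labels are assumptions for illustration, not a formal part of the model:&lt;/p&gt;

```python
# Hypothetical sketch: map coarse checklist answers to the four "Yun" stages.
# The rule order encodes a simple priority: later stages win when signals mix.

def assess_stage(validating_new_concepts,
                 needs_standardization,
                 cost_is_main_battlefield,
                 paradigm_under_pressure):
    """Return the stage the checklist answers most plausibly point to."""
    if paradigm_under_pressure:
        return "rebalancing"       # old structures corrected or replaced
    if cost_is_main_battlefield:
        return "scale"             # efficiency, throughput, cost dominate
    if needs_standardization:
        return "platform"          # consolidate into platforms and standards
    if validating_new_concepts:
        return "exploration"       # rapid trial and error
    return "unclear"               # re-examine the checklist

# e.g. assess_stage(True, False, False, False) -> "exploration"
```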
&lt;p&gt;&lt;strong&gt;Example Questions&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Are we in the exploration period?
&lt;ul&gt;
&lt;li&gt;If yes → Focus on rapid trial and error and validation&lt;/li&gt;
&lt;li&gt;If no → Consider whether to enter the platform period&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Does our system need standardization?
&lt;ul&gt;
&lt;li&gt;If yes → Enter platform period, establish platforms and standards&lt;/li&gt;
&lt;li&gt;If no → Continue exploration&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Qi Layer: Effective System Flow and Pressure Fields</title><link>https://jimmysong.io/book/ai-infra-dao/qi/</link><pubDate>Tue, 10 Feb 2026 13:56:17 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/qi/</guid><description>Effective flow and pressure distribution in systems—data flow, signal propagation, and system health monitoring</description><content:encoded>
&lt;p&gt;&lt;strong&gt;Qi (气)&lt;/strong&gt; in Chinese culture refers to the energy and flow field that permeates all things. In AI infrastructure, we borrow the concept of &amp;ldquo;Qi&amp;rdquo; to describe the effective flow and pressure distribution within systems.&lt;/p&gt;
&lt;p&gt;This includes the circulation of data, tasks, and signals throughout the system, as well as how various explicit or implicit &lt;strong&gt;system pressures&lt;/strong&gt; accumulate, propagate, and release.&lt;/p&gt;
&lt;h2 id="the-essence-of-qi-overall-state-of-affairs"&gt;The Essence of Qi: Overall State of Affairs&lt;/h2&gt;
&lt;p&gt;Unlike traditional single-point metric monitoring, the concept of &amp;ldquo;Qi&amp;rdquo; reminds us to focus on the overall &lt;strong&gt;state of affairs&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Signals are not isolated events, but rather gather and flow like a field&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A sudden spike in GPU utilization may not be abnormal&lt;/li&gt;
&lt;li&gt;But if multiple metrics (job queue length, response latency, memory usage, etc.) rise together and the trend persists → this indicates a change in the &amp;ldquo;Qi field&amp;rdquo;&lt;/li&gt;
&lt;li&gt;This signals the system entering a high-pressure state&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This &lt;strong&gt;signal field&lt;/strong&gt; manifests as the gathering and stretching of Qi, indicating the accumulation of some form of system tension.&lt;/p&gt;
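&lt;p&gt;A minimal sketch of this idea: instead of alerting on a single metric, check whether several metrics rise together and persist. The window, thresholds, and metric names below are illustrative assumptions:&lt;/p&gt;

```python
# Sketch of the "signal field": one spike is ignored, but several metrics
# rising together and persisting signal a change in the Qi field.
# Thresholds and metric names are illustrative assumptions.

def is_rising(series, min_gain=0.1):
    """A series 'rises persistently' if it is non-decreasing step by step
    and its total relative gain exceeds min_gain."""
    if len(series) < 2 or series[0] <= 0:
        return False
    monotone = all(b >= a for a, b in zip(series, series[1:]))
    return monotone and (series[-1] - series[0]) / series[0] > min_gain

def qi_field_shift(metrics, quorum=3):
    """Return (shifted, names): shifted is True when at least `quorum`
    metrics rise together."""
    names = [name for name, s in metrics.items() if is_rising(s)]
    return len(names) >= quorum, names

metrics = {
    "gpu_util":      [0.60, 0.72, 0.81, 0.90],
    "queue_length":  [12, 30, 55, 90],
    "p99_latency_s": [0.8, 1.1, 1.6, 2.4],
    "mem_used_frac": [0.50, 0.52, 0.51, 0.53],
}
shifted, which = qi_field_shift(metrics)   # shifted: True (three metrics rising)
```

&lt;p&gt;Here the one quiet metric (memory) does not mask the field-level shift signaled by utilization, queue length, and latency rising in concert.&lt;/p&gt;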
&lt;h2 id="two-states-of-qi"&gt;Two States of Qi&lt;/h2&gt;
&lt;h2 id="qi-flow-system-active"&gt;Qi Flow: System Active&lt;/h2&gt;
&lt;p&gt;When all elements coordinate well, data and instructions flow smoothly, producing value efficiently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Processing rates across all stages are basically matched&lt;/li&gt;
&lt;li&gt;No long-term backlogs or idle resources&lt;/li&gt;
&lt;li&gt;Timely system responses&lt;/li&gt;
&lt;li&gt;Balanced resource utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="qi-stagnation-system-pathological"&gt;Qi Stagnation: System Pathological&lt;/h2&gt;
&lt;p&gt;If a bottleneck or imbalance occurs somewhere, Qi&amp;rsquo;s flow is obstructed, causing local pressure to surge:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Jobs queue for long periods&lt;/li&gt;
&lt;li&gt;CPUs/GPUs idle for long periods or pinned at 100% utilization&lt;/li&gt;
&lt;li&gt;Severe message queue backlog&lt;/li&gt;
&lt;li&gt;Frequent anomaly alerts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ultimately, this may trigger failures or performance collapse at weak points.&lt;/p&gt;
&lt;h2 id="qis-flow-path"&gt;Qi&amp;rsquo;s Flow Path&lt;/h2&gt;
&lt;p&gt;To intuitively understand Qi&amp;rsquo;s flow path, we can view the system as a closely connected network:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/qi/045a20987f2f172d95c9136ea84ea722.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/qi/045a20987f2f172d95c9136ea84ea722.svg" alt="Figure 1: Diagram of system ‘Qi’ flow path. Data (Water) Qi enters Model (Wood), triggering Computing Power (Fire) operation, coordinated via Platform (Earth), executed on Hardware (Metal), producing results that feed back to the data layer, forming a closed loop." data-caption="Figure 1: Diagram of system ‘Qi’ flow path. Data (Water) Qi enters Model (Wood), triggering Computing Power (Fire) operation, coordinated via Platform (Earth), executed on Hardware (Metal), producing results that feed back to the data layer, forming a closed loop."
width="2558"
height="382"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Diagram of system ‘Qi’ flow path. Data (Water) Qi enters Model (Wood), triggering Computing Power (Fire) operation, coordinated via Platform (Earth), executed on Hardware (Metal), producing results that feed back to the data layer, forming a closed loop.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Qi&amp;rsquo;s Cycle&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data (Water) Qi enters Model (Wood)&lt;/li&gt;
&lt;li&gt;Drives Computing Power (Fire) to operate&lt;/li&gt;
&lt;li&gt;Coordinated via Platform (Earth)&lt;/li&gt;
&lt;li&gt;Executes computation on Hardware (Metal)&lt;/li&gt;
&lt;li&gt;Outputs results, producing new data or signals&lt;/li&gt;
&lt;li&gt;Feeds back into the data pool (Water)&lt;/li&gt;
&lt;li&gt;Cycle repeats&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="two-forms-of-qi"&gt;Two Forms of Qi&lt;/h2&gt;
&lt;h2 id="healthy-flow"&gt;Healthy Flow&lt;/h2&gt;
&lt;p&gt;Qi circulates ceaselessly among the five elements, maintaining system functionality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If every step flows smoothly → system operates smoothly&lt;/li&gt;
&lt;li&gt;If any step is obstructed → Qi flow slows or even reverses, damaging system performance and stability&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="pressure-propagation"&gt;Pressure Propagation&lt;/h2&gt;
&lt;p&gt;Qi refers not only to healthy flow, but also to &lt;strong&gt;pressure propagation&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example: Data Inflow Surge&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data inflow surges but model processing capacity cannot keep up&lt;/li&gt;
&lt;li&gt;Unprocessed data continuously accumulates&lt;/li&gt;
&lt;li&gt;Manifests as excessive pressure in the data layer (Water)&lt;/li&gt;
&lt;li&gt;Leading to suppression of computing power performance (Fire weakens)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Example: Hardware Resource Exhaustion&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hardware (Metal) resources exhausted&lt;/li&gt;
&lt;li&gt;Computing requests cannot be satisfied&lt;/li&gt;
&lt;li&gt;Obstructed Qi transforms into queuing pressure&lt;/li&gt;
&lt;li&gt;Feeds back to platform (Earth) scheduling layer and user experience&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="application-of-qi-layer-in-operations"&gt;Application of Qi Layer in Operations&lt;/h2&gt;
&lt;p&gt;Through the lens of &amp;ldquo;Qi&amp;rdquo;, operations and architecture teams can more sensitively detect sub-optimal system states:&lt;/p&gt;
&lt;h2 id="not-just-whether-theres-a-problem-but-how-its-trending"&gt;Not Just Whether There&amp;rsquo;s a Problem, But How It&amp;rsquo;s Trending&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Qi State&lt;/th&gt;
&lt;th&gt;Manifestation&lt;/th&gt;
&lt;th&gt;Warning Significance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stagnation Emerging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Latency jitter gradually worsening&lt;/td&gt;
&lt;td&gt;System entering sub-stable state, flow needs dredging and guidance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flow Obstruction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Request failure rate rising, retries increasing&lt;/td&gt;
&lt;td&gt;Some link in the chain is blocked, needs investigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Scattering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metrics fluctuating severely, irregular&lt;/td&gt;
&lt;td&gt;System severely imbalanced, needs overall adjustment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Deficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource utilization persistently low&lt;/td&gt;
&lt;td&gt;Unreasonable configuration, needs optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Qi State and Warning Significance
&lt;/figcaption&gt;
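&lt;p&gt;Table 1 can be read as a rough classifier. The sketch below is one hedged interpretation; the thresholds, and the priority order applied when several conditions hold, are assumptions a real system would tune per workload:&lt;/p&gt;

```python
# Hypothetical classifier for the four Qi states in Table 1.
# Inputs are coarse summaries; thresholds are illustrative assumptions.

def classify_qi(jitter_trend, error_trend, metric_volatility, utilization):
    """jitter_trend / error_trend: slopes of latency jitter and error rate;
    metric_volatility: normalized spread across key metrics;
    utilization: average resource utilization in [0, 1]."""
    if metric_volatility > 0.5:
        return "qi scattering"        # severe, irregular fluctuation
    if error_trend > 0:
        return "flow obstruction"     # failures and retries rising
    if jitter_trend > 0:
        return "stagnation emerging"  # sub-stable state
    if utilization < 0.2:
        return "qi deficiency"        # chronically idle resources
    return "flowing"
```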
&lt;h2 id="qi-disorder-precedes-major-incidents"&gt;Qi Disorder Precedes Major Incidents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Latency jitter gradually worsening → signals system entering sub-stable state&lt;/li&gt;
&lt;li&gt;If no measures are taken to resolve (scaling resources, optimizing algorithms, or rate limiting) → may evolve to complete failure&lt;/li&gt;
&lt;li&gt;Agent task interaction rhythm (Qi) slows or stops → may indicate poor communication between agents or deadlock&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="strategies-for-guiding-qi-flow"&gt;Strategies for Guiding Qi Flow&lt;/h2&gt;
&lt;p&gt;Maintaining &lt;strong&gt;smooth Qi flow&lt;/strong&gt; requires building resilience:&lt;/p&gt;
&lt;h2 id="architecture-level"&gt;Architecture Level&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Peak shaving and valley filling mechanisms&lt;/strong&gt;: Absorb burst traffic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Message queue backpressure protection&lt;/strong&gt;: Prevent pressure backflow&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elastic buffer design&lt;/strong&gt;: Reserve margin to handle impacts&lt;/li&gt;
&lt;/ul&gt;
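&lt;p&gt;Of the mechanisms above, backpressure is the simplest to sketch: a bounded queue rejects work when full, so pressure is shed at the edge instead of flowing backward unbounded. The capacity of 3 is an illustrative assumption:&lt;/p&gt;

```python
import queue

# Sketch of message-queue backpressure protection: a bounded buffer makes
# producers fail fast rather than letting pressure propagate upstream.
buffer = queue.Queue(maxsize=3)   # capacity is an illustrative assumption

def try_submit(task):
    """Accept the task if there is room; otherwise signal backpressure."""
    try:
        buffer.put_nowait(task)
        return True
    except queue.Full:
        return False   # caller should shed load, retry later, or degrade

accepted = [try_submit(i) for i in range(5)]   # [True, True, True, False, False]
```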
&lt;h2 id="strategy-level"&gt;Strategy Level&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Slack capacity&lt;/strong&gt;: Maintain certain redundancy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elastic scaling strategies&lt;/strong&gt;: Dynamically adjust resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rate limiting and degradation mechanisms&lt;/strong&gt;: Protect core functionality&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="agent-system-special-attention"&gt;Agent System Special Attention&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Monitor task queues and communication latency&lt;/li&gt;
&lt;li&gt;Ensure information flow (Qi) between agents is unobstructed&lt;/li&gt;
&lt;li&gt;Introduce coordinator agents or reduce concurrency when necessary to smooth Qi flow&lt;/li&gt;
&lt;/ul&gt;
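&lt;p&gt;A minimal sketch of monitoring the interaction rhythm: treat a long silence on inter-agent channels as a possible stall or deadlock. The class name and the stall threshold are illustrative assumptions:&lt;/p&gt;

```python
import time

# Hypothetical watchdog: if no inter-agent message arrives for stall_after_s
# seconds, the interaction rhythm (Qi) may have stopped -> investigate.

class AgentWatchdog:
    def __init__(self, stall_after_s=30.0):
        self.stall_after_s = stall_after_s
        self.last_message_at = time.monotonic()

    def on_message(self):
        """Record that communication between agents is still flowing."""
        self.last_message_at = time.monotonic()

    def stalled(self, now=None):
        """True if the channel has been silent past the threshold."""
        if now is None:
            now = time.monotonic()
        return now - self.last_message_at > self.stall_after_s
```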
&lt;h2 id="qi-layer-monitoring-practices"&gt;Qi Layer Monitoring Practices&lt;/h2&gt;
&lt;p&gt;Establish system-wide observability:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monitoring Dimension&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Tool Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Traffic Distribution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Request flow across stages&lt;/td&gt;
&lt;td&gt;Distributed Tracing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Queue Backlog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Queue length trends&lt;/td&gt;
&lt;td&gt;Message Queue Monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource Utilization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CPU/GPU/Memory/Storage&lt;/td&gt;
&lt;td&gt;Prometheus + Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency Distribution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;P50/P95/P99 latency&lt;/td&gt;
&lt;td&gt;APM Tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Anomaly Trends&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Error rate, retry rate changes&lt;/td&gt;
&lt;td&gt;Log Aggregation Analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Qi Layer Monitoring Dimensions
&lt;/figcaption&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Qi layer provides a measure of effective flow, helping us take the system&amp;rsquo;s pulse to check whether its &amp;ldquo;blood and Qi&amp;rdquo; are abundant and circulating smoothly&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Qi&amp;rsquo;s operation can be understood as whether the system&amp;rsquo;s &amp;ldquo;meridians&amp;rdquo; are unobstructed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Qi flow means system active&lt;/strong&gt;: Data and instructions flow smoothly, producing value efficiently&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qi stagnation means system pathological&lt;/strong&gt;: Flow obstructed, local pressure surges, ultimately triggering failures&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Just as in Traditional Chinese Medicine&amp;rsquo;s four examination methods, by observing &amp;ldquo;Qi&amp;rsquo;s&amp;rdquo; operation, we can predict the trajectory of system problems and apply targeted remedies.&lt;/p&gt;</content:encoded></item><item><title>Dynamic Relationship Modeling: Five Elements Flow Under Yin-Yang Balance</title><link>https://jimmysong.io/book/ai-infra-dao/dynamic-modeling/</link><pubDate>Tue, 10 Feb 2026 13:55:47 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/dynamic-modeling/</guid><description>Integrating Yin-Yang, Five Elements, Yun, and Qi layers to explain complex AI infrastructure system behavior</description><content:encoded>
&lt;h2 id="yin-yang--five-elements-intrinsic-tension-of-elements"&gt;Yin-Yang × Five Elements: Intrinsic Tension of Elements&lt;/h2&gt;
&lt;p&gt;Each of the &lt;strong&gt;Five Elements&lt;/strong&gt; contains both &lt;strong&gt;Yin&lt;/strong&gt; and &lt;strong&gt;Yang&lt;/strong&gt; aspects, manifesting with different polarities in different contexts:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/dynamic-modeling/a179955c34047cc8d9fb31f92c6fb594.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/dynamic-modeling/a179955c34047cc8d9fb31f92c6fb594.svg" alt="Figure 1: Yin-Yang states of Five Elements. Each element includes Yin (potential, static, introverted) and Yang (explicit, dynamic, extroverted) aspects, with transformation possible between them depending on context." data-caption="Figure 1: Yin-Yang states of Five Elements. Each element includes Yin (potential, static, introverted) and Yang (explicit, dynamic, extroverted) aspects, with transformation possible between them depending on context."
width="4592"
height="408"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Yin-Yang states of Five Elements. Each element includes Yin (potential, static, introverted) and Yang (explicit, dynamic, extroverted) aspects, with transformation possible between them depending on context.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Yin-Yang Attributes of the Five Elements&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Five Elements&lt;/th&gt;
&lt;th&gt;Yin State&lt;/th&gt;
&lt;th&gt;Yang State&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Water (Data)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Potential data reserves, implicit patterns (static storage of historical data)&lt;/td&gt;
&lt;td&gt;Instant data flow, real-time feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Wood (Model)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dormant capabilities (unactivated parameters, backup algorithms)&lt;/td&gt;
&lt;td&gt;Explicit expansion (model architecture updates, parameter surge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fire (Compute)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stored energy (idle compute, waiting for scheduling)&lt;/td&gt;
&lt;td&gt;High-load operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Earth (Platform)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Static support (stable operation, non-intervention)&lt;/td&gt;
&lt;td&gt;Proactive scheduling and expanded governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metal (Hardware)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Implicit constraints (unused capacity)&lt;/td&gt;
&lt;td&gt;Explicit limits (resource hard caps maxed out)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Dynamic Model Overview
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Signs of Yin-Yang Imbalance&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fire Excessively Yin&lt;/strong&gt;: GPU compute idle for long periods while tasks backlog → Poor scheduling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fire Excessively Yang&lt;/strong&gt;: GPUs at 24-hour full load with no elasticity → Hidden crash risk&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Earth Excessively Yang&lt;/strong&gt;: Too many platform rules → Stifling innovation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Earth Excessively Yin&lt;/strong&gt;: Lack of platform control → Leading to chaos&lt;/li&gt;
&lt;/ul&gt;
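&lt;p&gt;The Fire imbalance signs above can be sketched as a check that compares utilization with the task backlog; the thresholds are illustrative assumptions:&lt;/p&gt;

```python
# Hypothetical diagnosis of "Fire" (compute) imbalance from two coarse signals.
# Thresholds are illustrative assumptions, not tuned values.

def diagnose_fire(gpu_util, queued_tasks):
    """gpu_util in [0, 1]; queued_tasks is the scheduler backlog length."""
    if gpu_util < 0.3 and queued_tasks > 100:
        return "excessively yin"   # compute idle while work backlogs: poor scheduling
    if gpu_util > 0.95 and queued_tasks > 0:
        return "excessively yang"  # saturated with no elasticity: hidden crash risk
    return "balanced"
```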
&lt;h2 id="five-elements--qi-dynamic-network-of-flow"&gt;Five Elements × Qi: Dynamic Network of Flow&lt;/h2&gt;
&lt;p&gt;The &lt;strong&gt;Five Elements&lt;/strong&gt; framework provides tools to decompose systems, but system components are not static puzzles—rather, they connect into a dynamic network through &lt;strong&gt;the flow of Qi&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Generating Relationships&lt;/strong&gt;: Qi flows smoothly, forming positive feedback loops&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Controlling Relationships&lt;/strong&gt;: Qi stagnates at certain links or reverse effects strengthen&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Relationship Principles&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Generating primarily, Controlling secondarily&lt;/strong&gt;—main energy flows transmit successfully through each link, while balancing forces intervene moderately only to prevent extreme situations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="yun--yin-yang-five-elements-boundary-conditions-for-stage-evolution"&gt;Yun × Yin-Yang Five Elements: Boundary Conditions for Stage Evolution&lt;/h2&gt;
&lt;p&gt;The stage-based nature of &lt;strong&gt;Yun&lt;/strong&gt; provides a perspective of &lt;strong&gt;boundary conditions&lt;/strong&gt; evolving over time for the aforementioned Yin-Yang Five Elements dynamics.&lt;/p&gt;
&lt;p&gt;Each stage strengthens or weakens certain elements and tensions:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Main Characteristics&lt;/th&gt;
&lt;th&gt;Five Elements Characteristics&lt;/th&gt;
&lt;th&gt;Yin-Yang Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exploration Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High variance, low structure, rapid trial and error&lt;/td&gt;
&lt;td&gt;Wood and Fire dominant&lt;/td&gt;
&lt;td&gt;Expansion (Yang) outweighs Constraints (Yin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standardization emerges, interfaces and processes converge&lt;/td&gt;
&lt;td&gt;Fire generates Earth&lt;/td&gt;
&lt;td&gt;Governance (Yin increasing) gradually strengthens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Efficiency, throughput, cost become main battlegrounds&lt;/td&gt;
&lt;td&gt;Earth dominates Wood&lt;/td&gt;
&lt;td&gt;Stability (Yin) takes precedence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rebalancing Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Old structures corrected or replaced by new structures&lt;/td&gt;
&lt;td&gt;Metal and Water resurge&lt;/td&gt;
&lt;td&gt;Transformation (Yang) rises again&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Typical Interaction Scenarios
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Stage Transitions&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;The Yun layer tells us &lt;strong&gt;when to shift focus&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As stages change, the system needs to reallocate emphasis among its elements&lt;/li&gt;
&lt;li&gt;Previously dominant elements may become excessive and need convergence&lt;/li&gt;
&lt;li&gt;Previously minor elements need strengthening to address shortcomings&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Examples&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In Platform Stage/Scale Stage → Must strengthen governance (Earth&amp;rsquo;s Yang) and hardware optimization (Metal&amp;rsquo;s Yang)&lt;/li&gt;
&lt;li&gt;To curb the unchecked growth tendencies left over from early stages (excessive Wood-Fire Qi)&lt;/li&gt;
&lt;li&gt;In Rebalancing Stage → May need to reactivate suppressed innovation potential (Water-Wood Qi)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="comprehensive-analysis-case-gpu-scheduling-scenario"&gt;Comprehensive Analysis Case: GPU Scheduling Scenario&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s see how to apply the four-layer model to analyze a real GPU scheduling problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problem Scenario&lt;/strong&gt;: The cluster experiences task queuing under high load&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Diagnosis&lt;/th&gt;
&lt;th&gt;Findings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Observe Qi flow state&lt;/td&gt;
&lt;td&gt;Compute Fire Qi is obstructed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Five Elements Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Locate elements&lt;/td&gt;
&lt;td&gt;Data input too intense (Water Yang excessive) but scheduling (Platform Earth) strategy cannot keep up&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yin-Yang Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Analyze tensions&lt;/td&gt;
&lt;td&gt;Scheduling strategy blindly pursues maximizing utilization (excessively Yang) while lacking elastic buffers (Yin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yun Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Assess stage&lt;/td&gt;
&lt;td&gt;An emerging business that has just left the exploration stage and has not yet matured its scheduling: the Platform Stage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Four-Layer Diagnostic Analysis
&lt;/figcaption&gt;
&lt;h2 id="solutions"&gt;Solutions&lt;/h2&gt;
&lt;p&gt;Based on four-layer collaborative diagnosis, develop comprehensive solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Qi Layer&lt;/strong&gt;: Unblock Qi flow&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Expand resources or optimize algorithms&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Five Elements Layer&lt;/strong&gt;: Balance elements&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Strengthen platform scheduling capabilities (Earth)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Yin-Yang Layer&lt;/strong&gt;: Restore balance&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduce elastic buffer mechanisms (supplement Yin)&lt;/li&gt;
&lt;li&gt;Avoid blindly pursuing high utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Yun Layer&lt;/strong&gt;: Follow the trend&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accelerate introduction of standardized scheduling and resource governance (Earth&amp;rsquo;s Yun is approaching)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
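&lt;p&gt;The Yin-Yang remedy above (elastic buffers rather than blindly maximizing utilization) can be sketched as a capacity rule that deliberately keeps headroom. The 0.8 target and the sizing formula are illustrative assumptions:&lt;/p&gt;

```python
import math

# Sketch of an elastic buffer: provision for a target utilization below 100%
# so bursts are absorbed instead of queuing. Numbers are illustrative.

def desired_replicas(load, capacity_per_replica, target_utilization=0.8):
    """Enough replicas that steady-state utilization ~= target_utilization."""
    return max(1, math.ceil(load / (capacity_per_replica * target_utilization)))

# 900 req/s at 100 req/s per replica: 12 replicas (not 9), ~25% headroom.
```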
&lt;h2 id="value-of-dynamic-modeling"&gt;Value of Dynamic Modeling&lt;/h2&gt;
&lt;p&gt;Through the multi-level dynamic modeling above, we can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Explain complex scenarios more comprehensively&lt;/strong&gt;: No longer limited to single perspectives&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Locate root causes of problems&lt;/strong&gt;: Find fundamental causes rather than surface phenomena&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Identify improvement directions&lt;/strong&gt;: Obtain systematic solutions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predict system evolution&lt;/strong&gt;: Prepare in advance for stage transitions&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-recommendations"&gt;Practical Recommendations&lt;/h2&gt;
&lt;p&gt;In daily architecture design and operations, you can establish these thinking habits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;When encountering problems&lt;/strong&gt;: Analyze layer by layer from a four-layer perspective&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When making decisions&lt;/strong&gt;: Consider impacts on all four layers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;When conducting post-mortems&lt;/strong&gt;: Check whether warning signals from the four-layer model were ignored&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The value of a system lies not in pursuing the extreme of a single performance indicator without limit, but in balancing all elements to achieve long-term coordinated development&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>Engineering Practice Guide: Architecture Decisions Guided by Theory</title><link>https://jimmysong.io/book/ai-infra-dao/engineering-practice/</link><pubDate>Tue, 10 Feb 2026 13:55:59 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/engineering-practice/</guid><description>Practical principles for applying the Yin-Yang Five Elements Qi model in GPU scheduling, Agent Runtime, and platform governance</description><content:encoded>
&lt;p&gt;The theoretical models above do not remain at the conceptual level; they directly guide the engineering practice of AI infrastructure. In specific scenarios such as GPU scheduling, Agent runtime, and platform governance, we can apply the Yin-Yang Five Elements Qi Movement model by following the principles below.&lt;/p&gt;
&lt;h2 id="balance-yin-and-yang-avoid-extremes"&gt;Balance Yin and Yang, Avoid Extremes&lt;/h2&gt;
&lt;p&gt;Consider both propelling forces and restraining forces when making architecture decisions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GPU Cluster Scaling&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Satisfy business growth (expanding Yang)&lt;/li&gt;
&lt;li&gt;✓ Set quota and priority policies (constraining Yin)&lt;/li&gt;
&lt;li&gt;✓ Prevent resource abuse&lt;/li&gt;
&lt;/ul&gt;
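The quota-and-priority idea above can be sketched in a few lines. This is a minimal illustration, not the text's actual scheduler: the `Quota` type, the `"high"` priority label, and the 10% burst allowance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class Quota:
    limit_gpus: int      # hard quota for the team (constraining Yin)
    used_gpus: int = 0   # GPUs currently allocated

def admit(request_gpus: int, quota: Quota, priority: str,
          burst_fraction: float = 0.1) -> bool:
    """Admit a GPU request only if it fits the team's quota.

    High-priority jobs may burst slightly over quota; everything else
    is rejected, so growth (Yang) stays bounded by policy (Yin).
    """
    ceiling = quota.limit_gpus
    if priority == "high":
        ceiling = int(quota.limit_gpus * (1 + burst_fraction))
    if quota.used_gpus + request_gpus <= ceiling:
        quota.used_gpus += request_gpus
        return True
    return False

q = Quota(limit_gpus=100)
assert admit(80, q, "normal")        # fits within quota
assert not admit(30, q, "normal")    # 80 + 30 > 100, rejected
assert admit(30, q, "high")          # high priority may burst to 110
```

The point is structural: every expansion path (`admit`) passes through an explicit ceiling, so scaling up never bypasses the restraining policy.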
&lt;p&gt;&lt;strong&gt;Agent Runtime Design&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;✓ Give agents more autonomy (innovation, Yang)&lt;/li&gt;
&lt;li&gt;✓ Introduce monitoring and sandboxing mechanisms (governance, Yin)&lt;/li&gt;
&lt;li&gt;✓ Prevent loss of control&lt;/li&gt;
&lt;/ul&gt;
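As a minimal sketch of the monitoring-and-sandboxing pairing (the tool names, call budget, and time budget here are illustrative assumptions, not part of any real agent framework), autonomy can be granted through a gate that records and bounds every action:

```python
import time

# Hypothetical allow-list: the governance (Yin) side of agent autonomy.
ALLOWED_TOOLS = {"search", "read_file", "summarize"}

class AgentSandbox:
    def __init__(self, max_calls: int = 50, max_seconds: float = 30.0):
        self.max_calls = max_calls
        self.deadline = time.monotonic() + max_seconds
        self.calls = 0
        self.audit_log = []  # every action is recorded for monitoring

    def invoke(self, tool: str, *args):
        if tool not in ALLOWED_TOOLS:
            raise PermissionError(f"tool {tool!r} is not allow-listed")
        if self.calls >= self.max_calls:
            raise RuntimeError("call budget exhausted")
        if time.monotonic() > self.deadline:
            raise TimeoutError("wall-clock budget exhausted")
        self.calls += 1
        self.audit_log.append((tool, args))
        return f"ran {tool}"  # placeholder for the real tool call

sandbox = AgentSandbox(max_calls=2)
assert sandbox.invoke("search", "gpu prices") == "ran search"
try:
    sandbox.invoke("delete_all")   # not allow-listed -> blocked
except PermissionError:
    pass
```

The agent keeps its autonomy inside the allow-list, while the audit log and budgets give operators the counter-force needed to prevent loss of control.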
&lt;p&gt;&lt;strong&gt;Practice Checklist&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;After every major adjustment, ask yourself: &lt;strong&gt;Have I introduced corresponding counter-forces to stabilize the system?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="complete-the-five-elements-identify-and-fill-weaknesses"&gt;Complete the Five Elements, Identify and Fill Weaknesses&lt;/h2&gt;
&lt;p&gt;Regularly review whether the five types of elements in the system are balanced.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GPU Infrastructure Check&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Do data pipelines keep up with computing power improvements? (Water and Fire matching)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Does model optimization fully utilize hardware? (Wood and Metal matching)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Can the scheduling platform handle peak loads? (Earth supporting Fire)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Have hardware resources become a bottleneck? (Metal not holding back)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agent Platform Check&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is there a high-quality knowledge base or real-time data support? (Water)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is there strong model capability? (Wood)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Are there sufficient computing resources? (Fire)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is there a good orchestration framework? (Earth)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Are there reliable environments and interfaces? (Metal)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Practice Strategy&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once a bottleneck or overload is discovered in a certain link, decisively invest resources to &lt;strong&gt;fill the weakness or reduce the burden on the overloaded part&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem Discovered&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Insufficient data quality (&amp;ldquo;Water&amp;rdquo; weak)&lt;/td&gt;
&lt;td&gt;Prioritize data governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-term low hardware utilization (Metal strong, Fire weak)&lt;/td&gt;
&lt;td&gt;Optimize algorithms or scheduling to better utilize hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Problem Discovery and Solutions
&lt;/figcaption&gt;
&lt;h2 id="follow-the-trend-align-with-the-movement"&gt;Follow the Trend, Align with the Movement&lt;/h2&gt;
&lt;p&gt;Develop reasonable strategies based on the stage of the system.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strategies for Different Stages&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Should Do&lt;/th&gt;
&lt;th&gt;Should Not Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exploration Phase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Rapid trial and error, validate value&lt;/td&gt;
&lt;td&gt;Prematurely introduce heavy processes and constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform Phase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standardized management, MLOps tools&lt;/td&gt;
&lt;td&gt;Remain in disordered exploration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale Phase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strengthen governance and efficiency optimization&lt;/td&gt;
&lt;td&gt;Still use the casual practices of the startup period&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rebalancing Phase&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Architecture innovation, introduce new technologies&lt;/td&gt;
&lt;td&gt;Refuse to move forward&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Strategies for Different Stages
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Regular Assessment&lt;/strong&gt;:
At each quarter or important milestone, assess:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; &lt;strong&gt;Which stage are we currently in&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; What is the main contradiction in this stage?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; When might the next stage arrive?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Prepare in advance for the transition&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Practice Cases&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An AI training cluster that has validated its concept → should move toward standardized management (transitioning from the exploration phase to the platform phase)&lt;/li&gt;
&lt;li&gt;When system scale expansion hits bottlenecks → consider entering the rebalancing phase and breaking through via architecture innovation&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="observe-qi-field-optimize-flow"&gt;Observe Qi Field, Optimize Flow&lt;/h2&gt;
&lt;p&gt;Establish global observability of the system, focusing on trends and correlations rather than single-point metrics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Monitoring Methods&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distributed tracing&lt;/li&gt;
&lt;li&gt;Metric correlation analysis&lt;/li&gt;
&lt;li&gt;Full-link monitoring&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Signals of Qi Disorder&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Possible Cause&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frequent occurrence of various abnormal logs&lt;/td&gt;
&lt;td&gt;Systemic disorder; requires a global investigation rather than point fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;A metric&amp;rsquo;s periodic fluctuations becoming increasingly intense&lt;/td&gt;
&lt;td&gt;The system may be approaching a limit internally&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Signals of Qi Disorder
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Strategies to Keep Qi Flowing Smoothly&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architecture Level&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Peak clipping and valley filling mechanisms&lt;/li&gt;
&lt;li&gt;Message queue backpressure protection&lt;/li&gt;
&lt;/ul&gt;
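Backpressure in this sense can be shown with a bounded queue: once the buffer is full, producers are refused instead of the backlog growing without limit. A minimal sketch using Python's standard library (the queue size and timeout are arbitrary example values):

```python
import queue

# A bounded queue applies backpressure: producers fail fast
# instead of letting backlog (stagnant Qi) grow without limit.
jobs = queue.Queue(maxsize=3)

def submit(job, timeout: float = 0.01) -> bool:
    """Return False instead of queueing unboundedly when the buffer is full."""
    try:
        jobs.put(job, timeout=timeout)
        return True
    except queue.Full:
        return False   # caller sheds load or retries later

accepted = [submit(i) for i in range(5)]
assert accepted == [True, True, True, False, False]
```

The rejection signal propagates upstream, which is exactly the "peak clipping" effect: pressure is felt at the source rather than accumulating inside the pipeline.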
&lt;p&gt;&lt;strong&gt;Strategy Level&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Slack capacity&lt;/li&gt;
&lt;li&gt;Elastic scaling strategies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agent System Special Attention&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitor task queues and communication latency&lt;/li&gt;
&lt;li&gt;Ensure smooth information flow (Qi) between agents&lt;/li&gt;
&lt;li&gt;Introduce coordinator agents or reduce concurrency when necessary&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="dynamic-adjustment-continuous-rebalancing"&gt;Dynamic Adjustment, Continuous Rebalancing&lt;/h2&gt;
&lt;p&gt;Integrate the Yin-Yang Five Elements Qi Movement model into the team&amp;rsquo;s continuous improvement process.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core Questions in Architecture Reviews or Incident Retrospectives&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is the current main contradiction more inclined toward expansion or constraint, speed or stability?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is any Five Elements element overloaded (Yang excess) or missing (Yin deficiency)?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is &lt;strong&gt;System Qi&lt;/strong&gt; congested somewhere?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Do our strategies align with the current stage?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Continuous Improvement Process&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Problem Discovery → Four-Layer Model Diagnosis → Strategy Formulation → Implementation Adjustment → Effect Evaluation → Continuous Optimization
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="practice-case-large-scale-gpu-training-cluster-optimization"&gt;Practice Case: Large-Scale GPU Training Cluster Optimization&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Background&lt;/strong&gt;: A team encountered stability issues while operating a large-scale GPU training cluster.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Four-Layer Model Diagnosis&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Diagnosis&lt;/th&gt;
&lt;th&gt;Findings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yin-Yang Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Speed vs Stability&lt;/td&gt;
&lt;td&gt;Continuously compressing fault tolerance and testing time in pursuit of efficiency (speed Yang), leading to frequent online failures (stability Yin damaged)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Five Elements Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Five Elements Check&lt;/td&gt;
&lt;td&gt;Data pipeline latency gradually increasing (Water weaker than Fire)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Movement Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stage Judgment&lt;/td&gt;
&lt;td&gt;The system has moved from its wild-growth period into a maturity period&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Qi Flow State&lt;/td&gt;
&lt;td&gt;Obvious Qi stagnation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 4: Four-Layer Model Diagnosis
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Comprehensive Solution&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Yin-Yang Balance&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Suspend performance optimization&lt;/li&gt;
&lt;li&gt;Invest time to strengthen fault tolerance mechanisms and testing (supplement stability Yin)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Five Elements Completion&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add data preprocessing nodes and caching (strengthen Water)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Movement Adjustment&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Change mindset, shift focus from feature expansion to optimization and governance&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Qi Flow Regulation&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build full-link tracing system&lt;/li&gt;
&lt;li&gt;Monitor the time of each link from training job submission to completion&lt;/li&gt;
&lt;li&gt;Identify Qi stagnation points and clear them&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: While maintaining high utilization, the cluster&amp;rsquo;s stability was greatly improved, and no serious downtime occurred again.&lt;/p&gt;
&lt;h2 id="scenario-application-quick-reference-table"&gt;Scenario Application Quick Reference Table&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Yin-Yang Focus&lt;/th&gt;
&lt;th&gt;Five Elements Check&lt;/th&gt;
&lt;th&gt;Movement Judgment&lt;/th&gt;
&lt;th&gt;Qi Flow Monitoring&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU Scheduling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Utilization vs Elasticity&lt;/td&gt;
&lt;td&gt;Fire - Earth - Metal Balance&lt;/td&gt;
&lt;td&gt;Scale Phase Efficiency Optimization&lt;/td&gt;
&lt;td&gt;Task queues, resource utilization curves&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Autonomy vs Governance&lt;/td&gt;
&lt;td&gt;Water - Wood - Fire Coordination&lt;/td&gt;
&lt;td&gt;Exploration Phase Rapid Iteration&lt;/td&gt;
&lt;td&gt;Communication latency, task interaction rhythm&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform Governance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Innovation Risk Control vs Process Efficiency&lt;/td&gt;
&lt;td&gt;Earth - Metal Constraints&lt;/td&gt;
&lt;td&gt;Platform Phase Standardization&lt;/td&gt;
&lt;td&gt;Rule execution rate, change frequency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Performance vs Cost&lt;/td&gt;
&lt;td&gt;Fire - Metal Matching&lt;/td&gt;
&lt;td&gt;Scale Phase Refinement&lt;/td&gt;
&lt;td&gt;Resource waste, idle time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 5: Scenario Application Quick Reference
&lt;/figcaption&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Through the Yin-Yang Five Elements Qi Movement model, we can in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Avoid Extremes&lt;/strong&gt;: Not blindly pursuing single metrics&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Systematic Thinking&lt;/strong&gt;: Analyzing problems from multiple dimensions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Follow the Trend&lt;/strong&gt;: Adjust strategies based on stages&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predict Problems&lt;/strong&gt;: Early warning of risks through Qi field changes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous Improvement&lt;/strong&gt;: Establish systematic optimization processes&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;The value of this system lies in: combining Eastern wisdom with engineering practice to provide a unique and effective thinking framework for complex AI infrastructure&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>System Diagnosis Principles: Criteria for Health Status</title><link>https://jimmysong.io/book/ai-infra-dao/system-diagnosis/</link><pubDate>Tue, 10 Feb 2026 13:56:28 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/system-diagnosis/</guid><description>Five-dimensional diagnosis framework for AI infrastructure health: element balance, flow smoothness, tension dynamics, stage alignment, and runaway warnings</description><content:encoded>
&lt;p&gt;To maintain the long-term healthy evolution of AI infrastructure, post-mortem summaries are far from sufficient. We need a set of &lt;strong&gt;system diagnosis principles&lt;/strong&gt; to detect hidden risks early and correct deviations.&lt;/p&gt;
&lt;p&gt;Based on the Yin-Yang Five Elements Yun model, diagnosis can be conducted from the following five dimensions:&lt;/p&gt;
&lt;h2 id="five-dimensional-diagnosis-framework"&gt;Five-Dimensional Diagnosis Framework&lt;/h2&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/system-diagnosis/14c23d6ef51af9a7b4c30cb6da07e94a.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/system-diagnosis/14c23d6ef51af9a7b4c30cb6da07e94a.svg" alt="Figure 1: Five-Dimensional Diagnosis Framework Diagram" data-caption="Figure 1: Five-Dimensional Diagnosis Framework Diagram"
width="2899"
height="615"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Five-Dimensional Diagnosis Framework Diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="five-elements-balance-check"&gt;Five Elements Balance Check&lt;/h2&gt;
&lt;p&gt;Assess the current status of five aspects: Data (Water), Models (Wood), Compute (Fire), Platform (Earth), and Hardware (Metal).&lt;/p&gt;
&lt;h2 id="diagnosis-method"&gt;Diagnosis Method&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Checklist&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Can data pipelines keep up with demands? (Water)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Are model capabilities fully utilized? (Wood)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Are compute resources effectively used? (Fire)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Can the platform support current load? (Earth)&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Is hardware becoming a bottleneck? (Metal)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="identify-problems"&gt;Identify Problems&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem Type&lt;/th&gt;
&lt;th&gt;Manifestation&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Short Board&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One element significantly weaker than others&lt;/td&gt;
&lt;td&gt;Prioritize strengthening that element&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overload&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One element consumes excessive resources or frequently becomes a bottleneck&lt;/td&gt;
&lt;td&gt;Introduce limits or expand other elements to share pressure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Problem Types and Solutions
&lt;/figcaption&gt;
&lt;h2 id="typical-symptoms"&gt;Typical Symptoms&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Water Level Too Low&lt;/strong&gt;: Data pipelines always lag behind training needs → Replenish data processing capacity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metal Overload&lt;/strong&gt;: Hardware often runs at full capacity or even triggers limit alarms → Expand capacity or impose constraints on upper layers&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most failures do not stem from missing components, but from long-term role imbalance&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="qi-flow-smoothness-check"&gt;Qi Flow Smoothness Check&lt;/h2&gt;
&lt;p&gt;Analyze whether Qi flows smoothly through the system via full-link monitoring.&lt;/p&gt;
&lt;h2 id="diagnosis-method-1"&gt;Diagnosis Method&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Latency distribution of key processes&lt;/li&gt;
&lt;li&gt;Queue backlogs&lt;/li&gt;
&lt;li&gt;Resource utilization curves&lt;/li&gt;
&lt;/ul&gt;
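Two of these metrics can be checked with very little code. The sketch below (sample values and the five-sample window are illustrative assumptions) computes nearest-rank latency percentiles — a wide p99/p50 gap suggests intermittent stagnation rather than uniform slowness — and flags a persistently rising queue backlog:

```python
def percentile(samples, p):
    """Nearest-rank percentile of a latency sample."""
    s = sorted(samples)
    k = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[k]

def persistent_backlog(depths, window: int = 5) -> bool:
    """True if queue depth has risen for `window` consecutive samples:
    a persistent trend, not a brief peak."""
    recent = depths[-window:]
    return len(recent) == window and all(b > a for a, b in zip(recent, recent[1:]))

latencies_ms = [12, 15, 11, 14, 230, 13, 16, 12, 15, 240]
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
assert p50 < 20 and p99 > 200     # fat tail: intermittent stalls

assert persistent_backlog([3, 4, 6, 9, 14])       # backlog keeps growing
assert not persistent_backlog([3, 4, 2, 9, 14])   # a dip breaks the trend
```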
&lt;h2 id="qi-smooth-vs-qi-not-smooth"&gt;Qi Smooth vs. Qi Not Smooth&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Smooth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Processing rates across stages basically match, without long-term backlogs or idle resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Not Smooth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;One stage remains a bottleneck for long periods, or large amounts of resources sit idle&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Qi Flow: Smooth vs Obstructed
&lt;/figcaption&gt;
&lt;h2 id="diagnosis-points"&gt;Diagnosis Points&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Distinguish temporary fluctuations from persistent trends: brief peaks don&amp;rsquo;t necessarily indicate Qi blockage, but persistent deviations must be addressed&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tool Support&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dashboards and automated alerts&lt;/li&gt;
&lt;li&gt;Timely capture of &amp;ldquo;stagnant Qi&amp;rdquo; locations&lt;/li&gt;
&lt;li&gt;Further investigation of causes (which Five Elements imbalance it corresponds to)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="yin-yang-dynamics-check"&gt;Yin-Yang Dynamics Check&lt;/h2&gt;
&lt;p&gt;Assess whether current strategy and state are &lt;strong&gt;Yang Excess Yin Deficiency&lt;/strong&gt; or &lt;strong&gt;Yin Excess Yang Deficiency&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="diagnosis-method-2"&gt;Diagnosis Method&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Qualitative Analysis&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Look at whether recent architecture decisions overly favor one extreme&lt;/li&gt;
&lt;li&gt;Have you been continuously expanding and adding new features while ignoring stability?&lt;/li&gt;
&lt;li&gt;Or conversely, multiple layers of approval and strict constraints but lack innovation momentum?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Quantitative Metrics&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Yang Excess&lt;/th&gt;
&lt;th&gt;Yin Excess&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Change Frequency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extremely high&lt;/td&gt;
&lt;td&gt;Extremely low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Incident Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frequent&lt;/td&gt;
&lt;td&gt;Extremely low, but with no changes occurring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Release Rhythm&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous&lt;/td&gt;
&lt;td&gt;Long-term stagnation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Yin-Yang Status
&lt;/figcaption&gt;
&lt;h2 id="balance-strategy"&gt;Balance Strategy&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Symptoms&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yang Excess Yin Deficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frequent changes with frequent incidents&lt;/td&gt;
&lt;td&gt;Pause releases, focus on addressing hazards (replenish Yin)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yin Excess Yang Deficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Long-term no change and stagnation&lt;/td&gt;
&lt;td&gt;Introduce challenges and innovation (add Yang)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 4: Balance Strategies
&lt;/figcaption&gt;
&lt;h2 id="yun-alignment-check"&gt;Yun Alignment Check&lt;/h2&gt;
&lt;p&gt;Determine whether the organization&amp;rsquo;s actions match the system&amp;rsquo;s current stage, preventing &lt;strong&gt;counter-Yun operation&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="diagnosis-method-3"&gt;Diagnosis Method&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Combine Business Development and Technical Maturity&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error Pattern&lt;/th&gt;
&lt;th&gt;Manifestation&lt;/th&gt;
&lt;th&gt;Consequences&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Premature Standardization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spending considerable effort on process management and cost optimization for emerging projects&lt;/td&gt;
&lt;td&gt;These are typically scale-stage concerns, but the project is still in the exploration stage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Counter-Yun Exploration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frequently changing underlying architecture for widely used platforms without rigorous testing&lt;/td&gt;
&lt;td&gt;Inconsistent with scaling stage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 5: Error Patterns
&lt;/figcaption&gt;
&lt;h2 id="stage-strategy-reference-table"&gt;Stage-Strategy Reference Table&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Should Focus On&lt;/th&gt;
&lt;th&gt;Should Not Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Exploration Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Diversity, flexibility, rapid trial and error&lt;/td&gt;
&lt;td&gt;Premature pursuit of efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Standardization, process norms&lt;/td&gt;
&lt;td&gt;Frequent arbitrary changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimization, stability, efficiency&lt;/td&gt;
&lt;td&gt;Still growing wildly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rebalancing Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transformation, breakthrough, innovation&lt;/td&gt;
&lt;td&gt;Clinging to the past&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 6: Stage-Strategy Mapping
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Checklist&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Which stage are we currently in?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Do our actions match the stage?&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Do we need to adjust strategy?&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;When discovering actions don&amp;rsquo;t match the stage, immediately adjust strategy to avoid working at cross-purposes&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="yang-runaway-warning"&gt;Yang Runaway Warning&lt;/h2&gt;
&lt;p&gt;Pay special attention to whether there are signs of &lt;strong&gt;Yang state runaway&lt;/strong&gt; in the system.&lt;/p&gt;
&lt;h2 id="what-is-yang-runaway"&gt;What is Yang Runaway?&lt;/h2&gt;
&lt;p&gt;Exponential explosion or collapse risk caused by unconstrained positive feedback.&lt;/p&gt;
&lt;h2 id="typical-scenarios"&gt;Typical Scenarios&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Service Call Volume Surge&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bug or abuse → Resource strain → Queuing and retry storms → Further increase in calls&lt;/td&gt;
&lt;td&gt;Resource exhaustion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training Task Self-Replication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tasks self-replicate without limit to accelerate → Cluster resource exhaustion&lt;/td&gt;
&lt;td&gt;System collapse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 7: Typical Scenarios
&lt;/figcaption&gt;
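The retry-storm loop in the first scenario is classically broken with exponential backoff plus jitter, so synchronized clients spread out instead of hammering a strained service in lockstep. A minimal sketch (the base delay and cap are illustrative choices, not values from the text):

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: each retry waits a random time in
    [0, min(cap, base * 2**attempt)], desynchronizing retrying clients
    and bounding the extra load they add to a struggling service."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

delays = [backoff_delay(a) for a in range(6)]
assert all(0 <= d <= 30.0 for d in delays)
assert backoff_delay(20) <= 30.0   # capped, never grows without bound
```

The cap is the hard limit; the jitter is what actually dissolves the synchronized wave of retries that feeds the positive-feedback loop.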
&lt;h2 id="diagnosis-signals"&gt;Diagnosis Signals&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;A metric shows &lt;strong&gt;exponential explosive growth&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Lack of slowing mechanisms&lt;/li&gt;
&lt;li&gt;Formation of vicious cycles&lt;/li&gt;
&lt;/ul&gt;
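The "exponential explosive growth" signal can be approximated by checking whether successive ratios of a metric keep compounding instead of leveling off. This is a crude heuristic sketch (the 1.3 ratio threshold and three-sample minimum are arbitrary example parameters):

```python
def looks_exponential(series, min_ratio: float = 1.3) -> bool:
    """Crude warning signal: flag a series whose successive ratios stay
    above min_ratio, i.e. growth compounds instead of decelerating."""
    ratios = [b / a for a, b in zip(series, series[1:]) if a > 0]
    return len(ratios) >= 3 and all(r >= min_ratio for r in ratios)

assert looks_exponential([100, 150, 240, 400, 680])       # compounding
assert not looks_exponential([100, 120, 130, 135, 138])   # decelerating
```

In production one would fit a log-linear trend over a sliding window instead, but even this simple check distinguishes a vicious cycle from ordinary growth.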
&lt;h2 id="response-strategy"&gt;Response Strategy&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Means&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Establish Hard Limits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metal&amp;rsquo;s constraints&lt;/td&gt;
&lt;td&gt;Immediate shutdown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Introduce Negative Feedback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Earth&amp;rsquo;s governance (rate limiting, quotas)&lt;/td&gt;
&lt;td&gt;Braking and deceleration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Break Positive Feedback Chain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Activate emergency plan&lt;/td&gt;
&lt;td&gt;Pull back to steady state&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 8: Response Strategies
&lt;/figcaption&gt;
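The "introduce negative feedback" row is commonly implemented as a token-bucket rate limiter: requests beyond the refill rate are rejected, which brakes a runaway positive-feedback loop. A self-contained sketch (the rate and capacity are illustrative; real limiters would read a monotonic clock rather than take `now` as a parameter):

```python
class TokenBucket:
    """Negative feedback via rate limiting (Earth's governance):
    traffic beyond the refill rate is shed instead of amplified."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst ceiling
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=2.0)   # roughly 2 requests/second
results = [bucket.allow(now=0.0) for _ in range(4)]
assert results == [True, True, False, False]   # burst of 2, then throttled
assert bucket.allow(now=1.0)                   # tokens refill over time
```

Because the refill rate is fixed, the loop "more load → more retries → more load" hits a ceiling instead of compounding, which is precisely the braking effect described above.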
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;When discovering a metric showing exponential explosive growth without slowing mechanisms, intervene immediately&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="diagnosis-implementation-process"&gt;Diagnosis Implementation Process&lt;/h2&gt;
&lt;h2 id="regular-diagnosis-mechanism"&gt;Regular Diagnosis Mechanism&lt;/h2&gt;
&lt;p&gt;Recommend establishing a periodic diagnosis process:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-infra-dao/system-diagnosis/884491832caca1efa808558d60e611b9.svg" data-img="https://assets.jimmysong.io/images/book/ai-infra-dao/system-diagnosis/884491832caca1efa808558d60e611b9.svg" alt="Figure 2: Regular Diagnosis Mechanism Flowchart" data-caption="Figure 2: Regular Diagnosis Mechanism Flowchart"
width="3418"
height="361"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Regular Diagnosis Mechanism Flowchart&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="diagnosis-meeting-agenda"&gt;Diagnosis Meeting Agenda&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Fixed Session of Weekly Operations Review Meeting&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check Five Elements scores for each module&lt;/li&gt;
&lt;li&gt;Browse global Qi flow diagram&lt;/li&gt;
&lt;li&gt;Analyze Yin-Yang dynamics&lt;/li&gt;
&lt;li&gt;Discuss current Yun&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This systematic examination leaves hidden risks nowhere to hide, achieving prevention before problems occur&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="diagnosis-action-matrix"&gt;Diagnosis Action Matrix&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Diagnosis Result&lt;/th&gt;
&lt;th&gt;Action Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Five Elements: One Element Too Weak&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Concentrate resources to strengthen the weakness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Five Elements: One Element Overloaded&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Expand capacity or introduce constraints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi Stagnation at One Stage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear bottlenecks, optimize processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yang Excess Yin Deficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strengthen governance and stability mechanisms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yin Excess Yang Deficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Activate innovation and boost vitality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Counter-Yun Operation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Adjust strategy and go with the flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yang Runaway Warning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Immediate intervention, break positive feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 9: Diagnosis Action Matrix
&lt;/figcaption&gt;
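&lt;p&gt;To make the matrix above actionable in a weekly review script, it can be encoded as a simple lookup table. The following is a hypothetical sketch; the finding keys, action strings, and function name are illustrative assumptions, not part of any real tool:&lt;/p&gt;

```python
# Hypothetical sketch: the diagnosis-action matrix as a lookup table,
# so a periodic review script can turn findings into recommendations.
DIAGNOSIS_ACTIONS = {
    "element_too_weak": "Concentrate resources to strengthen the weakness",
    "element_overloaded": "Expand capacity or introduce constraints",
    "qi_stagnation": "Clear bottlenecks, optimize processes",
    "yang_excess_yin_deficiency": "Strengthen governance and stability mechanisms",
    "yin_excess_yang_deficiency": "Activate innovation and boost vitality",
    "counter_yun_operation": "Adjust strategy and go with the flow",
    "yang_runaway_warning": "Immediate intervention, break positive feedback",
}

def recommend_actions(findings):
    """Map diagnosis findings to deduplicated action recommendations."""
    seen, actions = set(), []
    for finding in findings:
        action = DIAGNOSIS_ACTIONS.get(finding)  # unknown findings are skipped
        if action and action not in seen:
            seen.add(action)
            actions.append(action)
    return actions
```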
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Through the above diagnosis principles, architects and operations teams can periodically take the pulse of their infrastructure, much as a TCM practitioner reads a patient&amp;rsquo;s pulse.&lt;/p&gt;
&lt;p&gt;When diagnosis indicates an imbalance in some aspect, immediately prescribe a remedy based on the theory: &lt;strong&gt;replenish what needs replenishing, purge what needs purging&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Long-term adherence will keep the system on a healthy evolutionary trajectory.&lt;/p&gt;</content:encoded></item><item><title>Conclusion and Outlook</title><link>https://jimmysong.io/book/ai-infra-dao/summary/</link><pubDate>Tue, 10 Feb 2026 13:56:22 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-infra-dao/summary/</guid><description>Core value and applications of the Yin-Yang Five Elements Qi-Yun model for AI infrastructure architects</description><content:encoded>
&lt;p&gt;This paper systematically presents the four-layer model of &amp;ldquo;Yin-Yang - Five Elements - Yun - Qi&amp;rdquo; for AI infrastructure, providing a comprehensive cognitive map from theory to practice.&lt;/p&gt;
&lt;h2 id="review-of-theoretical-model"&gt;Review of Theoretical Model&lt;/h2&gt;
&lt;p&gt;Through four dimensions, we have constructed a global framework for understanding AI infrastructure:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Core Value&lt;/th&gt;
&lt;th&gt;Key Insights&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yin-Yang&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Understanding the tension and balance within systems&lt;/td&gt;
&lt;td&gt;Expansion and constraint, innovation and governance, speed and stability: three pairs of opposites that remain unified, each indispensable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Five Elements&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Organizing the fundamental role elements of systems&lt;/td&gt;
&lt;td&gt;Data, models, computing power, platforms, hardware—these five generate and restrain each other in endless cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Yun&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Grasping the periodic patterns of system evolution&lt;/td&gt;
&lt;td&gt;Exploration phase, platform phase, scale phase, rebalancing phase—act in accordance with the trends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qi&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Insight into the flow state of system operation&lt;/td&gt;
&lt;td&gt;When Qi flows, the system is active; when Qi stagnates, the system becomes pathological&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Core Value and Key Insights of the Four Layers
&lt;/figcaption&gt;
&lt;p&gt;More importantly, we have demonstrated how this theory combining Eastern wisdom with engineering practice can provide insights and guidance for real-world problems such as GPU scheduling, Agent Runtime, and platform governance.&lt;/p&gt;
&lt;h2 id="core-value-of-the-model"&gt;Core Value of the Model&lt;/h2&gt;
&lt;h2 id="holistic-view"&gt;Holistic View&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Traditional fragmented perspectives often see trees but not the forest, making it difficult to provide timely warnings of systemic risks&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The Yin-Yang Five Elements Qi-Yun model, with its &lt;strong&gt;holistic view&lt;/strong&gt;, helps architects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Break free from the constraints of pure technical metrics&lt;/li&gt;
&lt;li&gt;Grasp the principal contradictions and driving forces of system evolution&lt;/li&gt;
&lt;li&gt;Extract meaningful patterns from complex signals&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="dynamic-view"&gt;Dynamic View&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The value of a system lies not in pursuing the extreme of a single performance indicator without limit, but in balancing all elements to achieve long-term coordinated development&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The model&amp;rsquo;s &lt;strong&gt;dynamic view&lt;/strong&gt; reminds us:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Yin-Yang balance shifts with environment and stage&lt;/li&gt;
&lt;li&gt;The same capability may shift from advantage to risk at different stages&lt;/li&gt;
&lt;li&gt;Strategies need timely adjustment as Yun changes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="balance-view"&gt;Balance View&lt;/h2&gt;
&lt;p&gt;The core philosophy of the model is &lt;strong&gt;balance&lt;/strong&gt; rather than extreme:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not pursuing the limit of a single metric&lt;/li&gt;
&lt;li&gt;But pursuing system coordination and sustainability&lt;/li&gt;
&lt;li&gt;Finding dynamic balance points within unity of opposites&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-application-value"&gt;Practical Application Value&lt;/h2&gt;
&lt;h2 id="during-architecture-design"&gt;During Architecture Design&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Consider the completeness and balance of the Five Elements&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Reserve Yin-Yang constraint mechanisms&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Design evolution paths that align with Yun trends&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Plan channels for Qi flow&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="during-operations-and-governance"&gt;During Operations and Governance&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Regularly check Five Elements balance&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Monitor Qi circulation status&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Assess Yin-Yang dynamic changes&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Determine Yun phase transitions&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Provide early warning of Yang loss-of-control risks&lt;/li&gt;
&lt;/ul&gt;
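&lt;p&gt;The operations checks above can be sketched as a small periodic health check. This is a hypothetical illustration: the score scale, thresholds, and function names are assumptions, not part of the model itself:&lt;/p&gt;

```python
# Hypothetical health check for the operations checklist: per-module
# Five Elements scores (0..1) plus a simple Yin-Yang comparison.
# Thresholds are illustrative assumptions.
WEAK_THRESHOLD = 0.4
OVERLOAD_THRESHOLD = 0.9

def check_five_elements(scores):
    """Flag modules whose scores indicate weakness or overload."""
    report = {}
    for module, score in scores.items():
        if score < WEAK_THRESHOLD:
            report[module] = "too weak: concentrate resources here"
        elif score > OVERLOAD_THRESHOLD:
            report[module] = "overloaded: expand capacity or add constraints"
    return report

def check_yin_yang(yang, yin):
    """Compare innovation pressure (Yang) against governance capacity (Yin)."""
    if yang > 2 * yin:
        return "yang excess: strengthen governance and stability"
    if yin > 2 * yang:
        return "yin excess: activate innovation and boost vitality"
    return "balanced"
```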
&lt;h2 id="during-decision-review"&gt;During Decision Review&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Analyze root causes from the four-layer model perspective&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Check whether basic principles of any layer were violated&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Develop systematic solutions&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Establish long-term improvement mechanisms&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="insights-for-architects"&gt;Insights for Architects&lt;/h2&gt;
&lt;p&gt;In an era of flourishing large models and autonomous agents, infrastructure has become unprecedentedly complex and active.&lt;/p&gt;
&lt;h2 id="cognitive-upgrade"&gt;Cognitive Upgrade&lt;/h2&gt;
&lt;p&gt;From &amp;ldquo;managing machines and applications&amp;rdquo; to &amp;ldquo;managing intelligence and knowledge&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not only focus on application logic itself&lt;/li&gt;
&lt;li&gt;But more on how knowledge and intelligence integrate into systems&lt;/li&gt;
&lt;li&gt;View models as dynamically evolving components&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="mindset-shift"&gt;Mindset Shift&lt;/h2&gt;
&lt;p&gt;From single-metric optimization to system balance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not pursuing the extreme of a single element&lt;/li&gt;
&lt;li&gt;But pursuing overall coordination and sustainability&lt;/li&gt;
&lt;li&gt;Finding dynamic balance within unity of opposites&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="capability-development"&gt;Capability Development&lt;/h2&gt;
&lt;p&gt;From technical expert to systems philosopher:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;While mastering technical tools&lt;/li&gt;
&lt;li&gt;Cultivate systems thinking and philosophical reflection&lt;/li&gt;
&lt;li&gt;Apply holistic frameworks like Yin-Yang and Five Elements&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="limitations-of-the-model"&gt;Limitations of the Model&lt;/h2&gt;
&lt;p&gt;It must be noted that this theory is not a panacea:&lt;/p&gt;
&lt;h2 id="not-a-rigid-formula"&gt;Not a Rigid Formula&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Its value lies not in providing a rigid formula, but in guiding us to return to reality and think about problems from a more comprehensive perspective&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The model provides a thinking framework, not standard answers&lt;/li&gt;
&lt;li&gt;Specific applications need to consider actual scenarios&lt;/li&gt;
&lt;li&gt;Architects ultimately must make judgments based on specific context&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="requires-continuous-validation"&gt;Requires Continuous Validation&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Theory needs continuous validation and refinement in practice&lt;/li&gt;
&lt;li&gt;Different scenarios may require adjustment and extension&lt;/li&gt;
&lt;li&gt;Feedback and improvement in practice are encouraged&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="supplement-not-replace"&gt;Supplement, Not Replace&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The model is a tool to assist decision-making&lt;/li&gt;
&lt;li&gt;Cannot replace professional judgment and experience&lt;/li&gt;
&lt;li&gt;Should be used in combination with other methodologies&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="future-outlook"&gt;Future Outlook&lt;/h2&gt;
&lt;h2 id="theory-development"&gt;Theory Development&lt;/h2&gt;
&lt;p&gt;This model has significant room for development:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Quantitative Metrics&lt;/strong&gt;: Develop more precise quantitative indicators to make the theory more actionable&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool Support&lt;/strong&gt;: Develop analysis tools and automated diagnostic systems based on the model&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Case Accumulation&lt;/strong&gt;: Collect more practical cases to validate and enrich the theory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross-Domain Application&lt;/strong&gt;: Explore applications of the model in other complex system domains&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practice-promotion"&gt;Practice Promotion&lt;/h2&gt;
&lt;p&gt;We hope this framework helps CTOs, infrastructure architects, and platform R&amp;amp;D teams make wiser decisions as they face increasingly complex AI infrastructure.&lt;/p&gt;
&lt;h2 id="ultimate-vision"&gt;Ultimate Vision&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Standing with sword in the midst of waves of change, embracing both the Yang of innovation and the Yin of governance, riding the system&amp;rsquo;s Qi above the currents&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;AI infrastructure stands at the starting point of a new era. We need not only technological innovation but also conceptual innovation.&lt;/p&gt;
&lt;p&gt;The Yin-Yang Five Elements Qi-Yun model offers a unique perspective—combining Eastern philosophical wisdom with modern engineering practice—helping us find simplicity in complexity, stability in change, and unity in opposition.&lt;/p&gt;
&lt;p&gt;We hope this model becomes a powerful tool for your thinking about AI infrastructure, helping you find your own &amp;ldquo;Way&amp;rdquo; in the balance and evolution of systems.&lt;/p&gt;</content:encoded></item><item><title>What Is AI-Native Infrastructure?</title><link>https://jimmysong.io/book/ai-native-infra/definition/</link><pubDate>Sun, 18 Jan 2026 05:43:57 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/definition/</guid><description>Core definition, boundaries, and evaluation criteria for AI-native infrastructure, focusing on model behavior, compute scarcity, and uncertainty governance.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The essence of AI-native infrastructure is to make model behavior, compute scarcity, and uncertainty governable system boundaries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;AI-native infrastructure is not a simple checklist of technologies, but rather a new operating order designed for a world where &amp;ldquo;models become actors, compute becomes scarce, and uncertainty is the system default.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The core of AI-native infrastructure is not faster inference or cheaper GPUs, but providing governable, measurable, and evolvable system boundaries for model behavior, compute scarcity, and uncertainty—making AI systems deliverable, governable, and evolvable in production environments.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="why-we-need-a-more-rigorous-definition"&gt;Why We Need a More Rigorous Definition&lt;/h2&gt;
&lt;p&gt;The term &amp;ldquo;AI-native infrastructure/architecture&amp;rdquo; is being adopted by an increasing number of vendors, but its meaning is often oversimplified as &amp;ldquo;data centers better suited for AI&amp;rdquo; or &amp;ldquo;more complete AI platform delivery.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In practice, different vendors emphasize different aspects of AI-native infrastructure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cisco&lt;/strong&gt; emphasizes delivering AI-native infrastructure across &lt;strong&gt;edge/cloud/data center&lt;/strong&gt; domains, highlighting delivery paths where &amp;ldquo;open &amp;amp; disaggregated&amp;rdquo; and &amp;ldquo;fully integrated systems&amp;rdquo; coexist (e.g., Cisco Validated Designs).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HPE&lt;/strong&gt; emphasizes an &lt;strong&gt;open, full-stack AI-native architecture&lt;/strong&gt; for the entire AI lifecycle, model development, and deployment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NVIDIA&lt;/strong&gt; explicitly proposes an &lt;strong&gt;AI-native infrastructure tier&lt;/strong&gt; to support inference context reuse for long-context and agentic workloads.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For CTOs/CEOs, a definition that can guide strategy and organizational design must meet two criteria:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clarify &lt;strong&gt;how the first-principles constraints of infrastructure have changed in the AI era&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Converge &amp;ldquo;AI-native&amp;rdquo; from a marketing adjective into &lt;strong&gt;verifiable architectural properties and operating mechanisms&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="authoritative-one-sentence-definition"&gt;Authoritative One-Sentence Definition&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;AI-native infrastructure&lt;/strong&gt; is:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;An infrastructure system and operating mechanism premised on &amp;ldquo;models/agents as execution subjects, compute as scarce assets, and uncertainty as the norm,&amp;rdquo; which closes the loop on &amp;ldquo;intent (API/Agent) → execution (Runtime) → resource consumption (Accelerator/Network/Storage) → economic and risk outcomes&amp;rdquo; through compute governance.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This definition contains two layers of meaning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;: Not just a software/hardware stack, but also includes scaled delivery and systemic capabilities (consistent with vendors&amp;rsquo; emphasis on &amp;ldquo;full-stack integration/reference architectures/lifecycle delivery&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operating Model&lt;/strong&gt;: It inevitably rewrites organizational and operational methods, not just a technical upgrade—budget, risk, and release rhythm are strongly bound to the same governance loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="three-premises"&gt;Three Premises&lt;/h2&gt;
&lt;p&gt;The core premises of AI-native infrastructure are as follows. The diagram below illustrates the correspondence between these three premises and governance boundaries.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/definition/ai-native-infra-constitution-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/definition/ai-native-infra-constitution-en.svg" alt="Figure 1: Three constitutional premises of AI-native infrastructure" data-caption="Figure 1: Three constitutional premises of AI-native infrastructure"
width="1576"
height="696"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Three constitutional premises of AI-native infrastructure&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model-as-Actor&lt;/strong&gt;: Models/agents become &amp;ldquo;execution subjects&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute-as-Scarcity&lt;/strong&gt;: Compute (accelerators, interconnects, power consumption, bandwidth) becomes the core scarce asset&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Uncertainty-by-Default&lt;/strong&gt;: Behavior and resource consumption are highly uncertain (especially in agentic and long-context scenarios)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These three points collectively determine: the core task of AI-native infrastructure is not to &amp;ldquo;make systems more elegant,&amp;rdquo; but to &lt;strong&gt;make systems controllable, sustainable, and capable of scaled delivery under uncertain behavior.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="boundaries-what-ai-native-infrastructure-manages-and-what-it-doesnt"&gt;Boundaries: What AI-Native Infrastructure Manages and What It Doesn&amp;rsquo;t&lt;/h2&gt;
&lt;p&gt;In practical engineering, defining boundaries helps focus resources and capability development. The lists below summarize what AI-native infrastructure focuses on versus what it does not:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Not focused on:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prompt design and business-level agent logic&lt;/li&gt;
&lt;li&gt;Individual model capabilities and training secrets&lt;/li&gt;
&lt;li&gt;Application-layer product features themselves&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Focused on:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Compute Governance&lt;/strong&gt;: Quotas, budgets, isolation/sharing, topology and interconnects, preemption and priorities, throughput/latency versus cost tradeoffs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execution Form Engineering&lt;/strong&gt;: Unified operation, scheduling, and observability for training/fine-tuning/inference/batch processing/agentic workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Closed-Loop Mechanisms&lt;/strong&gt;: How intent is constrained, measured, and mapped to controllable resource consumption and economic/risk outcomes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="verifiable-architectural-properties-three-planes--one-loop"&gt;Verifiable Architectural Properties: Three Planes + One Loop&lt;/h2&gt;
&lt;p&gt;To facilitate understanding, the following sections introduce the core architectural properties of AI-native infrastructure.&lt;/p&gt;
&lt;p&gt;The diagram below shows the visualization of the three planes and the closed loop, facilitating rapid boundary alignment during reviews.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/definition/three-planes-one-loop-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/definition/three-planes-one-loop-en.svg" alt="Figure 2: Three Planes and One Loop reference architecture" data-caption="Figure 2: Three Planes and One Loop reference architecture"
width="1576"
height="476"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Three Planes and One Loop reference architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Three Planes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Intent Plane&lt;/strong&gt;: APIs, MCP, Agent workflows, policy expressions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execution Plane&lt;/strong&gt;: Training/inference/serving/runtime (including tool calls and state management)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Governance Plane&lt;/strong&gt;: Accelerator orchestration, isolation/sharing, quotas/budgets, SLO and cost control, risk policies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The Loop:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Only with an &amp;ldquo;intent → consumption → cost/risk outcome&amp;rdquo; closed loop can it be called AI-native.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
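&lt;p&gt;One way to picture this closed loop is as a single traceable record that carries an intent through its resource consumption to its economic and risk outcome. The following is a hypothetical sketch; all field and class names are illustrative assumptions:&lt;/p&gt;

```python
# Hypothetical sketch of the "intent -> consumption -> cost/risk outcome"
# loop as one traceable record. Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class LoopRecord:
    intent: str                 # what was asked (API call, agent task)
    tokens_consumed: int = 0    # execution-plane consumption
    gpu_seconds: float = 0.0
    cost_usd: float = 0.0       # economic outcome
    risk_flags: list = field(default_factory=list)  # governance outcome

    def over_budget(self, budget_usd):
        """Governance-plane check: did this intent exceed its cost budget?"""
        return self.cost_usd > budget_usd
```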
&lt;p&gt;This is also why NVIDIA elevates the sharing and reuse of &amp;ldquo;new state assets&amp;rdquo; like inference context to an independent AI-native infrastructure layer: essentially bringing the resource consequences of agentic/long-context into governable system boundaries.&lt;/p&gt;
&lt;h2 id="ai-native-vs-cloud-native-where-the-differences-lie"&gt;AI-Native vs Cloud Native: Where the Differences Lie&lt;/h2&gt;
&lt;p&gt;Cloud Native focuses on delivering services in distributed environments with portability, elasticity, observability, and automation. Its governance objects are primarily &lt;strong&gt;service/instance/request&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;AI-native infrastructure addresses a different set of structural problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Execution unit shift&lt;/strong&gt;: From service request/response to agent action/decision/side effect&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource constraint shift&lt;/strong&gt;: From elastic CPU/memory to hard GPU/throughput/token constraints and cost ceilings&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reliability pattern shift&lt;/strong&gt;: From &amp;ldquo;reliable delivery of deterministic systems&amp;rdquo; to &amp;ldquo;controllable operation of non-deterministic systems&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, AI-native is not &amp;ldquo;adding a model layer on top of cloud native,&amp;rdquo; but rather shifting the system&amp;rsquo;s center of gravity from &lt;strong&gt;deployment&lt;/strong&gt; to &lt;strong&gt;governance&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="bringing-it-to-engineering-what-capabilities-ai-native-infrastructure-must-have"&gt;Bringing It to Engineering: What Capabilities AI-Native Infrastructure Must Have&lt;/h2&gt;
&lt;p&gt;To avoid &amp;ldquo;right concept, misaligned execution,&amp;rdquo; this section lists the minimum closed-loop capabilities.&lt;/p&gt;
&lt;h3 id="resource-model-making-gpu-context-and-token-first-class-resources"&gt;Resource Model: Making GPU, Context, and Token First-Class Resources&lt;/h3&gt;
&lt;p&gt;Cloud native abstracts CPU/memory into schedulable resources; AI-native must further bring the following resources under governance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPU/Accelerator Resources&lt;/strong&gt;: Scheduled and governed by partitioning, sharing, isolation, and preemption&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context Resources&lt;/strong&gt;: Context windows, retrieval paths, cache hits, KV/inference state asset reuse, etc., which directly affect tokens and costs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Token/Throughput&lt;/strong&gt;: Become measurable capacity and cost carriers (can enter budgets, SLOs, and product strategies)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When tokens become &amp;ldquo;capacity units,&amp;rdquo; the platform is no longer just running services, but operating an &amp;ldquo;AI factory.&amp;rdquo;&lt;/p&gt;
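&lt;p&gt;Treating tokens as capacity units can be made concrete with per-team admission control. This is a minimal hypothetical sketch, assuming a daily token limit; the class and method names are illustrative:&lt;/p&gt;

```python
# Hypothetical sketch: tokens as first-class capacity units, with a
# per-team daily budget and admission control. Names are illustrative.
class TokenBudget:
    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.used = 0

    def admit(self, requested_tokens):
        """Admit a request only if it fits in the remaining token budget."""
        if self.used + requested_tokens > self.daily_limit:
            return False
        self.used += requested_tokens
        return True

    def remaining(self):
        return self.daily_limit - self.used
```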
&lt;h3 id="budgets-and-policies-binding-costrisk-to-organizational-decisions"&gt;Budgets and Policies: Binding &amp;ldquo;Cost/Risk&amp;rdquo; to Organizational Decisions&lt;/h3&gt;
&lt;p&gt;AI systems cannot be operated on a &amp;ldquo;ship it and forget it&amp;rdquo; basis. Budgets and policies must become the control plane:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Trigger rate limiting/degradation when budgets are exceeded&lt;/li&gt;
&lt;li&gt;Trigger stricter verification or disable high-risk tools when risk increases&lt;/li&gt;
&lt;li&gt;Version releases and experiments are constrained by &amp;ldquo;budget/risk headroom&amp;rdquo; (institutionalizing release rhythm)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key is &lt;strong&gt;infrastructure solidifying organizational rules into executable policies&lt;/strong&gt;.&lt;/p&gt;
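&lt;p&gt;The three policy rules above can be sketched as a single evaluation function. This is a hypothetical illustration of turning organizational rules into executable policy; the thresholds and action names are assumptions:&lt;/p&gt;

```python
# Hypothetical sketch of budget/risk policies as an executable control
# plane, mirroring the rules above. Thresholds are assumptions.
def evaluate_policies(spend, budget, risk_score):
    """Return the control actions triggered by current spend and risk."""
    actions = []
    if spend > budget:
        actions.append("rate_limit_and_degrade")
    if risk_score >= 0.8:
        actions.append("require_strict_verification")
        actions.append("disable_high_risk_tools")
    # Releases and experiments are constrained by remaining headroom.
    headroom = max(budget - spend, 0)
    if headroom < 0.1 * budget:
        actions.append("freeze_releases_and_experiments")
    return actions
```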
&lt;h3 id="observability-and-audit-making-model-behavior-accountable-and-observable"&gt;Observability and Audit: Making Model Behavior Accountable and Observable&lt;/h3&gt;
&lt;p&gt;Traditional observability focuses on latency/error/traffic; AI-native must add at least three types of signals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Behavior Signals&lt;/strong&gt;: Which tools the model called, which systems it read/wrote, what actions it took, what side effects it caused&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Signals&lt;/strong&gt;: Tokens, GPU time, cache hits, queue wait, interconnect bottlenecks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quality and Safety Signals&lt;/strong&gt;: Output quality, violation/over-privilege risks, rollback frequency and reasons&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without &amp;ldquo;behavior observability,&amp;rdquo; governance cannot be implemented.&lt;/p&gt;
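&lt;p&gt;A telemetry event carrying all three signal types might look like the following hypothetical sketch; every field name here is an illustrative assumption, not an established schema:&lt;/p&gt;

```python
# Hypothetical sketch of one auditable telemetry event combining
# behavior, cost, and quality/safety signals. Fields are assumptions.
import json

def make_event(tool_calls, side_effects, tokens, gpu_seconds,
               quality_score, violations):
    """Assemble one event covering the three AI-native signal types."""
    return {
        "behavior": {"tool_calls": tool_calls, "side_effects": side_effects},
        "cost": {"tokens": tokens, "gpu_seconds": gpu_seconds},
        "quality_safety": {"score": quality_score, "violations": violations},
    }

def to_audit_line(event):
    """Serialize the event for an append-only audit log."""
    return json.dumps(event, sort_keys=True)
```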
&lt;h3 id="risk-governance-bringing-high-risk-capabilities-under-continuous-assessment-and-control"&gt;Risk Governance: Bringing High-Risk Capabilities Under Continuous Assessment and Control&lt;/h3&gt;
&lt;p&gt;When model capabilities approach thresholds that can &amp;ldquo;cause serious harm,&amp;rdquo; organizations need a systematic risk governance framework, not relying on single-point prompts or manual reviews.&lt;/p&gt;
&lt;p&gt;This framework can be split into two layers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;System-Level Trustworthiness Goals&lt;/strong&gt;: Organizational-level requirements for security, transparency, explainability, and accountability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Frontier Capability Readiness Assessment&lt;/strong&gt;: Tiered assessment of high-risk capabilities, launch thresholds, and mitigation measures&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value lies in: transforming &amp;ldquo;safety/risk&amp;rdquo; from concepts into executable launch thresholds and operational policies.&lt;/p&gt;
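&lt;p&gt;A tiered readiness assessment can be expressed as an executable launch gate. The tier names and required mitigations below are hypothetical assumptions chosen for illustration:&lt;/p&gt;

```python
# Hypothetical sketch of a tiered capability readiness gate: launch is
# allowed only when the mitigations required by the capability's risk
# tier are in place. Tier names and rules are illustrative assumptions.
TIER_REQUIRED_MITIGATIONS = {
    "low": set(),
    "medium": {"monitoring"},
    "high": {"monitoring", "human_review"},
    "critical": {"monitoring", "human_review", "kill_switch"},
}

def launch_allowed(tier, mitigations_in_place):
    """Allow launch only when all required mitigations for the tier exist."""
    required = TIER_REQUIRED_MITIGATIONS[tier]
    return required.issubset(set(mitigations_in_place))
```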
&lt;h2 id="takeaways--checklist"&gt;Takeaways / Checklist&lt;/h2&gt;
&lt;p&gt;The following checklist can be used to determine whether an organization has entered the AI-native stage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do we treat models as &amp;ldquo;agents that act,&amp;rdquo; not as replaceable APIs?&lt;/li&gt;
&lt;li&gt;Do we bring compute and budgets into business SLAs and decision processes?&lt;/li&gt;
&lt;li&gt;Do we treat uncertainty as the default premise, not as an exception?&lt;/li&gt;
&lt;li&gt;Do we have audit, rollback, and accountability for model behavior?&lt;/li&gt;
&lt;li&gt;Do we have cross-team AI governance mechanisms, not single-point engineering optimizations?&lt;/li&gt;
&lt;li&gt;Can we explain the system&amp;rsquo;s operating boundaries, cost boundaries, and risk boundaries?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The essence of AI-native infrastructure lies in: taking models as behavior subjects, compute as scarce assets, and uncertainty as the norm, achieving deliverable, governable, and evolvable AI systems through governance and closed-loop mechanisms. Only by engineering these capabilities can organizations truly step into the AI-native stage.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.cisco.com/site/us/en/solutions/artificial-intelligence/infrastructure/index.html" target="_blank" rel="noopener"&gt;Cisco AI-Native Infrastructure - cisco.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.hpe.com/us/en/newsroom/blog-post/2023/12/introducing-an-ai-native-architecture-for-ai-driven-transformation.html" target="_blank" rel="noopener"&gt;HPE AI-native architecture - hpe.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.nvidia.com/blog/inside-the-nvidia-rubin-platform-six-new-chips-one-ai-supercomputer/" target="_blank" rel="noopener"&gt;NVIDIA Rubin: AI-native infrastructure tier - developer.nvidia.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lfnetworking.org/40362-2/" target="_blank" rel="noopener"&gt;LF Networking: becoming AI-native is a redefinition of the operating model - lfnetworking.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nist.gov/itl/ai-risk-management-framework" target="_blank" rel="noopener"&gt;NIST AI Risk Management Framework - nist.gov&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sre.google/workbook/error-budget-policy/" target="_blank" rel="noopener"&gt;Google SRE Workbook - Error Budgets - sre.google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/safety/preparedness/" target="_blank" rel="noopener"&gt;OpenAI Preparedness Framework - openai.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>AI-Native Infrastructure One-Page Reference Architecture: Three Planes + One Loop</title><link>https://jimmysong.io/book/ai-native-infra/reference-architecture/</link><pubDate>Sun, 18 Jan 2026 00:00:00 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/reference-architecture/</guid><description>Three planes (Intent, Execution, Governance) + closed-loop feedback for AI-native infrastructure architecture alignment.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The true value of architecture is enabling organizational consensus on complex systems within five minutes, not creating another new technology stack.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Industry-leading vendors emphasize different aspects: Cisco focuses more on AI-native infrastructure and reference designs such as Cisco Validated Designs; HPE emphasizes an open, full-stack AI-native architecture across the AI full lifecycle; NVIDIA explicitly proposes adding a new AI-native infrastructure tier for inference context reuse in long-context and agentic workloads. This chapter converges these perspectives into a &lt;strong&gt;verifiable architecture framework&lt;/strong&gt;: Three Planes + One Loop.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Note
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
The &amp;ldquo;architecture&amp;rdquo; in this chapter serves as a review framework, not a component checklist. The goal is to unify organizational language and review boundaries, not to reinvent the technology stack.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="one-page-architecture-overview"&gt;One-Page Architecture Overview&lt;/h2&gt;
&lt;p&gt;Below is the detailed reference architecture diagram for the three planes and closed loop of AI-native infrastructure, helping readers quickly establish an overall understanding:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/reference-architecture/three-planes-detailed-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/reference-architecture/three-planes-detailed-en.svg" alt="Figure 1: Three planes detailed architecture diagram" data-caption="Figure 1: Three planes detailed architecture diagram"
width="1783"
height="1485"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Three planes detailed architecture diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This diagram can be understood as: &lt;strong&gt;The new control plane (Intent) of AI-native infrastructure must be constrained by the Governance Plane and produce measurable resource consequences in the Execution Plane.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is also the common ground behind different vendor narratives: Cisco uses reference designs and delivery frameworks to make infrastructure capabilities scalable and replicable; HPE uses open/full-stack to cover lifecycle delivery; NVIDIA elevates the reuse of &amp;ldquo;context state assets&amp;rdquo; to an independent infrastructure layer. All three point to the same issue: &lt;strong&gt;incorporating AI resource consequences into governable system boundaries.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="core-capabilities-of-the-three-planes"&gt;Core Capabilities of the Three Planes&lt;/h2&gt;
&lt;p&gt;This section details the core capabilities of each of the three planes to help clarify focus areas during architecture reviews.&lt;/p&gt;
&lt;h3 id="intent-plane"&gt;Intent Plane&lt;/h3&gt;
&lt;p&gt;The Intent Plane is responsible for expressing &amp;ldquo;what I want,&amp;rdquo; including the following capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inference/Training APIs (entry points and contracts)&lt;/li&gt;
&lt;li&gt;MCP/Tool calling protocols and tool catalog (standardizing tool access as &amp;ldquo;declarable capability boundaries&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Agent/Workflow (breaking down tasks into executable steps)&lt;/li&gt;
&lt;li&gt;Policy as Intent: priorities, budgets, quotas, compliance/security constraints (front-loaded in the form of &amp;ldquo;intent&amp;rdquo;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Key Point&lt;/strong&gt;: The Intent Plane is not the real starting point; the real starting point is &lt;strong&gt;whether intent can be translated into executable and governable plans.&lt;/strong&gt; Otherwise, Agents/MCP will only amplify uncertainty: more tools, longer chains, larger state spaces, and less controllable resource consumption.&lt;/p&gt;
&lt;p&gt;During architecture reviews, focus on these questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is intent declarable (contract) and rejectable (admission)?&lt;/li&gt;
&lt;li&gt;Does intent carry budget/priority/compliance constraints (policy as intent)?&lt;/li&gt;
&lt;li&gt;Is the translation from intent to execution traceable?&lt;/li&gt;
&lt;/ul&gt;
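&lt;p&gt;The review questions above can be sketched in a few lines of code. The following Python sketch is an illustration only: all names, such as &lt;code&gt;Intent&lt;/code&gt; and &lt;code&gt;admit&lt;/code&gt;, are hypothetical and not part of any vendor API. It shows an intent that declares its own budget and compliance constraints, and an admission gate that can reject it before anything executes:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """A declared request: what the caller wants, plus the constraints it carries.
    All field names here are hypothetical, for illustration only."""
    task: str
    priority: int          # 0 = highest
    token_budget: int      # hard cap on tokens this intent may consume
    data_classes: frozenset  # data classifications the task will touch

# Hypothetical compliance policy: which data classes this platform admits.
ALLOWED_DATA_CLASSES = frozenset({"public", "internal"})

def admit(intent, remaining_team_budget):
    """Admission: an intent is rejectable at the entry point, before execution."""
    if intent.token_budget > remaining_team_budget:
        return (False, "budget: requested tokens exceed the remaining team budget")
    if not intent.data_classes.issubset(ALLOWED_DATA_CLASSES):
        return (False, "compliance: intent touches a disallowed data class")
    return (True, "admitted")
```

&lt;p&gt;The point of the sketch is the shape of the contract: the intent carries its constraints with it, and rejection happens at admission time rather than surfacing later as a cost surprise.&lt;/p&gt;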
&lt;h3 id="execution-plane"&gt;Execution Plane&lt;/h3&gt;
&lt;p&gt;The Execution Plane is responsible for landing intent into &amp;ldquo;actual execution,&amp;rdquo; mainly including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training, fine-tuning, inference serving, batch processing, agentic runtime&lt;/li&gt;
&lt;li&gt;&amp;ldquo;State and Context&amp;rdquo; services: cache/KV/vector/context memory, etc., for carrying inference context, retrieval results, and session state&lt;/li&gt;
&lt;li&gt;Full-chain observability hooks: token metering, GPU time, video memory, network traffic, storage I/O, queue wait times, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An industry trend worth emphasizing: as long-context and agentic workloads become widespread, &amp;ldquo;context&amp;rdquo; itself becomes a critical state asset and may even rise to become an independent infrastructure layer. NVIDIA explicitly proposes inference context memory storage in the Rubin platform, establishing an AI-native infrastructure tier to provide shared, low-latency inference context at the pod level to support reuse (for long-context and agentic workloads).&lt;/p&gt;
&lt;p&gt;Review points focus on three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Is execution measurable&lt;/strong&gt;: Can attribution be done across token/GPU/network/storage dimensions?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Is state governable&lt;/strong&gt;: What are the lifecycle, reuse boundaries, and isolation strategies for context and cache?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Is observability closed-loop oriented&lt;/strong&gt;: Observability is not for &amp;ldquo;seeing,&amp;rdquo; but for &amp;ldquo;enabling governance to correct deviations.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
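&lt;p&gt;&amp;ldquo;Measurable execution&amp;rdquo; ultimately reduces to per-request metering records that can be rolled up by tenant, team, or model for attribution. A minimal Python sketch follows; the field names (such as &lt;code&gt;gpu_seconds&lt;/code&gt;) are illustrative assumptions, not a standard schema:&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MeterRecord:
    """One attributable slice of consumption (field names are illustrative)."""
    tenant: str
    tokens: int
    gpu_seconds: float
    net_bytes: int

def attribute(records):
    """Roll raw consumption up per tenant so the governance plane can act on it."""
    totals = defaultdict(lambda: {"tokens": 0, "gpu_seconds": 0.0})
    for r in records:
        totals[r.tenant]["tokens"] += r.tokens
        totals[r.tenant]["gpu_seconds"] += r.gpu_seconds
    return dict(totals)
```

&lt;p&gt;The same roll-up can be keyed by model, project, or agent task; what matters is that every unit of consumption lands in exactly one attributable bucket.&lt;/p&gt;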
&lt;h3 id="governance-plane"&gt;Governance Plane&lt;/h3&gt;
&lt;p&gt;The Governance Plane is the &amp;ldquo;core differentiator&amp;rdquo; of AI-native infrastructure, responsible for transforming resource scarcity and uncertainty into a controllable system:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Budget/quotas/billing: governing consumption across teams, tenants, projects, models, and agent tasks&lt;/li&gt;
&lt;li&gt;Isolation and sharing strategies: same-card sharing, video memory isolation, preemption, priorities, fairness&lt;/li&gt;
&lt;li&gt;Topology-aware scheduling: incorporating GPU, interconnect, network, and storage topology into placement (especially in training and high-throughput inference)&lt;/li&gt;
&lt;li&gt;Risk and compliance control: audits, policy enforcement points, sensitive data and access control&lt;/li&gt;
&lt;li&gt;Integration with FinOps/SRE/SecOps: incorporating cost, reliability, and risk into a single operational mechanism&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a vendor narrative perspective, this layer typically corresponds to &amp;ldquo;reference architecture + full-stack delivery&amp;rdquo;:
Cisco emphasizes accelerating and scaling delivery in AI infrastructure through &amp;ldquo;fully integrated systems + Cisco Validated Designs&amp;rdquo;; HPE emphasizes end-to-end delivery with &amp;ldquo;open, full-stack AI-native architecture&amp;rdquo; to support model development and deployment.&lt;/p&gt;
&lt;p&gt;The baseline question for Governance Plane reviews is: &lt;strong&gt;Can you make explainable resource allocation and degradation decisions under budget/risk constraints?&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="closed-loop-mechanism-explained"&gt;Closed-Loop Mechanism Explained&lt;/h2&gt;
&lt;p&gt;This section introduces the core workflow of the closed-loop mechanism to help understand the essential difference between AI-native and AI-ready.&lt;/p&gt;
&lt;div class="alert alert-tip-container"&gt;
&lt;div class="alert-tip-title px-2"&gt;
Significance of the Closed-Loop Mechanism
&lt;/div&gt;
&lt;div class="alert-tip px-2"&gt;
The closed loop is the most easily confused, yet most critical, dividing line between AI-native and merely &amp;ldquo;AI-ready.&amp;rdquo;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The minimal implementation of the closed loop includes four steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Admission&lt;/strong&gt;: Bind intent with policy at the entry point (budget, priority, compliance)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Translation&lt;/strong&gt;: Translate intent into executable plans (select runtime, resource specifications, topology preferences)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metering&lt;/strong&gt;: End-to-end metering and attribution across tokens/GPU/network/storage&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enforcement&lt;/strong&gt;: Budget triggers degradation/rate limiting/preemption; risk triggers isolation/audits; SLO triggers scaling/routing&lt;/li&gt;
&lt;/ol&gt;
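&lt;p&gt;The four steps above can be sketched as one control flow. The Python below is a deliberately simplified illustration, not a product API: field names and the degradation rule are assumptions, and the &amp;ldquo;metering&amp;rdquo; step is simulated rather than measured:&lt;/p&gt;

```python
def run_closed_loop(intent, team_budget_remaining):
    # 1. Admission: bind intent to policy (budget, priority) at the entry point.
    if intent["token_budget"] > team_budget_remaining:
        return {"status": "rejected", "reason": "over remaining team budget"}

    # 2. Translation: turn the admitted intent into an executable plan.
    plan = {
        "runtime": "inference-serving",
        "max_tokens": intent["token_budget"],
        "priority": intent["priority"],
    }

    # 3. Metering: record actual consumption. Simulated here; a real system
    #    would meter tokens, GPU time, network, and storage end to end.
    consumed = min(intent["estimated_tokens"], plan["max_tokens"])

    # 4. Enforcement: consequences feed back as policy actions
    #    (here: hitting the budget cap triggers degradation/rate limiting).
    if consumed == plan["max_tokens"]:
        return {"status": "degraded", "consumed": consumed,
                "reason": "budget cap hit, rate limiting applied"}
    return {"status": "ok", "consumed": consumed}
```

&lt;p&gt;Even in this toy form, the structural point holds: the outcome of metering is written back as an enforcement decision, which is exactly what a monitoring dashboard alone does not do.&lt;/p&gt;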
&lt;p&gt;In other words: &lt;strong&gt;The closed loop is not a &amp;ldquo;monitoring dashboard,&amp;rdquo; but a &amp;ldquo;governance-driven real-time correction mechanism.&amp;rdquo;&lt;/strong&gt;
If there is no closed loop for &amp;ldquo;intent → consumption → cost/risk outcomes,&amp;rdquo; systems can easily spin out of control across cost, risk, quality, and other dimensions.&lt;/p&gt;
&lt;p&gt;This is also why &amp;ldquo;AI-native&amp;rdquo; is often accompanied by changes in operating model: when system execution speed and resource consumption are amplified by models/agents, organizations must front-load governance mechanisms and institutionalize them. LF Networking also explicitly points out: becoming AI-native is not just a technical migration, but a redefinition of the operating model.&lt;/p&gt;
&lt;h2 id="practical-usage-of-the-one-page-architecture"&gt;Practical Usage of the One-Page Architecture&lt;/h2&gt;
&lt;p&gt;In subsequent chapters, this &amp;ldquo;one-page architecture&amp;rdquo; can be repeatedly reused as a review template:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Discussing MCP/Agent: Position them in the &lt;strong&gt;Intent Plane&lt;/strong&gt; and constrain with the closed loop (admission/translation) to avoid &amp;ldquo;intent proliferation&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Discussing runtime and platforms: Place in the &lt;strong&gt;Execution Plane&lt;/strong&gt;, focusing on observable, attributable, governable state assets (context/cache/KV/vector)&lt;/li&gt;
&lt;li&gt;Discussing GPUs, scheduling, costs: Ground in the &lt;strong&gt;Governance Plane&lt;/strong&gt;, using budget/isolation/topology/metering as leverage points&lt;/li&gt;
&lt;li&gt;Discussing enterprise implementation: Use the &lt;strong&gt;closed loop&lt;/strong&gt; to examine if it&amp;rsquo;s &amp;ldquo;truly AI-native&amp;rdquo; (whether cost/risk outcomes can be written back as executable policies)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you can only remember one sentence:
&lt;strong&gt;The determination of AI-native is not in &amp;ldquo;how many AI components are used,&amp;rdquo; but in &amp;ldquo;whether there exists an executable governance closed loop that constrains intent to controllable resource consequences and economic/risk outcomes.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The one-page reference architecture provides a unified systems language and review framework for AI-native infrastructure. Through the three planes of Intent, Execution, and Governance, combined with the closed-loop mechanism, organizations can achieve efficient collaboration in architecture design, resource governance, and risk control. Looking ahead, as AI-native capabilities continue to mature, the governance closed loop will become a core competitive advantage for enterprises implementing AI.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.nvidia.com/en-us/data-center/products/ai-enterprise/" target="_blank" rel="noopener"&gt;NVIDIA AI Enterprise Reference Architecture - nvidia.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/ai-infrastructure" target="_blank" rel="noopener"&gt;Google Cloud AI Infrastructure - cloud.google.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/architecture/well-architected/" target="_blank" rel="noopener"&gt;AWS Well-Architected Framework - aws.amazon.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Why Start with Compute Governance, Not API Design</title><link>https://jimmysong.io/book/ai-native-infra/compute-governance/</link><pubDate>Sun, 18 Jan 2026 05:45:13 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/compute-governance/</guid><description>Discussing Intent vs Consequence, why compute and cost are the first-order constraints of AI-native infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Compute and governance boundaries are the true foundation of AI-native infrastructure architecture.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The previous chapter presented a &amp;ldquo;Three Planes + One Closed Loop&amp;rdquo; reference architecture. This chapter focuses on a core CTO/CEO-level question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How should AI-native infrastructure be layered? What belongs in the &amp;ldquo;control plane&amp;rdquo; of APIs/Agents, what belongs in the &amp;ldquo;execution plane&amp;rdquo; of runtime, and what must be pushed down to the &amp;ldquo;governance plane (compute and economic constraints)&amp;rdquo;?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This question is critical because over the past year, many platform companies &amp;ldquo;pivoting to AI&amp;rdquo; have fallen into a common trap: &lt;strong&gt;treating AI as a change in API morphology rather than a change in system constraints&lt;/strong&gt;. When your system shifts from &amp;ldquo;serving requests&amp;rdquo; to &amp;ldquo;serving model behavior&amp;rdquo; (multi-step Agent actions with side effects), what truly determines system boundaries is often not the elegance of API design, but rather: &lt;strong&gt;whether compute, context, and economic constraints are institutionalized as enforceable governance boundaries&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The core argument of this chapter can be summarized as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI-native infrastructure must be designed starting from &amp;ldquo;Consequence&amp;rdquo; rather than stacking capabilities from &amp;ldquo;Intent&amp;rdquo;; the control plane is responsible for expressing intent, but the governance plane is responsible for bounding consequences.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-purpose-of-layering-engineering-the-binding-between-intent-and-resource-consequences"&gt;The Purpose of Layering: Engineering the Binding Between &amp;ldquo;Intent&amp;rdquo; and &amp;ldquo;Resource Consequences&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;In AI-native infrastructure, mechanisms like MCP, Agents, and Tool Calling enhance system capabilities while also introducing higher risks. These risks are not abstract &amp;ldquo;uncontrollability,&amp;rdquo; but rather engineering &amp;ldquo;unbudgetable consequences&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Path explosion in behavior, long contexts, and multi-round reasoning bring long-tail resource consumption;&lt;/li&gt;
&lt;li&gt;The same &amp;ldquo;intent&amp;rdquo; can lead to orders-of-magnitude differences in tokens, GPU time, and network/storage pressure;&lt;/li&gt;
&lt;li&gt;Without governance closed loops, systems will move toward &amp;ldquo;cost and risk runaway&amp;rdquo; while becoming &amp;ldquo;more capable.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, the fundamental purpose of layering is not abstract aesthetics, but achieving a hard constraint goal:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ensure each layer can translate upper-layer &amp;ldquo;intent&amp;rdquo; into executable plans and produce measurable, attributable, and constrainable resource consequences.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words, layering is not about making architecture diagrams clearer, but about encoding &amp;ldquo;who expresses intent, who executes, and who bears consequences&amp;rdquo; into system structure.&lt;/p&gt;
&lt;h2 id="ai-native-infrastructure-five-layer-structure-and-three-planes-mapping"&gt;AI-Native Infrastructure Five-Layer Structure and &amp;ldquo;Three Planes&amp;rdquo; Mapping&lt;/h2&gt;
&lt;p&gt;To help understand the layering logic, the diagram below refines the &amp;ldquo;Three Planes&amp;rdquo; architecture from the previous chapter, proposing a more actionable &amp;ldquo;five-layer structure&amp;rdquo;:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/compute-governance/intent-consequence-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/compute-governance/intent-consequence-en.svg" alt="Figure 1: Layered governance relationship from intent to consequence" data-caption="Figure 1: Layered governance relationship from intent to consequence"
width="1576"
height="156"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Layered governance relationship from intent to consequence&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;Top two layers = &lt;strong&gt;Intent Plane&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Middle two layers = &lt;strong&gt;Execution Plane&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Bottom layer = &lt;strong&gt;Governance Plane&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is a detailed expansion of the five-layer architecture, showing the primary responsibilities and typical capabilities of each layer:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/compute-governance/five-layers-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/compute-governance/five-layers-en.svg" alt="Figure 2: Five-layer architecture diagram" data-caption="Figure 2: Five-layer architecture diagram"
width="2131"
height="1476"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Five-layer architecture diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It is important to note that &lt;strong&gt;MCP belongs to Layer 4 (Intent and Orchestration Layer), not Layer 1&lt;/strong&gt;. The reason is that MCP primarily defines &amp;ldquo;how capabilities are exposed to models/Agents and how they are invoked,&amp;rdquo; addressing control plane consistency and composability, but does not directly take responsibility for &amp;ldquo;how the resource consequences of capability invocations are metered, constrained, and attributed.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="mcpagent-is-the-new-control-plane-but-must-be-constrained-by-the-governance-layer"&gt;MCP/Agent is the &amp;ldquo;New Control Plane,&amp;rdquo; But Must Be Constrained by the Governance Layer&lt;/h2&gt;
&lt;p&gt;MCP/Agent is called the &amp;ldquo;new control plane&amp;rdquo; because it moves system &amp;ldquo;decisions&amp;rdquo; from static code to dynamic processes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Tool catalogs + schemas + invocations&amp;rdquo; form a composable capability surface;&lt;/li&gt;
&lt;li&gt;Agents complete tasks by selecting tools, invoking tools, and iterating reasoning;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Policy&amp;rdquo; is no longer just in code branches but expressed as routing, priorities, budgets, and compliance intent.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, it is crucial to emphasize an infrastructure stance, which is also the foundation of this chapter:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;MCP/Agent can express intent, but the key to AI-native is: intent must be translated into governable execution plans and metered and constrained within economically viable boundaries.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This statement aims to correct two common misconceptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Control plane is not the starting point&lt;/strong&gt;: Treating MCP/Agent as &amp;ldquo;the entry point for AI platform upgrades&amp;rdquo; easily leads systems down a &amp;ldquo;capability-first&amp;rdquo; path;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Governance plane is the baseline&lt;/strong&gt;: When compute and tokens become capacity units, any unconstrained &amp;ldquo;intent expression&amp;rdquo; will leak as cost, latency, or risk.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, system layering should be clear: Layer 4 is responsible for &amp;ldquo;expression,&amp;rdquo; Layers 1/2/3 are responsible for &amp;ldquo;fulfillment and bearing consequences,&amp;rdquo; and the governance loop is responsible for &amp;ldquo;correction.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="context-is-rising-to-a-new-infrastructure-layer"&gt;&amp;ldquo;Context&amp;rdquo; Is Rising to a New Infrastructure Layer&lt;/h2&gt;
&lt;p&gt;In traditional cloud-native systems, request states are mostly short-lived, relying more on application-layer state management. Infrastructure typically only handles &amp;ldquo;computation and networking&amp;rdquo; without needing to understand the economic value of &amp;ldquo;request context.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;AI-native infrastructure is different. Long-context, multi-turn dialogue, and multi-agent reasoning mean &lt;strong&gt;inference state often survives across requests&lt;/strong&gt; and directly determines throughput and cost. In particular, KV cache and context reuse are evolving from &amp;ldquo;performance optimization techniques&amp;rdquo; to &amp;ldquo;platform capacity structures.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This can be summarized as an infrastructure law:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;When a state asset (context/state) becomes a determinant variable of system cost and throughput, it rises from application detail to infrastructure layer.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This trend is gradually appearing in the industry: inference context and KV reuse are explicitly elevated to &amp;ldquo;infrastructure layer&amp;rdquo; capability development directions. Future expansion will include distributed KV, parameter caching, inference routing state, Agent memory, and a series of &amp;ldquo;state assets.&amp;rdquo;&lt;/p&gt;
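&lt;p&gt;To make &amp;ldquo;state as an infrastructure layer&amp;rdquo; concrete, the toy Python cache below treats context/KV state as an asset with an explicit capacity, an eviction (lifecycle) policy, and a tenant key as the isolation boundary for reuse. It illustrates the governance questions (lifecycle, reuse boundary, isolation); it is not a real KV-cache implementation, and all names are hypothetical:&lt;/p&gt;

```python
from collections import OrderedDict

class ContextCache:
    """Toy context/KV cache: a state asset with an explicit capacity and an
    LRU eviction (lifecycle) policy. Keys include the tenant, so reuse never
    crosses an isolation boundary. Illustrative only."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, tenant, prefix):
        key = (tenant, prefix)
        if key in self._store:
            self._store.move_to_end(key)   # reuse refreshes the entry's lifetime
            return self._store[key]
        return None                        # no cross-tenant or expired reuse

    def put(self, tenant, prefix, kv_state):
        key = (tenant, prefix)
        self._store[key] = kv_state
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently reused
```

&lt;p&gt;The design choice worth noticing is that capacity, eviction, and tenancy are properties of the cache itself: once state determines cost and throughput, these knobs belong to the platform, not to individual applications.&lt;/p&gt;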
&lt;h2 id="the-foundation-of-ai-native-infrastructure-reference-designs-and-delivery-systems"&gt;The Foundation of AI-Native Infrastructure: Reference Designs and Delivery Systems&lt;/h2&gt;
&lt;p&gt;AI-native infrastructure is far more than &amp;ldquo;buying a few GPUs.&amp;rdquo; Compared to traditional internet services, AI workloads have three characteristics that make the &amp;ldquo;foundation&amp;rdquo; more engineered and productized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stronger topology dependencies&lt;/strong&gt;: Network fabric, interconnects, storage tiers, and GPU affinity determine available throughput;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Harder scarcity constraints&lt;/strong&gt;: GPU and token throughput boundaries are less &amp;ldquo;elastic&amp;rdquo; than CPU/memory;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Higher delivery complexity&lt;/strong&gt;: Multi-cluster, multi-tenant, multi-model/multi-framework coexistence means only &amp;ldquo;replicable delivery&amp;rdquo; can scale.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, AI Infra is not just a component list, but must include &amp;ldquo;scalable delivery and repeatable operation&amp;rdquo; system capabilities:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference Designs (validated designs)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Codify &amp;ldquo;correct topology and ratios&amp;rdquo; into reusable solutions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Automated Delivery&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Institutionalize deployment, upgrade, scaling, rollback, and capacity planning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Governance Implementation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make budgeting, isolation, metering, and auditing default capabilities rather than after-the-fact patches.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a CTO/CEO perspective, this means: what you purchase is not &amp;ldquo;hardware&amp;rdquo; but a &amp;ldquo;delivery system for predictable capacity.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="layered-responsibility-boundaries-from-a-ctoceo-perspective"&gt;&amp;ldquo;Layered Responsibility Boundaries&amp;rdquo; from a CTO/CEO Perspective&lt;/h2&gt;
&lt;p&gt;To facilitate internal alignment on &amp;ldquo;who is responsible for what and what is the cost of failure,&amp;rdquo; the table below maps &amp;ldquo;technical layers&amp;rdquo; to &amp;ldquo;organizational responsibilities,&amp;rdquo; avoiding the scenario where platform teams build only the control plane while no one owns the consequence boundaries.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Typical Capabilities&lt;/th&gt;
&lt;th&gt;Primary Owner (Recommended)&lt;/th&gt;
&lt;th&gt;Cost of Failure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Layer 5 Business Interface&lt;/td&gt;
&lt;td&gt;SLA, product experience, business goals&lt;/td&gt;
&lt;td&gt;Product / Business&lt;/td&gt;
&lt;td&gt;Customer experience and revenue impact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 4 Intent/Orchestration (MCP/Agent)&lt;/td&gt;
&lt;td&gt;Capability catalogs, workflow, policy expression&lt;/td&gt;
&lt;td&gt;App / Platform / AI Eng&lt;/td&gt;
&lt;td&gt;Behavior runaway, tool abuse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 3 Execution (Runtime)&lt;/td&gt;
&lt;td&gt;Serving, batching, routing, caching policies&lt;/td&gt;
&lt;td&gt;AI Platform / Infra&lt;/td&gt;
&lt;td&gt;Insufficient throughput, latency jitter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 2 Context/State&lt;/td&gt;
&lt;td&gt;KV/cache/context tier&lt;/td&gt;
&lt;td&gt;Infra + AI Platform&lt;/td&gt;
&lt;td&gt;Token cost spike, throughput collapse&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer 1 Compute/Governance&lt;/td&gt;
&lt;td&gt;Quotas, isolation, topology scheduling, metering&lt;/td&gt;
&lt;td&gt;Infra / FinOps / SRE&lt;/td&gt;
&lt;td&gt;Budget explosion, resource contention, incident spillover&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: AI-Native Infrastructure Layer and Organizational Responsibility Mapping
&lt;/figcaption&gt;
&lt;p&gt;As you can see, &lt;strong&gt;the organizational challenge of AI-native is not in &amp;ldquo;whether we have agents,&amp;rdquo; but in &amp;ldquo;whether inter-layer closed loops are established&amp;rdquo;&lt;/strong&gt;. When model-driven amplification of consequences occurs, organizations must institutionalize governance mechanisms as platform capabilities: executable budgets, explainable consequences, attributable anomalies, and rewritable policies. This is the true meaning of &amp;ldquo;starting from compute governance&amp;rdquo; rather than &amp;ldquo;starting from API design.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The layered design of AI-native infrastructure centers on engineering the binding between &amp;ldquo;intent&amp;rdquo; and &amp;ldquo;resource consequences.&amp;rdquo; The control plane is responsible for expressing intent, while the governance plane is responsible for bounding consequences. Only by institutionalizing governance mechanisms as platform capabilities can we ensure cost, risk, and capacity remain controllable while enhancing capabilities. As context, state assets, and other new variables become infrastructure, AI Infra delivery systems will continue to evolve, becoming the foundation for sustainable enterprise innovation.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sre.google/sre-book/capacity-planning/" target="_blank" rel="noopener"&gt;Google SRE - Capacity Planning - sre.google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/architecture/well-architected/" target="_blank" rel="noopener"&gt;AWS Well-Architected - Cost Optimization - aws.amazon.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/architecture/guide/cost-management/finops/" target="_blank" rel="noopener"&gt;Microsoft FinOps for AI - learn.microsoft.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Operating and Governing AI-Native Infrastructure: Metrics, Budget, Isolation, Sharing, SLO to Cost</title><link>https://jimmysong.io/book/ai-native-infra/metrics-budget-isolation/</link><pubDate>Sun, 18 Jan 2026 04:17:45 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/metrics-budget-isolation/</guid><description>Analyzing the closed-loop governance of metrics, budgets, isolation, and sharing in AI-native infrastructure, and explaining how SLO maps to cost and risk.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The key to governing AI-native infrastructure lies in how to institutionalize the closed-loop management of costs and risks arising from uncertainty.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the cloud-native era, system operations were typically considered &amp;ldquo;basically deterministic&amp;rdquo;: request paths were predictable, resource curves were relatively stable, and scaling could respond promptly to load changes. In the AI era, however, this assumption no longer holds: &lt;strong&gt;uncertainty has become the norm&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This chapter aims to provide CTOs/CEOs with key conclusions for architecture reviews:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The starting point of AI-native infrastructure is to treat uncertainty as the default input;&lt;/strong&gt;
&lt;strong&gt;The goal is to achieve closed-loop governance of the resource consequences (cost, risk, experience) arising from uncertainty.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is also why &amp;ldquo;becoming AI-native&amp;rdquo; in organizational contexts increasingly points to the reshaping of operational methods and governance models: when system consequences are amplified, governance must be institutionalized.&lt;/p&gt;
&lt;h2 id="what-is-an-uncertain-system"&gt;What is an &amp;ldquo;Uncertain System&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;In this handbook, &amp;ldquo;uncertainty&amp;rdquo; does not refer to randomness in the probabilistic sense, but to three types of phenomena in engineering practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unpredictable behavior&lt;/strong&gt;: execution paths change dynamically with model inference, especially evident in agentic processes (multi-step agent workflows).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unpredictable resource consumption&lt;/strong&gt;: tokens, KV cache, tool calls, I/O, and network overhead exhibit long-tail and burst characteristics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-linear consequences&lt;/strong&gt;: the same &amp;ldquo;intent&amp;rdquo; can produce cost and risk outcomes differing by orders of magnitude.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, the core problem of AI-native infrastructure has shifted from &amp;ldquo;how to make the system more elegant&amp;rdquo; to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How to ensure the system maintains economic viability, controllability, and recoverability when worst-case scenarios occur.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;During architecture reviews, if you cannot answer &amp;ldquo;what is the worst case, where are the upper bounds, and how to degrade/rollback when they are triggered,&amp;rdquo; you are still reviewing an inertial extension of deterministic systems, not a true AI-native system.&lt;/p&gt;
&lt;h2 id="major-sources-of-uncertainty"&gt;Major Sources of Uncertainty&lt;/h2&gt;
&lt;p&gt;The following table summarizes common sources of uncertainty in AI-native infrastructure and their specific manifestations, facilitating quick reference for CTOs/CEOs.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Manifestations&lt;/th&gt;
&lt;th&gt;Impact Areas&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Behavior Uncertainty&lt;/td&gt;
&lt;td&gt;Agent task decomposition path changes, tool selection and call sequence changes, failure retry and reflection&lt;/td&gt;
&lt;td&gt;Cost, Risk, Resilience&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Demand Uncertainty&lt;/td&gt;
&lt;td&gt;Concurrency and burst, long-tail requests, multi-tenant interference (noisy neighbor)&lt;/td&gt;
&lt;td&gt;Resource pools, Experience, Isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State Uncertainty&lt;/td&gt;
&lt;td&gt;Context reuse across requests, KV cache migration and sharing&lt;/td&gt;
&lt;td&gt;Performance, Cost, Governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure Uncertainty&lt;/td&gt;
&lt;td&gt;High sensitivity to network/storage/interconnect, congestion and jitter amplified into tail latency&lt;/td&gt;
&lt;td&gt;Experience, Cost, Stability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Sources and Manifestations of Uncertainty in AI-Native Infrastructure
&lt;/figcaption&gt;
&lt;h3 id="behavior-uncertainty"&gt;Behavior Uncertainty&lt;/h3&gt;
&lt;p&gt;Behavior uncertainty is mainly reflected in changes to agent task decomposition paths, dynamic adjustment of tool selection and call sequences, and path explosion caused by failure retry, reflection, and multi-round planning. Tools and contexts are combined through standard interfaces (such as the MCP protocol), significantly expanding the system&amp;rsquo;s capability surface while turning the branch space into a governance challenge.&lt;/p&gt;
&lt;p&gt;More critically, tool calls are not &amp;ldquo;free external functions&amp;rdquo; - they occupy context windows and consume token budgets, amplifying cost and tail latency pressures. Therefore, behavior uncertainty is not merely &amp;ldquo;feature flexibility&amp;rdquo; at the product layer, but &amp;ldquo;cost and risk elasticity&amp;rdquo; at the platform layer, which must be budgeted, capped, and made auditable.&lt;/p&gt;
&lt;h3 id="demand-uncertainty"&gt;Demand Uncertainty&lt;/h3&gt;
&lt;p&gt;Demand uncertainty includes concurrency and burst (peaks), long-tail requests (ultra-long contexts, complex reasoning), and mutual interference under multi-tenancy (noisy neighbor). This drives capacity planning from &amp;ldquo;average capacity&amp;rdquo; to &amp;ldquo;tail capacity + governance strategies.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In AI-native infrastructure, experience and cost are often determined not by average requests, but by the combination of tail requests: a small number of long-chain, long-context, tool-intensive requests can overwhelm shared resource pools. Therefore, demand uncertainty requires answering: &lt;strong&gt;which requests deserve guarantees, which must be throttled, and which should be isolated.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="statecontext-uncertainty"&gt;State/Context Uncertainty&lt;/h3&gt;
&lt;p&gt;State uncertainty is the most underestimated category in the AI era: &lt;strong&gt;context is a state asset&lt;/strong&gt;, and it often exists across requests. When inference state / KV cache is elevated to a reusable, shareable, migratable system capability, it is no longer an application detail but a decisive variable for throughput and unit cost. NVIDIA in public materials identifies &lt;em&gt;Inference Context Memory Storage&lt;/em&gt; as a new infrastructure layer, pointing to state reuse and sharing requirements for long-context and agentic workloads.&lt;/p&gt;
&lt;p&gt;The conclusion is: &lt;strong&gt;&amp;ldquo;context/state&amp;rdquo; has changed from optional optimization to a critical infrastructure asset that must be meterable, allocable, and governable.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="infrastructure-uncertainty"&gt;Infrastructure Uncertainty&lt;/h3&gt;
&lt;p&gt;AI workloads are far more sensitive to network, interconnect, and storage than traditional microservice workloads. Congestion, packet loss, and I/O jitter are amplified into tail latency and job completion time instability, creating &amp;ldquo;non-linear consequences&amp;rdquo; for experience and cost.&lt;/p&gt;
&lt;p&gt;This type of uncertainty usually cannot be solved through &amp;ldquo;component selection&amp;rdquo; but requires &lt;strong&gt;end-to-end path engineering constraints&lt;/strong&gt;: from topology, bandwidth, and queuing, to transport protocols, isolation strategies, and congestion control—all must be incorporated into the governance plane, not just the operations plane.&lt;/p&gt;
&lt;h2 id="how-uncertainty-amplifies-across-layers"&gt;How Uncertainty Amplifies Across Layers&lt;/h2&gt;
&lt;p&gt;The diagram below illustrates the closed-loop relationship between metrics, budgets, and isolation strategies, emphasizing that governance must be rewritable.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/metrics-budget-isolation/slo-to-cost-loop-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/metrics-budget-isolation/slo-to-cost-loop-en.svg" alt="Figure 1: SLO to cost feedback loop" data-caption="Figure 1: SLO to cost feedback loop"
width="1656"
height="456"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: SLO to cost feedback loop&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The flowchart below demonstrates the cross-layer amplification path of uncertainty in AI-native infrastructure:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/metrics-budget-isolation/uncertainty-amplification-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/metrics-budget-isolation/uncertainty-amplification-en.svg" alt="Figure 2: Uncertainty amplification across layers" data-caption="Figure 2: Uncertainty amplification across layers"
width="1579"
height="1305"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Uncertainty amplification across layers&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Typical phenomena include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Agent branch explosion&lt;/strong&gt;: more tools and composable paths make tail costs increasingly uncontrollable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context inflation&lt;/strong&gt;: long contexts and multi-round reasoning make KV cache a performance bottleneck and cost black hole.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource contention distortion&lt;/strong&gt;: GPU/network contention under multi-tenancy makes &amp;ldquo;average performance&amp;rdquo; meaningless—tails must be governed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, the core of AI-native is not &amp;ldquo;making execution stronger,&amp;rdquo; but enabling you to stably answer three questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Where are the upper bounds&lt;/strong&gt; (budgets, steps, call counts, state occupancy)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What to do when crossing boundaries&lt;/strong&gt; (degradation, rollback, isolation, blocking)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How results are rewritten&lt;/strong&gt; (policy iteration and cost correction)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="engineering-response-of-ai-native-infrastructure"&gt;Engineering Response of AI-Native Infrastructure&lt;/h2&gt;
&lt;p&gt;Enterprises can refer to the following &amp;ldquo;hard standards&amp;rdquo; during reviews; missing any one makes closed-loop governance of uncertainty impossible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Admission: Ingress Admission Control&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implement tiered admission for requests with ultra-long contexts, oversized tool graphs, or ultra-high budgets&lt;/li&gt;
&lt;li&gt;Bind &amp;ldquo;budget, priority, compliance&amp;rdquo; as part of intent (policy as intent)&lt;/li&gt;
&lt;li&gt;Clearly communicate rejection reasons so callers understand why requests are denied&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Key Point
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
The responsibility of admission is not &amp;ldquo;to allow features,&amp;rdquo; but to write consequence constraints into the contract.
&lt;/div&gt;
&lt;/div&gt;
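&lt;p&gt;A minimal sketch of tiered admission along these lines; the caps, the function shape, and the rejection messages are assumptions for illustration, not a specific platform&amp;rsquo;s API:&lt;/p&gt;

```python
# Assumed caps; a real deployment would tier these by tenant and priority.
MAX_CONTEXT_TOKENS = 128_000
MAX_TOOLS = 50
MAX_BUDGET_USD = 5.0

def admit(context_tokens: int, tool_count: int, budget_usd: float) -> tuple[bool, str]:
    """Return (admitted, reason); every rejection carries an explanation."""
    if context_tokens > MAX_CONTEXT_TOKENS:
        return False, f"context {context_tokens} exceeds cap {MAX_CONTEXT_TOKENS}"
    if tool_count > MAX_TOOLS:
        return False, f"tool graph size {tool_count} exceeds cap {MAX_TOOLS}"
    if budget_usd > MAX_BUDGET_USD:
        return False, f"budget ${budget_usd} exceeds cap ${MAX_BUDGET_USD}"
    return True, "admitted"

ok, reason = admit(context_tokens=200_000, tool_count=10, budget_usd=1.0)
print(ok, reason)  # rejected with an explicit reason
```

&lt;p&gt;The design choice worth noting is that the reason string is part of the return value: admission decisions are contract outcomes, not silent drops.&lt;/p&gt;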
&lt;p&gt;&lt;strong&gt;Translation: Intent Translation to Governable Execution Plans&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Select runtime, routing/batching strategies, and caching strategies for requests&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Cap&amp;rdquo; agent workflows: maximum steps, maximum tool calls, maximum tokens&lt;/li&gt;
&lt;li&gt;Include fallback paths: deterministic alternatives, cached answers, manual/rule-based fallbacks&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Key Point
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
Upgrade from &amp;ldquo;prompt-driven execution&amp;rdquo; to &amp;ldquo;plan-driven execution&amp;rdquo;—plans must be understandable and constrainable by the governance plane.
&lt;/div&gt;
&lt;/div&gt;
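&lt;p&gt;The shift to plan-driven execution can be sketched as a plan object whose caps and fallback the governance plane can read and enforce. Field names, values, and the fallback labels below are hypothetical:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class ExecutionPlan:
    """A governable plan: caps and a fallback are part of the plan itself."""
    max_steps: int
    max_tool_calls: int
    max_tokens: int
    fallback: str  # e.g. "cached_answer", "rule_based", "human_review"

def run(plan: ExecutionPlan, step_cost_tokens: int) -> str:
    tokens = tool_calls = 0
    for _ in range(plan.max_steps):
        tokens += step_cost_tokens
        tool_calls += 1
        if tokens > plan.max_tokens or tool_calls > plan.max_tool_calls:
            return plan.fallback  # crossing any cap triggers the fallback path
    return "completed"

plan = ExecutionPlan(max_steps=10, max_tool_calls=5, max_tokens=50_000,
                     fallback="cached_answer")
print(run(plan, step_cost_tokens=2_000))  # hits the tool-call cap, falls back
```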
&lt;p&gt;&lt;strong&gt;Metering: End-to-End Metering and Attribution&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Meter tokens, GPU time, KV cache footprint, I/O, and network for each request/agent task&lt;/li&gt;
&lt;li&gt;Attribute by tenant, project, model, and tool to form cost and quality metrics&lt;/li&gt;
&lt;li&gt;Separately label &amp;ldquo;tail overhead&amp;rdquo; so long-tail costs no longer hide in averages&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Key Point
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
No ledger means no budget; no attribution means no governance, let alone ROI.
&lt;/div&gt;
&lt;/div&gt;
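&lt;p&gt;A minimal metering record might look like the following sketch, where the field names and the P95 threshold are assumptions for illustration:&lt;/p&gt;

```python
from dataclasses import dataclass, asdict

@dataclass
class MeterRecord:
    """Per-request consumption plus the attribution dimensions it rolls up by."""
    tenant: str
    project: str
    model: str
    tool: str
    tokens: int
    gpu_seconds: float
    kv_cache_mb: float

P95_TOKENS = 20_000  # assumed tail threshold, e.g. derived from historical data

def label_tail(rec: MeterRecord) -> dict:
    row = asdict(rec)
    row["is_tail"] = rec.tokens > P95_TOKENS  # tail costs no longer hide in averages
    return row

rec = MeterRecord("team-a", "search", "model-x", "web_fetch", 45_000, 12.5, 800.0)
print(label_tail(rec)["is_tail"])  # True: this request is labeled as tail overhead
```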
&lt;p&gt;&lt;strong&gt;Enforcement: Budget and Degradation Mechanisms&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Budget triggers: rate limiting, degradation, preemption, queuing (by priority and tenant isolation)&lt;/li&gt;
&lt;li&gt;Risk triggers: isolation&lt;/li&gt;
&lt;/ul&gt;
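&lt;p&gt;Budget triggers can be expressed as a small threshold-to-action table; the thresholds and action names below are illustrative, not a prescribed policy:&lt;/p&gt;

```python
# Ordered from most to least severe; first matching threshold wins.
ACTIONS = [
    (1.00, "preempt_and_isolate"),  # budget exhausted: hard stop
    (0.90, "degrade"),              # e.g. route to a cheaper model
    (0.75, "throttle"),             # rate-limit low-priority tenants
]

def enforce(spent: float, budget: float) -> str:
    """Map budget utilization to an enforcement action."""
    utilization = spent / budget
    for threshold, action in ACTIONS:
        if utilization >= threshold:
            return action
    return "allow"

print(enforce(spent=80, budget=100))   # throttle
print(enforce(spent=95, budget=100))   # degrade
print(enforce(spent=120, budget=100))  # preempt_and_isolate
```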
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The core of AI-native infrastructure governance lies in front-loading uncertainty, layered metering, policy feedback, and institutionalized constraints to form a closed loop of cost and risk. Only with engineering mechanisms such as Admission, Translation, Metering, and Enforcement can systems achieve economically viable, controllable, and recoverable operations under normalized uncertainty.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sre.google/sre-book/service-level-objectives/" target="_blank" rel="noopener"&gt;Google SRE Book - Service Level Objectives - google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.finops.org/" target="_blank" rel="noopener"&gt;FinOps Foundation - AI Cost Management - finops.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/policies/usage-policies/" target="_blank" rel="noopener"&gt;OpenAI Usage Policies - openai.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Organization and Culture: How the Operating Model Changes</title><link>https://jimmysong.io/book/ai-native-infra/operating-model/</link><pubDate>Sun, 18 Jan 2026 04:18:02 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/operating-model/</guid><description>Redrawing boundaries across platform, infra, ML, and security, and transforming accountability and collaboration in the AI era.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The compute governance closed loop is the foundational safeguard for sustainable innovation in AI-native organizations.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Perspective
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
API/Agent/MCP solve &amp;ldquo;how intent is expressed,&amp;rdquo; while compute governance addresses &amp;ldquo;whether the resource consequences of intent are economically viable and risk-controllable.&amp;rdquo; In the AI era, the latter becomes a prerequisite for the former. API-first without governance only amplifies costs and uncertainty, pushing organizations into the trap of &amp;ldquo;functional but unsustainable.&amp;rdquo;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The FinOps Foundation states directly in &amp;ldquo;Scaling Kubernetes for AI/ML Workloads with FinOps&amp;rdquo; that Kubernetes elasticity can easily evolve into a &lt;strong&gt;runaway cost problem&lt;/strong&gt;. Therefore, FinOps should not be just cost reporting, but must become a shared operating model in which every scaling decision simultaneously answers two questions: &lt;strong&gt;Are performance SLOs met?&lt;/strong&gt; and &lt;strong&gt;Is it affordable?&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="the-challenge-of-api-first-implicit-assumptions-in-the-ai-era"&gt;The Challenge of API-first &amp;ldquo;Implicit Assumptions&amp;rdquo; in the AI Era&lt;/h2&gt;
&lt;p&gt;The diagram below shows the boundary relationships and accountability chains between platform, ML, and security teams.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/operating-model/operating-model-boundaries-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/operating-model/operating-model-boundaries-en.svg" alt="Figure 1: Organizational boundaries and accountability chain" data-caption="Figure 1: Organizational boundaries and accountability chain"
width="1616"
height="456"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Organizational boundaries and accountability chain&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The intuitive path of API-first is: first make the interfaces and workflows work, then gradually optimize performance and cost through engineering. In AI-native infrastructure, this path often fails because it relies on three implicit assumptions that no longer hold in the AI era.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Assumption 1: Resources are not the core scarcity&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Traditional software treats engineering efficiency, throughput, and stability as its scarce resources, whereas the scarcity of AI-native infrastructure comes primarily from asset boundaries like &lt;strong&gt;GPU/interconnect/power&lt;/strong&gt;. Scarcity is no longer &amp;ldquo;slow to scale&amp;rdquo; but &amp;ldquo;hard to scale and expensive,&amp;rdquo; constrained by both the supply chain and datacenter conditions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Assumption 2: Request costs are predictable&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Traditional request cost distribution is relatively stable; AI requests are inherently long-tailed: branching in agentic tasks, inflation of long contexts, and chain amplification of tool calls all make tokens and GPU time into random variables that cannot be linearly extrapolated. You think you&amp;rsquo;re scaling &amp;ldquo;QPS,&amp;rdquo; but actually you&amp;rsquo;re scaling &amp;ldquo;total cost of tail probability events.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Assumption 3: State is ephemeral and discardable&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The cloud-native era emphasized stateless scaling with externalized state; on the inference side, however, &lt;strong&gt;inference state/context reuse&lt;/strong&gt; often determines whether unit costs are controllable. NVIDIA describes this in Rubin&amp;rsquo;s ICMS (Inference Context Memory Storage) as the &amp;ldquo;context storage challenge brought by new inference paradigms&amp;rdquo;: KV cache must be reused across sessions and services; sequence-length growth inflates KV cache linearly, forcing persistence and shared access and forming a &amp;ldquo;new context tier&amp;rdquo;; and TPS and energy-efficiency gains show this is not a nice-to-have but a threshold for scalability.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Conclusion
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
In AI-native infrastructure, state and compute governance have become prerequisites for &amp;ldquo;whether it can run,&amp;rdquo; not post-optimization items.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="the-nature-of-compute-governance-what-is-being-governed"&gt;The Nature of Compute Governance: What is Being Governed&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Compute governance&amp;rdquo; is often misunderstood as &amp;ldquo;managing GPUs,&amp;rdquo; but what truly needs governance is &lt;strong&gt;the resource consequences of intent&lt;/strong&gt;. More precisely, it&amp;rsquo;s governing the combined effects of four types of objects:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Token Economics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each request/task&amp;rsquo;s token consumption, context inflation, implicit token tax from tool definitions and intermediate results, ultimately directly mapping to cost and latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Accelerator Time&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPU time, memory footprint, batching strategies, and the impact of routing and cache hits on effective throughput. The key is not &amp;ldquo;whether there are GPUs,&amp;rdquo; but &amp;ldquo;whether output per unit GPU time is controllable.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Interconnect and Storage (Fabric &amp;amp; Storage)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Network and storage pressures from training all-reduce, inference KV/cache sharing, and cross-service data movement. AI performance and cost are often amplified by fabric, not by APIs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Organizational Budget and Risk (Budget &amp;amp; Risk)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-tenant isolation, fairness, audit, compliance, and accountability. These determine whether the system can scale to multiple teams/business lines, not just scaling demos to more instances.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The FinOps Foundation also emphasizes: AI/ML cost drivers are not just GPUs; storage (checkpoints/embeddings/artifacts), network (distributed training/cross-AZ), and additional licensing and marketplace fees often &amp;ldquo;quietly exceed compute.&amp;rdquo; Therefore, governance objects must cover end-to-end, not just stare at inference bills.&lt;/p&gt;
&lt;h2 id="mcpagent-amplification-effects-under-governance-gaps"&gt;MCP/Agent: Amplification Effects Under Governance Gaps&lt;/h2&gt;
&lt;p&gt;MCP/Agent expand the &amp;ldquo;capability surface,&amp;rdquo; but simultaneously make cost curves steeper, especially showing exponential amplification when governance is missing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More tools, more branches&lt;/strong&gt;: Planning space expands, tail probability rises, cost volatility becomes uncontrollable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool definitions and intermediate results consume context&lt;/strong&gt;: Directly consuming context window and tokens, translating to cost and latency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stronger tool usage triggers more external I/O&lt;/strong&gt;: External system calls, network round trips, and data movement all enter the overall cost function.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Anthropic explicitly states in &amp;ldquo;Code execution with MCP&amp;rdquo; that direct tool calls increase cost and latency because tool definitions and intermediate results consume the context window; when the number of tools rises to the hundreds or thousands, this becomes a scalability bottleneck, which is why Anthropic proposes code-execution forms to improve efficiency and reduce token consumption.&lt;/p&gt;
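&lt;p&gt;The &amp;ldquo;token tax&amp;rdquo; of tool definitions can be approximated with simple arithmetic; the average tokens per definition below is an assumed figure, since real schemas vary widely:&lt;/p&gt;

```python
# Every registered tool's schema occupies context before the first user
# token is processed. 350 tokens/definition is an assumed average.
AVG_TOKENS_PER_TOOL_DEF = 350

def tool_definition_tax(tool_count: int, context_window: int) -> float:
    """Fraction of the context window consumed by tool definitions alone."""
    return min(1.0, tool_count * AVG_TOKENS_PER_TOOL_DEF / context_window)

for n in (10, 100, 1000):
    share = tool_definition_tax(n, context_window=128_000)
    print(f"{n:5d} tools -> {share:.1%} of a 128k window")
```

&lt;p&gt;Under these assumptions, a thousand tools would saturate the window before any work begins, which is the scalability bottleneck the passage above describes.&lt;/p&gt;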
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Conclusion
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
In the MCP/Agent era, governance is not about suppressing innovation, but making innovation sustainable within budget boundaries. Without governance, agents are not productivity tools, but cost amplifiers.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="minimal-implementation-path-for-compute-governance-first"&gt;Minimal Implementation Path for &amp;ldquo;Compute Governance First&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;You don&amp;rsquo;t have to bind to any vendor, but you must implement a &amp;ldquo;minimum viable governance stack.&amp;rdquo; The goal is not perfection, but giving the system controllable boundary conditions from day one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Admission and Budget (Admission + Budget)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set budgets and priorities for workload types (training/inference/agent tasks).&lt;/li&gt;
&lt;li&gt;Include budget, max steps, max tokens, max tool calls in policy-as-intent, and enforce at the entry point.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="alert alert-tip-container"&gt;
&lt;div class="alert-tip-title px-2"&gt;
Practice Recommendation
&lt;/div&gt;
&lt;div class="alert-tip px-2"&gt;
FinOps&amp;rsquo; core view is: embed FinOps early into architecture, making every scaling decision simultaneously answer &amp;ldquo;performance&amp;rdquo; and &amp;ldquo;affordability,&amp;rdquo; otherwise bills only get attention when incidents occur.
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;End-to-End Metering and Attribution (Metering + Attribution)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At minimum achieve one traceable chain: request/agent → tokens → GPU time/memory → network/storage → cost attribution (tenant/project/model/tool).&lt;/li&gt;
&lt;li&gt;Without attribution, there is no governance; without governance, enterprise scaling is impossible, because costs and responsibilities cannot align, and the organization will burn internal energy arguing over &amp;ldquo;who consumed the budget.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Isolation and Sharing (Isolation + Sharing)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sharing&lt;/strong&gt; for improving utilization; &lt;strong&gt;isolation&lt;/strong&gt; for reducing risk. Both must exist simultaneously, not either/or.&lt;/li&gt;
&lt;li&gt;CNCF&amp;rsquo;s Cloud Native AI report notes: GPU virtualization and sharing (like MIG, MPS, DRA, etc.) can improve utilization and reduce costs, but requires careful orchestration and management, and demands collaboration between AI and cloud-native engineering teams.&lt;/li&gt;
&lt;li&gt;The key to governance is not choosing sharing or isolation, but making it an executable policy: who shares under what conditions, who isolates under what conditions.&lt;/li&gt;
&lt;/ul&gt;
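&lt;p&gt;&amp;ldquo;Who shares under what conditions, who isolates under what conditions&amp;rdquo; can be sketched as a small placement policy; the tiers, workload names, and the MIG/time-slicing mapping are illustrative assumptions:&lt;/p&gt;

```python
def placement(tenant_tier: str, workload: str) -> str:
    """Decide GPU placement: share to raise utilization, isolate to cut risk."""
    if tenant_tier == "regulated" or workload == "training":
        return "dedicated"        # isolation for compliance risk / heavy contention
    if workload == "batch_inference":
        return "shared_mig"       # e.g. a MIG-style hardware partition of one GPU
    return "shared_timeslice"     # best-effort sharing for light, bursty loads

print(placement("standard", "batch_inference"))  # shared_mig
print(placement("regulated", "chat"))            # dedicated
```

&lt;p&gt;The value of encoding the decision this way is that the sharing/isolation trade-off becomes reviewable and versionable policy rather than an ad-hoc operational choice.&lt;/p&gt;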
&lt;p&gt;&lt;strong&gt;Topology and Network as First-Class Citizens (Topology + Fabric First)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI training and high-throughput inference are highly sensitive to network characteristics.&lt;/li&gt;
&lt;li&gt;Cisco&amp;rsquo;s AI-ready infrastructure design guides and related CVD/Design Zone emphasize: building high-performance, lossless Ethernet fabric for AI/ML workloads, and delivering reference architectures and deployment guides through validated designs.&lt;/li&gt;
&lt;li&gt;This means topology is not &amp;ldquo;the datacenter team&amp;rsquo;s business,&amp;rdquo; but a core variable determining whether JCT, tail latency, and capacity models hold.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Context/State Becomes a Governance Object (Context as a Governed Asset)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When long-context and agentic become mainstream, KV cache and inference context reuse will directly determine unit costs.&lt;/li&gt;
&lt;li&gt;NVIDIA&amp;rsquo;s ICMS defines this as a &amp;ldquo;new context tier&amp;rdquo; for solving KV cache reuse and shared access, emphasizing TPS/energy efficiency gains.&lt;/li&gt;
&lt;li&gt;In this era, treating context as a temporary variable is actively relinquishing cost control.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="anti-pattern-checklist"&gt;Anti-Pattern Checklist&lt;/h2&gt;
&lt;p&gt;The following anti-patterns are not merely &amp;ldquo;engineering inelegance&amp;rdquo;; they can push the organization toward loss of control and warrant vigilance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;API-first, treating governance as post-optimization&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Result: The system launches first, only for the team to discover that unit costs and tail latency are uncontrollable; the only remedy is a &amp;ldquo;hard brake&amp;rdquo; via feature limiting and rate limiting, which ultimately locks the product roadmap.&lt;/li&gt;
&lt;li&gt;Contrast: FinOps points out that elasticity easily becomes runaway cost, so cost governance must be moved forward into architecture decisions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Treating MCP/Agent as capability accelerators, not cost amplifiers&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Result: More tools make the system &amp;ldquo;smarter,&amp;rdquo; but token and external-call costs rise exponentially, forcing engineering teams to fight systemic amplification with &amp;ldquo;more complex prompts and rules.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Contrast: Anthropic notes that tool definitions and intermediate results consume context and increase cost and latency, and proposes more efficient execution forms as the path to scalability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Only buying GPUs, without sharing/isolation and orchestration&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Result: Low utilization, severe contention, and budget explosion, while teams blame each other over &amp;ldquo;who&amp;rsquo;s grabbing resources and who&amp;rsquo;s burning money.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Contrast: The CNCF Cloud Native AI report emphasizes that sharing/virtualization improves utilization, but must be matched with orchestration and collaboration mechanisms.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Ignoring network and topology, treating AI as ordinary microservices&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Result: Training JCT and inference tail latency are amplified by the network, capacity planning and cost models fail, and more scaling only brings more instability.&lt;/li&gt;
&lt;li&gt;Contrast: Cisco&amp;rsquo;s AI-ready network designs and validated designs treat requirements like a lossless Ethernet fabric as critical foundations for AI/ML.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The first-principle entry point for AI-native is the compute governance closed loop: budget and admission, metering and attribution, sharing and isolation, topology and network, context assetization. API/Agent/MCP remain important, but must be constrained by this closed loop, otherwise the system can only oscillate between &amp;ldquo;smarter&amp;rdquo; and &amp;ldquo;more bankrupt.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://sloanreview.mit.edu/" target="_blank" rel="noopener"&gt;MIT Sloan - sloanreview.mit.edu&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/ai/responsible-ai" target="_blank" rel="noopener"&gt;Google Cloud - cloud.google.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://oecd.ai/en/ai-principles" target="_blank" rel="noopener"&gt;OECD AI Principles - oecd.ai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Migration Roadmap: From Cloud Native to AI Native</title><link>https://jimmysong.io/book/ai-native-infra/migration-roadmap/</link><pubDate>Sun, 18 Jan 2026 04:20:11 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/migration-roadmap/</guid><description>An actionable roadmap for AI-native migration, covering bypass pilot, domain isolation, AI-first refactoring, and anti-patterns, with focus on governance loops and organizational contracts.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Migration is not &amp;ldquo;rebuilding the platform,&amp;rdquo; but using governance loops and organizational contracts to transform uncertainty into controllable engineering capabilities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The previous five chapters have established: &lt;strong&gt;AI-native infrastructure is Uncertainty-by-Default&lt;/strong&gt;. Therefore, the architectural starting point must be &lt;strong&gt;compute governance loops&lt;/strong&gt;, not &amp;ldquo;connect a model and call migration complete.&amp;rdquo; Otherwise, systems easily spiral out of control in three dimensions: &lt;strong&gt;cost&lt;/strong&gt; (runaway cost), &lt;strong&gt;risk&lt;/strong&gt; (unauthorized actions/side effects), and &lt;strong&gt;tail performance&lt;/strong&gt; (P95/P99 and queue tail behavior).&lt;/p&gt;
&lt;p&gt;This explains why the FinOps Foundation emphasizes that when running AI/ML on Kubernetes, &amp;ldquo;elasticity&amp;rdquo; easily evolves into uncontrollable cost overflow. &lt;strong&gt;FinOps must be incorporated into architecture and organization upfront as a shared operating model, not as an after-the-fact reconciliation exercise.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This article presents an &lt;strong&gt;actionable migration roadmap&lt;/strong&gt;, covering both technical evolution paths and organizational implementation approaches. You don&amp;rsquo;t need to &amp;ldquo;rebuild an AI platform&amp;rdquo; all at once, but you must establish working &lt;strong&gt;governance loops&lt;/strong&gt; at each stage: budget/admission, metering/attribution, sharing/isolation, topology/networking, and context assetization.&lt;/p&gt;
&lt;h2 id="the-north-star-from-platform-delivery-to-governance-loops"&gt;The North Star: From Platform Delivery to Governance Loops&lt;/h2&gt;
&lt;p&gt;The diagram below shows the migration path from bypass pilot to AI-first refactoring.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/migration-roadmap/migration-roadmap-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/migration-roadmap/migration-roadmap-en.svg" alt="Figure 1: AI-native migration roadmap" data-caption="Figure 1: AI-native migration roadmap"
width="1663"
height="203"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI-native migration roadmap&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Cloud-native migration typically centers on &amp;ldquo;capability delivery&amp;rdquo;: CI/CD, self-service platforms, service governance, and auto-scaling. Its default assumptions: systems are deterministic, costs grow linearly with requests, and scaling doesn&amp;rsquo;t significantly alter system boundaries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI-native migration must center on &amp;ldquo;governance loops&amp;rdquo;&lt;/strong&gt;, focusing on cost, risk, tail performance, and state assets. Its default assumptions are precisely the opposite: systems are inherently uncertain, and the &amp;ldquo;actions and consequences&amp;rdquo; of inference/agents drive costs and risks into nonlinear territory.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
North Star Definition
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
&lt;strong&gt;AI-Native Migration = Establish an AI Landing Zone + Compute Governance Loop + Context Tier, and ensure all agents/APIs/runtimes operate within this loop.&lt;/strong&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Elevating &amp;ldquo;Landing Zone&amp;rdquo; to North Star level here isn&amp;rsquo;t chasing trends—it&amp;rsquo;s because it naturally serves an organizational-level task: &lt;strong&gt;delineating responsibility boundaries between platform teams and workload teams&lt;/strong&gt;. Major cloud providers universally use Landing Zones to host &amp;ldquo;shared governance baselines&amp;rdquo; (networking, identity, policies, auditing, quota/budget), while business teams iteratively build applications within controlled boundaries. For AI, this boundary is the carrier of the governance loop.&lt;/p&gt;
&lt;h2 id="migration-prerequisites-build-three-foundations-first-then-scale-applications"&gt;Migration Prerequisites: Build Three Foundations First, Then Scale Applications&lt;/h2&gt;
&lt;p&gt;You can run PoCs and build applications in parallel, but if these three foundations are missing, any &amp;ldquo;application explosion&amp;rdquo; can easily transform into platform firefighting and financial disputes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Foundation A: FinOps (Finance and Quotas as the Control Plane)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The first migration step is not &amp;ldquo;launch the first agent,&amp;rdquo; but incorporating &lt;strong&gt;budgets, alerts, showback/chargeback, and quotas&lt;/strong&gt; into the infrastructure control plane:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Budgets and alerts are not just financial reports, but triggers for runtime policies (rate limiting, degradation, queuing, preemption).&lt;/li&gt;
&lt;li&gt;Showback/chargeback is not just accounting, but the binding of &amp;ldquo;cost consequences&amp;rdquo; to organizational decisions and product boundaries.&lt;/li&gt;
&lt;li&gt;Quotas are not static limits, but evolvable governance instruments (dynamic budgets and priorities by tenant/team/use-case).&lt;/li&gt;
&lt;/ul&gt;
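The idea that budgets are runtime triggers rather than reports can be sketched as a small policy function. This is a minimal illustration, not a real FinOps API; the thresholds and action names are assumptions chosen for the example.

```python
# Hypothetical sketch: budget state drives runtime policy, not just reports.
# The thresholds (80%, 100%, 120%) and action names are illustrative assumptions.

def budget_policy(spent: float, budget: float) -> str:
    """Map budget consumption to a runtime enforcement action."""
    ratio = spent / budget
    if ratio < 0.8:
        return "allow"        # normal operation
    if ratio < 1.0:
        return "rate-limit"   # alert fired: slow down new requests
    if ratio < 1.2:
        return "degrade"      # over budget: route to cheaper model, queue batch jobs
    return "preempt"          # hard cap: preempt low-priority work

action = budget_policy(spent=110.0, budget=100.0)  # over budget: "degrade"
```

The point of the sketch is the wiring: the same ledger that feeds finance reports also feeds the admission and scheduling layers.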
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Migration Threshold
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
If you cannot attribute the primary consumption of each agent/job to team/project/model/use-case (at minimum covering tokens, GPU time, KV footprint, key network/storage), you haven&amp;rsquo;t reached the &amp;ldquo;scale&amp;rdquo; starting line. Piloting is acceptable, but expansion is not advisable.
&lt;/div&gt;
&lt;/div&gt;
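The migration threshold above can be made mechanical: measure what fraction of spend is carried by fully attributed records. The field names and the 95% readiness bar below are illustrative assumptions, not a standard.

```python
# Hypothetical check of the "migration threshold": what share of consumption
# can be attributed to team/project/model/use-case? Fields and the 95% bar
# are illustrative assumptions.

REQUIRED_TAGS = {"team", "project", "model", "use_case"}

def attribution_coverage(records):
    """Fraction of total cost carried by fully tagged records."""
    total = sum(r["cost"] for r in records)
    tagged = sum(
        r["cost"] for r in records
        if REQUIRED_TAGS <= r.get("tags", {}).keys()
    )
    return tagged / total if total else 1.0

records = [
    {"cost": 70.0, "tags": {"team": "ml", "project": "rag",
                            "model": "m1", "use_case": "qa"}},
    {"cost": 30.0, "tags": {"team": "ml"}},  # partially tagged: not attributable
]
coverage = attribution_coverage(records)   # 0.7
ready_to_scale = coverage >= 0.95          # piloting is fine below this; expansion is not
```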
&lt;p&gt;&lt;strong&gt;Foundation B: Resource Governance (GPU Sharing/Isolation and Orchestration Capabilities)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;elasticity&amp;rdquo; of AI-native infrastructure is constrained by how scarce compute is governed. Treating GPUs as ordinary resources typically results in &lt;strong&gt;low utilization&lt;/strong&gt; and &lt;strong&gt;uncontrolled contention&lt;/strong&gt;. Therefore, you need viable combinations of sharing/isolation and orchestration capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sharing/partitioning&lt;/strong&gt;: MIG/MPS/vGPU paths transform &amp;ldquo;exclusive&amp;rdquo; into &amp;ldquo;pooled.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling upgrades&lt;/strong&gt;: Introduce explicit modeling of topology, queues, fairness, preemption, and cost tiers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Orchestration loop&lt;/strong&gt;: Solidify isolation, preemption, and priority policies into executable rules.&lt;/li&gt;
&lt;/ul&gt;
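What "GPUs as first-class governance resources" means in practice can be sketched as a pool that admits fractional shares against per-team quotas instead of handing out whole cards. The share granularity and quota numbers are illustrative assumptions; real systems would sit on MIG/MPS/vGPU partitions.

```python
# Minimal sketch of GPUs as a governed, pooled resource: fractional shares
# (standing in for MIG/vGPU slices) are admitted against per-team quotas.
# Granularity and quota numbers are illustrative assumptions.

class GpuPool:
    def __init__(self, total_shares: int, quotas: dict):
        self.free = total_shares       # e.g. 1 GPU = 10 shares
        self.quotas = dict(quotas)     # remaining shares per team

    def admit(self, team: str, shares: int) -> bool:
        """Admit only if both the pool and the team's quota can cover it."""
        if shares <= self.free and shares <= self.quotas.get(team, 0):
            self.free -= shares
            self.quotas[team] -= shares
            return True
        return False

pool = GpuPool(total_shares=20, quotas={"search": 8, "ads": 12})
first = pool.admit("search", 5)    # within pool and quota: admitted
second = pool.admit("search", 5)   # quota left is 3: rejected, not "best effort"
```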
&lt;p&gt;The key is not which partitioning technology you choose, but whether you can elevate GPUs from &amp;ldquo;machine assets&amp;rdquo; to &lt;strong&gt;first-class governance resources&lt;/strong&gt; and incorporate them into budget and admission systems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Foundation C: Fabric as a First-Class Constraint (Network/Interconnect)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Training and high-throughput inference are extremely sensitive to congestion, packet loss, and tail latency. Ignoring networking and topology leads to &amp;ldquo;seemingly sporadic but actually structural&amp;rdquo; problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training JCT (job completion time) is amplified by tail behavior, invalidating capacity planning;&lt;/li&gt;
&lt;li&gt;Inference P99 and queue tails are amplified, making SLOs difficult to honor.&lt;/li&gt;
&lt;/ul&gt;
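Why tail behavior is "structural" rather than sporadic follows from simple arithmetic: with fan-out, per-shard tail probabilities compound. The numbers below are illustrative, and independence between shards is an assumption.

```python
# With fan-out, per-shard tail probabilities compound: if each of n parallel
# shards meets its latency target with probability p, the whole request meets
# it with probability p**n (shard independence assumed for illustration).

def fanout_success(p: float, n: int) -> float:
    return p ** n

single = fanout_success(0.99, 1)   # a 99% per-shard target looks fine alone
wide = fanout_success(0.99, 64)    # ~0.53: nearly half of fanned-out requests miss it
```

This is why SLOs that hold for a single service silently fail once training collectives or wide inference fan-out enter the picture.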
&lt;p&gt;Therefore, you need to build reusable AI-ready network baselines: capacity assumptions, lossless strategies, isolation domain partitioning, and measurement and acceptance criteria. Networking is not something to &amp;ldquo;optimize later&amp;rdquo;; it is baseline engineering that must land in Days 31–60.&lt;/p&gt;
&lt;h2 id="migration-path-selection-layered-by-organizational-risk-and-technical-debt"&gt;Migration Path Selection: Layered by Organizational Risk and Technical Debt&lt;/h2&gt;
&lt;p&gt;Migration isn&amp;rsquo;t &amp;ldquo;pick one path and see it through,&amp;rdquo; but mapping organizations with different risk appetites and debt structures to different starting approaches and exit criteria. Paths can advance in parallel, but each needs defined &lt;strong&gt;applicable conditions&lt;/strong&gt; and &lt;strong&gt;exit criteria&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Path 1: Bypass Pilot / Skunkworks&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Applicable when cloud-native platforms are running stably, but AI demand is just emerging, organizational uncertainty is high, and governance mechanisms are not yet mature.&lt;/p&gt;
&lt;p&gt;The approach is establishing an &amp;ldquo;AI minimum closed-loop sandbox&amp;rdquo; alongside the existing platform. The goal is not &amp;ldquo;feature completeness,&amp;rdquo; but &amp;ldquo;making the loop work&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Independent GPU pool (or at least independent queue) + basic admission and budget&lt;/li&gt;
&lt;li&gt;Minimal token/GPU metering and attribution&lt;/li&gt;
&lt;li&gt;Controlled inference/agent entry points (max context / max steps / max tool calls)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Failure-acceptable&amp;rdquo; SLOs and cost caps (define boundaries first, then discuss experience)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Exit criteria:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cost curve is explainable (at minimum attributable to team/use-case)&lt;/li&gt;
&lt;li&gt;GPU utilization and isolation strategies form reusable templates&lt;/li&gt;
&lt;li&gt;Pilot capabilities can be absorbed into the platform as shared capabilities (enter Path 2)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Path 2: Domain-Isolated Platform&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Applicable when AI has entered multi-team, multi-tenant stages, requiring &amp;ldquo;pilot assets&amp;rdquo; to be solidified into platform capabilities to prevent cost and risk from spreading across domains.&lt;/p&gt;
&lt;p&gt;The approach is building an AI Landing Zone, where the platform team centrally manages shared governance capabilities, and workload teams iteratively build applications within controlled boundaries.&lt;/p&gt;
&lt;p&gt;Platform-side essential modules (recommend organizing by &amp;ldquo;governance loop&amp;rdquo;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Identity/Policy&lt;/strong&gt;: Unified identity, policy distribution, and auditing (policy-as-intent)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Network/Fabric baseline&lt;/strong&gt;: AI-ready network baseline and automated acceptance&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute governance&lt;/strong&gt;: Quotas, budgets, preemption, fairness, isolation/sharing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability &amp;amp; Chargeback&lt;/strong&gt;: End-to-end metering, alerts, showback/chargeback&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime catalog&lt;/strong&gt;: &amp;ldquo;Golden paths&amp;rdquo; and templated delivery for inference/training runtimes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Exit criteria: Platform provides &amp;ldquo;replicable AI workload landing approaches&amp;rdquo; and can scale use case count under budget constraints, rather than relying on manual firefighting to maintain stability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Path 3: AI-First Refactor (AI Factory / Replatform)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Applicable when AI is core business, requiring infrastructure to be treated as a &amp;ldquo;production line&amp;rdquo; rather than a &amp;ldquo;cluster,&amp;rdquo; and optimization objectives to switch from &amp;ldquo;shipping features&amp;rdquo; to &amp;ldquo;throughput/unit cost/energy efficiency.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The approach centers on &amp;ldquo;state assets + unit cost&amp;rdquo; refactoring:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Context/state&lt;/strong&gt; of inference/agents is explicitly governed and reused (no longer application-level tricks)&lt;/li&gt;
&lt;li&gt;Introduce &lt;strong&gt;Context Tier&lt;/strong&gt; architectural assumptions: long context and agentic inference require inference state / KV cache to be reusable across nodes and sessions&lt;/li&gt;
&lt;li&gt;Drive platform evolution with &amp;ldquo;unit token cost, tail latency, throughput/energy efficiency,&amp;rdquo; not &amp;ldquo;number of new components&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Exit criteria: Engineering decisions are consistently made using &amp;ldquo;unit cost and tail performance,&amp;rdquo; and context reuse is a platform capability rather than an ad-hoc caching trick inside application teams.&lt;/p&gt;
&lt;h2 id="90-day-actionable-plan-ai-landing-zone--minimum-governance-loop"&gt;90-Day Actionable Plan: AI Landing Zone + Minimum Governance Loop&lt;/h2&gt;
&lt;p&gt;The goal is to establish &amp;ldquo;AI Landing Zone + minimum governance loop&amp;rdquo; within 90 days, forming a replicable template. The key is not covering all scenarios, but connecting the &lt;strong&gt;admission—metering—enforcement—feedback&lt;/strong&gt; loop.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Day 0–30: Establish the Ledger (Cost &amp;amp; Usage Ledger)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, define attribution dimensions, establish budgets/alerts and baseline reports, and implement quotas/usage controls.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Attribution dimensions: tenant/team/project/model/use-case/tool&lt;/li&gt;
&lt;li&gt;Establish budgets and alerts, baseline reports (cost + business value metrics)&lt;/li&gt;
&lt;li&gt;Implement quotas and usage controls (at minimum covering GPU quotas and key service quotas)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Deliverables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cost and usage dashboard (weekly-level, traceable)&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Admission Policy v0&amp;rdquo; (max context / max steps / max budget)&lt;/li&gt;
&lt;/ul&gt;
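The "Admission Policy v0" deliverable can be sketched as a single gate that rejects requests exceeding declared caps before they consume compute. The field names and limits below are illustrative assumptions.

```python
# Hypothetical "Admission Policy v0": max context / max steps / max budget
# enforced as hard boundaries before a request consumes compute.
# Field names and limits are illustrative assumptions.

POLICY_V0 = {"max_context_tokens": 32_000, "max_steps": 20, "max_budget_usd": 5.0}

def admit(request: dict, policy: dict = POLICY_V0):
    """Return (admitted, reason). Caps are boundaries, not suggestions."""
    for key, limit in policy.items():
        if request.get(key, 0) > limit:
            return False, f"{key} exceeds limit {limit}"
    return True, "ok"

ok, _ = admit({"max_context_tokens": 8_000, "max_steps": 5, "max_budget_usd": 1.0})
rejected, reason = admit({"max_context_tokens": 128_000})
```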
&lt;p&gt;&lt;strong&gt;Day 31–60: Establish Resource Governance (GPU Governance + Scheduling)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This phase requires evaluating GPU sharing/isolation strategies, introducing topology/networking constraints, and forming two golden paths for inference and training.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPU sharing/isolation strategy: MIG/MPS/vGPU/DRA path evaluation and PoC (executable strategy as acceptance criteria)&lt;/li&gt;
&lt;li&gt;Introduce topology/networking constraints, form AI-ready network baseline and capacity assumptions (including acceptance criteria)&lt;/li&gt;
&lt;li&gt;Form two templated delivery paths for inference/training&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Deliverables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Workload templates (1 each for inference and training)&lt;/li&gt;
&lt;li&gt;Scheduling and isolation strategies (whitelisted, auditable)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Day 61–90: Establish the Loop (Enforcement + Feedback)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The final phase requires executing budget policies, migrating pilot use cases to the landing zone, and solidifying organizational interfaces.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enforce budgets: rate limiting/queuing/preemption/degradation strategies, linked to SLOs&lt;/li&gt;
&lt;li&gt;Migrate pilot use cases into the landing zone (or have them consume landing zone capabilities as a service)&lt;/li&gt;
&lt;li&gt;Solidify &amp;ldquo;organizational interface&amp;rdquo;: platform team vs workload team responsibility boundaries (forming executable contracts)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Deliverables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;AI Platform Runbook v1&amp;rdquo; (including oncall, changes, cost auditing)&lt;/li&gt;
&lt;li&gt;Two replicable use case landing paths (a new use case reaches the golden path in ≤ 30 minutes)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="operating-model-the-contract-between-platform-teams-and-workload-teams"&gt;Operating Model: The &amp;ldquo;Contract&amp;rdquo; Between Platform Teams and Workload Teams&lt;/h2&gt;
&lt;p&gt;Migration success depends on establishing clear, executable &amp;ldquo;organizational contracts.&amp;rdquo; The essence of the contract is deciding who is responsible for &amp;ldquo;capability provision&amp;rdquo; and who is responsible for &amp;ldquo;behavioral consequences.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Platform teams provide (must be stable)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Landing zone, network baseline, identity and policies, budget/quota systems, metering/attribution, GPU governance capabilities, and runtime golden paths.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Workload teams own (must be self-service)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Model selection, prompt/agent logic, tool integration, SLO definition, business value measurement, use case risk classification, and rollback paths.&lt;/p&gt;
&lt;p&gt;This is also why the FinOps Framework emphasizes operating model (personas, capabilities, maturity) rather than just tools: without &amp;ldquo;contracts,&amp;rdquo; budgets are difficult to enforce; if budgets cannot be enforced, the loop cannot form.&lt;/p&gt;
&lt;h2 id="migration-anti-patterns"&gt;Migration Anti-Patterns&lt;/h2&gt;
&lt;p&gt;Below are common migration anti-patterns and their consequences:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Anti-Pattern&lt;/th&gt;
&lt;th&gt;Typical Consequences&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Build only API/Agent platform, without ledger and budget&lt;/td&gt;
&lt;td&gt;Runaway costs (the most common, and difficult to remediate after the fact)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Treat GPUs as ordinary resources, without sharing/isolation and scheduling upgrades&lt;/td&gt;
&lt;td&gt;Low utilization + uncontrolled contention, platform forced to allocate compute via &amp;ldquo;administrative means&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ignore networking and topology&lt;/td&gt;
&lt;td&gt;Tail latency and training JCT amplified, capacity planning fails, SLOs difficult to honor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context not assetized (only &amp;ldquo;tricky caching&amp;rdquo; within applications)&lt;/td&gt;
&lt;td&gt;Unit cost out of control in long context/agentic era, reuse capabilities difficult to solidify as platform capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Common Migration Anti-Patterns and Consequences
&lt;/figcaption&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The core of AI-native migration is not a &amp;ldquo;migration checklist,&amp;rdquo; but &lt;strong&gt;under uncertainty premises, incorporating cost, risk, and tail performance into a unified governance loop, using Landing Zone to carry organizational contracts, and using Context Tier to implement state reuse infrastructure capabilities&lt;/strong&gt;. Only in this way can platform and business maintain controllability and efficiency during scaled evolution.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.mckinsey.com/capabilities/quantumblack/our-insights" target="_blank" rel="noopener"&gt;McKinsey on AI Strategy - mckinsey.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.thoughtworks.com/radar" target="_blank" rel="noopener"&gt;Thoughtworks Technology Radar - thoughtworks.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/adoption-framework" target="_blank" rel="noopener"&gt;Google Cloud Adoption Framework - cloud.google.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Glossary</title><link>https://jimmysong.io/book/ai-native-infra/glossary/</link><pubDate>Sun, 18 Jan 2026 05:19:24 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/glossary/</guid><description>Bilingual glossary of core AI-native infrastructure terminology for aligning organizational language.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Unified terminology is the first step toward organizational consensus.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Conclusion first: in the context of AI-native infrastructure, key terms must remain consistent; otherwise, both governance and communication lose focus.&lt;/p&gt;
&lt;p&gt;The following glossary serves to align cross-team terminology.&lt;/p&gt;
&lt;h2 id="core-terms"&gt;Core Terms&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;AI Native Infrastructure&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An infrastructure system premised on &amp;ldquo;models/agents as execution entities, compute as scarce assets, and uncertainty as the norm,&amp;rdquo; closing the loop of &amp;ldquo;intent → execution → resource consumption → economic and risk outcomes&amp;rdquo; through compute governance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Model-as-Actor&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Models/agents become &amp;ldquo;execution entities&amp;rdquo; with action capabilities, capable of invoking tools, modifying system state, and producing side effects, thus requiring governance and audit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compute-as-Scarcity&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Compute (GPU, interconnects, power consumption, bandwidth) becomes the core scarce asset, with expansion constrained by supply chain and data center conditions, and costs that cannot be made elastic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Uncertainty-by-Default&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Behavior and resource consumption are highly uncertain (especially in agentic and long-context scenarios), requiring verification and fallback mechanisms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Intent Plane&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The API, Agent, and policy expression layer responsible for expressing &amp;ldquo;what I want,&amp;rdquo; including priorities, budgets, compliance, and other policies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Execution Plane&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The training/inference/serving/runtime layer responsible for translating intent into actual execution, including state management, tool invocation, model routing, and so on.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance Plane&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The quota/budget, isolation/sharing, and cost control layer responsible for bounding resource consequences, including topology-aware scheduling, SLO and risk policies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Loop&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The closed loop of &amp;ldquo;intent → consumption → cost/risk outcomes,&amp;rdquo; comprising four steps: Admission, Translation, Metering, and Enforcement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compute Governance&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Governing the resource consequences of intent, including four categories of objects: token economics, accelerator time, interconnect and storage, and organizational budgets and risks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FinOps / Financial Operations&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Embedding cost governance early into architecture so that every scaling decision simultaneously answers &amp;ldquo;whether performance meets requirements&amp;rdquo; and &amp;ldquo;whether it is affordable.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An execution entity that completes tasks by selecting tools, invoking tools, and iterating reasoning, with uncertain behavioral paths and resource consumption.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;MCP / Model Context Protocol&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A protocol that standardizes tool access as &amp;ldquo;declaratable capability boundaries,&amp;rdquo; defining how capabilities are exposed to models/agents and how they are invoked.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Operating Model&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Institutional design for organization and operational methods, including responsibility boundaries, collaboration mechanisms, and decision-making processes, answering &amp;ldquo;who is responsible for what and what are the costs of failure.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.iso.org/standard/74296.html" target="_blank" rel="noopener"&gt;ISO/IEC 22989 AI Concepts and Terminology&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nist.gov/ai" target="_blank" rel="noopener"&gt;NIST AI Glossary&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://oecd.ai/en/ai-glossary" target="_blank" rel="noopener"&gt;OECD AI Glossary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Executive Checklist (10 Questions)</title><link>https://jimmysong.io/book/ai-native-infra/executive-checklist/</link><pubDate>Sun, 18 Jan 2026 05:21:59 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/book/ai-native-infra/executive-checklist/</guid><description>Ten critical questions for CEO/CTO to evaluate AI-native infrastructure readiness.</description><content:encoded>
&lt;p&gt;The following 10 questions assess whether an organization possesses the strategic and execution readiness for AI-native infrastructure. The diagram below categorizes these questions into three domains: strategy, governance, and execution, facilitating executive discussions.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/book/ai-native-infra/executive-checklist/executive-checklist-en.svg" data-img="https://assets.jimmysong.io/images/book/ai-native-infra/executive-checklist/executive-checklist-en.svg" alt="Figure 1: Executive checklist structure diagram" data-caption="Figure 1: Executive checklist structure diagram"
width="1616"
height="416"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Executive checklist structure diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ol&gt;
&lt;li&gt;Can you clearly define the &lt;strong&gt;unit cost&lt;/strong&gt; for each major AI workload (e.g., per 1M tokens, per agent task, per batch job)?&lt;/li&gt;
&lt;li&gt;Do you have &lt;strong&gt;budget/quota mechanisms&lt;/strong&gt; that can constrain team/project/tenant compute consumption within controllable bounds?&lt;/li&gt;
&lt;li&gt;Can you make &lt;strong&gt;explicit policy trade-offs&lt;/strong&gt; between &amp;ldquo;performance (throughput/latency)—cost—risk&amp;rdquo; (rather than relying on verbal constraints)?&lt;/li&gt;
&lt;li&gt;Can your platform handle &lt;strong&gt;uncertainty&lt;/strong&gt;: spikes, long-tail effects, and resource fluctuations caused by agent path explosions?&lt;/li&gt;
&lt;li&gt;Are agent/MCP &amp;ldquo;intents&amp;rdquo; mapped to &lt;strong&gt;actionable and billable/auditable resource consequences&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Do you have clear &lt;strong&gt;resource isolation and sharing strategies&lt;/strong&gt; (same-card sharing, memory isolation, preemption, prioritization) to improve utilization?&lt;/li&gt;
&lt;li&gt;Can you achieve cross-layer observability: end-to-end tracing from &lt;strong&gt;request/agent → runtime → GPU/network/storage → cost&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;Does your infrastructure support rapid adoption of new hardware/interconnect/topology changes (heterogeneity and evolution are the norm)?&lt;/li&gt;
&lt;li&gt;Has the organization established &lt;strong&gt;&amp;ldquo;AI SRE/ModelOps + FinOps&amp;rdquo;&lt;/strong&gt; collaboration mechanisms and accountability boundaries (who owns cost and reliability)?&lt;/li&gt;
&lt;li&gt;When you say &amp;ldquo;we are AI-native,&amp;rdquo; can you provide &lt;strong&gt;three planes + one closed loop&lt;/strong&gt; architecture diagram and governance strategy on a single page?&lt;/li&gt;
&lt;/ol&gt;
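Question 1 above (unit cost per major workload) is answerable with straightforward arithmetic once metering exists. The sketch below folds GPU time into token throughput; all prices and rates are illustrative assumptions.

```python
# Sketch of checklist question 1: unit cost per 1M tokens for an inference
# workload, derived from GPU price and sustained throughput.
# The $4/hour price and 2,000 tokens/s rate are illustrative assumptions.

def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

unit_cost = cost_per_million_tokens(4.0, 2000)  # ~0.56 USD per 1M tokens
```

If this number cannot be produced per team/model/use-case, the later questions about budgets and trade-offs have no factual basis.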
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://hbr.org/topic/artificial-intelligence" target="_blank" rel="noopener"&gt;Harvard Business Review - AI Strategy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sloanreview.mit.edu/tag/artificial-intelligence/" target="_blank" rel="noopener"&gt;MIT Sloan - Executive Guide to AI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.weforum.org/agenda/archive/artificial-intelligence/" target="_blank" rel="noopener"&gt;World Economic Forum - AI Governance&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>When GPUs Move Toward Open Scheduling: Structural Shifts in AI Native Infrastructure</title><link>https://jimmysong.io/blog/gpu-open-scheduling-hami-2025/</link><pubDate>Fri, 13 Feb 2026 14:32:46 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/gpu-open-scheduling-hami-2025/</guid><description>A CTO/VP view on open GPU scheduling: CDI, Kubernetes DRA, virtualization data planes, ecosystem governance, and lock-in risk.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The future of GPU scheduling isn&amp;rsquo;t about whose implementation is more &amp;ldquo;black-box&amp;rdquo;—it&amp;rsquo;s about who can standardize device resource contracts into something governable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/banner.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/banner.webp" alt="Figure 1: GPU Open Scheduling" data-caption="Figure 1: GPU Open Scheduling"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: GPU Open Scheduling&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Have you ever wondered: why are GPUs so expensive, yet overall utilization often hovers around 10–20%?&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/underutilization.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/underutilization.webp" alt="Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization" data-caption="Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This isn&amp;rsquo;t a problem you solve with &amp;ldquo;better scheduling algorithms.&amp;rdquo; It&amp;rsquo;s a &lt;strong&gt;structural problem&lt;/strong&gt;: GPU scheduling is undergoing a shift from &amp;ldquo;proprietary implementation&amp;rdquo; to &amp;ldquo;open scheduling,&amp;rdquo; similar to how networking converged on CNI and storage converged on CSI.&lt;/p&gt;
&lt;p&gt;In the &lt;a href="https://dynamia.ai/blog/hami-2025-recap" target="_blank" rel="noopener"&gt;HAMi 2025 Annual Review&lt;/a&gt;, we noted: &amp;ldquo;HAMi 2025 is no longer just about GPU sharing tools—it&amp;rsquo;s a more structural signal: GPUs are moving toward open scheduling.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;By 2025, the signals of this shift became visible: Kubernetes Dynamic Resource Allocation (DRA) graduated to GA and became enabled by default, NVIDIA GPU Operator started defaulting to &lt;a href="https://github.com/cncf-tags/container-device-interface" target="_blank" rel="noopener"&gt;CDI&lt;/a&gt; (Container Device Interface), and HAMi&amp;rsquo;s production-grade case studies under CNCF are moving &amp;ldquo;GPU sharing&amp;rdquo; from experimental capability to operational excellence.&lt;/p&gt;
&lt;p&gt;This post analyzes this structural shift from an AI Native Infrastructure perspective, and what it means for &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt; and the industry.&lt;/p&gt;
&lt;h2 id="why-open-scheduling-matters"&gt;Why &amp;ldquo;Open Scheduling&amp;rdquo; Matters&lt;/h2&gt;
&lt;p&gt;In multi-cloud and hybrid cloud environments, GPU model diversity significantly amplifies operational costs. One large internet company&amp;rsquo;s platform spans H200/H100/A100/V100/4090 GPUs across five clusters. If you can only allocate &amp;ldquo;whole GPUs,&amp;rdquo; resource misalignment becomes inevitable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Open scheduling&amp;rdquo; isn&amp;rsquo;t a slogan—it&amp;rsquo;s a set of engineering contracts being solidified into the mainstream stack.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="standardized-resource-expression"&gt;Standardized Resource Expression&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; GPUs were extended resources. The scheduler didn&amp;rsquo;t understand if they represented memory, compute, or device types.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dra-evolution.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dra-evolution.webp" alt="Figure 3: Open Scheduling Standardization Evolution" data-caption="Figure 3: Open Scheduling Standardization Evolution"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Open Scheduling Standardization Evolution&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; Kubernetes DRA provides objects like DeviceClass, ResourceClaim, and ResourceSlice. This lets drivers and cluster administrators define device categories and selection logic (including CEL-based selectors), while Kubernetes handles the full loop: match devices → bind claims → place Pods onto nodes with access to allocated devices.&lt;/p&gt;
&lt;p&gt;Even more importantly, with Kubernetes 1.34 the core APIs in the &lt;code&gt;resource.k8s.io&lt;/code&gt; group graduated to GA: DRA became stable and enabled by default, and the community committed to avoiding breaking changes going forward. This means the ecosystem can invest with confidence in a stable, standard API.&lt;/p&gt;
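The loop described above (match devices → bind claims → place Pods) can be sketched in miniature. This is a deliberately simplified stand-in: the real `resource.k8s.io` DeviceClass/ResourceClaim/ResourceSlice objects and CEL selectors are far richer, and the dictionaries below are illustrative assumptions.

```python
# Much-simplified stand-in for the DRA allocation loop: match a claim's
# requirement against advertised devices, then bind. Real DRA uses
# DeviceClass/ResourceClaim/ResourceSlice objects and CEL selectors;
# these dictionaries are illustrative assumptions.

devices = [  # roughly what a driver would publish via ResourceSlice
    {"node": "n1", "type": "gpu", "memory_gib": 40, "allocated": False},
    {"node": "n2", "type": "gpu", "memory_gib": 80, "allocated": False},
]

def allocate(claim: dict, devices: list):
    """Find the first free device satisfying the claim and bind it."""
    for dev in devices:
        if not dev["allocated"] and dev["memory_gib"] >= claim["min_memory_gib"]:
            dev["allocated"] = True
            return dev["node"]   # the Pod is then placed on this node
    return None

node = allocate({"min_memory_gib": 64}, devices)  # only n2 has enough memory
```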
&lt;h3 id="standardized-device-injection"&gt;Standardized Device Injection&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Device injection relied on vendor-specific hooks and runtime class patterns.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; The Container Device Interface (CDI) abstracts device injection into an open specification. NVIDIA&amp;rsquo;s Container Toolkit explicitly describes CDI as an open specification for container runtimes, and NVIDIA GPU Operator 25.10.0 defaults to enabling CDI on install/upgrade—directly leveraging runtime-native CDI support (containerd, CRI-O, etc.) for GPU injection.&lt;/p&gt;
&lt;p&gt;This means &amp;ldquo;devices into containers&amp;rdquo; is also moving toward replaceable, standardized interfaces.&lt;/p&gt;
&lt;h2 id="hami-from-sharing-tool-to-governable-data-plane"&gt;HAMi: From &amp;ldquo;Sharing Tool&amp;rdquo; to &amp;ldquo;Governable Data Plane&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;On this standardization path, &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt;&amp;rsquo;s role needs redefinition: &lt;strong&gt;it&amp;rsquo;s not about replacing Kubernetes—it&amp;rsquo;s about turning GPU virtualization and slicing into a declarative, schedulable, governable data plane.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="data-plane-perspective"&gt;Data Plane Perspective&lt;/h3&gt;
&lt;p&gt;HAMi&amp;rsquo;s core contribution is expanding the allocatable unit from integer counts of whole GPUs to finer-grained shares of memory and compute, forming a complete allocation chain:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Device discovery:&lt;/strong&gt; Identify available GPU devices and models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling placement:&lt;/strong&gt; Use Scheduler Extender to make native schedulers &amp;ldquo;understand&amp;rdquo; vGPU resource models (Filter/Score/Bind phases)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In-container enforcement:&lt;/strong&gt; Inject share constraints into container runtime&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metric export:&lt;/strong&gt; Provide observable metrics for utilization, isolation, and more&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This transforms &amp;ldquo;sharing&amp;rdquo; from ad-hoc &amp;ldquo;it runs&amp;rdquo; experimentation into engineering capability that can be declared in YAML, scheduled by policy, and validated by metrics.&lt;/p&gt;
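&lt;p&gt;Concretely, with HAMi installed a Pod can declare a fractional share in plain YAML. The resource names below are HAMi&amp;rsquo;s documented defaults and are configurable per deployment:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-yaml"&gt;apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1       # number of vGPUs requested
        nvidia.com/gpumem: 3000 # device memory limit, in MiB
        nvidia.com/gpucores: 30 # share of SM compute, in percent
&lt;/code&gt;&lt;/pre&gt;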
&lt;h3 id="scheduling-mechanism-enhancement-not-replacement"&gt;Scheduling Mechanism: Enhancement, Not Replacement&lt;/h3&gt;
&lt;p&gt;HAMi&amp;rsquo;s scheduling doesn&amp;rsquo;t replace Kubernetes—it uses a &lt;strong&gt;Scheduler Extender&lt;/strong&gt; pattern to let the native scheduler understand vGPU resource models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Filter:&lt;/strong&gt; Filter nodes based on memory, compute, device type, topology, and other constraints&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Score:&lt;/strong&gt; Apply configurable policies like binpack, spread, topology-aware scoring&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bind:&lt;/strong&gt; Complete final device-to-Pod binding&lt;/li&gt;
&lt;/ul&gt;
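&lt;p&gt;The extender pattern itself is plain scheduler configuration: the native scheduler calls out over HTTP at the Filter and Bind phases for resources it is told to ignore. An illustrative configuration (not HAMi&amp;rsquo;s exact, Helm-generated one) looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-yaml"&gt;apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
- urlPrefix: https://hami-scheduler.kube-system.svc:443
  filterVerb: filter
  bindVerb: bind
  enableHTTPS: true
  nodeCacheCapable: true
  managedResources:
  - name: nvidia.com/gpu
    ignoredByScheduler: true  # defer vGPU accounting to the extender
&lt;/code&gt;&lt;/pre&gt;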
&lt;p&gt;This architecture positions HAMi naturally as an execution layer under higher-level &amp;ldquo;AI control planes&amp;rdquo; (queuing, quotas, priorities)—working alongside Volcano, Kueue, Koordinator, and others.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/hami-scheduler-extender.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/hami-scheduler-extender.webp" alt="Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)" data-caption="Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="production-evidence-from-can-we-share-to-can-we-operate"&gt;Production Evidence: From &amp;ldquo;Can We Share?&amp;rdquo; to &amp;ldquo;Can We Operate?&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.cncf.io/case-studies/?_sft_lf_project=hami" target="_blank" rel="noopener"&gt;CNCF public case studies&lt;/a&gt; provide concrete answers: &lt;strong&gt;in a hybrid, multi-cloud platform built on Kubernetes and HAMi, 10,000+ Pods run concurrently, and GPU utilization improves from 13% to 37% (nearly 3×).&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/case-studies.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/case-studies.webp" alt="Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%&amp;#43; utilization, SF Technology 57% savings" data-caption="Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%&amp;#43; utilization, SF Technology 57% savings"
width="2466"
height="1508"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%+ utilization, SF Technology 57% savings&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here are highlights from several cases:&lt;/p&gt;
&lt;h3 id="case-study-1-ke-holdings-february-5-2026"&gt;Case Study 1: Ke Holdings (February 5, 2026)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Environment:&lt;/strong&gt; 5 clusters spanning public and private clouds&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU models:&lt;/strong&gt; H200/H100/A100/V100/4090 and more&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Architecture:&lt;/strong&gt; Separate &amp;ldquo;GPU clusters&amp;rdquo; for large training tasks (dedicated allocation) vs &amp;ldquo;vGPU clusters&amp;rdquo; with HAMi fine-grained memory slicing for high-density inference&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrent scale:&lt;/strong&gt; 10,000+ Pods&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcome:&lt;/strong&gt; Overall GPU utilization improved from 13% to 37% (nearly 3×)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="case-study-2-daocloud-december-2-2025"&gt;Case Study 2: DaoCloud (December 2, 2025)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hard constraints:&lt;/strong&gt; Must remain cloud-native, vendor-agnostic, and compatible with CNCF toolchain&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adoption outcomes:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Average GPU utilization: 80%+&lt;/li&gt;
&lt;li&gt;GPU-related operating cost reduction: 20–30%&lt;/li&gt;
&lt;li&gt;Coverage: 10+ data centers, 10,000+ GPUs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explicit benefit:&lt;/strong&gt; Unified abstraction layer across NVIDIA and domestic GPUs, reducing vendor dependency&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="case-study-3-prep-edu-august-20-2025"&gt;Case Study 3: Prep EDU (August 20, 2025)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Negative experience:&lt;/strong&gt; Isolation failures in other GPU-sharing approaches caused memory conflicts and instability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Positive outcome:&lt;/strong&gt; HAMi&amp;rsquo;s vGPU scheduling, GPU type/UUID targeting, and compatibility with NVIDIA GPU Operator and RKE2 became decisive factors for production adoption&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Environment:&lt;/strong&gt; Heterogeneous RTX 4070/4090 cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="case-study-4-sf-technology-september-18-2025"&gt;Case Study 4: SF Technology (September 18, 2025)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Project:&lt;/strong&gt; EffectiveGPU (built on HAMi)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use cases:&lt;/strong&gt; Large model inference, test services, speech recognition, domestic AI hardware (Huawei Ascend, Baidu Kunlun, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcomes:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;GPU savings: Large model inference runs 65 services on 28 GPUs (37 saved); test cluster runs 19 services on 6 GPUs (13 saved)&lt;/li&gt;
&lt;li&gt;Overall savings: Up to 57% GPU savings for production and test clusters&lt;/li&gt;
&lt;li&gt;Utilization improvement: Up to 100% GPU utilization improvement with GPU virtualization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Highlights:&lt;/strong&gt; Cross-node collaborative scheduling, priority-based preemption, memory over-subscription&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These cases demonstrate a consistent pattern: &lt;strong&gt;GPU virtualization becomes economically meaningful only when it participates in a governable contract—where utilization, isolation, and policy can be expressed, measured, and improved over time.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="strategic-implications-for-dynamia"&gt;Strategic Implications for Dynamia&lt;/h2&gt;
&lt;p&gt;From Dynamia&amp;rsquo;s perspective (and as VP of Open Source Ecosystem), the strategic value of HAMi becomes clear:&lt;/p&gt;
&lt;h3 id="two-layer-architecture-open-source-vs-commercial"&gt;Two-Layer Architecture: Open Source vs Commercial&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAMi (CNCF open source project):&lt;/strong&gt; Responsible for &amp;ldquo;adoption and trust,&amp;rdquo; focused on GPU virtualization and compute efficiency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamia enterprise products and services:&lt;/strong&gt; Responsible for &amp;ldquo;production and scale,&amp;rdquo; providing commercial distributions and enterprise services built on HAMi&lt;/li&gt;
&lt;/ul&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dynamia-hami-dual-mechanism.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dynamia-hami-dual-mechanism.webp" alt="Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial" data-caption="Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This boundary is the foundation for long-term trust—project and company offerings remain separate, with commercial distributions and services built on the open source project.&lt;/p&gt;
&lt;h3 id="global-narrative-strategy"&gt;Global Narrative Strategy&lt;/h3&gt;
&lt;p&gt;The internal alignment memo recommends a bilingual approach:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First layer:&lt;/strong&gt; Lead globally with &amp;ldquo;GPU virtualization / sharing / utilization.&amp;rdquo; Chinese-language messaging can use &amp;ldquo;GPU virtualization and heterogeneous scheduling&amp;rdquo; directly, but the English-language first layer should avoid &amp;ldquo;heterogeneous&amp;rdquo; as a headline term&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second layer:&lt;/strong&gt; When users discuss mixed GPUs or workload diversity, introduce &amp;ldquo;heterogeneous&amp;rdquo; to confirm capability boundaries—never as the opening hook&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core anchor:&lt;/strong&gt; Maintain &amp;ldquo;HAMi (project and community) ≠ company products&amp;rdquo; as the non-negotiable baseline for long-term positioning&lt;/p&gt;
&lt;h3 id="the-right-commercialization-landing"&gt;The Right Commercialization Landing&lt;/h3&gt;
&lt;p&gt;DaoCloud&amp;rsquo;s case study already set vendor-agnostic and CNCF toolchain compatibility as hard constraints, framing vendor dependency reduction as a business and operational benefit—not just a technical detail. Project-HAMi&amp;rsquo;s official documentation lists &amp;ldquo;avoid vendor lock&amp;rdquo; as a core value proposition.&lt;/p&gt;
&lt;p&gt;In this context, &lt;strong&gt;the right commercialization landing isn&amp;rsquo;t &amp;ldquo;closed-source scheduling&amp;rdquo;—it&amp;rsquo;s productizing capabilities around real enterprise complexity:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Systematic compatibility matrix&lt;/li&gt;
&lt;li&gt;SLO and tail-latency governance&lt;/li&gt;
&lt;li&gt;Metering for billing&lt;/li&gt;
&lt;li&gt;RBAC, quotas, multi-cluster governance&lt;/li&gt;
&lt;li&gt;Upgrade and rollback safety&lt;/li&gt;
&lt;li&gt;Faster path-to-production for DRA/CDI and other standardization efforts&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="forward-view-the-next-23-years"&gt;Forward View: The Next 2–3 Years&lt;/h2&gt;
&lt;p&gt;My strong judgment: &lt;strong&gt;over the next 2–3 years, GPU scheduling competition will shift from &amp;ldquo;whose implementation is more black-box&amp;rdquo; to &amp;ldquo;whose contract is more open.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The reasons are practical:&lt;/p&gt;
&lt;h3 id="hardware-form-factors-and-supply-chains-are-diversifying"&gt;Hardware Form Factors and Supply Chains Are Diversifying&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI&amp;rsquo;s February 12, 2026 &amp;ldquo;GPT‑5.3‑Codex‑Spark&amp;rdquo; release emphasizes ultra-low latency serving, including persistent WebSockets and a dedicated serving tier on Cerebras hardware&lt;/li&gt;
&lt;li&gt;Large-scale GPU-backed financing announcements (for pan-European deployments) illustrate the infrastructure scale and financial engineering surrounding accelerator fleets&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These signals suggest that heterogeneity will grow: mixed accelerators, mixed clouds, mixed workload types.&lt;/p&gt;
&lt;h3 id="low-latency-inference-tiers-will-force-systematic-scheduling"&gt;Low-Latency Inference Tiers Will Force Systematic Scheduling&lt;/h3&gt;
&lt;p&gt;Low-latency inference tiers (beyond just GPUs) will force resource scheduling toward &amp;ldquo;multi-accelerator, multi-layer cache, multi-class node&amp;rdquo; architectural design—scheduling must inherently be heterogeneous.&lt;/p&gt;
&lt;h3 id="open-scheduling-is-risk-management-not-idealism"&gt;Open Scheduling Is Risk Management, Not Idealism&lt;/h3&gt;
&lt;p&gt;In this world, &amp;ldquo;open scheduling&amp;rdquo; isn&amp;rsquo;t idealism; it&amp;rsquo;s risk management. Building pluggable, governable &amp;ldquo;control plane + data plane&amp;rdquo; combinations around DRA, CDI, and other solidifying open interfaces, combinations that support multi-tenant governance and co-evolve with the ecosystem, looks like the truly sustainable path for AI Native Infrastructure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The next battleground isn&amp;rsquo;t &amp;ldquo;whose scheduling is smarter&amp;rdquo;—it&amp;rsquo;s &amp;ldquo;who can standardize device resource contracts into something governable.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;When you place HAMi 2025 back in the broader AI Native Infrastructure context, it&amp;rsquo;s no longer just the year of &amp;ldquo;GPU sharing tools&amp;rdquo;—it&amp;rsquo;s a more structural signal: &lt;strong&gt;GPUs are moving toward open scheduling.&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/future-vision-open-scheduling.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/future-vision-open-scheduling.webp" alt="Figure 7: Open Scheduling Future Vision" data-caption="Figure 7: Open Scheduling Future Vision"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 7: Open Scheduling Future Vision&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The driving forces come from both ends:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Upstream:&lt;/strong&gt; Standards like DRA/CDI continue to solidify&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Downstream:&lt;/strong&gt; Scale and diversity (multi-cloud, multi-model, even accelerators beyond GPUs)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For Dynamia, HAMi&amp;rsquo;s significance has transcended &amp;ldquo;GPU sharing tool&amp;rdquo;: it turns GPU virtualization and slicing into declarative, schedulable, measurable data planes—letting queues, quotas, priorities, and multi-tenancy actually close the governance loop.&lt;/p&gt;</content:encoded></item><item><title>AI Learning Resources: 44 Curated Collections from Our Cleanup</title><link>https://jimmysong.io/blog/ultimate-ai-learning-resources/</link><pubDate>Sun, 08 Feb 2026 12:20:05 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ultimate-ai-learning-resources/</guid><description>A curated collection of AI learning resources we removed from the AI Resources list: awesome lists, courses, tutorials, and cookbooks. These educational materials deserve their own spotlight.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;The best way to learn AI is to start building. These resources will guide your journey.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ultimate-ai-learning-resources/banner.webp" data-img="https://assets.jimmysong.io/images/blog/ultimate-ai-learning-resources/banner.webp" alt="Figure 1: AI Learning Resources Collection" data-caption="Figure 1: AI Learning Resources Collection"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Learning Resources Collection&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In my ongoing effort to keep the AI Resources list focused on &lt;strong&gt;production-ready tools and frameworks&lt;/strong&gt;, I&amp;rsquo;ve removed &lt;strong&gt;44 collection-type projects&lt;/strong&gt;—courses, tutorials, awesome lists, and cookbooks.&lt;/p&gt;
&lt;p&gt;These resources aren&amp;rsquo;t gone—they&amp;rsquo;ve been moved here. This post is a &lt;strong&gt;curated collection&lt;/strong&gt; of those educational materials, organized by type and topic. Whether you&amp;rsquo;re a complete beginner or an experienced practitioner, you&amp;rsquo;ll find something valuable here.&lt;/p&gt;
&lt;h2 id="why-remove-collections-from-ai-resources"&gt;Why Remove Collections from AI Resources?&lt;/h2&gt;
&lt;p&gt;My AI Resources list now focuses on &lt;strong&gt;concrete tools and frameworks&lt;/strong&gt;—projects you can directly use in production. Collections, while valuable, serve a different purpose: &lt;strong&gt;education and discovery&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;By separating them, I:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep the resources list actionable and focused&lt;/li&gt;
&lt;li&gt;Create a dedicated space for learning materials&lt;/li&gt;
&lt;li&gt;Make it easier to find what you need&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-awesome-lists-14-collections"&gt;📚 Awesome Lists (14 Collections)&lt;/h2&gt;
&lt;p&gt;Awesome lists are community-curated collections of the best resources. They&amp;rsquo;re perfect for discovering new tools and staying updated.&lt;/p&gt;
&lt;h3 id="must-explore-awesome-lists"&gt;Must-Explore Awesome Lists&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/filipecalegario/awesome-generative-ai" target="_blank" rel="noopener"&gt;Awesome Generative AI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Models, tools, tutorials, and research papers&lt;/li&gt;
&lt;li&gt;Great for: Comprehensive overview of generative AI landscape&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hannibal046/awesome-llm" target="_blank" rel="noopener"&gt;Awesome LLM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM resources: papers, tools, datasets, applications&lt;/li&gt;
&lt;li&gt;Great for: Deep dive into large language models&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/arindam200/awesome-ai-apps" target="_blank" rel="noopener"&gt;Awesome AI Apps&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Practical LLM applications, RAG examples, agent implementations&lt;/li&gt;
&lt;li&gt;Great for: Real-world implementation examples&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hesreallyhim/awesome-claude-code" target="_blank" rel="noopener"&gt;Awesome Claude Code&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Claude Code commands, files, and workflows&lt;/li&gt;
&lt;li&gt;Great for: Maximizing Claude Code productivity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/punkpeye/awesome-mcp-servers" target="_blank" rel="noopener"&gt;Awesome MCP Servers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP servers for modular AI backend systems&lt;/li&gt;
&lt;li&gt;Great for: Building with Model Context Protocol&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="specialized-awesome-lists"&gt;Specialized Awesome Lists&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/f/awesome-chatgpt-prompts" target="_blank" rel="noopener"&gt;Awesome ChatGPT Prompts&lt;/a&gt;&lt;/strong&gt; - Prompt examples for various scenarios&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/shubhamsaboo/awesome-llm-apps" target="_blank" rel="noopener"&gt;Awesome LLM Apps&lt;/a&gt;&lt;/strong&gt; - LLM applications with code examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/bradyfu/awesome-multimodal-large-language-models" target="_blank" rel="noopener"&gt;Awesome Multimodal LLM&lt;/a&gt;&lt;/strong&gt; - Multimodal model resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/punkpeye/awesome-mcp-clients" target="_blank" rel="noopener"&gt;Awesome MCP Clients&lt;/a&gt;&lt;/strong&gt; - MCP client tools and SDKs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/composiohq/awesome-claude-skills" target="_blank" rel="noopener"&gt;Awesome Claude Skills&lt;/a&gt;&lt;/strong&gt; - Claude Skills and workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/github/awesome-copilot" target="_blank" rel="noopener"&gt;Awesome GitHub Copilot&lt;/a&gt;&lt;/strong&gt; - Copilot customizations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/zerolu/awesome-nanobanana-pro" target="_blank" rel="noopener"&gt;Awesome Nano Banana Pro&lt;/a&gt;&lt;/strong&gt; - Image model prompts and examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/alchemyst-ai/awesome-saas" target="_blank" rel="noopener"&gt;Awesome SaaS&lt;/a&gt;&lt;/strong&gt; - AI platform templates&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/voltagent/awesome-claude-code-subagents" target="_blank" rel="noopener"&gt;Awesome Claude Code Subagents&lt;/a&gt;&lt;/strong&gt; - Claude Code subagents&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-courses--tutorials-9-curricula"&gt;🎓 Courses &amp;amp; Tutorials (9 Curricula)&lt;/h2&gt;
&lt;p&gt;Structured learning paths from universities and tech companies.&lt;/p&gt;
&lt;h3 id="microsofts-ai-curriculum"&gt;Microsoft&amp;rsquo;s AI Curriculum&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/ai-for-beginners" target="_blank" rel="noopener"&gt;AI for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;12 weeks, 24 lessons covering neural networks, deep learning, CV, NLP&lt;/li&gt;
&lt;li&gt;Great for: Complete AI foundation&lt;/li&gt;
&lt;li&gt;Format: Lessons, quizzes, projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/ml-for-beginners" target="_blank" rel="noopener"&gt;Machine Learning for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;12-week, 26-lesson curriculum on classic ML&lt;/li&gt;
&lt;li&gt;Great for: ML fundamentals without deep math&lt;/li&gt;
&lt;li&gt;Format: Project-based exercises&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/generative-ai-for-beginners" target="_blank" rel="noopener"&gt;Generative AI for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;18 lessons on building GenAI applications&lt;/li&gt;
&lt;li&gt;Great for: Practical GenAI development&lt;/li&gt;
&lt;li&gt;Format: Hands-on projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/ai-agents-for-beginners" target="_blank" rel="noopener"&gt;AI Agents for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;11 lessons on agent systems&lt;/li&gt;
&lt;li&gt;Great for: Understanding autonomous agents&lt;/li&gt;
&lt;li&gt;Format: Project-driven learning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/edgeai-for-beginners" target="_blank" rel="noopener"&gt;EdgeAI for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Optimization, deployment, and real-world Edge AI&lt;/li&gt;
&lt;li&gt;Great for: On-device AI applications&lt;/li&gt;
&lt;li&gt;Format: Practical tutorials&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/mcp-for-beginners" target="_blank" rel="noopener"&gt;MCP for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model Context Protocol curriculum&lt;/li&gt;
&lt;li&gt;Great for: Building with MCP&lt;/li&gt;
&lt;li&gt;Format: Cross-language examples and labs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="official-platform-courses"&gt;Official Platform Courses&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huggingface/course" target="_blank" rel="noopener"&gt;Hugging Face Learn Center&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Free courses on LLMs, deep RL, CV, audio&lt;/li&gt;
&lt;li&gt;Great for: Hands-on Hugging Face ecosystem&lt;/li&gt;
&lt;li&gt;Format: Interactive notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/openai-cookbook" target="_blank" rel="noopener"&gt;OpenAI Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Runnable examples using OpenAI API&lt;/li&gt;
&lt;li&gt;Great for: OpenAI API best practices&lt;/li&gt;
&lt;li&gt;Format: Code examples and guides&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pytorch/tutorials" target="_blank" rel="noopener"&gt;PyTorch Tutorials&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Basics to advanced deep learning&lt;/li&gt;
&lt;li&gt;Great for: PyTorch mastery&lt;/li&gt;
&lt;li&gt;Format: Comprehensive tutorials&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-cookbooks--example-collections-5-collections"&gt;🍳 Cookbooks &amp;amp; Example Collections (5 Collections)&lt;/h2&gt;
&lt;p&gt;Practical code examples and recipes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/anthropics/claude-cookbooks" target="_blank" rel="noopener"&gt;Claude Cookbooks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Notebooks and examples for building with Claude&lt;/li&gt;
&lt;li&gt;Great for: Anthropic Claude integration&lt;/li&gt;
&lt;li&gt;Format: Jupyter notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huggingface/cookbook" target="_blank" rel="noopener"&gt;Hugging Face Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Practical AI cookbook with Jupyter notebooks&lt;/li&gt;
&lt;li&gt;Great for: Open models and tools&lt;/li&gt;
&lt;li&gt;Format: Hands-on examples&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/RationaleInstitute/tinker-cookbook" target="_blank" rel="noopener"&gt;Tinker Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training and fine-tuning examples&lt;/li&gt;
&lt;li&gt;Great for: Fine-tuning workflows&lt;/li&gt;
&lt;li&gt;Format: Platform-specific recipes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/e2b-dev/e2b-cookbook" target="_blank" rel="noopener"&gt;E2B Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Examples for building LLM apps&lt;/li&gt;
&lt;li&gt;Great for: LLM application development&lt;/li&gt;
&lt;li&gt;Format: Recipes and tutorials&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/jamwithai/arxiv-paper-curator" target="_blank" rel="noopener"&gt;arXiv Paper Curator&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;6-week course on RAG systems&lt;/li&gt;
&lt;li&gt;Great for: Production-ready RAG&lt;/li&gt;
&lt;li&gt;Format: Project-based learning&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-guides--handbooks-5-resources"&gt;📖 Guides &amp;amp; Handbooks (5 Resources)&lt;/h2&gt;
&lt;p&gt;In-depth guides on specific topics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/dair-ai/prompt-engineering-guide" target="_blank" rel="noopener"&gt;Prompt Engineering Guide&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Comprehensive prompt engineering resources&lt;/li&gt;
&lt;li&gt;Great for: Mastering prompt design&lt;/li&gt;
&lt;li&gt;Format: Guides, papers, lectures, notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huggingface/evaluation-guidebook" target="_blank" rel="noopener"&gt;Evaluation Guidebook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM evaluation best practices from Hugging Face&lt;/li&gt;
&lt;li&gt;Great for: Assessing LLM performance&lt;/li&gt;
&lt;li&gt;Format: Practical guide&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/davidkimai/context-engineering" target="_blank" rel="noopener"&gt;Context Engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Design and optimize context beyond prompt engineering&lt;/li&gt;
&lt;li&gt;Great for: Advanced context management&lt;/li&gt;
&lt;li&gt;Format: Practical handbook&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/coleam00/context-engineering-intro" target="_blank" rel="noopener"&gt;Context Engineering Intro&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Template and guide for context engineering&lt;/li&gt;
&lt;li&gt;Great for: Providing project context to AI assistants&lt;/li&gt;
&lt;li&gt;Format: Template + guide&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/IIETER/IIETER" target="_blank" rel="noopener"&gt;Vibe-Coding Workflow&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;5-step prompt template for building MVPs with LLMs&lt;/li&gt;
&lt;li&gt;Great for: Rapid prototyping with AI&lt;/li&gt;
&lt;li&gt;Format: Workflow template&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-template--workflow-collections"&gt;🗂️ Template &amp;amp; Workflow Collections&lt;/h2&gt;
&lt;p&gt;Reusable templates and workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/davila7/claude-code-templates" target="_blank" rel="noopener"&gt;Claude Code Templates&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code templates for various programming scenarios&lt;/li&gt;
&lt;li&gt;Great for: Claude AI development&lt;/li&gt;
&lt;li&gt;Format: Template collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/zie619/n8n-workflows" target="_blank" rel="noopener"&gt;n8n Workflows&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2,000+ professionally organized n8n workflows&lt;/li&gt;
&lt;li&gt;Great for: Workflow automation&lt;/li&gt;
&lt;li&gt;Format: Searchable catalog&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/nusquama/n8nworkflows.xyz" target="_blank" rel="noopener"&gt;N8N Workflows Catalog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Community-driven reusable workflow templates&lt;/li&gt;
&lt;li&gt;Great for: Workflow import and versioning&lt;/li&gt;
&lt;li&gt;Format: Template catalog&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-research--evaluation"&gt;📊 Research &amp;amp; Evaluation&lt;/h2&gt;
&lt;p&gt;Academic and evaluation resources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/amberljc/llmsys-paperlist" target="_blank" rel="noopener"&gt;LLMSys PaperList&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Curated list of LLM systems papers&lt;/li&gt;
&lt;li&gt;Great for: Research on training, inference, serving&lt;/li&gt;
&lt;li&gt;Format: Paper collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/cheahjs/free-llm-api-resources" target="_blank" rel="noopener"&gt;Free LLM API Resources&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM providers with free/trial API access&lt;/li&gt;
&lt;li&gt;Great for: Experimentation without cost&lt;/li&gt;
&lt;li&gt;Format: Provider list&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-other-notable-resources"&gt;🎨 Other Notable Resources&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools" target="_blank" rel="noopener"&gt;System Prompts and Models of AI Tools&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Community-curated collection of system prompts and AI tool examples&lt;/li&gt;
&lt;li&gt;Great for: Prompt and agent engineering&lt;/li&gt;
&lt;li&gt;Format: Resource collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/epfml/ml_course" target="_blank" rel="noopener"&gt;ML Course CS-433&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;EPFL Machine Learning Course&lt;/li&gt;
&lt;li&gt;Great for: Academic ML foundation&lt;/li&gt;
&lt;li&gt;Format: Lectures, labs, projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stas00/ml-engineering" target="_blank" rel="noopener"&gt;Machine Learning Engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ML engineering open-book: compute, storage, networking&lt;/li&gt;
&lt;li&gt;Great for: Production ML systems&lt;/li&gt;
&lt;li&gt;Format: Comprehensive guide&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/neural-maze/realtime-phone-agents-course" target="_blank" rel="noopener"&gt;Realtime Phone Agents Course&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build low-latency voice agents&lt;/li&gt;
&lt;li&gt;Great for: Voice AI applications&lt;/li&gt;
&lt;li&gt;Format: Hands-on course&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/johnma2006/m3-workshop" target="_blank" rel="noopener"&gt;LLMs from Scratch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build a working LLM from first principles&lt;/li&gt;
&lt;li&gt;Great for: Understanding LLM internals&lt;/li&gt;
&lt;li&gt;Format: Repository + book materials&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-how-to-use-this-collection"&gt;💡 How to Use This Collection&lt;/h2&gt;
&lt;h3 id="for-complete-beginners"&gt;For Complete Beginners&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Start with&lt;/strong&gt;: Microsoft&amp;rsquo;s AI for Beginners&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Practice with&lt;/strong&gt;: PyTorch Tutorials&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explore&lt;/strong&gt;: Awesome AI Apps for inspiration&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="for-developers"&gt;For Developers&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Build skills&lt;/strong&gt;: OpenAI Cookbook + Claude Cookbooks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Find tools&lt;/strong&gt;: Awesome Generative AI + Awesome LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learn workflows&lt;/strong&gt;: n8n Workflows Catalog&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="for-researchers"&gt;For Researchers&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Stay updated&lt;/strong&gt;: Awesome Generative AI + LLMSys PaperList&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deep dive&lt;/strong&gt;: Awesome LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Implement&lt;/strong&gt;: Hugging Face Cookbook&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="for-product-builders"&gt;For Product Builders&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Find examples&lt;/strong&gt;: Awesome AI Apps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learn workflows&lt;/strong&gt;: n8n Workflows Catalog&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Study patterns&lt;/strong&gt;: Awesome LLM Apps&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id="-what-was-not-removed"&gt;🔄 What Was NOT Removed&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Agent frameworks and production tools remain in the AI Resources list&lt;/strong&gt;, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AutoGen&lt;/strong&gt; - Microsoft&amp;rsquo;s multi-agent framework&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CrewAI&lt;/strong&gt; - High-performance multi-agent orchestration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; - Stateful multi-agent applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flowise&lt;/strong&gt; - Visual agent platform&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Langflow&lt;/strong&gt; - Visual workflow builder&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;And 80+ more agent frameworks&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are &lt;strong&gt;functional tools&lt;/strong&gt; you can use to build applications, not educational collections. They belong in the AI Resources list.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-summary"&gt;📝 Summary&lt;/h2&gt;
&lt;p&gt;I removed &lt;strong&gt;44 collection-type projects&lt;/strong&gt; from the AI Resources list to keep it focused on production tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;14 Awesome Lists&lt;/strong&gt; - Discover new tools and stay updated&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;9 Courses &amp;amp; Tutorials&lt;/strong&gt; - Structured learning paths&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 Cookbooks&lt;/strong&gt; - Practical code examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 Guides &amp;amp; Handbooks&lt;/strong&gt; - In-depth resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4 Template Collections&lt;/strong&gt; - Reusable workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;7 Other Resources&lt;/strong&gt; - Research and evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These resources remain &lt;strong&gt;incredibly valuable&lt;/strong&gt; for learning and discovery. They just serve a different purpose than the production-focused tools in my AI Resources list.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Next Steps&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Bookmark this post for future reference&lt;/li&gt;
&lt;li&gt;Explore the &lt;a href="https://jimmysong.io/ai/"&gt;AI Resources list&lt;/a&gt; for production tools (agent frameworks, databases, etc.)&lt;/li&gt;
&lt;li&gt;Check out my blog for more AI engineering insights&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;: This collection was compiled during my AI Resources cleanup initiative. Special thanks to all the maintainers of these awesome lists, courses, and collections for their invaluable contributions to the AI community.&lt;/p&gt;</content:encoded></item><item><title>Standing on Giants' Shoulders: The Traditional Infrastructure Powering Modern AI</title><link>https://jimmysong.io/blog/giants-beneath-ai-feet/</link><pubDate>Sun, 08 Feb 2026 08:00:00 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/giants-beneath-ai-feet/</guid><description>Before ChatGPT and TensorFlow, there was Hadoop, Kafka, and Kubernetes. This post honors the traditional open source infrastructure that became the foundation of today&amp;#39;s AI revolution.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;If I have seen further, it is by standing on the shoulders of giants.&amp;rdquo; — Isaac Newton&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/giants-beneath-ai-feet/banner.webp" data-img="https://assets.jimmysong.io/images/blog/giants-beneath-ai-feet/banner.webp" alt="Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI" data-caption="Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the excitement surrounding LLMs, vector databases, and AI agents, it&amp;rsquo;s easy to forget that modern AI didn&amp;rsquo;t emerge from a vacuum. Today&amp;rsquo;s AI revolution stands upon decades of infrastructure work—distributed systems, data pipelines, search engines, and orchestration platforms that were built long before &amp;ldquo;AI Native&amp;rdquo; became a buzzword.&lt;/p&gt;
&lt;p&gt;This post is a tribute to those traditional open source projects that became the invisible foundation of AI infrastructure. They&amp;rsquo;re not &amp;ldquo;AI projects&amp;rdquo; per se, but without them, the AI revolution as we know it wouldn&amp;rsquo;t exist.&lt;/p&gt;
&lt;h2 id="the-evolution-from-big-data-to-ai"&gt;The Evolution: From Big Data to AI&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Core Technologies&lt;/th&gt;
&lt;th&gt;AI Connection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2000s&lt;/td&gt;
&lt;td&gt;Web Search &amp;amp; Indexing&lt;/td&gt;
&lt;td&gt;Lucene, Elasticsearch&lt;/td&gt;
&lt;td&gt;Semantic search foundations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Big Data &amp;amp; Distributed Computing&lt;/td&gt;
&lt;td&gt;Hadoop, Spark, Kafka&lt;/td&gt;
&lt;td&gt;Data processing at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Cloud Native&lt;/td&gt;
&lt;td&gt;Docker, Kubernetes&lt;/td&gt;
&lt;td&gt;Model deployment platforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Stream Processing&lt;/td&gt;
&lt;td&gt;Flink, Storm, Pulsar&lt;/td&gt;
&lt;td&gt;Real-time ML inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020s&lt;/td&gt;
&lt;td&gt;AI Native&lt;/td&gt;
&lt;td&gt;Transformers, Vector DBs&lt;/td&gt;
&lt;td&gt;Built on everything above&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Evolution of Data Infrastructure
&lt;/figcaption&gt;
&lt;h2 id="big-data-frameworks-the-data-engines"&gt;Big Data Frameworks: The Data Engines&lt;/h2&gt;
&lt;p&gt;Before we could train models on petabytes of data, we needed ways to store, process, and move that data.&lt;/p&gt;
&lt;h3 id="apache-hadoop-2006"&gt;Apache Hadoop (2006)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/hadoop" target="_blank" rel="noopener"&gt;https://github.com/apache/hadoop&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hadoop democratized big data by making distributed computing accessible. Its HDFS filesystem and MapReduce paradigm proved that commodity hardware could process web-scale datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Modern ML training datasets live in HDFS-compatible storage&lt;/li&gt;
&lt;li&gt;Data lakes built on Hadoop became training data reservoirs&lt;/li&gt;
&lt;li&gt;Proved that distributed computing could scale horizontally&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="apache-kafka-2011"&gt;Apache Kafka (2011)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/kafka" target="_blank" rel="noopener"&gt;https://github.com/apache/kafka&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kafka redefined data streaming with its log-based architecture. It became the nervous system for real-time data flows in enterprises worldwide.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time feature pipelines for ML models&lt;/li&gt;
&lt;li&gt;Event-driven architectures for AI agent systems&lt;/li&gt;
&lt;li&gt;Streaming inference pipelines&lt;/li&gt;
&lt;li&gt;Model telemetry and monitoring backbones&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="apache-spark-2014"&gt;Apache Spark (2014)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/spark" target="_blank" rel="noopener"&gt;https://github.com/apache/spark&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Spark brought in-memory computing to big data, making iterative algorithms (like ML training) practical at scale.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MLlib made ML accessible to data engineers&lt;/li&gt;
&lt;li&gt;Distributed data processing for model training&lt;/li&gt;
&lt;li&gt;Spark ML became the de facto standard for big data ML&lt;/li&gt;
&lt;li&gt;Proved that in-memory computing could accelerate ML workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="search-engines-the-retrieval-foundation"&gt;Search Engines: The Retrieval Foundation&lt;/h2&gt;
&lt;p&gt;Before RAG (Retrieval-Augmented Generation) became a buzzword, search engines were solving retrieval at scale.&lt;/p&gt;
&lt;h3 id="elasticsearch-2010"&gt;Elasticsearch (2010)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/elastic/elasticsearch" target="_blank" rel="noopener"&gt;https://github.com/elastic/elasticsearch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Elasticsearch made full-text search accessible and scalable. Its distributed architecture and RESTful API became the standard for search.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pioneered distributed inverted index structures&lt;/li&gt;
&lt;li&gt;Proved that horizontal scaling was possible for search workloads&lt;/li&gt;
&lt;li&gt;Many &amp;ldquo;AI search&amp;rdquo; systems actually use Elasticsearch under the hood&lt;/li&gt;
&lt;li&gt;Query DSL influenced modern vector database query languages&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="opensearch-2021"&gt;OpenSearch (2021)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/opensearch-project/opensearch" target="_blank" rel="noopener"&gt;https://github.com/opensearch-project/opensearch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When AWS forked Elasticsearch, it ensured search infrastructure remained truly open. OpenSearch continues the mission of accessible, scalable search.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintains open source innovation in search&lt;/li&gt;
&lt;li&gt;Vector search capabilities added in 2023&lt;/li&gt;
&lt;li&gt;Demonstrates community fork resilience&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="databases-from-sql-to-vectors"&gt;Databases: From SQL to Vectors&lt;/h2&gt;
&lt;p&gt;The evolution from relational databases to vector databases represents a paradigm shift—but both have AI relevance.&lt;/p&gt;
&lt;h3 id="traditional-databases-that-paved-the-way"&gt;Traditional Databases That Paved the Way&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dgraph&lt;/strong&gt; (2015) - Graph database proving that specialized data structures enable new use cases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TDengine&lt;/strong&gt; (2019) - Time-series database for IoT ML workloads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OceanBase&lt;/strong&gt; (2021) - Distributed database showing ACID transactions could scale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proved that specialized database engines could outperform general-purpose ones&lt;/li&gt;
&lt;li&gt;Database internals (indexing, sharding, replication) are now applied to vector databases&lt;/li&gt;
&lt;li&gt;Multi-model databases (graph + vector + relational) are becoming the norm for AI apps&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="cloud-native-the-runtime-foundation"&gt;Cloud Native: The Runtime Foundation&lt;/h2&gt;
&lt;p&gt;When Docker and Kubernetes emerged, they weren&amp;rsquo;t built for AI—but AI couldn&amp;rsquo;t scale without them.&lt;/p&gt;
&lt;h3 id="docker-2013--kubernetes-2014"&gt;Docker (2013) &amp;amp; Kubernetes (2014)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/kubernetes/kubernetes" target="_blank" rel="noopener"&gt;https://github.com/kubernetes/kubernetes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kubernetes became the operating system for cloud-native applications. Its declarative API and controller pattern made it perfect for AI workloads.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model deployment platforms (KServe, Seldon Core) run on K8s&lt;/li&gt;
&lt;li&gt;GPU orchestration (NVIDIA GPU Operator, Volcano, HAMi) extends K8s&lt;/li&gt;
&lt;li&gt;Kubeflow made K8s the standard for ML pipelines&lt;/li&gt;
&lt;li&gt;Microservice patterns enable modular AI agent architectures&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="service-mesh--serverless"&gt;Service Mesh &amp;amp; Serverless&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Istio&lt;/strong&gt; (2016), &lt;strong&gt;Knative&lt;/strong&gt; (2018) - Service mesh and serverless platforms that proved:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Network-level observability applies to AI model calls&lt;/li&gt;
&lt;li&gt;Scale-to-zero is essential for cost-effective inference&lt;/li&gt;
&lt;li&gt;Traffic splitting enables A/B testing of ML models&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI Gateway patterns evolved from API gateways + service mesh&lt;/li&gt;
&lt;li&gt;Serverless inference platforms use Knative-style autoscaling&lt;/li&gt;
&lt;li&gt;Observability patterns (tracing, metrics) are now standard for ML systems&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="api-gateways-from-rest-to-llm"&gt;API Gateways: From REST to LLM&lt;/h2&gt;
&lt;p&gt;API gateways weren&amp;rsquo;t designed for AI, but they became the foundation of AI Gateway patterns.&lt;/p&gt;
&lt;h3 id="kong-apisix-kgateway"&gt;Kong, APISIX, KGateway&lt;/h3&gt;
&lt;p&gt;These API gateways solved rate limiting, auth, and routing at scale. When LLMs emerged, the same patterns applied:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI Gateway Evolution&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Traditional API Gateway (2010s)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Rate Limiting → Token Bucket Rate Limiting
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Auth → API Key + Organization Management
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Routing → Model Routing (GPT-4 → Claude → Local Models)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Observability → LLM-specific Telemetry (token usage, cost)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;AI Gateway (2024)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proved that centralized API management scales&lt;/li&gt;
&lt;li&gt;Plugin architectures enable LLM-specific features&lt;/li&gt;
&lt;li&gt;Traffic management patterns apply to prompt routing&lt;/li&gt;
&lt;li&gt;Security patterns (mTLS, JWT) now protect AI endpoints&lt;/li&gt;
&lt;/ul&gt;
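&lt;p&gt;The token-bucket idea above can be sketched in a few lines of Python. This is a toy illustration of the pattern, not any particular gateway&amp;rsquo;s implementation; the rate and capacity numbers are arbitrary:&lt;/p&gt;

```python
import time

class TokenBucket:
    """Toy token bucket: tokens refill at a fixed rate up to a capacity."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# In an LLM gateway, "cost" would be the request's token count rather
# than 1, which is the shift from per-request to per-token rate limiting.
bucket = TokenBucket(rate_per_sec=100, capacity=200)
print(bucket.allow(cost=150))  # True: the bucket starts full
print(bucket.allow(cost=150))  # False: only ~50 tokens remain
```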
&lt;h2 id="workflow-orchestration-the-pipeline-backbone"&gt;Workflow Orchestration: The Pipeline Backbone&lt;/h2&gt;
&lt;p&gt;Data engineering needs pipelines. ML engineering needs pipelines. AI agents need workflows.&lt;/p&gt;
&lt;h3 id="apache-airflow-2015"&gt;Apache Airflow (2015)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/airflow" target="_blank" rel="noopener"&gt;https://github.com/apache/airflow&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Airflow made pipeline orchestration accessible with its DAG-based approach. It became the standard for ETL and data engineering.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ML pipeline orchestration (feature engineering, training, evaluation)&lt;/li&gt;
&lt;li&gt;Proved that DAG-based workflow definition works at scale&lt;/li&gt;
&lt;li&gt;Prompt engineering pipelines use Airflow-style orchestration&lt;/li&gt;
&lt;li&gt;Scheduler patterns are now applied to AI agent workflows&lt;/li&gt;
&lt;/ul&gt;
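&lt;p&gt;The DAG idea is simple enough to show with standard-library Python. This is a toy sketch of what a scheduler like Airflow guarantees, namely dependency-ordered execution, not Airflow&amp;rsquo;s actual API; the task names are made up:&lt;/p&gt;

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Task name -> set of upstream tasks it depends on (a minimal ML pipeline).
dag = {
    "extract": set(),
    "build_features": {"extract"},
    "train": {"build_features"},
    "evaluate": {"train"},
}

def run(dag):
    """Execute tasks in dependency order, as a DAG scheduler would."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        print(f"running {task}")
    return order

run(dag)  # extract, then build_features, then train, then evaluate
```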
&lt;h3 id="n8n-prefect-flyte"&gt;n8n, Prefect, Flyte&lt;/h3&gt;
&lt;p&gt;Modern workflow platforms that evolved from Airflow&amp;rsquo;s foundations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;n8n&lt;/strong&gt; (2019) - Visual workflow automation with AI capabilities&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefect&lt;/strong&gt; (2018) - Python-native workflow orchestration for ML&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flyte&lt;/strong&gt; (2019) - Kubernetes-native workflow orchestration for ML/data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-modal agents need workflow orchestration&lt;/li&gt;
&lt;li&gt;RAG pipelines are essentially ETL pipelines for embeddings&lt;/li&gt;
&lt;li&gt;Prompt chaining is DAG-based orchestration&lt;/li&gt;
&lt;/ul&gt;
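&lt;p&gt;The claim that a RAG pipeline is essentially ETL for embeddings can be made concrete with a toy sketch: extract documents, transform them into vectors, load them into an index, then reuse the same transform at query time. The bag-of-words &lt;code&gt;embed&lt;/code&gt; function below is a stand-in for a real embedding model:&lt;/p&gt;

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Extract -> Transform (embed) -> Load (index): the ETL shape of RAG.
docs = ["kafka streams events", "kubernetes schedules containers"]
index = [(doc, embed(doc)) for doc in docs]

def retrieve(query: str) -> str:
    q = embed(query)  # the same transform, applied at query time
    return max(index, key=lambda item: cosine(q, item[1]))[0]

print(retrieve("how does kubernetes schedule containers"))
```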
&lt;h2 id="data-formats-the-lakehouse-foundation"&gt;Data Formats: The Lakehouse Foundation&lt;/h2&gt;
&lt;p&gt;Before we could train on massive datasets, we needed formats that supported ACID transactions and schema evolution.&lt;/p&gt;
&lt;h3 id="delta-lake-apache-iceberg-apache-hudi"&gt;Delta Lake, Apache Iceberg, Apache Hudi&lt;/h3&gt;
&lt;p&gt;These table formats brought reliability to data lakes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training datasets need versioning and reproducibility&lt;/li&gt;
&lt;li&gt;Feature stores use Delta/Iceberg as storage formats&lt;/li&gt;
&lt;li&gt;Proved that &amp;ldquo;big data&amp;rdquo; could have transactional semantics&lt;/li&gt;
&lt;li&gt;Schema evolution handles ML feature drift&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-invisible-thread-why-these-projects-matter"&gt;The Invisible Thread: Why These Projects Matter&lt;/h2&gt;
&lt;p&gt;What do all these projects have in common?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;They solved scaling first&lt;/strong&gt; - AI training/inference needs horizontal scaling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They proved distributed systems work&lt;/strong&gt; - Modern AI is fundamentally distributed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They created ecosystem patterns&lt;/strong&gt; - Plugin systems, extension points, APIs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They established best practices&lt;/strong&gt; - Observability, security, CI/CD&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They built developer habits&lt;/strong&gt; - YAML configs, declarative APIs, CLI tools&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="the-ai-native-continuum"&gt;The AI Native Continuum&lt;/h2&gt;
&lt;p&gt;Modern &amp;ldquo;AI Native&amp;rdquo; infrastructure didn&amp;rsquo;t replace these projects—it builds on them:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional Project&lt;/th&gt;
&lt;th&gt;AI Native Evolution&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hadoop HDFS&lt;/td&gt;
&lt;td&gt;Distributed model storage&lt;/td&gt;
&lt;td&gt;HDFS for datasets, S3 for checkpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;Real-time feature pipelines&lt;/td&gt;
&lt;td&gt;Kafka → Feature Store → Model Serving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spark ML&lt;/td&gt;
&lt;td&gt;Distributed ML training&lt;/td&gt;
&lt;td&gt;MLlib → PyTorch Distributed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Vector search&lt;/td&gt;
&lt;td&gt;ES → Weaviate/Qdrant/Milvus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;ML orchestration&lt;/td&gt;
&lt;td&gt;K8s → Kubeflow/KServe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Istio&lt;/td&gt;
&lt;td&gt;AI Gateway service mesh&lt;/td&gt;
&lt;td&gt;Istio → LLM Gateway with mTLS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow&lt;/td&gt;
&lt;td&gt;ML pipeline orchestration&lt;/td&gt;
&lt;td&gt;Airflow → Prefect/Flyte for ML&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: From Traditional to AI Native
&lt;/figcaption&gt;
&lt;h2 id="why-were-removing-them-from-ai-resources-list"&gt;Why We&amp;rsquo;re Removing Them from AI Resources List&lt;/h2&gt;
&lt;p&gt;This post honors these projects, but we&amp;rsquo;re also removing them from our AI Resources list. Here&amp;rsquo;s why:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;They&amp;rsquo;re not &amp;ldquo;AI Projects&amp;rdquo;—they&amp;rsquo;re foundational infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hadoop, Kafka, Spark&lt;/strong&gt; are data engineering tools, not ML frameworks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elasticsearch&lt;/strong&gt; is search, not semantic search&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt; is general-purpose orchestration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API gateways&lt;/strong&gt; serve REST/GraphQL, not just LLMs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;But their absence doesn&amp;rsquo;t diminish their importance.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;By removing them, we acknowledge that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;AI has its own ecosystem&lt;/strong&gt; - Transformers, vector DBs, LLM ops&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Traditional infra has its own domain&lt;/strong&gt; - Data engineering, cloud native&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The intersection is where innovation happens&lt;/strong&gt; - AI-native data platforms, LLM ops on K8s&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="the-giants-we-stand-on"&gt;The Giants We Stand On&lt;/h2&gt;
&lt;p&gt;The next time you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deploy a model on Kubernetes&lt;/li&gt;
&lt;li&gt;Stream features through Kafka&lt;/li&gt;
&lt;li&gt;Search embeddings with a vector database&lt;/li&gt;
&lt;li&gt;Orchestrate a RAG pipeline with Prefect&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Remember: You&amp;rsquo;re standing on the shoulders of Hadoop, Kafka, Elasticsearch, Kubernetes, and countless others. They built the roads we now drive on.&lt;/p&gt;
&lt;h2 id="the-future-building-new-giants"&gt;The Future: Building New Giants&lt;/h2&gt;
&lt;p&gt;Just as Hadoop and Kafka enabled modern AI, today&amp;rsquo;s AI infrastructure will become tomorrow&amp;rsquo;s foundation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vector databases&lt;/strong&gt; may become the new standard for all search&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM observability&lt;/strong&gt; may evolve into general distributed tracing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI agent orchestration&lt;/strong&gt; may reinvent workflow automation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU scheduling&lt;/strong&gt; may influence general-purpose resource management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The cycle continues. The giants of today will be the foundations of tomorrow.&lt;/p&gt;
&lt;h2 id="conclusion-gratitude-and-continuity"&gt;Conclusion: Gratitude and Continuity&lt;/h2&gt;
&lt;p&gt;As we clean up our AI Resources list to focus on AI-native projects, we don&amp;rsquo;t forget where we came from. Traditional big data and cloud native infrastructure made the AI revolution possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To the Hadoop committers, Kafka maintainers, Kubernetes contributors, and all who built the foundation: Thank you.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Your work enabled ChatGPT, enabled Transformers, enabled everything we now call &amp;ldquo;AI.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Standing on your shoulders, we see further.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;: This post was inspired by the need to refactor our AI Resources list. The 27 projects mentioned here are being removed—not because they&amp;rsquo;re unimportant, but because they deserve their own category: &lt;strong&gt;The Foundation&lt;/strong&gt;.&lt;/p&gt;</content:encoded></item><item><title>My First Month at Dynamia: Why AI Native Infra Is Worth the Investment</title><link>https://jimmysong.io/blog/why-i-join-dynamia-ai-native-infra/</link><pubDate>Fri, 06 Feb 2026 12:56:35 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/why-i-join-dynamia-ai-native-infra/</guid><description>Observations from my first month at Dynamia: From cloud native to AI Native Infra, why this direction is worth investing in, and the key issues and opportunities in compute governance.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Time flies—it&amp;rsquo;s already been a month since I joined Dynamia. In this article, I want to share my observations from this past month: why AI Native Infra is a direction worth investing in, and some considerations for those thinking about their own career or technical direction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;After nearly five years of remote work, I officially joined &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt; last month as VP of Open Source Ecosystem. This decision was not sudden, but a natural extension of my journey from cloud native to AI Native Infra.&lt;/p&gt;
&lt;p&gt;But this article is not just about my personal choice. I want to answer a more universal question: &lt;strong&gt;In the wave of AI infrastructure startups, why is compute governance a direction worth investing in?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For the past decade, I have worked continuously in the infrastructure space: from Kubernetes to Service Mesh, and now to AI Infra. I am increasingly convinced that the core challenge of the AI era is not &amp;ldquo;whether the model can run,&amp;rdquo; but &amp;ldquo;whether compute can be operated efficiently, reliably, and in a controlled manner.&amp;rdquo; My observations and reflections during this first month at Dynamia have only strengthened that conviction.&lt;/p&gt;
&lt;p&gt;This article answers three questions: What is AI Native Infra? Why is GPU virtualization a necessity? Why did I choose Dynamia and HAMi?&lt;/p&gt;
&lt;h2 id="what-is-ai-native-infra"&gt;What Is AI Native Infra&lt;/h2&gt;
&lt;p&gt;The core of &lt;a href="https://jimmysong.io/book/ai-native-infra/"&gt;AI Native Infrastructure&lt;/a&gt; is not about adding another platform layer, but about redefining the governance target: expanding from &amp;ldquo;services and containers&amp;rdquo; to &amp;ldquo;model behaviors and compute assets.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I summarize it as three key shifts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Models as execution entities&lt;/strong&gt;: Governance now includes not just processes, but also model behaviors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute as a scarce asset&lt;/strong&gt;: GPU, memory, and bandwidth must be scheduled and metered precisely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Uncertainty as the default&lt;/strong&gt;: Systems must remain observable and recoverable amid fluctuations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In essence, AI Native Infra is about upgrading compute governance from &amp;ldquo;resource allocation&amp;rdquo; to &amp;ldquo;sustainable business capability.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="why-gpu-virtualization-is-essential"&gt;Why GPU Virtualization Is Essential&lt;/h2&gt;
&lt;p&gt;Many teams focus on model inference optimization, but in production, enterprises first encounter the problem of &amp;ldquo;underutilized GPUs.&amp;rdquo; This is where GPU virtualization delivers value.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Structural idleness&lt;/strong&gt;: Small tasks monopolize large GPUs, leaving them idle for long periods.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pseudo-isolation risks&lt;/strong&gt;: Native sharing lacks hard boundaries, so a single task OOM can cause cascading failures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling failures&lt;/strong&gt;: Some users queue for GPUs while others occupy but do not use them, leading to both shortages and idleness.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fragmentation waste&lt;/strong&gt;: There may be enough total GPU, but not enough full cards, making efficient packing impossible.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vendor lock-in anxiety&lt;/strong&gt;: Proprietary, tightly coupled solutions make migration costs uncontrollable.&lt;/li&gt;
&lt;/ul&gt;
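&lt;p&gt;To make the fragmentation point concrete, here is a minimal, hypothetical sketch (plain Python, not HAMi code; all numbers invented): the cluster has plenty of free GPU memory in aggregate, yet an unsplittable job still cannot be placed on any single card.&lt;/p&gt;

```python
# Illustrative sketch: why "enough total GPU" can still mean
# "not enough full cards". All numbers are hypothetical.

def can_place(job_mem_gb, free_mem_per_card):
    """A job that cannot be split across cards must fit on one card."""
    return any(free >= job_mem_gb for free in free_mem_per_card)

# Four cards, each half-occupied: 80 GB free in total...
free_per_card = [20, 20, 20, 20]

total_free = sum(free_per_card)           # 80 GB free overall
placeable = can_place(40, free_per_card)  # ...but no card has 40 GB free

print(total_free, placeable)  # 80 False
```

&lt;p&gt;Closing exactly this gap, by splitting cards and packing slices, is what the virtualization layer is for.&lt;/p&gt;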
&lt;p&gt;In short: GPUs must not only be allocatable, but also splittable, isolatable, schedulable, and governable.&lt;/p&gt;
&lt;h2 id="the-relationship-between-hami-and-dynamia"&gt;The Relationship Between HAMi and Dynamia&lt;/h2&gt;
&lt;p&gt;This is the most frequently asked question. Here is the shortest answer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAMi&lt;/strong&gt;: A CNCF-hosted open source project and community focused on GPU virtualization and heterogeneous compute scheduling.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamia&lt;/strong&gt;: The founding and leading company behind HAMi, providing enterprise-grade products and services based on HAMi.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Open source projects are not the same as company products, but the two evolve together. HAMi drives industry adoption and technical trust, while Dynamia brings these capabilities into enterprise production environments at scale. This &amp;ldquo;dual engine&amp;rdquo; approach is what makes Dynamia unique.&lt;/p&gt;
&lt;h2 id="what-hami-provides"&gt;What HAMi Provides&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/project-hami/hami" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; (&lt;em&gt;Heterogeneous AI Computing Virtualization Middleware&lt;/em&gt;) delivers three key capabilities on Kubernetes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Virtualization and partitioning&lt;/strong&gt;: Split physical GPUs into logical resources on demand to improve utilization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling and topology awareness&lt;/strong&gt;: Place workloads optimally based on topology to reduce communication bottlenecks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Isolation and observability&lt;/strong&gt;: Support quotas, policies, and monitoring to reduce production risks.&lt;/li&gt;
&lt;/ul&gt;
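&lt;p&gt;As a rough sketch of what &amp;ldquo;splitting a physical GPU into logical resources&amp;rdquo; looks like from the workload side, the following builds a pod manifest that requests a slice of a card via HAMi-style extended resource names (&lt;code&gt;nvidia.com/gpumem&lt;/code&gt;, &lt;code&gt;nvidia.com/gpucores&lt;/code&gt;, as shown in the HAMi README; exact names, units, and defaults may vary by version, so treat this as an assumption to verify against your deployment):&lt;/p&gt;

```python
import json

# Sketch of a pod manifest requesting a *slice* of a GPU through
# HAMi-style extended resources. Resource names and units follow the
# HAMi README (gpumem in MiB, gpucores in percent of SM capacity);
# verify against the HAMi version you actually deploy.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-slice-demo"},
    "spec": {
        "containers": [{
            "name": "worker",
            "image": "nvidia/cuda:12.4.0-base-ubuntu22.04",
            "resources": {
                "limits": {
                    "nvidia.com/gpu": 1,        # one logical GPU
                    "nvidia.com/gpumem": 3000,  # ~3 GiB of device memory
                    "nvidia.com/gpucores": 30,  # ~30% of compute
                }
            },
        }]
    },
}

print(json.dumps(pod, indent=2))
```

&lt;p&gt;The point of the design is that the workload keeps speaking ordinary Kubernetes resource requests, while the scheduler and device layer enforce the slice boundaries.&lt;/p&gt;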
&lt;p&gt;Currently, HAMi has attracted over 360 contributors from 16 countries, with more than 200 enterprise end users, and its international influence continues to grow.&lt;/p&gt;
&lt;h2 id="market-trends-the-ai-infrastructure-startup-wave"&gt;Market Trends: The AI Infrastructure Startup Wave&lt;/h2&gt;
&lt;p&gt;AI infrastructure is experiencing a new wave of startups. The vLLM team&amp;rsquo;s company raised $150 million, SGLang&amp;rsquo;s commercial spin-off RadixArk is valued at $4 billion, and Databricks acquired MosaicML for $1.3 billion—all pointing to a consensus: &lt;strong&gt;Whoever helps enterprises run large models more efficiently and cost-effectively will hold the keys to next-generation AI infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Against this backdrop, &lt;strong&gt;the positioning of Dynamia and HAMi&lt;/strong&gt; is even clearer. Many teams focus on &amp;ldquo;model performance acceleration&amp;rdquo; and &amp;ldquo;inference optimization&amp;rdquo; (like vLLM, SGLang), while we focus on &lt;strong&gt;&amp;ldquo;resource scheduling and virtualization&amp;rdquo;&lt;/strong&gt;—enabling better orchestration of existing accelerated hardware resources.&lt;/p&gt;
&lt;p&gt;The two are complementary: the former makes individual models run faster and cheaper, while the latter ensures that compute allocation at the cluster level is efficient, fair, and controllable. This is similar to extending Kubernetes&amp;rsquo; CPU/memory scheduling philosophy to GPU and heterogeneous compute management in the AI era.&lt;/p&gt;
&lt;h2 id="why-ai-native-infra-is-worth-the-investment"&gt;Why AI Native Infra Is Worth the Investment&lt;/h2&gt;
&lt;p&gt;My observations this month have convinced me that &lt;strong&gt;compute governance is the most undervalued yet most promising area in AI infrastructure&lt;/strong&gt;. If you are considering a career or technical investment, here is my assessment:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First, this is a real and urgent pain point&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Model training and inference optimization attract a lot of attention, but in production, enterprises first encounter the problem of &amp;ldquo;underutilized GPUs&amp;rdquo;—structural idleness, scheduling failures, fragmentation waste, and vendor lock-in anxiety. Without solving these problems, even the fastest models cannot scale in production. GPU virtualization and heterogeneous compute scheduling are the &amp;ldquo;infrastructure below infrastructure&amp;rdquo; for enterprise AI transformation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second, this is a clear long-term track&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Frameworks like vLLM and SGLang emerge constantly, making individual models run faster. But who ensures that compute allocation at the cluster level is efficient, fair, and controllable? This is similar to extending Kubernetes&amp;rsquo; success in CPU/memory scheduling to GPU and heterogeneous compute management in the AI era. This is not something that can be finished in a year or two, but a direction for continuous construction over the next five to ten years.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Third, this is an open and verifiable path&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Dynamia chose to build on HAMi as an open source foundation, first solving general capabilities, then supporting enterprise adoption. This means the technical direction is transparent and verifiable in the community. You can form your own judgment by participating in open source, observing adoption, and evaluating the ecosystem—rather than relying on the black-box promises of proprietary solutions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fourth, this is a window of opportunity that is opening now&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;AI infrastructure is being redefined, and investing in its construction today will continue to yield value in the coming years. The funding and acquisition events cited above all validate the same trend: &lt;strong&gt;Whoever helps enterprises run large models more efficiently will hold the keys to next-generation AI infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I hope to bring my experience in cloud native and open source communities to the next stage of HAMi and Dynamia: turning GPU resources from a &amp;ldquo;cost center&amp;rdquo; into an &amp;ldquo;operational asset.&amp;rdquo; This is not just my career choice, but my judgment and investment in the direction of next-generation infrastructure.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Join the HAMi Community
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
Add me on WeChat (&lt;code&gt;jimmysong&lt;/code&gt;) to join the &lt;a href="https://github.com/project-hami/hami" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; community focused on GPU virtualization and heterogeneous compute scheduling.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;From cloud native to AI Native Infra, my observations this month have only strengthened my conviction: &lt;strong&gt;The true upper limit of AI applications is determined by the infrastructure&amp;rsquo;s ability to govern compute resources.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;HAMi addresses the fundamental issues of GPU virtualization and heterogeneous compute scheduling, while Dynamia is driving these capabilities into large-scale production. If you are also looking for a technical direction worth long-term investment, AI Native Infra—especially compute governance and scheduling—is a track with real pain points, a clear path, an open ecosystem, and an opening window of opportunity.&lt;/p&gt;
&lt;p&gt;Joining Dynamia is not just a career choice, but a commitment to building the next generation of infrastructure. I hope the observations and reflections in this article can provide some reference for you as you evaluate technical directions and career opportunities.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;If you are also interested in HAMi, GPU virtualization, AI Native Infra, or Dynamia, feel free to reach out.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>The True Inflection Point of ADD: When Spec Becomes the Core Asset of AI-Era Software</title><link>https://jimmysong.io/blog/add-inflection-point-spec-as-core-asset/</link><pubDate>Tue, 20 Jan 2026 07:51:36 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/add-inflection-point-spec-as-core-asset/</guid><description>Exploring how Spec becomes the governable core asset in Agent-Driven Development (ADD) and the trend toward control-plane engineering systems.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The role of Spec is undergoing a fundamental transformation, becoming the governance anchor of engineering systems in the AI era.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-essence-of-software-engineering-and-the-cost-structure-shift-brought-by-ai"&gt;The Essence of Software Engineering and the Cost Structure Shift Brought by AI&lt;/h2&gt;
&lt;p&gt;From first principles, software engineering has always been about one thing: &lt;strong&gt;stably, controllably, and reproducibly transforming human intent into executable systems.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Artificial Intelligence (AI) does not change this engineering essence, but it dramatically alters the cost structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Implementation costs plummet:&lt;/strong&gt; Code, tests, and boilerplate logic are rapidly commoditized.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency costs rise sharply:&lt;/strong&gt; Intent drift, hidden conflicts, and cross-module inconsistencies become more frequent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Governance costs are amplified:&lt;/strong&gt; As agents can act directly, auditability, accountability, and explainability become hard constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, in the era of Agent-Driven Development (ADD), the core issue is not &amp;ldquo;can agents do the work,&amp;rdquo; but how to maintain controllability and intent preservation in engineering systems under highly autonomous agents.&lt;/p&gt;
&lt;h2 id="the-add-era-inflection-point-three-structural-preconditions"&gt;The ADD Era Inflection Point: Three Structural Preconditions&lt;/h2&gt;
&lt;p&gt;Many attribute the &amp;ldquo;explosion&amp;rdquo; of ADD to more mature multi-agent systems, stronger models, or more automated tools. In reality, the true structural inflection point arises only when these three conditions are met:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agents have acquired multi-step execution capabilities&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With frameworks like LangChain, LangGraph, and CrewAI, agents are no longer just prompt invocations, but long-lived entities capable of planning, decomposition, execution, and rollback.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agents are entering real enterprise delivery pipelines&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once in enterprise R&amp;amp;D, the question shifts from &amp;ldquo;can it generate&amp;rdquo; to &amp;ldquo;who approved it, is it compliant, can it be rolled back.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Traditional engineering tools lack a control plane for the agent era&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tools like Git, CI, and Issue Trackers were designed for &amp;ldquo;human developer collaboration,&amp;rdquo; not for &amp;ldquo;agent execution.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When these three factors converge, ADD inevitably shifts from an &amp;ldquo;efficiency tool&amp;rdquo; to a &amp;ldquo;governance system.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="the-changing-role-of-spec-from-documentation-to-system-constraint"&gt;The Changing Role of Spec: From Documentation to System Constraint&lt;/h2&gt;
&lt;p&gt;In the context of ADD, Spec is undergoing a fundamental shift:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Spec is no longer &amp;ldquo;documentation for humans,&amp;rdquo; but &amp;ldquo;the source of constraints and facts for systems and agents to execute.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Spec now serves at least three roles:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Verifiable expression of intent and boundaries&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Requirements, acceptance criteria, and design principles are no longer just text, but objects that can be checked, aligned, and traced.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stable contracts for organizational collaboration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When agents participate in delivery, verbal consensus and tacit knowledge quickly fail. Versioned, auditable artifacts become the foundation of collaboration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Policy surface for agent execution&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Agents can write code, modify configurations, and trigger pipelines. Spec must become the constraint on &amp;ldquo;what can and cannot be done.&amp;rdquo;&lt;/p&gt;
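&lt;p&gt;A minimal, entirely hypothetical sketch of this &amp;ldquo;policy surface&amp;rdquo; idea: a machine-readable Spec fragment declares what an agent may do, and a gate checks every proposed action before it executes. All names here are invented for illustration; this is not APOX or any real product&amp;rsquo;s API.&lt;/p&gt;

```python
# Hypothetical sketch of "Spec as a policy surface": the spec declares
# allowed actions, forbidden paths, and which actions need a human gate.
# Every name below is invented for illustration.

SPEC = {
    "allowed_actions": {"edit_code", "run_tests", "open_pr"},
    "forbidden_paths": ("infra/prod/",),
    "requires_approval": {"open_pr"},
}

def gate(action, path=""):
    """Return (allowed, needs_human_approval) for a proposed agent step."""
    if action not in SPEC["allowed_actions"]:
        return False, False
    if any(path.startswith(p) for p in SPEC["forbidden_paths"]):
        return False, False
    return True, action in SPEC["requires_approval"]

print(gate("edit_code", "src/app.py"))   # (True, False)
print(gate("edit_code", "infra/prod/"))  # (False, False)
print(gate("open_pr"))                   # (True, True)
```

&lt;p&gt;The structural point is that the constraint lives in the Spec, versioned and auditable, rather than in any one agent&amp;rsquo;s prompt.&lt;/p&gt;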
&lt;p&gt;From this perspective, the status of Spec is approaching that of the &lt;strong&gt;Control Plane&lt;/strong&gt; in AI-native infrastructure.&lt;/p&gt;
&lt;h2 id="the-reality-of-multi-agent-workflows-orchestration-and-governance-first"&gt;The Reality of Multi-Agent Workflows: Orchestration and Governance First&lt;/h2&gt;
&lt;p&gt;In recent systems (such as &lt;a href="https://apoxai.com" target="_blank" rel="noopener"&gt;APOX&lt;/a&gt; and other enterprise products), an industry consensus is emerging:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-agent collaboration no longer pursues &amp;ldquo;full automation,&amp;rdquo; but is staged and gated.&lt;/li&gt;
&lt;li&gt;Frameworks like LangGraph are used to build persistent, debuggable agent workflows.&lt;/li&gt;
&lt;li&gt;RAG (e.g., based on Milvus) is used to accumulate historical Specs, decisions, and context as long-term memory.&lt;/li&gt;
&lt;li&gt;The IDE mainly focuses on execution efficiency, not engineering governance.&lt;/li&gt;
&lt;/ul&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/add-inflection-point-spec-as-core-asset/apox.webp" data-img="https://assets.jimmysong.io/images/blog/add-inflection-point-spec-as-core-asset/apox.webp" alt="Figure 1: APOX user interface" data-caption="Figure 1: APOX user interface"
width="1400"
height="1045"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: APOX user interface&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;APOX (AI Product Orchestration eXtended) is a multi-agent collaboration workflow platform for enterprise software delivery. Its core goals are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To connect the entire process from product requirements to executable code with a governable Agentflow and explicit engineering artifact chain.&lt;/li&gt;
&lt;li&gt;To assign dedicated AI agents to each delivery stage (such as PRD, PO, Architecture, Developer, Implementation, Coding, etc.).&lt;/li&gt;
&lt;li&gt;To embed manual approval gates and full audit trails at every step, solving the &amp;ldquo;intent drift and consistency&amp;rdquo; governance problem that traditional AI coding tools cannot address.&lt;/li&gt;
&lt;li&gt;To provide a VS Code plugin for real-time sync between the local IDE and web artifacts, allowing Specs, code, tasks, and approval statuses to coexist in the repository.&lt;/li&gt;
&lt;li&gt;To support assigning different base models to different agents according to enterprise needs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;APOX is not about simply speeding up code generation, but about elevating &amp;ldquo;Spec&amp;rdquo; from auxiliary documentation to a verifiable, constrainable, and traceable core asset in engineering—building a control plane and workflow governance system suitable for Agent-Driven Development.&lt;/p&gt;
&lt;p&gt;Such systems emphasize:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An explicit artifact chain from PRD → Spec → Task → Implementation.&lt;/li&gt;
&lt;li&gt;Manual confirmation and audit points at every stage.&lt;/li&gt;
&lt;li&gt;Bidirectional sync between Spec, code, repository, and IDE.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not about &amp;ldquo;smarter AI,&amp;rdquo; but about engineering systems adapting to the agent era.&lt;/p&gt;
&lt;h2 id="the-long-term-value-of-spec-the-core-anchor-of-engineering-assets"&gt;The Long-Term Value of Spec: The Core Anchor of Engineering Assets&lt;/h2&gt;
&lt;p&gt;This is not to devalue code, but to acknowledge reality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There will always be long-term differentiation in algorithms and model capabilities.&lt;/li&gt;
&lt;li&gt;General engineering implementation is rapidly homogenizing.&lt;/li&gt;
&lt;li&gt;What is hard to replicate is: how to define problems, constrain systems, and govern change.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the ADD era, the value of Spec is reflected in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Determining what agents can and cannot do.&lt;/li&gt;
&lt;li&gt;Carrying the organization&amp;rsquo;s long-term understanding of the system.&lt;/li&gt;
&lt;li&gt;Serving as the anchor for audit, compliance, and accountability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Code will be rewritten again and again; Spec is the long-term asset.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="risks-and-challenges-of-add-living-spec-and-governance-constraints"&gt;Risks and Challenges of ADD: Living Spec and Governance Constraints&lt;/h2&gt;
&lt;p&gt;ADD also faces significant risks:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Can Spec become a Living Spec?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That is, when key implementation changes occur, can the system detect &amp;ldquo;intent changes&amp;rdquo; and prompt Spec updates, rather than allowing silent drift?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Can governance achieve low friction but strong constraints?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If gates are too strict, teams will bypass them; if too loose, the system loses control.&lt;/p&gt;
&lt;p&gt;These two factors determine whether ADD is &amp;ldquo;the next engineering paradigm&amp;rdquo; or &amp;ldquo;just another tool bubble.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="the-trend-toward-control-planes-in-engineering-systems"&gt;The Trend Toward Control Planes in Engineering Systems&lt;/h2&gt;
&lt;p&gt;From a broader perspective, ADD is the inevitable result of engineering systems becoming &amp;ldquo;control planes&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Engineering systems are evolving from &amp;ldquo;human collaboration tools&amp;rdquo; to &amp;ldquo;control systems for agent execution.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent / IDE is the &lt;strong&gt;execution plane&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;RAG / Memory is the &lt;strong&gt;state and memory plane&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spec is the intent and policy plane&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Gates, audit, and traceability form the governance loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This closely aligns with the evolution path of AI-native infrastructure.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The winners of the ADD era will not be the systems with &amp;ldquo;the most agents or the fastest generation,&amp;rdquo; but those that first upgrade Spec from documentation to a governable, auditable, and executable asset. As automation advances, the true scarcity is the long-term control of intent.&lt;/p&gt;</content:encoded></item><item><title>AI Voice Dictation Input Methods Are Becoming the New Shortcut Key for the Programming Era</title><link>https://jimmysong.io/blog/ai-voice-dictation-input-method-comparison/</link><pubDate>Sun, 18 Jan 2026 06:53:08 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-voice-dictation-input-method-comparison/</guid><description>Comparing Miaoyan, Zhipu, and Shandianshuo voice input methods for developers: speed, stability, command capabilities, and cost models.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Voice input methods are not just about being &amp;ldquo;fast&amp;rdquo;—they are becoming a brand new gateway for developers to collaborate with AI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="alert alert-warning-container"&gt;
&lt;div class="alert-warning-title px-2"&gt;
Warning
&lt;/div&gt;
&lt;div class="alert-warning px-2"&gt;
On January 12, 2026, citing financial difficulties, the Miaoyan team announced it was ceasing operations and disbanding. The application will no longer be updated or maintained, but existing versions can continue to be used on current devices and systems, and they do not store any audio or transcription content.
&lt;/div&gt;
&lt;/div&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/banner.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/banner.webp" alt="Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience." data-caption="Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience."
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="ai-voice-input-methods-are-becoming-the-new-shortcut-key-in-the-programming-era"&gt;AI Voice Input Methods Are Becoming the &amp;ldquo;New Shortcut Key&amp;rdquo; in the Programming Era&lt;/h2&gt;
&lt;p&gt;I am increasingly convinced of one thing: &lt;strong&gt;PC-based AI voice input methods are evolving from mere &amp;ldquo;input tools&amp;rdquo; into the foundational interaction layer for the era of programming and AI collaboration.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not just about typing faster—it determines how you deliver your &lt;strong&gt;intent&lt;/strong&gt; to the system, whether you&amp;rsquo;re writing documentation, code, or collaborating with AI in IDEs, terminals, or chat windows.&lt;/p&gt;
&lt;p&gt;Because of this, the differences in voice input method experiences are far more significant than they appear on the surface.&lt;/p&gt;
&lt;h2 id="my-six-evaluation-criteria-for-ai-voice-input-methods"&gt;My Six Evaluation Criteria for AI Voice Input Methods&lt;/h2&gt;
&lt;p&gt;After long-term, high-frequency use, I have developed a set of criteria to assess the real-world performance of AI voice input methods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Response speed&lt;/strong&gt;: Does text appear quickly enough after pressing the shortcut to keep up with your thoughts?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous input stability&lt;/strong&gt;: Does it remain reliable during extended use, or does it suddenly fail or miss recognition?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mixed Chinese-English and technical terms&lt;/strong&gt;: Can it reliably handle code, paths, abbreviations, and product names?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developer friendliness&lt;/strong&gt;: Is it truly designed for command line, IDE, and automation scenarios?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Interaction restraint&lt;/strong&gt;: Does it avoid introducing distracting features that interfere with input itself?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Subscription and cost structure&lt;/strong&gt;: Is it a standalone paid product, or can it be bundled with existing tool subscriptions?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on these criteria, I focused on comparing &lt;strong&gt;Miaoyan&lt;/strong&gt;, &lt;strong&gt;Shandianshuo&lt;/strong&gt;, and &lt;strong&gt;Zhipu AI Voice Input Method&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="miaoyan-currently-the-most-developer-oriented-domestic-product"&gt;Miaoyan: Currently the Most &amp;ldquo;Developer-Oriented&amp;rdquo; Domestic Product&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://miaoyan.cn" target="_blank" rel="noopener"&gt;Miaoyan&lt;/a&gt; was the first domestic AI voice input method I used extensively, and it remains the one I am most willing to use continuously.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/miaoyan.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/miaoyan.webp" alt="Figure 2: Miaoyan is currently my most-used Mac voice input method." data-caption="Figure 2: Miaoyan is currently my most-used Mac voice input method."
width="2272"
height="1624"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Miaoyan is currently my most-used Mac voice input method.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="command-mode-the-key-differentiator-for-developer-productivity"&gt;Command Mode: The Key Differentiator for Developer Productivity&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s important to clarify that &lt;strong&gt;Miaoyan&amp;rsquo;s command mode is not about editing text via voice&lt;/strong&gt;. Instead:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You describe your need in natural language, and the system directly generates an &lt;strong&gt;executable command-line command&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is crucial for developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s not just about input&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s about turning voice into an automation entry point&lt;/li&gt;
&lt;li&gt;Essentially, it connects voice to the CLI or toolchain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This design is clearly focused on &lt;strong&gt;engineering efficiency&lt;/strong&gt;, not office document polishing.&lt;/p&gt;
&lt;h3 id="usage-experience-summary"&gt;Usage Experience Summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Fast response, nearly instant&lt;/li&gt;
&lt;li&gt;Output is relatively clean, with minimal guessing&lt;/li&gt;
&lt;li&gt;Interaction design is restrained, with no unnecessary concepts&lt;/li&gt;
&lt;li&gt;Developer-friendly mindset&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But there are some practical limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is a &lt;strong&gt;completely standalone product&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Requires a separate subscription&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Still in relatively small-scale use&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a product strategy perspective, it feels more like a &amp;ldquo;pure tool&amp;rdquo; than part of an ecosystem.&lt;/p&gt;
&lt;div class="alert alert-warning-container"&gt;
&lt;div class="alert-warning-title px-2"&gt;
Note
&lt;/div&gt;
&lt;div class="alert-warning px-2"&gt;
On January 12, 2026, due to financial difficulties encountered during operations, the Miaoyan project announced the cessation of operations and the team was disbanded. The application will no longer be updated or maintained, but existing versions can continue to be used on the current device and system, and do not store any audio or transcription content.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="shandianshuo-local-first-approach-developer-experience-depends-on-your-setup"&gt;Shandianshuo: Local-First Approach, Developer Experience Depends on Your Setup&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://shandianshuo.cn" target="_blank" rel="noopener"&gt;Shandianshuo&lt;/a&gt; takes a different approach: it treats voice input as a &amp;ldquo;local-first foundational capability,&amp;rdquo; emphasizing low latency and privacy (at least in its product narrative). The natural advantages of this approach are speed and controllable marginal costs, making it suitable as a &amp;ldquo;system capability&amp;rdquo; that&amp;rsquo;s always available, rather than a cloud service.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/shandianshuo.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/shandianshuo.webp" alt="Figure 3: Shandianshuo settings page" data-caption="Figure 3: Shandianshuo settings page"
width="2556"
height="2080"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Shandianshuo settings page&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;However, from a developer&amp;rsquo;s perspective, its upper limit often depends on &amp;ldquo;how you implement enhanced capabilities&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;If you only use it for basic transcription, the experience is like a high-quality local input tool. But if you want better mixed Chinese-English input, technical-term correction, and symbol and formatting handling, the common approach is to add optional AI correction or enhancement capabilities, which usually require extra configuration (such as providing your own API key or subscribing to enhanced features). The key trade-off is not &amp;ldquo;can it be used,&amp;rdquo; but &amp;ldquo;how much configuration cost are you willing to pay for enhanced capabilities.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;If you want voice input to be a &amp;ldquo;lightweight, stable, non-intrusive&amp;rdquo; foundation, Shandianshuo is worth considering. But if your goal is to make voice input part of your developer workflow (such as command generation or executable actions), it needs to offer stronger productized design at the &amp;ldquo;command layer&amp;rdquo; and in terms of controllability.&lt;/p&gt;
&lt;h2 id="zhipu-ai-voice-input-method-stable-but-with-friction"&gt;Zhipu AI Voice Input Method: Stable but with Friction&lt;/h2&gt;
&lt;p&gt;I also thoroughly tested the &lt;a href="https://autoglm.zhipuai.cn/autotyper/" target="_blank" rel="noopener"&gt;Zhipu AI Voice Input Method&lt;/a&gt;.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/autoglm.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/autoglm.webp" alt="Figure 4: Zhipu Voice Input Method settings interface" data-caption="Figure 4: Zhipu Voice Input Method settings interface"
width="2430"
height="1824"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Zhipu Voice Input Method settings interface&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Its strengths include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More stable for long-term continuous input&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Rarely becomes completely unresponsive&lt;/li&gt;
&lt;li&gt;Good tolerance for longer Chinese input&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But with frequent use, some issues stand out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Idle misrecognition&lt;/strong&gt;: If you press the shortcut but don&amp;rsquo;t speak, it may output random characters, disrupting your input flow&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Occasionally messy output&lt;/strong&gt;: Sometimes adds irrelevant words, making it less controllable than Miaoyan&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Basic recognition errors&lt;/strong&gt;: For example, the brand name &amp;ldquo;Zhipu&amp;rdquo; itself being transcribed with the wrong homophone characters, which is a trust issue for professional users&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature-heavy design&lt;/strong&gt;: Various tone and style features increase cognitive load&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="subscription-bundling-zhipus-practical-advantage"&gt;Subscription Bundling: Zhipu&amp;rsquo;s Practical Advantage&lt;/h2&gt;
&lt;p&gt;Although I prefer Miaoyan in terms of experience, &lt;strong&gt;Zhipu has a very practical advantage&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;If you already subscribe to Zhipu&amp;rsquo;s programming package, &lt;strong&gt;the voice input method is included for free&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No need to pay separately for the input method&lt;/li&gt;
&lt;li&gt;Lower psychological and decision-making cost&lt;/li&gt;
&lt;li&gt;More likely to become the &amp;ldquo;default tool&amp;rdquo; that stays&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a business perspective, this is a very smart strategy.&lt;/p&gt;
&lt;h2 id="main-comparison-table"&gt;Main Comparison Table&lt;/h2&gt;
&lt;p&gt;The following table compares the three products across key dimensions for quick reference.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Miaoyan&lt;/th&gt;
&lt;th&gt;Shandianshuo&lt;/th&gt;
&lt;th&gt;Zhipu AI Voice Input Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Response Speed&lt;/td&gt;
&lt;td&gt;Fast, nearly instant&lt;/td&gt;
&lt;td&gt;Usually fast (local-first)&lt;/td&gt;
&lt;td&gt;Slightly slower than Miaoyan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous Stability&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;td&gt;Depends on setup and environment&lt;/td&gt;
&lt;td&gt;Very stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle Misrecognition&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;td&gt;Generally restrained (varies by version)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Obvious: outputs characters even if silent&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Cleanliness/Control&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;More like an &amp;ldquo;input tool&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Occasionally messy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer Differentiator&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Natural language → executable command&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local-first / optional enhancements&lt;/td&gt;
&lt;td&gt;Ecosystem-attached capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subscription &amp;amp; Cost&lt;/td&gt;
&lt;td&gt;Standalone, separate purchase&lt;/td&gt;
&lt;td&gt;Basic usable; enhancements often require setup/subscription&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Bundled free with programming package&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;My Current Preference&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Best experience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More like a &amp;ldquo;foundation approach&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Easy to keep but not clean enough&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Core Comparison of Miaoyan, Shandianshuo, and Zhipu AI Voice Input Methods
&lt;/figcaption&gt;
&lt;h2 id="user-loyalty-to-ai-voice-input-methods"&gt;User Loyalty to AI Voice Input Methods&lt;/h2&gt;
&lt;p&gt;The switching cost for voice input methods is actually low: little more than a shortcut key and an output habit.&lt;/p&gt;
&lt;p&gt;What really determines whether users stick around is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether the output is controllable&lt;/li&gt;
&lt;li&gt;Whether it keeps causing annoying minor issues&lt;/li&gt;
&lt;li&gt;Whether it integrates into your existing workflow and payment structure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For me personally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The best and smoothest experience is still Miaoyan&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The one most likely to stick around is probably Zhipu&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shandianshuo is more of a &amp;ldquo;foundation approach&amp;rdquo; and worth watching for how its enhancements evolve&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These points are not contradictory.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Miaoyan is more mature in &lt;strong&gt;engineering orientation, command capabilities, and input control&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Zhipu has practical advantages in &lt;strong&gt;stability and subscription bundling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Shandianshuo takes a &lt;strong&gt;local-first + optional enhancement&lt;/strong&gt; approach, with the key being how it balances &amp;ldquo;basic capability&amp;rdquo; and &amp;ldquo;enhancement cost&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Who truly becomes the &amp;ldquo;default gateway&amp;rdquo; depends on reducing distractions, fixing frequent minor issues, and treating voice input as true &amp;ldquo;infrastructure&amp;rdquo; rather than an add-on feature&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The competition among AI voice input methods is no longer about recognition accuracy, but about who can own the shortcut key you press every day.&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>From Spatial Data to AI Open Source: Technical Standards, Data Sovereignty, and the Global Divide</title><link>https://jimmysong.io/blog/spatial-data-ai-open-source-standards-sovereignty/</link><pubDate>Sun, 11 Jan 2026 03:29:28 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/spatial-data-ai-open-source-standards-sovereignty/</guid><description>How technical standards and data sovereignty shape AI open source paths and infrastructure competition in the global AI era.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The divide in technical standards and data sovereignty determines the global competitive landscape of infrastructure open source in the AI era.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this article, I will use the differences in air quality data presentation in Apple Maps and Weather as a starting point to explore how technical standards and data sovereignty influence the open source paths of AI in different countries. I will further analyze why, in the AI era, infrastructure-level open source has become the key battleground for ecosystem dominance.&lt;/p&gt;
&lt;h2 id="authors-note"&gt;Author&amp;rsquo;s Note&lt;/h2&gt;
&lt;p&gt;This article originates from a very everyday observation: Why is air quality data in China shown as &amp;ldquo;points&amp;rdquo; in Apple Maps and Weather, while in other countries it is often displayed as &amp;ldquo;areas&amp;rdquo;?&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/aqi-map.webp" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/aqi-map.webp" alt="Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries" data-caption="Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries"
width="1650"
height="1864"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;At first glance, it seems like a product experience difference. But when I reconsidered this issue in the context of engineering, standards, and system design, I realized it actually points to a much bigger question: how different countries understand the relationship between technology, standards, openness, and sovereignty.&lt;/p&gt;
&lt;p&gt;As an engineer who has long worked in cloud native, AI infrastructure, and open source ecosystems, I gradually realized that this difference is not limited to air quality or map data. In the AI era, it is further amplified, directly affecting how we open source models, build infrastructure, and whether we can participate in the formulation of global rules.&lt;/p&gt;
&lt;p&gt;Writing this article is not about judging right or wrong, but about using a concrete example to explain a structural difference and discuss the long-term impact and real opportunities this difference may bring in the AI era.&lt;/p&gt;
&lt;p&gt;What is especially important: at the level of AI infrastructure and infra-level open source, the competition has just begun. China is not without opportunities, but the choice of path will become more critical than ever.&lt;/p&gt;
&lt;h2 id="differences-in-air-quality-data-presentation-a-microcosm-of-technical-standards-and-sovereignty"&gt;Differences in Air Quality Data Presentation: A Microcosm of Technical Standards and Sovereignty&lt;/h2&gt;
&lt;p&gt;The following image illustrates the divide between spatial data, AI open source, and technical standards. By comparing how air quality data is presented in Apple Maps and Weather in different countries, you can intuitively feel the differences in technical standards and sovereignty strategies behind the scenes.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/banner.webp" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/banner.webp" alt="Figure 2: The divide between spatial data, AI open source, and technical standards" data-caption="Figure 2: The divide between spatial data, AI open source, and technical standards"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: The divide between spatial data, AI open source, and technical standards&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If you regularly use global products such as maps, weather, traffic, or various data services, you may notice a recurring phenomenon that is rarely discussed seriously: the way data is presented in China often differs significantly from global mainstream standards.&lt;/p&gt;
&lt;p&gt;A very intuitive example comes from the air quality display in Apple Maps or Weather. In China, air quality is usually shown as discrete points; in the US, Europe, Japan, and other countries, it is often rendered as continuous coverage areas.&lt;/p&gt;
&lt;p&gt;At first glance, this seems like a product experience difference, and may even lead people to mistakenly believe that &amp;ldquo;China&amp;rsquo;s data is incomplete.&amp;rdquo; But if you treat it as an engineering or system design issue, you will find: this is not a matter of data capability, but a different choice in technical standards, data sovereignty, and openness strategies.&lt;/p&gt;
&lt;p&gt;And this choice is not limited to air quality.&lt;/p&gt;
&lt;h2 id="air-quality-is-just-a-slice-greater-differences-in-spatial-public-data"&gt;Air Quality Is Just a Slice: Greater Differences in Spatial Public Data&lt;/h2&gt;
&lt;p&gt;Air quality is just a highly visible and relatively low-risk example. Similar differences have long existed in broader spatial and public data domains.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maps and coordinate systems&lt;/li&gt;
&lt;li&gt;Surveying and high-precision spatial data&lt;/li&gt;
&lt;li&gt;Real-time traffic and population movement&lt;/li&gt;
&lt;li&gt;Remote sensing, environmental, and urban operation data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In global mainstream systems, such data is usually regarded as public information infrastructure. It is standardized, gridded, API-ified, allows interpolation, modeling, and redistribution, and is widely used in research, business, and product innovation.&lt;/p&gt;
&lt;p&gt;In China, this data often takes another form: hierarchical, discrete, strictly defined, and with centralized interpretation authority.&lt;/p&gt;
&lt;p&gt;This is not a technical preference in a single field, but a systemic logic of technology and governance.&lt;/p&gt;
&lt;h2 id="three-global-paths"&gt;Three Global Paths&lt;/h2&gt;
&lt;p&gt;Placing China in a global context, we can see that there are roughly three different paths worldwide regarding &amp;ldquo;how public data and technical standards are opened.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Engineering-Open Type: Standards and Ecosystem First&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Represented by the US and some European countries, the core features of this system are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Public data prioritized as infrastructure&lt;/li&gt;
&lt;li&gt;Standards and interfaces come first&lt;/li&gt;
&lt;li&gt;Encourages engineering autonomy and ecosystem evolution&lt;/li&gt;
&lt;li&gt;Tolerates model inference and uncertainty&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This path directly shaped the global landscape of foundational software and infrastructure-level open source. Linux, Kubernetes, and the cloud native system are essentially products of openness at the rules layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance-Sovereignty Type: Control and Auditability First&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Represented by China, this path emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sensitivity of spatial and public data&lt;/li&gt;
&lt;li&gt;Data as part of governance capability&lt;/li&gt;
&lt;li&gt;Standards, definitions, and release methods are highly bound&lt;/li&gt;
&lt;li&gt;Emphasizes traceability, accountability, and controllability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this system, &amp;ldquo;point data&amp;rdquo; is not a sign of technological backwardness, but a governable technical form. When a technical system is designed as a governance system, its primary goal is not reusability, but controllability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compromise-Coordinated Type: Cautious Openness, Engineering Internationalization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some countries try to find a balance between the two, maintaining caution in spatial data while being highly internationalized in engineering and industry. This shows that the difference is not about being advanced or backward, but about different objective functions.&lt;/p&gt;
&lt;p&gt;The following diagram compares the core characteristics, typical cases, and advantages/challenges of these three paths from a global perspective. The &amp;ldquo;Engineering-Open Type&amp;rdquo; on the left shapes the global infrastructure software landscape through standards and ecosystems; the &amp;ldquo;Governance-Sovereignty Type&amp;rdquo; in the middle emphasizes data sovereignty and security controllability but has limitations in influence at the rules layer; the &amp;ldquo;Compromise-Coordinated Type&amp;rdquo; on the right attempts to find a balance between security and openness. The divide between these three paths directly affects the infrastructure competition landscape of various countries in the AI era.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/global-three-paths-en.svg" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/global-three-paths-en.svg" alt="Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards" data-caption="Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards"
width="2663"
height="1862"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-essence-of-point-vs-area-in-air-quality"&gt;The Essence of &amp;ldquo;Point&amp;rdquo; vs. &amp;ldquo;Area&amp;rdquo; in Air Quality&lt;/h2&gt;
&lt;p&gt;Among all spatial public data, air quality is an ideal observation window:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does not directly involve military or core economic security&lt;/li&gt;
&lt;li&gt;Highly visible, updated daily, and perceptible to everyone&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;China does not lack air quality data; on the contrary, the density of monitoring stations is among the highest in the world. The real difference lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether interpolation is allowed&lt;/li&gt;
&lt;li&gt;Whether model inference is allowed&lt;/li&gt;
&lt;li&gt;Whether platforms are allowed to reinterpret the data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;ldquo;Point&amp;rdquo; means authenticity and traceability; &amp;ldquo;area&amp;rdquo; means models, inference, and redistribution of interpretive authority. This is precisely the watershed between technical standards and data sovereignty.&lt;/p&gt;
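&lt;p&gt;Mechanically, the gap between &amp;ldquo;point&amp;rdquo; and &amp;ldquo;area&amp;rdquo; is just an interpolation step. As a minimal sketch (not any vendor&amp;rsquo;s actual pipeline), inverse-distance weighting is one of the simplest techniques for turning discrete station readings into a continuous surface; the station coordinates and AQI values below are made up for illustration:&lt;/p&gt;

```python
import math

def idw(stations, qx, qy, power=2):
    """Inverse-distance-weighted estimate at (qx, qy) from
    (x, y, value) monitoring stations: nearer stations count more."""
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(qx - x, qy - y)
        if d == 0:
            return value  # query point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical stations: AQI 50 at (0, 0) and AQI 150 at (10, 0).
stations = [(0, 0, 50.0), (10, 0, 150.0)]
print(idw(stations, 5, 0))  # midpoint: equal weights, so 100.0
print(idw(stations, 2, 0))  # closer to the cleaner station, so below 100
```

&lt;p&gt;Evaluating this function over a grid of query points is what produces an &amp;ldquo;area&amp;rdquo; map; publishing only the raw station tuples is the &amp;ldquo;point&amp;rdquo; presentation. The choice between the two is a policy decision, not a capability gap.&lt;/p&gt;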
&lt;p&gt;The following diagram compares two different technical paths. The left side, &amp;ldquo;Governance-Sovereignty Type,&amp;rdquo; emphasizes data traceability and controllability, using discrete point-based data presentation. The right side, &amp;ldquo;Engineering-Open Type,&amp;rdquo; allows model interpolation and inference, providing more user-friendly experience through continuous area-based coverage. The essence of this difference lies not in the level of technical capability, but in the different choices made between data sovereignty, governance capability, and open ecosystems.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/data-sovereignty-comparison-en.svg" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/data-sovereignty-comparison-en.svg" alt="Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation" data-caption="Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation"
width="2263"
height="1562"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-amplification-effect-in-the-ai-era"&gt;The Amplification Effect in the AI Era&lt;/h2&gt;
&lt;p&gt;With the above logic in mind, many phenomena in the AI era become less confusing.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why are Chinese AI companies more willing to open source large language model (LLM) weights, while American companies have clearly shifted toward closed source in recent years?&lt;/li&gt;
&lt;li&gt;Why is foundational software and infrastructure-level open source still mainly led by the US?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key is not &amp;ldquo;whether to open source,&amp;rdquo; but &amp;ldquo;which layer is open sourced.&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model weights are static, declarable assets&lt;/li&gt;
&lt;li&gt;Infrastructure, runtimes, protocols, and standards are dynamic, evolving system rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Open sourcing weights is essentially openness at the asset layer; infrastructure-level open source means relinquishing control over operating rules and interpretive authority.&lt;/p&gt;
&lt;p&gt;The following diagram compares two different layers of AI open source. The left side shows &amp;ldquo;Model Weight Layer Open Source,&amp;rdquo; typical of the Chinese path: opening static digital assets at low cost and controllable risk, but without touching rule-making. The right side shows &amp;ldquo;Infrastructure Layer Open Source,&amp;rdquo; the core strategy of the US path: by open sourcing development tools, protocol standards, runtimes, compute scheduling, and other infrastructure, it defines how AI is used and thereby controls ecosystem rules and interpretive authority. The key insight: open sourcing model weights does not equal mastering the AI ecosystem, and the real competitive focus is shifting to the infrastructure layer of &amp;ldquo;how AI runs.&amp;rdquo;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/ai-opensource-layers-en.svg" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/ai-opensource-layers-en.svg" alt="Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure" data-caption="Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure"
width="2363"
height="1862"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-us-approach-focusing-on-rules-and-runtime-layers"&gt;The US Approach: Focusing on Rules and Runtime Layers&lt;/h2&gt;
&lt;p&gt;In the past year or two, US-led AI open source and ecosystem initiatives have shown a highly consistent direction: not rushing to open source the strongest models, but focusing on defining &amp;ldquo;how AI is used.&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Linux Foundation established &lt;a href="https://aaif.io" target="_blank" rel="noopener"&gt;AAIF&lt;/a&gt; (Agentic AI Foundation), focusing on AI infrastructure, standards, and toolchain collaboration&lt;/li&gt;
&lt;li&gt;Protocols like MCP (Model Context Protocol) aim to define common interaction methods between agents and tools/systems&lt;/li&gt;
&lt;li&gt;Major tech companies are generally focusing on APIs, platforms, runtimes, and ecosystem binding&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The commonality of these actions: competing in model capability, but controlling the usage rules.&lt;/p&gt;
&lt;h2 id="chinas-shift-from-model-oriented-to-infrastructure-oriented"&gt;China&amp;rsquo;s Shift: From Model-Oriented to Infrastructure-Oriented&lt;/h2&gt;
&lt;p&gt;It is important to emphasize that this difference does not mean China is unaware of the issue.&lt;/p&gt;
&lt;p&gt;Whether in policy discussions or within industry and research institutions, the risk of &amp;ldquo;only open sourcing models without controlling infrastructure and standard dominance&amp;rdquo; has been repeatedly discussed.&lt;/p&gt;
&lt;p&gt;The real challenge lies in how to achieve a directional shift within the existing governance logic and risk framework. This shift has already appeared in some concrete practices.&lt;/p&gt;
&lt;h2 id="exploration-and-practice-at-the-infrastructure-layer"&gt;Exploration and Practice at the Infrastructure Layer&lt;/h2&gt;
&lt;p&gt;In the AI era, infrastructure often starts with the most engineering-driven problems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HAMi Project&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Projects like &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; do not focus on model capability, but on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abstraction, allocation, and isolation of GPU resources&lt;/li&gt;
&lt;li&gt;How multi-tenant AI workloads are run&lt;/li&gt;
&lt;li&gt;How computing power transitions from hardware assets to governable system resources&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The significance of such projects is not about being &amp;ldquo;SOTA,&amp;rdquo; but about entering the domain of &amp;ldquo;how AI runs.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI Runtime Reconstruction from a System Software Perspective&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Exploration at the research institution level is also noteworthy. The &lt;a href="https://www.flagos.io" target="_blank" rel="noopener"&gt;FlagOS&lt;/a&gt; initiative by the Beijing Academy of Artificial Intelligence is a clear signal: AI is being redefined as a system software issue, not just a model or algorithm problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Long-Term Tech Stack Investment by Industry Players&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the industry, Huawei&amp;rsquo;s strategy reflects a similar direction: not simply open sourcing models, but attempting to build a complete, controllable AI tech stack, from computing power to frameworks, platforms, and ecosystems. This is a slower, heavier, but more infrastructure-competitive path.&lt;/p&gt;
&lt;h2 id="realistic-assessment-the-starting-point-of-ai-infrastructure-competition"&gt;Realistic Assessment: The Starting Point of AI Infrastructure Competition&lt;/h2&gt;
&lt;p&gt;Taking a longer view, we find an easily overlooked fact:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At the level of AI infrastructure and infra-level open source, there is no settled pattern between China and the US.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The US advantage lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mature engineering culture&lt;/li&gt;
&lt;li&gt;Standard organizations and foundation mechanisms&lt;/li&gt;
&lt;li&gt;High proficiency in openness at the rules layer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;China&amp;rsquo;s variables include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Huge AI application scenarios&lt;/li&gt;
&lt;li&gt;Extreme demand for computing power and system efficiency&lt;/li&gt;
&lt;li&gt;Ongoing directional adjustments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The real uncertainty is not &amp;ldquo;whether we can catch up,&amp;rdquo; but whether it is possible to gradually open up space for engineering autonomy and standard co-construction while maintaining governance bottom lines.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The &amp;ldquo;points&amp;rdquo; and &amp;ldquo;areas&amp;rdquo; of air quality, model weights versus the rules of operation: behind these surface differences lies not a simple dispute over technical routes, but the question of how a country finds its own balance between openness, standards, and sovereignty.&lt;/p&gt;
&lt;p&gt;In the AI era, this issue will not disappear, but will become more concrete and more engineering-driven. And this is precisely where there are still opportunities for China&amp;rsquo;s AI infrastructure open source.&lt;/p&gt;</content:encoded></item><item><title>Joining Dynamia: Embarking on a New Journey in AI Native Infrastructure</title><link>https://jimmysong.io/blog/joining-dynamia/</link><pubDate>Wed, 07 Jan 2026 07:49:21 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/joining-dynamia/</guid><description>Joining Dynamia as Open Source Ecosystem VP to drive AI-native infrastructure ecosystem development, transforming compute from hardware consumption to core asset.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Compute governance is the critical bottleneck for AI scaling. From hardware consumption to core asset, this long-undervalued path needs to be redefined.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/joining-dynamia/banner.webp" data-img="https://assets.jimmysong.io/images/blog/joining-dynamia/banner.webp" alt="Figure 1: Dynamia.ai" data-caption="Figure 1: Dynamia.ai"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Dynamia.ai&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="a-new-beginning"&gt;A New Beginning&lt;/h2&gt;
&lt;p&gt;I have officially joined &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt; as &lt;strong&gt;Open Source Ecosystem VP&lt;/strong&gt;, responsible for the long-term development of the company in open source, technical narrative, and &lt;strong&gt;AI Native Infrastructure&lt;/strong&gt; ecosystem directions.&lt;/p&gt;
&lt;h2 id="why-i-chose-dynamia"&gt;Why I Chose Dynamia&lt;/h2&gt;
&lt;p&gt;I chose to join Dynamia not because it&amp;rsquo;s a company trying to &amp;ldquo;solve all AI problems,&amp;rdquo; but precisely the opposite: because Dynamia &lt;strong&gt;focuses intensely on one unavoidable, yet long-undervalued core issue in AI Native Infrastructure&lt;/strong&gt;. Compute, especially &lt;strong&gt;Graphics Processing Units&lt;/strong&gt; (GPUs), is evolving from a &amp;ldquo;technical resource&amp;rdquo; into an infrastructure element that requires refined governance and economic management.&lt;/p&gt;
&lt;p&gt;Through years of practice in cloud native, distributed systems, and AI infrastructure (AI Infra), I&amp;rsquo;ve formed a clear judgment: as Large Language Models (LLM) and &lt;strong&gt;AI Agents&lt;/strong&gt; enter the stage of large-scale deployment, the real bottleneck limiting system scalability and sustainability is no longer just model capability itself, but how compute is measured, allocated, isolated, and scheduled, and how a governable, accountable, and optimizable operational mechanism is formed at the system level. From this perspective, the core challenge of AI infrastructure is essentially evolving into a &amp;ldquo;resource governance and Token economy&amp;rdquo; problem.&lt;/p&gt;
&lt;h2 id="about-dynamia-and-hami"&gt;About Dynamia and HAMi&lt;/h2&gt;
&lt;p&gt;Dynamia is an AI-native infrastructure technology company rooted in open source DNA, driving efficiency leaps in heterogeneous compute through technological innovation. Its leading open source project, &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; (Heterogeneous AI Computing Virtualization Middleware), is a &lt;strong&gt;Cloud Native Computing Foundation&lt;/strong&gt; (CNCF) sandbox project providing GPU, NPU and other heterogeneous device virtualization, sharing, isolation, and topology-aware scheduling capabilities, widely adopted by 50+ enterprises and institutions.&lt;/p&gt;
&lt;h2 id="dynamias-technical-approach"&gt;Dynamia&amp;rsquo;s Technical Approach&lt;/h2&gt;
&lt;p&gt;In this context, Dynamia&amp;rsquo;s technical approach—starting from &lt;strong&gt;the GPU layer, which is the most expensive, scarcest, and least unified abstraction layer in AI systems&lt;/strong&gt;, treating compute as a foundational resource that can be measured, partitioned, scheduled, governed, and even &amp;ldquo;tokenized&amp;rdquo; for refined accounting and optimization—aligns highly with my long-term judgment on AI-native infrastructure.&lt;/p&gt;
&lt;p&gt;This path doesn&amp;rsquo;t use &amp;ldquo;model capabilities&amp;rdquo; or &amp;ldquo;application innovation&amp;rdquo; as selling points in the short term, nor is it easily packaged into simple stories. However, with rising compute costs, heterogeneous accelerators becoming the norm, and AI systems moving toward multi-tenant and large-scale operations, these infrastructure-level capabilities are gradually becoming prerequisites for the establishment and expansion of AI systems.&lt;/p&gt;
&lt;h2 id="future-focus"&gt;Future Focus&lt;/h2&gt;
&lt;p&gt;As Dynamia&amp;rsquo;s Open Source Ecosystem VP, I will focus on &lt;strong&gt;the technical narrative of AI-native infrastructure, open source ecosystem building, and global developer collaboration&lt;/strong&gt;, advancing compute from a &amp;ldquo;hardware resource that gets consumed&amp;rdquo; to a &lt;strong&gt;governable, measurable, and optimizable core asset of AI infrastructure&lt;/strong&gt;, laying the foundation for the scaling and sustainable evolution of AI systems in the next stage.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Joining Dynamia is an important milestone in my career and a concrete action demonstrating my long-term optimism about AI-native infrastructure. Compute governance is not a short-term trend that yields quick results, but an infrastructure proposition that large-scale AI deployment cannot bypass. I look forward to exploring, building, and shipping solutions along this long-undervalued path together with developers around the world.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi - Heterogeneous AI Computing Virtualization Middleware (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Running Parallel AI Agents on My Mac: Hands-On with Verdent's Standalone App</title><link>https://jimmysong.io/blog/verdent-standalone-app-parallel-agents/</link><pubDate>Sun, 04 Jan 2026 02:25:48 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/verdent-standalone-app-parallel-agents/</guid><description>A hands-on experience with Verdent&amp;#39;s standalone Mac app, exploring how parallel AI agents, isolated workspaces, and task-oriented workflows change real-world development.</description><content:encoded>
&lt;p&gt;I&amp;rsquo;ve been spending more time recently experimenting with vibe coding tools on real projects, not demos. One of those projects is my own website, where I constantly tweak content structure, navigation, and layout.&lt;/p&gt;
&lt;p&gt;During this process, I started using &lt;a href="https://verdent.ai" target="_blank" rel="noopener"&gt;Verdent&amp;rsquo;s standalone Mac app&lt;/a&gt; more seriously. What stood out was not any single feature, but how different the experience felt compared to traditional AI coding tools.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-standalone-app-ui.webp" data-img="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-standalone-app-ui.webp" alt="Figure 1: Verdent Standalone App UI" data-caption="Figure 1: Verdent Standalone App UI"
width="3836"
height="2240"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Verdent Standalone App UI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Verdent doesn&amp;rsquo;t behave like an assistant waiting for instructions. It behaves more like an environment where work happens in parallel.&lt;/p&gt;
&lt;h2 id="a-different-starting-point-tasks-not-chats"&gt;A Different Starting Point: Tasks, Not Chats&lt;/h2&gt;
&lt;p&gt;Most AI coding tools begin with a conversation. Verdent begins with &lt;strong&gt;tasks&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When I opened my website repository in the Verdent app, I didn&amp;rsquo;t start with a long prompt. I created multiple tasks directly: one to rethink navigation and SEO structure, another to explore homepage layout improvements, and a third to review existing content organization.&lt;/p&gt;
&lt;p&gt;Each task immediately spun up its own agent and workspace. From the beginning, the app encouraged me to think in parallel, the same way I normally would when sketching ideas on paper or jumping between files.&lt;/p&gt;
&lt;p&gt;This framing alone changes how you work.&lt;/p&gt;
&lt;h2 id="built-for-multitasking-without-losing-context"&gt;Built for Multitasking, Without Losing Context&lt;/h2&gt;
&lt;p&gt;Switching contexts is unavoidable in real development work. What usually breaks is continuity.&lt;/p&gt;
&lt;p&gt;Verdent handles this well. Each task preserves its full context independently. I could stop one task mid-way, switch to another, and come back later without re-explaining the problem or reloading files.&lt;/p&gt;
&lt;p&gt;For example, while one agent was analyzing my site&amp;rsquo;s navigation structure, another was exploring layout options. I moved between them freely. Nothing was lost. Each agent remembered exactly what it was doing.&lt;/p&gt;
&lt;p&gt;This feels closer to how developers think than how chat-based tools operate.&lt;/p&gt;
&lt;h2 id="safe-parallel-coding-with-workspaces"&gt;Safe Parallel Coding with Workspaces&lt;/h2&gt;
&lt;p&gt;Parallel work is only truly safe when code changes are isolated: once parallelism moves from discussion to actual code modification, risk management becomes essential.&lt;/p&gt;
&lt;p&gt;Verdent solves this with &lt;strong&gt;Workspaces&lt;/strong&gt;. Each workspace is an isolated, independent code environment with its own change history, commit log, and branches. This isn&amp;rsquo;t just about separation—it&amp;rsquo;s about making concurrent code changes manageable.&lt;/p&gt;
&lt;p&gt;What this means in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple tasks can write code simultaneously&lt;/li&gt;
&lt;li&gt;Changes remain isolated from each other&lt;/li&gt;
&lt;li&gt;If conflicts arise, they&amp;rsquo;re visible and cleanly resolvable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intentionally let different agents operate on overlapping parts of my project: one modifying Markdown content and links, another adjusting CSS and layout logic. Both ran in parallel. No conflicts emerged. Later, I reviewed the diffs from each workspace and merged only what made sense.&lt;/p&gt;
&lt;p&gt;This kind of isolation removes significant anxiety from AI-assisted coding. You stop worrying about breaking things and start experimenting more freely, knowing that each change exists in its own contained environment.&lt;/p&gt;
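&lt;p&gt;Under the hood, this style of isolation resembles Git worktrees: each task effectively gets its own checkout and branch. As a rough sketch (Verdent manages all of this automatically; the commands and names below are my own illustrative approximation, not Verdent&amp;rsquo;s actual implementation), the same setup can be reproduced by hand:&lt;/p&gt;

```shell
# Illustrative only: reproducing workspace-style isolation by hand with
# git worktrees. Verdent automates this; paths and branch names are made up.
set -e
rm -rf /tmp/demo-site /tmp/task-content /tmp/task-layout
mkdir -p /tmp/demo-site && cd /tmp/demo-site
git init -q -b main
echo "# site" > README.md
git add README.md
git -c user.email=demo@example.com -c user.name=demo commit -qm "init"

# One worktree per task: content edits and layout edits stay isolated.
git worktree add -q -b task/content /tmp/task-content
git worktree add -q -b task/layout  /tmp/task-layout

echo "new nav link" >> /tmp/task-content/README.md     # "agent" 1's change
echo "body { margin: 0; }" > /tmp/task-layout/site.css # "agent" 2's change

# Each worktree carries its own branch and history; main stays untouched
# until you review the diffs and merge what makes sense.
git worktree list
```

&lt;p&gt;In Verdent the review-and-merge step happens through each workspace&amp;rsquo;s diff view rather than manual Git commands; the sketch only shows why overlapping changes made in separate workspaces don&amp;rsquo;t collide.&lt;/p&gt;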
&lt;h2 id="parallel-agent-execution-feels-like-delegation"&gt;Parallel Agent Execution Feels Like Delegation&lt;/h2&gt;
&lt;p&gt;Parallelism doesn’t mean that all agents complete the same phase of work at the same time. Instead, by isolating and overlapping phases, what was once a strictly sequential process is compressed into a more efficient, collaborative mode.&lt;/p&gt;
&lt;p&gt;In Verdent, each agent runs in its own workspace, essentially an automatically managed branch or worktree. In practice, I often create multiple tasks with different responsibilities for the same requirement, such as planning, implementation, and review. But this doesn’t mean they all complete the same phase simultaneously.&lt;/p&gt;
&lt;p&gt;These tasks are triggered as needed, each running for a period and producing clear artifacts as boundaries for collaboration. The planning task generates planning documents or constraint specifications; the implementation task advances code changes based on those documents and produces diffs; the review task, according to the established planning goals and audit criteria, performs staged reviews of the generated changes. By overlapping phases around artifacts, the originally strict sequential process is compressed into a workflow that more closely resembles team collaboration.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The value of splitting into multiple tasks is not parallel execution, but parallel cognition and clear collaboration boundaries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While it’s technically possible to put multiple roles into a single task, this causes planning, implementation, and review to share the same context, which weakens role isolation and the auditability of results.&lt;/p&gt;
&lt;h2 id="configurability-and-design-trade-offs"&gt;Configurability and Design Trade-offs&lt;/h2&gt;
&lt;p&gt;Beyond the workflow model itself, Verdent exposes a surprisingly rich set of configurable capabilities.&lt;/p&gt;
&lt;p&gt;It allows users to customize MCP settings, define subagents with configurable prompts, and create reusable commands via slash (&lt;code&gt;/&lt;/code&gt;) shortcuts. Personal rules can be written to influence agent behavior and response style, and command-level permissions can be configured to enforce basic security boundaries. Verdent also supports multiple mainstream foundation models, including GPT, Claude, Gemini, and K2. For users who prefer a lightweight coding experience without a full IDE, Verdent offers DiffLens as an alternative review-oriented interface. Both &lt;a href="https://www.verdent.ai/pricing" target="_blank" rel="noopener"&gt;subscription-based and credit-based pricing models&lt;/a&gt; are supported.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-settings.webp" data-img="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-settings.webp" alt="Figure 2: Verdent Settings" data-caption="Figure 2: Verdent Settings"
width="2780"
height="1648"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Verdent Settings&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;That said, Verdent makes a clear set of trade-offs. It is not built around tab-based code completion, nor does it offer a plugin system. If it did, it would start to resemble a traditional IDE - which does not seem to be its goal. Verdent is not designed for direct, fine-grained code manipulation; most changes are mediated through conversational tasks and agent-driven edits. This makes the experience clean and focused, but it also means that for large, highly complex codebases, Verdent may function better as a complementary orchestration layer rather than a full-time development environment.&lt;/p&gt;
&lt;h2 id="where-verdent-fits-today"&gt;Where Verdent Fits Today&lt;/h2&gt;
&lt;p&gt;There are many AI-assisted coding tools emerging right now. Some focus on smarter editors, others on faster generation.&lt;/p&gt;
&lt;p&gt;Verdent feels different because it focuses on &lt;strong&gt;orchestration&lt;/strong&gt;, not just assistance.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t try to replace your editor. It sits one level above, coordinating planning, execution, and review across multiple agents.&lt;/p&gt;
&lt;p&gt;That makes it particularly suitable for exploratory work, refactoring, and early-stage design - exactly the kind of work I was doing on my website.&lt;/p&gt;
&lt;h2 id="final-thoughts"&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Using Verdent&amp;rsquo;s standalone app didn&amp;rsquo;t just speed things up. It changed how I structured work.&lt;/p&gt;
&lt;p&gt;Instead of doing everything sequentially, I started thinking in parallel again - and letting the system support that way of thinking.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://verdent.ai" target="_blank" rel="noopener"&gt;Verdent&lt;/a&gt; feels less like an AI feature and more like an environment that assumes AI is already part of how development happens.&lt;/p&gt;
&lt;p&gt;For developers experimenting with AI-native workflows, that shift is worth paying attention to.&lt;/p&gt;</content:encoded></item><item><title>2025 Annual Review: The Transformation Journey from Cloud Native to AI Native</title><link>https://jimmysong.io/blog/2025-annual-review/</link><pubDate>Wed, 31 Dec 2025 10:02:01 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/2025-annual-review/</guid><description>A look back at the major changes in 2025: shifting from Cloud Native to AI Native Infrastructure, AI tool ecosystem, and major website improvements.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The waves of technology keep evolving; only by actively embracing change can we continue to create value. In 2025, I chose to move from Cloud Native to AI Native—this year marked a key turning point for personal growth and system reinvention.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;2025 was a turning point for me. This year, I changed not only my technical direction but also the way I approach problems. Moving from Cloud Native infrastructure to AI Native Infrastructure was not just a migration of content, but an upgrade in mindset.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/2025-annual-review/banner.webp" data-img="https://assets.jimmysong.io/images/blog/2025-annual-review/banner.webp" alt="Figure 1: Farewell 2025!" data-caption="Figure 1: Farewell 2025!"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Farewell 2025!&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This year, I conducted a large-scale refactoring of the website and systematically organized the content. Beyond the technical improvements, I want to share my thoughts and changes throughout the year.&lt;/p&gt;
&lt;h2 id="a-bold-shift-embracing-the-ai-native-era"&gt;A Bold Shift: Embracing the AI Native Era&lt;/h2&gt;
&lt;p&gt;At the beginning of 2025, I made an important decision: to reposition myself from a Cloud Native Evangelist to an AI Infrastructure Architect. This was not just a change in title, but a strategic transformation after careful consideration.&lt;/p&gt;
&lt;p&gt;As I witnessed the surge of AI technologies and the rise of Agent-based applications reshaping software, I realized that clinging to the boundaries of Cloud Native might mean missing an era. So, I systematically adjusted the website’s content structure, shifting the focus toward AI Native Infrastructure.&lt;/p&gt;
&lt;p&gt;This transformation was not about abandoning the past, but extending forward from the foundation of Cloud Native. Classic content like Kubernetes and Istio remains and is continuously updated, but new topics such as AI Agent and the AI OSS landscape have been added, forming a more complete knowledge map.&lt;/p&gt;
&lt;h2 id="content-creation-from-technical-details-to-ecosystem-perspective"&gt;Content Creation: From Technical Details to Ecosystem Perspective&lt;/h2&gt;
&lt;h3 id="ai-agent-building-systematic-knowledge"&gt;AI Agent: Building Systematic Knowledge&lt;/h3&gt;
&lt;p&gt;Agents represent a major evolution in software for the AI era. When I tried to understand Agent design principles, I found fragmented information everywhere but lacked a systematic knowledge base.&lt;/p&gt;
&lt;p&gt;So I created content that analyzes the Agent context lifecycle and control loop mechanisms, summarizing several proven architectural patterns. To make complex knowledge easier to digest, I organized it into logical sections so readers can learn step by step.&lt;/p&gt;
&lt;h3 id="ai-tool-ecosystem-mapping-the-open-source-landscape"&gt;AI Tool Ecosystem: Mapping the Open Source Landscape&lt;/h3&gt;
&lt;p&gt;AI tools and frameworks are emerging rapidly, with new projects appearing daily. To help readers quickly grasp the ecosystem, I built a comprehensive AI OSS database.&lt;/p&gt;
&lt;p&gt;This database covers everything from Agent frameworks to development tools and deployment services. I not only included active projects but also established an archive mechanism, preserving detailed information on over 150 historical projects. More importantly, I developed a scoring system to objectively evaluate projects across dimensions like quality and sustainability, helping readers decide which tools are worth investing time in.&lt;/p&gt;
&lt;h3 id="blogging-capturing-technology-trends-faster"&gt;Blogging: Capturing Technology Trends Faster&lt;/h3&gt;
&lt;p&gt;In 2025, I wrote over 120 blog posts. Compared to previous years, these articles focused more on observing and reflecting on technology trends, rather than just technical tutorials.&lt;/p&gt;
&lt;p&gt;I started paying attention to deeper questions: How will AI infrastructure evolve? What does Beijing’s open source initiative mean for the AI industry? What ripple effects might a tech acquisition trigger? These articles allowed me and my readers to not only see &amp;ldquo;what&amp;rdquo; technology is, but also &amp;ldquo;why&amp;rdquo; and &amp;ldquo;what’s next.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="user-experience-making-knowledge-easier-to-discover-and-consume"&gt;User Experience: Making Knowledge Easier to Discover and Consume&lt;/h2&gt;
&lt;p&gt;No matter how good the content is, if it can’t be easily found and read, its value is greatly diminished. In 2025, I invested significant effort into website functionality, with one goal: to provide readers with a smoother reading experience.&lt;/p&gt;
&lt;h3 id="comprehensive-search-upgrade"&gt;Comprehensive Search Upgrade&lt;/h3&gt;
&lt;p&gt;As the volume of content grew, the original search function could no longer meet demand. I redesigned the search system to support fuzzy search and result scoring, and optimized index loading performance. More importantly, the new search interface is more user-friendly, supporting keyboard navigation and category filtering so users can find what they want faster.&lt;/p&gt;
&lt;h3 id="multi-device-experience-optimization"&gt;Multi-Device Experience Optimization&lt;/h3&gt;
&lt;p&gt;Mobile reading experience has improved significantly. I refactored the mobile navigation and table of contents, making reading on phones much smoother. Dark mode is now more refined, fixing several display issues and ensuring images and diagrams look good on dark backgrounds.&lt;/p&gt;
&lt;h3 id="efficiency-revolution-in-content-distribution"&gt;Efficiency Revolution in Content Distribution&lt;/h3&gt;
&lt;p&gt;A major change was optimizing the WeChat Official Account publishing workflow. Previously, publishing website content to WeChat required manual handling of many details; now, it’s almost one-click export. This workflow automatically processes images, metadata, styles, and all details, reducing a half-hour task to just a few minutes.&lt;/p&gt;
&lt;p&gt;Additionally, I added a glossary feature for technical term highlighting and tooltips; improved SEO and social sharing metadata; and cleaned up outdated content. These seemingly minor improvements quietly enhance the user experience.&lt;/p&gt;
&lt;h2 id="content-evolution-more-dimensional-knowledge-expression"&gt;Content Evolution: More Dimensional Knowledge Expression&lt;/h2&gt;
&lt;p&gt;Looking back at content creation in 2025, I found clear changes in several dimensions.&lt;/p&gt;
&lt;h3 id="from-tutorials-to-observations"&gt;From Tutorials to Observations&lt;/h3&gt;
&lt;p&gt;Early content leaned toward technical tutorials and practical guides, showing &amp;ldquo;how to do.&amp;rdquo; This year, I focused more on &amp;ldquo;why&amp;rdquo; and &amp;ldquo;what are the trends.&amp;rdquo; I wrote more technology trend analyses, ecosystem maps, and in-depth case studies. These may not directly teach you how to use an API, but they help you understand the direction of technological evolution.&lt;/p&gt;
&lt;h3 id="from-chinese-to-bilingual"&gt;From Chinese to Bilingual&lt;/h3&gt;
&lt;p&gt;AI is a global wave and cannot be limited to the Chinese-speaking world. In 2025, I wrote bilingual documentation for almost all new AI tools, and important blog posts also have English versions. This increased the workload, but allowed the content to reach a broader audience.&lt;/p&gt;
&lt;h3 id="from-text-to-multimedia"&gt;From Text to Multimedia&lt;/h3&gt;
&lt;p&gt;Text is efficient, but not all knowledge is best expressed in words. This year, I used many architecture and schematic diagrams to explain complex concepts, adding 59 new charts. These visual elements lower the barrier to understanding, making abstract concepts more intuitive. I also optimized image display in dark mode to ensure a consistent visual experience.&lt;/p&gt;
&lt;h2 id="development-approach-embracing-ai-assisted-programming"&gt;Development Approach: Embracing AI-Assisted Programming&lt;/h2&gt;
&lt;p&gt;2025 was not only a year of shifting content themes toward AI, but also a year of deep practice in AI-assisted programming.&lt;/p&gt;
&lt;p&gt;I developed a VS Code plugin and created many prompts to automate repetitive tasks. I experimented with various AI programming tools and settled on a toolchain that suits me. I even migrated the website to Cloudflare Pages and used its edge computing services to develop a chatbot. These practices greatly improved development efficiency, giving me more time to focus on thinking and creating rather than mechanical coding.&lt;/p&gt;
&lt;p&gt;This made me realize: AI will not replace developers, but developers who use AI well will replace those who do not. I also shared more insights to help others master AI-assisted programming.&lt;/p&gt;
&lt;h2 id="looking-ahead-to-2026-keep-moving-forward"&gt;Looking Ahead to 2026: Keep Moving Forward&lt;/h2&gt;
&lt;p&gt;Looking back at 2025, the site underwent a profound transformation—from a Cloud Native tech blog to an AI infrastructure knowledge base. But this is just the beginning, not the end.&lt;/p&gt;
&lt;p&gt;Looking forward to 2026, I plan to continue deepening in several areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Enhancing the knowledge system&lt;/strong&gt;: Continue to supplement GPU infrastructure and AI Agent content, especially practical cases and performance tuning knowledge.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tracking ecosystem evolution&lt;/strong&gt;: AI tools and frameworks iterate rapidly; I need to keep up with this fast-changing ecosystem and update content in a timely manner.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deepening engineering practice&lt;/strong&gt;: Share more practical AI engineering experience to help readers turn theory into practice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exploring knowledge connections&lt;/strong&gt;: Consider building a knowledge graph to connect different content sections, providing smarter navigation and recommendations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;2025 was a year of change and growth. From Cloud Native to AI Native, from technical practice to ecosystem observation, both the content and functionality of the site have made qualitative leaps.&lt;/p&gt;
&lt;p&gt;What makes me happiest is that this transformation allowed me and my readers to stand at the forefront of the technology wave. We are not just learning new technologies, but thinking about how technology changes the world and the way we write software.&lt;/p&gt;
&lt;p&gt;The waves of technology keep evolving; only by actively embracing change can we continue to create value. Thank you to every reader for your companionship and support. I look forward to sharing more insights and practices in 2026.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Further Reading&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/ai/"&gt;AI OSS Landscape&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/"&gt;2025 Blog Posts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>The Butterfly Effect After Manus Was Acquired by Meta</title><link>https://jimmysong.io/blog/manus-meta-acquisition-butterfly-effect/</link><pubDate>Tue, 30 Dec 2025 03:30:51 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/manus-meta-acquisition-butterfly-effect/</guid><description>Manus&amp;#39;s acquisition by Meta sparked polarized opinions. This article explores the butterfly effect in AI applications and key lessons for entrepreneurs on growth strategies.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The success or failure of AI applications often lies not in the technology itself, but in the ability to scale delivery and create a closed loop.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/manus-meta-acquisition-butterfly-effect/banner.webp" data-img="https://assets.jimmysong.io/images/blog/manus-meta-acquisition-butterfly-effect/banner.webp" alt="Figure 1: The Butterfly Effect After Manus Was Acquired by Meta" data-caption="Figure 1: The Butterfly Effect After Manus Was Acquired by Meta"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: The Butterfly Effect After Manus Was Acquired by Meta&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="when-those-who-discuss-it-are-not-those-who-pay-for-it"&gt;When &amp;ldquo;Those Who Discuss It&amp;rdquo; Are Not &amp;ldquo;Those Who Pay for It&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;On December 30, 2025, a piece of news went viral: Manus was acquired by Meta for billions of dollars (&lt;a href="https://manus.im/blog/manus-joins-meta-for-next-era-of-innovation" target="_blank" rel="noopener"&gt;Manus Joins Meta for Next Era of Innovation&lt;/a&gt;). This startup, founded in China and under pressure from tech giants since its inception, completed a whirlwind journey in less than a year—from explosive growth, relocating to Singapore, to being acquired by a global giant.&lt;/p&gt;
&lt;p&gt;According to Manus&amp;rsquo;s official statement, its products and subscriptions will continue to be available via the app and website, and the company will remain operational in Singapore. The team will join Meta to provide general Agent capabilities for Meta&amp;rsquo;s consumer and enterprise products (including Meta AI).&lt;/p&gt;
&lt;p&gt;Rather than focusing on &amp;ldquo;who won,&amp;rdquo; I&amp;rsquo;m more interested in the chain reaction this event triggered: it activated completely opposite judgment systems among different groups, and this split is reshaping the growth paths and strategies for AI applications and startups.&lt;/p&gt;
&lt;h2 id="two-public-opinion-arenas-blessings-and-doubts-coexist"&gt;Two Public Opinion Arenas: Blessings and Doubts Coexist&lt;/h2&gt;
&lt;p&gt;After Manus was acquired, the mainstream sentiment in social circles was one of congratulations and excitement. Many saw it as a stellar example of a Chinese team going global—achieving remarkable results in the most competitive field in a very short time.&lt;/p&gt;
&lt;p&gt;Meanwhile, the comment sections of public accounts became &amp;ldquo;venting valves for counter-narratives,&amp;rdquo; with skepticism centering on three main points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether the technology has real barriers (e.g., &amp;ldquo;there are countless similar products,&amp;rdquo; &amp;ldquo;it&amp;rsquo;s not hard for big companies to build their own&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Valuation and bubble concerns (e.g., &amp;ldquo;another case of the AI bubble&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Distrust in the buyer&amp;rsquo;s judgment (e.g., &amp;ldquo;giants making desperate bets,&amp;rdquo; &amp;ldquo;history repeating itself&amp;rdquo;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This divergence isn&amp;rsquo;t about who understands AI better, but about different evaluation frameworks: social circles focus on &amp;ldquo;trajectory and outcome,&amp;rdquo; while comment sections focus on &amp;ldquo;legitimacy and worthiness.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="where-does-the-100m-arr-come-from-the-target-users-arent-in-our-social-circles"&gt;Where Does the $100M ARR Come From: The Target Users Aren&amp;rsquo;t in Our Social Circles&lt;/h2&gt;
&lt;p&gt;Many people&amp;rsquo;s impression of Manus comes from its marketing buzz and controversies, which can easily breed skepticism. But if it achieved a &amp;ldquo;strict $100M ARR&amp;rdquo; in 10 months, one fact is clear: &lt;strong&gt;its revenue doesn&amp;rsquo;t depend on broad consensus, but comes from a highly concentrated group of global users with strong willingness to pay.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Manus&amp;rsquo;s core user profile is closer to &amp;ldquo;individuals as production units,&amp;rdquo; including freelancers, indie developers, independent researchers, and key deliverers in small and medium businesses. They don&amp;rsquo;t care about debates over &amp;ldquo;wrapping&amp;rdquo; or not; they care about &amp;ldquo;can I deliver end-to-end tasks,&amp;rdquo; and &amp;ldquo;can this help me hire one less person, work fewer late nights, or avoid juggling ten tools.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This leads to a counterintuitive phenomenon: &lt;strong&gt;those who discuss the most may not pay, while those who pay steadily are often silent.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For these users, tools are not identity badges—they are profit levers.&lt;/p&gt;
&lt;h2 id="three-lessons-for-entrepreneurs-the-growth-paradigm-in-the-ai-application-era-has-changed"&gt;Three Lessons for Entrepreneurs: The Growth Paradigm in the AI Application Era Has Changed&lt;/h2&gt;
&lt;p&gt;Based on the above, the Manus case offers three lessons for entrepreneurs:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Growth No Longer Equals Positive Reviews&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;AI applications can commercialize first and build consensus later. Public opinion can remain divided for a long time, but cash flow doesn&amp;rsquo;t wait for unified recognition.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Heavy Marketing&amp;rdquo; Is Becoming a Capability, Not a Stigma&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As foundational models and capabilities spread rapidly, differentiation is quickly erased. Being seen, understood, and paid for is itself part of the moat. Not all marketing deserves respect, but &amp;ldquo;distribution and mindshare&amp;rdquo; have become unavoidable battlegrounds for AI applications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Globalization Is No Longer a Bonus, but May Be a Survival Strategy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From willingness to pay and compliance boundaries to talent density and valuation systems, the structure of the market means many teams &amp;ldquo;can only complete the loop overseas.&amp;rdquo; It&amp;rsquo;s not romantic, but it&amp;rsquo;s reality.&lt;/p&gt;
&lt;h2 id="a-personal-reflection"&gt;A Personal Reflection&lt;/h2&gt;
&lt;p&gt;As someone long engaged in cloud native and AI infrastructure, I&amp;rsquo;m used to evaluating products by their &amp;ldquo;technical barriers.&amp;rdquo; But cases like Manus remind me: at the AI application layer, barriers may not first appear in models or code, but often in &lt;strong&gt;organizational speed, productization capability, delivery loop, and distribution efficiency&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When a system can reliably turn &amp;ldquo;capability&amp;rdquo; into &amp;ldquo;results,&amp;rdquo; it has built a commercial moat—even if its tech stack doesn&amp;rsquo;t meet outsiders&amp;rsquo; ideals of &amp;ldquo;purity.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The biggest butterfly effect of Manus being acquired by Meta may not be the deal itself, but making more entrepreneurs realize: &lt;strong&gt;in the AI era, the winning move is shifting from &amp;ldquo;what model you use&amp;rdquo; to &amp;ldquo;whether you can deliver results at scale.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The acquisition of Manus by Meta is not just a convergence of capital and technology, but also a microcosm of the changing growth paradigm in the AI application era. For entrepreneurs, understanding and mastering &amp;ldquo;user structure,&amp;rdquo; &amp;ldquo;distribution capability,&amp;rdquo; and &amp;ldquo;global closed loops&amp;rdquo; will be key to future competition.&lt;/p&gt;</content:encoded></item><item><title>AI Infra Open Source in China: Analysis of Beijing and Shanghai's Plans</title><link>https://jimmysong.io/blog/beijing-open-source-plan-ai-infra-analysis/</link><pubDate>Thu, 25 Dec 2025 10:01:13 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/beijing-open-source-plan-ai-infra-analysis/</guid><description>Beijing and Shanghai&amp;#39;s open source plans reveal opportunities and challenges for China&amp;#39;s AI infrastructure, balancing technology and governance.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Institutionalized open source marks a new starting point for China&amp;rsquo;s AI Infra, but true breakthroughs and risks lie in the engineering and governance details.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="perspective-on-beijing-and-shanghais-open-source-plans"&gt;Perspective on Beijing and Shanghai&amp;rsquo;s Open Source Plans&lt;/h2&gt;
&lt;p&gt;Using the simultaneous release of open source ecosystem plans by Beijing and Shanghai as a lens, and drawing on China&amp;rsquo;s past foundation practices and international open source governance experience, this article explores the real opportunities, structural constraints, and potential risks as AI infrastructure (AI Infra) enters a new phase of institutionalized open source.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/beijing-open-source-plan-ai-infra-analysis/banner.webp" data-img="https://assets.jimmysong.io/images/blog/beijing-open-source-plan-ai-infra-analysis/banner.webp" alt="Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans" data-caption="Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="why-compare-beijing-and-shanghai-together"&gt;Why Compare Beijing and Shanghai Together&lt;/h2&gt;
&lt;p&gt;It is rare for me to write an article solely because of a local policy document. However, during Christmas, both Beijing and Shanghai&amp;rsquo;s Bureaus of Economy and Information Technology released their respective open source ecosystem construction plans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mp.weixin.qq.com/s/9YEL1HORWatsol3nRT596w" target="_blank" rel="noopener"&gt;Building an Open Source Innovation Highland! Beijing Releases Open Source Ecosystem Construction Implementation Plan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mp.weixin.qq.com/s/QZl66fUllKiePwQ7euhiGQ" target="_blank" rel="noopener"&gt;Shanghai&amp;rsquo;s Implementation Plan for Strengthening the Open Source System | Infographic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This time, the fact that both cities released their plans on the same day sends a signal worth serious attention: China is attempting to advance open source in a more systematic and institutionalized way, especially regarding open source capabilities related to AI Infra.&lt;/p&gt;
&lt;p&gt;If you only look at Beijing&amp;rsquo;s plan, it is easy to interpret it as a local industrial policy upgrade. But when you consider both Beijing and Shanghai&amp;rsquo;s plans together, it looks more like a clearly defined &amp;ldquo;dual-center structure.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The question is no longer whether to develop open source, but:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the AI era, what institutional forms, engineering paths, and governance models will open source take?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="open-source-as-industrial-infrastructure-engineering"&gt;Open Source as &amp;ldquo;Industrial Infrastructure Engineering&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Both Beijing and Shanghai&amp;rsquo;s plans reflect a highly consistent judgment:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Open source is no longer seen as a spontaneous community activity, but as an industrial infrastructure capability that requires systematic construction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is especially evident in the field of AI Infra.&lt;/p&gt;
&lt;p&gt;Issues such as computing power scheduling, model evaluation, toolchains, data elements, license compliance, and supply chain security—previously hidden in &amp;ldquo;engineering details&amp;rdquo;—are now systematically incorporated into policy language for the first time. This at least shows that decision-makers have realized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI competition is not only about model parameter scale&lt;/li&gt;
&lt;li&gt;It is even more about toolchains, infrastructure, evaluation systems, and engineering capabilities&lt;/li&gt;
&lt;li&gt;These capabilities are naturally more suitable for building public foundations through open source&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this respect, Beijing and Shanghai are highly aligned.&lt;/p&gt;
&lt;h2 id="two-open-source-paths-infra-vs-platform"&gt;Two Open Source Paths: Infra vs. Platform&lt;/h2&gt;
&lt;p&gt;When we zoom in, the differences between the two plans become clear.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Beijing: &amp;ldquo;Foundation-Oriented&amp;rdquo; Open Source Path for AI Infra&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Beijing&amp;rsquo;s plan focuses on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Heterogeneous computing power scheduling&lt;/li&gt;
&lt;li&gt;Model evaluation toolchains&lt;/li&gt;
&lt;li&gt;Data elements and data governance&lt;/li&gt;
&lt;li&gt;RISC-V software-hardware collaboration&lt;/li&gt;
&lt;li&gt;SBOM, license compatibility, open source compliance&lt;/li&gt;
&lt;li&gt;Supply chain security and industrial resilience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a typical perspective of &amp;ldquo;treating AI as an infrastructure problem.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It is less concerned with the number of projects or community size, and more with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether reusable engineering capabilities can be formed&lt;/li&gt;
&lt;li&gt;Whether these can be trusted by industry and government over the long term&lt;/li&gt;
&lt;li&gt;Whether they can stand up to scrutiny in terms of security, compliance, and governance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To some extent, Beijing is answering the question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How can open source become a &amp;ldquo;governable, auditable, and scalable public capability&amp;rdquo;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Shanghai: &amp;ldquo;Scale and Internationalization&amp;rdquo; Path for AI Platform&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In contrast, Shanghai&amp;rsquo;s plan has a different focus:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building an international open source community for artificial intelligence&lt;/li&gt;
&lt;li&gt;Covering the entire platform chain from development, training, testing, hosting, to operation&lt;/li&gt;
&lt;li&gt;Overseas sites, multilingual support, international activities&lt;/li&gt;
&lt;li&gt;Resource linkage through computing vouchers and model vouchers&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Open source platform first release / global simultaneous release&amp;rdquo; dual-release mechanism&lt;/li&gt;
&lt;li&gt;Clear targets for community, enterprise, and developer scale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Shanghai cares more about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How open source can achieve scale effects&lt;/li&gt;
&lt;li&gt;How it can support the growth of commercial enterprises&lt;/li&gt;
&lt;li&gt;How it can be seen and adopted globally&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a path of &amp;ldquo;treating open source as a global digital product and platform capability.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="together-a-complete-but-tension-filled-structure"&gt;Together: A Complete but Tension-Filled Structure&lt;/h2&gt;
&lt;p&gt;When viewed together, Beijing and Shanghai&amp;rsquo;s plans form a more complete picture:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Beijing is responsible for &amp;ldquo;making open source solid,&amp;rdquo; while Shanghai is responsible for &amp;ldquo;taking open source global.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Structurally, this is a clear division of labor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Beijing focuses on institutions, governance, and foundational capabilities&lt;/li&gt;
&lt;li&gt;Shanghai focuses on community, commercialization, and international communication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two paths are not in conflict; in theory, they are even complementary. The real question is whether they can form positive feedback in practice, rather than operating in silos.&lt;/p&gt;
&lt;h2 id="cautious-attitude-toward-institutionalized-platformized-open-source"&gt;Cautious Attitude Toward &amp;ldquo;Institutionalized, Platformized Open Source&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Precisely because both plans are so &amp;ldquo;systematic,&amp;rdquo; I am even more cautious.&lt;/p&gt;
&lt;p&gt;The reason is simple: this is not China&amp;rsquo;s first attempt to promote open source through foundations, associations, or platforms.&lt;/p&gt;
&lt;p&gt;Over the past decade, we have seen similar paths repeatedly, and recurring structural problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Neutrality and multi-party trust are extremely difficult to establish&lt;/li&gt;
&lt;li&gt;There is a huge gap between showcase metrics (quantity, activities, certifications) and ecosystem strength&lt;/li&gt;
&lt;li&gt;Commercialization and long-term maintenance mechanisms are hard to sustain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems will not disappear just because the plans are more comprehensive.&lt;/p&gt;
&lt;h2 id="four-risks-to-watch-under-the-dual-plans"&gt;Four Risks to Watch Under the Dual Plans&lt;/h2&gt;
&lt;p&gt;If we are to &amp;ldquo;listen to their words and watch their actions,&amp;rdquo; I would focus on the following four risks:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Will Metrics Hijack Engineering Reality&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When &amp;ldquo;internationally influential projects,&amp;rdquo; &amp;ldquo;star projects,&amp;rdquo; and &amp;ldquo;first-release projects&amp;rdquo; become hard metrics, will this induce packaging, migration, and short-term hype, rather than truly solving engineering problems?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Will It Slide Toward Platform Centralism&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The long-term pattern of AI Infra is closer to a model that prioritizes protocols, standards, and interoperability. If it eventually evolves into &amp;ldquo;a few platforms concentrating resources and discourse power,&amp;rdquo; it may be efficient in the short term but will suppress external participation and international collaboration in the long run.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Is Internationalization Underestimated as an &amp;ldquo;Operational Issue&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;True international collaboration is never just about language, sites, or events; it also involves governance structures, compliance boundaries, and supply chain trust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Will Application Demonstrations Become One-Off Projects&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If &amp;ldquo;first plans&amp;rdquo; and &amp;ldquo;computing vouchers&amp;rdquo; are just procurement tactics without continuous iteration and community feedback mechanisms, the long-term benefit to the ecosystem will be very limited.&lt;/p&gt;
&lt;h2 id="what-are-the-hard-results-of-ai-infra-open-source-after-three-years"&gt;What Are the &amp;ldquo;Hard Results&amp;rdquo; of AI Infra Open Source After Three Years&lt;/h2&gt;
&lt;p&gt;If we review the success of this round of institutionalized open source after three years, I would look for three types of results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether de facto standards and interoperable ecosystems have emerged, including scheduling interfaces, evaluation benchmarks, Agent tool invocation protocols, and observability semantics.&lt;/li&gt;
&lt;li&gt;Whether compliance and supply chain security have become public capabilities—SBOM, license compatibility, vulnerability monitoring—truly productized and service-oriented.&lt;/li&gt;
&lt;li&gt;Whether a sustainable maintenance business mechanism has been established, allowing core maintainers to stay long-term, rather than relying on passion and subsidies.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;If I were to use a North Star metric to measure the success of these plans, it would be the emergence of several outstanding open source commercial companies rooted in China and serving the world.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The open source ecosystem plans of Beijing and Shanghai mark a new phase of institutionalization and engineering for AI Infra open source in China. Over the next three years, the real achievements will not be about meeting targets, but about forming sustainable engineering capabilities, de facto standards, and maintenance mechanisms. Only through continuous participation and practice can open source become the public foundation of AI infrastructure.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jxj.beijing.gov.cn/zwgk/2024zcwj/202512/t20251224_4360437.html" target="_blank" rel="noopener"&gt;Beijing Open Source Ecosystem Construction Implementation Plan (2026–2028) - jxj.beijing.gov.cn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mp.weixin.qq.com/s/QZl66fUllKiePwQ7euhiGQ" target="_blank" rel="noopener"&gt;Shanghai&amp;rsquo;s Implementation Plan for Strengthening the Open Source System | Infographic - mp.weixin.qq.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>From 2025 Onwards, Software Engineering Shifts from Code-Centric to Runtime and Cost-Centric</title><link>https://jimmysong.io/blog/software-engineering-shift-runtime-cost-2025/</link><pubDate>Wed, 24 Dec 2025 14:59:11 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/software-engineering-shift-runtime-cost-2025/</guid><description>In 2025, software engineering shifts from code-centric to runtime and cost governance. AI and Agents move complexity to runtime, compute, and budget layers, reshaping engineering value.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;In 2025, the core of software engineering is no longer just about code itself, but about runtime controllability and cost governance. This shift is fundamentally reshaping the industry&amp;rsquo;s underlying logic.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looking back at 2025, I became increasingly aware that this year was not about &amp;ldquo;code becoming unimportant,&amp;rdquo; but rather that &lt;strong&gt;the value coordinates of engineering have shifted as a whole&lt;/strong&gt;. For more than a decade, software engineering has focused on code quality, architectural evolution, and delivery efficiency. But starting in 2025, the key to system success is shifting—&lt;strong&gt;towards whether the runtime is controllable and whether costs are governable&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is not just a slogan, but a conclusion repeatedly validated by my real-world experiences throughout the year.&lt;/p&gt;
&lt;h2 id="my-2025-from-platform-engineering-to-runtime-challenges"&gt;My 2025: From &amp;ldquo;Platform Engineering&amp;rdquo; to &amp;ldquo;Runtime Challenges&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;In my annual review, I noted a clear change: I spent less time on &amp;ldquo;how to write a good system,&amp;rdquo; and more time on &amp;ldquo;how to keep the system running stably, reliably, and affordably.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This shift in focus is a natural extension of a decade of cloud native evolution.&lt;/p&gt;
&lt;p&gt;The following timeline diagram illustrates how my focus has changed over recent years:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/focus-shift-timeline-en.svg" data-img="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/focus-shift-timeline-en.svg" alt="Figure 1: My Focus Shift Timeline" data-caption="Figure 1: My Focus Shift Timeline"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: My Focus Shift Timeline&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;My focus shifted from cloud native platform engineering to LLM application engineering, then to AI infrastructure, and finally to Agentic Runtime with governance and cost control.&lt;/p&gt;
&lt;p&gt;When AI workloads truly enter business scenarios, the core challenges engineers face also change:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Are inference, training, and evaluation competing for the same compute pool?&lt;/li&gt;
&lt;li&gt;Is GPU utilization consistently below expectations?&lt;/li&gt;
&lt;li&gt;Does cost scale linearly and uncontrollably with concurrency?&lt;/li&gt;
&lt;li&gt;Does the system have failure isolation and replay capabilities?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These issues go far beyond the code level.&lt;/p&gt;
&lt;h2 id="industry-consensus-ai-is-shifting-the-focus-of-engineering"&gt;Industry Consensus: AI Is Shifting the Focus of Engineering&lt;/h2&gt;
&lt;p&gt;By 2025, an industry consensus is emerging: AI is rewriting software engineering. But the real change is not happening in the IDE or code completion speed—it is reflected in &lt;strong&gt;the migration of engineering complexity&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Previously, complexity was concentrated in code and interfaces, and problems were solved through abstraction, refactoring, and testing.&lt;/p&gt;
&lt;p&gt;Now, complexity has shifted to the runtime, resource, and cost layers, and must be addressed through scheduling, isolation, observability, and governance.&lt;/p&gt;
&lt;p&gt;This is why the same AI tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Serve as &amp;ldquo;accelerators&amp;rdquo; for junior engineers&lt;/li&gt;
&lt;li&gt;But act as &amp;ldquo;magnifiers&amp;rdquo; for senior engineers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What AI tools amplify is how well you truly understand how systems run in production.&lt;/p&gt;
&lt;h2 id="why-cost-becomes-a-first-principle"&gt;Why &amp;ldquo;Cost&amp;rdquo; Becomes a First Principle&lt;/h2&gt;
&lt;p&gt;In traditional cloud native systems, low CPU utilization is often just an efficiency issue; but in AI systems, &lt;strong&gt;low GPU utilization is often a cash flow problem&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In 2025, I repeatedly encountered scenarios like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Resources &amp;ldquo;seem insufficient,&amp;rdquo; but utilization is not actually high&lt;/li&gt;
&lt;li&gt;Scaling up to solve queuing issues ends up increasing unit costs&lt;/li&gt;
&lt;li&gt;The system lacks clear budget and quota boundaries, so throttling becomes the only way to stop the bleeding&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The root cause of these phenomena is not model selection, but &lt;strong&gt;the lack of a runtime and cost control plane tailored for AI workloads&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The following flowchart visually illustrates the cyclical relationship between GPU resources and cost pressures:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/gpu-cost-cycle-en.svg" data-img="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/gpu-cost-cycle-en.svg" alt="Figure 2: GPU Resource and Cost Cycle in AI Systems" data-caption="Figure 2: GPU Resource and Cost Cycle in AI Systems"
width="2263"
height="320"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: GPU Resource and Cost Cycle in AI Systems&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In AI systems, limited GPU supply leads to queuing and waiting, which causes throughput to drop. Attempts to solve this through blind scaling only increase unit costs and create budget pressure, ultimately forcing the adoption of finer scheduling and governance strategies.&lt;/p&gt;
&lt;p&gt;Engineering problems ultimately manifest as cost issues.&lt;/p&gt;
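&lt;p&gt;As a back-of-the-envelope sketch (all numbers below are hypothetical assumptions, not measurements), the link between GPU utilization and unit cost can be made concrete:&lt;/p&gt;

```python
# Illustrative sketch: how GPU utilization drives the unit cost of inference.
# The prices and throughput figures are hypothetical, not measurements.

def cost_per_request(gpu_hourly_price: float,
                     requests_per_hour_at_full_load: float,
                     utilization: float) -> float:
    """Effective cost of one request given fractional GPU utilization."""
    effective_throughput = requests_per_hour_at_full_load * utilization
    return gpu_hourly_price / effective_throughput

# A $4/hour GPU that could serve 1,000 requests/hour at full load:
low = cost_per_request(4.0, 1000, 0.25)   # 25% utilization
high = cost_per_request(4.0, 1000, 0.80)  # 80% utilization

print(f"cost/request at 25% utilization: ${low:.4f}")   # $0.0160
print(f"cost/request at 80% utilization: ${high:.4f}")  # $0.0050
```

&lt;p&gt;Raising utilization from 25% to 80% cuts the unit cost by more than 3x, which is exactly why blind scale-out (adding GPUs while utilization stays low) increases unit costs instead of lowering them.&lt;/p&gt;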
&lt;h2 id="the-rise-of-agents-the-real-challenge-is-at-runtime"&gt;The Rise of Agents: The Real Challenge Is at Runtime&lt;/h2&gt;
&lt;p&gt;In 2025, Agents became a hot topic; by 2026, they will enter the &amp;ldquo;can it actually run&amp;rdquo; stage.&lt;/p&gt;
&lt;p&gt;The challenge for Agents has never been about &amp;ldquo;how smart they are,&amp;rdquo; but rather:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether there are clear permission and data boundaries&lt;/li&gt;
&lt;li&gt;Whether they run in an isolated execution environment&lt;/li&gt;
&lt;li&gt;Whether they can be observed, evaluated, and replayed&lt;/li&gt;
&lt;li&gt;Whether they are subject to explicit cost and budget constraints&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities form the outline of the &lt;strong&gt;Agentic Runtime&lt;/strong&gt; that I have been trying to clarify throughout the year.&lt;/p&gt;
&lt;p&gt;The following flowchart shows the core capability layers of Agentic Runtime:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/agentic-runtime-layers-en.svg" data-img="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/agentic-runtime-layers-en.svg" alt="Figure 3: Agentic Runtime Capability Layers" data-caption="Figure 3: Agentic Runtime Capability Layers"
width="463"
height="983"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Agentic Runtime Capability Layers&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Agentic Runtime builds from the foundation of Agents and workflows, connecting through orchestration and tool protocols, with the runtime managing state, memory, and evaluation. It provides secure execution environments (Sandbox and Policy), and ultimately implements a resource and cost control plane that unifies GPU, quota, and billing management.&lt;/p&gt;
&lt;p&gt;Without a runtime, an Agent is just a demo; without cost constraints, an Agent is just a risk amplifier.&lt;/p&gt;
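&lt;p&gt;A minimal sketch of what an explicit cost and budget constraint can look like at runtime (the names and the flat per-step cost model are illustrative assumptions, not any real framework&amp;rsquo;s API):&lt;/p&gt;

```python
# Hypothetical sketch of a budget-constrained agent step loop.
# BudgetExceeded, run_agent, and the cost model are illustrative assumptions.

class BudgetExceeded(RuntimeError):
    pass

def run_agent(steps, budget_usd: float, cost_per_step_usd: float):
    """Execute agent steps until done or the budget is exhausted.

    Each step is a callable returning (result, done). Spend is checked
    *before* each step, so the budget is a hard ceiling, not a soft hint.
    """
    spent = 0.0
    trace = []  # audit trail: every step is recorded for replay/review
    for step in steps:
        if spent + cost_per_step_usd > budget_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} of ${budget_usd:.2f}")
        result, done = step()
        spent += cost_per_step_usd
        trace.append(result)
        if done:
            break
    return trace, spent
```

&lt;p&gt;The point of the sketch is the ordering: the budget check happens before the step runs, and every step lands in a trace, so the loop is both cost-bounded and replayable.&lt;/p&gt;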
&lt;h2 id="outlook-for-2026-the-foundation-of-engineering-matters-again"&gt;Outlook for 2026: The &amp;ldquo;Foundation&amp;rdquo; of Engineering Matters Again&lt;/h2&gt;
&lt;p&gt;Looking ahead to 2026, I remain cautiously optimistic.&lt;/p&gt;
&lt;p&gt;I do not believe the future belongs to &amp;ldquo;those who write the best prompts,&amp;rdquo; but more likely to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Those who understand runtime boundaries&lt;/li&gt;
&lt;li&gt;Those who can govern compute as a constrained resource&lt;/li&gt;
&lt;li&gt;Those who design AI systems as long-running systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From 2025 onwards, software engineering is no longer code-centric, but &lt;strong&gt;runtime and cost-centric&lt;/strong&gt;. This is not a regression, but a return: a return to being responsible for the whole system and for real-world constraints.&lt;/p&gt;
&lt;p&gt;For me personally, this is both a year-end summary and the direction I will continue to invest in for the coming years.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In 2025, the focus of software engineering has shifted from code itself to runtime and cost governance. The rise of AI and Agents has not diminished the value of engineering, but has pushed complexity to a higher level. In the future, understanding runtime, managing compute and cost will become the new core competencies for engineers. I hope this year-end review provides some inspiration and reflection for fellow professionals.&lt;/p&gt;</content:encoded></item><item><title>From Cloud Native to AI Native: Why Kubernetes Is the Foundation for Next-Gen AI Agents</title><link>https://jimmysong.io/blog/ai-native-from-cloud-native/</link><pubDate>Wed, 24 Dec 2025 12:25:52 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-native-from-cloud-native/</guid><description>Explores why AI Agents need Kubernetes infrastructure and how Agent orchestration, MCP services, and AI gateways enable production-ready AI architectures.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;As a long-time practitioner in the cloud native field, I am increasingly convinced of one thing: &lt;strong&gt;AI Agents are not just a change in application form, but a migration of infrastructure paradigms.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As artificial intelligence evolves from demos and copilots to systems that truly take on tasks and responsibilities, &lt;strong&gt;AI Agents&lt;/strong&gt; are becoming the new execution units in enterprise IT architectures. They not only &amp;ldquo;think,&amp;rdquo; but also &lt;strong&gt;act&lt;/strong&gt;: they can invoke tools, access systems, and collaborate to achieve goals.&lt;/p&gt;
&lt;p&gt;This raises an important question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What kind of infrastructure should such systems run on?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In my view, Kubernetes remains a solid choice for large-scale scenarios—but only if we &lt;strong&gt;reimagine Kubernetes in an AI-native way&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="cloud-native-challenges-for-production-grade-ai-agents"&gt;Cloud Native Challenges for Production-Grade AI Agents&lt;/h2&gt;
&lt;p&gt;In real production environments, AI Agents expose infrastructure needs that are fundamentally different from traditional microservices. Agents are not &amp;ldquo;just another HTTP service&amp;rdquo;; they have three distinct characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Behavior is non-deterministic&lt;/strong&gt; (driven by model inference)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execution paths are dynamic&lt;/strong&gt; (tool invocation cannot be fully enumerated in advance)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decisions must be auditable, constrained, and reviewable&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we simply apply existing cloud native infrastructure, we quickly hit bottlenecks.&lt;/p&gt;
&lt;p&gt;The following table summarizes the main challenges and risks AI Agents face in cloud native environments:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Challenge Category&lt;/th&gt;
&lt;th&gt;Real Needs of Agents&lt;/th&gt;
&lt;th&gt;What Happens If Missing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Policy &amp;amp; Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dynamic control of tool and data access based on context, identity, and task&lt;/td&gt;
&lt;td&gt;Agents have &amp;ldquo;superuser&amp;rdquo; privileges, risks are uncontrollable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not just &amp;ldquo;did it succeed,&amp;rdquo; but also &lt;strong&gt;why was this decision made&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hard to debug, hard to review, hard to hold accountable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Governance &amp;amp; Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Platform-level guardrails enforce organizational policies&lt;/td&gt;
&lt;td&gt;Each Agent could become a &amp;ldquo;shadow AI&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Challenges and Risks for AI Agents in Cloud Native Environments
&lt;/figcaption&gt;
&lt;p&gt;All these issues point to one conclusion:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI Agents must be treated as first-class citizens in Kubernetes, not just ordinary workloads.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="core-architecture-making-agents-native-kubernetes-objects"&gt;Core Architecture: Making Agents Native Kubernetes Objects&lt;/h2&gt;
&lt;p&gt;Looking back at the evolution of cloud native technologies, we&amp;rsquo;ve gone through similar stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Physical machines → Virtual machines&lt;/li&gt;
&lt;li&gt;Virtual machines → Containers&lt;/li&gt;
&lt;li&gt;Containers → Microservices&lt;/li&gt;
&lt;li&gt;Microservices → Declarative, governable platforms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI Agents are simply the next step.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A production-ready AI Agent architecture requires at least three layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Agent Orchestration Layer&lt;/strong&gt;: Declaratively define Agents&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool Service-ization Layer (MCP Services)&lt;/strong&gt;: Turn capabilities into governable services&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Native Data Plane / Gateway&lt;/strong&gt;: Unify policy, security, and protocols&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="agent-orchestration-layer-declarative-agent-management"&gt;Agent Orchestration Layer: Declarative Agent Management&lt;/h2&gt;
&lt;p&gt;Agents should no longer be &amp;ldquo;runtime objects&amp;rdquo; inside an SDK—they should be managed like Pods or Deployments.&lt;/p&gt;
&lt;p&gt;Key concepts:&lt;/p&gt;
&lt;h3 id="agents-as-kubernetes-resources"&gt;Agents as Kubernetes Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Agents are defined using &lt;strong&gt;CRD (CustomResourceDefinition)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Lifecycle managed via &lt;code&gt;kubectl&lt;/code&gt; or GitOps&lt;/li&gt;
&lt;li&gt;Agent &lt;strong&gt;models, tools, and policies&lt;/strong&gt; are all explicitly declared&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A typical Agent definition includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Agent logic&lt;/strong&gt; (inference loop)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model configuration&lt;/strong&gt; (specifying which large language model to use)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Callable toolset&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;This closely mirrors how we once decomposed &amp;ldquo;applications&amp;rdquo; into Deployments, Services, and ConfigMaps.&lt;/p&gt;
&lt;/blockquote&gt;
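&lt;p&gt;To make this concrete, a hypothetical Agent custom resource might look like the following. The group, version, kind, and every field below are purely illustrative assumptions, not any shipping project&amp;rsquo;s API:&lt;/p&gt;

```yaml
# Hypothetical Agent custom resource -- illustrative only.
apiVersion: agents.example.com/v1alpha1
kind: Agent
metadata:
  name: support-triage
spec:
  model:
    provider: openai        # which LLM backs the inference loop
    name: gpt-4o
  tools:                    # explicitly declared, governable toolset
    - mcpServerRef: ticketing-mcp
    - mcpServerRef: knowledge-base-mcp
  policy:
    maxStepsPerTask: 20     # hard runtime guardrails
    budget:
      usdPerDay: 50
```

&lt;p&gt;Whatever the eventual schema, the structural idea is the same: the model, the toolset, and the policy are declared up front and versioned in Git, rather than assembled implicitly at runtime inside an SDK.&lt;/p&gt;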
&lt;h2 id="tool-service-ization-layer-mcp-services-are-essential"&gt;Tool Service-ization Layer: MCP Services Are Essential&lt;/h2&gt;
&lt;p&gt;In Agent architectures, &lt;strong&gt;tools&lt;/strong&gt; are where real &amp;ldquo;actions&amp;rdquo; happen.&lt;/p&gt;
&lt;p&gt;Early MCP tools were often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local processes&lt;/li&gt;
&lt;li&gt;Tightly coupled to a single Agent&lt;/li&gt;
&lt;li&gt;Lacking versioning, permissions, and auditing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is unsustainable in enterprise environments.&lt;/p&gt;
&lt;h3 id="the-essence-of-mcp-service-ization"&gt;The Essence of MCP Service-ization&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Tools → &lt;strong&gt;Remote services&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Services → &lt;strong&gt;Kubernetes native workloads&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Capabilities → &lt;strong&gt;Reusable, governable, auditable&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This step is fundamentally similar to how we once turned scripts into microservices.&lt;/p&gt;
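&lt;p&gt;The idea can be sketched as a tool exposed behind a plain HTTP endpoint with per-call auditing. This is a deliberately minimal illustration of &amp;ldquo;tool as a governable remote service,&amp;rdquo; not the MCP protocol or any real MCP SDK:&lt;/p&gt;

```python
# Minimal illustration of "tool as a governable remote service":
# one tool capability behind an HTTP endpoint, with every call audited.
# This sketches the idea only; it is not the MCP wire protocol.
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("tool-audit")

def lookup_order(order_id: str) -> dict:
    """The actual tool capability (stubbed for illustration)."""
    return {"order_id": order_id, "status": "shipped"}

class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        caller = self.headers.get("X-Agent-Id", "unknown")
        # Every invocation is attributed to a caller and logged for review.
        audit.info("caller=%s tool=lookup_order args=%s", caller, body)
        payload = json.dumps(lookup_order(body["order_id"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve: HTTPServer(("", 8080), ToolHandler).serve_forever()
```

&lt;p&gt;Once a tool is a service like this, versioning, authorization, and auditing stop being per-Agent afterthoughts and become ordinary platform concerns.&lt;/p&gt;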
&lt;h2 id="ai-native-gateway-the-control-plane-entry-for-the-agent-world"&gt;AI Native Gateway: The &amp;ldquo;Control Plane Entry&amp;rdquo; for the Agent World&lt;/h2&gt;
&lt;p&gt;As the number of Agents grows and tools/models diversify, &lt;strong&gt;connectivity itself becomes a system risk&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Traditional API Gateways do not understand scenarios like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP&lt;/li&gt;
&lt;li&gt;Agent-to-Agent (A2A) communication&lt;/li&gt;
&lt;li&gt;Model invocation context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thus, we need an &lt;strong&gt;AI native gateway&lt;/strong&gt; dedicated to mediation and governance.&lt;/p&gt;
&lt;p&gt;It must understand at least three types of traffic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A2T&lt;/strong&gt;: Agent → Tool&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A2L&lt;/strong&gt;: Agent → LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A2A&lt;/strong&gt;: Agent ↔ Agent&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And enforce, across these paths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identity and authorization&lt;/li&gt;
&lt;li&gt;Policy and guardrails&lt;/li&gt;
&lt;li&gt;Auditing and rate limiting&lt;/li&gt;
&lt;/ul&gt;
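&lt;p&gt;The following sketch shows the shape of such an admission check. The three traffic types come from the text above; the policy table and the admit function are simplified assumptions, not a real gateway API:&lt;/p&gt;

```python
# Illustrative gateway admission check: identity, rate limiting, and an
# audit trail applied per traffic type. Numbers are placeholders.

POLICIES = {
    "A2T": {"requires_auth": True, "rate_limit_per_min": 600},  # Agent to Tool
    "A2L": {"requires_auth": True, "rate_limit_per_min": 120},  # Agent to LLM
    "A2A": {"requires_auth": True, "rate_limit_per_min": 300},  # Agent to Agent
}

def admit(traffic_type, identity, calls_this_minute, audit_log):
    """Apply identity, rate-limit, and audit policy to one call."""
    policy = POLICIES[traffic_type]
    if policy["requires_auth"] and identity is None:
        audit_log.append((traffic_type, identity, "denied: missing identity"))
        return False
    remaining = policy["rate_limit_per_min"] - calls_this_minute
    if max(0, remaining) == 0:  # true once the per-minute budget is spent
        audit_log.append((traffic_type, identity, "denied: rate limited"))
        return False
    audit_log.append((traffic_type, identity, "allowed"))
    return True

log = []
print(admit("A2L", "billing-agent", calls_this_minute=3, audit_log=log))
```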
&lt;h2 id="architecture-overview"&gt;Architecture Overview&lt;/h2&gt;
&lt;p&gt;The diagram below illustrates the core layers and traffic paths of an AI-native system on Kubernetes:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-native-from-cloud-native/5be5cb784d4b228006abdf024bb99d6f.svg" data-img="https://assets.jimmysong.io/images/blog/ai-native-from-cloud-native/5be5cb784d4b228006abdf024bb99d6f.svg" alt="Figure 1: AI Native Architecture Layers and Traffic Paths" data-caption="Figure 1: AI Native Architecture Layers and Traffic Paths"
width="1311"
height="1642"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Native Architecture Layers and Traffic Paths&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;AI Agents do not negate cloud native; on the contrary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI Agents are the natural extension of cloud native in the era of intelligence.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Declarative → Agent definitions&lt;/li&gt;
&lt;li&gt;Service → MCP Services&lt;/li&gt;
&lt;li&gt;Service Mesh → AI Native Gateway&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If Kubernetes is the &amp;ldquo;automated factory,&amp;rdquo; then AI Agents are the &lt;strong&gt;intelligent workers who actually get things done&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;And the AI native gateway is the &lt;strong&gt;security and governance system tailored for these intelligent workers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is not an optional architecture—it is &lt;strong&gt;the only path for AI to reach production&lt;/strong&gt;.&lt;/p&gt;</content:encoded></item><item><title>AI Open Source Landscape: A One-Stop Guide to AI Project Navigation and Scoring System</title><link>https://jimmysong.io/blog/ai-oss-landscape-intro/</link><pubDate>Tue, 23 Dec 2025 08:34:05 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-oss-landscape-intro/</guid><description>Comprehensive introduction to the AI Open Source Landscape&amp;#39;s positioning, interface, scoring model, and data mechanisms to help developers efficiently discover quality AI projects.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The AI Open Source Landscape is not just a project directory, but an innovative attempt to bring transparency and quantifiability to the AI open source ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note: This article is intended for general readers and focuses on platform features and usage scenarios. If you want to see the technical details and formulas behind the scoring, please refer to:&lt;/strong&gt; &lt;a href="https://jimmysong.io/ai/ranking-criteria/"&gt;AI Project Scoring and Inclusion Criteria&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="project-background-and-positioning"&gt;Project Background and Positioning&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://jimmysong.io/ai/"&gt;AI Open Source Landscape&lt;/a&gt; aims to provide developers, researchers, and enterprise users with a one-stop navigation and evaluation platform for AI open source projects. With the rapid development of large language models (LLMs), multimodal models, and other AI technologies, the open source community has seen a surge of innovative projects. However, information is scattered and quality varies, making it difficult for users to filter and make decisions.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/ai-oss-landscape.webp" data-img="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/ai-oss-landscape.webp" alt="Figure 1: AI Open Source Landscape" data-caption="Figure 1: AI Open Source Landscape"
width="3653"
height="2494"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Open Source Landscape&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The AI Open Source Landscape systematically collects mainstream AI open source projects; as of this writing, it includes 851 of them. It pairs this catalog with a multi-dimensional scoring system to help users efficiently discover, compare, and select the AI tools and frameworks best suited to their needs. The platform covers not only models themselves but also datasets, inference engines, evaluation tools, application frameworks, and the wider ecosystem chain, aiming to promote transparency, quantifiability, and sustainable development in the AI open source ecosystem.&lt;/p&gt;
&lt;h2 id="main-interface-and-feature-highlights"&gt;Main Interface and Feature Highlights&lt;/h2&gt;
&lt;p&gt;The platform homepage presents project distribution in both landscape and list views, supporting category filtering, keyword search, and tag navigation to help users quickly locate target projects.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/project-details.webp" data-img="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/project-details.webp" alt="Figure 2: Open Source Project Detail Page" data-caption="Figure 2: Open Source Project Detail Page"
width="2780"
height="2915"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Open Source Project Detail Page&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For general readers, the main experience points include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Card view: One-sentence overview, star rating, and overall score for quick browsing and comparison.&lt;/li&gt;
&lt;li&gt;Health card: Displays overall health and key dimensions (activity, community, influence, sustainability) on the project page or sidebar, with the latest update marked for easy assessment of maintenance status.&lt;/li&gt;
&lt;li&gt;Detail page: Provides more background information, project links, and application scenarios to help you evaluate suitability for your needs.&lt;/li&gt;
&lt;li&gt;Smart badges: Visually display labels such as &amp;ldquo;Active&amp;rdquo;, &amp;ldquo;New Project&amp;rdquo;, &amp;ldquo;Popular&amp;rdquo;, &amp;ldquo;Archived&amp;rdquo; on cards, helping you quickly capture key project features.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are interested in the specific rules for badge determination or scoring, detailed explanations are available on the &lt;a href="https://jimmysong.io/ai/ranking-criteria/"&gt;Scoring Rules Page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="scoring-and-ranking-mechanism"&gt;Scoring and Ranking Mechanism&lt;/h2&gt;
&lt;p&gt;The platform uses multi-dimensional scores to reflect the overall health and popularity of projects. The main dimensions include: &lt;strong&gt;Activity&lt;/strong&gt;, &lt;strong&gt;Community&lt;/strong&gt;, &lt;strong&gt;Quality&lt;/strong&gt;, &lt;strong&gt;Sustainability&lt;/strong&gt;, and the comprehensive &lt;strong&gt;Health&lt;/strong&gt; score. These scores help you quickly judge whether a project is suitable for production or experimentation.&lt;/p&gt;
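&lt;p&gt;Purely as an illustration of how a composite score can work (the platform&amp;rsquo;s actual formula and weights are documented on its scoring rules page), a weighted average over the named dimensions might look like this:&lt;/p&gt;

```python
# Illustration only: assumed weights, not the platform's real formula.

WEIGHTS = {
    "activity": 0.30,
    "community": 0.25,
    "quality": 0.25,
    "sustainability": 0.20,
}

def health_score(scores):
    # Each dimension score is assumed to be on a 0-100 scale;
    # the weights above sum to 1.0, so the result stays on that scale.
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

print(health_score({"activity": 80, "community": 70,
                    "quality": 90, "sustainability": 60}))
```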
&lt;h2 id="data-sources-and-update-mechanism"&gt;Data Sources and Update Mechanism&lt;/h2&gt;
&lt;p&gt;The platform&amp;rsquo;s data mainly comes from GitHub, project lists, official documentation, and community recommendations. We regularly and automatically synchronize and update metrics to ensure that the &amp;ldquo;last updated&amp;rdquo; and scores displayed on the interface reflect the current maintenance status of projects. Projects that have not been updated for a long time or are determined to be &amp;ldquo;inactive&amp;rdquo; are moved to the &lt;a href="https://jimmysong.io/ai/archived/"&gt;Archived Page&lt;/a&gt;. Archived projects remain searchable and retain historical scores, but will not appear in the default view of active rankings, making it easier for readers to focus on projects that are still maintained and active.&lt;/p&gt;
&lt;p&gt;For general readers, the key points are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The page displays key metrics and &amp;ldquo;last updated&amp;rdquo; time, helping you quickly judge whether a project is still maintained.&lt;/li&gt;
&lt;li&gt;The AI Open Source Landscape continuously iterates on the scoring model to improve fairness and differentiation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-to-contribute-and-correct-data"&gt;How to Contribute and Correct Data&lt;/h2&gt;
&lt;p&gt;If you want a project to be included or its data updated, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rootsongjc/rootsongjc.github.io/issues/new?template=ai-resource.md" target="_blank" rel="noopener"&gt;Submit an AI Open Source Project Inclusion Request&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Keep the project&amp;rsquo;s README, License, documentation, and other information complete in the repository to facilitate our data collection and assessment.&lt;/li&gt;
&lt;li&gt;For faster synchronization or if you encounter data issues, contact the maintainers via project issues or raise a request in the site discussion area.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="typical-use-cases-or-user-feedback"&gt;Typical Use Cases or User Feedback&lt;/h2&gt;
&lt;p&gt;The AI Open Source Landscape has been widely used in various scenarios such as AI developer selection, enterprise technology research, and academic studies. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developers can quickly filter models or tools that meet their needs through the platform, saving significant research time.&lt;/li&gt;
&lt;li&gt;Enterprise technical teams use the ranking lists for competitor analysis and technology planning.&lt;/li&gt;
&lt;li&gt;Educational and research institutions refer to the landscape to understand trends in the AI open source ecosystem, supporting course design and topic selection.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some users have commented that the platform is &amp;ldquo;comprehensive, well-structured, and fair in scoring,&amp;rdquo; greatly improving the efficiency of AI project selection and learning. Community suggestions continue to drive ongoing improvements in platform features and content.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The AI Open Source Landscape systematically and quantitatively organizes the AI open source ecosystem: the backend worker is responsible for reliable data collection and scoring calculations (supporting backfill and migration), while frontend components handle fast rendering and visualization (including smart badges, health cards, and metric explanations).&lt;/p&gt;
&lt;p&gt;If you want to learn more about the scoring details or participate in improvements:&lt;/p&gt;
&lt;p&gt;The community is welcome to join in evaluation, backfilling historical data, and refining scoring rules, working together to make the AI open source ecosystem more transparent and sustainable.&lt;/p&gt;</content:encoded></item><item><title>AI 2026: Infrastructure, Agents, and the Next Cloud-Native Shift</title><link>https://jimmysong.io/blog/ai-2026-infra-agentic-runtime/</link><pubDate>Fri, 19 Dec 2025 03:54:31 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-2026-infra-agentic-runtime/</guid><description>2026 AI&amp;#39;s turning point: not models, but infrastructure, agentic runtimes, GPU efficiency, and new organizational forms.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The real turning point for AI in 2026 is not autonomy but the maturity of infrastructure: agentic runtimes, GPU efficiency, and organizational design will decide who wins.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="introduction-2026-is-not-an-ai-moment-it-is-an-infrastructure-moment"&gt;Introduction: 2026 Is Not an AI Moment, It Is an Infrastructure Moment&lt;/h2&gt;
&lt;p&gt;Over the past fifteen years, every major shift in software has followed a familiar arc. Microservices were adopted not out of love for distributed systems, but because monoliths reached organizational limits. Kubernetes succeeded not because containers were novel, but because infrastructure finally matched how teams operated. Cloud native was never about YAML—it was about &lt;strong&gt;operability at scale&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;AI now stands at a similar inflection point.&lt;/p&gt;
&lt;p&gt;The central question for 2026 is not whether models will become more autonomous. That debate overlooks the core issue. Instead, the real question is whether AI can become &lt;strong&gt;operable, governable, and economically sustainable&lt;/strong&gt; within real systems.&lt;/p&gt;
&lt;p&gt;Most organizations today are limited not by intelligence, but by infrastructure: inefficient GPU utilization, escalating inference costs, fragile agent demos, and a tendency to treat AI as a feature rather than a runtime. The next phase of AI will be shaped not by model breakthroughs, but by the maturity of AI infrastructure and its ability to absorb responsibility.&lt;/p&gt;
&lt;h2 id="from-automation-to-capability-multiplication--a-familiar-cloud-native-pattern"&gt;From Automation to Capability Multiplication — A Familiar Cloud-Native Pattern&lt;/h2&gt;
&lt;p&gt;Reflecting on early cloud adoption, the dominant narrative was cost reduction: fewer servers, lower CapEx, elastic scaling. Yet, the true payoff emerged later, when teams realized cloud enabled &lt;strong&gt;entirely new operating models&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;AI is repeating this pattern.&lt;/p&gt;
&lt;p&gt;The following diagram illustrates the shift from automation to capability multiplication.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/from-automation-to-capability-multilication.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/from-automation-to-capability-multilication.svg" alt="Figure 1: From Automation to Capability Multiplication" data-caption="Figure 1: From Automation to Capability Multiplication"
width="1642"
height="1214"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: From Automation to Capability Multiplication&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The first wave of AI focused on labor replacement. The second wave reframes AI as &lt;strong&gt;capability multiplication&lt;/strong&gt;: the same team, observing more signals, covering broader areas, and acting sooner.&lt;/p&gt;
&lt;p&gt;This mirrors the evolution of monitoring, tracing, and SRE practices. Rather than reducing engineers, these systems enabled continuous observation instead of occasional sampling.&lt;/p&gt;
&lt;p&gt;Preemptive AI systems—monitoring every interaction, log, and signal—are only viable if the underlying infrastructure can support them. This exposes a critical constraint: &lt;strong&gt;AI capability scales faster than AI infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Without efficient scheduling, isolation, and utilization, multiplying capability simply multiplies cost.&lt;/p&gt;
&lt;h2 id="agents-are-becoming-distributed-systems-whether-we-admit-it-or-not"&gt;Agents Are Becoming Distributed Systems, Whether We Admit It or Not&lt;/h2&gt;
&lt;p&gt;The industry often discusses agents as products. In reality, agents are evolving into &lt;strong&gt;distributed systems&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The diagram below highlights this architectural shift.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/agents-are-becoming-distributed-systems.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/agents-are-becoming-distributed-systems.svg" alt="Figure 2: Agents Are Becoming Distributed Systems" data-caption="Figure 2: Agents Are Becoming Distributed Systems"
width="1102"
height="1382"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Agents Are Becoming Distributed Systems&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Single-agent designs resemble early monoliths: impressive demos, fragile behavior, and opaque failure modes. As tasks grow in complexity, systems must decompose work into planning, execution, verification, and review—making coordination inevitable.&lt;/p&gt;
&lt;p&gt;This is not merely a philosophical change, but an architectural one.&lt;/p&gt;
&lt;p&gt;Multi-agent systems introduce challenges familiar from the microservices era:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Coordination and orchestration&lt;/li&gt;
&lt;li&gt;Resource contention&lt;/li&gt;
&lt;li&gt;Fault isolation&lt;/li&gt;
&lt;li&gt;Observability and rollback&lt;/li&gt;
&lt;li&gt;Deterministic artifacts between stages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Labeling this as &amp;ldquo;multi-agent collaboration&amp;rdquo; can be misleading. What is actually occurring is &lt;strong&gt;workload decomposition and control-plane emergence&lt;/strong&gt;. Agents are transitioning from tools to workloads competing for limited resources.&lt;/p&gt;
&lt;p&gt;Recognizing this clarifies why agent progress is inseparable from infrastructure maturity.&lt;/p&gt;
&lt;h2 id="ai-infra-is-the-missing-layer-between-models-and-organizations"&gt;AI Infra Is the Missing Layer Between Models and Organizations&lt;/h2&gt;
&lt;p&gt;Cloud native taught us that abstractions only scale when a control plane exists.&lt;/p&gt;
&lt;p&gt;Currently, AI lacks a mature control plane.&lt;/p&gt;
&lt;p&gt;The following image demonstrates the gap between models and organizations.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/ai-infra-is-the-missing-layer-between-models-and-organizations.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/ai-infra-is-the-missing-layer-between-models-and-organizations.svg" alt="Figure 3: AI Infra Is the Missing Layer Between Models and Organizations" data-caption="Figure 3: AI Infra Is the Missing Layer Between Models and Organizations"
width="1102"
height="1260"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: AI Infra Is the Missing Layer Between Models and Organizations&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Models are powerful, but the surrounding infrastructure—scheduling, isolation, quota enforcement, cost attribution, observability—remains primitive, especially at the GPU layer.&lt;/p&gt;
&lt;p&gt;GPUs are expensive, scarce, and often underutilized. In many environments, utilization remains below 30–40%, while inference costs continue to rise. Training pipelines monopolize resources, inference workloads spike unpredictably, and organizations must choose between waste and throttling innovation.&lt;/p&gt;
&lt;p&gt;This is not a model problem. It is fundamentally an &lt;strong&gt;AI infrastructure problem&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The next phase of AI will depend on treating GPUs as we learned to treat CPUs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fine-grained allocation&lt;/li&gt;
&lt;li&gt;Fair sharing&lt;/li&gt;
&lt;li&gt;Preemption and prioritization&lt;/li&gt;
&lt;li&gt;Clear ownership and accounting&lt;/li&gt;
&lt;/ul&gt;
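&lt;p&gt;To make the analogy concrete, here is a deliberately tiny allocator sketch with fractional grants, priority preemption, and an accounting trail; every name below is an assumption, and real GPU schedulers are far more involved:&lt;/p&gt;

```python
import operator

# Toy fractional-GPU allocator: fine-grained grants, priority-based
# preemption, and a ledger for cost attribution. Illustrative only.

class GpuAllocator:
    def __init__(self, capacity=1.0):
        self.capacity = capacity
        self.grants = {}   # team name mapped to (fraction, priority)
        self.ledger = []   # accounting trail of grants and preemptions

    def free(self):
        return self.capacity - sum(f for f, _ in self.grants.values())

    def allocate(self, team, fraction, priority):
        # Preempt strictly lower-priority grants until the request fits.
        for victim, (vfrac, vprio) in sorted(self.grants.items(),
                                             key=lambda kv: kv[1][1]):
            if operator.ge(self.free(), fraction):
                break
            if operator.lt(vprio, priority):
                del self.grants[victim]
                self.ledger.append(("preempted", victim, vfrac))
        if operator.lt(self.free(), fraction):
            return False   # even preemption cannot make room
        self.grants[team] = (fraction, priority)
        self.ledger.append(("granted", team, fraction))
        return True

alloc = GpuAllocator()
alloc.allocate("training", 0.7, priority=1)
alloc.allocate("batch", 0.3, priority=0)
alloc.allocate("inference", 0.5, priority=2)  # evicts lower-priority work
print(sorted(alloc.grants), len(alloc.ledger))
```

&lt;p&gt;The ledger is the point: once every grant and preemption is recorded per team, ownership and accounting fall out naturally, just as they did for CPU quotas.&lt;/p&gt;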
&lt;p&gt;Until GPU utilization becomes a primary design goal, AI systems will remain economically fragile.&lt;/p&gt;
&lt;h2 id="domain-expertise-matters-because-infrastructure-finally-exposes-it"&gt;Domain Expertise Matters Because Infrastructure Finally Exposes It&lt;/h2&gt;
&lt;p&gt;As models plateau in general reasoning, differentiation shifts elsewhere.&lt;/p&gt;
&lt;p&gt;The diagram below illustrates how infrastructure exposes domain expertise.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/domain-expertise-matters-because-infrastructure-finally-exposes-it.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/domain-expertise-matters-because-infrastructure-finally-exposes-it.svg" alt="Figure 4: Domain Expertise Matters Because Infrastructure Finally Exposes It" data-caption="Figure 4: Domain Expertise Matters Because Infrastructure Finally Exposes It"
width="1482"
height="1302"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Domain Expertise Matters Because Infrastructure Finally Exposes It&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In cloud-native systems, competitive advantage eventually moved from frameworks to &lt;strong&gt;operational excellence&lt;/strong&gt;: superior runbooks, incident response, and cost control. AI is following a similar trajectory.&lt;/p&gt;
&lt;p&gt;High-value AI systems must operate within dense, rule-heavy domains such as finance, healthcare, manufacturing, and infrastructure operations. What matters is not abstract intelligence, but the ability to encode domain constraints, exceptions, and failure patterns.&lt;/p&gt;
&lt;p&gt;Here, domain experts become central—not as prompt engineers, but as &lt;strong&gt;system shapers&lt;/strong&gt;. Their decisions define agent permissions, human intervention points, and error containment strategies.&lt;/p&gt;
&lt;p&gt;Infrastructure determines whether this expertise can be safely operationalized.&lt;/p&gt;
&lt;h2 id="simulation-is-becoming-the-new-staging-environment-for-ai"&gt;Simulation Is Becoming the New Staging Environment for AI&lt;/h2&gt;
&lt;p&gt;One of the most important lessons from cloud-native operations: distributed systems are not tested in production.&lt;/p&gt;
&lt;p&gt;AI systems that act, plan, and modify state are no exception.&lt;/p&gt;
&lt;p&gt;The following image shows simulation as the new staging environment.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/simulation-is-becoming-the-new-staging-environment-for-ai.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/simulation-is-becoming-the-new-staging-environment-for-ai.svg" alt="Figure 5: Simulation Is Becoming the New Staging Environment for AI" data-caption="Figure 5: Simulation Is Becoming the New Staging Environment for AI"
width="1062"
height="1482"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Simulation Is Becoming the New Staging Environment for AI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Training and validating agents directly in live environments is unsustainable. The future lies in &lt;strong&gt;simulation-first AI development&lt;/strong&gt;—sandboxed environments that mirror real systems, workloads, and constraints.&lt;/p&gt;
&lt;p&gt;This approach is analogous to staging clusters, chaos engineering, and load testing, but elevated for decision-making systems. Evaluation shifts from static benchmarks to behavioral metrics: intervention rates, rollback frequency, and cost impact.&lt;/p&gt;
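&lt;p&gt;A small sketch of such behavioral scoring over simulated episodes; the record fields below (intervened, rolled_back, cost_usd) are invented for illustration, since real evaluation harnesses define their own event schemas:&lt;/p&gt;

```python
# Behavioral evaluation over simulated agent episodes: rates of human
# intervention and rollback, plus mean cost per run. Fields are assumed.

episodes = [
    {"intervened": False, "rolled_back": False, "cost_usd": 0.12},
    {"intervened": True,  "rolled_back": False, "cost_usd": 0.30},
    {"intervened": False, "rolled_back": True,  "cost_usd": 0.45},
    {"intervened": False, "rolled_back": False, "cost_usd": 0.10},
]

def behavioral_metrics(episodes):
    n = len(episodes)
    return {
        "intervention_rate": sum(e["intervened"] for e in episodes) / n,
        "rollback_rate": sum(e["rolled_back"] for e in episodes) / n,
        "mean_cost_usd": sum(e["cost_usd"] for e in episodes) / n,
    }

print(behavioral_metrics(episodes))
```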
&lt;p&gt;Organizations that build these environments will advance faster and safer. Those that do not may remain limited by conservative deployments and restricted autonomy.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Technological revolutions succeed not on novelty alone, but when infrastructure, tooling, and organizational models align.&lt;/p&gt;
&lt;p&gt;AI is nearing that pivotal moment.&lt;/p&gt;
&lt;p&gt;The leaders in 2026 will be those who:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Treat AI as a runtime, not just a feature&lt;/li&gt;
&lt;li&gt;Optimize for resource efficiency, especially GPUs&lt;/li&gt;
&lt;li&gt;Recognize agents as distributed systems&lt;/li&gt;
&lt;li&gt;Redesign organizations around continuous learning systems&lt;/li&gt;
&lt;li&gt;Invest in infrastructure ahead of autonomy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI is no longer just a model problem. It is an infrastructure challenge—and the next phase will be decided not in labs, but in production systems.&lt;/p&gt;</content:encoded></item><item><title>What I Saw at COSCon'25: The Real State of Open Source in China</title><link>https://jimmysong.io/blog/coscon-2025-china-open-source-observation/</link><pubDate>Thu, 18 Dec 2025 06:14:51 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/coscon-2025-china-open-source-observation/</guid><description>From an engineering and organizer&amp;#39;s perspective, real changes at COSCon&amp;#39;25: AI as the default backdrop, discussions returning to engineering issues, and Chinese open source entering a long-term phase.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Attending COSCon'25 in Beijing, I observed firsthand how open source in China is shifting: AI is now the default context, discussions are grounded in real engineering, and the community is embracing long-term thinking. These are not just trends—they are the new reality.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In early December this year, I attended COSCon'25, the China Open Source Annual Conference, in Beijing. Although I have worked in open source for many years, this was my first time participating in an event organized by the Open Source Society—and I joined as a sub-forum producer. Previously, I thought such conferences were too high-level or disconnected from reality, but after actually taking part, I found there was much to gain.&lt;/p&gt;
&lt;p&gt;A quick note: &lt;strong&gt;this article is not an official conference summary or review&lt;/strong&gt;. The organizers have already published detailed information about the event&amp;rsquo;s scale, attendee numbers, and forum sessions. If you&amp;rsquo;re interested in those details, please refer to the official article:
&lt;a href="https://mp.weixin.qq.com/s/1Q5xBUEmSN9MXon03P00lA" target="_blank" rel="noopener"&gt;COSCon'25: The 10th China Open Source Annual Conference Successfully Concludes in Beijing—A Comprehensive Recap!&lt;/a&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/coscon-2025-china-open-source-observation/banner.webp" data-img="https://assets.jimmysong.io/images/blog/coscon-2025-china-open-source-observation/banner.webp" alt="Figure 1: 10th COSCon Venue" data-caption="Figure 1: 10th COSCon Venue"
width="1080"
height="716"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: 10th COSCon Venue&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;What I want to share is this: &lt;strong&gt;Standing on site, on the engineering front lines, and as an organizer rather than an audience member, I saw real changes happening in Chinese open source.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="this-coscon-no-more-trying-to-prove-open-source-matters"&gt;This COSCon: No More Trying to &amp;ldquo;Prove Open Source Matters&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;One clear impression:
&lt;strong&gt;Almost no one spent time arguing &amp;ldquo;why do open source&amp;rdquo; anymore.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In earlier years, common narratives included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is open source safe?&lt;/li&gt;
&lt;li&gt;Can open source be commercialized?&lt;/li&gt;
&lt;li&gt;Can China create its own open source projects?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But at COSCon'25, these questions were basically assumed as &amp;ldquo;background conditions.&amp;rdquo; The focus shifted to &lt;strong&gt;those already doing open source, and what comes next&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This doesn&amp;rsquo;t mean the issues have disappeared, but it does mean:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In China&amp;rsquo;s engineering circles, open source is no longer a &amp;ldquo;philosophical choice&amp;rdquo;—it&amp;rsquo;s a practical way of working.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="ai-as-background-noise-not-the-main-character"&gt;AI as Background Noise, Not the Main Character&lt;/h2&gt;
&lt;p&gt;The theme of this year&amp;rsquo;s conference was Open Source × Open Intelligence, but interestingly, &lt;strong&gt;AI did not take center stage&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Instead, it was more like background noise: almost every topic touched on AI, but no one was giving talks solely &amp;ldquo;about AI.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You would see it repeatedly in areas like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cloud native scheduling, focusing on GPU / NPU / heterogeneous resources&lt;/li&gt;
&lt;li&gt;Storage and data, focusing on data paths for training and inference&lt;/li&gt;
&lt;li&gt;Serverless, focusing on LLM cold starts and elasticity&lt;/li&gt;
&lt;li&gt;Observability, focusing on what to do when system complexity gets out of hand&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI was not treated as a &amp;ldquo;hot trend,&amp;rdquo; but as &lt;strong&gt;a new workload reality&lt;/strong&gt;.
This is a significant change, though not one easily captured in press releases.&lt;/p&gt;
&lt;h2 id="real-impressions-as-a-cloud-native-sub-forum-producer"&gt;Real Impressions as a Cloud Native Sub-forum Producer&lt;/h2&gt;
&lt;p&gt;I helped organize the cloud native open source sub-forum at this year&amp;rsquo;s conference. This role gave me a perspective very different from that of a typical attendee.&lt;/p&gt;
&lt;h3 id="first-topics-clearly-converged-on-engineering-problems"&gt;First, Topics Clearly Converged on &amp;ldquo;Engineering Problems&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;There were almost no talks about Kubernetes concepts, and very few about &amp;ldquo;architectural philosophies.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Instead, the focus was on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What pitfalls did you encounter at what scale?&lt;/li&gt;
&lt;li&gt;Why did you choose this solution over another?&lt;/li&gt;
&lt;li&gt;Which problems remain unsolved?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many presentations didn&amp;rsquo;t make for comfortable listening, but they were very real.&lt;/p&gt;
&lt;h3 id="second-the-boundary-between-academia-and-industry-is-thinning"&gt;Second, The Boundary Between Academia and Industry Is Thinning&lt;/h3&gt;
&lt;p&gt;This was especially evident this year.&lt;/p&gt;
&lt;p&gt;Some talks from universities and research institutes were no longer just &amp;ldquo;from a paper&amp;rsquo;s perspective,&amp;rdquo; but directly addressed core issues in industrial systems, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cold start of serverless LLMs&lt;/li&gt;
&lt;li&gt;The real value of RDMA in inference paths&lt;/li&gt;
&lt;li&gt;Whether prefill/decode separation is truly feasible in engineering&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These topics may not be immediately applicable, but &lt;strong&gt;they are now colliding head-on with engineering problems&lt;/strong&gt;, rather than talking past each other.&lt;/p&gt;
&lt;h3 id="third-open-source-is-no-longer-just-about-code"&gt;Third, Open Source Is No Longer Just About Code&lt;/h3&gt;
&lt;p&gt;In many discussions, &amp;ldquo;governance,&amp;rdquo; &amp;ldquo;maintenance cost,&amp;rdquo; and &amp;ldquo;community collaboration&amp;rdquo; came up frequently.&lt;/p&gt;
&lt;p&gt;This is a signal:
When a project is truly being used, &lt;strong&gt;code is no longer the hardest part&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="main-forum-more-questions-not-answers"&gt;Main Forum: More Questions, Not Answers&lt;/h2&gt;
&lt;p&gt;If I had to sum up the main forum in one sentence:
&lt;strong&gt;It kept raising questions, but wasn&amp;rsquo;t in a hurry to provide answers.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Has the boundary of open source changed in the AI era?&lt;/li&gt;
&lt;li&gt;Should models, data, and chips become part of the open source core?&lt;/li&gt;
&lt;li&gt;Are developers&amp;rsquo; roles being redefined?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are no standard answers to these questions, but the fact that they are being raised repeatedly shows they have become common concerns, not just the thoughts of a few.&lt;/p&gt;
&lt;h2 id="exhibition-area-and-sub-forums-closer-to-the-real-ecosystem"&gt;Exhibition Area and Sub-forums: Closer to the Real Ecosystem&lt;/h2&gt;
&lt;p&gt;Compared to the main forum, I personally paid more attention to the sub-forums and exhibition area.&lt;/p&gt;
&lt;p&gt;There, you would see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many projects no longer emphasize &amp;ldquo;who they want to replace&amp;rdquo;&lt;/li&gt;
&lt;li&gt;More discussions about &amp;ldquo;who they can work with&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Several communities are seriously discussing long-term maintenance, not just releasing versions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This may not be glamorous, but it&amp;rsquo;s important.&lt;/p&gt;
&lt;h2 id="a-personal-judgment"&gt;A Personal Judgment&lt;/h2&gt;
&lt;p&gt;If I had to make a judgment about COSCon'25, I would say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Chinese open source is shifting from &amp;ldquo;can we do it&amp;rdquo; to &amp;ldquo;can we sustain it for the long term.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a more difficult, but also more realistic, stage.&lt;/p&gt;
&lt;p&gt;This COSCon did not try to create a grand narrative. Instead, it felt like an honest snapshot of a particular stage:
there are more questions and the participants are more diverse, but the discussions are also closer to the real world.&lt;/p&gt;
&lt;p&gt;Open source doesn&amp;rsquo;t depend on a single conference to move forward, but being on site helps you see more clearly:
&lt;strong&gt;Where exactly are we standing right now?&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>Decoding Goose: Why It Joined AAIF and What This Means for Agentic Runtime</title><link>https://jimmysong.io/blog/goose-aaif-agentic-runtime/</link><pubDate>Fri, 12 Dec 2025 08:16:48 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/goose-aaif-agentic-runtime/</guid><description>An analysis of Block&amp;#39;s Goose project, why it became one of the first Agentic AI Foundation (AAIF) projects, and what this means for Agentic Runtime and the evolution of AI-Native infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Goose is not a project that excites you at first glance in this wave of Agent innovation, but its entry into AAIF signals a deeper shift in how we think about Agentic Runtime and AI-Native infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At first glance, &lt;a href="https://github.com/block/goose" target="_blank" rel="noopener"&gt;Goose&lt;/a&gt; is not a project that immediately excites people.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/goose-aaif-agentic-runtime/goose.webp" data-img="https://assets.jimmysong.io/images/blog/goose-aaif-agentic-runtime/goose.webp" alt="Figure 1: Goose App UI" data-caption="Figure 1: Goose App UI"
width="2622"
height="2360"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Goose App UI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It doesn&amp;rsquo;t have flashy demos, nor does it showcase overwhelming multimodal capabilities, and it certainly doesn&amp;rsquo;t look like an AI product aimed at consumers. Yet, this seemingly &amp;ldquo;plain&amp;rdquo; project became one of the first projects donated to the Agentic AI Foundation (AAIF), standing alongside Anthropic&amp;rsquo;s MCP and OpenAI&amp;rsquo;s AGENTS.md.&lt;/p&gt;
&lt;p&gt;This fact alone is worth a closer look.&lt;/p&gt;
&lt;p&gt;This article does not aim to prove how powerful Goose is, but rather to answer three more practical questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What overlooked but long-term critical problems does Goose actually solve?&lt;/li&gt;
&lt;li&gt;Why was it Goose, and not another Agent framework, that entered AAIF?&lt;/li&gt;
&lt;li&gt;What does this mean for Agentic Runtime and AI-Native infrastructure, which I care about?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="gooses-true-positioning-its-not-an-ide-or-a-chatbot"&gt;Goose&amp;rsquo;s True Positioning: It&amp;rsquo;s Not an IDE or a Chatbot&lt;/h2&gt;
&lt;p&gt;If you only look at its surface features, Goose is easily mistaken for one of two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &amp;ldquo;multi-model AI desktop client&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Or an &amp;ldquo;intelligent programming assistant that can run commands&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But inside Block, it was never designed as a &amp;ldquo;tool&amp;rdquo; from the start.&lt;/p&gt;
&lt;p&gt;Goose&amp;rsquo;s origin is closely tied to Block&amp;rsquo;s engineering environment.&lt;/p&gt;
&lt;p&gt;Block (formerly Square) is a classic engineering-driven company: complex systems, high automation needs, many internal tools, and very high execution costs in real production environments. In its recent AI transformation, Block did not focus on &amp;ldquo;which model to choose&amp;rdquo; or &amp;ldquo;which AI tool to introduce,&amp;rdquo; but directly targeted the engineering execution layer itself.&lt;/p&gt;
&lt;p&gt;Goose was born in this context.&lt;/p&gt;
&lt;p&gt;Its goal is not to &amp;ldquo;help people code faster,&amp;rdquo; but to enable models to &lt;strong&gt;stably and controllably take action&lt;/strong&gt;: run tests, modify code, drive UIs, call internal systems, and operate reliably in real engineering environments.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Goose is more like an executable Agent Runtime than a conversation-centric product.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="blocks-ai-transformation-started-with-organization-not-tools"&gt;Block&amp;rsquo;s AI Transformation Started with Organization, Not Tools&lt;/h2&gt;
&lt;p&gt;To understand Goose, you can&amp;rsquo;t ignore a key organizational shift at Block.&lt;/p&gt;
&lt;p&gt;In an interview with Block&amp;rsquo;s CTO, one signal was very clear: the starting point for AI transformation was not buying tools or stacking models, but the organizational structure itself.&lt;/p&gt;
&lt;p&gt;Block shifted from a business-line GM model to a more functionally oriented structure, making engineering and design the company&amp;rsquo;s core scheduling units again. This is essentially a proactive response to Conway&amp;rsquo;s Law.&lt;/p&gt;
&lt;p&gt;If the organizational structure doesn&amp;rsquo;t allow technical capabilities to be orchestrated centrally, Agents will ultimately remain &amp;ldquo;personal assistants&amp;rdquo; or &amp;ldquo;engineering toys.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;From this perspective, Goose is not just a tool, but a &lt;strong&gt;cultural signal&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Every employee can use AI to build and execute real system behaviors.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This also explains a fact many overlook:
Goose was not packaged as SaaS or rushed to commercialization; instead, it was open-sourced and rapidly standardized.&lt;/p&gt;
&lt;p&gt;Because its role inside Block is closer to an &amp;ldquo;operating system for execution models&amp;rdquo; than a product that can be sold separately.&lt;/p&gt;
&lt;h2 id="why-did-goose-enter-aaif-not-because-its-technically-strongest"&gt;Why Did Goose Enter AAIF? Not Because It&amp;rsquo;s &amp;ldquo;Technically Strongest&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;This is what confuses outsiders the most.&lt;/p&gt;
&lt;p&gt;If you only look at flashy features, model support, or community popularity, Goose doesn&amp;rsquo;t stand out. But AAIF&amp;rsquo;s choice was not about &amp;ldquo;maximum capability,&amp;rdquo; but about &lt;strong&gt;whether the position is right&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Looking at the first batch of AAIF projects, a clear chain emerges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP (Anthropic): Defines how models safely and standardly call tools&lt;/li&gt;
&lt;li&gt;AGENTS.md (OpenAI): Defines behavioral conventions for Agents in code repositories&lt;/li&gt;
&lt;li&gt;Goose (Block): A real, runnable, open-source Agent execution framework&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goose&amp;rsquo;s role is not to set new protocols, but to serve as the &lt;strong&gt;practical carrier and reference implementation&lt;/strong&gt; for these protocols.&lt;/p&gt;
&lt;p&gt;It proves one thing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP is not just a paper standard&lt;/li&gt;
&lt;li&gt;Agents are not just research concepts&lt;/li&gt;
&lt;li&gt;In real enterprise environments, they can actually run&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From this angle, Goose&amp;rsquo;s &amp;ldquo;ordinariness&amp;rdquo; is actually an advantage.&lt;/p&gt;
&lt;p&gt;It is not tied to Block&amp;rsquo;s business moat, nor does it have irreplaceable private APIs; it can be forked, replaced, audited—&amp;ldquo;boring&amp;rdquo; enough, and neutral enough.&lt;/p&gt;
&lt;p&gt;And that is the most important trait of public infrastructure.&lt;/p&gt;
&lt;h2 id="gooses-value-lies-not-in-today-but-in-23-years"&gt;Goose&amp;rsquo;s Value Lies Not in Today, But in 2–3 Years&lt;/h2&gt;
&lt;p&gt;From a longer-term perspective, Goose&amp;rsquo;s value becomes clearer.&lt;/p&gt;
&lt;p&gt;What we&amp;rsquo;re experiencing now is much like the early days of containers:
Most Agent projects today are demos, IDE plugins, or workflow wrappers, but what&amp;rsquo;s really missing is a &lt;strong&gt;sustainable, schedulable, observable execution layer&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Goose is already moving in this direction.&lt;/p&gt;
&lt;p&gt;Block&amp;rsquo;s metrics for Goose&amp;rsquo;s success are straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How many human hours are saved each week&lt;/li&gt;
&lt;li&gt;How much non-technical teams reduce their dependence on engineering teams&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Behind this is a judgment I&amp;rsquo;m increasingly convinced of:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What enterprises truly need is not &amp;ldquo;smarter models,&amp;rdquo; but &amp;ldquo;cheaper execution.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The long-term value of Agents is not in generation quality, but in execution substitution rate.&lt;/p&gt;
&lt;h2 id="aaif-is-an-attempt-at-infrastructure-level-consensus"&gt;AAIF Is an Attempt at Infrastructure-Level Consensus&lt;/h2&gt;
&lt;p&gt;AAIF is not guaranteed to succeed, any more than CNCF&amp;rsquo;s success was guaranteed in the early days of cloud native.&lt;/p&gt;
&lt;p&gt;But it at least marks a shift:
Agents are no longer just application-layer innovations, but are beginning to enter the stage of infrastructure-layer collaboration.&lt;/p&gt;
&lt;p&gt;As a reference implementation, Goose is likely to remain in this ecosystem for a long time—even if it is replaced, rewritten, or evolved in the future.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;If you see Goose as a &amp;ldquo;product,&amp;rdquo; it is indeed not dazzling.&lt;/p&gt;
&lt;p&gt;But if you place it in the long-term evolution path of Agentic AI, its significance becomes clear:&lt;/p&gt;
&lt;p&gt;It is not the end, but a necessary intermediate state.&lt;/p&gt;
&lt;p&gt;For me, the emergence of Goose further confirms one thing:&lt;/p&gt;
&lt;p&gt;Agentic Runtime is not a conceptual problem, but an engineering and organizational one.&lt;/p&gt;
&lt;p&gt;And that is one of the most worthwhile directions to invest energy in over the next few years.&lt;/p&gt;</content:encoded></item><item><title>ARK: Multi-Agent Systems Are Finally Entering the Engineer's World</title><link>https://jimmysong.io/blog/ark-agentic-runtime-for-kubernetes/</link><pubDate>Thu, 11 Dec 2025 13:19:42 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ark-agentic-runtime-for-kubernetes/</guid><description>How ARK uses cloud-native architecture and declarative runtime to drive engineering adoption of multi-agent systems and shape the Agentic Runtime ecosystem.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The deep integration of cloud native and AI, with the ARK platform, provides a new paradigm for engineering multi-agent systems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;AI Agents are moving from the &amp;ldquo;single agent demo&amp;rdquo; stage to &amp;ldquo;large-scale operation.&amp;rdquo; The real challenge does not lie in the model itself, but in engineering issues at runtime: model management, tool invocation, state maintenance, elastic scaling, team collaboration, observability, deployment, and upgrades. These are problems that traditional agent libraries struggle to solve.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ARK (Agentic Runtime for Kubernetes)&lt;/strong&gt; provides a fully operational, observable, governable, and continuously deliverable multi-agent operating system. It is not a Python library, but a complete runtime platform.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-homepage.webp" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-homepage.webp" alt="Figure 1: ARK Dashboard" data-caption="Figure 1: ARK Dashboard"
width="3176"
height="1822"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: ARK Dashboard&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Note: In this article, ARK refers to McKinsey&amp;rsquo;s open-source &lt;a href="https://github.com/mckinsey/ark-agent-runtime-for-kubernetes" target="_blank" rel="noopener"&gt;ARK Agent Runtime for Kubernetes&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This article, from an engineer&amp;rsquo;s perspective, will reorganize ARK&amp;rsquo;s core capabilities and answer the following questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What engineering challenges does ARK actually solve?&lt;/li&gt;
&lt;li&gt;Why is it worth special attention in the cloud native field?&lt;/li&gt;
&lt;li&gt;How is it fundamentally different from frameworks like LangChain and CrewAI?&lt;/li&gt;
&lt;li&gt;What insights does it offer for the Agentic Runtime ecosystem?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="ark-architecture-treating-agents-as-kubernetes-native-workloads"&gt;ARK Architecture: Treating Agents as Kubernetes-Native Workloads&lt;/h2&gt;
&lt;p&gt;The core idea of ARK is: &lt;strong&gt;An agent is not a script, but a schedulable, governable, and observable Kubernetes workload.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The following architecture diagram illustrates ARK&amp;rsquo;s underlying structure.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/168007ae485fa14769e5483aa20805d3.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/168007ae485fa14769e5483aa20805d3.svg" alt="Figure 2: ARK Overall Architecture" data-caption="Figure 2: ARK Overall Architecture"
width="2060"
height="1146"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: ARK Overall Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This diagram highlights ARK&amp;rsquo;s key design points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CRDs declare requirements&lt;/strong&gt; (Agent, Model, Team, Tool, Memory, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Controller translates declarations into actual Pods/Services&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The API provides a unified communication entry point and team orchestration&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory supports long-term state management for agents&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP Server enables external systems to become tools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dashboard provides visual management and observability&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ARK adopts the typical cloud-native Operator pattern and applies it to multi-agent systems.&lt;/p&gt;
&lt;h2 id="crd-arks-abstraction-layer"&gt;CRD: ARK&amp;rsquo;s &amp;ldquo;Abstraction Layer&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Unlike traditional agent frameworks where &amp;ldquo;code is logic,&amp;rdquo; ARK uses CRDs (Custom Resource Definitions) to abstract the components of agent applications.&lt;/p&gt;
&lt;p&gt;The main CRD types in ARK include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model&lt;/li&gt;
&lt;li&gt;Agent&lt;/li&gt;
&lt;li&gt;Team&lt;/li&gt;
&lt;li&gt;Tool&lt;/li&gt;
&lt;li&gt;Memory&lt;/li&gt;
&lt;li&gt;Evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These CRDs correspond to all the key components of an agent system.&lt;/p&gt;
&lt;p&gt;The following diagram shows the structure of the CRDs:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/b464d2b85b6d664b51fa48a5aed2fbd0.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/b464d2b85b6d664b51fa48a5aed2fbd0.svg" alt="Figure 3: CRD Structure (Simplified)" data-caption="Figure 3: CRD Structure (Simplified)"
width="795"
height="829"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: CRD Structure (Simplified)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Through CRDs, ARK achieves the following engineering features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All resources are GitOps-ready&lt;/strong&gt;, supporting declarative management&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changes are auditable, reversible, and continuously deliverable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The evolution of models, tools, and agents does not require business code changes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This declarative foundation is the core of ARK&amp;rsquo;s engineering-oriented design.&lt;/p&gt;
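&lt;p&gt;As a minimal sketch of what this declarative style looks like in practice, the following Python snippet composes an Agent manifest as plain data. The API group, version, and field names here are illustrative assumptions, not ARK&amp;rsquo;s actual schema:&lt;/p&gt;

```python
# Hypothetical sketch: composing an ARK-style Agent custom resource as data.
# The apiVersion, kind, and spec fields are illustrative, not ARK's real schema.
import json

def make_agent_manifest(name, model_ref, prompt, tools):
    """Build a declarative Agent manifest that GitOps tooling could apply."""
    return {
        "apiVersion": "ark.example.io/v1alpha1",  # placeholder group/version
        "kind": "Agent",
        "metadata": {"name": name},
        "spec": {
            "modelRef": {"name": model_ref},  # points at a Model resource
            "prompt": prompt,
            "tools": [{"name": t} for t in tools],  # references Tool resources
        },
    }

manifest = make_agent_manifest(
    name="researcher",
    model_ref="gpt-4o",
    prompt="You are a research assistant.",
    tools=["web-search", "summarize"],
)
print(json.dumps(manifest, indent=2))
```

&lt;p&gt;Because the agent is just data, the same manifest can live in Git, be diffed in code review, and be applied by GitOps tooling like any other Kubernetes resource.&lt;/p&gt;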
&lt;h2 id="agent-execution-flow-from-query-to-tool-invocation"&gt;Agent Execution Flow: From Query to Tool Invocation&lt;/h2&gt;
&lt;p&gt;The following image shows how to view query details in the ARK Dashboard.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-queries.webp" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-queries.webp" alt="Figure 4: Viewing Query Details in ARK Dashboard" data-caption="Figure 4: Viewing Query Details in ARK Dashboard"
width="3176"
height="1822"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Viewing Query Details in ARK Dashboard&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In ARK, the complete execution flow for an agent receiving a query is as follows:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/67a8b1142ee63f7cacd4d907cd198ce4.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/67a8b1142ee63f7cacd4d907cd198ce4.svg" alt="Figure 5: Agent Execution Flow" data-caption="Figure 5: Agent Execution Flow"
width="1146"
height="591"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Agent Execution Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This flow has the following characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Memory modules are naturally involved in the execution flow, without code specialization&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM calls and tool invocations are governed by the runtime&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agents can reside in Pods long-term, not just as one-off processes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes ARK more like an &amp;ldquo;agent microservice platform.&amp;rdquo;&lt;/p&gt;
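&lt;p&gt;To make the flow concrete, here is a hedged sketch of a query round-trip expressed as plain data. The resource shape, field names, and status layout are illustrative assumptions, not ARK&amp;rsquo;s actual API:&lt;/p&gt;

```python
# Hypothetical sketch of a query round-trip; NOT ARK's actual API shape.
# All field names here are illustrative assumptions.
import json

# A declarative Query resource targeting one agent.
query = {
    "apiVersion": "ark.example.io/v1alpha1",
    "kind": "Query",
    "metadata": {"name": "weather-question"},
    "spec": {
        "input": "What is the weather in Paris?",
        "targets": [{"type": "agent", "name": "researcher"}],
    },
}

# The runtime would resolve the target agent, drive the LLM and tool calls,
# then record the outcome in the Query's status subresource.
response = {
    "status": {
        "phase": "done",
        "responses": [
            {"target": "agent/researcher", "content": "It is sunny in Paris."}
        ],
    }
}

print(json.dumps({"query": query, "response": response}, indent=2))
```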
&lt;h2 id="the-true-value-of-multi-agent-team-orchestration"&gt;The True Value of Multi-Agent: Team Orchestration&lt;/h2&gt;
&lt;p&gt;ARK&amp;rsquo;s Team CRD allows multiple agents to be woven into a higher-level &amp;ldquo;system,&amp;rdquo; enabling multi-agent collaboration.&lt;/p&gt;
&lt;p&gt;The following diagram shows the collaboration model of a multi-agent team:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/0fb6990e479cd7b5c0ff3c8e8626693b.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/0fb6990e479cd7b5c0ff3c8e8626693b.svg" alt="Figure 6: Multi-Agent Team Collaboration" data-caption="Figure 6: Multi-Agent Team Collaboration"
width="786"
height="499"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Multi-Agent Team Collaboration&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;The engineering value of Team is reflected in:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Making &amp;ldquo;expert collaboration&amp;rdquo; declarative and configurable&lt;/li&gt;
&lt;li&gt;Flexible strategies (such as polling, role assignment, routing, etc.)&lt;/li&gt;
&lt;li&gt;A2A Gateway handles message passing&lt;/li&gt;
&lt;li&gt;The Team itself is observable (every round of collaboration is logged)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For enterprises, this means the &amp;ldquo;agent organizational structure&amp;rdquo; can be standardized, replayed, and tuned.&lt;/p&gt;
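&lt;p&gt;As a minimal illustration of one such strategy, the sketch below implements round-robin (&amp;ldquo;polling&amp;rdquo;) collaboration with a logged transcript. The agent and message interfaces are simplified assumptions, not ARK&amp;rsquo;s API:&lt;/p&gt;

```python
# Minimal sketch of a round-robin ("polling") team strategy.
# Agents are (name, handler) pairs here, a simplification of a real runtime.
from itertools import cycle

def run_round_robin(agents, task, rounds):
    """Pass the evolving task through each agent in turn, logging each step."""
    transcript = []
    turn = cycle(agents)
    state = task
    for _ in range(rounds):
        name, handler = next(turn)
        state = handler(state)
        transcript.append((name, state))  # every round of collaboration is logged
    return state, transcript

agents = [
    ("planner", lambda s: s + " -> plan"),
    ("coder", lambda s: s + " -> code"),
    ("reviewer", lambda s: s + " -> review"),
]
final, log = run_round_robin(agents, "task", rounds=3)
print(final)  # task -> plan -> code -> review
```

&lt;p&gt;Swapping the strategy (role assignment, routing) changes only the scheduling function, not the agents, which is exactly what a declarative Team resource makes configurable.&lt;/p&gt;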
&lt;h2 id="fundamental-differences-between-ark-and-other-frameworks"&gt;Fundamental Differences Between ARK and Other Frameworks&lt;/h2&gt;
&lt;p&gt;Many engineers, upon first seeing ARK, may wonder:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Is it just LangChain or CrewAI wrapped in Kubernetes?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In fact, there are fundamental differences. The following diagram compares the structural differences between ARK and mainstream agent frameworks:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/8da9b272d1930a2356a6401b6615d134.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/8da9b272d1930a2356a6401b6615d134.svg" alt="Figure 7: ARK vs LangChain / AutoGPT / CrewAI" data-caption="Figure 7: ARK vs LangChain / AutoGPT / CrewAI"
width="3454"
height="345"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 7: ARK vs LangChain / AutoGPT / CrewAI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The table below further summarizes the key differences:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Traditional Agent Libraries&lt;/th&gt;
&lt;th&gt;ARK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Pattern&lt;/td&gt;
&lt;td&gt;Write Python code&lt;/td&gt;
&lt;td&gt;Write CRDs (declarative)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Local/Container&lt;/td&gt;
&lt;td&gt;Kubernetes-native scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Managed inside code&lt;/td&gt;
&lt;td&gt;Memory CR + Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;td&gt;Integrated at code level&lt;/td&gt;
&lt;td&gt;Tool CR + MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Agent&lt;/td&gt;
&lt;td&gt;Dialog managed in code&lt;/td&gt;
&lt;td&gt;Team CR + A2A protocol&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Almost none&lt;/td&gt;
&lt;td&gt;OTel / Langfuse / Dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use Cases&lt;/td&gt;
&lt;td&gt;Demo / Prototype / Single Agent&lt;/td&gt;
&lt;td&gt;Enterprise production / Multi-Agent Systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: ARK vs Traditional Agent Libraries
&lt;/figcaption&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LangChain is a &amp;ldquo;library for building agents,&amp;rdquo; while ARK is a &amp;ldquo;platform for running agents.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The two are not in conflict and are, in fact, highly complementary.&lt;/p&gt;
&lt;h2 id="the-engineering-value-of-ark"&gt;The Engineering Value of ARK&lt;/h2&gt;
&lt;p&gt;To summarize ARK&amp;rsquo;s engineering value in simple terms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Turns agents into &lt;strong&gt;governable workloads&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Unifies models, tools, and memory as &lt;strong&gt;reusable resources&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Makes multi-agent collaboration &lt;strong&gt;structured, observable, and tunable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Brings agent upgrades and iteration into &lt;strong&gt;CI/CD + GitOps&lt;/strong&gt; mode&lt;/li&gt;
&lt;li&gt;Enables enterprises to &lt;strong&gt;manage agents like microservices&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a clear evolution path:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent → Service → Platform → Runtime → Operating System&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;ARK is currently positioned at the fourth stage: Runtime.&lt;/p&gt;
&lt;h2 id="insights-for-agentic-runtime"&gt;Insights for Agentic Runtime&lt;/h2&gt;
&lt;p&gt;ARK provides three direct insights for building Agentic Runtimes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unified Scheduling System&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The agent runtime must run on a unified scheduling system (Kubernetes, MicroVM, Wasmtime, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Declarative Capability Boundaries&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Must use declarative abstractions to partition capabilities into distinct layers, including:
&lt;ul&gt;
&lt;li&gt;Model Layer&lt;/li&gt;
&lt;li&gt;Tool Layer&lt;/li&gt;
&lt;li&gt;Memory Layer&lt;/li&gt;
&lt;li&gt;Workflow Layer&lt;/li&gt;
&lt;li&gt;Team Layer&lt;/li&gt;
&lt;li&gt;State Layer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Observability is essential; otherwise, multi-agent systems cannot be engineered
&lt;ul&gt;
&lt;li&gt;Langfuse&lt;/li&gt;
&lt;li&gt;OTel&lt;/li&gt;
&lt;li&gt;Logs / Events&lt;/li&gt;
&lt;li&gt;Structured JSON&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
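&lt;p&gt;As a small stdlib-only sketch of the &amp;ldquo;structured JSON&amp;rdquo; point, the snippet below emits one JSON line per runtime event. The event fields are illustrative assumptions; a production runtime would emit OTel spans or Langfuse traces instead:&lt;/p&gt;

```python
# Sketch: structured JSON events for agent tool invocations (stdlib only).
# Field names are illustrative assumptions, not a real runtime's schema.
import json
import time
import uuid

def emit_event(kind, agent, detail):
    """Serialize one runtime event as a single JSON line (easy to ship and parse)."""
    event = {
        "ts": time.time(),
        "trace_id": str(uuid.uuid4()),  # correlate related events
        "kind": kind,
        "agent": agent,
        "detail": detail,
    }
    print(json.dumps(event))  # one event per line for log pipelines
    return event

e1 = emit_event("tool.call", "researcher", {"tool": "web-search", "args": {"q": "paris"}})
e2 = emit_event("tool.result", "researcher", {"tool": "web-search", "ok": True})
```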
&lt;p&gt;ARK demonstrates a direction:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-agent systems are an engineering problem, not a prompt engineering problem.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;If you only need to build a simple agent, frameworks like LangChain, CrewAI, and AutoGPT are sufficient.&lt;/p&gt;
&lt;p&gt;But if you want to operate a system composed of dozens or hundreds of agents that need to collaborate, run long-term, and support continuous delivery and governance, runtimes like ARK are the inevitable trend.&lt;/p&gt;
&lt;p&gt;It provides Agentic AI with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A cloud-native runtime model&lt;/li&gt;
&lt;li&gt;Observable execution paths&lt;/li&gt;
&lt;li&gt;Governable abstraction layers&lt;/li&gt;
&lt;li&gt;Extensible, componentized architecture&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, ARK deserves to be regarded as an early model for engineering multi-agent systems.&lt;/p&gt;</content:encoded></item><item><title>Can Open Source Suddenly Disappear? An AI Chat Dev Tool Went 404 Overnight</title><link>https://jimmysong.io/blog/ai-project-lunary-404/</link><pubDate>Thu, 11 Dec 2025 05:20:12 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-project-lunary-404/</guid><description>Lunary, an open-source project in the AI DevTool space, suddenly deleted its GitHub repo, exposing the instability of commercial open source projects.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Open source&amp;rdquo; in the AI era is no longer a trustworthy promise. Commercial projects can withdraw their code at any time, and developers must be wary of the gap between appearances and reality.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-disappearance-of-lunarys-repository-a-real-case-of-open-source-vanishing"&gt;The Disappearance of Lunary&amp;rsquo;s Repository: A Real Case of Open Source &amp;ldquo;Vanishing&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;While updating the AI open source project library on my website, I encountered, for the first time, a situation that left me stunned:
an &amp;ldquo;open-source AI tool&amp;rdquo; that still promotes itself, with an active website and commercial services, suddenly vanished from GitHub—its repository went straight to 404.&lt;/p&gt;
&lt;p&gt;The project is called Lunary.&lt;/p&gt;
&lt;p&gt;Original repository address:
&lt;a href="https://github.com/lunary-ai/lunary" target="_blank" rel="noopener"&gt;https://github.com/lunary-ai/lunary&lt;/a&gt;
It now returns a 404 Not Found.&lt;/p&gt;
&lt;p&gt;Notably, the official site lunary.ai remains online, but the core promise of an &amp;ldquo;open-source codebase&amp;rdquo; has disappeared.&lt;/p&gt;
&lt;h2 id="lunarys-positioning-and-features"&gt;Lunary&amp;rsquo;s Positioning and Features&lt;/h2&gt;
&lt;p&gt;Here is an overview of Lunary&amp;rsquo;s main features and positioning to help understand its role in the AI tool ecosystem.&lt;/p&gt;
&lt;p&gt;Lunary claims to be an Observability and Evaluations platform for large language model (LLM) applications, focusing on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM conversation and feedback logs&lt;/li&gt;
&lt;li&gt;Cost, latency, and metrics analysis&lt;/li&gt;
&lt;li&gt;Prompt version management&lt;/li&gt;
&lt;li&gt;Distributed tracing&lt;/li&gt;
&lt;li&gt;Evaluations&lt;/li&gt;
&lt;li&gt;Self-hosted and managed deployment modes&lt;/li&gt;
&lt;li&gt;JS / Python SDKs with LangChain integration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its overall positioning is clear:
&amp;ldquo;Development and debugging tools for AI applications.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In fact, products like this have emerged rapidly over the past year, forming a new AI DevTool track.&lt;/p&gt;
&lt;h2 id="the-reality-and-risks-behind-the-open-source-label"&gt;The Reality and Risks Behind the &amp;ldquo;Open Source&amp;rdquo; Label&lt;/h2&gt;
&lt;p&gt;The core issue is not the tool itself, but its claim to be &amp;ldquo;open source.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Lunary has consistently emphasized:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Lunary is an open-source platform for developers.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This statement is great for attracting users, as open source implies transparency, trustworthiness, self-hosting, and community participation.&lt;/p&gt;
&lt;p&gt;But now the repository is gone while the website keeps promoting the product, which raises many questions.&lt;/p&gt;
&lt;p&gt;Lunary is not a niche hobby project, but a commercial company-led initiative. If an individual suddenly deletes a repo, it&amp;rsquo;s not surprising, but for a company operating publicly, this move is extremely rare.&lt;/p&gt;
&lt;p&gt;This is the first time I&amp;rsquo;ve truly seen a reality in the AI DevTools space: &amp;ldquo;Open source&amp;rdquo; is being used as a branding term, not a commitment.&lt;/p&gt;
&lt;h2 id="possible-industry-reasons-for-repo-deletion"&gt;Possible Industry Reasons for Repo Deletion&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s analyze some common industry reasons for deleting a repository to help developers understand the motivations behind such actions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Increased commercial pressure&lt;/strong&gt;: These tools often struggle with sustainable business models, prompting teams to shift to closed-source SaaS.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pivoting&lt;/strong&gt;: The company finds the original direction unprofitable and prepares to change course.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Team changes&lt;/strong&gt;: Acquisition, key member departures, or funding issues can all lead to repo shutdowns.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compliance or legal risks&lt;/strong&gt;: Observability products involve user data, which may require public code to be taken down.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regardless of the reason, the impact on users is the same: it is no longer an &amp;ldquo;open-source product.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="the-pseudo-open-source-phenomenon-in-ai-tools"&gt;The &amp;ldquo;Pseudo Open Source&amp;rdquo; Phenomenon in AI Tools&lt;/h2&gt;
&lt;p&gt;The most noteworthy aspect is not Lunary itself, but the rapid spread of this phenomenon in the AI tool space.&lt;/p&gt;
&lt;p&gt;Many projects use &amp;ldquo;open source&amp;rdquo; as a user acquisition strategy but lack open governance and long-term commitment.&lt;/p&gt;
&lt;p&gt;High substitutability, homogeneity, and commercial pressure mean these DevTools have low survival rates.&lt;/p&gt;
&lt;p&gt;When commercial teams lead open source, a single decision can make the repository disappear instantly.&lt;/p&gt;
&lt;p&gt;In the cloud native era, we&amp;rsquo;ve already seen a wave of &amp;ldquo;pseudo open source.&amp;rdquo; In the AI era, this trend is accelerating.&lt;/p&gt;
&lt;h2 id="three-practical-lessons-for-developers"&gt;Three Practical Lessons for Developers&lt;/h2&gt;
&lt;p&gt;Based on this case, here are three practical lessons for developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The &amp;ldquo;open source label&amp;rdquo; does not guarantee trustworthiness&lt;/strong&gt;: Open source projects led by commercial companies without community or foundation backing can be withdrawn at any time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI DevTools are far less stable than infrastructure&lt;/strong&gt;: These tools are not essential, highly replaceable, and have short lifecycles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool usability should take precedence over &amp;ldquo;open source status&amp;rdquo;&lt;/strong&gt;: Because it may stop being open source at any moment.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="my-first-experience-maintaining-an-ai-project-list-and-facing-repo-deletion"&gt;My First Experience Maintaining an AI Project List and Facing Repo Deletion&lt;/h2&gt;
&lt;p&gt;After collecting hundreds of projects over the past two years, this is the first time I&amp;rsquo;ve encountered a &amp;ldquo;commercial open source project disappearing, official repo 404&amp;rdquo; case.&lt;/p&gt;
&lt;p&gt;To me, this is an industry signal: the AI open source world is entering a period of drift, and commercial projects&amp;rsquo; open source commitments are increasingly unstable.&lt;/p&gt;
&lt;p&gt;It also reminds everyone making technical choices: in the AI era, open source is no longer a label you can automatically trust.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The disappearance of the Lunary repository is not an isolated incident, but a reflection of the &amp;ldquo;pseudo open source&amp;rdquo; phenomenon in the AI tool space. Developers should be cautious about the actual commitments behind the &amp;ldquo;open source&amp;rdquo; label, weighing project governance, community ownership, and sustainability ahead of the label itself. As the boundary between open source and commercial products blurs further, similar incidents will likely become more frequent; rational judgment and risk awareness are what will keep you standing in a fast-changing tech landscape.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The standardization and open collaboration of the agent ecosystem is no longer a luxury, but the critical watershed for whether AI Native can be engineered and implemented.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The establishment of &lt;a href="https://aaif.io/" target="_blank" rel="noopener"&gt;AAIF (Agentic AI Foundation)&lt;/a&gt; is the result of leading vendors staking out the &amp;ldquo;agent protocol layer&amp;rdquo; in advance.&lt;/li&gt;
&lt;li&gt;The real challenge is not technical, but how organizations transition from &amp;ldquo;human execution + AI assistance&amp;rdquo; to &amp;ldquo;agent execution + human supervision&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Successful agent adoption requires a phased adoption path, not just a bunch of protocols and demos.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cncf.io/" target="_blank" rel="noopener"&gt;CNCF&lt;/a&gt; and AAIF are complementary: CNCF manages &amp;ldquo;what infrastructure agents run on&amp;rdquo;, AAIF manages &amp;ldquo;how agents collaborate&amp;rdquo;. This matches the system I am building in &lt;a href="https://arksphere.dev/" target="_blank" rel="noopener"&gt;ArkSphere&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="cloud-native-problems-are-solved-ai-native-problems-are-just-beginning"&gt;Cloud Native Problems Are Solved, AI Native Problems Are Just Beginning&lt;/h2&gt;
&lt;p&gt;Over the past decade, Cloud Native technologies like Kubernetes, Service Mesh, and microservices have standardized &amp;ldquo;how applications run in the cloud&amp;rdquo;.
But AI Native faces a completely different challenge:
&lt;strong&gt;It&amp;rsquo;s not about &amp;ldquo;how to deploy a service&amp;rdquo;, but &amp;ldquo;how many behaviors in the system can be handed over to agents to execute themselves&amp;rdquo;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;CNCF&amp;rsquo;s Cloud Native AI (CNAI) addresses infrastructure-level issues:
&amp;ldquo;How can model training/inference/RAG run at scale and securely on Kubernetes?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But what AI Native truly lacks is another layer:
&lt;strong&gt;How do agents collaborate, access tools, get governed, and audited?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is exactly the gap AAIF aims to fill.&lt;/p&gt;
&lt;h2 id="aaifs-three-weapons-protocol--runtime--development-standard"&gt;AAIF&amp;rsquo;s Three Weapons: Protocol + Runtime + Development Standard&lt;/h2&gt;
&lt;p&gt;AAIF hosts three core technologies contributed by its founding members:&lt;/p&gt;
&lt;h3 id="-anthropics-model-context-protocol-mcp"&gt;① Anthropic&amp;rsquo;s Model Context Protocol (MCP)&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol" target="_blank" rel="noopener"&gt;https://github.com/modelcontextprotocol&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A &amp;ldquo;system call interface for agents&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unified definition for how agents access databases, APIs, files, and external tools.&lt;/li&gt;
&lt;li&gt;Designed as something like an AI-era version of gRPC + OAuth.&lt;/li&gt;
&lt;li&gt;Already integrated by Claude, Cursor, ChatGPT, VS Code, Microsoft Copilot, Gemini, and others.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It may not be the flashiest technology, but it could become the plumbing for the entire Agentic ecosystem.&lt;/p&gt;
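&lt;p&gt;To make the &amp;ldquo;system call interface&amp;rdquo; analogy concrete, here is a rough sketch of what an MCP tool invocation looks like on the wire. MCP is built on JSON-RPC 2.0 and exposes tools via a &lt;code&gt;tools/call&lt;/code&gt; method; the tool name and arguments below are hypothetical, for illustration only:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query_database",
    "arguments": { "sql": "SELECT count(*) FROM users" }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because every client and server speaks this same envelope, an MCP-capable agent can discover and invoke tools without vendor-specific glue code.&lt;/p&gt;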
&lt;h3 id="-blocks-goose-framework"&gt;② Block&amp;rsquo;s Goose Framework&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/block/goose" target="_blank" rel="noopener"&gt;https://github.com/block/goose&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Reference runtime for MCP:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local-first, composable agent workflow engine.&lt;/li&gt;
&lt;li&gt;Enables enterprises to pilot agents in small scopes without betting on a specific vendor.&lt;/li&gt;
&lt;li&gt;Serves as an engineering template for protocol implementation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="-openais-agentsmd"&gt;③ OpenAI&amp;rsquo;s AGENTS.md&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://agents.md/" target="_blank" rel="noopener"&gt;https://agents.md&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A simple but effective standard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Place an AGENTS.md file in the project repository.&lt;/li&gt;
&lt;li&gt;Clearly document build steps, testing, constraints, and context rules.&lt;/li&gt;
&lt;li&gt;Any agent that understands AGENTS.md can operate the codebase using the same instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes agent behavior more predictable and auditable.&lt;/p&gt;
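&lt;p&gt;As an illustration, a minimal AGENTS.md might look like the following (the commands and paths are placeholders for a hypothetical Node.js project, not part of the standard itself):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# AGENTS.md

## Build
- npm install
- npm run build

## Test
- npm test (must pass before every commit)

## Constraints
- Do not modify files under vendor/
- Any API change must also update docs/api.md
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Any agent that reads this file gets the same build, test, and safety instructions a new human contributor would.&lt;/p&gt;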
&lt;h2 id="why-is-aaif-in-such-a-hurry-this-is-a-race-for-standards"&gt;Why Is AAIF in Such a Hurry? This Is a Race for Standards&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s compare with history:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes&amp;rsquo; predecessor Borg ran internally at Google for over a decade; K8s was open sourced and donated to CNCF two years later.&lt;/li&gt;
&lt;li&gt;PyTorch joined the Linux Foundation six years after its release.&lt;/li&gt;
&lt;li&gt;MCP was donated to AAIF just &lt;strong&gt;over one year&lt;/strong&gt; after its launch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AAIF is not about &amp;ldquo;mature technology entering a foundation&amp;rdquo;, but &lt;strong&gt;staking out the key position early&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The reasons are practical:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Prevent agent ecosystem fragmentation&lt;/strong&gt;
Today, there are many competing &amp;ldquo;tool invocation protocols&amp;rdquo;, which could become incompatible silos in three years.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consensus is easier at the protocol layer than at the model layer&lt;/strong&gt;
Model competition is inevitable, but protocols can be standardized, open sourced, and avoid vendor lock-in.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A necessary move in global tech competition&lt;/strong&gt;
Putting the foundational standards for Agentic AI into the Linux Foundation is both a gesture of cooperation and a strategic move.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="aaif-vs-cncf-not-competition-but-two-pieces-of-the-puzzle"&gt;AAIF vs CNCF: Not Competition, But Two Pieces of the Puzzle&lt;/h2&gt;
&lt;p&gt;CNCF&amp;rsquo;s role:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;What infrastructure do agent workloads run on?&amp;rdquo;
Kubernetes, Service Mesh, observability, AI Gateway, RAG Infra—all at this layer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;AAIF&amp;rsquo;s role:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;How do agents collaborate, invoke tools, and get governed?&amp;rdquo;
Protocols, runtimes, and behavioral standards—all at this layer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Analogy:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Responsibilities&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AAIF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Semantic and collaboration layer of Agentic Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CNCF/CNAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource and execution layer of AI Native Infra&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: AAIF vs CNCF Comparison
&lt;/figcaption&gt;
&lt;p&gt;This matches the upper semantic and lower infrastructure layers in my &lt;a href="https://arksphere.dev/" target="_blank" rel="noopener"&gt;ArkSphere&lt;/a&gt; architecture diagram.&lt;/p&gt;
&lt;p&gt;In the long run, the two sides will be tightly coupled:
CNCF&amp;rsquo;s KServe, KAgent, and AI Gateway will natively support MCP / AGENTS.md,
while AAIF&amp;rsquo;s runtimes will run on Cloud Native infrastructure by default.&lt;/p&gt;
&lt;h2 id="the-real-challenge-not-protocols-but-organizations-and-people"&gt;The Real Challenge: Not Protocols, But Organizations and People&lt;/h2&gt;
&lt;p&gt;Most enterprises will get stuck on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How much responsibility can agents actually take?&lt;/li&gt;
&lt;li&gt;Who is accountable when things go wrong?&lt;/li&gt;
&lt;li&gt;How are audit, SLOs, and compliance defined?&lt;/li&gt;
&lt;li&gt;How is multi-agent collaboration visualized?&lt;/li&gt;
&lt;li&gt;How are tool invocation permissions controlled?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, &lt;strong&gt;agent adoption is not a &amp;ldquo;technical migration&amp;rdquo;, but an &amp;ldquo;organizational migration&amp;rdquo;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If AAIF cannot provide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Phased adoption methodologies&lt;/li&gt;
&lt;li&gt;Typical organizational migration paths&lt;/li&gt;
&lt;li&gt;Engineering best practices&lt;/li&gt;
&lt;li&gt;Failure cases and anti-patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It will be difficult for AAIF to achieve the industry impact that CNCF did.&lt;/p&gt;
&lt;h2 id="summary-aaif-is-the-moment-when-boundaries-are-drawn"&gt;Summary: AAIF Is the Moment When Boundaries Are Drawn&lt;/h2&gt;
&lt;p&gt;For me, the establishment of AAIF feels like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;The battlefield boundaries of the agent world have finally been drawn. Now it&amp;rsquo;s up to the engineering community to make it work.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;CNCF solved &amp;ldquo;how to run Cloud Native&amp;rdquo;,
AAIF is now trying to solve &amp;ldquo;how agents collaborate&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;In the next five years, whoever can truly connect these two worlds
will stand at the gateway to the next generation of infrastructure.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why I started a dedicated &amp;ldquo;Agentic Runtime + AI Native Infra&amp;rdquo; research track in &lt;a href="https://arksphere.dev" target="_blank" rel="noopener"&gt;ArkSphere&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="the-three-body-architecture-of-the-ai-native-era"&gt;The &amp;lsquo;Three-Body&amp;rsquo; Architecture of the AI Native Era&lt;/h2&gt;
&lt;p&gt;Finally, a personal note—my thoughts on ArkSphere.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/agentic-ai-foundation-cncf-era/a2b0ea6c87b10fd78607da5d75c4cd1a.svg" data-img="https://assets.jimmysong.io/images/blog/agentic-ai-foundation-cncf-era/a2b0ea6c87b10fd78607da5d75c4cd1a.svg" alt="Figure 1: AAIF × CNCF: Three-Layer Architecture of Agentic AI in the AI Native Era" data-caption="Figure 1: AAIF × CNCF: Three-Layer Architecture of Agentic AI in the AI Native Era"
width="2677"
height="739"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AAIF × CNCF: Three-Layer Architecture of Agentic AI in the AI Native Era&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This diagram shows the three-layer structure of the AI Native era:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CNCF (bottom layer): Provides the Cloud Native foundation required for agent operation, including Kubernetes, Service Mesh, GPU scheduling, and security systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AAIF (middle layer): Defines the runtime semantics and standards for agents, including the MCP protocol, Goose reference runtime, and AGENTS.md behavioral standard.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ArkSphere (bridging layer): Aligns the &amp;ldquo;Agentic Runtime semantic layer&amp;rdquo; with the &amp;ldquo;AI Native Infra infrastructure layer&amp;rdquo;, forming an engineerable agent architecture standard.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;p&gt;Infra is responsible for &amp;ldquo;running&amp;rdquo;, Runtime for &amp;ldquo;how to act&amp;rdquo;, and ArkSphere for &amp;ldquo;how to assemble a system&amp;rdquo;.&lt;/p&gt;</content:encoded></item><item><title>KCD Beijing + vLLM 2026: Kubernetes × AI × LLM Inference, A Community-Driven Tech Event</title><link>https://jimmysong.io/notice/kcd-beijing-2026/</link><pubDate>Fri, 05 Dec 2025 18:46:36 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/notice/kcd-beijing-2026/</guid><description>KCD Beijing + vLLM 2026: Kubernetes × AI × LLM Inference, A Community-Driven Tech Event</description><content:encoded>
&lt;p&gt;As Kubernetes becomes the de facto standard for AI infrastructure, and as large language model inference enters engineering and scaling stages, Cloud Native and AI are truly converging.&lt;/p&gt;
&lt;p&gt;Therefore, KCD Beijing and the vLLM community have decided to do something together: bring the Kubernetes community and the LLM inference community to the same stage. KCD Beijing + vLLM 2026 is officially launching 🚀&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/kcd-beijing.webp" data-img="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/kcd-beijing.webp" alt="KCD Beijing logo" data-caption="KCD Beijing logo"
width="1064"
height="1034"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;KCD Beijing logo&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="about-kcd-beijing--vllm-2026"&gt;About KCD Beijing + vLLM 2026&lt;/h2&gt;
&lt;p&gt;Kubernetes Community Days (KCD) is a CNCF-sponsored, community-organized Kubernetes technology conference that emphasizes community-driven collaboration, real-world practices, and engineering experience sharing.&lt;/p&gt;
&lt;p&gt;KCD Beijing + vLLM 2026 will be co-hosted by the KCD Beijing community and the vLLM community, representing a deep collaborative partnership between the Cloud Native community and the LLM inference community.&lt;/p&gt;
&lt;p&gt;We hope this will be more than just a conference—it&amp;rsquo;s a technical connection around these themes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real-world experience with Kubernetes in production&lt;/li&gt;
&lt;li&gt;Systematic thinking about AI/ML Infra and AI-related practices&lt;/li&gt;
&lt;li&gt;vLLM engineering practices in LLM inference&lt;/li&gt;
&lt;li&gt;Complete chain from cluster scheduling to model serving&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="event-details"&gt;Event Details&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Event Name&lt;/strong&gt;: KCD Beijing + vLLM 2026&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Date&lt;/strong&gt;: March 21, 2026&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Co-organizers&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Kubernetes Community Days Beijing&lt;/li&gt;
&lt;li&gt;vLLM Community&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="three-parallel-sessions--complete-technical-stack-coverage"&gt;Three Parallel Sessions · Complete Technical Stack Coverage&lt;/h2&gt;
&lt;p&gt;This conference will feature three parallel sessions, forming a complete loop from infrastructure to model inference:&lt;/p&gt;
&lt;h3 id="-kubernetes-track"&gt;☸️ Kubernetes Track&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes production practices&lt;/li&gt;
&lt;li&gt;Platform engineering / multi-cluster governance&lt;/li&gt;
&lt;li&gt;Networking, storage, security, scheduling&lt;/li&gt;
&lt;li&gt;CNCF ecosystem project experiences&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="-ai--ml-track"&gt;🤖 AI / ML Track&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;AI / ML Infra architecture design&lt;/li&gt;
&lt;li&gt;GPU / heterogeneous computing scheduling&lt;/li&gt;
&lt;li&gt;Training and inference platform construction&lt;/li&gt;
&lt;li&gt;AI-related technology practices&lt;/li&gt;
&lt;li&gt;Integration of AI with Cloud Native&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="-vllm-track"&gt;🚀 vLLM Track&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;vLLM architecture and core principles&lt;/li&gt;
&lt;li&gt;High-performance LLM inference practices&lt;/li&gt;
&lt;li&gt;vLLM + Kubernetes deployment cases&lt;/li&gt;
&lt;li&gt;Inference performance optimization and resource management&lt;/li&gt;
&lt;li&gt;vLLM ecosystem and future directions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you&amp;rsquo;re running AI workloads on Kubernetes, if you&amp;rsquo;re using vLLM to build inference services, if you&amp;rsquo;re researching and using AI-related technologies, then this is your stage.&lt;/p&gt;
&lt;h2 id="call-for-proposals-cfp"&gt;Call for Proposals (CFP)&lt;/h2&gt;
&lt;p&gt;We sincerely invite engineers, architects, maintainers, and community contributors to submit proposals.&lt;/p&gt;
&lt;h3 id="session-formats"&gt;Session Formats&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Standard Presentation&lt;/strong&gt;: 30 minutes · one or two speakers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lightning Talk&lt;/strong&gt;: 10 minutes · quick, direct, focused on a single point&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We value real-world experience, actual problems, and clear thinking over &amp;ldquo;perfect stories.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="important-dates"&gt;Important Dates&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CFP Opens&lt;/strong&gt;: December 5, 2025&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CFP Closes&lt;/strong&gt;: February 24, 2026&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Acceptance Notification&lt;/strong&gt;: Within 1–2 weeks after the deadline&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="how-to-submit-a-proposal"&gt;How to Submit a Proposal&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Prepare a clear abstract&lt;/strong&gt;: What will you talk about? Who is it for? What will attendees gain?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Choose your track and session format&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Submit your proposal&lt;/strong&gt;: &lt;a href="https://sessionize.com/kcd-beijing-2026/" target="_blank" rel="noopener"&gt;https://sessionize.com/kcd-beijing-2026/&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="thank-you-to-our-sponsors-and-partners"&gt;Thank You to Our Sponsors and Partners&lt;/h2&gt;
&lt;p&gt;KCD Beijing + vLLM 2026 would not be possible without the support of our sponsors and community partners.&lt;/p&gt;
&lt;p&gt;Special thanks (in no particular order):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AWS&lt;/li&gt;
&lt;li&gt;Red Hat&lt;/li&gt;
&lt;li&gt;Huawei&lt;/li&gt;
&lt;li&gt;Ant Open Source&lt;/li&gt;
&lt;li&gt;OceanBase&lt;/li&gt;
&lt;li&gt;KubeEvents&lt;/li&gt;
&lt;li&gt;AtomGit&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Your continued investment in the Cloud Native and AI open source ecosystem enables our community to go further.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Detailed information about sponsors and partners will be released in the next article. Stay tuned.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="why-should-you-participate"&gt;Why Should You Participate?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;This is a co-hosted event by KCD Beijing × vLLM communities&lt;/li&gt;
&lt;li&gt;This is a direct convergence of Kubernetes and LLM inference&lt;/li&gt;
&lt;li&gt;This is an engineering practice-focused community conference&lt;/li&gt;
&lt;li&gt;This is a stage where frontline engineers&amp;rsquo; voices are heard&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Whether you are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes / Platform Engineer&lt;/li&gt;
&lt;li&gt;AI / ML Infra Developer&lt;/li&gt;
&lt;li&gt;vLLM user or contributor&lt;/li&gt;
&lt;li&gt;Long-term participant in Cloud Native or AI communities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;KCD Beijing + vLLM 2026 welcomes you.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Register to attend&lt;/strong&gt;: &lt;a href="https://www.bagevent.com/event/kcd-beijing-2026" target="_blank" rel="noopener"&gt;https://www.bagevent.com/event/kcd-beijing-2026&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Please fill out the form below to apply as a volunteer.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/volunteer.webp" data-img="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/volunteer.webp" alt="Volunteer Application" data-caption="Volunteer Application"
width="1206"
height="1528"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Volunteer Application&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="meet-the-kcd-beijing-organizers"&gt;Meet the KCD Beijing Organizers&lt;/h2&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/kcd-beijing-organizers.webp" data-img="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/kcd-beijing-organizers.webp" alt="KCD Beijing Organizers" data-caption="KCD Beijing Organizers"
width="1286"
height="668"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;KCD Beijing Organizers&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Miley Fu&lt;/strong&gt;: WasmEdge DevRel, CNCF Ambassador&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jintao Zhang&lt;/strong&gt;: CNCF Ambassador, Kong&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Iceber Gu&lt;/strong&gt;: CNCF Ambassador, DaoCloud&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Jimmy Song&lt;/strong&gt;: CNCF Ambassador, VP of Open Source Ecosystem at Dynamia.ai&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dinah Zhang&lt;/strong&gt;: OceanBase DevRel&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qing Hao&lt;/strong&gt;: CNCF Ambassador, Red Hat&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zhenyu Zheng&lt;/strong&gt;: Senior Engineer at Huawei, Head of openEuler Operations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shuangkun Tian&lt;/strong&gt;: Argo Maintainer, Alibaba Cloud&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Addo Zhang&lt;/strong&gt;: CNCF Ambassador&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Betty Zheng&lt;/strong&gt;: Senior Developer Advocate, AWS&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stay tuned for further updates. For sponsorship, open source booths, and other collaborations, please contact:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="mailto:qhao@redhat.com"&gt;qhao@redhat.com&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="mailto:furao@secondstate.io"&gt;furao@secondstate.io&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="mailto:zhangjintao9020@gmail.com"&gt;zhangjintao9020@gmail.com&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;Please fill out the form below to apply as a partner community.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/cooperation.webp" data-img="https://assets.jimmysong.io/images/notice/kcd-beijing-2026/cooperation.webp" alt="Partner Community Application" data-caption="Partner Community Application"
width="1206"
height="1528"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Partner Community Application&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;hr&gt;
&lt;p&gt;Please share this with your colleagues and friends, and we welcome you to take the stage yourself.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CFP Open · Speaker Recruitment · Community Building&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;See you in Beijing on March 21, 2026.&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>Bun Acquired by Anthropic: A Structural Signal for AI-Native Runtimes</title><link>https://jimmysong.io/blog/bun-anthropic-runtime-shift/</link><pubDate>Wed, 03 Dec 2025 05:21:28 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/bun-anthropic-runtime-shift/</guid><description>Bun&amp;#39;s acquisition by Anthropic marks the first time a general-purpose language runtime is integrated into a large model engineering system, revealing a structural trend for AI-native runtimes.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The shifting ownership of runtimes is reshaping the underlying logic of AI programming and infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After the &lt;a href="https://bun.com/blog/bun-joins-anthropic" target="_blank" rel="noopener"&gt;announcement of Bun&amp;rsquo;s acquisition by Anthropic&lt;/a&gt;, my focus was not on the deal itself, but on the structural signal it revealed: general-purpose language runtimes are now being drawn into the path dependencies of AI programming systems. This is not just &amp;ldquo;a JS project finding a home,&amp;rdquo; but &amp;ldquo;the first time a language runtime has been actively integrated into the unified engineering system of a leading large model company.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This event deserves a deeper analysis.&lt;/p&gt;
&lt;h2 id="buns-engineering-features-and-current-status"&gt;Bun&amp;rsquo;s Engineering Features and Current Status&lt;/h2&gt;
&lt;p&gt;Before examining &lt;a href="https://bun.com" target="_blank" rel="noopener"&gt;Bun&lt;/a&gt;&amp;rsquo;s industry significance, let&amp;rsquo;s outline its runtime characteristics. The following list summarizes Bun&amp;rsquo;s main engineering capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-performance JavaScript/TypeScript runtime&lt;/li&gt;
&lt;li&gt;Built-in bundler, test framework, and package manager&lt;/li&gt;
&lt;li&gt;Single-file executable&lt;/li&gt;
&lt;li&gt;Extremely fast cold start&lt;/li&gt;
&lt;li&gt;Node compatibility without Node&amp;rsquo;s legacy dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together, these capabilities have built a measurable performance moat.&lt;/p&gt;
&lt;p&gt;However, it should be noted that Bun currently lacks the core attributes of an AI Runtime, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Permission model&lt;/li&gt;
&lt;li&gt;Tool isolation&lt;/li&gt;
&lt;li&gt;Capability declaration protocol&lt;/li&gt;
&lt;li&gt;Execution semantics understandable by models&lt;/li&gt;
&lt;li&gt;Sandbox execution environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, Bun&amp;rsquo;s &amp;ldquo;AI Native&amp;rdquo; properties have not yet been established, but Anthropic&amp;rsquo;s acquisition provides an opportunity for it to evolve in this direction.&lt;/p&gt;
&lt;h2 id="the-significance-of-a-leading-model-company-acquiring-a-general-purpose-runtime"&gt;The Significance of a Leading Model Company Acquiring a General-Purpose Runtime&lt;/h2&gt;
&lt;p&gt;Historically, it has not been uncommon for model companies to acquire editors, plugins, or IDEs, but in known public cases, no mainstream large model vendor had ever directly acquired a mature general-purpose language runtime. Bun × Anthropic is the first clear event pulling the runtime into the AI programming system landscape. This move sends three engineering-level signals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The speed of AI code generation continues to increase, amplifying the need for deterministic execution environments. The generate→execute→validate→destroy cycle intensifies the problem of environment non-repeatability.&lt;/li&gt;
&lt;li&gt;Models require a &amp;ldquo;controllable execution substrate&amp;rdquo; rather than a traditional operating system. Agents are not suited to run tools in an uncontrollable, unpredictable OS layer.&lt;/li&gt;
&lt;li&gt;The runtime needs to be embedded into the model&amp;rsquo;s internal engineering pipeline. Future IDEs, agents, and auto-repair pipelines may directly invoke the runtime&amp;rsquo;s API.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not a short-term business integration, but a manifestation of the trend toward compressed engineering pipelines.&lt;/p&gt;
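&lt;p&gt;The generate→execute→validate→destroy cycle can be sketched as a minimal loop. This is an illustrative Python sketch, not Anthropic&amp;rsquo;s or Bun&amp;rsquo;s actual pipeline; all names are hypothetical, and a real runtime would run each candidate in a fresh sandboxed process rather than an in-process namespace:&lt;/p&gt;

```python
def run_candidate(code):
    # Execute one generated candidate in a throwaway namespace and report errors.
    # A real AI-native runtime would use a fresh, isolated process per run so the
    # environment can be destroyed afterwards; exec keeps this sketch self-contained.
    scratch = {}  # throwaway environment, discarded after the run
    try:
        exec(code, scratch)
        return True, None
    except Exception as err:
        return False, repr(err)

def repair_loop(candidates):
    # Try model-proposed candidates in order until one validates. In a real
    # pipeline each error would be fed back to the model to produce the next
    # candidate; here the candidates are simply given up front.
    errors = []
    for code in candidates:
        ok, err = run_candidate(code)
        if ok:
            return code, errors
        errors.append(err)
    return None, errors
```

&lt;p&gt;Feeding each collected error back into the next generation step is exactly the self-testing pipeline that the runtime must make cheap and deterministic.&lt;/p&gt;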
&lt;h2 id="runtime-requirements-differentiation-in-the-ai-coding-era"&gt;Runtime Requirements Differentiation in the AI Coding Era&lt;/h2&gt;
&lt;p&gt;Based on observations of agentic runtimes over the past year, runtime requirements in the AI coding era are diverging. The following list summarizes the main engineering abstractions trending in this space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Determinism: AI-generated code is not reviewed line by line; execution results must be consistent across machines and over time.&lt;/li&gt;
&lt;li&gt;Minimal distribution unit: Users no longer install language environments and numerous dependencies. Verifiable, replicable, and portable single execution units are becoming the norm.&lt;/li&gt;
&lt;li&gt;Tool isolation: Models cannot directly access all OS capabilities; the context and permissions visible to tools must be strictly defined.&lt;/li&gt;
&lt;li&gt;Short-lived execution: Agent invocation patterns resemble &amp;ldquo;batch jobs&amp;rdquo; rather than long-running services.&lt;/li&gt;
&lt;li&gt;Capability declaration: The runtime must expose &amp;ldquo;what I can do,&amp;rdquo; rather than the entire OS interface.&lt;/li&gt;
&lt;li&gt;Embeddable self-testing pipeline: After generating code, models need to immediately execute tests, collect errors, and iterate. The runtime must provide observability and diagnostic primitives.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These requirements are not unique to Bun, nor did Bun originate them, but Bun&amp;rsquo;s &amp;ldquo;monolithic and controllable&amp;rdquo; runtime structure is more conducive to evolving in this direction.&lt;/p&gt;
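&lt;p&gt;The &amp;ldquo;capability declaration&amp;rdquo; requirement can be made concrete with a small sketch. The types and field names below are hypothetical illustrations of the idea, not an existing Bun or Anthropic API:&lt;/p&gt;

```python
from dataclasses import dataclass, field

@dataclass
class Capability:
    # One declared capability, e.g. filesystem reads under a path prefix.
    kind: str    # "fs.read", "net.fetch", ... (hypothetical names)
    scope: str   # path prefix, host allowlist entry, ...

@dataclass
class CapabilityManifest:
    # What the runtime exposes to the model: "what I can do", not the whole OS.
    capabilities: list = field(default_factory=list)

    def allows(self, kind, target):
        # A request is permitted only if some declared capability covers it.
        return any(c.kind == kind and target.startswith(c.scope)
                   for c in self.capabilities)

# A runtime that may only read files under /workspace/.
manifest = CapabilityManifest([Capability("fs.read", "/workspace/")])
```

&lt;p&gt;The point is the shape of the contract: the model sees an explicit allowlist rather than the entire OS surface.&lt;/p&gt;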
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/bun-anthropic-runtime-shift/527ff200956d6b73178a0e3521f42fc2.svg" data-img="https://assets.jimmysong.io/images/blog/bun-anthropic-runtime-shift/527ff200956d6b73178a0e3521f42fc2.svg" alt="Figure 1: Minimal execution loop of an AI-native runtime" data-caption="Figure 1: Minimal execution loop of an AI-native runtime"
width="599"
height="1105"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Minimal execution loop of an AI-native runtime&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="buns-potential-role-within-anthropics-system"&gt;Bun&amp;rsquo;s Potential Role Within Anthropic&amp;rsquo;s System&lt;/h2&gt;
&lt;p&gt;If Bun is seen merely as a Node.js replacement, the acquisition is of limited significance. But if it is viewed as the execution foundation for future AI coding systems, the logic becomes clearer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code is generated by models&lt;/li&gt;
&lt;li&gt;Building is handled by the runtime&amp;rsquo;s built-in toolchain&lt;/li&gt;
&lt;li&gt;Testing, validation, and repair are performed by models repeatedly invoking the runtime&lt;/li&gt;
&lt;li&gt;All execution behaviors are defined by the runtime&amp;rsquo;s semantics&lt;/li&gt;
&lt;li&gt;The runtime forms Anthropic&amp;rsquo;s internal &amp;ldquo;minimal stable layer&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This model is similar to the relationship between Chrome and V8: the execution engine and upper-layer system co-evolve over time, with performance and semantics advancing in sync.&lt;/p&gt;
&lt;p&gt;Whether Bun can fulfill this role depends on Anthropic&amp;rsquo;s architectural choices, but the event itself has opened up possibilities in this direction.&lt;/p&gt;
&lt;h2 id="industry-trends-and-future-evolution"&gt;Industry Trends and Future Evolution&lt;/h2&gt;
&lt;p&gt;Combining facts, signals, and engineering trends, the following directions can be anticipated:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &amp;ldquo;Agent Runtime&amp;rdquo; category will gradually become more defined&lt;/li&gt;
&lt;li&gt;The boundaries between bundler, runtime, and test runner will continue to blur&lt;/li&gt;
&lt;li&gt;Cloud vendors will launch controllable runtimes with capability declarations&lt;/li&gt;
&lt;li&gt;Permission models and secure sandboxes will move down to the language runtime layer&lt;/li&gt;
&lt;li&gt;Runtimes will become part of the model toolchain, rather than an external environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These trends will not all materialize in the short term, but they represent the inevitable path of engineering evolution.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The combination of Bun × Anthropic is not about &amp;ldquo;an open-source project being absorbed,&amp;rdquo; but about a language runtime being actively integrated into the engineering pipeline of a large model system for the first time. Competition at the model layer will continue, but what truly reshapes software is the structural transformation of AI-native runtimes. This is a foundational change worth long-term attention.&lt;/p&gt;</content:encoded></item><item><title>Agentic Runtime Realism: Insights from McKinsey Ark on 2026 Infrastructure Trends</title><link>https://jimmysong.io/blog/agentic-runtime-realism/</link><pubDate>Tue, 02 Dec 2025 12:07:45 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/agentic-runtime-realism/</guid><description>Analyzing Ark from architecture, semantics, community activity, and engineering paradigms to reveal its impact on 2026 AI Infra trends and the ArkSphere community.</description><content:encoded>
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Statement
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
ArkSphere has no affiliation or association with McKinsey Ark.
&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;The value of Agentic Runtime lies not in unified interfaces, but in semantic governance and the transformation of engineering paradigms. Ark is just a reflection of the trend; the future belongs to governable Agentic Workloads.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently, the &lt;a href="https://jimmysong.io/en/community"&gt;ArkSphere community&lt;/a&gt; has been focusing on McKinsey&amp;rsquo;s open-source &lt;a href="https://github.com/mckinsey/agents-at-scale-ark" target="_blank" rel="noopener"&gt;Ark&lt;/a&gt; (Agentic Runtime for Kubernetes). Although the project is still in technical preview, its architecture and semantic model have already become key indicators for the direction of AI Infra in 2026.&lt;/p&gt;
&lt;p&gt;This article analyzes the engineering paradigm and semantic model of Ark, highlighting its industry implications. It avoids repeating the reasons for the failure of unified model APIs and generic infrastructure logic, instead focusing on the unique perspective of the ArkSphere community.&lt;/p&gt;
&lt;h2 id="arks-semantic-model-and-engineering-paradigm"&gt;Ark&amp;rsquo;s Semantic Model and Engineering Paradigm&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s greatest value is in making Agents first-class citizens in Kubernetes, achieving closed-loop tasks through CRD (Custom Resource Definition) and controllers (Reconcilers). This semantic abstraction not only enhances governance capabilities but also aligns closely with the Agentic Runtime strategies of major cloud providers.&lt;/p&gt;
&lt;p&gt;Ark&amp;rsquo;s main resources include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent (inference entity)&lt;/li&gt;
&lt;li&gt;Model (model selection and configuration)&lt;/li&gt;
&lt;li&gt;Tools (capability plugins/MCP, Model Context Protocol)&lt;/li&gt;
&lt;li&gt;Team (multi-agent collaboration)&lt;/li&gt;
&lt;li&gt;Query (task lifecycle)&lt;/li&gt;
&lt;li&gt;Evaluation (assessment)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The diagram below illustrates the semantic relationships in Agentic Runtime:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/2d76bfcb312694080bd94942b084f210.svg" data-img="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/2d76bfcb312694080bd94942b084f210.svg" alt="Figure 1: Agentic Runtime Semantic Relationships" data-caption="Figure 1: Agentic Runtime Semantic Relationships"
width="1450"
height="589"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Agentic Runtime Semantic Relationships&lt;/figcaption&gt;
&lt;/figure&gt;
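&lt;p&gt;To make these resource relationships concrete, here is a deliberately simplified sketch of how a Query might reference an Agent. The field names are illustrative only and do not reflect Ark&amp;rsquo;s actual CRD schema:&lt;/p&gt;

```python
# Field names below are illustrative, not Ark's actual CRD schema.
agent = {
    "kind": "Agent",
    "metadata": {"name": "researcher"},
    "spec": {"modelRef": "default-model", "tools": ["web-search"]},
}

query = {
    "kind": "Query",
    "metadata": {"name": "q-1"},
    "spec": {
        "input": "Summarize recent runtime trends",
        "targets": [{"agent": "researcher"}],
    },
}

def resolve_targets(query, agents):
    # Resolve a Query's agent references against registered Agent resources,
    # the way a controller would before dispatching the task.
    by_name = {a["metadata"]["name"]: a for a in agents}
    return [by_name[t["agent"]]
            for t in query["spec"]["targets"] if t["agent"] in by_name]
```

&lt;p&gt;In Ark, the equivalent resolution happens in the control plane, as controllers reconcile the actual CRDs.&lt;/p&gt;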
&lt;h2 id="architecture-and-community-activity"&gt;Architecture and Community Activity&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s architecture adopts a standard control plane system, emphasizing unified runtime semantics. The community is highly active, engineer-driven, and the codebase is well-structured, though production readiness is still being improved.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/f481241843db17b6e6172e8093a1daa6.svg" data-img="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/f481241843db17b6e6172e8093a1daa6.svg" alt="Figure 2: Ark Architecture and Control Plane Flow" data-caption="Figure 2: Ark Architecture and Control Plane Flow"
width="4130"
height="565"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Ark Architecture and Control Plane Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="arkspheres-boundaries-and-inspirations"&gt;ArkSphere&amp;rsquo;s Boundaries and Inspirations&lt;/h2&gt;
&lt;p&gt;The emergence of Ark has clarified the boundaries of ArkSphere. ArkSphere does not aim for unified model interfaces, multi-cloud abstraction, a collection of miscellaneous tools, or a comprehensive framework layer. Instead, it focuses on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The semantic system of Agentic Runtime (tasks, states, tool invocation, collaboration graphs, etc.)&lt;/li&gt;
&lt;li&gt;Enterprise-grade runtime governance models (permissions, auditing, isolation, multi-tenancy, compliance, cost tracking)&lt;/li&gt;
&lt;li&gt;Integration capabilities for domestic ecosystem tools&lt;/li&gt;
&lt;li&gt;Engineering paradigms from a runtime perspective&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ArkSphere is an ecosystem and engineering system at the runtime level, not a &amp;ldquo;model abstraction layer&amp;rdquo; or an &amp;ldquo;agent development framework.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="key-changes-in-2026"&gt;Key Changes in 2026&lt;/h2&gt;
&lt;p&gt;2026 will usher in the era of the Agentic Runtime, where Agents are no longer just classes you import but workloads that must be governed. Ark is just one example of this trend, and the direction is clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Semantic models and governability become highlights&lt;/li&gt;
&lt;li&gt;Closed-loop tasks are the core value&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s realism teaches us that the future belongs to runtime, semantics, governability, and workload-level Agents. The industry will no longer pursue unified APIs or framework implementations, but will focus on governable runtime semantics and engineering paradigms.&lt;/p&gt;</content:encoded></item><item><title>In-Depth Analysis of Ark: Kubernetes for the AI Era or a New Engineering Paradigm Shift?</title><link>https://jimmysong.io/blog/ark-agentic-runtime-analysis/</link><pubDate>Tue, 02 Dec 2025 10:54:34 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ark-agentic-runtime-analysis/</guid><description>Analysis of McKinsey&amp;#39;s Ark project: architecture, CRDs, control plane, design paradigms, production readiness, and implications for ArkSphere and AI infrastructure.</description><content:encoded>
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Statement
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
ArkSphere has no affiliation or association with McKinsey Ark.
&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;The greatest value of Ark lies in reshaping engineering paradigms, not just its features. It points the way for AI Infra and leaves vast space for community ecosystems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently, many members in our &lt;a href="https://arksphere.dev/" target="_blank" rel="noopener"&gt;ArkSphere community&lt;/a&gt; have started exploring McKinsey&amp;rsquo;s open-source &lt;a href="https://github.com/mckinsey/agents-at-scale-ark" target="_blank" rel="noopener"&gt;Ark (Agentic Runtime for Kubernetes)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some see it as radical, some think it&amp;rsquo;s just a consulting firm&amp;rsquo;s experiment, and others quote a realistic maxim:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What we need now is &amp;ldquo;agentic runtime realism,&amp;rdquo; not &amp;ldquo;unified model romanticism.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I strongly agree with this sentiment.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve spent some time analyzing Ark&amp;rsquo;s source code, architecture, and design philosophy, combined with our community discussions. My conclusion is:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ark&amp;rsquo;s significance is not in its features, but in its paradigm.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It&amp;rsquo;s not the answer, but it points toward the future of AI Infra.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Below is my interpretation of Ark, focusing on engineering, architecture, trends, and its inspiration for ArkSphere.&lt;/p&gt;
&lt;h2 id="what-exactly-is-ark"&gt;What Exactly Is Ark?&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s core positioning is: &lt;strong&gt;A runtime that treats Agents as Kubernetes Workloads.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not a framework, not an SDK, not an AutoGen-style multi-agent tool, but a complete system including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Control plane (Controller)&lt;/li&gt;
&lt;li&gt;Custom resource models (CRD, Custom Resource Definition)&lt;/li&gt;
&lt;li&gt;API service&lt;/li&gt;
&lt;li&gt;Dashboard&lt;/li&gt;
&lt;li&gt;CLI&lt;/li&gt;
&lt;li&gt;Python SDK&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Essentially, Ark is the &lt;strong&gt;control plane for Agents&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Ark defines seven core CRDs in Kubernetes. The following flowchart shows the relationships among these resources:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/df35874e6886350db30fdf036a118099.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/df35874e6886350db30fdf036a118099.svg" alt="Figure 1: Ark CRD Resource Relationships" data-caption="Figure 1: Ark CRD Resource Relationships"
width="816"
height="557"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Ark CRD Resource Relationships&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Through this set of CRDs, Ark makes Agent systems resource-oriented and declarative, enabling capabilities such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lifecycle management&lt;/li&gt;
&lt;li&gt;Multi-tenant isolation&lt;/li&gt;
&lt;li&gt;RBAC (Role-Based Access Control)&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Upgradability&lt;/li&gt;
&lt;li&gt;Extensibility (tools, models, MCP)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, Ark is not about &amp;ldquo;how to write Agents,&amp;rdquo; but &amp;ldquo;how to operate Agents in enterprise-grade systems.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="three-layer-architecture-mixed-languages-and-components-but-a-complete-system"&gt;Three-Layer Architecture: Mixed Languages and Components, but a Complete System&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s overall architecture is divided into three layers, each with different tech stacks and responsibilities. The following flowchart illustrates the relationships among components in each layer:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/f6ccd732f54e5aa2a0ca3f6283103eb3.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/f6ccd732f54e5aa2a0ca3f6283103eb3.svg" alt="Figure 2: Ark Three-Layer Architecture Components" data-caption="Figure 2: Ark Three-Layer Architecture Components"
width="1532"
height="806"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Ark Three-Layer Architecture Components&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is not a &amp;ldquo;wrapper project,&amp;rdquo; but a fully operational AI Runtime system, with a level of engineering far beyond most agent frameworks on the market.&lt;/p&gt;
&lt;h2 id="is-it-the-kubernetes-of-the-ai-era"&gt;Is It the Kubernetes of the AI Era?&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s revisit Kubernetes&amp;rsquo; core value:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes was never about &amp;ldquo;unifying cloud APIs&amp;rdquo;; it unified the &amp;ldquo;application runtime model.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Cloud provider APIs aren&amp;rsquo;t unified, nor are networking or storage. What Kubernetes unified are the application models: Pod, Deployment, Service.&lt;/p&gt;
&lt;p&gt;Kubernetes succeeded because:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It provides a stable application abstraction on top of diversity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Ark&amp;rsquo;s goal is not to unify all large language models (LLMs), MCPs, or tool formats, but rather:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent resource model (CRD) + control plane (Reconciler) + lifecycle.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From this perspective, Ark offers a prototype of a &amp;ldquo;declarative application model&amp;rdquo; for the AI era.&lt;/p&gt;
&lt;p&gt;Whether it will become &amp;ldquo;Kubernetes for AI&amp;rdquo; is still too early to say, but it has already planted a seed.&lt;/p&gt;
&lt;h2 id="comparison-with-other-frameworks-not-on-the-same-level"&gt;Comparison with Other Frameworks: Not on the Same Level&lt;/h2&gt;
&lt;p&gt;Current mainstream agent frameworks like LangChain, CrewAI, AutoGen, MetaGPT, etc., address problems fundamentally different from Ark.&lt;/p&gt;
&lt;p&gt;The table below compares the positioning and limitations of each framework:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;What Problem Does It Solve&lt;/th&gt;
&lt;th&gt;Core Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;Agent/Tool composition&lt;/td&gt;
&lt;td&gt;Doesn&amp;rsquo;t address deployment or governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen&lt;/td&gt;
&lt;td&gt;Multi-agent conversations&lt;/td&gt;
&lt;td&gt;Lacks control plane and lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;Workflow-style multi-agent&lt;/td&gt;
&lt;td&gt;Missing scheduling, RBAC, resource model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MetaGPT&lt;/td&gt;
&lt;td&gt;Agent SOP&lt;/td&gt;
&lt;td&gt;Just execution logic, not a platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenDevin&lt;/td&gt;
&lt;td&gt;AI IDE/Dev Assistant&lt;/td&gt;
&lt;td&gt;Not an Agent Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ark&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Agent control plane + resource system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Functionality not yet mature&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Mainstream Agent Frameworks vs. Ark
&lt;/figcaption&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Other tools focus on &amp;ldquo;how to write Agents.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Ark focuses on &amp;ldquo;how Agents run, schedule, govern, observe, and extend.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s an architectural difference.&lt;/p&gt;
&lt;h2 id="execution-flow-agents-scheduled-like-pods"&gt;Execution Flow: Agents Scheduled Like Pods&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s execution flow closely resembles the Kubernetes controller model. The following sequence diagram shows the core process:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/1ce387835a38f3380734332ea9e769f7.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/1ce387835a38f3380734332ea9e769f7.svg" alt="Figure 3: Ark Agent Execution Flow" data-caption="Figure 3: Ark Agent Execution Flow"
width="961"
height="553"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Ark Agent Execution Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As the diagram shows, Ark&amp;rsquo;s control flow is transparent and its engineering path clear, bringing agent systems into a &amp;ldquo;controllable&amp;rdquo; state.&lt;/p&gt;
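&lt;p&gt;The controller model behind this flow can be sketched in a few lines: a reconciler diffs desired state (from CRD specs) against observed state and emits the actions needed to converge. This is a generic illustration of the Kubernetes reconcile pattern, not Ark&amp;rsquo;s actual controller code:&lt;/p&gt;

```python
def reconcile(desired, observed):
    # One reconciliation pass in the Kubernetes controller style: compare
    # desired state with observed state and return converging actions.
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name, spec))
        elif observed[name] != spec:
            actions.append(("update", name, spec))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

# Desired comes from Agent resources; observed from the cluster's actual state.
desired = {"researcher": {"model": "m1"}, "planner": {"model": "m2"}}
observed = {"researcher": {"model": "m0"}}
```

&lt;p&gt;Running such a pass in a loop, triggered by resource changes, is what makes agent workloads declaratively governable rather than script-driven.&lt;/p&gt;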
&lt;h2 id="production-readiness-right-direction-still-a-tech-preview"&gt;Production Readiness: Right Direction, Still a Tech Preview&lt;/h2&gt;
&lt;p&gt;According to official notes and code maturity, Ark currently offers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Runnable&lt;/li&gt;
&lt;li&gt;Learnable&lt;/li&gt;
&lt;li&gt;Extensible&lt;/li&gt;
&lt;li&gt;But not recommended for large-scale production use yet&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Main reasons include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CRD structures may change&lt;/li&gt;
&lt;li&gt;APIs are not yet stable&lt;/li&gt;
&lt;li&gt;MCP ecosystem is still forming&lt;/li&gt;
&lt;li&gt;Memory service is still basic&lt;/li&gt;
&lt;li&gt;Multi-agent team execution strategies are primitive&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the engineering system is already taking shape, which is crucial.&lt;/p&gt;
&lt;h2 id="community-activity-small-but-elite-strong-mckinsey-drive"&gt;Community Activity: Small but Elite, Strong McKinsey Drive&lt;/h2&gt;
&lt;p&gt;From GitHub data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stars: 222&lt;/li&gt;
&lt;li&gt;Forks: 50&lt;/li&gt;
&lt;li&gt;Contributors: 48&lt;/li&gt;
&lt;li&gt;Commit frequency is steady&lt;/li&gt;
&lt;li&gt;The vast majority of contributions come from within McKinsey&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Note: Data as of December 2, 2025.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;High stability, but limited openness.&lt;/p&gt;
&lt;p&gt;This is also ArkSphere&amp;rsquo;s opportunity:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The paradigm is right, but the ecosystem needs community-driven growth.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="trends-for-2026-from-framework-era-to-runtime-era"&gt;Trends for 2026: From Framework Era to Runtime Era&lt;/h2&gt;
&lt;p&gt;After deep analysis, I&amp;rsquo;m increasingly convinced:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2023–2024: Large model API call era&lt;/li&gt;
&lt;li&gt;2024–2025: Agent framework era&lt;/li&gt;
&lt;li&gt;2025–2027: Agent Runtime / Control Plane era (Ark&amp;rsquo;s direction)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While everyone is writing Python scripts for agents, the real value lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-agent task scheduling&lt;/li&gt;
&lt;li&gt;Tool registration and governance&lt;/li&gt;
&lt;li&gt;Session/Memory lifecycle&lt;/li&gt;
&lt;li&gt;Result reproducibility&lt;/li&gt;
&lt;li&gt;RBAC, auditing, tenant isolation&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Enterprise internal personalized agent systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ark is providing a practical path forward.&lt;/p&gt;
&lt;h2 id="inspiration-for-arksphere"&gt;Inspiration for ArkSphere&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s inspiration for ArkSphere is both critical and direct:&lt;/p&gt;
&lt;h3 id="arksphere-should-focus-on-paradigm-building-not-feature-stacking"&gt;ArkSphere Should Focus on &amp;ldquo;Paradigm Building,&amp;rdquo; Not &amp;ldquo;Feature Stacking&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Ark offers a prototype for future Agentic Runtime:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Resource model&lt;/li&gt;
&lt;li&gt;Control plane&lt;/li&gt;
&lt;li&gt;Tool registration&lt;/li&gt;
&lt;li&gt;Multi-agent collaboration&lt;/li&gt;
&lt;li&gt;Evaluation and governance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ArkSphere&amp;rsquo;s role should be:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Aggregate paradigms, produce standards, incubate ecosystems, not rewrite Ark itself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is the &amp;ldquo;CNCF (Cloud Native Computing Foundation) for the AI-native era.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="huge-potential-for-localization-in-china"&gt;Huge Potential for Localization in China&lt;/h3&gt;
&lt;p&gt;Localization opportunities include but are not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integration with domestic large language models (e.g., Qwen, DeepSeek, Zhipu)&lt;/li&gt;
&lt;li&gt;Enterprise privatization scenarios&lt;/li&gt;
&lt;li&gt;Local tool/MCP discovery ecosystem&lt;/li&gt;
&lt;li&gt;Multi-cluster/edge inference&lt;/li&gt;
&lt;li&gt;Enterprise-grade RBAC, auditing, data isolation&lt;/li&gt;
&lt;li&gt;AgentSpec enhancements for industrial scenarios&lt;/li&gt;
&lt;li&gt;Enhanced versions of Runtime/Controller&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ark solves the &amp;ldquo;model,&amp;rdquo; while ArkSphere can solve the &amp;ldquo;ecosystem.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="what-we-need-is-not-kubernetes-for-the-llm-era-but-an-industry-grade-cognition-system-for-ai-runtime"&gt;What We Need Is Not &amp;ldquo;Kubernetes for the LLM Era,&amp;rdquo; But an &amp;ldquo;Industry-Grade Cognition System for AI Runtime&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;The biggest takeaway from dissecting Ark:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The future of AI-native is not a pile of tools, but an engineering system.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;ArkSphere can be the initiator of this system.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Ark is not a &amp;ldquo;universal runtime,&amp;rdquo; nor is it the &amp;ldquo;ultimate Kubernetes for the AI era.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But it has done one crucial thing right:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It abstracts all the pain points people faced when writing Python agent scripts into Kubernetes resources and controllers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It represents engineering, not just a demo.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not mature yet, but it&amp;rsquo;s heading in the right direction.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not the end, but it gives us a clear roadmap.&lt;/p&gt;
&lt;p&gt;For the ArkSphere community I&amp;rsquo;m running, Ark provides a clear inspiration:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The future belongs to Runtime, to Control Plane, to governable agent systems.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;And the ones who can truly scale this system are not McKinsey, but the community.&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>ArkSphere Community Launch</title><link>https://jimmysong.io/notice/announcement-arksphere-community/</link><pubDate>Sun, 30 Nov 2025 16:39:08 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/notice/announcement-arksphere-community/</guid><description>ArkSphere Community launches for developers building AI Infrastructure, runtimes, and agent systems. Focused on open-source, verifiable, and evolvable solutions.</description><content:encoded>
&lt;p&gt;ArkSphere Community is now officially launched, built for developers working on AI Infrastructure, runtime systems, and intelligent agent execution environments. The goal is not discussion for its own sake, but the construction of verifiable, evolvable, and running open-source systems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core Focus&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;AI-native runtime and agent execution layer&lt;/li&gt;
&lt;li&gt;GPU, inference serving, and distributed execution architecture&lt;/li&gt;
&lt;li&gt;AI infrastructure stack and production-grade engineering patterns&lt;/li&gt;
&lt;li&gt;Open-source prototypes and specification-driven development&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://arksphere.dev/community/" target="_blank" rel="noopener"&gt;https://arksphere.dev/community/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Statement&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;ArkSphere grows from cloud-native experience, but it is not a continuation — it is a forward move. Past work remains accessible, future work is built here.&lt;/p&gt;
&lt;p&gt;Participation means contribution: design proposals, architecture validation, OSS mapping, implementation work, and real systems running at scale.&lt;/p&gt;</content:encoded></item><item><title>From Using AI to Relying on AI: Why the Era of AI Engineering Has Yet to Begin</title><link>https://jimmysong.io/blog/from-using-ai-to-building-ai-systems/</link><pubDate>Sat, 29 Nov 2025 12:40:54 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/from-using-ai-to-building-ai-systems/</guid><description>AI&amp;#39;s real turning point is moving from using AI tools to building AI systems. Why the era of AI engineering hasn&amp;#39;t begun, and the developer opportunity in the next three years.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The real inflection point for AI engineering is not &amp;ldquo;how many people use it,&amp;rdquo; but &amp;ldquo;how many people cannot do without it.&amp;rdquo; Only when not using AI leads to a direct loss of opportunity and efficiency can we say that the era of AI engineering has truly arrived.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="starting-point-predictions-for-ai-in-2026"&gt;Starting Point: Predictions for AI in 2026&lt;/h2&gt;
&lt;p&gt;Recently, I came across two &lt;a href="https://thenewstack.io/amazon-cto-werner-vogels-predictions-for-2026/" target="_blank" rel="noopener"&gt;predictions for 2026 from Amazon CTO Werner Vogels&lt;/a&gt; that struck me the most:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Renaissance Developer&lt;/strong&gt;: Developers must span code, product, business, and social impact.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Personalized Learning&lt;/strong&gt;: AI will reshape education, focusing on differentiated paths rather than a unified curriculum.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both point to the same trend: AI is not just a tool, but is redefining how people grow and how they are defined.&lt;/p&gt;
&lt;p&gt;There is a gap between prediction and reality, and it is worth exploring.&lt;/p&gt;
&lt;h2 id="correction-will-ai-really-be-saturated-by-2026"&gt;Correction: Will AI Really Be &amp;ldquo;Saturated&amp;rdquo; by 2026?&lt;/h2&gt;
&lt;p&gt;My initial prediction was that AI usage would reach saturation by 2026. Reality has shown me this is too optimistic.&lt;/p&gt;
&lt;p&gt;By the end of 2025, even among internet professionals, most people&amp;rsquo;s use of AI remains at the &amp;ldquo;heard of it&amp;rdquo; or &amp;ldquo;tried it a few times&amp;rdquo; stage. It is still far from being a daily workflow necessity.&lt;/p&gt;
&lt;p&gt;More importantly, this judgment is &lt;strong&gt;conditional&lt;/strong&gt;: infrastructure supply, regulation, and compute costs must not reverse in the next 3–6 years. If any variable breaks down (costs double, models go offline, policy shifts), the adoption curve will be disrupted.&lt;/p&gt;
&lt;h2 id="the-truth-about-the-inflection-point-from-using-to-relying-on"&gt;The Truth About the Inflection Point: From &amp;ldquo;Using&amp;rdquo; to &amp;ldquo;Relying On&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Relying on&amp;rdquo; is a vague term. A more precise definition requires measurable indicators.&lt;/p&gt;
&lt;p&gt;Here is a diagram that visualizes the quantitative metrics for being truly dependent on AI, comparing target thresholds with current status:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-dependency-metrics.svg" data-img="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-dependency-metrics.svg" alt="Figure 1: Quantitative Definition of AI Dependency" data-caption="Figure 1: Quantitative Definition of AI Dependency"
width="1776"
height="616"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Quantitative Definition of AI Dependency&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Most industries have not reached the &amp;ldquo;cannot operate without&amp;rdquo; stage, unlike the internet, mobile, or payment inflection points. Most metrics are still far below the threshold, which is why the most likely outcome for 2026 is: &lt;strong&gt;more people will use AI, but those who truly rely on it will remain a minority&lt;/strong&gt;.&lt;/p&gt;
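&lt;p&gt;To make that check concrete, here is a minimal Python sketch of a threshold test. The indicator names and every number in it are illustrative placeholders of mine, not measurements from the figure:&lt;/p&gt;

```python
# Hypothetical sketch: "relying on AI" expressed as measurable indicators.
# Indicator names and all numbers below are illustrative placeholders,
# not real measurements.

THRESHOLDS = {
    "daily_active_usage": 0.80,   # share of workdays on which AI is actually used
    "critical_path_share": 0.50,  # share of tasks that stall if AI is unavailable
    "paid_conversion": 0.30,      # share of users paying for AI tooling
}

def is_dependent(observed: dict[str, float]) -> bool:
    """True only if every indicator clears its threshold."""
    return all(observed.get(name, 0.0) >= bar for name, bar in THRESHOLDS.items())

# Placeholder snapshot: broad trial usage, little critical-path reliance.
current = {"daily_active_usage": 0.35, "critical_path_share": 0.10, "paid_conversion": 0.05}
print(is_dependent(current))  # prints False: usage is common, dependency is not
```

&lt;p&gt;The exact indicators matter less than the shape of the test: dependency is an &lt;em&gt;and&lt;/em&gt; over thresholds, which is why rising usage alone does not move the inflection point.&lt;/p&gt;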
&lt;h2 id="using--building-the-five-level-capability-ladder"&gt;Using ≠ Building: The Five-Level Capability Ladder&lt;/h2&gt;
&lt;p&gt;This difference is not binary, but a clear progression.&lt;/p&gt;
&lt;p&gt;The following table shows the five-level model of AI capability maturity.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Scarcity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Tool User&lt;/td&gt;
&lt;td&gt;Uses ChatGPT/Claude for coding and copywriting; an accelerator, but optional&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Integrator&lt;/td&gt;
&lt;td&gt;LLM APIs plus a vector DB; AI layered on existing systems; usable but not critical&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Settler&lt;/td&gt;
&lt;td&gt;Restructures data flows and business decisions; AI becomes the critical path&lt;/td&gt;
&lt;td&gt;Rising&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Engineering Abstraction&lt;/td&gt;
&lt;td&gt;Extracts frameworks and runtimes, providing infrastructure for the ecosystem&lt;/td&gt;
&lt;td&gt;Extremely High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Autonomous System&lt;/td&gt;
&lt;td&gt;Self-feedback and self-optimization, redefining the human-AI relationship&lt;/td&gt;
&lt;td&gt;Future&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Five-Level Model of AI Capability Maturity
&lt;/figcaption&gt;
&lt;p&gt;Currently, the biggest gap is at &lt;strong&gt;Level 3 and Level 4&lt;/strong&gt;. Most people are stuck at Level 1 or 2, with very few reaching Level 4. This means &lt;strong&gt;high-value scarcity will not disappear, but will continue to rise&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="why-the-era-of-ai-engineering-has-not-arrived-three-dimensional-delaying-factors"&gt;Why the Era of AI Engineering Has Not Arrived: Three-Dimensional Delaying Factors&lt;/h2&gt;
&lt;p&gt;It is not technology alone that is holding things back, but constraints in three dimensions.&lt;/p&gt;
&lt;p&gt;The following diagram illustrates the three main constraints (technical, institutional, and organizational) that are delaying AI engineering maturity, along with their delay metrics:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-engineering-constraints.svg" data-img="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-engineering-constraints.svg" alt="Figure 2: Three-Dimensional Constraints on AI Engineering Maturity" data-caption="Figure 2: Three-Dimensional Constraints on AI Engineering Maturity"
width="1783"
height="803"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Three-Dimensional Constraints on AI Engineering Maturity&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The key observation: &lt;strong&gt;If any one dimension is stuck, the entire ecosystem&amp;rsquo;s maturity will be delayed&lt;/strong&gt;. Currently, none of the three dimensions have fully mature solutions.&lt;/p&gt;
&lt;h2 id="the-realistic-window-three-paths-for-capability-advancement"&gt;The Realistic Window: Three Paths for Capability Advancement&lt;/h2&gt;
&lt;p&gt;The next three years will not be &amp;ldquo;winner takes all,&amp;rdquo; but rather a period where multiple capability levels appreciate simultaneously.&lt;/p&gt;
&lt;p&gt;Below is a table comparing the value and bottlenecks of different capability advancement paths:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability Path&lt;/th&gt;
&lt;th&gt;Short-Term Value&lt;/th&gt;
&lt;th&gt;Long-Term Outlook&lt;/th&gt;
&lt;th&gt;Bottleneck&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Level 1→2 (Tool→Integration)&lt;/td&gt;
&lt;td&gt;⭐⭐ Rapid Depreciation&lt;/td&gt;
&lt;td&gt;⭐ Saturation&lt;/td&gt;
&lt;td&gt;Low barrier, fierce competition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 2→3 (Integration→Settlement)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐ Scarce&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐ Continual Appreciation&lt;/td&gt;
&lt;td&gt;Requires industry depth, long-term iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 3→4 (Settlement→Abstraction)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐ Extremely Scarce&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐ Defines Ecosystem&lt;/td&gt;
&lt;td&gt;Large cognitive leap, needs community influence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: AI Capability Advancement Paths and Value Comparison
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Key conclusion&lt;/strong&gt;: While the number of &amp;ldquo;AI users&amp;rdquo; is rapidly increasing (depressing Level 1 value), due to the three-dimensional delaying factors, scarcity at Level 3 and 4 will only rise.&lt;/p&gt;
&lt;h2 id="what-im-doing-on-arkspheredev"&gt;What I&amp;rsquo;m Doing on arksphere.dev&lt;/h2&gt;
&lt;p&gt;Based on the above judgment, I focus on exploring the architectural evolution of AI Native Infrastructure. The goal is not to catalog model usage, but to study the foundational capability stack supporting scalable intelligent systems: scheduling, storage, inference, Agent Runtime, autonomous control, observability, and reliability.&lt;/p&gt;
&lt;p&gt;The content is no longer a collection of courses or tips, but a continuous record of evolution around Infra → Runtime → System Abstraction. &lt;a href="https://arksphere.dev" target="_blank" rel="noopener"&gt;arksphere.dev&lt;/a&gt; is the site where this experiment is run and its results accumulate.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The inflection point for the era of AI engineering is not &amp;ldquo;how many people use it,&amp;rdquo; but &amp;ldquo;how many people cannot do without it.&amp;rdquo; The latter requires five measurable indicators to reach their thresholds, and we are still far from that.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Using ≠ Building&amp;rdquo; is not a binary, but a five-level progression. &lt;strong&gt;Scarcity at Level 3 and 4 will rise as the number of Level 1 users increases&lt;/strong&gt;—this is the biggest opportunity window in the next three years.&lt;/p&gt;
&lt;p&gt;But the width of this window depends largely on how technology, institutions, and organizations evolve together. I hope more people working on AI engineering will not only focus on technical innovation, but also invest equal thought into institutional development, talent growth, and risk governance—these &amp;ldquo;invisible engineering&amp;rdquo; challenges.&lt;/p&gt;</content:encoded></item><item><title>Antigravity VS Code Setup Guide: Build a Practical AI IDE Workflow</title><link>https://jimmysong.io/blog/antigravity-vscode-style-ide/</link><pubDate>Thu, 20 Nov 2025 03:55:30 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/antigravity-vscode-style-ide/</guid><description>A practical Antigravity setup guide for developers who want a VS Code-style AI IDE, including marketplace switch, AMP and CodeX installation, and workflow tuning.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The biggest pain point when switching IDEs is user habits. By installing a series of plugins and tweaking configurations, you can make Antigravity feel much more like VS Code—preserving familiar workflows while adding Open Agent Manager capabilities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you searched for a practical Antigravity VS Code setup, this walkthrough is optimized for that exact use case. The goal is not to replicate VS Code pixel by pixel, but to restore a familiar extension marketplace, keep your daily coding ergonomics, and still use Antigravity&amp;rsquo;s stronger agent-style execution. I focus on the concrete setup steps that materially change productivity: marketplace migration, AMP and CodeX installation, editor behavior alignment, and the trade-offs versus GitHub Copilot in real daily work.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/antigravity-ui.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/antigravity-ui.webp" alt="Figure 1: Antigravity IDE UI" data-caption="Figure 1: Antigravity IDE UI"
width="5120"
height="2880"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Antigravity IDE UI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="continue-reading"&gt;Continue Reading&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/qoder-alibaba-ai-ide-personal-review/"&gt;Qoder AI IDE review and hands-on comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/open-source-ai-agent-workflow-comparison/"&gt;Open-source AI Agent and workflow platform comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/vibe-coding-free-tools/"&gt;Free Vibe Coding tools I actually use&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/ai/oh-my-opencode/"&gt;Oh My OpenCode in AI OSS Landscape&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below are the configurations and steps I actually use. Feel free to follow along.&lt;/p&gt;
&lt;h2 id="first-impressions-of-antigravity"&gt;First Impressions of Antigravity&lt;/h2&gt;
&lt;p&gt;A few subjective observations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The interface is split between agent management and editor views, somewhat like AgentHQ + VS Code.&lt;/li&gt;
&lt;li&gt;Agents modify code very quickly, with a much higher completion rate than typical &amp;ldquo;chat-based&amp;rdquo; assistants.&lt;/li&gt;
&lt;li&gt;The editor and context windows are large, ideal for long diffs and logs.&lt;/li&gt;
&lt;li&gt;By default, it uses OpenVSX / OpenVSCode Gallery, so the extension ecosystem isn&amp;rsquo;t identical to my VS Code setup.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All subsequent steps focus on one goal: keep Antigravity&amp;rsquo;s agent features while maintaining my VS Code workflow.&lt;/p&gt;
&lt;h2 id="switching-the-extension-marketplace-to-vs-code-official"&gt;Switching the Extension Marketplace to VS Code Official&lt;/h2&gt;
&lt;p&gt;Antigravity is essentially a VS Code fork, so you can directly change the Marketplace configuration.&lt;/p&gt;
&lt;p&gt;In Antigravity:&lt;/p&gt;
&lt;p&gt;Go to &lt;strong&gt;Settings&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Antigravity Settings&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Editor&lt;/strong&gt;, and update the following URLs to point to VS Code:&lt;/p&gt;
&lt;p&gt;Marketplace Item URL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;https://marketplace.visualstudio.com/items
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Marketplace Gallery URL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;https://marketplace.visualstudio.com/_apis/public/gallery
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/vscode-marketplace.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/vscode-marketplace.webp" alt="Figure 2: VSCode Marketplace Configuration" data-caption="Figure 2: VSCode Marketplace Configuration"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: VSCode Marketplace Configuration&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Restart Antigravity.&lt;/p&gt;
&lt;p&gt;After this change, searching and installing extensions works just like the official VS Code Marketplace. Installing AMP, GitHub Theme, VS Code Icon, etc., all follow this process.&lt;/p&gt;
&lt;h2 id="installing-the-amp-extension"&gt;Installing the AMP Extension&lt;/h2&gt;
&lt;p&gt;AMP isn&amp;rsquo;t officially supported on Antigravity yet, but you can install it directly via the VS Code Marketplace.&lt;/p&gt;
&lt;p&gt;Steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the Extensions panel (the same icon as in VS Code).&lt;/li&gt;
&lt;li&gt;Search for the AMP extension and install it as usual.&lt;/li&gt;
&lt;li&gt;Log in using your AMP API Key.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Currently, Antigravity doesn&amp;rsquo;t support one-click account login like VS Code; you have to use the API key.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Summary
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
Once installed, AMP works almost identically in Antigravity as in VS Code—completion and refactoring features are available. The only difference is manual login configuration.
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I recommend AMP because it offers a free mode. In my experience, it&amp;rsquo;s great for writing documentation, running scripts, and as a daily command-line tool. It&amp;rsquo;s fast, and especially useful for optimizing prompts.&lt;/p&gt;
&lt;h2 id="importing-the-codex-extension"&gt;Importing the CodeX Extension&lt;/h2&gt;
&lt;p&gt;CodeX doesn&amp;rsquo;t provide a direct VSIX download link on the web. My approach is to export it from VS Code and then import it into Antigravity.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/codex-extension.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/codex-extension.webp" alt="Figure 3: Exporting Codex Extension in VS Code" data-caption="Figure 3: Exporting Codex Extension in VS Code"
width="3016"
height="2264"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Exporting Codex Extension in VS Code&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install the CodeX extension in VS Code (if you haven&amp;rsquo;t already).&lt;/li&gt;
&lt;li&gt;In VS Code&amp;rsquo;s extension manager, find CodeX and export it as a &lt;code&gt;.vsix&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Switch to Antigravity, open the Extensions panel, and select &amp;ldquo;Install from VSIX&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Choose the exported &lt;code&gt;codex-x.x.x.vsix&lt;/code&gt; file to complete installation.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="alert alert-tip-container"&gt;
&lt;div class="alert-tip-title px-2"&gt;
Tip
&lt;/div&gt;
&lt;div class="alert-tip px-2"&gt;
Since my local VS Code is already logged into CodeX, importing it into Antigravity automatically reuses the login state—I didn&amp;rsquo;t need to log in again.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="optimizing-editor-settings"&gt;Optimizing Editor Settings&lt;/h2&gt;
&lt;p&gt;Beyond the marketplace and plugins, a few tweaks make the experience even closer to VS Code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Theme&lt;/strong&gt;: Choose the same color scheme as VS Code to minimize visual switching. I use GitHub Theme and vscode-icons.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Editor Settings&lt;/strong&gt;: In &amp;ldquo;Open Editor Settings&amp;rdquo;, set indentation, formatting, line width, etc., to match your VS Code preferences. I define these in the workspace&amp;rsquo;s &lt;code&gt;settings.json&lt;/code&gt;, so no migration is needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After these changes, the editing area is essentially &amp;ldquo;VS Code with an agent console&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="remaining-issues"&gt;Remaining Issues&lt;/h2&gt;
&lt;p&gt;To fully migrate from VS Code/GitHub Copilot to Antigravity, I think there are still several key challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Limited Customization&lt;/strong&gt;: Antigravity doesn&amp;rsquo;t support custom prompts and agents the way Copilot Chat does. Currently, only &amp;ldquo;rules&amp;rdquo; configuration is available, which limits workflow flexibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Ecosystem Needs Improvement&lt;/strong&gt;: Antigravity hasn&amp;rsquo;t natively integrated the latest models from major vendors (OpenAI, Anthropic, Microsoft, xAI, etc.), whereas GitHub Copilot excels here.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Considerations&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Future pricing may start at $20/month.&lt;/li&gt;
&lt;li&gt;No free models are supported, unlike GitHub Copilot (even Copilot Pro users have free model options).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stability Issues&lt;/strong&gt;: Agents often encounter &amp;ldquo;Agent terminated due to error&amp;rdquo; during operation, requiring manual retries or new sessions. This affects workflow smoothness, though I expect improvements in the future.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="github-copilot-vs-antigravity"&gt;GitHub Copilot VS. Antigravity&lt;/h2&gt;
&lt;p&gt;Although Antigravity excels in several areas, there is still significant room for improvement compared to the combination of GitHub Copilot and VS Code.&lt;/p&gt;
&lt;p&gt;The large language models (LLMs) I frequently use are all supported in VS Code:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/models.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/models.webp" alt="Figure 4: Copilot-supported LLMs (partial)" data-caption="Figure 4: Copilot-supported LLMs (partial)"
width="1252"
height="1240"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Copilot-supported LLMs (partial)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;My long-accumulated custom prompts:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/prompts.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/prompts.webp" alt="Figure 5: Copilot Chat enables quick access to custom prompts" data-caption="Figure 5: Copilot Chat enables quick access to custom prompts"
width="1252"
height="852"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Copilot Chat enables quick access to custom prompts&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;My collection of agents:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/agents.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/agents.webp" alt="Figure 6: Copilot Chat allows selection of custom agents" data-caption="Figure 6: Copilot Chat allows selection of custom agents"
width="1258"
height="2594"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Copilot Chat allows selection of custom agents&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here are some aspects of the VS Code and Copilot experience that, for now, are hard to replace with other IDEs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Ask/Edit/Agent/Plan workflow perfectly fits my working habits.&lt;/li&gt;
&lt;li&gt;Support for custom prompts and agents is essential. Many of my prompts and agents have been refined over time and are deeply integrated into my daily workflow—it&amp;rsquo;s hard to find alternatives elsewhere.&lt;/li&gt;
&lt;li&gt;New models are integrated at lightning speed. Whenever a new model is released, GitHub Copilot is among the first to support it.&lt;/li&gt;
&lt;li&gt;The integration with VS Code is seamless—no extra configuration required, making it extremely convenient.&lt;/li&gt;
&lt;li&gt;Frequent updates: just a few days ago, a bug I reported to VS Code was fixed the same night.&lt;/li&gt;
&lt;li&gt;Copilot Chat&amp;rsquo;s keyboard shortcuts make it easy to quickly access various features.&lt;/li&gt;
&lt;li&gt;GitHub has granted me a free Pro account. Although the monthly premium quota is only 300 calls, combining Copilot with other plugins like AMP, Codex, Droid, and Qwen enables a highly efficient workflow. Even if I upgrade to a paid account in the future, the $10/month fee is very cost-effective compared to similar products.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-experience"&gt;Practical Experience&lt;/h2&gt;
&lt;p&gt;A few subjective tips from my actual usage—take them as reference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t treat Antigravity as &amp;ldquo;VS Code + chat box&amp;rdquo;. Use its agent features for complete tasks: let the agent propose a plan, then execute changes.&lt;/li&gt;
&lt;li&gt;For major changes, always create a new Git branch and restrict agent actions to that branch. Handle all diffs via standard Pull Request (PR) workflows.&lt;/li&gt;
&lt;li&gt;Ask agents to produce &amp;ldquo;artifacts&amp;rdquo; (plans, proposals, test descriptions), not just final code. This makes it easier to review and track changes.&lt;/li&gt;
&lt;li&gt;Plugins you&amp;rsquo;re already comfortable with in VS Code (like AMP, CodeX) can be migrated directly, reducing cognitive load and letting you focus on new agent workflows.&lt;/li&gt;
&lt;/ul&gt;
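&lt;p&gt;The branch-per-task tip above can be scripted. The following Python sketch (my own convention, not an Antigravity feature) creates a dedicated branch for one agent task and verifies HEAD before any agent edits happen:&lt;/p&gt;

```python
# Illustrative sketch (my own convention, not an Antigravity feature):
# confine an agent's edits to a dedicated Git branch, so every diff goes
# through a normal pull-request review instead of landing on main.
import subprocess

def run(args: list[str], cwd: str) -> str:
    """Run a command and return its trimmed stdout, raising on failure."""
    result = subprocess.run(args, cwd=cwd, check=True, capture_output=True, text=True)
    return result.stdout.strip()

def start_agent_branch(repo: str, task: str) -> str:
    """Create and switch to an isolated branch for one agent task; return its name."""
    branch = f"agent/{task}"
    run(["git", "switch", "-c", branch], cwd=repo)
    # Confirm HEAD actually points at the new branch before letting the agent loose.
    return run(["git", "symbolic-ref", "--short", "HEAD"], cwd=repo)

# Usage (assuming ./my-repo is an existing Git checkout):
#   start_agent_branch("./my-repo", "refactor-auth")  -> "agent/refactor-auth"
```

&lt;p&gt;From there, everything the agent produces stays reviewable as an ordinary PR diff.&lt;/p&gt;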
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;My current experience: Antigravity delivers powerful agent capabilities and multi-view consoles. By following these steps to align the interface and plugin ecosystem with VS Code, you can smoothly transition your daily development workflow.&lt;/p&gt;</content:encoded></item><item><title>Cloudflare November 18 Global Outage: The Dangers of Implicit Assumptions in Modern Infrastructure</title><link>https://jimmysong.io/blog/cloudflare-2025-11-18-outage-analysis/</link><pubDate>Wed, 19 Nov 2025 18:56:34 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/cloudflare-2025-11-18-outage-analysis/</guid><description>An analysis of the Cloudflare global outage on November 18, 2025, exploring implicit assumptions, automated configuration pipelines, and systemic risks in modern infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The greatest risks to modern internet infrastructure often aren&amp;rsquo;t in the code itself, but in those implicit assumptions and automated configuration pipelines that go undefined. Cloudflare&amp;rsquo;s outage is a wake-up call every Infra/AI engineer must heed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yesterday (November 18), Cloudflare experienced its largest global outage since 2019. As this site is hosted on Cloudflare, it was also affected—one of the rare times in eight years that the site was inaccessible due to an outage (the last time was a GitHub Pages failure, which happened the year Microsoft acquired GitHub).&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloudflare-2025-11-18-outage-analysis/jimmysongio-down.webp" data-img="https://assets.jimmysong.io/images/blog/cloudflare-2025-11-18-outage-analysis/jimmysongio-down.webp" alt="Figure 1: jimmysong.io was down for 27 minutes due to the Cloudflare outage" data-caption="Figure 1: jimmysong.io was down for 27 minutes due to the Cloudflare outage"
width="2694"
height="1424"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: jimmysong.io was down for 27 minutes due to the Cloudflare outage&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This incident was not caused by an attack or a traditional software bug, but by a seemingly &amp;ldquo;safe&amp;rdquo; permissions update that triggered the weakest link in modern infrastructure: &lt;strong&gt;implicit assumptions and automated configuration pipelines&lt;/strong&gt;. Cloudflare has published a blog post &lt;a href="https://blog.cloudflare.com/18-november-2025-outage/" target="_blank" rel="noopener"&gt;Cloudflare outage on November 18, 2025&lt;/a&gt; explaining the cause.&lt;/p&gt;
&lt;p&gt;Here is the chain reaction process of the outage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A permissions adjustment led to metadata changes;&lt;/li&gt;
&lt;li&gt;The metadata change doubled the lines in the feature file;&lt;/li&gt;
&lt;li&gt;The doubled lines triggered the proxy module&amp;rsquo;s memory limit;&lt;/li&gt;
&lt;li&gt;The memory limit caused the core proxy to panic;&lt;/li&gt;
&lt;li&gt;The proxy panic led to a cascade failure in downstream systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This kind of chain reaction is the most typical—and dangerous—systemic failure mode at today&amp;rsquo;s internet scale.&lt;/p&gt;
&lt;h2 id="root-cause-implicit-assumptions-are-not-contracts"&gt;Root Cause: Implicit Assumptions Are Not Contracts&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s first look at the core hidden risk in this incident. The Bot Management feature file is automatically generated every five minutes, and that generation relied on an unstated premise:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The system.columns query result contains only the default database.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This assumption was not documented or validated in configuration—it existed only in the engineer&amp;rsquo;s mental model.&lt;/p&gt;
&lt;p&gt;After a ClickHouse permissions update, the underlying r0 tables were exposed, instantly doubling the query results. The generated file then exceeded the &lt;a href="https://blog.cloudflare.com/20-percent-internet-upgrade/" target="_blank" rel="noopener"&gt;FL2&lt;/a&gt; proxy&amp;rsquo;s preallocated in-memory limit of 200 features, ultimately causing a panic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Once an implicit assumption is broken, the system lacks a buffer and is highly prone to cascading failures.&lt;/strong&gt;&lt;/p&gt;
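&lt;p&gt;As a sketch (with illustrative names; this is not Cloudflare&amp;rsquo;s actual code), the implicit premise could be turned into an explicit, validated contract by rejecting query rows from any database the pipeline does not expect:&lt;/p&gt;

```rust
// Hypothetical sketch: make the "only the default database" premise an
// explicit contract instead of an unchecked assumption. All names here
// are illustrative.
const EXPECTED_DB: &str = "default";

/// One row of a (hypothetical) system.columns query result.
pub struct ColumnRow {
    pub database: String,
    pub name: String,
}

/// Fail loudly if any row comes from an unexpected database, rather than
/// silently folding it into the generated feature file.
pub fn validate_rows(rows: Vec<ColumnRow>) -> Result<Vec<ColumnRow>, String> {
    for row in &rows {
        if row.database != EXPECTED_DB {
            return Err(format!(
                "unexpected database '{}' for column '{}'; aborting file generation",
                row.database, row.name
            ));
        }
    }
    Ok(rows)
}
```

&lt;p&gt;With a check like this, the permissions change would have aborted file generation with a clear error instead of doubling the file&amp;rsquo;s contents.&lt;/p&gt;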
&lt;h2 id="configuration-pipelines-are-riskier-than-code-pipelines"&gt;Configuration Pipelines Are Riskier Than Code Pipelines&lt;/h2&gt;
&lt;p&gt;This incident was not caused by code changes, but by data-plane changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SQL query behavior changed;&lt;/li&gt;
&lt;li&gt;Feature files were automatically generated;&lt;/li&gt;
&lt;li&gt;The files were broadcast across the network.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a typical pattern in modern infrastructure: &lt;strong&gt;data, schema, and metadata are far more likely to destabilize systems than code.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Cloudflare&amp;rsquo;s feature file is a &amp;ldquo;supply chain input,&amp;rdquo; not a regular configuration. Anything entering the automated broadcast path is equivalent to a system-level command.&lt;/p&gt;
&lt;h2 id="language-safety-cant-eliminate-boundary-layer-complexity"&gt;Language Safety Can&amp;rsquo;t Eliminate Boundary Layer Complexity&lt;/h2&gt;
&lt;p&gt;A former Cloudflare engineer summarized it well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Rust can prevent a class of errors, but the complexity of boundary layers, data contracts, and configuration pipelines does not disappear.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The FL2 panic stemmed from a single &lt;code&gt;unwrap()&lt;/code&gt;. This isn&amp;rsquo;t a language issue, but a lack of system contracts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No upper-bound validation for feature count;&lt;/li&gt;
&lt;li&gt;File schema lacked version constraints;&lt;/li&gt;
&lt;li&gt;Feature generation logic depended on implicit behavior;&lt;/li&gt;
&lt;li&gt;Core proxy error mode was panic, not graceful degradation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Most incidents in modern distributed systems come from &amp;ldquo;bad input,&amp;rdquo; not &amp;ldquo;bad memory.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
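&lt;p&gt;A minimal illustration of the missing contract (assumed names and shapes; FL2&amp;rsquo;s real code is not shown in the post-mortem): enforce the feature-count upper bound and return an error instead of calling &lt;code&gt;unwrap()&lt;/code&gt;, so the caller can degrade gracefully:&lt;/p&gt;

```rust
// Hypothetical sketch of a bounded loader. MAX_FEATURES mirrors the
// 200-feature preallocation described in Cloudflare's post-mortem.
const MAX_FEATURES: usize = 200;

/// Parse a feature file defensively: a too-large input becomes an error
/// the caller can handle (e.g., by keeping the previous file), not a panic.
pub fn load_features(lines: &[&str]) -> Result<Vec<String>, String> {
    if lines.len() > MAX_FEATURES {
        return Err(format!(
            "feature count {} exceeds limit {}",
            lines.len(),
            MAX_FEATURES
        ));
    }
    Ok(lines.iter().map(|s| s.to_string()).collect())
}
```

&lt;p&gt;The point is not the bounds check itself but where the decision lives: the failure mode becomes an explicit &lt;code&gt;Result&lt;/code&gt; that the caller must handle, rather than an implicit panic deep inside the proxy.&lt;/p&gt;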
&lt;h2 id="core-proxies-need-controllable-failure-paths"&gt;Core Proxies Need Controllable Failure Paths&lt;/h2&gt;
&lt;p&gt;FL/FL2 are Cloudflare&amp;rsquo;s core proxies; all requests must pass through them. Such components should not fail with a panic; instead, they should be able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ignore abnormal features;&lt;/li&gt;
&lt;li&gt;Truncate over-limit fields;&lt;/li&gt;
&lt;li&gt;Roll back to previous versions;&lt;/li&gt;
&lt;li&gt;Fail-open or fail-close;&lt;/li&gt;
&lt;li&gt;Skip the Bot module and continue processing traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As long as the proxy &amp;ldquo;stays alive,&amp;rdquo; the entire network won&amp;rsquo;t be completely paralyzed.&lt;/p&gt;
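&lt;p&gt;The &amp;ldquo;stay alive&amp;rdquo; principle can be sketched as a last-known-good fallback (illustrative types and names, not Cloudflare&amp;rsquo;s implementation): a rejected update leaves the proxy running on the previous configuration:&lt;/p&gt;

```rust
// Hypothetical sketch: apply a configuration update only if it validated;
// otherwise keep the last known-good version and continue serving traffic.
#[derive(Clone, Debug, PartialEq)]
pub struct FeatureSet {
    pub version: u64,
}

pub fn apply_update(
    current: FeatureSet,
    candidate: Result<FeatureSet, String>,
) -> FeatureSet {
    match candidate {
        Ok(new) => new,    // valid update: roll forward
        Err(_) => current, // invalid update: keep serving with the old config
    }
}
```

&lt;p&gt;Whether the right policy is fail-open, fail-close, or module bypass is a product decision, but the structural requirement is the same: bad input must have somewhere non-fatal to go.&lt;/p&gt;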
&lt;h2 id="data-changes-are-more-uncontrollable-than-code-changes"&gt;Data Changes Are More Uncontrollable Than Code Changes&lt;/h2&gt;
&lt;p&gt;The essence of this incident:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Subtle permission changes;&lt;/li&gt;
&lt;li&gt;ClickHouse default behavior changed;&lt;/li&gt;
&lt;li&gt;Query results propagated to distributed systems;&lt;/li&gt;
&lt;li&gt;Automated publishing amplified the error;&lt;/li&gt;
&lt;li&gt;Edge proxies crashed due to uncontrolled input.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Future AI infrastructure will be even more complex: models, tokenizers, adapters, RAG indexes, and KV snapshots all require frequent updates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In future AI infrastructure, data-plane risks will far exceed those of the code-plane.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="recovery-process-shows-engineering-maturity"&gt;Recovery Process Shows Engineering Maturity&lt;/h2&gt;
&lt;p&gt;During the incident, Cloudflare took several measures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stopped generating erroneous feature files;&lt;/li&gt;
&lt;li&gt;Force-distributed the previous version of the file;&lt;/li&gt;
&lt;li&gt;Rolled back Bot module configuration;&lt;/li&gt;
&lt;li&gt;Routed Workers KV and Access around the core proxy;&lt;/li&gt;
&lt;li&gt;Restored traffic in stages.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Restoring hundreds of PoPs worldwide simultaneously demonstrates a high level of engineering maturity.&lt;/p&gt;
&lt;h2 id="lessons-for-infraaicloud-native-engineers"&gt;Lessons for Infra/AI/Cloud Native Engineers&lt;/h2&gt;
&lt;p&gt;The Cloudflare event highlights four common risks in large-scale systems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implicit assumptions fail;&lt;/li&gt;
&lt;li&gt;Configuration supply chain contamination;&lt;/li&gt;
&lt;li&gt;Automated publishing amplifies errors;&lt;/li&gt;
&lt;li&gt;Core proxies lack graceful degradation paths.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For AI Infra practitioners, these risks are even more relevant:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model weight updates without schema validation;&lt;/li&gt;
&lt;li&gt;Adapter merges may be contaminated;&lt;/li&gt;
&lt;li&gt;RAG index incremental builds are unstable;&lt;/li&gt;
&lt;li&gt;Inference graph configuration may be broken by bad data;&lt;/li&gt;
&lt;li&gt;Automatically rolled-out models may propagate errors network-wide.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI engineering is replaying Cloudflare&amp;rsquo;s infrastructure dilemmas—just at greater speed and scale.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary-of-former-cloudflare-engineers-views"&gt;Summary of Former Cloudflare Engineer&amp;rsquo;s Views&lt;/h2&gt;
&lt;p&gt;His insights pinpoint the hardest problems in distributed systems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The issue isn&amp;rsquo;t code, but missing contracts;&lt;/li&gt;
&lt;li&gt;Not the language, but undefined input boundaries;&lt;/li&gt;
&lt;li&gt;Not modules, but lack of validation in the configuration supply chain;&lt;/li&gt;
&lt;li&gt;Not bugs, but absence of fail-safe mechanisms.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This incident proves: &lt;strong&gt;The real fragility in modern infrastructure lies in &amp;ldquo;behavioral boundaries,&amp;rdquo; not &amp;ldquo;memory boundaries.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The Cloudflare November 18 outage was not a coincidence, but an inevitable result of modern internet infrastructure evolving to large-scale, highly automated stages.&lt;/p&gt;
&lt;p&gt;Key takeaways from this event:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;System assumptions must be made explicit;&lt;/li&gt;
&lt;li&gt;Configuration pipelines must be validated;&lt;/li&gt;
&lt;li&gt;Automated publishing needs &amp;ldquo;emergency brake&amp;rdquo; mechanisms;&lt;/li&gt;
&lt;li&gt;Core proxies must be designed with controllable failure paths;&lt;/li&gt;
&lt;li&gt;Data-plane contracts must be stricter than code-plane contracts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the AI-native Infra era, these requirements will only become more stringent.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.cloudflare.com/18-november-2025-outage/" target="_blank" rel="noopener"&gt;Cloudflare outage on November 18, 2025 - blog.cloudflare.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.cloudflare.com/20-percent-internet-upgrade/" target="_blank" rel="noopener"&gt;20% of the Internet upgraded - blog.cloudflare.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>The Second Half of Cloud Native: The Era of AI Native Platform Engineering Has Arrived</title><link>https://jimmysong.io/blog/cloud-native-second-half-ai-native-platform-engineering/</link><pubDate>Mon, 17 Nov 2025 11:07:40 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/cloud-native-second-half-ai-native-platform-engineering/</guid><description>A decade of cloud native evolution, a look ahead to AI-Native Platform engineering, technical layers, and key changes. KubeCon NA 2025 signals a new era.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The second half of cloud native isn&amp;rsquo;t about being replaced by AI, but being rewritten by it. The future of platform engineering will revolve around models and agents, reshaping the tech stack and developer experience.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since I first encountered Docker and Kubernetes in 2015, I&amp;rsquo;ve followed the cloud native journey: from writing Deployments in YAML, to exploring Service Mesh and observability, and in recent years, focusing on AI Infra and AI-Native Platforms. Looking back from 2025, the years 2015–2025 can be seen as the &amp;ldquo;first half&amp;rdquo; of cloud native. Marked by &lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/" target="_blank" rel="noopener"&gt;KubeCon / CloudNativeCon NA 2025&lt;/a&gt;, the industry is collectively entering the &amp;ldquo;second half&amp;rdquo;: the era of AI-Native Platform engineering.&lt;/p&gt;
&lt;p&gt;This article reviews the past decade of cloud native, and, combined with KubeCon NA 2025, outlines key turning points and the technical coordinates for the next ten years.&lt;/p&gt;
&lt;h2 id="20152025-the-first-half-of-cloud-native"&gt;2015–2025: The &amp;ldquo;First Half&amp;rdquo; of Cloud Native&lt;/h2&gt;
&lt;p&gt;Over the past decade, cloud native technology themes have evolved through three main stages. The following flowchart illustrates the progression.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/cloud-native-decade-evolution.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/cloud-native-decade-evolution.svg" alt="Figure 1: Cloud Native Decade Technology Evolution Flow" data-caption="Figure 1: Cloud Native Decade Technology Evolution Flow"
width="2543"
height="223"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Cloud Native Decade Technology Evolution Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The first stage, 2015–2017, focused on containerization and orchestration standardization.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Docker realized the engineering dream of &amp;ldquo;build once, run anywhere&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Kubernetes won the orchestration wars and became the de facto standard&lt;/li&gt;
&lt;li&gt;CNCF was founded, with Prometheus, Envoy, and other projects joining&lt;/li&gt;
&lt;li&gt;Enterprises focused on migrating applications to Kubernetes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Typical tasks during this phase involved moving Java services from VMs to containers and K8s, emphasizing understanding of Deployment, Service, and Ingress.&lt;/p&gt;
&lt;p&gt;The second stage, 2018–2020, saw complexity shift from &amp;ldquo;deployment&amp;rdquo; to &amp;ldquo;communication&amp;rdquo; and &amp;ldquo;operations&amp;rdquo;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Service Mesh (Istio / Linkerd / Consul) addressed east-west traffic management&lt;/li&gt;
&lt;li&gt;The observability trio (Logs / Metrics / Traces) became default configurations&lt;/li&gt;
&lt;li&gt;Multi-cluster and multi-region practices matured&lt;/li&gt;
&lt;li&gt;Enterprises focused on managing large microservice systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During this period, I spent significant time researching Istio, service mesh, and traffic management, and authored Kubernetes and Istio books. The focus shifted to system stability, observability, and reliability.&lt;/p&gt;
&lt;p&gt;The third stage, 2021–2025, is defined by Platform Engineering and GitOps.&lt;/p&gt;
&lt;p&gt;As microservices and tools proliferated, platform complexity began to overwhelm developers, making Platform Engineering a key industry term.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitOps (Argo CD / Flux) drove declarative delivery processes&lt;/li&gt;
&lt;li&gt;Internal Developer Platforms (IDP) became priorities for large enterprises&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Platform as a product&amp;rdquo; philosophy spread&lt;/li&gt;
&lt;li&gt;FinOps, cost management, and compliance auditing became platform concerns&lt;/li&gt;
&lt;li&gt;DevOps evolved from &amp;ldquo;tool practice&amp;rdquo; to &amp;ldquo;organizational + platform capability&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My takeaway: simply giving developers a pile of tools isn&amp;rsquo;t enough. End-to-end delivery paths and stable abstraction layers are needed so developers can focus on business, not tool integration.&lt;/p&gt;
&lt;p&gt;The table below summarizes the main features of each &amp;ldquo;first half&amp;rdquo; stage.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Core Challenge&lt;/th&gt;
&lt;th&gt;Key Tech Stack&lt;/th&gt;
&lt;th&gt;Typical Issues&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2015–2017 Orchestration&lt;/td&gt;
&lt;td&gt;Migrating from VM to containers&lt;/td&gt;
&lt;td&gt;Docker, Kubernetes, CNI&lt;/td&gt;
&lt;td&gt;Reliable deployment, rolling upgrades&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2018–2020 Mesh&lt;/td&gt;
&lt;td&gt;Microservice scale, complex communication &amp;amp; observability&lt;/td&gt;
&lt;td&gt;Istio/Linkerd, Prometheus, Jaeger&lt;/td&gt;
&lt;td&gt;Troubleshooting, fragmented observability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021–2025 Platform&lt;/td&gt;
&lt;td&gt;Tool sprawl, declining developer experience&lt;/td&gt;
&lt;td&gt;GitOps, IDP, FinOps, Policy-as-Code&lt;/td&gt;
&lt;td&gt;Developer fatigue, platform team overload&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Cloud Native First Half Stage Features
&lt;/figcaption&gt;
&lt;h2 id="kubecon-na-2025-signals-of-cloud-natives-second-half"&gt;KubeCon NA 2025: Signals of Cloud Native&amp;rsquo;s &amp;ldquo;Second Half&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;The main theme of KubeCon NA 2025 was no longer &amp;ldquo;how to use Kubernetes well,&amp;rdquo; but how to reconstruct Kubernetes and the cloud native ecosystem into AI-Native Platforms for the AI era.&lt;/p&gt;
&lt;p&gt;Key signals from KubeCon NA 2025 include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CNCF released the &lt;a href="https://github.com/cncf/k8s-ai-conformance" target="_blank" rel="noopener"&gt;Certified Kubernetes AI Conformance Program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dynamic Resource Allocation (DRA) entered mainstream discussions&lt;/li&gt;
&lt;li&gt;Model Runtime / Agent Runtime projects became conference hotspots&lt;/li&gt;
&lt;li&gt;Vendors focused on AI SRE, AI-assisted development, AI security, and supply chain governance&lt;/li&gt;
&lt;li&gt;Speakers like Alex Zenla openly stated that Kubernetes&amp;rsquo; underlying structure needs rethinking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together, these mark a clear dividing line: cloud native has officially entered its &amp;ldquo;second half.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="first-half-vs-second-half-shifting-the-cloud-native-narrative"&gt;First Half vs Second Half: Shifting the Cloud Native Narrative&lt;/h2&gt;
&lt;p&gt;If 2015–2025 is the &amp;ldquo;first half,&amp;rdquo; then 2025–2035 is likely the &amp;ldquo;second half.&amp;rdquo; The table below compares their core differences.&lt;/p&gt;
&lt;p&gt;It highlights changes in platform objects, goals, abstraction layers, and more.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;First Half (2015–2025)&lt;/th&gt;
&lt;th&gt;Second Half (2025–2035, AI Native)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Objects&lt;/td&gt;
&lt;td&gt;Containers, Pods, Microservices&lt;/td&gt;
&lt;td&gt;Models, inference tasks, Agents, data pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platform Goals&lt;/td&gt;
&lt;td&gt;Stable application delivery&lt;/td&gt;
&lt;td&gt;Efficient, continuous AI workload &amp;amp; agent orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstraction Layers&lt;/td&gt;
&lt;td&gt;Deployment / Service / Ingress / Job&lt;/td&gt;
&lt;td&gt;Model / Endpoint / Graph / Policy / Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Scheduling&lt;/td&gt;
&lt;td&gt;CPU / Memory / Node&lt;/td&gt;
&lt;td&gt;GPU / TPU / ASIC / KV Cache / Bandwidth / Power&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering Focus&lt;/td&gt;
&lt;td&gt;DevOps / GitOps / Platform Engineering 1.0&lt;/td&gt;
&lt;td&gt;AI Native Platform Engineering / AI SRE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security &amp;amp; Compliance&lt;/td&gt;
&lt;td&gt;Image security, CVE, supply chain SBOM&lt;/td&gt;
&lt;td&gt;Model security, data security, AI supply chain &amp;amp; &amp;ldquo;hallucination dependencies&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime Forms&lt;/td&gt;
&lt;td&gt;Container + VM + Serverless&lt;/td&gt;
&lt;td&gt;Container + WASM + Nix + Agent Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Core Differences: First vs Second Half of Cloud Native
&lt;/figcaption&gt;
&lt;p&gt;From a developer&amp;rsquo;s perspective, the most direct change is: future platforms will no longer treat &amp;ldquo;services&amp;rdquo; as first-class citizens, but will center on &amp;ldquo;models + agents.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="example-technical-layers-of-an-ai-native-platform"&gt;Example: Technical Layers of an AI Native Platform&lt;/h2&gt;
&lt;p&gt;To clarify the structure of an AI-Native Platform, the following layered diagram shows the relationships between technical levels.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/ai-native-platform-layering.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/ai-native-platform-layering.svg" alt="Figure 2: AI Native Platform Layering Diagram" data-caption="Figure 2: AI Native Platform Layering Diagram"
width="2063"
height="1643"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: AI Native Platform Layering Diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Historically, cloud native focused on L0 + L2 (Kubernetes + platform engineering), but in the AI Native era, L1 (Model Runtime, Agent Runtime, heterogeneous resource scheduling) becomes the new battleground.&lt;/p&gt;
&lt;h2 id="key-change-1-from-container-centric-to-model-centric"&gt;Key Change 1: From &amp;ldquo;Container-Centric&amp;rdquo; to &amp;ldquo;Model-Centric&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;In the first half, cloud native&amp;rsquo;s main object was the application process, with containers as packaging. The second half requires handling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model version management and canary releases&lt;/li&gt;
&lt;li&gt;Balancing inference performance, latency, and cost&lt;/li&gt;
&lt;li&gt;Multi-model composition, routing, A/B testing&lt;/li&gt;
&lt;li&gt;Relationships between models, data, features, and vector indexes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At KubeCon NA 2025, CNCF&amp;rsquo;s AI Conformance Program aims to standardize model workloads, managing them like Deployments. Platform engineering will gain new abstractions—not just &amp;ldquo;deploying services,&amp;rdquo; but &amp;ldquo;deploying model capabilities.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="key-change-2-dra-and-the-golden-window-for-heterogeneous-resource-scheduling"&gt;Key Change 2: DRA and the Golden Window for Heterogeneous Resource Scheduling&lt;/h2&gt;
&lt;p&gt;Previously, writing a Deployment meant focusing on CPU and memory. Now, GPU inference, training, and Agent Runtime scenarios demand more than static quotas.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jimmysong.io/book/kubernetes-handbook/ai-native/k8s-device-plugin/"&gt;Dynamic Resource Allocation (DRA)&lt;/a&gt; brings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pluggable resource types (GPU/TPU/FPGA/ASIC)&lt;/li&gt;
&lt;li&gt;Topology-aware, NUMA, and memory fragmentation scheduling&lt;/li&gt;
&lt;li&gt;Binding inference requests to compute allocation for fine-grained QoS&lt;/li&gt;
&lt;li&gt;Cost optimization and power control in scheduling decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the most significant &amp;ldquo;resource perspective&amp;rdquo; upgrade since Kubernetes&amp;rsquo; inception. The scheduler is no longer just a cluster component, but the AI platform&amp;rsquo;s policy engine.&lt;/p&gt;
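&lt;p&gt;For a rough feel of the shift, a DRA-style GPU request might look like the following. The &lt;code&gt;resource.k8s.io&lt;/code&gt; API group is still evolving across Kubernetes releases and the device class name depends on the installed driver, so treat this as a sketch rather than a drop-in manifest:&lt;/p&gt;

```yaml
# Hypothetical sketch: claim one GPU via Dynamic Resource Allocation.
# API version and deviceClassName depend on your cluster and driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: inference-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com   # assumed DeviceClass from a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
  - name: server
    image: example.com/inference:latest  # placeholder image
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: inference-gpu
```

&lt;p&gt;Compared with a static &lt;code&gt;nvidia.com/gpu: 1&lt;/code&gt; resource limit, the claim is a first-class object the scheduler can reason about: it can carry selectors, topology constraints, and policy, which is exactly the &amp;ldquo;policy engine&amp;rdquo; role described above.&lt;/p&gt;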
&lt;h2 id="key-change-3-agent-runtime-as-the-new-generation-of-runtime"&gt;Key Change 3: Agent Runtime as the New Generation of Runtime&lt;/h2&gt;
&lt;p&gt;KubeCon showcased several representative projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://edera.dev" target="_blank" rel="noopener"&gt;Edera&lt;/a&gt;: Minimal, verifiable runtime redesign&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/flox/flox" target="_blank" rel="noopener"&gt;Flox&lt;/a&gt;: Nix-based &amp;ldquo;uncontained&amp;rdquo; runtime environment&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/golemcloud/golem" target="_blank" rel="noopener"&gt;Golem&lt;/a&gt;: WASM-based large-scale agent orchestration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The consensus: AI agents aren&amp;rsquo;t suited to traditional container runtime models. Agents have these traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Strong statefulness: context, memory, sessions&lt;/li&gt;
&lt;li&gt;High concurrency but fine granularity: massive lightweight tasks&lt;/li&gt;
&lt;li&gt;Extremely sensitive to latency and cold starts&lt;/li&gt;
&lt;li&gt;Need to resume after failure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Next-gen runtimes focus on reliably executing, managing state, and auditing &amp;ldquo;hundreds of thousands of agents,&amp;rdquo; not just &amp;ldquo;spinning up more Pods.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="key-change-4-ai-sre-and-ai-security"&gt;Key Change 4: AI SRE and AI Security&lt;/h2&gt;
&lt;p&gt;At KubeCon NA 2025, security and operations topics were amplified by AI:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Software supply chain attacks and CVEs continue to rise&lt;/li&gt;
&lt;li&gt;LLM-assisted coding introduces &amp;ldquo;hallucination dependencies&amp;rdquo; and &amp;ldquo;vibecoded vulnerabilities&amp;rdquo;&lt;/li&gt;
&lt;li&gt;AI-driven artifact scanning, dependency auditing, and license analysis&lt;/li&gt;
&lt;li&gt;&amp;ldquo;AI SRE&amp;rdquo; is now a formal product category&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Traditional cloud native already emphasized security and SRE, but now must address model weights, datasets, vector stores, and agent workflows. AI-Native Platform engineering must answer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Are code and dependencies secure?&lt;/li&gt;
&lt;li&gt;Are models and data trustworthy?&lt;/li&gt;
&lt;li&gt;Are agent behaviors controllable?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This will drive deep integration of Policy-as-Code, MCP, graph permission systems, and AI.&lt;/p&gt;
&lt;h2 id="key-change-5-open-source-participation-becomes-a-baseline"&gt;Key Change 5: Open Source Participation Becomes a Baseline&lt;/h2&gt;
&lt;p&gt;In interviews, platform engineering leaders noted:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hiring increasingly values upstream contributions to Kubernetes and related projects&lt;/li&gt;
&lt;li&gt;Open source involvement shortens ramp-up time&lt;/li&gt;
&lt;li&gt;New AI Native projects (Model Runtime, Agent Runtime, Scheduler) are also open source&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For career growth, contributing to AI Native open source projects will become a basic requirement for platform engineering and AI Infra roles—not just a resume bonus.&lt;/p&gt;
&lt;h2 id="the-contours-of-cloud-natives-second-half"&gt;The Contours of Cloud Native&amp;rsquo;s &amp;ldquo;Second Half&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;The table below summarizes the technical focus and essential differences of the &amp;ldquo;second half.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It highlights the key coordinates of AI-Native Platform engineering.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;Technical Focus&lt;/th&gt;
&lt;th&gt;Essential Difference from First Half&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI Native Platform&lt;/td&gt;
&lt;td&gt;Models/Agents as first-class citizens, unified abstraction &amp;amp; governance&lt;/td&gt;
&lt;td&gt;Objects shift from services to models &amp;amp; inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Scheduling&lt;/td&gt;
&lt;td&gt;DRA, heterogeneous compute, topology awareness, power &amp;amp; cost&lt;/td&gt;
&lt;td&gt;From static quotas to dynamic, policy-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Container + WASM + Nix + Agent Runtime&lt;/td&gt;
&lt;td&gt;From &amp;ldquo;process containerization&amp;rdquo; to &amp;ldquo;execution graph containerization&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platform Engineering&lt;/td&gt;
&lt;td&gt;IDP + AI SRE + Security + Cost + Compliance&lt;/td&gt;
&lt;td&gt;From toolset to &amp;ldquo;autonomous platform&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security &amp;amp; Supply Chain&lt;/td&gt;
&lt;td&gt;LLM dependencies, model weights, datasets, vector store governance&lt;/td&gt;
&lt;td&gt;Protection expands from images to &amp;ldquo;all AI engineering assets&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Source &amp;amp; Ecosystem&lt;/td&gt;
&lt;td&gt;AI Infra / Model Runtime / Agent Runtime upstream collaboration&lt;/td&gt;
&lt;td&gt;Not just &amp;ldquo;using open source,&amp;rdquo; but &amp;ldquo;building the future in open source&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Cloud Native Second Half Technical Coordinates
&lt;/figcaption&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Over the past decade, cloud native evolved from container orchestration to platform engineering 1.0. With KubeCon NA 2025 as a milestone, the industry is now systematically bringing AI into its technology and organizational stacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes is no longer just &amp;ldquo;infrastructure for microservices,&amp;rdquo; but &amp;ldquo;runtime for AI workloads&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Platform Engineering is no longer just &amp;ldquo;tool integration,&amp;rdquo; but &amp;ldquo;autonomous platforms for models and agents&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Security, SRE, runtime, scheduling, and networking will all be reimagined under AI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For me, the past ten years were about &amp;ldquo;making applications more stable in the cloud native world.&amp;rdquo; The next ten will focus on &amp;ldquo;making AI better, safer, and more controllable in the cloud native world.&amp;rdquo; This is, in my view, the opening whistle for cloud native&amp;rsquo;s &amp;ldquo;second half.&amp;rdquo;&lt;/p&gt;</content:encoded></item><item><title>NotebookLM: My Most Recommended AI Tool for Learning and Knowledge Organization</title><link>https://jimmysong.io/blog/notebooklm-learning-and-knowledge-organization/</link><pubDate>Mon, 17 Nov 2025 08:44:45 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/notebooklm-learning-and-knowledge-organization/</guid><description>Based on months of deep usage, this article analyzes how NotebookLM helps me learn new technologies, read complex documents, generate teaching outlines, and shares future improvement expectations.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;NotebookLM is the most tailored AI tool I&amp;rsquo;ve used for knowledge workers. It truly helps me structure massive information and dramatically boosts my learning and content creation efficiency.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As a lifelong learner who reads technical specs and researches open-source projects, I&amp;rsquo;ve always sought a tool that can &amp;ldquo;shortcut&amp;rdquo; my way through mountains of material, reduce mechanical reading, and help me quickly build a global understanding. &lt;a href="https://notebooklm.google.com" target="_blank" rel="noopener"&gt;NotebookLM&lt;/a&gt; has been the smoothest and most reliable experience for me over the past year.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not a traditional &amp;ldquo;chat-style AI tool&amp;rdquo;&amp;mdash;it&amp;rsquo;s more like an &lt;strong&gt;AI-native learning and content organization system&lt;/strong&gt; that ingests your materials, organizes them, and presents them in various structured formats. The more I use it, the more I realize how much it helps with learning new technologies, understanding unfamiliar fields, organizing large project documents, and building teaching materials&amp;mdash;things that general large language models (LLMs) simply can&amp;rsquo;t match.&lt;/p&gt;
&lt;h2 id="the-core-value-notebooklm-brings-me"&gt;The Core Value NotebookLM Brings Me&lt;/h2&gt;
&lt;p&gt;NotebookLM has significantly improved my workflow, especially in learning new technologies, organizing documents, and content creation.&lt;/p&gt;
&lt;h2 id="quickly-understanding-new-technologies-feed-in-complex-materials-get-a-learnable-version"&gt;Quickly Understanding New Technologies: Feed in Complex Materials, Get a &amp;ldquo;Learnable Version&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;My most frequent and indispensable scenario is &lt;strong&gt;learning a completely unfamiliar technology or development framework&lt;/strong&gt;. Faced with dozens or even hundreds of pages of documentation, my typical approach is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add official docs, README files, design documents, and architecture diagrams into a single Notebook&lt;/li&gt;
&lt;li&gt;Let NotebookLM generate:
&lt;ul&gt;
&lt;li&gt;Study guides&lt;/li&gt;
&lt;li&gt;Briefings&lt;/li&gt;
&lt;li&gt;Key knowledge points&lt;/li&gt;
&lt;li&gt;FAQs&lt;/li&gt;
&lt;li&gt;Quizzes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Ultimately, I get a clearly structured &amp;ldquo;learning entry point&amp;rdquo; instead of a flood of raw materials.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following flowchart illustrates how NotebookLM compresses complex documents into a learnable structure:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/042f3817d5b5c24e7bd54b9638272151.svg" data-img="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/042f3817d5b5c24e7bd54b9638272151.svg" alt="Figure 1: NotebookLM Document Structuring Flow" data-caption="Figure 1: NotebookLM Document Structuring Flow"
width="551"
height="833"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: NotebookLM Document Structuring Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the end, what I gain is an &amp;ldquo;organized knowledge system&amp;rdquo; rather than a pile of PDFs waiting to be consumed.&lt;/p&gt;
&lt;h2 id="generating-mindmaps-instantly-turning-large-documents-into-structured-knowledge-graphs"&gt;Generating MindMaps: Instantly Turning Large Documents into Structured Knowledge Graphs&lt;/h2&gt;
&lt;p&gt;I rely heavily on MindMaps to build the &amp;ldquo;skeleton of knowledge.&amp;rdquo; NotebookLM&amp;rsquo;s MindMap feature stands out for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automatically identifying relationships between topics&lt;/li&gt;
&lt;li&gt;Interactive node expansion and collapse&lt;/li&gt;
&lt;li&gt;Integrating multiple source documents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although it currently only exports PNG, the logical structure itself is already an excellent &amp;ldquo;knowledge compression.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The table below compares the auto-generation and visualization capabilities of different tools:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Auto-Generation&lt;/th&gt;
&lt;th&gt;Multi-Doc Integration&lt;/th&gt;
&lt;th&gt;Visualization Quality&lt;/th&gt;
&lt;th&gt;Export Formats&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NotebookLM&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;PNG only (SVG not yet supported)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Common LLM Tools&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;Depends on tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MindMap Software (Manual)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Fully supported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Comparison of MindMap Capabilities in Mainstream Tools
&lt;/figcaption&gt;
&lt;p&gt;NotebookLM&amp;rsquo;s greatest advantage is &lt;strong&gt;automation&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="generating-teaching-outlines-training-scripts-and-book-structures-truly-saving-me-time"&gt;Generating Teaching Outlines, Training Scripts, and Book Structures: Truly Saving Me Time&lt;/h2&gt;
&lt;p&gt;NotebookLM is more than just &amp;ldquo;summarization&amp;rdquo;—it can generate &lt;strong&gt;formal teaching structures&lt;/strong&gt; based on my prompts. I feed in project docs, API references, architecture designs, case studies, videos, and blogs, then prompt it to generate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Teaching outlines&lt;/li&gt;
&lt;li&gt;Project training manuals&lt;/li&gt;
&lt;li&gt;Course structures&lt;/li&gt;
&lt;li&gt;Book chapter frameworks&lt;/li&gt;
&lt;li&gt;Slide text&lt;/li&gt;
&lt;li&gt;Training case descriptions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For anyone who needs to create content, conduct training, or give presentations, this feature is a huge time-saver.&lt;/p&gt;
&lt;p&gt;Below is a typical prompt I actually use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Based on the provided content excerpts, write a detailed training manual that systematically explains the core principles covered. The manual should use a professional and instructional tone, breaking down complex concepts into actionable steps and lessons. Ensure all content is strictly based on the source material and covers every aspect mentioned.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;The training manual should include:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. Training objectives and expected outcomes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. Training content and structure
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Training methods and tools
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. Training evaluation and feedback
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;5. Training summary and follow-up actions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;6. Training cases and examples
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;7. Training resources and references
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The results are often surprisingly good.&lt;/p&gt;
&lt;h2 id="multi-format-input-capability-the-most-stable-ive-seen"&gt;Multi-Format Input Capability: The Most Stable I&amp;rsquo;ve Seen&lt;/h2&gt;
&lt;p&gt;NotebookLM supports direct ingestion of various material types, with extremely stable parsing. The table below summarizes my actual experience:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input Type&lt;/th&gt;
&lt;th&gt;My Actual Experience&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PDF&lt;/td&gt;
&lt;td&gt;Most stable, clear structure parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Docs&lt;/td&gt;
&lt;td&gt;Syncs instantly, very smooth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Word / PPT&lt;/td&gt;
&lt;td&gt;Recognized normally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YouTube Video&lt;/td&gt;
&lt;td&gt;Auto-summary + key content extraction, very useful&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Website URL&lt;/td&gt;
&lt;td&gt;Depends on site structure, high success rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plain Text&lt;/td&gt;
&lt;td&gt;No issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Images&lt;/td&gt;
&lt;td&gt;Partial success, sufficient for screenshots&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: NotebookLM Multi-Format Input Experience
&lt;/figcaption&gt;
&lt;p&gt;By contrast, other tools often have format parsing issues, garbled text, missing content, or skipped paragraphs. NotebookLM is especially stable in &amp;ldquo;multi-format ingestion.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="my-most-common-notebooklm-workflow"&gt;My Most Common NotebookLM Workflow&lt;/h2&gt;
&lt;p&gt;The following flowchart shows my daily workflow with NotebookLM:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/95790bef2620a5625da7e72caea7bb00.svg" data-img="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/95790bef2620a5625da7e72caea7bb00.svg" alt="Figure 2: NotebookLM Daily Workflow" data-caption="Figure 2: NotebookLM Daily Workflow"
width="1566"
height="532"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: NotebookLM Daily Workflow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Essentially: let AI help me grasp the big picture → then dive deeper → then output content.&lt;/p&gt;
&lt;h2 id="my-suggestions-and-minor-regrets"&gt;My Suggestions and Minor Regrets&lt;/h2&gt;
&lt;p&gt;NotebookLM is already excellent, but I still have some strong expectations for future improvements:&lt;/p&gt;
&lt;h3 id="mindmap-export-formats-should-support-svg-or-text-based-markmap"&gt;MindMap Export Formats Should Support SVG or Text-Based (Markmap)&lt;/h3&gt;
&lt;p&gt;Currently, only PNG is supported, which gets blurry when enlarged. The table below lists my expectations for future features:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Expected Feature&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SVG Export&lt;/td&gt;
&lt;td&gt;For writing books, making slides, scalable without loss&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markmap Output&lt;/td&gt;
&lt;td&gt;Most friendly for Markdown writers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw JSON&lt;/td&gt;
&lt;td&gt;Allows custom rendering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Expected MindMap Export Formats
&lt;/figcaption&gt;
&lt;p&gt;I&amp;rsquo;m especially looking forward to NotebookLM supporting &lt;a href="https://markmap.js.org" target="_blank" rel="noopener"&gt;Markmap format&lt;/a&gt; export, which would be extremely friendly for users who write blogs and docs in Markdown.&lt;/p&gt;
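&lt;p&gt;For context, Markmap&amp;rsquo;s input is just a plain Markdown outline, which is why this export format would suit Markdown writers so well. A minimal (hypothetical) example of what such an export could look like:&lt;/p&gt;

```markdown
# NotebookLM
## Inputs
- PDFs and Google Docs
- YouTube videos
## Outputs
- Study guides
- MindMaps
```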
&lt;p&gt;Recently, Google also launched &lt;a href="https://codewiki.google" target="_blank" rel="noopener"&gt;CodeWiki&lt;/a&gt;, similar to &lt;a href="https://deepwiki.com" target="_blank" rel="noopener"&gt;DeepWiki&lt;/a&gt;, which auto-generates image-rich Wikis for GitHub projects, but currently does not support Mermaid or Markmap.&lt;/p&gt;
&lt;h3 id="conversation-history-should-support-long-term-saving"&gt;Conversation History Should Support Long-Term Saving&lt;/h3&gt;
&lt;p&gt;Currently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chats are not persistently saved&lt;/li&gt;
&lt;li&gt;Only manually &amp;ldquo;add to notes&amp;rdquo; preserves results&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This causes some knowledge context to be lost. I hope to see a &amp;ldquo;Notebook conversation history&amp;rdquo; feature in the future.&lt;/p&gt;
&lt;h3 id="slide-generation-should-support-templates-for-content-creators"&gt;Slide Generation Should Support Templates for Content Creators&lt;/h3&gt;
&lt;p&gt;Currently, Video Overview offers various visual styles, but cannot:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Upload custom PPT templates&lt;/li&gt;
&lt;li&gt;Apply enterprise/personal branding templates&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If PPT template support is added, NotebookLM could become the &amp;ldquo;video generation hub&amp;rdquo; for content creators.&lt;/p&gt;
&lt;h3 id="deep-research-should-launch-soon-and-be-fully-open"&gt;Deep Research Should Launch Soon and Be Fully Open&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;m especially looking forward to this feature, as it could upgrade NotebookLM from a &amp;ldquo;knowledge organization tool&amp;rdquo; to a &amp;ldquo;research-grade tool.&amp;rdquo; I hope it will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reliably crawl more public web pages&lt;/li&gt;
&lt;li&gt;Ensure citation quality&lt;/li&gt;
&lt;li&gt;Integrate with existing Notebook materials&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a major upgrade I personally care about.&lt;/p&gt;
&lt;h3 id="mobile-experience-should-be-enhanced-beyond-content-playback"&gt;Mobile Experience Should Be Enhanced Beyond Content Playback&lt;/h3&gt;
&lt;p&gt;Currently, the mobile experience is minimal, only allowing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Listening to audio&lt;/li&gt;
&lt;li&gt;Viewing Notebook Guide summaries&lt;/li&gt;
&lt;li&gt;Simple Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I hope mobile will soon support:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Editing Notebooks&lt;/li&gt;
&lt;li&gt;Deep conversations&lt;/li&gt;
&lt;li&gt;MindMap interaction&lt;/li&gt;
&lt;li&gt;Content output (generating docs, outlines, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;NotebookLM is truly one of the AI tools I use every single day because it achieves a critical goal:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Organizing information, structuring knowledge, so I don&amp;rsquo;t have to start from scratch with massive documents.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Whether it&amp;rsquo;s:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learning new technologies&lt;/li&gt;
&lt;li&gt;Reading long documents&lt;/li&gt;
&lt;li&gt;Creating courses&lt;/li&gt;
&lt;li&gt;Conducting training&lt;/li&gt;
&lt;li&gt;Writing books&lt;/li&gt;
&lt;li&gt;Drafting speeches&lt;/li&gt;
&lt;li&gt;Summarizing content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It saves me a huge amount of time upfront, letting me focus on &amp;ldquo;understanding&amp;rdquo; and &amp;ldquo;creating.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll continue to use NotebookLM as one of my essential tools and keep an eye on its progress in Deep Research, template systems, and mobile features.&lt;/p&gt;
&lt;p&gt;This is a tool truly designed for &amp;ldquo;knowledge workers&amp;rdquo; and deserves to be known by more people.&lt;/p&gt;</content:encoded></item><item><title>Helm v4: Paradigm Convergence and Plugin System Rebuild</title><link>https://jimmysong.io/blog/helm-4-delivery-and-plugin-rebuild/</link><pubDate>Fri, 14 Nov 2025 11:18:30 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/helm-4-delivery-and-plugin-rebuild/</guid><description>An analysis of Helm 4&amp;#39;s core changes, including Server-Side Apply, WASM plugin system, kstatus status model, reproducible builds, and content hash caching, with a timeline review of Helm&amp;#39;s history.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The release of Helm 4 is not just a technical upgrade, but a deep convergence of cloud-native delivery paradigms. The rebuilt plugin system and supply chain governance capabilities make Helm once again a driving force in the Kubernetes ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since its first release in 2016, Helm has been one of the most important application distribution tools in the Kubernetes ecosystem. &lt;a href="https://github.com/helm/helm/releases/tag/v4.0.0" target="_blank" rel="noopener"&gt;Helm v4&lt;/a&gt; is not a &amp;ldquo;minor enhancement,&amp;rdquo; but a comprehensive update around &lt;strong&gt;delivery methods, extension mechanisms, and supply chain approaches&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This article reconstructs Helm&amp;rsquo;s historical context and focuses on why Helm 4 represents a paradigm-converging release.&lt;/p&gt;
&lt;h2 id="helm-from-tiller-to-declarative-delivery"&gt;Helm: From Tiller to Declarative Delivery&lt;/h2&gt;
&lt;p&gt;Below is a textual timeline showing key milestones from Helm v2 to v4, helping you understand its technical evolution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2016: Helm v2 released, using the Tiller architecture.&lt;/li&gt;
&lt;li&gt;2017: Chart Hub expands, major projects begin providing official Charts.&lt;/li&gt;
&lt;li&gt;2018: Security model controversies intensify, Tiller&amp;rsquo;s permission issues become apparent.&lt;/li&gt;
&lt;li&gt;2019: Helm v3 released, Tiller removed, OCI support introduced.&lt;/li&gt;
&lt;li&gt;2021: GitOps becomes widespread, Server-Side Apply (SSA) becomes the mainstream delivery semantic.&lt;/li&gt;
&lt;li&gt;2023: kstatus widely adopted for controller status assessment and health calculation.&lt;/li&gt;
&lt;li&gt;2025: Helm v4 released, bringing SSA, WASM plugins, reproducible builds, and content hash caching.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each major Helm release closely follows Kubernetes paradigms, driving progress in declarative delivery and ecosystem tooling.&lt;/p&gt;
&lt;h2 id="fundamental-changes-in-helm-v4"&gt;Fundamental Changes in Helm v4&lt;/h2&gt;
&lt;p&gt;This section analyzes the core technical upgrades and paradigm shifts in Helm v4.&lt;/p&gt;
&lt;h3 id="delivery-paradigm-update-default-server-side-apply-ssa-server-side-apply"&gt;Delivery Paradigm Update: Default Server-Side Apply (SSA, Server-Side Apply)&lt;/h3&gt;
&lt;p&gt;In Helm v3 and earlier, Helm used a &amp;ldquo;three-way merge&amp;rdquo; model for resource delivery. Helm v4 fully switches to &lt;strong&gt;Server-Side Apply (SSA, Server-Side Apply)&lt;/strong&gt;, meaning the API Server determines field ownership.&lt;/p&gt;
&lt;p&gt;This shift brings several direct results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Full semantic alignment with &lt;code&gt;kubectl apply&lt;/code&gt; and GitOps controllers (such as Argo, Flux)&lt;/li&gt;
&lt;li&gt;When multiple controllers manage the same object, silent overrides are avoided and conflicts are explainable&lt;/li&gt;
&lt;li&gt;Helm&amp;rsquo;s behavior now follows Kubernetes&amp;rsquo; officially recommended declarative delivery paradigm&lt;/li&gt;
&lt;/ul&gt;
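&lt;p&gt;The field-ownership idea behind SSA can be sketched with a toy model (illustrative Python only, not Helm or Kubernetes code; the real logic lives in the API Server):&lt;/p&gt;

```python
# Toy model of Server-Side Apply field ownership: each field of an
# object remembers which "field manager" last applied it, and a write
# from a different manager is an explicit conflict unless forced.

class Conflict(Exception):
    pass

class Object:
    def __init__(self):
        self.fields = {}   # field path -) value
        self.owners = {}   # field path -) field manager

    def apply(self, manager, desired, force=False):
        # First pass: detect conflicts instead of silently overriding.
        for path in desired:
            owner = self.owners.get(path)
            if owner is not None and owner != manager and not force:
                raise Conflict(f"field {path!r} is owned by {owner!r}")
        # Second pass: take ownership and write the values.
        for path, value in desired.items():
            self.fields[path] = value
            self.owners[path] = manager

obj = Object()
obj.apply("helm", {"spec.replicas": 3})
try:
    obj.apply("flux", {"spec.replicas": 5})  # conflicts, is explainable
except Conflict as e:
    print(e)
obj.apply("flux", {"spec.replicas": 5}, force=True)  # explicit takeover
```

&lt;p&gt;The point of the sketch is the second bullet above: a competing manager gets a named, explainable conflict rather than a silent override.&lt;/p&gt;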
&lt;p&gt;The following flowchart compares the delivery semantics of Helm v3 and v4.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/helm-4-delivery-and-plugin-rebuild/f34683c90a9f13678e2cde12ab355e2f.svg" data-img="https://assets.jimmysong.io/images/blog/helm-4-delivery-and-plugin-rebuild/f34683c90a9f13678e2cde12ab355e2f.svg" alt="Figure 1: Helm v3/v4 Delivery Semantics Comparison" data-caption="Figure 1: Helm v3/v4 Delivery Semantics Comparison"
width="2400"
height="377"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Helm v3/v4 Delivery Semantics Comparison&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Helm is now aligned with the delivery semantics of modern Kubernetes versions, improving predictability and safety in resource management.&lt;/p&gt;
&lt;h3 id="kstatus-driven-wait-behavior-and-readiness-annotations"&gt;kstatus-Driven Wait Behavior and Readiness Annotations&lt;/h3&gt;
&lt;p&gt;In Helm 3, &lt;code&gt;--wait&lt;/code&gt; could only make fuzzy status judgments on limited resources, lacking extensibility and explainability.&lt;/p&gt;
&lt;p&gt;Helm 4 introduces &lt;strong&gt;kstatus (Kubernetes Status)&lt;/strong&gt; as the basis for health status parsing, and supports two key annotations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;helm.sh/readiness-success&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;helm.sh/readiness-failure&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Chart authors can precisely define conditions for installation success or failure. Helm&amp;rsquo;s waiting model now offers &amp;ldquo;explainability + extensibility,&amp;rdquo; upgrading from a &amp;ldquo;templating tool&amp;rdquo; to a true &amp;ldquo;deployment orchestrator.&amp;rdquo;&lt;/p&gt;
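&lt;p&gt;As a sketch, the two annotations would sit on a templated resource roughly like this (the annotation keys are the ones named above; the condition values here are placeholders, not verified Helm 4 expression syntax):&lt;/p&gt;

```yaml
# Hypothetical sketch: keys are real, values are placeholder conditions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    helm.sh/readiness-success: "status.readyReplicas equals spec.replicas"
    helm.sh/readiness-failure: "a Failed condition appears in status.conditions"
```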
&lt;h3 id="extension-system-rebuild-wasm-plugin-system"&gt;Extension System Rebuild: WASM Plugin System&lt;/h3&gt;
&lt;p&gt;Helm 4 thoroughly reconstructs the plugin model, mainly including:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Typed and Structured Plugins&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Arbitrary scripts are no longer allowed; plugins must follow typed and structured standards&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;WebAssembly Plugin Runtime (Extism)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More secure (sandbox isolation)&lt;/li&gt;
&lt;li&gt;Cross-language support&lt;/li&gt;
&lt;li&gt;Easy unified management in CI/CD and enterprise platforms&lt;/li&gt;
&lt;li&gt;Predictable and testable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Post-renderer Integrated into Plugin System&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Moves beyond the &amp;ldquo;external executable black box&amp;rdquo; era&lt;/li&gt;
&lt;li&gt;Helm becomes a programmable platform, not just a template renderer&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="engineering-capabilities-upgrade-reproducible-builds-content-hash-caching-chart-api-v3"&gt;Engineering Capabilities Upgrade: Reproducible Builds, Content Hash Caching, chart API v3&lt;/h3&gt;
&lt;p&gt;Helm v4 brings the following engineering improvements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chart packaging is reproducible (supports signing, SBOM, SLSA, etc. for supply chain governance)&lt;/li&gt;
&lt;li&gt;Local cache uses content hashes, avoiding version-based conflicts&lt;/li&gt;
&lt;li&gt;chart API v3 (experimental) is stricter and more flexible&lt;/li&gt;
&lt;li&gt;SDK logging system upgraded to Go slog (modern logging)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities enable Helm charts to enter serious software supply chain governance.&lt;/p&gt;
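&lt;p&gt;The content-hash caching idea can be sketched in a few lines (a toy model, not Helm&amp;rsquo;s actual code or on-disk layout): keying the cache by a digest of the bytes means two charts that claim the same name and version but differ in content can never collide.&lt;/p&gt;

```python
import hashlib

# Toy content-addressed cache: the key is a SHA-256 digest of the
# chart archive bytes, not its declared name/version.
cache = {}

def cache_key(chart_bytes: bytes) -> str:
    return hashlib.sha256(chart_bytes).hexdigest()

def store(chart_bytes: bytes) -> str:
    key = cache_key(chart_bytes)
    cache[key] = chart_bytes  # identical content dedupes automatically
    return key

# Same claimed name/version, different content: distinct keys.
a = store(b"chart: myapp-1.0.0 variant A")
b = store(b"chart: myapp-1.0.0 variant B")
print(a != b)  # True
```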
&lt;h2 id="feature-comparison-helm-v3--v4"&gt;Feature Comparison (Helm v3 → v4)&lt;/h2&gt;
&lt;p&gt;The table below compares core features between Helm v3 and v4 for a quick understanding of the upgrade value.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Helm 3&lt;/th&gt;
&lt;th&gt;Helm 4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apply Model&lt;/td&gt;
&lt;td&gt;Three-way merge&lt;/td&gt;
&lt;td&gt;Default SSA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait Behavior&lt;/td&gt;
&lt;td&gt;Fuzzy, not extensible&lt;/td&gt;
&lt;td&gt;kstatus + annotation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin System&lt;/td&gt;
&lt;td&gt;Script, uncontrollable&lt;/td&gt;
&lt;td&gt;WASM, typed plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-renderer&lt;/td&gt;
&lt;td&gt;External executable&lt;/td&gt;
&lt;td&gt;Plugin subsystem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build&lt;/td&gt;
&lt;td&gt;Not reproducible&lt;/td&gt;
&lt;td&gt;Reproducible build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;name/version&lt;/td&gt;
&lt;td&gt;Content hash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chart API&lt;/td&gt;
&lt;td&gt;v2&lt;/td&gt;
&lt;td&gt;v2 + v3 (experimental)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDK Logs&lt;/td&gt;
&lt;td&gt;stdlib log&lt;/td&gt;
&lt;td&gt;slog&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Helm v3 vs v4 Feature Comparison
&lt;/figcaption&gt;
&lt;p&gt;This is a release that &amp;ldquo;repays technical debt in bulk + aligns with contemporary Kubernetes semantics.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="why-is-helm-v4-a-paradigm-convergence-event"&gt;Why Is Helm v4 a Paradigm Convergence Event?&lt;/h2&gt;
&lt;p&gt;The release of Helm v4 is not just a feature upgrade, but a deep convergence of delivery paradigms, mainly in three aspects:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes Delivery Semantics Unified to SSA&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Previously, kubectl, GitOps controllers, and Helm each applied resources with their own logic. Now all three are unified on SSA, giving consistent delivery behavior and smoother ecosystem collaboration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Plugin System Enters the Platform Era&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;WASM (WebAssembly) brings a secure, universal, and controllable plugin runtime. Infrastructure projects widely adopt WASM: Envoy → WASM Filters, Kubernetes → WASM CRI/OCI, and now Helm joins the platform camp.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Charts Enter Supply Chain Governance&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reproducible builds and digest verification allow Helm charts to be managed as seriously as container images, greatly enhancing supply chain security.&lt;/p&gt;
&lt;p&gt;The entire ecosystem moves to a unified capability baseline, driving cloud-native delivery standardization.&lt;/p&gt;
&lt;h2 id="my-helm-history-and-observations"&gt;My Helm History and Observations&lt;/h2&gt;
&lt;p&gt;As an early user from the Helm v2 era, I have experienced the following stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tiller security controversies&lt;/li&gt;
&lt;li&gt;v3 migration (state stored in secrets)&lt;/li&gt;
&lt;li&gt;Large-scale chart consolidation in the community&lt;/li&gt;
&lt;li&gt;OCI adoption&lt;/li&gt;
&lt;li&gt;Today&amp;rsquo;s SSA / WASM / reproducible build&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each major Helm version upgrade is not about chasing trends, but proactively aligning with Kubernetes paradigms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;v3 aligns with K8s &amp;ldquo;no cluster-side runtime&amp;rdquo; principle&lt;/li&gt;
&lt;li&gt;v4 aligns with SSA, kstatus, WASM, OCI, and other advances from the past five years&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Helm exemplifies the evolution rhythm of infrastructure projects: &lt;strong&gt;not by piling on features, but by evolving in semantic alignment with the platform.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The release of Helm v4 marks a new paradigm for Kubernetes application delivery. SSA, WASM plugins, kstatus, and reproducible builds make Helm not just a templating tool, but a core for supply chain governance and platform extensibility. For cloud-native developers and platform teams, Helm v4 is a paradigm upgrade worth attention.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/helm/helm/releases/tag/v4.0.0" target="_blank" rel="noopener"&gt;Helm v4.0.0 Release - github.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://helm.sh/docs/overview/" target="_blank" rel="noopener"&gt;Helm Documentation Overview - helm.sh&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://artifacthub.io/" target="_blank" rel="noopener"&gt;ArtifactHub Charts Index - artifacthub.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Kimi K2 Thinking: The True Awakening of China's Thinking Model</title><link>https://jimmysong.io/blog/kimi-k2-thinking-cn-awakening/</link><pubDate>Fri, 14 Nov 2025 08:25:26 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/kimi-k2-thinking-cn-awakening/</guid><description>Kimi K2 Thinking&amp;#39;s open source marks China&amp;#39;s entry into thinking models. This article reviews its technical approach and compares it with Claude and Gemini.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;China&amp;rsquo;s large language models have finally moved from &amp;ldquo;writing like humans&amp;rdquo; to &amp;ldquo;thinking like humans.&amp;rdquo; The open-sourcing of Kimi K2 is a watershed moment for China&amp;rsquo;s AI trajectory.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The narrative around China&amp;rsquo;s large language models is shifting from &amp;ldquo;chat-style models&amp;rdquo; to &amp;ldquo;thinking models.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Moonshot AI&amp;rsquo;s open-sourcing of &lt;strong&gt;Kimi K2 Thinking&lt;/strong&gt; marks the first real landing of this transition. K2 is not just another iteration like ChatGLM or Qwen; it&amp;rsquo;s the first time a Chinese team has unified &amp;ldquo;deep reasoning + long context + tool invocation continuity&amp;rdquo; in training. This is the core of the thinking model approach and the reason why models like Claude and Gemini have led the field.&lt;/p&gt;
&lt;h2 id="the-significance-of-k2s-open-source-china-enters-the-era-of-thinking-models"&gt;The Significance of K2&amp;rsquo;s Open Source: China Enters the Era of Thinking Models&lt;/h2&gt;
&lt;p&gt;Why is K2&amp;rsquo;s open source a turning point? Because it enables Chinese models to achieve the following capabilities for the first time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stable execution of 200–300 tool invocations (toolchain reasoning stability)&lt;/li&gt;
&lt;li&gt;Deep, multi-stage reasoning chain execution (Chain-of-Thought consistency)&lt;/li&gt;
&lt;li&gt;256K context as a &amp;ldquo;working memory&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Native INT4 acceleration + MoE activation sparsity scheduling&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a completely different path from &amp;ldquo;stacking parameters → stacking benchmarks,&amp;rdquo; emphasizing reasoning ability over parameter scale.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;K2 is the first time a Chinese model has entered the ranks of thinking models.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="dissecting-k2s-technical-approach"&gt;Dissecting K2&amp;rsquo;s Technical Approach&lt;/h2&gt;
&lt;p&gt;K2&amp;rsquo;s technical approach can be broken down into five key points, each directly impacting the model&amp;rsquo;s reasoning ability and ecosystem adaptability.&lt;/p&gt;
&lt;h3 id="moe-expert-division-cognitive-division-rather-than-parameter-expansion"&gt;MoE Expert Division: Cognitive Division Rather Than Parameter Expansion&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s MoE (Mixture of Experts) design philosophy is distinct from that of previous models. The core is not activating fewer parameters or running larger models more cheaply, but assigning different cognitive sub-skills to different experts. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mathematical reasoning expert&lt;/li&gt;
&lt;li&gt;Planning expert&lt;/li&gt;
&lt;li&gt;Tool invocation expert&lt;/li&gt;
&lt;li&gt;Browser task expert&lt;/li&gt;
&lt;li&gt;Code generation expert&lt;/li&gt;
&lt;li&gt;Long-chain retention expert&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This division aligns directly with Claude 3.5&amp;rsquo;s cognitive layering approach. K2&amp;rsquo;s MoE is about &amp;ldquo;dividing thinking among experts,&amp;rdquo; not just &amp;ldquo;making computation cheaper.&amp;rdquo;&lt;/p&gt;
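&lt;p&gt;The mechanics of expert selection can be sketched as top-k gating (the expert names below are hypothetical; a real router is a learned gating network over hidden states, not a hand-written score list):&lt;/p&gt;

```python
import math

# Toy top-k gating: keep the k highest-scoring experts and renormalize
# their softmax weights so only those experts are activated.
EXPERTS = ["math", "planning", "tool-use", "browser", "code", "long-chain"]

def top_k_gate(scores, k=2):
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in ranked}
    total = sum(exps.values())
    return {EXPERTS[i]: exps[i] / total for i in ranked}

# A "math-heavy" routing decision activates 2 of 6 experts.
weights = top_k_gate([2.0, 0.1, 1.5, -1.0, 0.3, 0.0], k=2)
print(weights)
```

&lt;p&gt;Only the selected experts run, which is how cognitive division and sparse activation coexist in one model.&lt;/p&gt;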
&lt;h3 id="256k-context-building-the-models-working-memory"&gt;256K Context: Building the Model&amp;rsquo;s Working Memory&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s ultra-long context is not just a parameter showcase; it&amp;rsquo;s designed as the model&amp;rsquo;s &amp;ldquo;thinking buffer.&amp;rdquo; It lets the model retain reasoning chains, tool invocation states, and multi-stage reflections across an entire session, so long tasks (such as research or code refactoring) and multi-stage agent workflows can run without interruption. Long-term thinking requires long-term memory, and K2&amp;rsquo;s long context is the &amp;ldquo;memory&amp;rdquo; that sustains its reasoning chains.&lt;/p&gt;
&lt;h3 id="intertwined-training-of-tool-invocation-and-reasoning-chains"&gt;Intertwined Training of Tool Invocation and Reasoning Chains&lt;/h3&gt;
&lt;p&gt;K2 excels in the intertwined training of tool invocation and reasoning chains. Traditional open-source models typically follow this process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Generate reasoning&lt;/li&gt;
&lt;li&gt;Output JSON function call&lt;/li&gt;
&lt;li&gt;Tool returns result&lt;/li&gt;
&lt;li&gt;Continue reasoning&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this approach, the reasoning chain and invocation chain are separated. K2&amp;rsquo;s training allows the reasoning chain to invoke tools at any time and feed tool results back into the reasoning chain for the next stage of thinking. It supports 200–300 consecutive tool invocations without interruption, fully aligning with Claude 3.5&amp;rsquo;s Interleaved CoT + Tool Use.&lt;/p&gt;
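&lt;p&gt;The interleaved loop can be sketched like this (the model and tool are stubs; the point is the control flow, where each tool result feeds straight back into the reasoning context for the next step):&lt;/p&gt;

```python
# Stub "model": calls a tool until it sees a tool result, then answers.
def fake_model(context):
    if "result:" not in context[-1]:
        return {"type": "tool_call", "tool": "search", "args": "helm v4"}
    return {"type": "answer", "text": "done"}

# Stub "tool": returns a tagged result string.
def fake_tool(name, args):
    return f"result: {name}({args})"

def interleaved_loop(max_steps=300):
    context = ["task: research"]
    for _ in range(max_steps):
        step = fake_model(context)
        if step["type"] == "answer":
            return step["text"], context
        # The tool result becomes part of the reasoning context itself,
        # rather than living in a separate call/response phase.
        context.append(fake_tool(step["tool"], step["args"]))
    raise RuntimeError("exceeded tool budget")

answer, ctx = interleaved_loop()
```

&lt;p&gt;The &lt;code&gt;max_steps&lt;/code&gt; budget mirrors the 200&amp;ndash;300 consecutive invocations mentioned above: the loop, not a fixed pipeline, decides when to stop.&lt;/p&gt;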
&lt;h3 id="native-int4-quantization-ensuring-reasoning-chain-stability"&gt;Native INT4 Quantization: Ensuring Reasoning Chain Stability&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s INT4 (4-bit integer quantization) approach is not ordinary post-training quantization. Its purpose is not only to reduce memory usage and increase throughput, but more importantly to ensure that deep reasoning chains do not break due to insufficient computing power. The biggest killers of deep thinking chains are timeouts, freezes, and unstable workers. INT4 enables Chinese GPUs (non-H100) to run complete reasoning chains, which is highly significant for China&amp;rsquo;s ecosystem.&lt;/p&gt;
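&lt;p&gt;The arithmetic of symmetric 4-bit quantization itself is simple to sketch (a toy version for intuition; production &amp;ldquo;native INT4&amp;rdquo; runs inside the training and serving stack, not per-list in Python):&lt;/p&gt;

```python
# Toy symmetric INT4 quantization: map floats onto 16 integer levels.
def quantize_int4(xs):
    # Symmetric scheme: scale so the largest magnitude maps to +/-7
    # (the signed 4-bit range is [-8, 7]).
    scale = max(abs(x) for x in xs) / 7
    q = [max(-8, min(7, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.9, -0.35, 0.02, 0.7]
q, s = quantize_int4(weights)
approx = dequantize(q, s)
# Each value is recovered to within half a quantization step (s / 2).
```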
&lt;h3 id="moe--long-context--toolchain-unified-training-rather-than-module-stitching"&gt;MoE + Long Context + Toolchain: Unified Training Rather Than Module Stitching&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s most important feature is its holistic training approach: expert division, long context-driven consistency, tool invocation trained through real execution, browser tasks and long-step task reinforcement, and INT4 entering the training loop. It&amp;rsquo;s not a &amp;ldquo;ChatLLM + Memory + RAG + Tools&amp;rdquo; patchwork, but an integrated reasoning system.&lt;/p&gt;
&lt;h2 id="alignment-and-differences-between-k2-and-international-mainstream-approaches"&gt;Alignment and Differences Between K2 and International Mainstream Approaches&lt;/h2&gt;
&lt;p&gt;K2 is highly aligned with international mainstream models (such as Claude, Gemini, OpenAI) in cognitive reasoning, ultra-long context, and tool invocation mechanisms, but also has unique advantages for Chinese models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Native INT4 + adaptation to Chinese computing power is rare globally&lt;/li&gt;
&lt;li&gt;Toolchain continuity is more stable than most open-source models&lt;/li&gt;
&lt;li&gt;Higher degree of open source, stronger ecosystem reusability&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="collaborative-value-of-chinas-ai-infra-k2--rlinf--mem-alpha"&gt;Collaborative Value of China&amp;rsquo;s AI Infra: K2 × RLinf × Mem-alpha&lt;/h2&gt;
&lt;p&gt;A series of important open-source infrastructures have emerged around the K2 ecosystem. The table below summarizes each project&amp;rsquo;s type and its collaborative value to K2:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Value to K2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RLinf&lt;/td&gt;
&lt;td&gt;Reinforcement Learning&lt;/td&gt;
&lt;td&gt;Used to train stronger planning/browser task capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mem-alpha&lt;/td&gt;
&lt;td&gt;Memory Enhancement&lt;/td&gt;
&lt;td&gt;Can be combined with K2 to form long-term memory agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentDebug&lt;/td&gt;
&lt;td&gt;Agent Error Debugging&lt;/td&gt;
&lt;td&gt;Used to analyze K2&amp;rsquo;s toolchain errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI-Genie&lt;/td&gt;
&lt;td&gt;GUI Agent Training&lt;/td&gt;
&lt;td&gt;Can serve as an experimental field for K2&amp;rsquo;s agent capability expansion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Collaborative Value of China&amp;rsquo;s AI Infra Ecosystem
&lt;/figcaption&gt;
&lt;p&gt;This combination is already forming a China AI Agent Infra Stack.&lt;/p&gt;
&lt;h2 id="personal-view-the-significance-of-k2s-approach"&gt;Personal View: The Significance of K2&amp;rsquo;s Approach&lt;/h2&gt;
&lt;p&gt;I believe the significance of K2 lies not in the model itself, but in its technical approach:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;K2 marks the first time Chinese models have shifted from &amp;ldquo;language generation competition&amp;rdquo; to &amp;ldquo;thinking ability competition.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For the past three years, the main line of China&amp;rsquo;s open-source models has been evaluation scores, parameter scale, instruction following, and alignment data. K2 is the first to clearly take the path of deep reasoning, tool interleaving, cognitive division of labor, long-horizon task chains, and native performance optimization. This means China&amp;rsquo;s model trajectory is now synchronized with that of the US, rather than chasing old paths.&lt;/p&gt;
&lt;h2 id="key-directions-to-watch-in-k2s-ecosystem-over-the-next-year"&gt;Key Directions to Watch in K2&amp;rsquo;s Ecosystem Over the Next Year&lt;/h2&gt;
&lt;p&gt;K2&amp;rsquo;s future ecosystem influence will depend on several key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether it opens its Tool Registry&lt;/li&gt;
&lt;li&gt;Whether it supports dynamic memory (Mem-alpha integration)&lt;/li&gt;
&lt;li&gt;Whether it opens the MoE expert structure&lt;/li&gt;
&lt;li&gt;Whether it can form a Chinese reasoning chain optimization path with vLLM / llm-d / KServe&lt;/li&gt;
&lt;li&gt;Whether it supports fault tolerance for multi-node continuous reasoning chains&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities will determine K2&amp;rsquo;s ecosystem influence and technical extensibility.&lt;/p&gt;
&lt;h2 id="k2-thinking-model-architecture-diagram"&gt;K2 Thinking Model Architecture Diagram&lt;/h2&gt;
&lt;p&gt;The following flowchart illustrates the core architecture of the K2 thinking model and its collaboration with external agents/applications:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kimi-k2-thinking-cn-awakening/8883c2cf12acbe9362d56d664577b67c.svg" data-img="https://assets.jimmysong.io/images/blog/kimi-k2-thinking-cn-awakening/8883c2cf12acbe9362d56d664577b67c.svg" alt="Figure 1: K2 Thinking Model Architecture" data-caption="Figure 1: K2 Thinking Model Architecture"
width="1600"
height="1158"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: K2 Thinking Model Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;With K2, China&amp;rsquo;s model trajectory is for the first time heading in the right direction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;From &amp;ldquo;writing like humans&amp;rdquo; to &amp;ldquo;thinking like humans.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The era of thinking models is coming, and Chinese models are finally standing on the same roadmap as the international forefront.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://moonshotai.github.io/Kimi-K2/thinking.html" target="_blank" rel="noopener"&gt;Introducing Kimi K2 Thinking - moonshot.github.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/moonshotai/Kimi-K2-Thinking" target="_blank" rel="noopener"&gt;moonshotai/Kimi-K2-Thinking - huggingface.co&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item></channel></rss>