<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Jimmy Song – Blog</title><link>https://jimmysong.io/blog/</link><description>Recent content in Blog on Jimmy Song</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>Jimmy Song</managingEditor><webMaster>Jimmy Song</webMaster><follow_challenge><feedId>51621818828612637</feedId><userId>59800919738273792</userId></follow_challenge><lastBuildDate>Tue, 26 Aug 2025 10:14:34 +0800</lastBuildDate><atom:link href="https://jimmysong.io/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Kubernetes as the GPU Control Plane: HAMi v2.9 and Next-Gen AI Infra</title><link>https://jimmysong.io/blog/kubernetes-gpu-control-plane-hami-v29-ai-infra/</link><pubDate>Thu, 14 May 2026 06:34:19 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/kubernetes-gpu-control-plane-hami-v29-ai-infra/</guid><description>Observations on the evolution of AI infrastructure control planes, focusing on HAMi v2.9, GPU scheduling, and Kubernetes resource models.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Recently, I&amp;rsquo;ve been following the progress of domestic GPU scheduling and Kubernetes AI resource models. With the release of &lt;a href="https://project-hami.io/blog/hami-v2-9-0-release" target="_blank" rel="noopener"&gt;HAMi v2.9&lt;/a&gt;, I want to share several observations on how the AI Infra control plane is evolving.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-gpu-control-plane-hami-v29-ai-infra/banner.webp" data-img="https://assets.jimmysong.io/images/blog/kubernetes-gpu-control-plane-hami-v29-ai-infra/banner.webp" alt="Figure 1: Kubernetes as the GPU Control Plane for AI" data-caption="Figure 1: Kubernetes as the GPU Control Plane for AI"
width="1983"
height="793"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Kubernetes as the GPU Control Plane for AI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="why-discuss-this-topic-now"&gt;Why Discuss This Topic Now&lt;/h2&gt;
&lt;p&gt;When DeepSeek R1 was released in early 2025, most people focused on the claim that a model competitive with OpenAI o1 had been trained for a reported $5.6 million. What struck me, however, was that as inference costs plummeted, GPU utilization issues would quickly come to the forefront.&lt;/p&gt;
&lt;p&gt;As models became more useful and inference demand exploded, &amp;ldquo;one GPU per model&amp;rdquo; rapidly became a luxury. Meanwhile, the NVIDIA H200 export saga accelerated the adoption of domestic compute. First, a sales ban; then, at the end of 2025, a 25% tariff under Trump; and by January 2026, Chinese customs had cleared zero units. Policy now mandates that over 40% of data center chips must be domestically produced by 2026.&lt;/p&gt;
&lt;p&gt;The reality is harsh: not only are GPUs scarce, but you must also learn to use NVIDIA, Ascend, Cambricon, Hygon, and other very different platforms simultaneously.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why I believe HAMi v2.9 is more significant than it appears on the surface.&lt;/p&gt;
&lt;h2 id="gpus-are-no-longer-just-about-the-card"&gt;GPUs Are No Longer Just About the Card&lt;/h2&gt;
&lt;p&gt;Kubernetes has always managed GPUs in a rather crude way:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;limits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;nvidia.com/gpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This was sufficient in 2019, when the main question was simply whether a Pod needed a GPU or not.&lt;/p&gt;
&lt;p&gt;But that&amp;rsquo;s no longer enough. An inference service might only need 4GB of VRAM, multiple small models can share a single card, training jobs care about GPU topology and interconnect bandwidth, and multi-tenancy requires fault domain isolation. Treating GPUs as integer resources is like using an abacus for statistical analysis—not impossible, but the mental model is out of sync with reality.&lt;/p&gt;
&lt;p&gt;The most notable feature in HAMi v2.9 is the HAMi-core mode for the Ascend 910C. Previously, sharing Ascend cards relied on SR-IOV hardware virtualization, which was coarse-grained and inflexible. HAMi-core takes a different approach: it uses &lt;code&gt;LD_PRELOAD&lt;/code&gt; to intercept ACL (Ascend Computing Language) calls in user space, enabling memory isolation at the MB level and compute throttling by percentage.&lt;/p&gt;
&lt;p&gt;In short: it&amp;rsquo;s managed by software, not hardware slicing.&lt;/p&gt;
&lt;p&gt;This is reminiscent of how SDN abstracted the network control plane from hardware devices—GPU partitioning is shifting from a hardware capability to a cluster control plane capability. Considering that Huawei shipped 810,000 Ascend 910C cards last year—nearly half of all domestic chips—this capability has significant real-world impact.&lt;/p&gt;
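&lt;p&gt;To make MB-level memory limits and percentage-based compute limits concrete, here is a minimal sketch of a Pod that asks for a slice of a card rather than a whole one. It assumes HAMi&amp;rsquo;s NVIDIA resource names (&lt;code&gt;nvidia.com/gpumem&lt;/code&gt; in MB and &lt;code&gt;nvidia.com/gpucores&lt;/code&gt; as a percentage); the Ascend resource names are different and are not shown here, and the Pod name and image are placeholders.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: v1
kind: Pod
metadata:
  name: small-inference            # placeholder name
spec:
  containers:
  - name: model
    image: registry.example.com/inference/small-model:latest   # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1          # one virtual GPU
        nvidia.com/gpumem: 4096    # device memory cap in MB (about 4GB)
        nvidia.com/gpucores: 30    # share of the card's compute, in percent
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Several Pods like this can land on the same physical card as long as their memory and compute requests fit, which is the practical difference from integer-only allocation.&lt;/p&gt;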
&lt;h2 id="dra-kubernetes-finally-has-a-robust-device-resource-model"&gt;DRA: Kubernetes Finally Has a Robust Device Resource Model&lt;/h2&gt;
&lt;p&gt;Kubernetes v1.34 (August 2025) officially promoted DRA (Dynamic Resource Allocation) to GA, and Red Hat OpenShift 4.21 followed suit. This is a big deal.&lt;/p&gt;
&lt;p&gt;The Device Plugin solved &amp;ldquo;how to connect GPUs to K8s,&amp;rdquo; but not &amp;ldquo;how to express complex AI resource requirements.&amp;rdquo; Device Plugins only know how many cards are on a node, not how much VRAM you need, what topology, or what isolation level.&lt;/p&gt;
&lt;p&gt;DRA standardizes device resource declaration, allocation, and management via &lt;code&gt;ResourceClaim&lt;/code&gt; and &lt;code&gt;DeviceClass&lt;/code&gt;. HAMi-DRA takes a pragmatic approach: it doesn&amp;rsquo;t require users to change how they declare resources. Instead, it uses a Mutating Webhook to automatically convert existing Device Plugin-style declarations into the DRA model. Legacy systems don&amp;rsquo;t need to change, but can still leverage new capabilities.&lt;/p&gt;
&lt;p&gt;I liken this to what CSI did for storage: it didn&amp;rsquo;t eliminate vendor differences, but allowed Kubernetes to consume different storage capabilities in a unified way. DRA does the same for AI accelerators: NVIDIA, Ascend, AMD, and Vastai cards will never be identical, but the scheduling layer should speak a common language.&lt;/p&gt;
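&lt;p&gt;For readers who haven&amp;rsquo;t used DRA yet, the following is a minimal sketch of the declaration side, assuming the GA &lt;code&gt;resource.k8s.io/v1&lt;/code&gt; API. The DeviceClass name &lt;code&gt;accelerator.example.com&lt;/code&gt;, the object names, and the image are hypothetical; a real cluster would use whatever DeviceClass its DRA driver installs.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-accelerator         # hypothetical name
spec:
  spec:
    devices:
      requests:
      - name: accel
        exactly:
          deviceClassName: accelerator.example.com   # hypothetical DeviceClass
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-consumer               # hypothetical name
spec:
  resourceClaims:
  - name: accel
    resourceClaimTemplateName: single-accelerator    # one claim is created per Pod
  containers:
  - name: app
    image: registry.example.com/inference/app:latest # placeholder image
    resources:
      claims:
      - name: accel                # the container consumes the claimed device
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The Pod no longer counts cards under &lt;code&gt;limits&lt;/code&gt;; it refers to a claim, and the claim is where device requirements live.&lt;/p&gt;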
&lt;h2 id="a-complete-control-plane-path"&gt;A Complete Control Plane Path&lt;/h2&gt;
&lt;p&gt;If we look beyond individual features and consider HAMi-core, DRA, CDI, and the scheduler together, they actually correspond to different layers of GPU resource management:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAMi-core&lt;/strong&gt;: How to partition and isolate devices internally&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DRA&lt;/strong&gt;: How to declare, allocate, and bind resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CDI&lt;/strong&gt;: How to standardize device injection into container runtimes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduler/Webhook&lt;/strong&gt;: How to schedule, admit, and observe&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Connecting these layers, from top to bottom, forms the complete Kubernetes GPU Control Plane:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-gpu-control-plane-hami-v29-ai-infra/kubernetes-gpu-control-plane-en.svg" data-img="https://assets.jimmysong.io/images/blog/kubernetes-gpu-control-plane-hami-v29-ai-infra/kubernetes-gpu-control-plane-en.svg" alt="Figure 2: Kubernetes as the GPU Control Plane for AI Workloads" data-caption="Figure 2: Kubernetes as the GPU Control Plane for AI Workloads"
width="1296"
height="1302"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Kubernetes as the GPU Control Plane for AI Workloads&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is a complete control plane path. In v2.9, Volcano vGPU was upgraded to v0.19 with enhanced CDI support. While this may seem like a minor improvement in device injection, it actually completes a critical link in this chain.&lt;/p&gt;
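&lt;p&gt;To show what &amp;ldquo;device injection&amp;rdquo; means at the CDI layer, here is a minimal sketch of a CDI spec file as a vendor might place it under &lt;code&gt;/etc/cdi&lt;/code&gt;. The vendor name, device name, device path, and environment variable are all hypothetical, and this is not the spec that HAMi or Volcano actually generates.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;cdiVersion: "0.6.0"
kind: vendor.example.com/accelerator     # hypothetical vendor/class
devices:
- name: accel0                           # referenced as vendor.example.com/accelerator=accel0
  containerEdits:
    deviceNodes:
    - path: /dev/accel0                  # hypothetical device node exposed to the container
    env:
    - EXAMPLE_VISIBLE_DEVICES=0          # hypothetical variable for the vendor runtime
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;A CDI-aware container runtime applies these edits when a container requests the named device, so the scheduler and the runtime agree on a single, vendor-neutral description of what gets injected.&lt;/p&gt;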
&lt;h2 id="heterogeneity-is-the-main-battlefield-for-domestic-ai-infra"&gt;Heterogeneity Is the Main Battlefield for Domestic AI Infra&lt;/h2&gt;
&lt;p&gt;The reality for domestic AI clusters: you can&amp;rsquo;t build infrastructure around just one type of GPU.&lt;/p&gt;
&lt;p&gt;Enterprise environments often have NVIDIA, Ascend, Biren, Cambricon, Hygon, Muxi, Kunlunxin, Vastai, and other devices coexisting. Each card has different drivers, runtimes, virtualization capabilities, and monitoring methods. HAMi v2.9 adds support for Vastai, covering more than ten types of heterogeneous compute devices. Mixed training and inference, online and offline workloads, domestic and overseas GPUs, multi-team and multi-tenant resource pools—in these scenarios, unified scheduling is far more important than single-card performance.&lt;/p&gt;
&lt;h2 id="key-judgments"&gt;Key Judgments&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;GPU sharing will shift from a cost-saving measure to a default requirement.&lt;/strong&gt; After the explosion of inference workloads, not every workload deserves exclusive access to an entire card. Exclusive allocation will increasingly become a luxury.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;DRA is the way forward, but migration will be gradual.&lt;/strong&gt; The Device Plugin ecosystem is too large to disappear overnight. HAMi-DRA&amp;rsquo;s compatibility layer shows the project team understands this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Heterogeneous scheduling will become the core challenge for AI Infra.&lt;/strong&gt; Whoever can abstract different vendor devices into a unified scheduling language will control the key position in the control plane.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes will be reshaped by AI workloads.&lt;/strong&gt; From scheduling semantics to resource models, AI requires much greater expressiveness than traditional web services. DRA, CDI, topology-aware scheduling—these are not isolated evolutions, but all point to one thing: Kubernetes is evolving from a container orchestrator to the control plane for AI computing.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The significance of HAMi v2.9 is not just in supporting a particular device or partitioning method, but in making one thing clear: the next generation of AI infrastructure competition is not just about model frameworks or GPU counts, but about the &lt;strong&gt;control plane&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;GPUs are shifting from external devices on nodes to native resources within the Kubernetes control plane. Whoever defines the resource model for the AI era will define the long-term boundaries of AI Infra.&lt;/p&gt;</content:encoded></item><item><title>Kubernetes's Anxiety and Rebirth in the AI Wave</title><link>https://jimmysong.io/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/</link><pubDate>Fri, 03 Apr 2026 05:20:28 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/</guid><description>At KubeCon EU 2026, I witnessed Kubernetes&amp;#39; anxiety and transformation in the AI era. This article explores the challenges and future opportunities for Kubernetes in the age of AI.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Kubernetes hasn&amp;rsquo;t been replaced by AI, but it&amp;rsquo;s being redefined by it. Anxiety is the prelude to rebirth.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After attending KubeCon EU 2026 in Amsterdam, I&amp;rsquo;ve been pondering a key question: Kubernetes isn&amp;rsquo;t obsolete, but it&amp;rsquo;s no longer &amp;ldquo;enough&amp;rdquo;; it hasn&amp;rsquo;t been replaced by AI, but it&amp;rsquo;s being redefined by AI.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/keep-cloud-native-moving.webp" data-img="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/keep-cloud-native-moving.webp" alt="Figure 1: KubeCon EU 2026 slogan: Keep Cloud Native Moving. This event had over 13,000 registrations, making it the largest KubeCon to date." data-caption="Figure 1: KubeCon EU 2026 slogan: Keep Cloud Native Moving. This event had over 13,000 registrations, making it the largest KubeCon to date."
width="2048"
height="1365"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: KubeCon EU 2026 slogan: Keep Cloud Native Moving. This event had over 13,000 registrations, making it the largest KubeCon to date.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This was my third time attending KubeCon in Europe. Over the past few years, you can actually see the community&amp;rsquo;s mindset shift through the event slogans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;2024 Paris: &lt;strong&gt;La vie en Cloud Native&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;→ Cloud Native has become a &amp;ldquo;way of life,&amp;rdquo; the default state&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;2025 London: &lt;strong&gt;No slogan, just the 10th anniversary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;→ Kubernetes reached a milestone, focusing on retrospection rather than moving forward&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;2026 Amsterdam: &lt;strong&gt;Keep Cloud Native Moving&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;→ But the question is: where is it moving?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The absence of a slogan in 2025 was a signal in itself:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When an ecosystem starts commemorating the past instead of defining the future, it&amp;rsquo;s already at an inflection point.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This article doesn&amp;rsquo;t recap the talks, but instead distills my observations at KubeCon into insights about Kubernetes&amp;rsquo; anxiety and rebirth in the AI wave.&lt;/p&gt;
&lt;h2 id="the-root-of-anxiety-is-kubernetes-facing-a-crisis"&gt;The Root of Anxiety: Is Kubernetes Facing a &amp;ldquo;Crisis&amp;rdquo;?&lt;/h2&gt;
&lt;p&gt;The biggest change at KubeCon was that &lt;strong&gt;AI has completely replaced traditional cloud native topics&lt;/strong&gt;. The focus shifted from service optimization and microservices management to how to deploy and manage AI workloads on Kubernetes, especially inference tasks and GPU scheduling.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/maintainer-summit.webp" data-img="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/maintainer-summit.webp" alt="Figure 2: Before KubeCon officially started, the Maintainer Summit was all about AI." data-caption="Figure 2: Before KubeCon officially started, the Maintainer Summit was all about AI."
width="4000"
height="2668"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Before KubeCon officially started, the Maintainer Summit was all about AI.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Kubernetes, as the foundational infrastructure, was once the core of the cloud native world. With the explosive growth of AI models, &lt;strong&gt;the question now is whether Kubernetes can still serve as a &amp;ldquo;universal&amp;rdquo; platform for everything&lt;/strong&gt;, which has become a new source of anxiety.&lt;/p&gt;
&lt;p&gt;The AI boom brings real challenges: &lt;strong&gt;Can Kubernetes&amp;rsquo; &amp;ldquo;universality&amp;rdquo; adapt to the complexity of AI workloads?&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="the-focus-brought-by-the-ai-boom"&gt;The Focus Brought by the AI Boom&lt;/h2&gt;
&lt;p&gt;AI&amp;rsquo;s popularity has shifted the cloud native spotlight entirely to artificial intelligence. AI coding, OpenClaw, large language models, and generative models have all drawn widespread attention. AI has become the core computing demand in the real world.&lt;/p&gt;
&lt;p&gt;This surge in demand raises the question: Can Kubernetes continue to serve as the infrastructure platform for complex tasks? Especially with issues like GPU sharing, inference model scheduling, VRAM allocation, and device attribute selection, is the traditional Kubernetes resource model sufficient?&lt;/p&gt;
&lt;p&gt;In the past, Kubernetes handled compute, storage, and networking as foundational infrastructure. But with the rapid development of AI, its &amp;ldquo;universality&amp;rdquo; is being challenged. For inference tasks in particular, Kubernetes&amp;rsquo; resource model appears thin.&lt;/p&gt;
&lt;h2 id="comparing-with-openstack-will-kubernetes-repeat-history"&gt;Comparing with OpenStack: Will Kubernetes Repeat History?&lt;/h2&gt;
&lt;p&gt;OpenStack once aimed to be a complete open-source cloud platform, but ultimately failed to sustain growth due to &lt;strong&gt;complexity&lt;/strong&gt; and a lack of &lt;strong&gt;flexibility&lt;/strong&gt; in adapting to new technologies.&lt;/p&gt;
&lt;p&gt;Will Kubernetes follow the same path? I believe Kubernetes has different strengths: as a container and microservices orchestration platform, it&amp;rsquo;s widely adopted and has strong community and vendor support. It doesn&amp;rsquo;t try to replace all cloud provider capabilities but serves as an infrastructure control plane to help users manage resources.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/maintainers-summit-group-photo.webp" data-img="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/maintainers-summit-group-photo.webp" alt="Figure 3: Cloud native contributors remain active. The crowd at the KubeCon EU 2026 Maintainer Summit shows the community’s vitality." data-caption="Figure 3: Cloud native contributors remain active. The crowd at the KubeCon EU 2026 Maintainer Summit shows the community’s vitality."
width="2048"
height="1365"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Cloud native contributors remain active. The crowd at the KubeCon EU 2026 Maintainer Summit shows the community’s vitality.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;However, as AI workloads become mainstream, Kubernetes must find a new position to avoid being replaced by &amp;ldquo;AI-optimized platforms.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="kubernetes-challenge-the-gpu-resource-management-gap"&gt;Kubernetes&amp;rsquo; Challenge: The GPU Resource Management Gap&lt;/h3&gt;
&lt;p&gt;At KubeCon, NVIDIA announced the donation of the &lt;strong&gt;&lt;a href="https://github.com/kubernetes-sigs/nvidia-dra-driver-gpu" target="_blank" rel="noopener"&gt;GPU DRA&lt;/a&gt; (Dynamic Resource Allocation) driver&lt;/strong&gt; to the Kubernetes community, where it now lives under kubernetes-sigs, marking the upstreaming of GPU resource management. GPU sharing and scheduling have become urgent issues for Kubernetes.&lt;/p&gt;
&lt;p&gt;Traditionally, Kubernetes relied on the &lt;strong&gt;Device Plugin&lt;/strong&gt; model to schedule GPUs, only supporting allocation by device count (e.g., &lt;code&gt;nvidia.com/gpu: 1&lt;/code&gt;). But for AI inference tasks, more information is needed for resource scheduling, such as &lt;strong&gt;VRAM size&lt;/strong&gt;, &lt;strong&gt;GPU topology&lt;/strong&gt;, and &lt;strong&gt;sharing strategies&lt;/strong&gt;. NVIDIA DRA makes GPU resource management more flexible and intelligent, gradually easing the &amp;ldquo;GPU resource crunch&amp;rdquo; in AI workloads.&lt;/p&gt;
&lt;p&gt;This shift means Kubernetes is no longer just a &amp;ldquo;container orchestration platform,&amp;rdquo; but is becoming the &lt;strong&gt;infrastructure layer for AI-specific resource scheduling&lt;/strong&gt;.&lt;/p&gt;
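&lt;p&gt;As a sketch of the kind of requirement a device count cannot express, the manifests below define a DeviceClass and a claim that selects a device by memory capacity. They assume the &lt;code&gt;resource.k8s.io/v1&lt;/code&gt; API; the driver name &lt;code&gt;gpu.example.com&lt;/code&gt; and the &lt;code&gt;memory&lt;/code&gt; capacity key are hypothetical, and the attributes actually published by NVIDIA&amp;rsquo;s driver may be named differently.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: example-gpu                      # hypothetical class name
spec:
  selectors:
  - cel:
      # match every device published by the (hypothetical) driver
      expression: device.driver == "gpu.example.com"
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-16gi                         # hypothetical claim name
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: example-gpu
        selectors:
        - cel:
            # only devices advertising at least 16Gi under the hypothetical "memory" key
            expression: device.capacity["gpu.example.com"].memory.compareTo(quantity("16Gi")) &amp;gt;= 0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The selector is evaluated by the scheduler against what each driver publishes, which is how VRAM size or other attributes become part of the scheduling decision.&lt;/p&gt;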
&lt;p&gt;Against this backdrop, both the community and industry are exploring finer-grained GPU resource abstraction and scheduling mechanisms. For example, the open-source project &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; is building a GPU resource management layer for AI workloads on top of Kubernetes, supporting GPU sharing, VRAM-level allocation, and heterogeneous device scheduling.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/hami-kubecon-demo.webp" data-img="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/hami-kubecon-demo.webp" alt="Figure 4: HAMi demo at KubeCon EU 2026 Keynote" data-caption="Figure 4: HAMi demo at KubeCon EU 2026 Keynote"
width="2048"
height="1365"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: HAMi demo at KubeCon EU 2026 Keynote&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;These efforts are not about replacing Kubernetes, but about filling the resource model gaps for the AI era. In the long run, this layer may evolve into a &amp;ldquo;GPU Abstraction Layer&amp;rdquo; similar to CNI/CSI, becoming a key part of AI-native infrastructure.&lt;/p&gt;
&lt;h3 id="the-production-gap-many-ai-pocs-few-in-production"&gt;The Production &amp;ldquo;Gap&amp;rdquo;: Many AI PoCs, Few in Production&lt;/h3&gt;
&lt;p&gt;A common post-event summary was: &lt;strong&gt;Many PoCs, but &amp;ldquo;everyday production deployments&amp;rdquo; are still rare&lt;/strong&gt;. Pulumi summarized it as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;lots of working demos, very few production setups people trust&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This shows that while many AI workload solutions succeed in technical demos, the transition from &lt;strong&gt;experimentation to production&lt;/strong&gt; remains difficult. Whether it&amp;rsquo;s GPU resource sharing or inference request scheduling, &lt;strong&gt;whether Kubernetes as the foundation can support this transformation&lt;/strong&gt; is still an open question.&lt;/p&gt;
&lt;h2 id="the-rise-of-inference-systems-kubernetes-scheduling-boundaries-are-challenged"&gt;The Rise of Inference Systems: Kubernetes&amp;rsquo; Scheduling Boundaries Are Challenged&lt;/h2&gt;
&lt;p&gt;Another major event at this KubeCon was &lt;a href="https://github.com/llm-d/llm-d" target="_blank" rel="noopener"&gt;llm-d&lt;/a&gt; being contributed to the CNCF as a Sandbox project.&lt;/p&gt;
&lt;p&gt;If GPU DRA represents the upstreaming of device resource models, then llm-d represents another critical evolution: &lt;strong&gt;Distributed LLM inference capabilities are moving from proprietary engineering implementations to standardized, community-driven collaboration in cloud native.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is significant not just because it&amp;rsquo;s another open-source project, but because it shows that Kubernetes&amp;rsquo; challenges in the AI era are no longer just about &amp;ldquo;how to schedule GPUs,&amp;rdquo; but also &amp;ldquo;how to host inference systems themselves.&amp;rdquo; As prefill/decode separation, request routing, KV cache management, and throughput optimization move into the infrastructure layer, Kubernetes&amp;rsquo; boundaries are being redefined.&lt;/p&gt;
&lt;p&gt;Traditionally, the Kubernetes scheduler focused on Pod scheduling. But in AI inference scenarios, scheduling is not just about picking a node—it&amp;rsquo;s about &lt;strong&gt;selecting the most suitable inference instance based on request characteristics&lt;/strong&gt;. Factors like model state, request queue depth, and cache hit rate all need to be considered. This process is increasingly managed by inference runtimes, forming new &amp;ldquo;request-level scheduling&amp;rdquo; systems.&lt;/p&gt;
&lt;p&gt;This leads to an &lt;strong&gt;overlap between the Kubernetes scheduler and inference systems&lt;/strong&gt;, forcing Kubernetes to rethink its role: should it keep expanding, or collaborate with inference systems?&lt;/p&gt;
&lt;h2 id="ai-native-infrastructure-the-key-challenge-for-production"&gt;AI-Native Infrastructure: The Key Challenge for Production&lt;/h2&gt;
&lt;p&gt;At the &lt;strong&gt;AI Native Summit&lt;/strong&gt;, the real needs for AI-native infrastructure were especially clear. The focus was no longer &amp;ldquo;can it run on Kubernetes,&amp;rdquo; but how to make AI workloads routine, stable, and production-ready on Kubernetes.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/ai-native-summit.webp" data-img="https://assets.jimmysong.io/images/blog/kubernetes-in-ai-wave-anxiety-and-rebirth/ai-native-summit.webp" alt="Figure 5: At the AI Native Summit after KubeCon, Linux Foundation Chairman Jonathan said cloud native is entering the AI-native era." data-caption="Figure 5: At the AI Native Summit after KubeCon, Linux Foundation Chairman Jonathan said cloud native is entering the AI-native era."
width="2970"
height="1980"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: At the AI Native Summit after KubeCon, Linux Foundation Chairman Jonathan said cloud native is entering the AI-native era.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The core challenge is &lt;strong&gt;delivery&lt;/strong&gt;. Unlike traditional apps, AI model weights are often huge—tens of GB or even TB—making model delivery and data management extremely complex. Traditional container delivery systems (like image layers) struggle with such massive data and complex versioning.&lt;/p&gt;
&lt;p&gt;A key direction for Kubernetes is to &lt;strong&gt;standardize model weight and data delivery&lt;/strong&gt;, using &lt;strong&gt;ImageVolume&lt;/strong&gt; and &lt;strong&gt;OCI artifacts&lt;/strong&gt; to solve AI model delivery and version management on Kubernetes. This not only reduces &amp;ldquo;cold start&amp;rdquo; times but also provides infrastructure support for multi-tenancy and compliance.&lt;/p&gt;
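&lt;p&gt;As an illustration of that direction, here is a minimal sketch of a Pod that mounts model weights from an OCI artifact through an &lt;code&gt;image&lt;/code&gt; volume. It assumes the ImageVolume feature is enabled in the cluster and supported by the container runtime; the registry, artifact reference, and names are placeholders.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: v1
kind: Pod
metadata:
  name: llm-server                       # placeholder name
spec:
  containers:
  - name: server
    image: registry.example.com/inference/server:latest     # placeholder serving image
    volumeMounts:
    - name: model-weights
      mountPath: /models                 # weights appear here as read-only files
      readOnly: true
  volumes:
  - name: model-weights
    image:
      reference: registry.example.com/models/example-llm:v1 # placeholder OCI artifact
      pullPolicy: IfNotPresent           # reuse a copy already present on the node
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Keeping weights out of the application image means the serving image and the model can be versioned and cached independently.&lt;/p&gt;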
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Kubernetes won&amp;rsquo;t be replaced by AI, but it&amp;rsquo;s being reshaped as the core of infrastructure. This anxiety is the force driving its evolution—it&amp;rsquo;s moving from a &lt;strong&gt;&amp;ldquo;general-purpose infrastructure platform&amp;rdquo;&lt;/strong&gt; to an &lt;strong&gt;&amp;ldquo;AI-powered multifunctional base&amp;rdquo;&lt;/strong&gt;. Some even call it the AI operating system.&lt;/p&gt;
&lt;p&gt;In the future, Kubernetes&amp;rsquo; core competitiveness will no longer be just container management, but &lt;strong&gt;how effectively it can schedule and manage AI workloads&lt;/strong&gt;, and how it can make AI a routine part of operations. This was my biggest takeaway from the AI Native Summit and KubeCon, and it&amp;rsquo;s what I look forward to in the Kubernetes ecosystem over the next few years.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blogs.nvidia.com/blog/nvidia-at-kubecon-2026/" target="_blank" rel="noopener"&gt;Advancing Open Source AI, NVIDIA Donates Dynamic Resource Allocation Driver for GPUs to Kubernetes Community - blog.nvidia.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.pulumi.com/blog/kubecon-eu-2026-recap/" target="_blank" rel="noopener"&gt;KubeCon EU 2026 Recap: The Year AI Moved Into Production on Kubernetes - pulumi.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Day One in Amsterdam: Kubernetes Is Rethinking AI</title><link>https://jimmysong.io/blog/kubecon-eu-2026-day1-ai-infra/</link><pubDate>Sun, 22 Mar 2026 20:41:19 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/kubecon-eu-2026-day1-ai-infra/</guid><description>KubeCon Europe 2026 Day One: How Kubernetes is adapting to the AI infrastructure wave and the evolution of the GPU resource layer.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Today marks my first day at &lt;strong&gt;KubeCon Europe 2026&lt;/strong&gt;. The most striking feeling is: the world is vast, but this community is truly small.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/jimmy-at-kubecon-eu.webp" data-img="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/jimmy-at-kubecon-eu.webp" alt="Figure 11: Jimmy on the first day of KubeCon EU 2026" data-caption="Figure 11: Jimmy on the first day of KubeCon EU 2026"
width="2400"
height="2400"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 11: Jimmy on the first day of KubeCon EU 2026&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;One strong impression stands out:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The world is big, but this circle is really small.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="old-friends-new-cycle"&gt;&lt;strong&gt;Old Friends, New Cycle&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;At the Maintainer Summit, I met many familiar faces: colleagues from Ant Group, friends from Tetrate, and some people I&amp;rsquo;ve known for nearly a decade. Together, we&amp;rsquo;ve journeyed from the early days of Kubernetes, Service Mesh, and cloud native infrastructure to today.&lt;/p&gt;
&lt;p&gt;In a sense, this generation has fully experienced:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The rise of Kubernetes&lt;/li&gt;
&lt;li&gt;The standardization of Cloud Native&lt;/li&gt;
&lt;li&gt;The microservices and service mesh boom&lt;/li&gt;
&lt;li&gt;And now, the era of AI Infrastructure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This isn&amp;rsquo;t about &amp;ldquo;new people entering the field,&amp;rdquo; but rather—&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The same group stepping into a new technology cycle.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="what-is-the-maintainer-summit-discussing"&gt;&lt;strong&gt;What Is the Maintainer Summit Discussing?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If you ask:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What is the Kubernetes community most concerned about right now?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Today&amp;rsquo;s answer is very clear:&lt;/p&gt;
&lt;p&gt;👉 &lt;strong&gt;How to run AI workloads better on Kubernetes&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/kubecon-eu-maintainer-summit.webp" data-img="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/kubecon-eu-maintainer-summit.webp" alt="Figure 12: The Maintainer Summit’s main topic is AI Infra" data-caption="Figure 12: The Maintainer Summit’s main topic is AI Infra"
width="1440"
height="960"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 12: The Maintainer Summit’s main topic is AI Infra&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Many topics at the Maintainer Summit revolved around:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scheduling models for LLM / AI workloads&lt;/li&gt;
&lt;li&gt;GPU / accelerator resource management&lt;/li&gt;
&lt;li&gt;Integrating inference systems with Kubernetes&lt;/li&gt;
&lt;li&gt;Redefining the roles of data plane vs. control plane&lt;/li&gt;
&lt;li&gt;How observability tools like OTel monitor AI workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Kubernetes hasn&amp;rsquo;t been replaced by AI; it&amp;rsquo;s actively &amp;ldquo;absorbing&amp;rdquo; AI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="key-signal-gpus-are-becoming-the"&gt;&lt;strong&gt;Key Signal: GPUs Are Becoming the &amp;ldquo;Infrastructure Layer&amp;rdquo;&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Today, I had an in-depth discussion with CNCF TOC, Red Hat, and the vLLM community.&lt;/p&gt;
&lt;p&gt;The core question was:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;How should GPUs be &amp;ldquo;platformized&amp;rdquo;?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Some consensus is already clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPUs are no longer just devices&lt;/li&gt;
&lt;li&gt;They are now &lt;strong&gt;a schedulable, partitionable, and shareable resource layer&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/toc-meeting.webp" data-img="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/toc-meeting.webp" alt="Figure 13: TOC meeting discussing GPU resource management and LLM Serving integration" data-caption="Figure 13: TOC meeting discussing GPU resource management and LLM Serving integration"
width="2400"
height="1648"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 13: TOC meeting discussing GPU resource management and LLM Serving integration&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;At the Maintainer Summit in Amsterdam, the conversation covered GPU resource management and LLM serving integration on Kubernetes, and we explored potential collaboration between vLLM and HAMi.&lt;/p&gt;
&lt;p&gt;Behind this is a major paradigm shift:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Past&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Now&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU = Node resource&lt;/td&gt;
&lt;td&gt;GPU = Infrastructure layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exclusive use&lt;/td&gt;
&lt;td&gt;Multi-tenant sharing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Static binding&lt;/td&gt;
&lt;td&gt;Dynamic scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed within frameworks&lt;/td&gt;
&lt;td&gt;Unified management at the platform layer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This is exactly what we&amp;rsquo;ve been working on in &lt;a href="https://github.com/project-hami/hami" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="hami-from"&gt;&lt;strong&gt;HAMi: From &amp;ldquo;Project&amp;rdquo; to &amp;ldquo;Reference Pattern&amp;rdquo;&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Another interesting change today:&lt;/p&gt;
&lt;p&gt;HAMi is no longer just a &amp;ldquo;community project&amp;rdquo;—it&amp;rsquo;s becoming:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A reference implementation (reference pattern) for AI Infra&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/kubecon-eu-maintainer-summit-hami.webp" data-img="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/kubecon-eu-maintainer-summit-hami.webp" alt="Figure 14: Li Mengxuan, CTO of Dynamia, sharing HAMi’s design and practice at KubeCon EU 2026 Maintainer Summit" data-caption="Figure 14: Li Mengxuan, CTO of Dynamia, sharing HAMi’s design and practice at KubeCon EU 2026 Maintainer Summit"
width="1440"
height="960"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 14: Li Mengxuan, CTO of Dynamia, sharing HAMi’s design and practice at KubeCon EU 2026 Maintainer Summit&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is reflected in several ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Invited to present at the Maintainer Summit&lt;/li&gt;
&lt;li&gt;Participating in CNCF TOC discussions&lt;/li&gt;
&lt;li&gt;Involved in incubating review demos&lt;/li&gt;
&lt;li&gt;Exploring joint content with the vLLM community (even discussing a joint blog 👀)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Especially in conversations with Red Hat and vLLM, a clear trend emerged:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GPU resource management and LLM serving are becoming coupled&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Upper layer: vLLM / inference frameworks&lt;/li&gt;
&lt;li&gt;Lower layer: GPU scheduling / sharing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A new &amp;ldquo;interface layer&amp;rdquo; is gradually forming.&lt;/p&gt;
&lt;p&gt;This is a direction worth betting on.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/incubating-review.webp" data-img="https://assets.jimmysong.io/images/blog/kubecon-eu-2026-day1-ai-infra/incubating-review.webp" alt="Figure 15: At the TAG Workshop, HAMi was discussed as an Incubating demo" data-caption="Figure 15: At the TAG Workshop, HAMi was discussed as an Incubating demo"
width="2400"
height="1489"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 15: At the TAG Workshop, HAMi was discussed as an Incubating demo&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="a-caution-the-ai-infra-startup-boom-hasn"&gt;&lt;strong&gt;A Caution: The AI Infra Startup Boom Hasn&amp;rsquo;t Really Begun&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;At the same time, I have a somewhat &amp;ldquo;counterintuitive&amp;rdquo; observation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;We haven&amp;rsquo;t yet seen a large wave of AI Infra (K8s-focused) startups.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Most companies I saw today fall into one of three groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Those pivoting from CI/CD, Service Mesh, or Gateway&lt;/li&gt;
&lt;li&gt;Traditional cloud vendors extending into AI&lt;/li&gt;
&lt;li&gt;Those working on models, agents, or even lower-level tech&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But as for companies truly focused on:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Making AI workloads run better on Kubernetes&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are actually few startups at this layer.&lt;/p&gt;
&lt;p&gt;This could mean two things:&lt;/p&gt;
&lt;h3 id="1-this-layer-isn"&gt;&lt;strong&gt;1) This Layer Isn&amp;rsquo;t Fully Formed Yet&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Currently, most activity is at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model layer (LLM / foundation models)&lt;/li&gt;
&lt;li&gt;The application layer (Agent / Copilot)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But not at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The scheduling layer&lt;/li&gt;
&lt;li&gt;The resource layer&lt;/li&gt;
&lt;li&gt;The runtime layer&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="2-or-the-barrier-to-entry-is-very-high"&gt;&lt;strong&gt;2) Or, the Barrier to Entry Is Very High&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Because at its core, this is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The intersection of Cloud Native × GPU × AI workload&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It&amp;rsquo;s not just &amp;ldquo;wrapping AI,&amp;rdquo; but a fundamental re-architecture at the infrastructure level.&lt;/p&gt;
&lt;h2 id="my-take"&gt;&lt;strong&gt;My Take&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;If we break down the AI technology stack:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Agent / Application
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;LLM Serving (vLLM, etc.)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;AI Runtime / Scheduling
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;GPU Resource Layer
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Hardware
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Most innovation today is concentrated in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The top two layers (Agent / LLM)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the real long-term moat lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The middle two layers (Runtime + Resource Layer)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And Kubernetes is very likely to remain:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The default platform for this middle layer&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Today&amp;rsquo;s takeaway:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Kubernetes is not obsolete; it&amp;rsquo;s being redefined.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And our generation is shifting from:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Cloud Native Builders&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;to:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;AI Infrastructure Builders&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;More to come tomorrow.&lt;/p&gt;</content:encoded></item><item><title>HAMi Website Refactor: Why HAMi Docs and Website Underwent a Complete Redesign</title><link>https://jimmysong.io/blog/hami-website-redesign-retrospective/</link><pubDate>Tue, 17 Mar 2026 08:55:52 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/hami-website-redesign-retrospective/</guid><description>A systematic upgrade to HAMi’s website and docs, improving community visibility, content structure, search, and usability.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;This redesign is more than a style update—it&amp;rsquo;s a step toward clearer technical communication and better user experience. Try the new HAMi website at &lt;a href="https://project-hami.io" target="_blank" rel="noopener"&gt;https://project-hami.io&lt;/a&gt; and submit issues &lt;a href="https://github.com/Project-HAMi/website/issues" target="_blank" rel="noopener"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Over the past two months, I conducted a thorough refactor of the documentation website (see &lt;a href="https://github.com/Project-HAMi/website/pulls?q=is%3Apr&amp;#43;is%3Aclosed&amp;#43;author%3Arootsongjc" target="_blank" rel="noopener"&gt;GitHub&lt;/a&gt;). Externally, it looks like a &amp;ldquo;visual redesign&amp;rdquo;, but from the perspective of community maintainers and content builders, it&amp;rsquo;s a comprehensive upgrade of information architecture, content system, and frontend experience.&lt;/p&gt;
&lt;p&gt;This article aims to systematically explain three things: why we did this refactor, what exactly changed, and what these changes mean for the HAMi community.&lt;/p&gt;
&lt;h2 id="why-refactor-the-website-and-documentation"&gt;Why Refactor the Website and Documentation&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/project-hami/hami" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; is a CNCF-hosted open source project initiated and contributed by &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt;, with growing influence in GPU virtualization, heterogeneous compute scheduling, and AI infrastructure. The community content is expanding, and user types are becoming more diverse: from first-time visitors to engineers and enterprise users seeking deployment docs, architecture diagrams, case studies, and ecosystem information.&lt;/p&gt;
&lt;p&gt;The original site was functional, but as content grew, several issues became apparent:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The homepage lacked information density, making it hard to quickly grasp the project&amp;rsquo;s overall value.&lt;/li&gt;
&lt;li&gt;Connections between docs, blogs, and community info were not smooth; content entry points were scattered.&lt;/li&gt;
&lt;li&gt;Search experience was unstable; external solutions were not ideal in practice.&lt;/li&gt;
&lt;li&gt;Mobile experience had many details needing improvement, especially navigation, card layouts, and footer areas.&lt;/li&gt;
&lt;li&gt;Visual style was inconsistent, making it hard to convey community influence and engineering maturity.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a fast-evolving open source community, the website is not just a &amp;ldquo;place for docs&amp;rdquo;, but the public interface of the community. It needs to serve as project introduction, knowledge gateway, adoption proof, community connector, and brand expression.&lt;/p&gt;
&lt;p&gt;So the goal of this refactor was clear: not just superficial beautification, but to truly upgrade the website into HAMi&amp;rsquo;s systematic community entry point.&lt;/p&gt;
&lt;h2 id="what-was-done-in-this-refactor"&gt;What Was Done in This Refactor&lt;/h2&gt;
&lt;p&gt;This update was not a single-point change, but a series of systematic improvements.&lt;/p&gt;
&lt;h3 id="homepage-redesign-and-complete-information-architecture-overhaul"&gt;Homepage Redesign and Complete Information Architecture Overhaul&lt;/h3&gt;
&lt;p&gt;The most obvious change is the homepage.&lt;/p&gt;
&lt;p&gt;We redesigned the homepage structure, moving away from simply stacking content blocks, and instead organizing the page around the main narrative: &amp;ldquo;Project Positioning → Core Capabilities → Ecosystem Entry → Content Accumulation → Community Trust&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Specifically, the homepage received several key upgrades:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rebuilt the Hero section to strengthen first-screen information delivery and action entry.&lt;/li&gt;
&lt;li&gt;Optimized CTA design so users can quickly access docs, blogs, and resources.&lt;/li&gt;
&lt;li&gt;Added and enhanced multiple homepage sections to showcase project value and community reach in a more structured way.&lt;/li&gt;
&lt;li&gt;Adjusted visual hierarchy, background atmosphere, and scroll rhythm, transforming the homepage from a &amp;ldquo;content list&amp;rdquo; into a &amp;ldquo;narrative page&amp;rdquo;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These changes include Hero animations and atmosphere layers, research/story sections, new resource entry sections, refreshed CTAs, unified background design, and ongoing reduction of visual noise. Together, they solve a core problem: enabling visitors to understand within seconds what HAMi is and why it&amp;rsquo;s worth exploring further.&lt;/p&gt;
&lt;h3 id="architecture-diagrams"&gt;Architecture Diagrams&lt;/h3&gt;
&lt;p&gt;Key diagrams were redrawn for clearer technical communication. This helps users grasp HAMi&amp;rsquo;s role in AI infrastructure.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/hami-hero-diagram.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/hami-hero-diagram.webp" alt="Figure 1: HAMi website homepage architecture diagram" data-caption="Figure 1: HAMi website homepage architecture diagram"
width="3160"
height="1714"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: HAMi website homepage architecture diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For HAMi, this change is critical. The community faces not just a single feature, but a set of system-level challenges involving Kubernetes, schedulers, GPU Operators, heterogeneous devices, and enterprise platforms. Improved diagrams make the website a better technical entry point.&lt;/p&gt;
&lt;h3 id="added-case-studies-community-and-ecosystem-sections-to-make-impact-visible"&gt;Added Case Studies, Community, and Ecosystem Sections to Make Impact Visible&lt;/h3&gt;
&lt;p&gt;Another important direction was strengthening the &amp;ldquo;community proof&amp;rdquo; layer.&lt;/p&gt;
&lt;p&gt;Many open source project sites fall into the trap of having complete docs, but users can&amp;rsquo;t tell if the project is truly adopted, if the community is active, or if the ecosystem is expanding. The HAMi website redesign consciously addresses this.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/ecosystem.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/ecosystem.webp" alt="Figure 2: HAMi ecosystem and device support" data-caption="Figure 2: HAMi ecosystem and device support"
width="2200"
height="454"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: HAMi ecosystem and device support&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/adopters.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/adopters.webp" alt="Figure 3: HAMi adopters" data-caption="Figure 3: HAMi adopters"
width="3688"
height="1534"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: HAMi adopters&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/contributors.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/contributors.webp" alt="Figure 4: HAMi contributor organizations" data-caption="Figure 4: HAMi contributor organizations"
width="3662"
height="674"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: HAMi contributor organizations&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="blog--reading-experience"&gt;Blog &amp;amp; Reading Experience&lt;/h3&gt;
&lt;p&gt;Blog cards, lists, and metadata were unified for easier reading and sharing. Blogs are now a core communication layer.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/hami-blog.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/hami-blog.webp" alt="Figure 5: HAMi website blog list page" data-caption="Figure 5: HAMi website blog list page"
width="2318"
height="1088"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: HAMi website blog list page&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="mobile-optimization"&gt;Mobile Optimization&lt;/h3&gt;
&lt;p&gt;Navigation, card layouts, footer, and search were improved for smoother mobile browsing.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/mobile.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/mobile.webp" alt="Figure 6: HAMi website mobile view" data-caption="Figure 6: HAMi website mobile view"
width="654"
height="1418"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: HAMi website mobile view&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="footer--search"&gt;Footer &amp;amp; Search&lt;/h3&gt;
&lt;p&gt;Footer layout was enhanced for better navigation and credibility. Built-in search replaced unreliable external solutions, improving content accessibility.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/footer.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/footer.webp" alt="Figure 7: HAMi website footer" data-caption="Figure 7: HAMi website footer"
width="2472"
height="718"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 7: HAMi website footer&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/hami-website-redesign/search.webp" data-img="https://assets.jimmysong.io/images/blog/hami-website-redesign/search.webp" alt="Figure 8: HAMi website built-in search" data-caption="Figure 8: HAMi website built-in search"
width="1330"
height="1214"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 8: HAMi website built-in search&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="what-this-redesign-means-for-the-hami-community"&gt;What This Redesign Means for the HAMi Community&lt;/h2&gt;
&lt;p&gt;Judging by the screenshots, the change may look like nothing more than &amp;ldquo;the website looks better&amp;rdquo;. But from a community-building perspective, its significance is deeper.&lt;/p&gt;
&lt;p&gt;First, HAMi&amp;rsquo;s external expression is more systematic.&lt;/p&gt;
&lt;p&gt;The website is no longer just a collection of scattered pages, but is forming a complete narrative chain: users can understand project value from the homepage, capability details from docs, practical paths from blogs, and community impact from ecosystem modules.&lt;/p&gt;
&lt;p&gt;Second, community content assets are reorganized.&lt;/p&gt;
&lt;p&gt;Previously, valuable articles, diagrams, and explanations existed but were hard to find. Now, through the new homepage sections, navigation, and refactored search, this content is connected more effectively.&lt;/p&gt;
&lt;p&gt;Third, HAMi&amp;rsquo;s community image is more mature.&lt;/p&gt;
&lt;p&gt;A mature open source project needs not just an active code repository, but clear, stable, and sustainable website expression. Structure, style, and usability are part of the community&amp;rsquo;s engineering capability.&lt;/p&gt;
&lt;p&gt;Fourth, this lays the foundation for expanding case studies, adopters, contributors, and ecosystem content.&lt;/p&gt;
&lt;p&gt;With the framework sorted, adding more case studies, collaboration entry points, or showcasing more adopters and partners will be more natural and easier for users to understand.&lt;/p&gt;
&lt;h2 id="as-a-community-contributor-my-top-three-takeaways-from-this-redesign"&gt;As a Community Contributor, My Top Three Takeaways from This Redesign&lt;/h2&gt;
&lt;p&gt;In summary, I believe this refactor got three things right:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Upgraded the website from a &amp;ldquo;content dump&amp;rdquo; to a &amp;ldquo;community gateway&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Combined visual optimization with information architecture adjustment, not just a skin change.&lt;/li&gt;
&lt;li&gt;Improved basic experiences like search, mobile, navigation, and footer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These may not be as flashy as launching a new feature, but they directly impact content dissemination, user comprehension, and the project&amp;rsquo;s long-term image.&lt;/p&gt;
&lt;p&gt;For infrastructure projects like HAMi, technical capability is fundamental, but clearly communicating, organizing, and continuously presenting that capability is also a form of infrastructure.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;This HAMi documentation and website refactor is essentially an upgrade to the community&amp;rsquo;s &amp;ldquo;expression layer&amp;rdquo; infrastructure.&lt;/p&gt;
&lt;p&gt;It improves the visual and reading experience and reorganizes content, the homepage narrative, search paths, mobile access, and the display of community signals. The homepage redesign, redrawn architecture diagrams, unified blog style, mobile optimization, enhanced footer, and the switch from external to built-in search together constitute a true &amp;ldquo;refactor&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Externally, it helps more people quickly understand HAMi; internally, it provides a stable platform for the community to accumulate case studies, expand the ecosystem, and serve adopters and contributors.&lt;/p&gt;
&lt;p&gt;The website is not an accessory to the open source community, but part of its long-term influence. HAMi&amp;rsquo;s redesign is about taking this seriously.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in Kubernetes GPU virtualization, add me on WeChat &lt;code&gt;jimmysong&lt;/code&gt;.&lt;/p&gt;
&lt;div class="cta-group"&gt;
&lt;a href="https://github.com/project-hami/hami" class="btn btn-sm btn-primary"&gt;Check out the HAMi project on GitHub&lt;/a&gt;
&lt;/div&gt;</content:encoded></item><item><title>GTC 2026 Eve: AI is Becoming the New Infrastructure</title><link>https://jimmysong.io/blog/gtc-2026-ai-native-infrastructure/</link><pubDate>Sun, 15 Mar 2026 11:34:06 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/gtc-2026-ai-native-infrastructure/</guid><description>On the eve of GTC 2026, rethinking whether AI is becoming the new infrastructure from NVIDIA&amp;#39;s AI Five-Layer Cake, the rise of agent runtime, to AI-native infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;AI is quietly reshaping the infrastructure landscape, and GTC 2026 may become a key milestone in this transformation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Next week, one of the most important technology conferences in the AI industry, &lt;a href="https://www.nvidia.com/gtc/" target="_blank" rel="noopener"&gt;&lt;strong&gt;NVIDIA GTC 2026&lt;/strong&gt;&lt;/a&gt;, will be held in San Jose, USA.&lt;/p&gt;
&lt;p&gt;For many people, GTC is just a GPU technology conference. But if you have followed the development of the AI industry over the past few years, you&amp;rsquo;ll notice an interesting phenomenon:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Many important narratives about AI infrastructure are gradually taking shape at GTC.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From CUDA and DGX to AI Factory, and most recently the &lt;strong&gt;AI Five-Layer Cake&lt;/strong&gt; proposed by Jensen Huang, NVIDIA keeps attempting to redefine the computing infrastructure of the AI era.&lt;/p&gt;
&lt;p&gt;This is why many people call GTC:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI&amp;rsquo;s &amp;ldquo;Woodstock.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/nvidia-gtc.webp" data-img="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/nvidia-gtc.webp" alt="Figure 1: NVIDIA GTC Conference" data-caption="Figure 1: NVIDIA GTC Conference"
width="2212"
height="1152"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: NVIDIA GTC Conference&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This year&amp;rsquo;s GTC (March 16-19) is expected to cover various levels of the AI stack, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI Chips&lt;/li&gt;
&lt;li&gt;AI Data Centers&lt;/li&gt;
&lt;li&gt;AI Agents&lt;/li&gt;
&lt;li&gt;Robotics&lt;/li&gt;
&lt;li&gt;Inference Computing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;According to &lt;a href="https://blogs.nvidia.com/blog/gtc-2026-news/" target="_blank" rel="noopener"&gt;NVIDIA&amp;rsquo;s official blog&lt;/a&gt;, this year&amp;rsquo;s keynote will focus on &lt;strong&gt;the complete AI stack from chips to applications&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If we put these signals together, we can actually see a larger trend:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI is transforming from an &amp;ldquo;applied technology&amp;rdquo; into &amp;ldquo;infrastructure.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="the-perspective-of-industrial-revolutions"&gt;The Perspective of Industrial Revolutions&lt;/h2&gt;
&lt;p&gt;On a longer time scale, the technological revolutions in human history are essentially infrastructure revolutions.&lt;/p&gt;
&lt;p&gt;We usually count four industrial revolutions.&lt;/p&gt;
&lt;p&gt;The table below shows the infrastructure corresponding to each one:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Industrial Revolution&lt;/th&gt;
&lt;th&gt;Infrastructure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Steam Revolution&lt;/td&gt;
&lt;td&gt;Steam Engine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electrical Revolution&lt;/td&gt;
&lt;td&gt;Power Grid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Digital Revolution&lt;/td&gt;
&lt;td&gt;Computer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internet Era&lt;/td&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Industrial Revolutions and Corresponding Infrastructure
&lt;/figcaption&gt;
&lt;h3 id="first-industrial-revolution-steam"&gt;First Industrial Revolution: Steam&lt;/h3&gt;
&lt;p&gt;The steam engine allowed humans to utilize mechanical power on a large scale for the first time. Production no longer relied on human or animal power, but on machines.&lt;/p&gt;
&lt;h3 id="second-industrial-revolution-electricity"&gt;Second Industrial Revolution: Electricity&lt;/h3&gt;
&lt;p&gt;Electricity changed not only the source of power, but also the organization of production. Assembly lines, large-scale manufacturing, and modern industrial systems are all built on the foundation of the power grid.&lt;/p&gt;
&lt;h3 id="third-industrial-revolution-computers"&gt;Third Industrial Revolution: Computers&lt;/h3&gt;
&lt;p&gt;Computers allowed information to be processed digitally. Software became a production tool.&lt;/p&gt;
&lt;h3 id="fourth-industrial-revolution-internet-and-intelligence"&gt;Fourth Industrial Revolution: Internet and Intelligence&lt;/h3&gt;
&lt;p&gt;The internet connects all computers together. Cloud computing transforms computing resources into infrastructure. And AI gives machines a certain degree of &amp;ldquo;cognitive ability.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="the-true-significance-of-ai"&gt;The True Significance of AI&lt;/h2&gt;
&lt;p&gt;If we observe these industrial revolutions, we discover a pattern:&lt;/p&gt;
&lt;p&gt;Each industrial revolution produces a new &lt;strong&gt;General Purpose Infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;And AI is likely to become the next-generation infrastructure.&lt;/p&gt;
&lt;p&gt;NVIDIA even directly stated in a &lt;a href="https://blogs.nvidia.com/blog/ai-5-layer-cake/" target="_blank" rel="noopener"&gt;recent article&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI is essential infrastructure, like electricity and the internet.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In other words:&lt;/p&gt;
&lt;p&gt;AI is no longer just an applied technology, but a &lt;strong&gt;new factor of production&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="nvidias-five-layer-cake"&gt;NVIDIA&amp;rsquo;s Five-Layer Cake&lt;/h2&gt;
&lt;p&gt;Recently, Jensen Huang proposed a very interesting concept: the &lt;strong&gt;AI Five-Layer Cake&lt;/strong&gt;.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/ai-five-layer-cake.webp" data-img="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/ai-five-layer-cake.webp" alt="Figure 2: AI Five Layer Cake (Image source: &amp;lt;a href=&amp;#34;https://blogs.nvidia.com/blog/ai-5-layer-cake/&amp;#34; target=&amp;#34;_blank&amp;#34; rel=&amp;#34;noopener&amp;#34;&amp;gt;NVIDIA&amp;lt;/a&amp;gt;)" data-caption="Figure 2: AI Five Layer Cake (Image source: &amp;lt;a href=&amp;#34;https://blogs.nvidia.com/blog/ai-5-layer-cake/&amp;#34; target=&amp;#34;_blank&amp;#34; rel=&amp;#34;noopener&amp;#34;&amp;gt;NVIDIA&amp;lt;/a&amp;gt;)"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: AI Five Layer Cake (Image source: &lt;a href="https://blogs.nvidia.com/blog/ai-5-layer-cake/" target="_blank" rel="noopener"&gt;NVIDIA&lt;/a&gt;)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;AI is broken down into five layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Energy&lt;/li&gt;
&lt;li&gt;Chips&lt;/li&gt;
&lt;li&gt;AI Infrastructure&lt;/li&gt;
&lt;li&gt;Models&lt;/li&gt;
&lt;li&gt;Applications&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This model actually illustrates one thing:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI is a complete industrial system.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Jensen Huang even described AI at Davos as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;One of the largest-scale infrastructure constructions in human history.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="signals-gtc-2026-may-release"&gt;Signals GTC 2026 May Send&lt;/h2&gt;
&lt;p&gt;This year&amp;rsquo;s GTC is expected to point in several important directions.&lt;/p&gt;
&lt;h3 id="inference-computing"&gt;Inference Computing&lt;/h3&gt;
&lt;p&gt;The focus of AI in the past was training. But the main load of AI in the future is likely to be &lt;strong&gt;Inference&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Analysts expect that by 2030, &lt;strong&gt;75% of computing demand in the AI data center market will come from inference&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="agentic-ai"&gt;Agentic AI&lt;/h3&gt;
&lt;p&gt;The past AI model was:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;User → Model → Answer
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The Agent model is more complex:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;User → Agent → Tools → Model → Action
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The flowchart below shows the main interaction paths in the Agent model:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/agentic-ai-interaction-en.svg" data-img="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/agentic-ai-interaction-en.svg" alt="Figure 3: Agentic AI Interaction Flow" data-caption="Figure 3: Agentic AI Interaction Flow"
width="936"
height="536"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Agentic AI Interaction Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;AI is no longer just answering questions, but &lt;strong&gt;executing tasks&lt;/strong&gt;.&lt;/p&gt;
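&lt;p&gt;A minimal Python sketch can make the contrast between the two flows concrete. Everything in it is hypothetical: &lt;code&gt;fake_model&lt;/code&gt;, the &lt;code&gt;TOOLS&lt;/code&gt; table, and the request format are stand-ins invented for illustration, not any vendor&amp;rsquo;s API, and the loop only loosely follows the path shown above.&lt;/p&gt;

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking a string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}


def fake_model(request: str) -> tuple[str, str]:
    """Stand-in for a model call: pick a tool and the argument to pass to it."""
    if "+" in request:
        return "add", request
    return "upper", request


def answer(request: str) -> str:
    """Classic flow: the model only returns text, nothing is executed."""
    tool, _ = fake_model(request)
    return f"I would use the '{tool}' tool for that."


def run_agent(request: str) -> str:
    """Agent flow: consult the model for a plan, then execute the chosen tool."""
    tool, arg = fake_model(request)
    return TOOLS[tool](arg)


print(answer("2+3"))
print(run_agent("2+3"))
```

&lt;p&gt;The first call only describes what could be done; the second one actually performs the action and returns its result.&lt;/p&gt;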
&lt;h3 id="agent-platform"&gt;Agent Platform&lt;/h3&gt;
&lt;p&gt;Recent media reports suggest that NVIDIA may launch a new Agent platform: &lt;strong&gt;NemoClaw&lt;/strong&gt;, aimed at helping enterprises deploy AI Agents.&lt;/p&gt;
&lt;p&gt;If this platform is actually released, NVIDIA&amp;rsquo;s stack would take the following shape:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/nvidia-agent-platform-en.svg" data-img="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/nvidia-agent-platform-en.svg" alt="Figure 4: NVIDIA Agent Platform Architecture" data-caption="Figure 4: NVIDIA Agent Platform Architecture"
width="416"
height="816"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: NVIDIA Agent Platform Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is actually a complete AI stack.&lt;/p&gt;
&lt;h2 id="agents-change-computing-workloads"&gt;Agents Change Computing Workloads&lt;/h2&gt;
&lt;p&gt;The emergence of Agents brings new computing workload issues.&lt;/p&gt;
&lt;p&gt;Past AI workloads were mainly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training&lt;/li&gt;
&lt;li&gt;Inference&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But Agents bring a third type of workload:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent Workloads&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The figure below shows the diverse workload types related to Agents:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/agent-workloads-en.svg" data-img="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/agent-workloads-en.svg" alt="Figure 5: Agent Workloads Structure" data-caption="Figure 5: Agent Workloads Structure"
width="1376"
height="316"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Agent Workloads Structure&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This workload is &lt;strong&gt;highly fragmented&lt;/strong&gt;. GPUs are no longer occupied for long periods by a single job; instead, they serve many small requests. This poses new challenges for infrastructure.&lt;/p&gt;
&lt;h2 id="ai-native-infrastructure"&gt;AI-Native Infrastructure&lt;/h2&gt;
&lt;p&gt;For the past few years, I&amp;rsquo;ve been thinking about a question:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is AI-native infrastructure?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It is clearly not just &amp;ldquo;Kubernetes with GPUs.&amp;rdquo; I&amp;rsquo;m more inclined to believe it needs to possess several characteristics.&lt;/p&gt;
&lt;h3 id="gpu-as-a-first-class-resource"&gt;GPU as a First-Class Resource&lt;/h3&gt;
&lt;p&gt;In the cloud computing era, CPU is the core resource. In the AI era, &lt;strong&gt;GPU is the core resource&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="heterogeneous-computing"&gt;Heterogeneous Computing&lt;/h3&gt;
&lt;p&gt;Real-world AI chips are not limited to NVIDIA:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NVIDIA&lt;/li&gt;
&lt;li&gt;Ascend&lt;/li&gt;
&lt;li&gt;Cambricon&lt;/li&gt;
&lt;li&gt;Metax&lt;/li&gt;
&lt;li&gt;Moore Threads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Future AI infrastructure must be able to manage &lt;strong&gt;heterogeneous computing&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="gpu-sharing"&gt;GPU Sharing&lt;/h3&gt;
&lt;p&gt;GPU is a very expensive resource. If it cannot be shared, utilization will be very low. This is why GPU virtualization and slicing are becoming increasingly important.&lt;/p&gt;
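&lt;p&gt;To make the utilization argument concrete, here is a small Python sketch comparing whole-GPU allocation with memory-sliced sharing. The workload sizes, the 80 GiB card capacity, and the first-fit-decreasing packing policy are all assumptions chosen for illustration; real sharing systems also have to account for compute contention and isolation.&lt;/p&gt;

```python
def gpus_needed_whole(requests_gib: list[int]) -> int:
    """Whole-GPU allocation: every workload occupies one full card."""
    return len(requests_gib)


def gpus_needed_shared(requests_gib: list[int], capacity_gib: int) -> int:
    """Memory-sliced sharing, packed with first-fit decreasing."""
    free: list[int] = []  # remaining memory on each GPU already in use
    for need in sorted(requests_gib, reverse=True):
        if need > capacity_gib:
            raise ValueError(f"request of {need} GiB exceeds one GPU")
        for i, remaining in enumerate(free):
            if remaining >= need:
                free[i] -= need
                break
        else:
            free.append(capacity_gib - need)  # no card fits: open a new GPU
    return len(free)


# Hypothetical inference workloads, in GiB of GPU memory.
requests = [10, 20, 8, 40, 16, 24, 12, 30]
whole = gpus_needed_whole(requests)
shared = gpus_needed_shared(requests, capacity_gib=80)
print(f"whole-GPU: {whole} cards, shared: {shared} cards")
```

&lt;p&gt;With these made-up numbers, eight workloads that would each hold a full card fit onto two shared cards, which is the intuition behind slicing.&lt;/p&gt;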
&lt;h3 id="ai-scheduling"&gt;AI Scheduling&lt;/h3&gt;
&lt;p&gt;AI scheduling includes not only traditional CPU and Memory, but also:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;GPU
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;VRAM
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Topology
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Bandwidth
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="a-possible-ai-tech-stack"&gt;A Possible AI Tech Stack&lt;/h2&gt;
&lt;p&gt;Combining the above trends, the future AI stack may present the following structure:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/ai-tech-stack-en.svg" data-img="https://assets.jimmysong.io/images/blog/gtc-2026-ai-native-infrastructure/ai-tech-stack-en.svg" alt="Figure 6: AI Tech Stack Evolution" data-caption="Figure 6: AI Tech Stack Evolution"
width="416"
height="956"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: AI Tech Stack Evolution&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This structure is very close to NVIDIA&amp;rsquo;s Five-Layer Cake.&lt;/p&gt;
&lt;h2 id="my-judgment"&gt;My Judgment&lt;/h2&gt;
&lt;p&gt;Combining signals from GTC, AI Factory, Agents, and AI Five-Layer Cake, we can see a very obvious trend:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI is rewriting computing infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Future competition may not just be &amp;ldquo;who has the best model,&amp;rdquo; but:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Who has the best AI Infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Just as in earlier eras:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Electricity determined industrial capability&lt;/li&gt;
&lt;li&gt;The internet determined information capability&lt;/li&gt;
&lt;li&gt;Cloud computing determined software capability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The future may be:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI Infrastructure determines intelligence capability.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;If we take a longer view, we may be entering a new historical stage.&lt;/p&gt;
&lt;p&gt;AI is no longer just a technological tool. It is becoming &lt;strong&gt;new infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Just like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Electricity&lt;/li&gt;
&lt;li&gt;Internet&lt;/li&gt;
&lt;li&gt;Cloud computing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And AI-native infrastructure is likely to become one of the most important technology directions for the next decade.&lt;/p&gt;</content:encoded></item><item><title>When GPUs Move Toward Open Scheduling: Structural Shifts in AI Native Infrastructure</title><link>https://jimmysong.io/blog/gpu-open-scheduling-hami-2025/</link><pubDate>Fri, 13 Feb 2026 14:32:46 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/gpu-open-scheduling-hami-2025/</guid><description>A CTO/VP view on open GPU scheduling: CDI, Kubernetes DRA, virtualization data planes, ecosystem governance, and lock-in risk.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The future of GPU scheduling isn&amp;rsquo;t about whose implementation is more &amp;ldquo;black-box&amp;rdquo;—it&amp;rsquo;s about who can standardize device resource contracts into something governable.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/banner.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/banner.webp" alt="Figure 1: GPU Open Scheduling" data-caption="Figure 1: GPU Open Scheduling"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: GPU Open Scheduling&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Have you ever wondered: why are GPUs so expensive, yet overall utilization often hovers around 10–20%?&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/underutilization.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/underutilization.webp" alt="Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization" data-caption="Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: GPU Utilization Problem: Expensive GPUs with only 10-20% utilization&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This isn&amp;rsquo;t a problem you solve with &amp;ldquo;better scheduling algorithms.&amp;rdquo; It&amp;rsquo;s a &lt;strong&gt;structural problem&lt;/strong&gt;: GPU scheduling is shifting from &amp;ldquo;proprietary implementation&amp;rdquo; to &amp;ldquo;open scheduling,&amp;rdquo; much as networking converged on CNI and storage converged on CSI.&lt;/p&gt;
&lt;p&gt;In the &lt;a href="https://dynamia.ai/blog/hami-2025-recap" target="_blank" rel="noopener"&gt;HAMi 2025 Annual Review&lt;/a&gt;, we noted: &amp;ldquo;HAMi 2025 is no longer just about GPU sharing tools—it&amp;rsquo;s a more structural signal: GPUs are moving toward open scheduling.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In 2025, the signals of this shift became visible: Kubernetes Dynamic Resource Allocation (DRA) graduated to GA and became enabled by default, NVIDIA GPU Operator started defaulting to &lt;a href="https://github.com/cncf-tags/container-device-interface" target="_blank" rel="noopener"&gt;CDI&lt;/a&gt; (Container Device Interface), and HAMi&amp;rsquo;s production-grade case studies under CNCF began moving &amp;ldquo;GPU sharing&amp;rdquo; from experimental capability to operational excellence.&lt;/p&gt;
&lt;p&gt;This post analyzes this structural shift from an AI Native Infrastructure perspective, and what it means for &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt; and the industry.&lt;/p&gt;
&lt;h2 id="why-open-scheduling-matters"&gt;Why &amp;ldquo;Open Scheduling&amp;rdquo; Matters&lt;/h2&gt;
&lt;p&gt;In multi-cloud and hybrid cloud environments, GPU model diversity significantly amplifies operational costs. One large internet company&amp;rsquo;s platform spans H200/H100/A100/V100/4090 GPUs across five clusters. If you can only allocate &amp;ldquo;whole GPUs,&amp;rdquo; resource misalignment becomes inevitable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Open scheduling&amp;rdquo; isn&amp;rsquo;t a slogan—it&amp;rsquo;s a set of engineering contracts being solidified into the mainstream stack.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="standardized-resource-expression"&gt;Standardized Resource Expression&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; GPUs were exposed as extended resources, which are opaque integer counts. The scheduler could not tell whether a unit represented memory, compute, or a particular device type.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dra-evolution.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dra-evolution.webp" alt="Figure 3: Open Scheduling Standardization Evolution" data-caption="Figure 3: Open Scheduling Standardization Evolution"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Open Scheduling Standardization Evolution&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; Kubernetes DRA provides objects like DeviceClass, ResourceClaim, and ResourceSlice. This lets drivers and cluster administrators define device categories and selection logic (including CEL-based selectors), while Kubernetes handles the full loop: match devices → bind claims → place Pods onto nodes with access to allocated devices.&lt;/p&gt;
&lt;p&gt;Even more importantly, with Kubernetes 1.34 the core APIs in the &lt;code&gt;resource.k8s.io&lt;/code&gt; group graduated to GA, DRA became stable and enabled by default, and the community committed to avoiding breaking changes going forward. This means the ecosystem can invest with confidence in a stable, standard API.&lt;/p&gt;
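&lt;p&gt;The match, bind, and place loop can be pictured with a toy model in Python. This is not the Kubernetes API: the &lt;code&gt;Device&lt;/code&gt;, &lt;code&gt;DeviceClass&lt;/code&gt;, and &lt;code&gt;ResourceClaim&lt;/code&gt; classes below are simplified stand-ins that only borrow the DRA object names, the attribute keys are invented, and a plain Python predicate takes the place of a CEL selector.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Device:
    """One device a driver might advertise for a node (compare ResourceSlice)."""
    name: str
    node: str
    attributes: dict[str, Any]
    allocated_to: Optional[str] = None


@dataclass
class DeviceClass:
    """Admin-defined category; the selector stands in for a CEL expression."""
    name: str
    selector: Callable[[Device], bool]


@dataclass
class ResourceClaim:
    """A workload's request for one device of a given class."""
    name: str
    device_class: DeviceClass
    extra_selector: Callable[[Device], bool] = lambda d: True


def allocate(claim: ResourceClaim, inventory: list[Device]) -> Optional[Device]:
    """Match a free device, bind it to the claim, and return it (or None)."""
    for dev in inventory:
        if dev.allocated_to is not None:
            continue  # already bound to another claim
        if claim.device_class.selector(dev) and claim.extra_selector(dev):
            dev.allocated_to = claim.name
            return dev
    return None


inventory = [
    Device("gpu-0", "node-a", {"vendor": "example.com", "memory_gib": 24}),
    Device("gpu-1", "node-b", {"vendor": "example.com", "memory_gib": 80}),
]
gpu_class = DeviceClass("example-gpu", lambda d: d.attributes["vendor"] == "example.com")
claim = ResourceClaim("train-job", gpu_class, lambda d: d.attributes["memory_gib"] >= 40)

bound = allocate(claim, inventory)
# A Pod using this claim would then be placed on the bound device's node.
print(bound.name, bound.node)
```

&lt;p&gt;The point of the sketch is the separation of roles: the class describes what kind of device is acceptable, the claim narrows it for one workload, and allocation is the step that ties a concrete device (and therefore a node) to that claim.&lt;/p&gt;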
&lt;h3 id="standardized-device-injection"&gt;Standardized Device Injection&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt; Device injection relied on vendor-specific hooks and runtime class patterns.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt; The Container Device Interface (CDI) abstracts device injection into an open specification. NVIDIA&amp;rsquo;s Container Toolkit explicitly describes CDI as an open specification for container runtimes, and NVIDIA GPU Operator 25.10.0 defaults to enabling CDI on install/upgrade—directly leveraging runtime-native CDI support (containerd, CRI-O, etc.) for GPU injection.&lt;/p&gt;
&lt;p&gt;This means &amp;ldquo;devices into containers&amp;rdquo; is also moving toward replaceable, standardized interfaces.&lt;/p&gt;
&lt;h2 id="hami-from-sharing-tool-to-governable-data-plane"&gt;HAMi: From &amp;ldquo;Sharing Tool&amp;rdquo; to &amp;ldquo;Governable Data Plane&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;On this standardization path, &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt;&amp;rsquo;s role needs redefinition: &lt;strong&gt;it&amp;rsquo;s not about replacing Kubernetes—it&amp;rsquo;s about turning GPU virtualization and slicing into a declarative, schedulable, governable data plane.&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="data-plane-perspective"&gt;Data Plane Perspective&lt;/h3&gt;
&lt;p&gt;HAMi&amp;rsquo;s core contribution expands the allocatable unit from &amp;ldquo;whole GPU integers&amp;rdquo; to finer-grained shares (memory and compute), forming a complete allocation chain:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Device discovery:&lt;/strong&gt; Identify available GPU devices and models&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling placement:&lt;/strong&gt; Use Scheduler Extender to make native schedulers &amp;ldquo;understand&amp;rdquo; vGPU resource models (Filter/Score/Bind phases)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;In-container enforcement:&lt;/strong&gt; Inject share constraints into container runtime&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metric export:&lt;/strong&gt; Provide observable metrics for utilization, isolation, and more&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This transforms &amp;ldquo;sharing&amp;rdquo; from ad-hoc &amp;ldquo;it runs&amp;rdquo; experimentation into engineering capability that can be declared in YAML, scheduled by policy, and validated by metrics.&lt;/p&gt;
&lt;h3 id="scheduling-mechanism-enhancement-not-replacement"&gt;Scheduling Mechanism: Enhancement, Not Replacement&lt;/h3&gt;
&lt;p&gt;HAMi&amp;rsquo;s scheduling doesn&amp;rsquo;t replace Kubernetes—it uses a &lt;strong&gt;Scheduler Extender&lt;/strong&gt; pattern to let the native scheduler understand vGPU resource models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Filter:&lt;/strong&gt; Filter nodes based on memory, compute, device type, topology, and other constraints&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Score:&lt;/strong&gt; Apply configurable policies like binpack, spread, topology-aware scoring&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bind:&lt;/strong&gt; Complete final device-to-Pod binding&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This architecture positions HAMi naturally as an execution layer under higher-level &amp;ldquo;AI control planes&amp;rdquo; (queuing, quotas, priorities)—working alongside Volcano, Kueue, Koordinator, and others.&lt;/p&gt;
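&lt;p&gt;The three phases can be sketched in a few lines of Python. This is an illustrative model only, not HAMi&amp;rsquo;s code or the Kubernetes scheduler extender protocol: the &lt;code&gt;Gpu&lt;/code&gt; and &lt;code&gt;Request&lt;/code&gt; types, the percentage-based compute share, and the scoring formula are assumptions.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gpu:
    node: str
    model: str
    free_mem_mib: int
    free_cores_pct: int  # share of compute still unallocated, 0 to 100


@dataclass
class Request:
    mem_mib: int
    cores_pct: int
    model: Optional[str] = None  # optional device-type constraint


def filter_gpus(req: Request, gpus: list[Gpu]) -> list[Gpu]:
    """Filter: keep only devices that satisfy every hard constraint."""
    return [
        g for g in gpus
        if g.free_mem_mib >= req.mem_mib
        and g.free_cores_pct >= req.cores_pct
        and (req.model is None or g.model == req.model)
    ]


def score(req: Request, gpu: Gpu, policy: str) -> int:
    """Score: binpack prefers the fullest device, spread the emptiest."""
    leftover = gpu.free_mem_mib - req.mem_mib
    return -leftover if policy == "binpack" else leftover


def bind(req: Request, gpu: Gpu) -> None:
    """Bind: record the allocation against the chosen device."""
    gpu.free_mem_mib -= req.mem_mib
    gpu.free_cores_pct -= req.cores_pct


def schedule(req: Request, gpus: list[Gpu], policy: str = "binpack") -> Optional[Gpu]:
    """Run Filter, then Score, then Bind; return the chosen device or None."""
    candidates = filter_gpus(req, gpus)
    if not candidates:
        return None
    best = max(candidates, key=lambda g: score(req, g, policy))
    bind(req, best)
    return best


gpus = [
    Gpu("node-a", "A100", free_mem_mib=40_000, free_cores_pct=100),
    Gpu("node-b", "A100", free_mem_mib=12_000, free_cores_pct=50),
]
chosen = schedule(Request(mem_mib=8_000, cores_pct=25), gpus, policy="binpack")
print(chosen.node)
```

&lt;p&gt;Under the binpack policy the request lands on the already partly used card on &lt;code&gt;node-b&lt;/code&gt;, leaving the empty card free for a larger job; switching the policy to spread would favor the emptier card instead.&lt;/p&gt;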
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/hami-scheduler-extender.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/hami-scheduler-extender.webp" alt="Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)" data-caption="Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: HAMi Scheduling Architecture (Filter → Score → Bind)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="production-evidence-from-can-we-share-to-can-we-operate"&gt;Production Evidence: From &amp;ldquo;Can We Share?&amp;rdquo; to &amp;ldquo;Can We Operate?&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.cncf.io/case-studies/?_sft_lf_project=hami" target="_blank" rel="noopener"&gt;CNCF public case studies&lt;/a&gt; provide concrete answers: &lt;strong&gt;in a hybrid, multi-cloud platform built on Kubernetes and HAMi, 10,000+ Pods run concurrently, and GPU utilization improves from 13% to 37% (nearly 3×).&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/case-studies.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/case-studies.webp" alt="Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%&amp;#43; utilization, SF Technology 57% savings" data-caption="Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%&amp;#43; utilization, SF Technology 57% savings"
width="2466"
height="1508"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: CNCF Production Case Studies: Ke Holdings 13%→37%, DaoCloud 80%+ utilization, SF Technology 57% savings&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here are highlights from several cases:&lt;/p&gt;
&lt;h3 id="case-study-1-ke-holdings-february-5-2026"&gt;Case Study 1: Ke Holdings (February 5, 2026)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Environment:&lt;/strong&gt; 5 clusters spanning public and private clouds&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU models:&lt;/strong&gt; H200/H100/A100/V100/4090 and more&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Architecture:&lt;/strong&gt; Separate &amp;ldquo;GPU clusters&amp;rdquo; for large training tasks (dedicated allocation) vs &amp;ldquo;vGPU clusters&amp;rdquo; with HAMi fine-grained memory slicing for high-density inference&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Concurrent scale:&lt;/strong&gt; 10,000+ Pods&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcome:&lt;/strong&gt; Overall GPU utilization improved from 13% to 37% (nearly 3×)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="case-study-2-daocloud-december-2-2025"&gt;Case Study 2: DaoCloud (December 2, 2025)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hard constraints:&lt;/strong&gt; Must remain cloud-native, vendor-agnostic, and compatible with CNCF toolchain&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adoption outcomes:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Average GPU utilization: 80%+&lt;/li&gt;
&lt;li&gt;GPU-related operating cost reduction: 20–30%&lt;/li&gt;
&lt;li&gt;Coverage: 10+ data centers, 10,000+ GPUs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explicit benefit:&lt;/strong&gt; Unified abstraction layer across NVIDIA and domestic GPUs, reducing vendor dependency&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="case-study-3-prep-edu-august-20-2025"&gt;Case Study 3: Prep EDU (August 20, 2025)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Negative experience:&lt;/strong&gt; Isolation failures in other GPU-sharing approaches caused memory conflicts and instability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Positive outcome:&lt;/strong&gt; HAMi&amp;rsquo;s vGPU scheduling, GPU type/UUID targeting, and compatibility with NVIDIA GPU Operator and RKE2 became decisive factors for production adoption&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Environment:&lt;/strong&gt; Heterogeneous RTX 4070/4090 cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="case-study-4-sf-technology-september-18-2025"&gt;Case Study 4: SF Technology (September 18, 2025)&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Project:&lt;/strong&gt; EffectiveGPU (built on HAMi)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use cases:&lt;/strong&gt; Large model inference, test services, speech recognition, domestic AI hardware (Huawei Ascend, Baidu Kunlun, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcomes:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;GPU savings: Large model inference runs 65 services on 28 GPUs (37 saved); test cluster runs 19 services on 6 GPUs (13 saved)&lt;/li&gt;
&lt;li&gt;Overall savings: Up to 57% GPU savings for production and test clusters&lt;/li&gt;
&lt;li&gt;Utilization improvement: Up to 100% GPU utilization improvement with GPU virtualization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Highlights:&lt;/strong&gt; Cross-node collaborative scheduling, priority-based preemption, memory over-subscription&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These cases demonstrate a consistent pattern: &lt;strong&gt;GPU virtualization becomes economically meaningful only when it participates in a governable contract—where utilization, isolation, and policy can be expressed, measured, and improved over time.&lt;/strong&gt;&lt;/p&gt;
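&lt;p&gt;Two of the headline ratios in these cases can be re-derived from the raw numbers. The short Python sketch below does the arithmetic; the one-GPU-per-service baseline used for the savings figure is an assumption made here about how the savings are counted, and the function names are purely illustrative.&lt;/p&gt;

```python
def improvement_factor(before_pct: float, after_pct: float) -> float:
    """How many times higher utilization is after the change."""
    return after_pct / before_pct


def savings_fraction(services: int, gpus_used: int) -> float:
    """Fraction of GPUs saved versus one dedicated GPU per service (assumed baseline)."""
    return (services - gpus_used) / services


# Ke Holdings: overall utilization went from 13% to 37%.
print(f"{improvement_factor(13, 37):.2f}x")

# SF Technology, large model inference: 65 services on 28 GPUs.
print(f"{savings_fraction(65, 28):.0%}")
```

&lt;p&gt;37 divided by 13 is about 2.85, which is where &amp;ldquo;nearly 3×&amp;rdquo; comes from, and under the assumed baseline 37 of 65 GPUs saved rounds to 57%.&lt;/p&gt;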
&lt;h2 id="strategic-implications-for-dynamia"&gt;Strategic Implications for Dynamia&lt;/h2&gt;
&lt;p&gt;From Dynamia&amp;rsquo;s perspective (and as VP of Open Source Ecosystem), the strategic value of HAMi becomes clear:&lt;/p&gt;
&lt;h3 id="two-layer-architecture-open-source-vs-commercial"&gt;Two-Layer Architecture: Open Source vs Commercial&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAMi (CNCF open source project):&lt;/strong&gt; Responsible for &amp;ldquo;adoption and trust,&amp;rdquo; focused on GPU virtualization and compute efficiency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamia enterprise products and services:&lt;/strong&gt; Responsible for &amp;ldquo;production and scale,&amp;rdquo; providing commercial distributions and enterprise services built on HAMi&lt;/li&gt;
&lt;/ul&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dynamia-hami-dual-mechanism.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/dynamia-hami-dual-mechanism.webp" alt="Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial" data-caption="Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Dynamia Dual Mechanism: Open Source vs Commercial&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This boundary is the foundation for long-term trust—project and company offerings remain separate, with commercial distributions and services built on the open source project.&lt;/p&gt;
&lt;h3 id="global-narrative-strategy"&gt;Global Narrative Strategy&lt;/h3&gt;
&lt;p&gt;The internal alignment memo recommends a bilingual approach:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First layer:&lt;/strong&gt; Lead globally with &amp;ldquo;GPU virtualization / sharing / utilization.&amp;rdquo; Chinese-language messaging can use &amp;ldquo;GPU virtualization and heterogeneous scheduling&amp;rdquo; directly, but the English first layer should avoid &amp;ldquo;heterogeneous&amp;rdquo; as a headline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second layer:&lt;/strong&gt; When users discuss mixed GPUs or workload diversity, introduce &amp;ldquo;heterogeneous&amp;rdquo; to confirm capability boundaries—never as the opening hook&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Core anchor:&lt;/strong&gt; Maintain &amp;ldquo;HAMi (project and community) ≠ company products&amp;rdquo; as the non-negotiable baseline for long-term positioning&lt;/p&gt;
&lt;h3 id="the-right-commercialization-landing"&gt;The Right Path to Commercialization&lt;/h3&gt;
&lt;p&gt;DaoCloud&amp;rsquo;s case study already set vendor neutrality and CNCF toolchain compatibility as hard constraints, framing vendor dependency reduction as a business and operational benefit, not just a technical detail. Project-HAMi&amp;rsquo;s official documentation lists &amp;ldquo;avoid vendor lock&amp;rdquo; as a core value proposition.&lt;/p&gt;
&lt;p&gt;In this context, &lt;strong&gt;the right path to commercialization isn&amp;rsquo;t &amp;ldquo;closed-source scheduling&amp;rdquo;; it&amp;rsquo;s productizing capabilities around real enterprise complexity:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Systematic compatibility matrix&lt;/li&gt;
&lt;li&gt;SLO and tail-latency governance&lt;/li&gt;
&lt;li&gt;Metering for billing&lt;/li&gt;
&lt;li&gt;RBAC, quotas, multi-cluster governance&lt;/li&gt;
&lt;li&gt;Upgrade and rollback safety&lt;/li&gt;
&lt;li&gt;Faster path-to-production for DRA/CDI and other standardization efforts&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="forward-view-the-next-23-years"&gt;Forward View: The Next 2–3 Years&lt;/h2&gt;
&lt;p&gt;My strong judgment: &lt;strong&gt;over the next 2–3 years, GPU scheduling competition will shift from &amp;ldquo;whose implementation is more black-box&amp;rdquo; to &amp;ldquo;whose contract is more open.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The reasons are practical:&lt;/p&gt;
&lt;h3 id="hardware-form-factors-and-supply-chains-are-diversifying"&gt;Hardware Form Factors and Supply Chains Are Diversifying&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI&amp;rsquo;s February 12, 2026 &amp;ldquo;GPT‑5.3‑Codex‑Spark&amp;rdquo; release emphasizes ultra-low latency serving, including persistent WebSockets and a dedicated serving tier on Cerebras hardware&lt;/li&gt;
&lt;li&gt;Large-scale GPU-backed financing announcements (for pan-European deployments) illustrate the infrastructure scale and financial engineering surrounding accelerator fleets&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These signals suggest that heterogeneity will grow: mixed accelerators, mixed clouds, mixed workload types.&lt;/p&gt;
&lt;h3 id="low-latency-inference-tiers-will-force-systematic-scheduling"&gt;Low-Latency Inference Tiers Will Force Systematic Scheduling&lt;/h3&gt;
&lt;p&gt;Low-latency inference tiers (beyond just GPUs) will force resource scheduling toward &amp;ldquo;multi-accelerator, multi-layer cache, multi-class node&amp;rdquo; architectural design—scheduling must inherently be heterogeneous.&lt;/p&gt;
&lt;h3 id="open-scheduling-is-risk-management-not-idealism"&gt;Open Scheduling Is Risk Management, Not Idealism&lt;/h3&gt;
&lt;p&gt;In this world, &amp;ldquo;open scheduling&amp;rdquo; isn&amp;rsquo;t idealism; it&amp;rsquo;s risk management. The sustainable path for AI Native Infrastructure looks like building schedulable, governable &amp;ldquo;control plane + data plane&amp;rdquo; combinations around DRA/CDI and other solidifying open interfaces: combinations that are pluggable, governable across tenants, and able to co-evolve with the ecosystem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The next battleground isn&amp;rsquo;t &amp;ldquo;whose scheduling is smarter&amp;rdquo;—it&amp;rsquo;s &amp;ldquo;who can standardize device resource contracts into something governable.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="conclusion"&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;When you place HAMi 2025 back in the broader AI Native Infrastructure context, it&amp;rsquo;s no longer just the year of &amp;ldquo;GPU sharing tools&amp;rdquo;—it&amp;rsquo;s a more structural signal: &lt;strong&gt;GPUs are moving toward open scheduling.&lt;/strong&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/future-vision-open-scheduling.webp" data-img="https://assets.jimmysong.io/images/blog/gpu-open-scheduling-hami-2025/future-vision-open-scheduling.webp" alt="Figure 7: Open Scheduling Future Vision" data-caption="Figure 7: Open Scheduling Future Vision"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 7: Open Scheduling Future Vision&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The driving forces come from both ends:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Upstream:&lt;/strong&gt; Standards like DRA/CDI continue to solidify&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Downstream:&lt;/strong&gt; Scale and diversity (multi-cloud, multi-model, even accelerators beyond GPUs)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For Dynamia, HAMi&amp;rsquo;s significance has transcended &amp;ldquo;GPU sharing tool&amp;rdquo;: it turns GPU virtualization and slicing into declarative, schedulable, measurable data planes—letting queues, quotas, priorities, and multi-tenancy actually close the governance loop.&lt;/p&gt;</content:encoded></item><item><title>AI Learning Resources: 44 Curated Collections from Our Cleanup</title><link>https://jimmysong.io/blog/ultimate-ai-learning-resources/</link><pubDate>Sun, 08 Feb 2026 12:20:05 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ultimate-ai-learning-resources/</guid><description>A curated collection of AI learning resources we removed from the AI Resources list: awesome lists, courses, tutorials, and cookbooks. These educational materials deserve their own spotlight.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;The best way to learn AI is to start building. These resources will guide your journey.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ultimate-ai-learning-resources/banner.webp" data-img="https://assets.jimmysong.io/images/blog/ultimate-ai-learning-resources/banner.webp" alt="Figure 1: AI Learning Resources Collection" data-caption="Figure 1: AI Learning Resources Collection"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Learning Resources Collection&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In my ongoing effort to keep the AI Resources list focused on &lt;strong&gt;production-ready tools and frameworks&lt;/strong&gt;, I&amp;rsquo;ve removed &lt;strong&gt;44 collection-type projects&lt;/strong&gt;—courses, tutorials, awesome lists, and cookbooks.&lt;/p&gt;
&lt;p&gt;These resources aren&amp;rsquo;t gone—they&amp;rsquo;ve been moved here. This post is a &lt;strong&gt;curated collection&lt;/strong&gt; of those educational materials, organized by type and topic. Whether you&amp;rsquo;re a complete beginner or an experienced practitioner, you&amp;rsquo;ll find something valuable here.&lt;/p&gt;
&lt;h2 id="why-remove-collections-from-ai-resources"&gt;Why Remove Collections from AI Resources?&lt;/h2&gt;
&lt;p&gt;My AI Resources list now focuses on &lt;strong&gt;concrete tools and frameworks&lt;/strong&gt;—projects you can directly use in production. Collections, while valuable, serve a different purpose: &lt;strong&gt;education and discovery&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;By separating them, I:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep the resources list actionable and focused&lt;/li&gt;
&lt;li&gt;Create a dedicated space for learning materials&lt;/li&gt;
&lt;li&gt;Make it easier to find what you need&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-awesome-lists-14-collections"&gt;📚 Awesome Lists (14 Collections)&lt;/h2&gt;
&lt;p&gt;Awesome lists are community-curated collections of the best resources. They&amp;rsquo;re perfect for discovering new tools and staying updated.&lt;/p&gt;
&lt;h3 id="must-explore-awesome-lists"&gt;Must-Explore Awesome Lists&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/filipecalegario/awesome-generative-ai" target="_blank" rel="noopener"&gt;Awesome Generative AI&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Models, tools, tutorials, and research papers&lt;/li&gt;
&lt;li&gt;Great for: Comprehensive overview of generative AI landscape&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hannibal046/awesome-llm" target="_blank" rel="noopener"&gt;Awesome LLM&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM resources: papers, tools, datasets, applications&lt;/li&gt;
&lt;li&gt;Great for: Deep dive into large language models&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/arindam200/awesome-ai-apps" target="_blank" rel="noopener"&gt;Awesome AI Apps&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Practical LLM applications, RAG examples, agent implementations&lt;/li&gt;
&lt;li&gt;Great for: Real-world implementation examples&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hesreallyhim/awesome-claude-code" target="_blank" rel="noopener"&gt;Awesome Claude Code&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Claude Code commands, files, and workflows&lt;/li&gt;
&lt;li&gt;Great for: Maximizing Claude Code productivity&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/punkpeye/awesome-mcp-servers" target="_blank" rel="noopener"&gt;Awesome MCP Servers&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP servers for modular AI backend systems&lt;/li&gt;
&lt;li&gt;Great for: Building with Model Context Protocol&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="specialized-awesome-lists"&gt;Specialized Awesome Lists&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/f/awesome-chatgpt-prompts" target="_blank" rel="noopener"&gt;Awesome ChatGPT Prompts&lt;/a&gt;&lt;/strong&gt; - Prompt examples for various scenarios&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/shubhamsaboo/awesome-llm-apps" target="_blank" rel="noopener"&gt;Awesome LLM Apps&lt;/a&gt;&lt;/strong&gt; - LLM applications with code examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/bradyfu/awesome-multimodal-large-language-models" target="_blank" rel="noopener"&gt;Awesome Multimodal LLM&lt;/a&gt;&lt;/strong&gt; - Multimodal model resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/punkpeye/awesome-mcp-clients" target="_blank" rel="noopener"&gt;Awesome MCP Clients&lt;/a&gt;&lt;/strong&gt; - MCP client tools and SDKs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/composiohq/awesome-claude-skills" target="_blank" rel="noopener"&gt;Awesome Claude Skills&lt;/a&gt;&lt;/strong&gt; - Claude Skills and workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/github/awesome-copilot" target="_blank" rel="noopener"&gt;Awesome GitHub Copilot&lt;/a&gt;&lt;/strong&gt; - Copilot customizations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/zerolu/awesome-nanobanana-pro" target="_blank" rel="noopener"&gt;Awesome Nano Banana Pro&lt;/a&gt;&lt;/strong&gt; - Image model prompts and examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/alchemyst-ai/awesome-saas" target="_blank" rel="noopener"&gt;Awesome SaaS&lt;/a&gt;&lt;/strong&gt; - AI platform templates&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/voltagent/awesome-claude-code-subagents" target="_blank" rel="noopener"&gt;Awesome Claude Code Subagents&lt;/a&gt;&lt;/strong&gt; - Claude Code subagents&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-courses--tutorials-9-curricula"&gt;🎓 Courses &amp;amp; Tutorials (9 Curricula)&lt;/h2&gt;
&lt;p&gt;Structured learning paths from universities and tech companies.&lt;/p&gt;
&lt;h3 id="microsofts-ai-curriculum"&gt;Microsoft&amp;rsquo;s AI Curriculum&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/ai-for-beginners" target="_blank" rel="noopener"&gt;AI for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;12 weeks, 24 lessons covering neural networks, deep learning, CV, NLP&lt;/li&gt;
&lt;li&gt;Great for: Complete AI foundation&lt;/li&gt;
&lt;li&gt;Format: Lessons, quizzes, projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/ml-for-beginners" target="_blank" rel="noopener"&gt;Machine Learning for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;12-week, 26-lesson curriculum on classic ML&lt;/li&gt;
&lt;li&gt;Great for: ML fundamentals without deep math&lt;/li&gt;
&lt;li&gt;Format: Project-based exercises&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/generative-ai-for-beginners" target="_blank" rel="noopener"&gt;Generative AI for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;18 lessons on building GenAI applications&lt;/li&gt;
&lt;li&gt;Great for: Practical GenAI development&lt;/li&gt;
&lt;li&gt;Format: Hands-on projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/ai-agents-for-beginners" target="_blank" rel="noopener"&gt;AI Agents for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;11 lessons on agent systems&lt;/li&gt;
&lt;li&gt;Great for: Understanding autonomous agents&lt;/li&gt;
&lt;li&gt;Format: Project-driven learning&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/edgeai-for-beginners" target="_blank" rel="noopener"&gt;EdgeAI for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Optimization, deployment, and real-world Edge AI&lt;/li&gt;
&lt;li&gt;Great for: On-device AI applications&lt;/li&gt;
&lt;li&gt;Format: Practical tutorials&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/microsoft/mcp-for-beginners" target="_blank" rel="noopener"&gt;MCP for Beginners&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model Context Protocol curriculum&lt;/li&gt;
&lt;li&gt;Great for: Building with MCP&lt;/li&gt;
&lt;li&gt;Format: Cross-language examples and labs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="official-platform-courses"&gt;Official Platform Courses&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huggingface/course" target="_blank" rel="noopener"&gt;Hugging Face Learn Center&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Free courses on LLMs, deep RL, CV, audio&lt;/li&gt;
&lt;li&gt;Great for: Hands-on Hugging Face ecosystem&lt;/li&gt;
&lt;li&gt;Format: Interactive notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/openai-cookbook" target="_blank" rel="noopener"&gt;OpenAI Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Runnable examples using OpenAI API&lt;/li&gt;
&lt;li&gt;Great for: OpenAI API best practices&lt;/li&gt;
&lt;li&gt;Format: Code examples and guides&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/pytorch/tutorials" target="_blank" rel="noopener"&gt;PyTorch Tutorials&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Basics to advanced deep learning&lt;/li&gt;
&lt;li&gt;Great for: PyTorch mastery&lt;/li&gt;
&lt;li&gt;Format: Comprehensive tutorials&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-cookbooks--example-collections-5-collections"&gt;🍳 Cookbooks &amp;amp; Example Collections (5 Collections)&lt;/h2&gt;
&lt;p&gt;Practical code examples and recipes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/anthropics/claude-cookbooks" target="_blank" rel="noopener"&gt;Claude Cookbooks&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Notebooks and examples for building with Claude&lt;/li&gt;
&lt;li&gt;Great for: Anthropic Claude integration&lt;/li&gt;
&lt;li&gt;Format: Jupyter notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huggingface/cookbook" target="_blank" rel="noopener"&gt;Hugging Face Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Practical AI cookbook with Jupyter notebooks&lt;/li&gt;
&lt;li&gt;Great for: Open models and tools&lt;/li&gt;
&lt;li&gt;Format: Hands-on examples&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/RationaleInstitute/tinker-cookbook" target="_blank" rel="noopener"&gt;Tinker Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training and fine-tuning examples&lt;/li&gt;
&lt;li&gt;Great for: Fine-tuning workflows&lt;/li&gt;
&lt;li&gt;Format: Platform-specific recipes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/e2b-dev/e2b-cookbook" target="_blank" rel="noopener"&gt;E2B Cookbook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Examples for building LLM apps&lt;/li&gt;
&lt;li&gt;Great for: LLM application development&lt;/li&gt;
&lt;li&gt;Format: Recipes and tutorials&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/jamwithai/arxiv-paper-curator" target="_blank" rel="noopener"&gt;arXiv Paper Curator&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;6-week course on RAG systems&lt;/li&gt;
&lt;li&gt;Great for: Production-ready RAG&lt;/li&gt;
&lt;li&gt;Format: Project-based learning&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-guides--handbooks-5-resources"&gt;📖 Guides &amp;amp; Handbooks (5 Resources)&lt;/h2&gt;
&lt;p&gt;In-depth guides on specific topics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/dair-ai/prompt-engineering-guide" target="_blank" rel="noopener"&gt;Prompt Engineering Guide&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Comprehensive prompt engineering resources&lt;/li&gt;
&lt;li&gt;Great for: Mastering prompt design&lt;/li&gt;
&lt;li&gt;Format: Guides, papers, lectures, notebooks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/huggingface/evaluation-guidebook" target="_blank" rel="noopener"&gt;Evaluation Guidebook&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM evaluation best practices from Hugging Face&lt;/li&gt;
&lt;li&gt;Great for: Assessing LLM performance&lt;/li&gt;
&lt;li&gt;Format: Practical guide&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/davidkimai/context-engineering" target="_blank" rel="noopener"&gt;Context Engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Design and optimize context beyond prompt engineering&lt;/li&gt;
&lt;li&gt;Great for: Advanced context management&lt;/li&gt;
&lt;li&gt;Format: Practical handbook&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/coleam00/context-engineering-intro" target="_blank" rel="noopener"&gt;Context Engineering Intro&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Template and guide for context engineering&lt;/li&gt;
&lt;li&gt;Great for: Providing project context to AI assistants&lt;/li&gt;
&lt;li&gt;Format: Template + guide&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/IIETER/IIETER" target="_blank" rel="noopener"&gt;Vibe-Coding Workflow&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;5-step prompt template for building MVPs with LLMs&lt;/li&gt;
&lt;li&gt;Great for: Rapid prototyping with AI&lt;/li&gt;
&lt;li&gt;Format: Workflow template&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-template--workflow-collections"&gt;🗂️ Template &amp;amp; Workflow Collections&lt;/h2&gt;
&lt;p&gt;Reusable templates and workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/davila7/claude-code-templates" target="_blank" rel="noopener"&gt;Claude Code Templates&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code templates for various programming scenarios&lt;/li&gt;
&lt;li&gt;Great for: Claude AI development&lt;/li&gt;
&lt;li&gt;Format: Template collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/zie619/n8n-workflows" target="_blank" rel="noopener"&gt;n8n Workflows&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2,000+ professionally organized n8n workflows&lt;/li&gt;
&lt;li&gt;Great for: Workflow automation&lt;/li&gt;
&lt;li&gt;Format: Searchable catalog&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/nusquama/n8nworkflows.xyz" target="_blank" rel="noopener"&gt;n8n Workflows Catalog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Community-driven reusable workflow templates&lt;/li&gt;
&lt;li&gt;Great for: Workflow import and versioning&lt;/li&gt;
&lt;li&gt;Format: Template catalog&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-research--evaluation"&gt;📊 Research &amp;amp; Evaluation&lt;/h2&gt;
&lt;p&gt;Academic and evaluation resources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/amberljc/llmsys-paperlist" target="_blank" rel="noopener"&gt;LLMSys PaperList&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Curated list of LLM systems papers&lt;/li&gt;
&lt;li&gt;Great for: Research on training, inference, serving&lt;/li&gt;
&lt;li&gt;Format: Paper collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/cheahjs/free-llm-api-resources" target="_blank" rel="noopener"&gt;Free LLM API Resources&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM providers with free/trial API access&lt;/li&gt;
&lt;li&gt;Great for: Experimentation without cost&lt;/li&gt;
&lt;li&gt;Format: Provider list&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-other-notable-resources"&gt;🎨 Other Notable Resources&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools" target="_blank" rel="noopener"&gt;System Prompts and Models of AI Tools&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Community-curated collection of system prompts and AI tool examples&lt;/li&gt;
&lt;li&gt;Great for: Prompt and agent engineering&lt;/li&gt;
&lt;li&gt;Format: Resource collection&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/epfml/ml_course" target="_blank" rel="noopener"&gt;ML Course CS-433&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;EPFL Machine Learning Course&lt;/li&gt;
&lt;li&gt;Great for: Academic ML foundation&lt;/li&gt;
&lt;li&gt;Format: Lectures, labs, projects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/stas00/ml-engineering" target="_blank" rel="noopener"&gt;Machine Learning Engineering&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ML engineering open-book: compute, storage, networking&lt;/li&gt;
&lt;li&gt;Great for: Production ML systems&lt;/li&gt;
&lt;li&gt;Format: Comprehensive guide&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/neural-maze/realtime-phone-agents-course" target="_blank" rel="noopener"&gt;Realtime Phone Agents Course&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build low-latency voice agents&lt;/li&gt;
&lt;li&gt;Great for: Voice AI applications&lt;/li&gt;
&lt;li&gt;Format: Hands-on course&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/johnma2006/m3-workshop" target="_blank" rel="noopener"&gt;LLMs from Scratch&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Build a working LLM from first principles&lt;/li&gt;
&lt;li&gt;Great for: Understanding LLM internals&lt;/li&gt;
&lt;li&gt;Format: Repository + book materials&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="-how-to-use-this-collection"&gt;💡 How to Use This Collection&lt;/h2&gt;
&lt;h3 id="for-complete-beginners"&gt;For Complete Beginners&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Start with&lt;/strong&gt;: Microsoft&amp;rsquo;s AI for Beginners&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Practice with&lt;/strong&gt;: PyTorch Tutorials&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explore&lt;/strong&gt;: Awesome AI Apps for inspiration&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="for-developers"&gt;For Developers&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Build skills&lt;/strong&gt;: OpenAI Cookbook + Claude Cookbooks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Find tools&lt;/strong&gt;: Awesome Generative AI + Awesome LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learn workflows&lt;/strong&gt;: n8n Workflows Catalog&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="for-researchers"&gt;For Researchers&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Stay updated&lt;/strong&gt;: Awesome Generative AI + LLMSys PaperList&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deep dive&lt;/strong&gt;: Awesome LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Implement&lt;/strong&gt;: Hugging Face Cookbook&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="for-product-builders"&gt;For Product Builders&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Find examples&lt;/strong&gt;: Awesome AI Apps&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learn workflows&lt;/strong&gt;: n8n Workflows Catalog&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Study patterns&lt;/strong&gt;: Awesome LLM Apps&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2 id="-what-was-not-removed"&gt;🔄 What Was NOT Removed&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Agent frameworks and production tools remain in the AI Resources list&lt;/strong&gt;, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AutoGen&lt;/strong&gt; - Microsoft&amp;rsquo;s multi-agent framework&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CrewAI&lt;/strong&gt; - High-performance multi-agent orchestration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; - Stateful multi-agent applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flowise&lt;/strong&gt; - Visual agent platform&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Langflow&lt;/strong&gt; - Visual workflow builder&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;And 80+ more agent frameworks&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are &lt;strong&gt;functional tools&lt;/strong&gt; you can use to build applications, not educational collections. They belong in the AI Resources list.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="-summary"&gt;📝 Summary&lt;/h2&gt;
&lt;p&gt;I removed &lt;strong&gt;44 collection-type projects&lt;/strong&gt; from the AI Resources list to keep it focused on production tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;14 Awesome Lists&lt;/strong&gt; - Discover new tools and stay updated&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;9 Courses &amp;amp; Tutorials&lt;/strong&gt; - Structured learning paths&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 Cookbooks&lt;/strong&gt; - Practical code examples&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;5 Guides &amp;amp; Handbooks&lt;/strong&gt; - In-depth resources&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;4 Template Collections&lt;/strong&gt; - Reusable workflows&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;7 Other Resources&lt;/strong&gt; - Research and evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These resources remain &lt;strong&gt;incredibly valuable&lt;/strong&gt; for learning and discovery. They just serve a different purpose than the production-focused tools in my AI Resources list.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Next Steps&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Bookmark this post for future reference&lt;/li&gt;
&lt;li&gt;Explore the &lt;a href="https://jimmysong.io/ai/"&gt;AI Resources list&lt;/a&gt; for production tools (agent frameworks, databases, etc.)&lt;/li&gt;
&lt;li&gt;Check out my blog for more AI engineering insights&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;: This collection was compiled during my AI Resources cleanup initiative. Special thanks to all the maintainers of these awesome lists, courses, and collections for their invaluable contributions to the AI community.&lt;/p&gt;</content:encoded></item><item><title>Standing on Giants' Shoulders: The Traditional Infrastructure Powering Modern AI</title><link>https://jimmysong.io/blog/giants-beneath-ai-feet/</link><pubDate>Sun, 08 Feb 2026 08:00:00 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/giants-beneath-ai-feet/</guid><description>Before ChatGPT and TensorFlow, there was Hadoop, Kafka, and Kubernetes. This post honors the traditional open source infrastructure that became the foundation of today&amp;#39;s AI revolution.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;If I have seen further, it is by standing on the shoulders of giants.&amp;rdquo; — Isaac Newton&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/giants-beneath-ai-feet/banner.webp" data-img="https://assets.jimmysong.io/images/blog/giants-beneath-ai-feet/banner.webp" alt="Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI" data-caption="Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Standing on Giants’ Shoulders: The Traditional Infrastructure Powering Modern AI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the excitement surrounding LLMs, vector databases, and AI agents, it&amp;rsquo;s easy to forget that modern AI didn&amp;rsquo;t emerge from a vacuum. Today&amp;rsquo;s AI revolution stands upon decades of infrastructure work—distributed systems, data pipelines, search engines, and orchestration platforms that were built long before &amp;ldquo;AI Native&amp;rdquo; became a buzzword.&lt;/p&gt;
&lt;p&gt;This post is a tribute to those traditional open source projects that became the invisible foundation of AI infrastructure. They&amp;rsquo;re not &amp;ldquo;AI projects&amp;rdquo; per se, but without them, the AI revolution as we know it wouldn&amp;rsquo;t exist.&lt;/p&gt;
&lt;h2 id="the-evolution-from-big-data-to-ai"&gt;The Evolution: From Big Data to AI&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Core Technologies&lt;/th&gt;
&lt;th&gt;AI Connection&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2000s&lt;/td&gt;
&lt;td&gt;Web Search &amp;amp; Indexing&lt;/td&gt;
&lt;td&gt;Lucene, Elasticsearch&lt;/td&gt;
&lt;td&gt;Semantic search foundations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Big Data &amp;amp; Distributed Computing&lt;/td&gt;
&lt;td&gt;Hadoop, Spark, Kafka&lt;/td&gt;
&lt;td&gt;Data processing at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Cloud Native&lt;/td&gt;
&lt;td&gt;Docker, Kubernetes&lt;/td&gt;
&lt;td&gt;Model deployment platforms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2010s&lt;/td&gt;
&lt;td&gt;Stream Processing&lt;/td&gt;
&lt;td&gt;Flink, Storm, Pulsar&lt;/td&gt;
&lt;td&gt;Real-time ML inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2020s&lt;/td&gt;
&lt;td&gt;AI Native&lt;/td&gt;
&lt;td&gt;Transformers, Vector DBs&lt;/td&gt;
&lt;td&gt;Built on everything above&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Evolution of Data Infrastructure
&lt;/figcaption&gt;
&lt;h2 id="big-data-frameworks-the-data-engines"&gt;Big Data Frameworks: The Data Engines&lt;/h2&gt;
&lt;p&gt;Before we could train models on petabytes of data, we needed ways to store, process, and move that data.&lt;/p&gt;
&lt;h3 id="apache-hadoop-2006"&gt;Apache Hadoop (2006)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/hadoop" target="_blank" rel="noopener"&gt;https://github.com/apache/hadoop&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Hadoop democratized big data by making distributed computing accessible. HDFS, its distributed file system, and the MapReduce programming model proved that commodity hardware could process web-scale datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many ML training datasets still live in HDFS-compatible storage&lt;/li&gt;
&lt;li&gt;Data lakes built on Hadoop became training data reservoirs&lt;/li&gt;
&lt;li&gt;Proved that distributed computing could scale horizontally&lt;/li&gt;
&lt;/ul&gt;
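&lt;p&gt;To make the MapReduce paradigm concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop&amp;rsquo;s APIs; the function names (&lt;code&gt;map_phase&lt;/code&gt;, &lt;code&gt;shuffle&lt;/code&gt;, &lt;code&gt;reduce_phase&lt;/code&gt;, &lt;code&gt;word_count&lt;/code&gt;) are hypothetical and chosen only for illustration. In a real cluster, each phase would run in parallel across many machines.&lt;/p&gt;

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Collapse all counts for one word into a single total.
    return key, sum(values)


def word_count(documents):
    pairs = []
    for doc in documents:  # each document stands in for a split handled by one mapper
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big models", "data pipelines feed models"])
print(counts)
```

&lt;p&gt;Because the map and reduce functions depend only on their inputs, a framework is free to spread them over many nodes and retry them on failure, which is roughly what made commodity hardware viable.&lt;/p&gt;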
&lt;h3 id="apache-kafka-2011"&gt;Apache Kafka (2011)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/kafka" target="_blank" rel="noopener"&gt;https://github.com/apache/kafka&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kafka redefined data streaming with its log-based architecture. It became the nervous system for real-time data flows in enterprises worldwide.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Real-time feature pipelines for ML models&lt;/li&gt;
&lt;li&gt;Event-driven architectures for AI agent systems&lt;/li&gt;
&lt;li&gt;Streaming inference pipelines&lt;/li&gt;
&lt;li&gt;Model telemetry and monitoring backbones&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="apache-spark-2014"&gt;Apache Spark (2014)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/spark" target="_blank" rel="noopener"&gt;https://github.com/apache/spark&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Spark brought in-memory computing to big data, making iterative algorithms (like ML training) practical at scale.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MLlib made ML accessible to data engineers&lt;/li&gt;
&lt;li&gt;Distributed data processing for model training&lt;/li&gt;
&lt;li&gt;Spark ML became the de facto standard for big data ML&lt;/li&gt;
&lt;li&gt;Proved that in-memory computing could accelerate ML workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="search-engines-the-retrieval-foundation"&gt;Search Engines: The Retrieval Foundation&lt;/h2&gt;
&lt;p&gt;Before RAG (Retrieval-Augmented Generation) became a buzzword, search engines were solving retrieval at scale.&lt;/p&gt;
&lt;h3 id="elasticsearch-2010"&gt;Elasticsearch (2010)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/elastic/elasticsearch" target="_blank" rel="noopener"&gt;https://github.com/elastic/elasticsearch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Elasticsearch made full-text search accessible and scalable. Its distributed architecture and RESTful API became the standard for search.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pioneered distributed inverted index structures&lt;/li&gt;
&lt;li&gt;Proved that horizontal scaling was possible for search workloads&lt;/li&gt;
&lt;li&gt;Many &amp;ldquo;AI search&amp;rdquo; systems actually use Elasticsearch under the hood&lt;/li&gt;
&lt;li&gt;Query DSL influenced modern vector database query languages&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="opensearch-2021"&gt;OpenSearch (2021)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/opensearch-project/opensearch" target="_blank" rel="noopener"&gt;https://github.com/opensearch-project/opensearch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;When AWS forked Elasticsearch, it ensured search infrastructure remained truly open. OpenSearch continues the mission of accessible, scalable search.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintains open source innovation in search&lt;/li&gt;
&lt;li&gt;Vector search through its k-NN plugin, with capabilities that expanded notably around 2023&lt;/li&gt;
&lt;li&gt;Demonstrates community fork resilience&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="databases-from-sql-to-vectors"&gt;Databases: From SQL to Vectors&lt;/h2&gt;
&lt;p&gt;The evolution from relational databases to vector databases represents a paradigm shift—but both have AI relevance.&lt;/p&gt;
&lt;h3 id="traditional-databases-that-paved-the-way"&gt;Traditional Databases That Paved the Way&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dgraph&lt;/strong&gt; (2015) - Graph database proving that specialized data structures enable new use cases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TDengine&lt;/strong&gt; (2019) - Time-series database for IoT ML workloads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OceanBase&lt;/strong&gt; (2021) - Distributed database showing ACID transactions could scale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proved that specialized database engines could outperform general-purpose ones&lt;/li&gt;
&lt;li&gt;Database internals (indexing, sharding, replication) are now applied to vector databases&lt;/li&gt;
&lt;li&gt;Multi-model databases (graph + vector + relational) are becoming the norm for AI apps&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="cloud-native-the-runtime-foundation"&gt;Cloud Native: The Runtime Foundation&lt;/h2&gt;
&lt;p&gt;When Docker and Kubernetes emerged, they weren&amp;rsquo;t built for AI—but AI couldn&amp;rsquo;t scale without them.&lt;/p&gt;
&lt;h3 id="docker-2013--kubernetes-2014"&gt;Docker (2013) &amp;amp; Kubernetes (2014)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/kubernetes/kubernetes" target="_blank" rel="noopener"&gt;https://github.com/kubernetes/kubernetes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Kubernetes became the operating system for cloud-native applications. Its declarative API and controller pattern make it a natural fit for AI workloads.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model deployment platforms (KServe, Seldon Core) run on K8s&lt;/li&gt;
&lt;li&gt;GPU orchestration (NVIDIA GPU Operator, Volcano, HAMi) extends K8s&lt;/li&gt;
&lt;li&gt;Kubeflow made K8s the standard for ML pipelines&lt;/li&gt;
&lt;li&gt;Microservice patterns enable modular AI agent architectures&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="service-mesh--serverless"&gt;Service Mesh &amp;amp; Serverless&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Istio&lt;/strong&gt; (2016), &lt;strong&gt;Knative&lt;/strong&gt; (2018) - Service mesh and serverless platforms that proved:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Network-level observability applies to AI model calls&lt;/li&gt;
&lt;li&gt;Scale-to-zero is essential for cost-effective inference&lt;/li&gt;
&lt;li&gt;Traffic splitting enables A/B testing of ML models&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI Gateway patterns evolved from API gateways + service mesh&lt;/li&gt;
&lt;li&gt;Serverless inference platforms use Knative-style autoscaling&lt;/li&gt;
&lt;li&gt;Observability patterns (tracing, metrics) are now standard for ML systems&lt;/li&gt;
&lt;/ul&gt;
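&lt;p&gt;The traffic-splitting idea above can be sketched in a few lines. A service mesh does this at the proxy layer; the version below is only an application-level illustration, and the function name &lt;code&gt;pick_variant&lt;/code&gt;, the model names, and the 90/10 weights are all assumptions made for the example.&lt;/p&gt;

```python
import hashlib


def pick_variant(user_id, variants):
    """Map a user to a model variant, sticky per user, proportional to integer weights."""
    total = sum(weight for _, weight in variants)
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    for name, weight in variants:
        if weight > bucket:
            return name
        bucket -= weight
    raise ValueError("weights must be positive integers")


# Hypothetical split: 90% of users stay on the current model, 10% see the candidate.
variants = [("model-v1", 90), ("model-v2", 10)]
assignment = pick_variant("user-42", variants)
```

&lt;p&gt;Hashing the user ID instead of drawing a random number keeps each user on one variant for the whole experiment, which makes A/B results easier to compare.&lt;/p&gt;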
&lt;h2 id="api-gateways-from-rest-to-llm"&gt;API Gateways: From REST to LLM&lt;/h2&gt;
&lt;p&gt;API gateways weren&amp;rsquo;t designed for AI, but they became the foundation of AI Gateway patterns.&lt;/p&gt;
&lt;h3 id="kong-apisix-kgateway"&gt;Kong, APISIX, KGateway&lt;/h3&gt;
&lt;p&gt;These API gateways solved rate limiting, auth, and routing at scale. When LLMs emerged, the same patterns applied:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI Gateway Evolution&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Traditional API Gateway (2010s)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Rate Limiting → Token Bucket Rate Limiting
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Auth → API Key + Organization Management
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Routing → Model Routing (GPT-4 → Claude → Local Models)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Observability → LLM-specific Telemetry (token usage, cost)
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; ↓
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;AI Gateway (2024)
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proved that centralized API management scales&lt;/li&gt;
&lt;li&gt;Plugin architectures enable LLM-specific features&lt;/li&gt;
&lt;li&gt;Traffic management patterns apply to prompt routing&lt;/li&gt;
&lt;li&gt;Security patterns (mTLS, JWT) now protect AI endpoints&lt;/li&gt;
&lt;/ul&gt;
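&lt;p&gt;As a rough illustration of how classic rate limiting carries over to LLM traffic, here is a minimal token bucket whose unit is LLM tokens rather than requests. It is a sketch under stated assumptions, not how Kong, APISIX, or KGateway implement it; the class name &lt;code&gt;TokenBudgetBucket&lt;/code&gt; and its parameters are hypothetical.&lt;/p&gt;

```python
import time


class TokenBudgetBucket:
    """Token bucket whose unit is LLM tokens rather than requests (hypothetical sketch)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity                    # maximum burst, in LLM tokens
        self.refill_per_second = refill_per_second  # sustained budget, in LLM tokens per second
        self.clock = clock                          # injectable so tests can control time
        self.available = float(capacity)
        self.last_refill = clock()

    def _refill(self):
        # Add the budget earned since the last call, capped at the bucket capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.available = min(self.capacity, self.available + elapsed * self.refill_per_second)
        self.last_refill = now

    def try_consume(self, llm_tokens):
        """Return True and deduct the budget if the request fits, else False."""
        self._refill()
        if self.available >= llm_tokens:
            self.available -= llm_tokens
            return True
        return False
```

&lt;p&gt;A gateway could keep one such bucket per API key or organization and call &lt;code&gt;try_consume&lt;/code&gt; with the estimated prompt plus completion tokens before forwarding a request.&lt;/p&gt;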
&lt;h2 id="workflow-orchestration-the-pipeline-backbone"&gt;Workflow Orchestration: The Pipeline Backbone&lt;/h2&gt;
&lt;p&gt;Data engineering needs pipelines. ML engineering needs pipelines. AI agents need workflows.&lt;/p&gt;
&lt;h3 id="apache-airflow-2015"&gt;Apache Airflow (2015)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/apache/airflow" target="_blank" rel="noopener"&gt;https://github.com/apache/airflow&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Airflow made pipeline orchestration accessible with its DAG-based approach. It became the standard for ETL and data engineering.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ML pipeline orchestration (feature engineering, training, evaluation)&lt;/li&gt;
&lt;li&gt;Proved that DAG-based workflow definition works at scale&lt;/li&gt;
&lt;li&gt;Prompt engineering pipelines use Airflow-style orchestration&lt;/li&gt;
&lt;li&gt;Scheduler patterns are now applied to AI agent workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="n8n-prefect-flyte"&gt;n8n, Prefect, Flyte&lt;/h3&gt;
&lt;p&gt;Modern workflow platforms that build on ideas Airflow popularized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;n8n&lt;/strong&gt; (2019) - Visual workflow automation with AI capabilities&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefect&lt;/strong&gt; (2018) - Python-native workflow orchestration for ML&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flyte&lt;/strong&gt; (2019) - Kubernetes-native workflow orchestration for ML/data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-modal agents need workflow orchestration&lt;/li&gt;
&lt;li&gt;RAG pipelines are essentially ETL pipelines for embeddings&lt;/li&gt;
&lt;li&gt;Prompt chaining is DAG-based orchestration&lt;/li&gt;
&lt;/ul&gt;
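&lt;p&gt;To make the &amp;ldquo;prompt chaining is DAG-based orchestration&amp;rdquo; point concrete, here is a minimal sketch using the standard library &lt;code&gt;graphlib&lt;/code&gt; module (Python 3.9+). The step names and the &lt;code&gt;fake_llm_step&lt;/code&gt; stand-in are assumptions for illustration; no real model is called.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical prompt chain: each step maps to the set of steps it depends on.
chain = {
    "retrieve": set(),
    "summarize": {"retrieve"},
    "extract_entities": {"retrieve"},
    "answer": {"summarize", "extract_entities"},
}


def run_chain(dag, run_step):
    """Run every step after its dependencies, passing upstream outputs along."""
    outputs = {}
    for step in TopologicalSorter(dag).static_order():
        upstream = {dep: outputs[dep] for dep in dag[step]}
        outputs[step] = run_step(step, upstream)
    return outputs


def fake_llm_step(step, upstream):
    # Stand-in for a real model call; it only records which inputs it saw.
    return f"{step}({', '.join(sorted(upstream))})"


results = run_chain(chain, fake_llm_step)
```

&lt;p&gt;This is the same dependency-ordering idea a workflow engine applies to tasks, only with prompts as the nodes.&lt;/p&gt;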
&lt;h2 id="data-formats-the-lakehouse-foundation"&gt;Data Formats: The Lakehouse Foundation&lt;/h2&gt;
&lt;p&gt;Before we could train on massive datasets, we needed formats that supported ACID transactions and schema evolution.&lt;/p&gt;
&lt;h3 id="delta-lake-apache-iceberg-apache-hudi"&gt;Delta Lake, Apache Iceberg, Apache Hudi&lt;/h3&gt;
&lt;p&gt;These table formats brought reliability to data lakes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why they matter for AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training datasets need versioning and reproducibility&lt;/li&gt;
&lt;li&gt;Feature stores use Delta/Iceberg as storage formats&lt;/li&gt;
&lt;li&gt;Proved that &amp;ldquo;big data&amp;rdquo; could have transactional semantics&lt;/li&gt;
&lt;li&gt;Schema evolution accommodates features that are added or changed over time&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-invisible-thread-why-these-projects-matter"&gt;The Invisible Thread: Why These Projects Matter&lt;/h2&gt;
&lt;p&gt;What do all these projects have in common?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;They solved scaling first&lt;/strong&gt; - AI training/inference needs horizontal scaling&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They proved distributed systems work&lt;/strong&gt; - Modern AI is fundamentally distributed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They created ecosystem patterns&lt;/strong&gt; - Plugin systems, extension points, APIs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They established best practices&lt;/strong&gt; - Observability, security, CI/CD&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;They built developer habits&lt;/strong&gt; - YAML configs, declarative APIs, CLI tools&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="the-ai-native-continuum"&gt;The AI Native Continuum&lt;/h2&gt;
&lt;p&gt;Modern &amp;ldquo;AI Native&amp;rdquo; infrastructure didn&amp;rsquo;t replace these projects—it builds on them:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional Project&lt;/th&gt;
&lt;th&gt;AI Native Evolution&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hadoop HDFS&lt;/td&gt;
&lt;td&gt;Distributed model storage&lt;/td&gt;
&lt;td&gt;HDFS for datasets, S3 for checkpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kafka&lt;/td&gt;
&lt;td&gt;Real-time feature pipelines&lt;/td&gt;
&lt;td&gt;Kafka → Feature Store → Model Serving&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spark ML&lt;/td&gt;
&lt;td&gt;Distributed ML training&lt;/td&gt;
&lt;td&gt;MLlib → PyTorch Distributed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticsearch&lt;/td&gt;
&lt;td&gt;Vector search&lt;/td&gt;
&lt;td&gt;ES → Weaviate/Qdrant/Milvus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes&lt;/td&gt;
&lt;td&gt;ML orchestration&lt;/td&gt;
&lt;td&gt;K8s → Kubeflow/KServe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Istio&lt;/td&gt;
&lt;td&gt;AI Gateway service mesh&lt;/td&gt;
&lt;td&gt;Istio → LLM Gateway with mTLS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Airflow&lt;/td&gt;
&lt;td&gt;ML pipeline orchestration&lt;/td&gt;
&lt;td&gt;Airflow → Prefect/Flyte for ML&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: From Traditional to AI Native
&lt;/figcaption&gt;
&lt;h2 id="why-were-removing-them-from-ai-resources-list"&gt;Why We&amp;rsquo;re Removing Them from the AI Resources List&lt;/h2&gt;
&lt;p&gt;This post honors these projects, but we&amp;rsquo;re also removing them from our AI Resources list. Here&amp;rsquo;s why:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;They&amp;rsquo;re not &amp;ldquo;AI Projects&amp;rdquo;—they&amp;rsquo;re foundational infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Hadoop, Kafka, Spark&lt;/strong&gt; are data engineering tools, not ML frameworks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elasticsearch&lt;/strong&gt; is search, not semantic search&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt; is general-purpose orchestration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API gateways&lt;/strong&gt; serve REST/GraphQL, not just LLMs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;But their absence doesn&amp;rsquo;t diminish their importance.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;By removing them, we acknowledge that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;AI has its own ecosystem&lt;/strong&gt; - Transformers, vector DBs, LLM ops&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Traditional infra has its own domain&lt;/strong&gt; - Data engineering, cloud native&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The intersection is where innovation happens&lt;/strong&gt; - AI-native data platforms, LLM ops on K8s&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="the-giants-we-stand-on"&gt;The Giants We Stand On&lt;/h2&gt;
&lt;p&gt;The next time you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deploy a model on Kubernetes&lt;/li&gt;
&lt;li&gt;Stream features through Kafka&lt;/li&gt;
&lt;li&gt;Search embeddings with a vector database&lt;/li&gt;
&lt;li&gt;Orchestrate a RAG pipeline with Prefect&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Remember: You&amp;rsquo;re standing on the shoulders of Hadoop, Kafka, Elasticsearch, Kubernetes, and countless others. They built the roads we now drive on.&lt;/p&gt;
&lt;h2 id="the-future-building-new-giants"&gt;The Future: Building New Giants&lt;/h2&gt;
&lt;p&gt;Just as Hadoop and Kafka enabled modern AI, today&amp;rsquo;s AI infrastructure will become tomorrow&amp;rsquo;s foundation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vector databases&lt;/strong&gt; may become the new standard for all search&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM observability&lt;/strong&gt; may evolve into general distributed tracing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI agent orchestration&lt;/strong&gt; may reinvent workflow automation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU scheduling&lt;/strong&gt; may influence general-purpose resource management&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The cycle continues. The giants of today will be the foundations of tomorrow.&lt;/p&gt;
&lt;h2 id="conclusion-gratitude-and-continuity"&gt;Conclusion: Gratitude and Continuity&lt;/h2&gt;
&lt;p&gt;As we clean up our AI Resources list to focus on AI-native projects, we don&amp;rsquo;t forget where we came from. Traditional big data and cloud native infrastructure made the AI revolution possible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To the Hadoop committers, Kafka maintainers, Kubernetes contributors, and all who built the foundation: Thank you.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Your work enabled ChatGPT, enabled Transformers, enabled everything we now call &amp;ldquo;AI.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Standing on your shoulders, we see further.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Acknowledgments&lt;/strong&gt;: This post was inspired by the need to refactor our AI Resources list. The 27 projects mentioned here are being removed—not because they&amp;rsquo;re unimportant, but because they deserve their own category: &lt;strong&gt;The Foundation&lt;/strong&gt;.&lt;/p&gt;</content:encoded></item><item><title>My First Month at Dynamia: Why AI Native Infra Is Worth the Investment</title><link>https://jimmysong.io/blog/why-i-join-dynamia-ai-native-infra/</link><pubDate>Fri, 06 Feb 2026 12:56:35 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/why-i-join-dynamia-ai-native-infra/</guid><description>Observations from my first month at Dynamia: From cloud native to AI Native Infra, why this direction is worth investing in, and the key issues and opportunities in compute governance.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Time flies—it&amp;rsquo;s already been a month since I joined Dynamia. In this article, I want to share my observations from this past month: why AI Native Infra is a direction worth investing in, and some considerations for those thinking about their own career or technical direction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;After nearly five years of remote work, I officially joined &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt; last month as VP of Open Source Ecosystem. This decision was not sudden, but a natural extension of my journey from cloud native to AI Native Infra.&lt;/p&gt;
&lt;p&gt;But this article is not just about my personal choice. I want to answer a more universal question: &lt;strong&gt;In the wave of AI infrastructure startups, why is compute governance a direction worth investing in?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For the past decade, I have worked continuously in the infrastructure space: from Kubernetes to Service Mesh, and now to AI Infra. I am increasingly convinced that the core challenge in the AI era is not &amp;ldquo;can the model run,&amp;rdquo; but &amp;ldquo;can compute resources be run efficiently, reliably, and in a controlled manner.&amp;rdquo; This conviction has only grown stronger through my observations and reflections during this first month at Dynamia.&lt;/p&gt;
&lt;p&gt;This article answers three questions: What is AI Native Infra? Why is GPU virtualization a necessity? Why did I choose Dynamia and HAMi?&lt;/p&gt;
&lt;h2 id="what-is-ai-native-infra"&gt;What Is AI Native Infra&lt;/h2&gt;
&lt;p&gt;The core of &lt;a href="https://jimmysong.io/book/ai-native-infra/"&gt;AI Native Infrastructure&lt;/a&gt; is not about adding another platform layer, but about redefining the governance target: expanding from &amp;ldquo;services and containers&amp;rdquo; to &amp;ldquo;model behaviors and compute assets.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I summarize it as three key shifts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Models as execution entities&lt;/strong&gt;: Governance now includes not just processes, but also model behaviors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute as a scarce asset&lt;/strong&gt;: GPU, memory, and bandwidth must be scheduled and metered precisely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Uncertainty as the default&lt;/strong&gt;: Systems must remain observable and recoverable amid fluctuations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In essence, AI Native Infra is about upgrading compute governance from &amp;ldquo;resource allocation&amp;rdquo; to &amp;ldquo;sustainable business capability.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="why-gpu-virtualization-is-essential"&gt;Why GPU Virtualization Is Essential&lt;/h2&gt;
&lt;p&gt;Many teams focus on model inference optimization, but in production, enterprises first encounter the problem of &amp;ldquo;underutilized GPUs.&amp;rdquo; This is where GPU virtualization delivers value.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Structural idleness&lt;/strong&gt;: Small tasks monopolize large GPUs, leaving them idle for long periods.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pseudo-isolation risks&lt;/strong&gt;: Native sharing lacks hard boundaries, so one task&amp;rsquo;s out-of-memory (OOM) failure can cascade to its neighbors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling failures&lt;/strong&gt;: Some users queue for GPUs while others occupy but do not use them, leading to both shortages and idleness.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fragmentation waste&lt;/strong&gt;: Total free GPU capacity may be sufficient, yet no whole card is free, so workloads cannot be packed efficiently.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vendor lock-in anxiety&lt;/strong&gt;: Proprietary, tightly coupled solutions make migration costs uncontrollable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short: GPUs must not only be allocatable, but also splittable, isolatable, schedulable, and governable.&lt;/p&gt;
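&lt;p&gt;The fragmentation problem is easy to see with a small worked example. The numbers below describe a hypothetical cluster, and the first-fit helper is a deliberately naive stand-in rather than a description of how HAMi or any real scheduler places workloads.&lt;/p&gt;

```python
def whole_card_fits(free_gib_per_card, card_size_gib):
    """Count cards that are completely free, the only ones whole-card allocation can use."""
    return sum(1 for free in free_gib_per_card if free == card_size_gib)


def first_fit_shared(free_gib_per_card, request_gib):
    """Return the index of the first card with enough free memory, or None."""
    for index, free in enumerate(free_gib_per_card):
        if free >= request_gib:
            return index
    return None


# Hypothetical cluster: four 80 GiB cards, each partly occupied by a small job.
card_size_gib = 80
free = [60, 50, 70, 40]

total_free = sum(free)                                 # aggregate free memory across cards
full_cards = whole_card_fits(free, card_size_gib)      # cards a whole-card request could take
shared_slot = first_fit_shared(free, request_gib=48)   # card that could host a 48 GiB slice
```

&lt;p&gt;Here 220 GiB is free in aggregate, yet a whole-card request cannot be scheduled because no card is empty. Once cards can be split, the same 48 GiB request fits on the first card.&lt;/p&gt;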
&lt;h2 id="the-relationship-between-hami-and-dynamia"&gt;The Relationship Between HAMi and Dynamia&lt;/h2&gt;
&lt;p&gt;This is the most frequently asked question. Here is the shortest answer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HAMi&lt;/strong&gt;: A CNCF-hosted open source project and community focused on GPU virtualization and heterogeneous compute scheduling.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamia&lt;/strong&gt;: The founding and leading company behind HAMi, providing enterprise-grade products and services based on HAMi.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Open source projects are not the same as company products, but the two evolve together. HAMi drives industry adoption and technical trust, while Dynamia brings these capabilities into enterprise production environments at scale. This &amp;ldquo;dual engine&amp;rdquo; approach is what makes Dynamia unique.&lt;/p&gt;
&lt;h2 id="what-hami-provides"&gt;What HAMi Provides&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/project-hami/hami" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; (&lt;em&gt;Heterogeneous AI Computing Virtualization Middleware&lt;/em&gt;) delivers three key capabilities on Kubernetes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Virtualization and partitioning&lt;/strong&gt;: Split physical GPUs into logical resources on demand to improve utilization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scheduling and topology awareness&lt;/strong&gt;: Place workloads optimally based on topology to reduce communication bottlenecks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Isolation and observability&lt;/strong&gt;: Support quotas, policies, and monitoring to reduce production risks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently, HAMi has attracted over 360 contributors from 16 countries, with more than 200 enterprise end users, and its international influence continues to grow.&lt;/p&gt;
&lt;h2 id="market-trends-the-ai-infrastructure-startup-wave"&gt;Market Trends: The AI Infrastructure Startup Wave&lt;/h2&gt;
&lt;p&gt;AI infrastructure is experiencing a new wave of startups. The vLLM team&amp;rsquo;s company raised $150 million, SGLang&amp;rsquo;s commercial spin-off RadixArk is valued at $4 billion, and Databricks acquired MosaicML for $1.3 billion—all pointing to a consensus: &lt;strong&gt;Whoever helps enterprises run large models more efficiently and cost-effectively will hold the keys to next-generation AI infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Against this backdrop, &lt;strong&gt;the positioning of Dynamia and HAMi&lt;/strong&gt; is even clearer. Many teams focus on &amp;ldquo;model performance acceleration&amp;rdquo; and &amp;ldquo;inference optimization&amp;rdquo; (like vLLM, SGLang), while we focus on &lt;strong&gt;&amp;ldquo;resource scheduling and virtualization&amp;rdquo;&lt;/strong&gt;—enabling better orchestration of existing accelerated hardware resources.&lt;/p&gt;
&lt;p&gt;The two are complementary: the former makes individual models run faster and cheaper, while the latter ensures that compute allocation at the cluster level is efficient, fair, and controllable. This is similar to extending Kubernetes&amp;rsquo; CPU/memory scheduling philosophy to GPU and heterogeneous compute management in the AI era.&lt;/p&gt;
&lt;h2 id="why-ai-native-infra-is-worth-the-investment"&gt;Why AI Native Infra Is Worth the Investment&lt;/h2&gt;
&lt;p&gt;My observations this month have convinced me that &lt;strong&gt;compute governance is the most undervalued yet most promising area in AI infrastructure&lt;/strong&gt;. If you are considering a career or technical investment, here is my assessment:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First, this is a real and urgent pain point&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Model training and inference optimization attract a lot of attention, but in production, enterprises first encounter the problem of &amp;ldquo;underutilized GPUs&amp;rdquo;—structural idleness, scheduling failures, fragmentation waste, and vendor lock-in anxiety. Without solving these problems, even the fastest models cannot scale in production. GPU virtualization and heterogeneous compute scheduling are the &amp;ldquo;infrastructure below infrastructure&amp;rdquo; for enterprise AI transformation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Second, this is a clear long-term track&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Frameworks like vLLM and SGLang emerge constantly, making individual models run faster. But who ensures that compute allocation at the cluster level is efficient, fair, and controllable? This is similar to extending Kubernetes&amp;rsquo; success in CPU/memory scheduling to GPU and heterogeneous compute management in the AI era. This is not something that can be finished in a year or two, but a direction for continuous construction over the next five to ten years.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Third, this is an open and verifiable path&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Dynamia chose to build on HAMi as an open source foundation, first solving general capabilities, then supporting enterprise adoption. This means the technical direction is transparent and verifiable in the community. You can form your own judgment by participating in open source, observing adoption, and evaluating the ecosystem—rather than relying on the black-box promises of proprietary solutions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fourth, this is a window of opportunity that is opening now&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;AI infrastructure is being redefined. Investing in its construction today will continue to yield value in the coming years. The vLLM team&amp;rsquo;s company raised $150 million, SGLang&amp;rsquo;s commercial spin-off RadixArk is valued at $4 billion, Databricks acquired MosaicML for $1.3 billion—all validating the same trend: &lt;strong&gt;Whoever helps enterprises run large models more efficiently will hold the keys to next-generation AI infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I hope to bring my experience in cloud native and open source communities to the next stage of HAMi and Dynamia: turning GPU resources from a &amp;ldquo;cost center&amp;rdquo; into an &amp;ldquo;operational asset.&amp;rdquo; This is not just my career choice, but my judgment and investment in the direction of next-generation infrastructure.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Join the HAMi Community
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
Add me on WeChat (&lt;code&gt;jimmysong&lt;/code&gt;) to join the &lt;a href="https://github.com/project-hami/hami" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; community focused on GPU virtualization and heterogeneous compute scheduling.
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;If you are also interested in HAMi, GPU virtualization, AI Native Infra, or Dynamia, feel free to reach out.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;From cloud native to AI Native Infra, my observations this month have only strengthened my conviction: &lt;strong&gt;The true upper limit of AI applications is determined by the infrastructure&amp;rsquo;s ability to govern compute resources.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;HAMi addresses the fundamental issues of GPU virtualization and heterogeneous compute scheduling, while Dynamia is driving these capabilities into large-scale production. If you are also looking for a technical direction worth long-term investment, AI Native Infra—especially compute governance and scheduling—is a track with real pain points, a clear path, an open ecosystem, and an opening window of opportunity.&lt;/p&gt;
&lt;p&gt;Joining Dynamia is not just a career choice, but a commitment to building the next generation of infrastructure. I hope the observations and reflections in this article can provide some reference for you as you evaluate technical directions and career opportunities.&lt;/p&gt;
</content:encoded></item><item><title>The True Inflection Point of ADD: When Spec Becomes the Core Asset of AI-Era Software</title><link>https://jimmysong.io/blog/add-inflection-point-spec-as-core-asset/</link><pubDate>Tue, 20 Jan 2026 07:51:36 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/add-inflection-point-spec-as-core-asset/</guid><description>Exploring how Spec becomes the governable core asset in Agent-Driven Development (ADD) and the trend toward control-plane engineering systems.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The role of Spec is undergoing a fundamental transformation, becoming the governance anchor of engineering systems in the AI era.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-essence-of-software-engineering-and-the-cost-structure-shift-brought-by-ai"&gt;The Essence of Software Engineering and the Cost Structure Shift Brought by AI&lt;/h2&gt;
&lt;p&gt;From first principles, software engineering has always been about one thing: &lt;strong&gt;stably, controllably, and reproducibly transforming human intent into executable systems.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Artificial Intelligence (AI) does not change this engineering essence, but it dramatically alters the cost structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Implementation costs plummet:&lt;/strong&gt; Code, tests, and boilerplate logic are rapidly commoditized.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency costs rise sharply:&lt;/strong&gt; Intent drift, hidden conflicts, and cross-module inconsistencies become more frequent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Governance costs are amplified:&lt;/strong&gt; As agents can act directly, auditability, accountability, and explainability become hard constraints.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, in the era of Agent-Driven Development (ADD), the core issue is not &amp;ldquo;can agents do the work,&amp;rdquo; but how to maintain controllability and intent preservation in engineering systems under highly autonomous agents.&lt;/p&gt;
&lt;h2 id="the-add-era-inflection-point-three-structural-preconditions"&gt;The ADD Era Inflection Point: Three Structural Preconditions&lt;/h2&gt;
&lt;p&gt;Many attribute the &amp;ldquo;explosion&amp;rdquo; of ADD to more mature multi-agent systems, stronger models, or more automated tools. In reality, the true structural inflection point arises only when these three conditions are met:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agents have acquired multi-step execution capabilities&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With frameworks like LangChain, LangGraph, and CrewAI, agents are no longer just prompt invocations, but long-lived entities capable of planning, decomposition, execution, and rollback.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agents are entering real enterprise delivery pipelines&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once in enterprise R&amp;amp;D, the question shifts from &amp;ldquo;can it generate&amp;rdquo; to &amp;ldquo;who approved it, is it compliant, can it be rolled back.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Traditional engineering tools lack a control plane for the agent era&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tools like Git, CI, and Issue Trackers were designed for &amp;ldquo;human developer collaboration,&amp;rdquo; not for &amp;ldquo;agent execution.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;When these three factors converge, ADD inevitably shifts from an &amp;ldquo;efficiency tool&amp;rdquo; to a &amp;ldquo;governance system.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="the-changing-role-of-spec-from-documentation-to-system-constraint"&gt;The Changing Role of Spec: From Documentation to System Constraint&lt;/h2&gt;
&lt;p&gt;In the context of ADD, Spec is undergoing a fundamental shift:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Spec is no longer &amp;ldquo;documentation for humans,&amp;rdquo; but &amp;ldquo;the source of constraints and facts for systems and agents to execute.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Spec now serves at least three roles:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Verifiable expression of intent and boundaries&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Requirements, acceptance criteria, and design principles are no longer just text, but objects that can be checked, aligned, and traced.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stable contracts for organizational collaboration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When agents participate in delivery, verbal consensus and tacit knowledge quickly fail. Versioned, auditable artifacts become the foundation of collaboration.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Policy surface for agent execution&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Agents can write code, modify configurations, and trigger pipelines. Spec must become the constraint on &amp;ldquo;what can and cannot be done.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;From this perspective, the status of Spec is approaching that of the &lt;strong&gt;Control Plane&lt;/strong&gt; in AI-native infrastructure.&lt;/p&gt;
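&lt;p&gt;As a rough illustration of Spec acting as a policy surface, the sketch below checks each proposed agent action against a small machine-readable slice of a Spec. The &lt;code&gt;Spec&lt;/code&gt; and &lt;code&gt;AgentAction&lt;/code&gt; classes, the action names, and the paths are all hypothetical; they are not taken from any specific product or framework.&lt;/p&gt;

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Spec:
    """Hypothetical machine-readable slice of a Spec (not a real schema)."""
    version: str
    allowed_actions: frozenset  # action kinds an agent may perform
    writable_paths: tuple       # glob patterns an agent may modify


@dataclass(frozen=True)
class AgentAction:
    kind: str    # e.g. "write_file" or "trigger_pipeline"
    target: str  # file path or pipeline name


def check(spec, action):
    """Return (allowed, reason) so every decision can be logged for audit."""
    if action.kind not in spec.allowed_actions:
        return False, f"action '{action.kind}' is not permitted by spec {spec.version}"
    if action.kind == "write_file" and not any(
        fnmatch(action.target, pattern) for pattern in spec.writable_paths
    ):
        return False, f"path '{action.target}' is outside the writable scope"
    return True, "ok"


# Illustrative values only.
spec = Spec(
    version="1.2.0",
    allowed_actions=frozenset({"write_file", "run_tests"}),
    writable_paths=("src/billing/*", "tests/billing/*"),
)

print(check(spec, AgentAction("write_file", "src/billing/invoice.py")))
print(check(spec, AgentAction("write_file", "deploy/prod.yaml")))
print(check(spec, AgentAction("trigger_pipeline", "release")))
```

&lt;p&gt;Returning a reason alongside the decision is a deliberate choice in this sketch: a denied action then leaves a record that can be traced back to a specific Spec version.&lt;/p&gt;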
&lt;h2 id="the-reality-of-multi-agent-workflows-orchestration-and-governance-first"&gt;The Reality of Multi-Agent Workflows: Orchestration and Governance First&lt;/h2&gt;
&lt;p&gt;In recent systems (such as &lt;a href="https://apoxai.com" target="_blank" rel="noopener"&gt;APOX&lt;/a&gt; and other enterprise products), an industry consensus is emerging:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-agent collaboration no longer pursues &amp;ldquo;full automation,&amp;rdquo; but is staged and gated.&lt;/li&gt;
&lt;li&gt;Frameworks like LangGraph are used to build persistent, debuggable agent workflows.&lt;/li&gt;
&lt;li&gt;RAG (e.g., based on Milvus) is used to accumulate historical Specs, decisions, and context as long-term memory.&lt;/li&gt;
&lt;li&gt;The IDE mainly focuses on execution efficiency, not engineering governance.&lt;/li&gt;
&lt;/ul&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/add-inflection-point-spec-as-core-asset/apox.webp" data-img="https://assets.jimmysong.io/images/blog/add-inflection-point-spec-as-core-asset/apox.webp" alt="Figure 1: APOX user interface" data-caption="Figure 1: APOX user interface"
width="1400"
height="1045"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: APOX user interface&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;APOX (AI Product Orchestration eXtended) is a multi-agent collaboration workflow platform for enterprise software delivery. Its core goals are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To connect the entire process from product requirements to executable code with a governable Agentflow and explicit engineering artifact chain.&lt;/li&gt;
&lt;li&gt;To assign a dedicated AI agent to each delivery stage (such as PRD, PO, Architecture, Developer, Implementation, and Coding).&lt;/li&gt;
&lt;li&gt;To embed manual approval gates and full audit trails at every step, solving the &amp;ldquo;intent drift and consistency&amp;rdquo; governance problem that traditional AI coding tools cannot address.&lt;/li&gt;
&lt;li&gt;To keep the local IDE and web artifacts in sync in real time through a VS Code plugin, so that Specs, code, tasks, and approval statuses coexist in the repository.&lt;/li&gt;
&lt;li&gt;To let enterprises assign different base models to different agents according to their needs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;APOX is not about simply speeding up code generation, but about elevating &amp;ldquo;Spec&amp;rdquo; from auxiliary documentation to a verifiable, constrainable, and traceable core asset in engineering—building a control plane and workflow governance system suitable for Agent-Driven Development.&lt;/p&gt;
&lt;p&gt;Such systems emphasize:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An explicit artifact chain from PRD → Spec → Task → Implementation.&lt;/li&gt;
&lt;li&gt;Manual confirmation and audit points at every stage.&lt;/li&gt;
&lt;li&gt;Bidirectional sync between Spec, code, repository, and IDE.&lt;/li&gt;
&lt;/ul&gt;
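&lt;p&gt;The artifact chain and per-stage confirmation listed above can be sketched as a tiny state machine in which a stage only advances after a recorded human decision. This is a minimal sketch under stated assumptions: the &lt;code&gt;ArtifactChain&lt;/code&gt; class and the reviewer names are hypothetical and do not describe how any particular platform implements its gates.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Stage order follows the chain named in the text.
STAGES = ("PRD", "Spec", "Task", "Implementation")


@dataclass
class ArtifactChain:
    approved: list = field(default_factory=list)   # stages that passed their gate
    audit_log: list = field(default_factory=list)  # (stage, reviewer, decision)

    def next_stage(self):
        """Return the stage waiting for review, or None when the chain is done."""
        if len(self.approved) == len(STAGES):
            return None
        return STAGES[len(self.approved)]

    def review(self, stage, reviewer, approve):
        """Record a human decision; only the current stage can be reviewed."""
        if stage != self.next_stage():
            raise ValueError(f"{stage} is gated behind {self.next_stage()}")
        decision = "approved" if approve else "rejected"
        self.audit_log.append((stage, reviewer, decision))
        if approve:
            self.approved.append(stage)


chain = ArtifactChain()
chain.review("PRD", "alice", approve=True)
chain.review("Spec", "bob", approve=False)  # rejected: the chain stays at Spec
chain.review("Spec", "bob", approve=True)
print(chain.next_stage())
print(chain.audit_log)
```

&lt;p&gt;Rejections are logged as well as approvals, so the audit trail shows how an artifact reached its final form, not only that it did.&lt;/p&gt;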
&lt;p&gt;This is not about &amp;ldquo;smarter AI,&amp;rdquo; but about engineering systems adapting to the agent era.&lt;/p&gt;
&lt;h2 id="the-long-term-value-of-spec-the-core-anchor-of-engineering-assets"&gt;The Long-Term Value of Spec: The Core Anchor of Engineering Assets&lt;/h2&gt;
&lt;p&gt;This is not to devalue code, but to acknowledge reality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There will always be long-term differentiation in algorithms and model capabilities.&lt;/li&gt;
&lt;li&gt;General engineering implementation is rapidly homogenizing.&lt;/li&gt;
&lt;li&gt;What is hard to replicate is: how to define problems, constrain systems, and govern change.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the ADD era, the value of Spec is reflected in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Determining what agents can and cannot do.&lt;/li&gt;
&lt;li&gt;Carrying the organization&amp;rsquo;s long-term understanding of the system.&lt;/li&gt;
&lt;li&gt;Serving as the anchor for audit, compliance, and accountability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Code will be rewritten again and again; Spec is the long-term asset.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="risks-and-challenges-of-add-living-spec-and-governance-constraints"&gt;Risks and Challenges of ADD: Living Spec and Governance Constraints&lt;/h2&gt;
&lt;p&gt;ADD also faces significant risks:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Can Spec become a Living Spec?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That is, when key implementation changes occur, can the system detect &amp;ldquo;intent changes&amp;rdquo; and prompt Spec updates, rather than allowing silent drift?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Can governance achieve low friction but strong constraints?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If gates are too strict, teams will bypass them; if too loose, the system loses control.&lt;/p&gt;
&lt;p&gt;These two factors determine whether ADD is &amp;ldquo;the next engineering paradigm&amp;rdquo; or &amp;ldquo;just another tool bubble.&amp;rdquo;&lt;/p&gt;
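&lt;p&gt;For the Living Spec question, one coarse signal is a change set that touches code governed by a Spec without touching the Spec itself. The sketch below flags such cases for review. It is only a heuristic under stated assumptions: the &lt;code&gt;SPEC_SCOPE&lt;/code&gt; mapping and all file paths are hypothetical, and a path match cannot tell whether the intent actually changed.&lt;/p&gt;

```python
from fnmatch import fnmatch

# Hypothetical mapping from each Spec file to the code it governs.
SPEC_SCOPE = {
    "specs/billing.md": ["src/billing/*"],
    "specs/auth.md": ["src/auth/*"],
}


def specs_needing_review(changed_files):
    """Return Specs whose governed code changed while the Spec did not."""
    changed = set(changed_files)
    stale = []
    for spec_path, patterns in SPEC_SCOPE.items():
        code_changed = any(
            fnmatch(path, pattern) for path in changed for pattern in patterns
        )
        if code_changed and spec_path not in changed:
            stale.append(spec_path)
    return stale


# Code changed without its Spec: flagged for a human to confirm intent.
print(specs_needing_review(["src/billing/invoice.py", "README.md"]))
# Code and Spec changed together: nothing to flag.
print(specs_needing_review(["src/auth/token.py", "specs/auth.md"]))
```

&lt;p&gt;A check like this prompts a human rather than blocking the change, which keeps friction low while still making silent drift visible.&lt;/p&gt;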
&lt;h2 id="the-trend-toward-control-planes-in-engineering-systems"&gt;The Trend Toward Control Planes in Engineering Systems&lt;/h2&gt;
&lt;p&gt;From a broader perspective, ADD is the inevitable result of engineering systems becoming &amp;ldquo;control planes&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Engineering systems are evolving from &amp;ldquo;human collaboration tools&amp;rdquo; to &amp;ldquo;control systems for agent execution.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this structure:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent / IDE is the &lt;strong&gt;execution plane&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;RAG / Memory is the &lt;strong&gt;state and memory plane&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spec is the intent and policy plane&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Gates, audit, and traceability form the governance loop.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This closely aligns with the evolution path of AI-native infrastructure.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The winners of the ADD era will not be the systems with &amp;ldquo;the most agents or the fastest generation,&amp;rdquo; but those that first upgrade Spec from documentation to a governable, auditable, and executable asset. As automation advances, the true scarcity is the long-term control of intent.&lt;/p&gt;</content:encoded></item><item><title>AI Voice Dictation Input Methods Are Becoming the New Shortcut Key for the Programming Era</title><link>https://jimmysong.io/blog/ai-voice-dictation-input-method-comparison/</link><pubDate>Sun, 18 Jan 2026 06:53:08 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-voice-dictation-input-method-comparison/</guid><description>Comparing Miaoyan, Zhipu, and Shandianshuo voice input methods for developers: speed, stability, command capabilities, and cost models.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Voice input methods are not just about being &amp;ldquo;fast&amp;rdquo;—they are becoming a brand new gateway for developers to collaborate with AI.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="alert alert-warning-container"&gt;
&lt;div class="alert-warning-title px-2"&gt;
Warning
&lt;/div&gt;
&lt;div class="alert-warning px-2"&gt;
On January 12, 2026, the Miaoyan project announced that it was ceasing operations because of financial difficulties, and the team was disbanded. The application will no longer be updated or maintained. Existing versions can still be used on the current device and system, and they do not store any audio or transcription content.
&lt;/div&gt;
&lt;/div&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/banner.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/banner.webp" alt="Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience." data-caption="Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience."
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Can voice input become the new shortcut for developers? My in-depth comparison experience.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="ai-voice-input-methods-are-becoming-the-new-shortcut-key-in-the-programming-era"&gt;AI Voice Input Methods Are Becoming the &amp;ldquo;New Shortcut Key&amp;rdquo; in the Programming Era&lt;/h2&gt;
&lt;p&gt;I am increasingly convinced of one thing: &lt;strong&gt;PC-based AI voice input methods are evolving from mere &amp;ldquo;input tools&amp;rdquo; into the foundational interaction layer for the era of programming and AI collaboration.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not just about typing faster—it determines how you deliver your &lt;strong&gt;intent&lt;/strong&gt; to the system, whether you&amp;rsquo;re writing documentation, code, or collaborating with AI in IDEs, terminals, or chat windows.&lt;/p&gt;
&lt;p&gt;Because of this, the differences in voice input method experiences are far more significant than they appear on the surface.&lt;/p&gt;
&lt;h2 id="my-six-evaluation-criteria-for-ai-voice-input-methods"&gt;My Six Evaluation Criteria for AI Voice Input Methods&lt;/h2&gt;
&lt;p&gt;After long-term, high-frequency use, I have developed a set of criteria to assess the real-world performance of AI voice input methods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Response speed&lt;/strong&gt;: Does text appear quickly enough after pressing the shortcut to keep up with your thoughts?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous input stability&lt;/strong&gt;: Does it remain reliable during extended use, or does it suddenly fail or miss recognition?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mixed Chinese-English and technical terms&lt;/strong&gt;: Can it reliably handle code, paths, abbreviations, and product names?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developer friendliness&lt;/strong&gt;: Is it truly designed for command line, IDE, and automation scenarios?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Interaction restraint&lt;/strong&gt;: Does it avoid introducing distracting features that interfere with input itself?&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Subscription and cost structure&lt;/strong&gt;: Is it a standalone paid product, or can it be bundled with existing tool subscriptions?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on these criteria, I focused on comparing &lt;strong&gt;Miaoyan&lt;/strong&gt;, &lt;strong&gt;Shandianshuo&lt;/strong&gt;, and &lt;strong&gt;Zhipu AI Voice Input Method&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="miaoyan-currently-the-most-developer-oriented-domestic-product"&gt;Miaoyan: Currently the Most &amp;ldquo;Developer-Oriented&amp;rdquo; Domestic Product&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://miaoyan.cn" target="_blank" rel="noopener"&gt;Miaoyan&lt;/a&gt; was the first domestic AI voice input method I used extensively, and it remains the one I am most willing to use continuously.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/miaoyan.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/miaoyan.webp" alt="Figure 2: Miaoyan is currently my most-used Mac voice input method." data-caption="Figure 2: Miaoyan is currently my most-used Mac voice input method."
width="2272"
height="1624"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Miaoyan is currently my most-used Mac voice input method.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="command-mode-the-key-differentiator-for-developer-productivity"&gt;Command Mode: The Key Differentiator for Developer Productivity&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s important to clarify that &lt;strong&gt;Miaoyan&amp;rsquo;s command mode is not about editing text via voice&lt;/strong&gt;. Instead:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You describe your need in natural language, and the system directly generates an &lt;strong&gt;executable command-line command&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is crucial for developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s not just about input&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s about turning voice into an automation entry point&lt;/li&gt;
&lt;li&gt;Essentially, it connects voice to the CLI or toolchain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This design is clearly focused on &lt;strong&gt;engineering efficiency&lt;/strong&gt;, not office document polishing.&lt;/p&gt;
&lt;h3 id="usage-experience-summary"&gt;Usage Experience Summary&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Fast response, nearly instant&lt;/li&gt;
&lt;li&gt;Output is relatively clean, with minimal guessing&lt;/li&gt;
&lt;li&gt;Interaction design is restrained, with no unnecessary concepts&lt;/li&gt;
&lt;li&gt;Developer-friendly mindset&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But there are some practical limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It is a &lt;strong&gt;completely standalone product&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Requires a separate subscription&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Still in relatively small-scale use&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a product strategy perspective, it feels more like a &amp;ldquo;pure tool&amp;rdquo; than part of an ecosystem.&lt;/p&gt;
&lt;h2 id="shandianshuo-local-first-approach-developer-experience-depends-on-your-setup"&gt;Shandianshuo: Local-First Approach, Developer Experience Depends on Your Setup&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://shandianshuo.cn" target="_blank" rel="noopener"&gt;Shandianshuo&lt;/a&gt; takes a different approach: it treats voice input as a &amp;ldquo;local-first foundational capability,&amp;rdquo; emphasizing low latency and privacy (at least in its product narrative). The natural advantages of this approach are speed and controllable marginal costs, making it suitable as a &amp;ldquo;system capability&amp;rdquo; that&amp;rsquo;s always available, rather than a cloud service.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/shandianshuo.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/shandianshuo.webp" alt="Figure 3: Shandianshuo settings page" data-caption="Figure 3: Shandianshuo settings page"
width="2556"
height="2080"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Shandianshuo settings page&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;However, from a developer&amp;rsquo;s perspective, its upper limit often depends on &amp;ldquo;how you implement enhanced capabilities&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;If you only use it for basic transcription, the experience is more like a high-quality local input tool. But if you want better mixed Chinese-English input, technical term correction, symbol and formatting handling, the common approach is to add optional AI correction/enhancement capabilities, which usually requires extra configuration (such as providing your own API key or subscribing to enhanced features). The key trade-off here is not &amp;ldquo;can it be used,&amp;rdquo; but &amp;ldquo;how much configuration cost are you willing to pay for enhanced capabilities.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;If you want voice input to be a &amp;ldquo;lightweight, stable, non-intrusive&amp;rdquo; foundation, Shandianshuo is worth considering. But if your goal is to make voice input part of your developer workflow (such as command generation or executable actions), it needs to offer stronger productized design at the &amp;ldquo;command layer&amp;rdquo; and in terms of controllability.&lt;/p&gt;
&lt;h2 id="zhipu-ai-voice-input-method-stable-but-with-friction"&gt;Zhipu AI Voice Input Method: Stable but with Friction&lt;/h2&gt;
&lt;p&gt;I also thoroughly tested the &lt;a href="https://autoglm.zhipuai.cn/autotyper/" target="_blank" rel="noopener"&gt;Zhipu AI Voice Input Method&lt;/a&gt;.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/autoglm.webp" data-img="https://assets.jimmysong.io/images/blog/ai-voice-dictation-input-method-comparison/autoglm.webp" alt="Figure 4: Zhipu Voice Input Method settings interface" data-caption="Figure 4: Zhipu Voice Input Method settings interface"
width="2430"
height="1824"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Zhipu Voice Input Method settings interface&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Its strengths include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;More stable for long-term continuous input&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Rarely becomes completely unresponsive&lt;/li&gt;
&lt;li&gt;Good tolerance for longer Chinese input&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But with frequent use, some issues stand out:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Idle misrecognition&lt;/strong&gt;: If you press the shortcut but don&amp;rsquo;t speak, it may output random characters, disrupting your input flow&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Occasionally messy output&lt;/strong&gt;: Sometimes adds irrelevant words, making it less controllable than Miaoyan&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Basic recognition errors&lt;/strong&gt;: For example, the name &amp;ldquo;Zhipu&amp;rdquo; itself is sometimes written with a different Chinese character, which undermines trust for professional users&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature-heavy design&lt;/strong&gt;: Various tone and style features increase cognitive load&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="subscription-bundling-zhipus-practical-advantage"&gt;Subscription Bundling: Zhipu&amp;rsquo;s Practical Advantage&lt;/h2&gt;
&lt;p&gt;Although I prefer Miaoyan in terms of experience, &lt;strong&gt;Zhipu has a very practical advantage&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;If you already subscribe to Zhipu&amp;rsquo;s programming package, &lt;strong&gt;the voice input method is included for free&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No need to pay separately for the input method&lt;/li&gt;
&lt;li&gt;Lower psychological and decision-making cost&lt;/li&gt;
&lt;li&gt;More likely to become the &amp;ldquo;default tool&amp;rdquo; that stays&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From a business perspective, this is a very smart strategy.&lt;/p&gt;
&lt;h2 id="main-comparison-table"&gt;Main Comparison Table&lt;/h2&gt;
&lt;p&gt;The following table compares the three products across key dimensions for quick reference.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Miaoyan&lt;/th&gt;
&lt;th&gt;Shandianshuo&lt;/th&gt;
&lt;th&gt;Zhipu AI Voice Input Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Response Speed&lt;/td&gt;
&lt;td&gt;Fast, nearly instant&lt;/td&gt;
&lt;td&gt;Usually fast (local-first)&lt;/td&gt;
&lt;td&gt;Slightly slower than Miaoyan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous Stability&lt;/td&gt;
&lt;td&gt;Stable&lt;/td&gt;
&lt;td&gt;Depends on setup and environment&lt;/td&gt;
&lt;td&gt;Very stable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle Misrecognition&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;td&gt;Generally restrained (varies by version)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Obvious: outputs characters even if silent&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Cleanliness/Control&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;More like an &amp;ldquo;input tool&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Occasionally messy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer Differentiator&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Natural language → executable command&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local-first / optional enhancements&lt;/td&gt;
&lt;td&gt;Ecosystem-attached capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subscription &amp;amp; Cost&lt;/td&gt;
&lt;td&gt;Standalone, separate purchase&lt;/td&gt;
&lt;td&gt;Basic usable; enhancements often require setup/subscription&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Bundled free with programming package&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;My Current Preference&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Best experience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More like a &amp;ldquo;foundation approach&amp;rdquo;&lt;/td&gt;
&lt;td&gt;Easy to keep but not clean enough&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Core Comparison of Miaoyan, Shandianshuo, and Zhipu AI Voice Input Methods
&lt;/figcaption&gt;
&lt;h2 id="user-loyalty-to-ai-voice-input-methods"&gt;User Loyalty to AI Voice Input Methods&lt;/h2&gt;
&lt;p&gt;The switching cost for voice input methods is actually low: just a shortcut key and some output habits.&lt;/p&gt;
&lt;p&gt;What really determines whether users stick around is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether the output is controllable&lt;/li&gt;
&lt;li&gt;Whether it keeps causing annoying minor issues&lt;/li&gt;
&lt;li&gt;Whether it integrates into your existing workflow and payment structure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For me personally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The best and smoothest experience is still Miaoyan&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The one most likely to stick around is probably Zhipu&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shandianshuo is more of a &amp;ldquo;foundation approach&amp;rdquo; and worth watching for how its enhancements evolve&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These points are not contradictory.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Miaoyan is more mature in &lt;strong&gt;engineering orientation, command capabilities, and input control&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Zhipu has practical advantages in &lt;strong&gt;stability and subscription bundling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Shandianshuo takes a &lt;strong&gt;local-first + optional enhancement&lt;/strong&gt; approach, with the key being how it balances &amp;ldquo;basic capability&amp;rdquo; and &amp;ldquo;enhancement cost&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Who truly becomes the &amp;ldquo;default gateway&amp;rdquo; depends on reducing distractions, fixing frequent minor issues, and treating voice input as true &amp;ldquo;infrastructure&amp;rdquo; rather than an add-on feature&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The competition among AI voice input methods is no longer about recognition accuracy, but about who can own the shortcut key you press every day.&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>From Spatial Data to AI Open Source: Technical Standards, Data Sovereignty, and the Global Divide</title><link>https://jimmysong.io/blog/spatial-data-ai-open-source-standards-sovereignty/</link><pubDate>Sun, 11 Jan 2026 03:29:28 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/spatial-data-ai-open-source-standards-sovereignty/</guid><description>How technical standards and data sovereignty shape AI open source paths and infrastructure competition in the global AI era.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The divide in technical standards and data sovereignty determines the global competitive landscape of infrastructure open source in the AI era.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this article, I will use the differences in air quality data presentation in Apple Maps and Weather as a starting point to explore how technical standards and data sovereignty influence the open source paths of AI in different countries. I will further analyze why, in the AI era, infrastructure-level open source has become the key battleground for ecosystem dominance.&lt;/p&gt;
&lt;h2 id="authors-note"&gt;Author&amp;rsquo;s Note&lt;/h2&gt;
&lt;p&gt;This article originates from a very everyday observation: Why is air quality data in China shown as &amp;ldquo;points&amp;rdquo; in Apple Maps and Weather, while in other countries it is often displayed as &amp;ldquo;areas&amp;rdquo;?&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/aqi-map.webp" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/aqi-map.webp" alt="Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries" data-caption="Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries"
width="1650"
height="1864"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Air quality map in Apple Weather, showing point-based data in China and area-based data in other countries&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;At first glance, it seems like a product experience difference. But when I reconsidered this issue in the context of engineering, standards, and system design, I realized it actually points to a much bigger question: how different countries understand the relationship between technology, standards, openness, and sovereignty.&lt;/p&gt;
&lt;p&gt;As an engineer who has long worked in cloud native, AI infrastructure, and open source ecosystems, I gradually realized that this difference is not limited to air quality or map data. In the AI era, it is further amplified, directly affecting how we open source models, build infrastructure, and whether we can participate in the formulation of global rules.&lt;/p&gt;
&lt;p&gt;Writing this article is not about judging right or wrong, but about using a concrete example to explain a structural difference and discuss the long-term impact and real opportunities this difference may bring in the AI era.&lt;/p&gt;
&lt;p&gt;What is especially important: at the level of AI infrastructure and infra-level open source, the competition has just begun. China is not without opportunities, but the choice of path will become more critical than ever.&lt;/p&gt;
&lt;h2 id="differences-in-air-quality-data-presentation-a-microcosm-of-technical-standards-and-sovereignty"&gt;Differences in Air Quality Data Presentation: A Microcosm of Technical Standards and Sovereignty&lt;/h2&gt;
&lt;p&gt;The following image illustrates the divide between spatial data, AI open source, and technical standards. Comparing how Apple Maps and Weather present air quality data in different countries makes the underlying differences in technical standards and sovereignty strategy easy to see.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/banner.webp" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/banner.webp" alt="Figure 2: The divide between spatial data, AI open source, and technical standards" data-caption="Figure 2: The divide between spatial data, AI open source, and technical standards"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: The divide between spatial data, AI open source, and technical standards&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;If you regularly use global products such as maps, weather, traffic, or various data services, you may notice a recurring phenomenon that is rarely discussed seriously: the way data is presented in China often differs significantly from global mainstream standards.&lt;/p&gt;
&lt;p&gt;A very intuitive example comes from the air quality display in Apple Maps or Weather. In China, air quality is usually shown as discrete points; in the US, Europe, Japan, and other countries, it is often rendered as continuous coverage areas.&lt;/p&gt;
&lt;p&gt;At first glance, this seems like a product experience difference, and it may even lead people to mistakenly believe that &amp;ldquo;China&amp;rsquo;s data is incomplete.&amp;rdquo; But if you treat it as an engineering or system design issue, you will find that it is not a matter of data capability, but of different choices in technical standards, data sovereignty, and openness strategy.&lt;/p&gt;
&lt;p&gt;And this choice is not limited to air quality.&lt;/p&gt;
&lt;h2 id="air-quality-is-just-a-slice-greater-differences-in-spatial-public-data"&gt;Air Quality Is Just a Slice: Greater Differences in Spatial Public Data&lt;/h2&gt;
&lt;p&gt;Air quality is just a highly visible and relatively low-risk example. Similar differences have long existed in broader spatial and public data domains.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maps and coordinate systems&lt;/li&gt;
&lt;li&gt;Surveying and high-precision spatial data&lt;/li&gt;
&lt;li&gt;Real-time traffic and population movement&lt;/li&gt;
&lt;li&gt;Remote sensing, environmental, and urban operation data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In global mainstream systems, such data is usually regarded as public information infrastructure. It is standardized, gridded, and exposed through APIs; it permits interpolation, modeling, and redistribution; and it is widely used in research, business, and product innovation.&lt;/p&gt;
&lt;p&gt;In China, this data often takes another form: hierarchical, discrete, strictly defined, and with centralized interpretation authority.&lt;/p&gt;
&lt;p&gt;This is not a technical preference in a single field, but a systemic logic of technology and governance.&lt;/p&gt;
&lt;h2 id="three-global-paths"&gt;Three Global Paths&lt;/h2&gt;
&lt;p&gt;Placing China in a global context, we can see that there are roughly three different paths worldwide regarding &amp;ldquo;how public data and technical standards are opened.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Engineering-Open Type: Standards and Ecosystem First&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Represented by the US and some European countries, the core features of this system are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Public data prioritized as infrastructure&lt;/li&gt;
&lt;li&gt;Standards and interfaces come first&lt;/li&gt;
&lt;li&gt;Encourages engineering autonomy and ecosystem evolution&lt;/li&gt;
&lt;li&gt;Tolerates model inference and uncertainty&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This path directly shaped the global landscape of foundational software and infrastructure-level open source. Linux, Kubernetes, and the cloud native system are essentially products of openness at the rules layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Governance-Sovereignty Type: Control and Auditability First&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Represented by China, this path emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sensitivity of spatial and public data&lt;/li&gt;
&lt;li&gt;Data as part of governance capability&lt;/li&gt;
&lt;li&gt;Standards, definitions, and release methods are tightly coupled&lt;/li&gt;
&lt;li&gt;Emphasizes traceability, accountability, and controllability&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this system, &amp;ldquo;point data&amp;rdquo; is not a sign of technological backwardness, but a governable technical form. When a technical system is designed as a governance system, its primary goal is not reusability, but controllability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Compromise-Coordinated Type: Cautious Openness, Engineering Internationalization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some countries try to find a balance between the two, maintaining caution in spatial data while being highly internationalized in engineering and industry. This shows that the difference is not about being advanced or backward, but about different objective functions.&lt;/p&gt;
&lt;p&gt;The following diagram compares the core characteristics, typical cases, and advantages/challenges of these three paths from a global perspective. The &amp;ldquo;Engineering-Open Type&amp;rdquo; on the left shapes the global infrastructure software landscape through standards and ecosystems; the &amp;ldquo;Governance-Sovereignty Type&amp;rdquo; in the middle emphasizes data sovereignty and security controllability but has limitations in influence at the rules layer; the &amp;ldquo;Compromise-Coordinated Type&amp;rdquo; on the right attempts to find a balance between security and openness. The divide between these three paths directly affects the infrastructure competition landscape of various countries in the AI era.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/global-three-paths-en.svg" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/global-three-paths-en.svg" alt="Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards" data-caption="Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards"
width="2663"
height="1862"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Global Perspective: Three Paths for Public Data and Technical Standards&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-essence-of-point-vs-area-in-air-quality"&gt;The Essence of &amp;ldquo;Point&amp;rdquo; vs. &amp;ldquo;Area&amp;rdquo; in Air Quality&lt;/h2&gt;
&lt;p&gt;Among all spatial public data, air quality is an ideal observation window:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does not directly involve military or core economic security&lt;/li&gt;
&lt;li&gt;Highly visible, updated daily, and perceptible to everyone&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;China does not lack air quality data; on the contrary, the density of monitoring stations is among the highest in the world. The real difference lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether interpolation is allowed&lt;/li&gt;
&lt;li&gt;Whether model inference is allowed&lt;/li&gt;
&lt;li&gt;Whether platforms are allowed to reinterpret the data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&amp;ldquo;Point&amp;rdquo; means authenticity and traceability; &amp;ldquo;area&amp;rdquo; means models, inference, and redistribution of interpretive authority. This is precisely the watershed between technical standards and data sovereignty.&lt;/p&gt;
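&lt;p&gt;To make the distinction concrete, here is a minimal Python sketch of how an &amp;ldquo;area&amp;rdquo; can be derived from &amp;ldquo;points&amp;rdquo; using inverse distance weighting (IDW), one common interpolation technique. The station coordinates, the readings, and the function name idw are hypothetical and chosen only for illustration; nothing here describes how Apple or any data provider actually renders air quality.&lt;/p&gt;

```python
import math


def idw(stations, x, y, power=2.0):
    """Estimate a value at (x, y) from a non-empty list of (sx, sy, value) tuples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return the measurement itself.
            return value
        # Closer stations get larger weights; power controls how fast influence decays.
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical AQI readings at three stations (coordinates in km).
stations = [(0.0, 0.0, 50.0), (10.0, 0.0, 150.0), (0.0, 10.0, 100.0)]

# "Point" view: only the measured values exist, each traceable to a station.
point_view = {(sx, sy): value for sx, sy, value in stations}

# "Area" view: every cell of a 3x3 grid gets a modeled value.
grid = [
    [idw(stations, float(gx), float(gy)) for gx in range(0, 11, 5)]
    for gy in range(0, 11, 5)
]
```

&lt;p&gt;The point view contains only what was measured, while most cells in the grid are modeled estimates whose values depend on the chosen method and parameters (here, the power exponent). That is the sense in which an area-based presentation shifts some interpretive authority to whoever runs the model.&lt;/p&gt;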
&lt;p&gt;The following diagram compares two different technical paths. The left side, &amp;ldquo;Governance-Sovereignty Type,&amp;rdquo; emphasizes data traceability and controllability, using discrete point-based data presentation. The right side, &amp;ldquo;Engineering-Open Type,&amp;rdquo; allows model interpolation and inference, providing a more user-friendly experience through continuous area-based coverage. The essence of this difference lies not in the level of technical capability, but in the different choices made among data sovereignty, governance capability, and open ecosystems.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/data-sovereignty-comparison-en.svg" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/data-sovereignty-comparison-en.svg" alt="Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation" data-caption="Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation"
width="2263"
height="1562"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Technical Standards and Sovereignty Divide in Spatial Data Presentation&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-amplification-effect-in-the-ai-era"&gt;The Amplification Effect in the AI Era&lt;/h2&gt;
&lt;p&gt;With the above logic in mind, many phenomena in the AI era become less confusing.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why are Chinese AI companies more willing to open source large language model (LLM) weights, while American companies have clearly shifted toward closed source in recent years?&lt;/li&gt;
&lt;li&gt;Why is foundational software and infrastructure-level open source still mainly led by the US?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The key is not &amp;ldquo;whether to open source,&amp;rdquo; but &amp;ldquo;which layer is open sourced.&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model weights are static, declarable assets&lt;/li&gt;
&lt;li&gt;Infrastructure, runtimes, protocols, and standards are dynamic, evolving system rules&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Open sourcing weights is essentially openness at the asset layer; infrastructure-level open source means relinquishing control over operating rules and interpretive authority.&lt;/p&gt;
&lt;p&gt;The following diagram compares two different layers of AI open source. The left side shows &amp;ldquo;Model Weight Layer Open Source,&amp;rdquo; typical of the Chinese path: opening static digital assets at low cost and with controllable risk, but without involvement in rule-making. The right side shows &amp;ldquo;Infrastructure Layer Open Source,&amp;rdquo; a core strategy of the US path: open sourcing development tools, protocol standards, runtimes, compute scheduling, and other infrastructure to define how AI is used, and thereby gaining control over ecosystem rules and interpretive authority. Key insight: open sourcing model weights does not equal mastering the AI ecosystem; the real competitive focus is shifting to the infrastructure layer of &amp;ldquo;how AI runs.&amp;rdquo;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/ai-opensource-layers-en.svg" data-img="https://assets.jimmysong.io/images/blog/spatial-data-ai-open-source-standards-sovereignty/ai-opensource-layers-en.svg" alt="Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure" data-caption="Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure"
width="2363"
height="1862"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Two Layers of AI Era Open Source: Model Weights vs Infrastructure&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="the-us-approach-focusing-on-rules-and-runtime-layers"&gt;The US Approach: Focusing on Rules and Runtime Layers&lt;/h2&gt;
&lt;p&gt;In the past year or two, US-led AI open source and ecosystem initiatives have shown a highly consistent direction: not rushing to open source the strongest models, but focusing on defining &amp;ldquo;how AI is used.&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Linux Foundation established &lt;a href="https://aaif.io" target="_blank" rel="noopener"&gt;AAIF&lt;/a&gt; (Agentic AI Foundation), focusing on AI infrastructure, standards, and toolchain collaboration&lt;/li&gt;
&lt;li&gt;Protocols like MCP (Model Context Protocol) aim to define common interaction methods between agents and tools/systems&lt;/li&gt;
&lt;li&gt;Major tech companies are generally focusing on APIs, platforms, runtimes, and ecosystem binding&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The commonality of these actions: competing in model capability, but controlling the usage rules.&lt;/p&gt;
&lt;h2 id="chinas-shift-from-model-oriented-to-infrastructure-oriented"&gt;China&amp;rsquo;s Shift: From Model-Oriented to Infrastructure-Oriented&lt;/h2&gt;
&lt;p&gt;It is important to emphasize that this difference does not mean China is unaware of the issue.&lt;/p&gt;
&lt;p&gt;Whether in policy discussions or within industry and research institutions, the risk of &amp;ldquo;open sourcing only models without gaining control over infrastructure and standards&amp;rdquo; has been repeatedly discussed.&lt;/p&gt;
&lt;p&gt;The real challenge lies in how to achieve a directional shift within the existing governance logic and risk framework. This shift has already appeared in some concrete practices.&lt;/p&gt;
&lt;h2 id="exploration-and-practice-at-the-infrastructure-layer"&gt;Exploration and Practice at the Infrastructure Layer&lt;/h2&gt;
&lt;p&gt;In the AI era, infrastructure often starts with the most engineering-driven problems.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;HAMi Project&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Projects like &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; do not focus on model capability, but on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abstraction, allocation, and isolation of GPU resources&lt;/li&gt;
&lt;li&gt;How multi-tenant AI workloads are run&lt;/li&gt;
&lt;li&gt;How computing power transitions from hardware assets to governable system resources&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The significance of such projects is not about being &amp;ldquo;SOTA,&amp;rdquo; but about entering the domain of &amp;ldquo;how AI runs.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI Runtime Reconstruction from a System Software Perspective&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Exploration at the research institution level is also noteworthy. The &lt;a href="https://www.flagos.io" target="_blank" rel="noopener"&gt;FlagOS&lt;/a&gt; initiative by the Beijing Academy of Artificial Intelligence is a clear signal: AI is being redefined as a system software issue, not just a model or algorithm problem.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Long-Term Tech Stack Investment by Industry Players&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the industry, Huawei&amp;rsquo;s strategy reflects a similar direction: not simply open sourcing models, but attempting to build a complete, controllable AI tech stack, from computing power to frameworks, platforms, and ecosystems. This is a slower, heavier, but more infrastructure-competitive path.&lt;/p&gt;
&lt;h2 id="realistic-assessment-the-starting-point-of-ai-infrastructure-competition"&gt;Realistic Assessment: The Starting Point of AI Infrastructure Competition&lt;/h2&gt;
&lt;p&gt;Taking a longer view, we find an easily overlooked fact:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;At the level of AI infrastructure and infra-level open source, the landscape between China and the US is not yet settled.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The US advantage lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mature engineering culture&lt;/li&gt;
&lt;li&gt;Standards organizations and foundation mechanisms&lt;/li&gt;
&lt;li&gt;High proficiency in openness at the rules layer&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;China&amp;rsquo;s variables include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Huge AI application scenarios&lt;/li&gt;
&lt;li&gt;Extreme demand for computing power and system efficiency&lt;/li&gt;
&lt;li&gt;Ongoing directional adjustments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The real uncertainty is not &amp;ldquo;whether we can catch up,&amp;rdquo; but whether it is possible to gradually open up space for engineering autonomy and standard co-construction while maintaining governance bottom lines.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The &amp;ldquo;points&amp;rdquo; and &amp;ldquo;areas&amp;rdquo; of air quality, model weights and the world of &amp;ldquo;how AI runs&amp;rdquo;: behind these appearances lies not a simple dispute over technical routes, but the question of how a country finds its own balance between openness, standards, and sovereignty.&lt;/p&gt;
&lt;p&gt;In the AI era, this issue will not disappear, but will become more concrete and more engineering-driven. And this is precisely where there are still opportunities for China&amp;rsquo;s AI infrastructure open source.&lt;/p&gt;</content:encoded></item><item><title>Joining Dynamia: Embarking on a New Journey in AI Native Infrastructure</title><link>https://jimmysong.io/blog/joining-dynamia/</link><pubDate>Wed, 07 Jan 2026 07:49:21 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/joining-dynamia/</guid><description>Joining Dynamia as Open Source Ecosystem VP to drive AI-native infrastructure ecosystem development, transforming compute from hardware consumption to core asset.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Compute governance is the critical bottleneck for AI scaling. From hardware consumption to core asset, this long-undervalued path needs to be redefined.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/joining-dynamia/banner.webp" data-img="https://assets.jimmysong.io/images/blog/joining-dynamia/banner.webp" alt="Figure 1: Dynamia.ai" data-caption="Figure 1: Dynamia.ai"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Dynamia.ai&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="a-new-beginning"&gt;A New Beginning&lt;/h2&gt;
&lt;p&gt;I have officially joined &lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia&lt;/a&gt; as &lt;strong&gt;Open Source Ecosystem VP&lt;/strong&gt;, responsible for the company&amp;rsquo;s long-term development in open source, technical narrative, and the &lt;strong&gt;AI Native Infrastructure&lt;/strong&gt; ecosystem.&lt;/p&gt;
&lt;h2 id="why-i-chose-dynamia"&gt;Why I Chose Dynamia&lt;/h2&gt;
&lt;p&gt;I chose to join Dynamia not because it&amp;rsquo;s a company trying to &amp;ldquo;solve all AI problems,&amp;rdquo; but for precisely the opposite reason. Dynamia &lt;strong&gt;focuses intensely on one unavoidable, yet long-undervalued core issue in AI Native Infrastructure&lt;/strong&gt;: compute, especially &lt;strong&gt;Graphics Processing Units&lt;/strong&gt; (GPUs), is evolving from a &amp;ldquo;technical resource&amp;rdquo; into an infrastructure element that requires refined governance and economic management.&lt;/p&gt;
&lt;p&gt;Through years of practice in cloud native, distributed systems, and AI infrastructure (AI Infra), I&amp;rsquo;ve formed a clear judgment: as Large Language Models (LLMs) and &lt;strong&gt;AI Agents&lt;/strong&gt; enter the stage of large-scale deployment, the real bottleneck limiting system scalability and sustainability is no longer just model capability. It is how compute is measured, allocated, isolated, and scheduled, and how a governable, accountable, and optimizable operational mechanism is formed at the system level. From this perspective, the core challenge of AI infrastructure is essentially evolving into a &amp;ldquo;resource governance and Token economy&amp;rdquo; problem.&lt;/p&gt;
&lt;h2 id="about-dynamia-and-hami"&gt;About Dynamia and HAMi&lt;/h2&gt;
&lt;p&gt;Dynamia is an AI-native infrastructure technology company rooted in open source DNA, driving efficiency leaps in heterogeneous compute through technological innovation. Its leading open source project, &lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi&lt;/a&gt; (Heterogeneous AI Computing Virtualization Middleware), is a &lt;strong&gt;Cloud Native Computing Foundation&lt;/strong&gt; (CNCF) sandbox project providing GPU, NPU and other heterogeneous device virtualization, sharing, isolation, and topology-aware scheduling capabilities, widely adopted by 50+ enterprises and institutions.&lt;/p&gt;
&lt;h2 id="dynamias-technical-approach"&gt;Dynamia&amp;rsquo;s Technical Approach&lt;/h2&gt;
&lt;p&gt;In this context, Dynamia&amp;rsquo;s technical approach aligns closely with my long-term judgment on AI-native infrastructure. It starts from &lt;strong&gt;the GPU layer, which is the most expensive, scarcest, and least unified abstraction layer in AI systems&lt;/strong&gt;, and treats compute as a foundational resource that can be measured, partitioned, scheduled, governed, and even &amp;ldquo;tokenized&amp;rdquo; for refined accounting and optimization.&lt;/p&gt;
&lt;p&gt;This path doesn&amp;rsquo;t use &amp;ldquo;model capabilities&amp;rdquo; or &amp;ldquo;application innovation&amp;rdquo; as selling points in the short term, nor is it easily packaged into simple stories. However, with rising compute costs, heterogeneous accelerators becoming the norm, and AI systems moving toward multi-tenant and large-scale operations, these infrastructure-level capabilities are gradually becoming prerequisites for the establishment and expansion of AI systems.&lt;/p&gt;
&lt;h2 id="future-focus"&gt;Future Focus&lt;/h2&gt;
&lt;p&gt;As Dynamia&amp;rsquo;s Open Source Ecosystem VP, I will focus on &lt;strong&gt;the technical narrative of AI-native infrastructure, open source ecosystem building, and global developer collaboration&lt;/strong&gt;, helping compute move from a &amp;ldquo;hardware resource being consumed&amp;rdquo; to a &lt;strong&gt;governable, measurable, and optimizable core asset of AI infrastructure&lt;/strong&gt;, and laying the foundation for the scaling and sustainable evolution of AI systems in the next stage.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Joining Dynamia is an important milestone in my career and a concrete action demonstrating my long-term optimism about AI-native infrastructure. Compute governance is not a short-term trend that yields quick results, but an infrastructure proposition that cannot be bypassed for large-scale AI deployment. I look forward to exploring, building, and delivering solutions on this long-undervalued path with global developers.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dynamia.ai" target="_blank" rel="noopener"&gt;Dynamia Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Project-HAMi/HAMi" target="_blank" rel="noopener"&gt;HAMi - Heterogeneous AI Computing Virtualization Middleware (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Running Parallel AI Agents on My Mac: Hands-On with Verdent's Standalone App</title><link>https://jimmysong.io/blog/verdent-standalone-app-parallel-agents/</link><pubDate>Sun, 04 Jan 2026 02:25:48 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/verdent-standalone-app-parallel-agents/</guid><description>A hands-on experience with Verdent&amp;#39;s standalone Mac app, exploring how parallel AI agents, isolated workspaces, and task-oriented workflows change real-world development.</description><content:encoded>
&lt;p&gt;I&amp;rsquo;ve been spending more time recently experimenting with vibe coding tools on real projects, not demos. One of those projects is my own website, where I constantly tweak content structure, navigation, and layout.&lt;/p&gt;
&lt;p&gt;During this process, I started using &lt;a href="https://verdent.ai" target="_blank" rel="noopener"&gt;Verdent&amp;rsquo;s standalone Mac app&lt;/a&gt; more seriously. What stood out was not any single feature, but how different the experience felt compared to traditional AI coding tools.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-standalone-app-ui.webp" data-img="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-standalone-app-ui.webp" alt="Figure 1: Verdent Standalone App UI" data-caption="Figure 1: Verdent Standalone App UI"
width="3836"
height="2240"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Verdent Standalone App UI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Verdent doesn&amp;rsquo;t behave like an assistant waiting for instructions. It behaves more like an environment where work happens in parallel.&lt;/p&gt;
&lt;h2 id="a-different-starting-point-tasks-not-chats"&gt;A Different Starting Point: Tasks, Not Chats&lt;/h2&gt;
&lt;p&gt;Most AI coding tools begin with a conversation. Verdent begins with &lt;strong&gt;tasks&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When I opened my website repository in the Verdent app, I didn&amp;rsquo;t start with a long prompt. I created multiple tasks directly: one to rethink navigation and SEO structure, another to explore homepage layout improvements, and a third to review existing content organization.&lt;/p&gt;
&lt;p&gt;Each task immediately spun up its own agent and workspace. From the beginning, the app encouraged me to think in parallel, the same way I normally would when sketching ideas on paper or jumping between files.&lt;/p&gt;
&lt;p&gt;This framing alone changes how you work.&lt;/p&gt;
&lt;h2 id="built-for-multitasking-without-losing-context"&gt;Built for Multitasking, Without Losing Context&lt;/h2&gt;
&lt;p&gt;Switching contexts is unavoidable in real development work. What usually breaks is continuity.&lt;/p&gt;
&lt;p&gt;Verdent handles this well. Each task preserves its full context independently. I could stop one task mid-way, switch to another, and come back later without re-explaining the problem or reloading files.&lt;/p&gt;
&lt;p&gt;For example, while one agent was analyzing my site&amp;rsquo;s navigation structure, another was exploring layout options. I moved between them freely. Nothing was lost. Each agent remembered exactly what it was doing.&lt;/p&gt;
&lt;p&gt;This feels closer to how developers think than how chat-based tools operate.&lt;/p&gt;
&lt;h2 id="safe-parallel-coding-with-workspaces"&gt;Safe Parallel Coding with Workspaces&lt;/h2&gt;
&lt;p&gt;Parallel work only becomes truly safe when code changes are isolated. When parallelism moves from discussion to actual code modification, risk management becomes essential.&lt;/p&gt;
&lt;p&gt;Verdent solves this with &lt;strong&gt;Workspaces&lt;/strong&gt;. Each workspace is an isolated, independent code environment with its own change history, commit log, and branches. This isn&amp;rsquo;t just about separation—it&amp;rsquo;s about making concurrent code changes manageable.&lt;/p&gt;
&lt;p&gt;What this means in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multiple tasks can write code simultaneously&lt;/li&gt;
&lt;li&gt;Changes remain isolated from each other&lt;/li&gt;
&lt;li&gt;If conflicts arise, they&amp;rsquo;re visible and cleanly resolvable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I intentionally let different agents operate on overlapping parts of my project: one modifying Markdown content and links, another adjusting CSS and layout logic. Both ran in parallel. No conflicts emerged. Later, I reviewed the diffs from each workspace and merged only what made sense.&lt;/p&gt;
&lt;p&gt;This kind of isolation removes significant anxiety from AI-assisted coding. You stop worrying about breaking things and start experimenting more freely, knowing that each change exists in its own contained environment.&lt;/p&gt;
&lt;h2 id="parallel-agent-execution-feels-like-delegation"&gt;Parallel Agent Execution Feels Like Delegation&lt;/h2&gt;
&lt;p&gt;Parallelism doesn’t mean that all agents complete the same phase of work at the same time—instead, by isolating and overlapping phases, what was once a strictly sequential process is compressed into a more efficient, collaborative mode.&lt;/p&gt;
&lt;p&gt;In Verdent, each agent runs in its own workspace, essentially an automatically managed branch or worktree. In practice, I often create multiple tasks with different responsibilities for the same requirement, such as planning, implementation, and review.&lt;/p&gt;
&lt;p&gt;These tasks are triggered as needed, each running for a period and producing clear artifacts as boundaries for collaboration. The planning task generates planning documents or constraint specifications; the implementation task advances code changes based on those documents and produces diffs; the review task, according to the established planning goals and audit criteria, performs staged reviews of the generated changes. By overlapping phases around artifacts, the originally strict sequential process is compressed into a workflow that more closely resembles team collaboration.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The value of splitting into multiple tasks is not parallel execution, but parallel cognition and clear collaboration boundaries.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While it’s technically possible to put multiple roles into a single task, this causes planning, implementation, and review to share the same context, which weakens role isolation and the auditability of results.&lt;/p&gt;
&lt;h2 id="configurability-and-design-trade-offs"&gt;Configurability and Design Trade-offs&lt;/h2&gt;
&lt;p&gt;Beyond the workflow model itself, Verdent exposes a surprisingly rich set of configurable capabilities.&lt;/p&gt;
&lt;p&gt;It allows users to customize MCP settings, define subagents with configurable prompts, and create reusable commands via slash (&lt;code&gt;/&lt;/code&gt;) shortcuts. Personal rules can be written to influence agent behavior and response style, and command-level permissions can be configured to enforce basic security boundaries. Verdent also supports multiple mainstream foundation models, including GPT, Claude, Gemini, and K2. For users who prefer a lightweight coding experience without a full IDE, Verdent offers DiffLens as an alternative review-oriented interface. Both &lt;a href="https://www.verdent.ai/pricing" target="_blank" rel="noopener"&gt;subscription-based and credit-based pricing models&lt;/a&gt; are supported.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-settings.webp" data-img="https://assets.jimmysong.io/images/blog/verdent-standalone-app-parallel-agents/verdent-settings.webp" alt="Figure 2: Verdent Settings" data-caption="Figure 2: Verdent Settings"
width="2780"
height="1648"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Verdent Settings&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;That said, Verdent makes a clear set of trade-offs. It is not built around tab-based code completion, nor does it offer a plugin system. If it did, it would start to resemble a traditional IDE - which does not seem to be its goal. Verdent is not designed for direct, fine-grained code manipulation; most changes are mediated through conversational tasks and agent-driven edits. This makes the experience clean and focused, but it also means that for large, highly complex codebases, Verdent may function better as a complementary orchestration layer rather than a full-time development environment.&lt;/p&gt;
&lt;h2 id="where-verdent-fits-today"&gt;Where Verdent Fits Today&lt;/h2&gt;
&lt;p&gt;There are many AI-assisted coding tools emerging right now. Some focus on smarter editors, others on faster generation.&lt;/p&gt;
&lt;p&gt;Verdent feels different because it focuses on &lt;strong&gt;orchestration&lt;/strong&gt;, not just assistance.&lt;/p&gt;
&lt;p&gt;It doesn&amp;rsquo;t try to replace your editor. It sits one level above, coordinating planning, execution, and review across multiple agents.&lt;/p&gt;
&lt;p&gt;That makes it particularly suitable for exploratory work, refactoring, and early-stage design - exactly the kind of work I was doing on my website.&lt;/p&gt;
&lt;h2 id="final-thoughts"&gt;Final Thoughts&lt;/h2&gt;
&lt;p&gt;Using Verdent&amp;rsquo;s standalone app didn&amp;rsquo;t just speed things up. It changed how I structured work.&lt;/p&gt;
&lt;p&gt;Instead of doing everything sequentially, I started thinking in parallel again - and letting the system support that way of thinking.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://verdent.ai" target="_blank" rel="noopener"&gt;Verdent&lt;/a&gt; feels less like an AI feature and more like an environment that assumes AI is already part of how development happens.&lt;/p&gt;
&lt;p&gt;For developers experimenting with AI-native workflows, that shift is worth paying attention to.&lt;/p&gt;</content:encoded></item><item><title>2025 Annual Review: The Transformation Journey from Cloud Native to AI Native</title><link>https://jimmysong.io/blog/2025-annual-review/</link><pubDate>Wed, 31 Dec 2025 10:02:01 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/2025-annual-review/</guid><description>A look back at the major changes in 2025: shifting from Cloud Native to AI Native Infrastructure, AI tool ecosystem, and major website improvements.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The waves of technology keep evolving; only by actively embracing change can we continue to create value. In 2025, I chose to move from Cloud Native to AI Native—this year marked a key turning point for personal growth and system reinvention.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;2025 was a turning point for me. This year, I changed not only my technical direction but also the way I approach problems. Moving from Cloud Native infrastructure to AI Native Infrastructure was not just a migration of content, but an upgrade in mindset.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/2025-annual-review/banner.webp" data-img="https://assets.jimmysong.io/images/blog/2025-annual-review/banner.webp" alt="Figure 1: Farewell 2025!" data-caption="Figure 1: Farewell 2025!"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Farewell 2025!&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This year, I conducted a large-scale refactoring of the website and systematically organized the content. Beyond the technical improvements, I want to share my thoughts and changes throughout the year.&lt;/p&gt;
&lt;h2 id="a-bold-shift-embracing-the-ai-native-era"&gt;A Bold Shift: Embracing the AI Native Era&lt;/h2&gt;
&lt;p&gt;At the beginning of 2025, I made an important decision: to reposition myself from a Cloud Native Evangelist to an AI Infrastructure Architect. This was not just a change in title, but a strategic transformation after careful consideration.&lt;/p&gt;
&lt;p&gt;As I witnessed the surge of AI technologies and the rise of Agent-based applications reshaping software, I realized that clinging to the boundaries of Cloud Native might mean missing an era. So, I systematically adjusted the website’s content structure, shifting the focus toward AI Native Infrastructure.&lt;/p&gt;
&lt;p&gt;This transformation was not about abandoning the past, but extending forward from the foundation of Cloud Native. Classic content like Kubernetes and Istio remains and is continuously updated, but new topics such as AI Agent and the AI OSS landscape have been added, forming a more complete knowledge map.&lt;/p&gt;
&lt;h2 id="content-creation-from-technical-details-to-ecosystem-perspective"&gt;Content Creation: From Technical Details to Ecosystem Perspective&lt;/h2&gt;
&lt;h3 id="ai-agent-building-systematic-knowledge"&gt;AI Agent: Building Systematic Knowledge&lt;/h3&gt;
&lt;p&gt;Agents represent a major evolution in software for the AI era. When I tried to understand Agent design principles, I found fragmented information everywhere but no systematic knowledge base.&lt;/p&gt;
&lt;p&gt;So I created content that analyzes the Agent context lifecycle and control loop mechanisms, summarizing several proven architectural patterns. To make complex knowledge easier to digest, I organized it into logical sections so readers can learn step by step.&lt;/p&gt;
&lt;h3 id="ai-tool-ecosystem-mapping-the-open-source-landscape"&gt;AI Tool Ecosystem: Mapping the Open Source Landscape&lt;/h3&gt;
&lt;p&gt;AI tools and frameworks are emerging rapidly, with new projects appearing daily. To help readers quickly grasp the ecosystem, I built a comprehensive AI OSS database.&lt;/p&gt;
&lt;p&gt;This database covers everything from Agent frameworks to development tools and deployment services. I not only included active projects but also established an archive mechanism, preserving detailed information on over 150 historical projects. More importantly, I developed a scoring system to objectively evaluate projects across dimensions like quality and sustainability, helping readers decide which tools are worth investing time in.&lt;/p&gt;
&lt;h3 id="blogging-capturing-technology-trends-faster"&gt;Blogging: Capturing Technology Trends Faster&lt;/h3&gt;
&lt;p&gt;In 2025, I wrote over 120 blog posts. Compared to previous years, these articles focused more on observing and reflecting on technology trends, rather than just technical tutorials.&lt;/p&gt;
&lt;p&gt;I started paying attention to deeper questions: How will AI infrastructure evolve? What does Beijing’s open source initiative mean for the AI industry? What ripple effects might a tech acquisition trigger? These articles helped my readers and me see not only the &amp;ldquo;what&amp;rdquo; of a technology, but also the &amp;ldquo;why&amp;rdquo; and the &amp;ldquo;what’s next.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="user-experience-making-knowledge-easier-to-discover-and-consume"&gt;User Experience: Making Knowledge Easier to Discover and Consume&lt;/h2&gt;
&lt;p&gt;No matter how good the content is, if it can’t be easily found and read, its value is greatly diminished. In 2025, I invested significant effort into website functionality, with one goal: to provide readers with a smoother reading experience.&lt;/p&gt;
&lt;h3 id="comprehensive-search-upgrade"&gt;Comprehensive Search Upgrade&lt;/h3&gt;
&lt;p&gt;As the volume of content grew, the original search function could no longer meet demand. I redesigned the search system to support fuzzy search and result scoring, and optimized index loading performance. More importantly, the new search interface is more user-friendly, supporting keyboard navigation and category filtering so users can find what they want faster.&lt;/p&gt;
&lt;h3 id="multi-device-experience-optimization"&gt;Multi-Device Experience Optimization&lt;/h3&gt;
&lt;p&gt;The mobile reading experience has improved significantly. I refactored the mobile navigation and table of contents, making reading on phones much smoother. I also refined dark mode, fixing several display issues so that images and diagrams look good on dark backgrounds.&lt;/p&gt;
&lt;h3 id="efficiency-revolution-in-content-distribution"&gt;Efficiency Revolution in Content Distribution&lt;/h3&gt;
&lt;p&gt;A major change was optimizing the WeChat Official Account publishing workflow. Previously, publishing website content to WeChat required manual handling of many details; now, it is almost a one-click export. The workflow automatically handles images, metadata, and styles, reducing a half-hour task to just a few minutes.&lt;/p&gt;
&lt;p&gt;Additionally, I added a glossary feature for technical term highlighting and tooltips; improved SEO and social sharing metadata; and cleaned up outdated content. These seemingly minor improvements quietly enhance the user experience.&lt;/p&gt;
&lt;h2 id="content-evolution-more-dimensional-knowledge-expression"&gt;Content Evolution: More Dimensional Knowledge Expression&lt;/h2&gt;
&lt;p&gt;Looking back at content creation in 2025, I found clear changes in several dimensions.&lt;/p&gt;
&lt;h3 id="from-tutorials-to-observations"&gt;From Tutorials to Observations&lt;/h3&gt;
&lt;p&gt;Early content leaned toward technical tutorials and practical guides, showing &amp;ldquo;how to do it.&amp;rdquo; This year, I focused more on &amp;ldquo;why&amp;rdquo; and &amp;ldquo;what the trends are.&amp;rdquo; I wrote more technology trend analyses, ecosystem maps, and in-depth case studies. These may not directly teach you how to use an API, but they help you understand the direction of technological evolution.&lt;/p&gt;
&lt;h3 id="from-chinese-to-bilingual"&gt;From Chinese to Bilingual&lt;/h3&gt;
&lt;p&gt;AI is a global wave and cannot be limited to the Chinese-speaking world. In 2025, I wrote bilingual documentation for almost all new AI tools, and important blog posts also have English versions. This increased the workload, but allowed the content to reach a broader audience.&lt;/p&gt;
&lt;h3 id="from-text-to-multimedia"&gt;From Text to Multimedia&lt;/h3&gt;
&lt;p&gt;Text is efficient, but not all knowledge is best expressed in words. This year, I used many architecture and schematic diagrams to explain complex concepts, adding 59 new charts. These visual elements lower the barrier to understanding, making abstract concepts more intuitive. I also optimized image display in dark mode to ensure a consistent visual experience.&lt;/p&gt;
&lt;h2 id="development-approach-embracing-ai-assisted-programming"&gt;Development Approach: Embracing AI-Assisted Programming&lt;/h2&gt;
&lt;p&gt;2025 was not only a year of shifting content themes toward AI, but also a year of deep practice in AI-assisted programming.&lt;/p&gt;
&lt;p&gt;I developed a VS Code plugin and created many prompts to automate repetitive tasks. I experimented with various AI programming tools and settled on a toolchain that suits me. I even migrated the website to Cloudflare Pages and used its edge computing services to develop a chatbot. These practices greatly improved development efficiency, giving me more time to focus on thinking and creating rather than mechanical coding.&lt;/p&gt;
&lt;p&gt;This made me realize: AI will not replace developers, but developers who use AI well will replace those who do not. I also shared more insights to help others master AI-assisted programming.&lt;/p&gt;
&lt;h2 id="looking-ahead-to-2026-keep-moving-forward"&gt;Looking Ahead to 2026: Keep Moving Forward&lt;/h2&gt;
&lt;p&gt;Looking back at 2025, the site underwent a profound transformation—from a Cloud Native tech blog to an AI infrastructure knowledge base. But this is just the beginning, not the end.&lt;/p&gt;
&lt;p&gt;Looking forward to 2026, I plan to go deeper in several areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Enhancing the knowledge system&lt;/strong&gt;: Continue to supplement GPU infrastructure and AI Agent content, especially practical cases and performance tuning knowledge.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tracking ecosystem evolution&lt;/strong&gt;: AI tools and frameworks iterate rapidly; I need to keep up with this fast-changing ecosystem and update content in a timely manner.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deepening engineering practice&lt;/strong&gt;: Share more practical AI engineering experience to help readers turn theory into practice.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exploring knowledge connections&lt;/strong&gt;: Consider building a knowledge graph to connect different content sections, providing smarter navigation and recommendations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;2025 was a year of change and growth. From Cloud Native to AI Native, from technical practice to ecosystem observation, both the content and functionality of the site have made qualitative leaps.&lt;/p&gt;
&lt;p&gt;What makes me happiest is that this transformation allowed me and my readers to stand at the forefront of the technology wave. We are not just learning new technologies, but thinking about how technology changes the world and the way we write software.&lt;/p&gt;
&lt;p&gt;The waves of technology keep evolving; only by actively embracing change can we continue to create value. Thank you to every reader for your companionship and support. I look forward to sharing more insights and practices in 2026.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Further Reading&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/ai/"&gt;AI OSS Landscape&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/"&gt;2025 Blog Posts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>The Butterfly Effect After Manus Was Acquired by Meta</title><link>https://jimmysong.io/blog/manus-meta-acquisition-butterfly-effect/</link><pubDate>Tue, 30 Dec 2025 03:30:51 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/manus-meta-acquisition-butterfly-effect/</guid><description>Manus&amp;#39;s acquisition by Meta sparked polarized opinions. This article explores the butterfly effect in AI applications and key lessons for entrepreneurs on growth strategies.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The success or failure of AI applications often lies not in the technology itself, but in the ability to scale delivery and create a closed loop.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/manus-meta-acquisition-butterfly-effect/banner.webp" data-img="https://assets.jimmysong.io/images/blog/manus-meta-acquisition-butterfly-effect/banner.webp" alt="Figure 1: The Butterfly Effect After Manus Was Acquired by Meta" data-caption="Figure 1: The Butterfly Effect After Manus Was Acquired by Meta"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: The Butterfly Effect After Manus Was Acquired by Meta&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="when-those-who-discuss-it-are-not-those-who-pay-for-it"&gt;When &amp;ldquo;Those Who Discuss It&amp;rdquo; Are Not &amp;ldquo;Those Who Pay for It&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;On December 30, 2025, a piece of news went viral: Manus was acquired by Meta for billions of dollars (&lt;a href="https://manus.im/blog/manus-joins-meta-for-next-era-of-innovation" target="_blank" rel="noopener"&gt;Manus Joins Meta for Next Era of Innovation&lt;/a&gt;). This startup, founded in China and under pressure from tech giants since its inception, completed a whirlwind journey in less than a year: from explosive growth, to relocating to Singapore, to being acquired by a global giant.&lt;/p&gt;
&lt;p&gt;According to Manus&amp;rsquo;s official statement, its products and subscriptions will continue to be available via the app and website, and the company will remain operational in Singapore. The team will join Meta to provide general Agent capabilities for Meta&amp;rsquo;s consumer and enterprise products (including Meta AI).&lt;/p&gt;
&lt;p&gt;Rather than focusing on &amp;ldquo;who won,&amp;rdquo; I&amp;rsquo;m more interested in the chain reaction this event triggered: it activated completely opposite judgment systems among different groups, and this split is reshaping the growth paths and strategies for AI applications and startups.&lt;/p&gt;
&lt;h2 id="two-public-opinion-arenas-blessings-and-doubts-coexist"&gt;Two Public Opinion Arenas: Congratulations and Doubts Coexist&lt;/h2&gt;
&lt;p&gt;After Manus was acquired, the mainstream sentiment in social circles was one of congratulations and excitement. Many saw it as a stellar example of a Chinese team going global—achieving remarkable results in the most competitive field in a very short time.&lt;/p&gt;
&lt;p&gt;Meanwhile, the comment sections of public accounts became &amp;ldquo;venting valves for counter-narratives,&amp;rdquo; with skepticism centering on three main points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether the technology has real barriers (e.g., &amp;ldquo;there are countless similar products,&amp;rdquo; &amp;ldquo;it&amp;rsquo;s not hard for big companies to build their own&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Valuation and bubble concerns (e.g., &amp;ldquo;another case of the AI bubble&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;Distrust in the buyer&amp;rsquo;s judgment (e.g., &amp;ldquo;giants making desperate bets,&amp;rdquo; &amp;ldquo;history repeating itself&amp;rdquo;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This divergence isn&amp;rsquo;t about who understands AI better, but about different evaluation frameworks: social circles focus on &amp;ldquo;trajectory and outcome,&amp;rdquo; while comment sections focus on &amp;ldquo;legitimacy and worthiness.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="where-does-the-100m-arr-come-from-the-target-users-arent-in-our-social-circles"&gt;Where Does the $100M ARR Come From: The Target Users Aren&amp;rsquo;t in Our Social Circles&lt;/h2&gt;
&lt;p&gt;Many people know Manus mainly through its marketing buzz and controversies, which can lead to skepticism. But if it achieved a &amp;ldquo;strict $100M ARR&amp;rdquo; in 10 months, one fact is clear: &lt;strong&gt;its revenue doesn&amp;rsquo;t depend on broad consensus, but comes from a highly concentrated group of global users with strong willingness to pay.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Manus&amp;rsquo;s core user profile is closer to &amp;ldquo;individuals as production units,&amp;rdquo; including freelancers, indie developers, independent researchers, and the people responsible for delivery in small and medium businesses. They don&amp;rsquo;t care about debates over whether a product is just a &amp;ldquo;wrapper&amp;rdquo;; they care about &amp;ldquo;can I deliver end-to-end tasks,&amp;rdquo; and &amp;ldquo;can this help me hire one less person, work fewer late nights, or avoid juggling ten tools.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This leads to a counterintuitive phenomenon: &lt;strong&gt;those who discuss the most may not pay, while those who pay steadily are often silent.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For these users, tools are not identity badges—they are profit levers.&lt;/p&gt;
&lt;h2 id="three-lessons-for-entrepreneurs-the-growth-paradigm-in-the-ai-application-era-has-changed"&gt;Three Lessons for Entrepreneurs: The Growth Paradigm in the AI Application Era Has Changed&lt;/h2&gt;
&lt;p&gt;Based on the above, the Manus case offers three lessons for entrepreneurs:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Growth No Longer Equals Positive Reviews&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;AI applications can commercialize first and build consensus later. Public opinion can remain divided for a long time, but cash flow doesn&amp;rsquo;t wait for unified recognition.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Heavy Marketing&amp;rdquo; Is Becoming a Capability, Not a Stigma&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As foundational models and capabilities spread rapidly, differentiation is quickly erased. Being seen, understood, and paid for is itself part of the moat. Not all marketing deserves respect, but &amp;ldquo;distribution and mindshare&amp;rdquo; have become unavoidable battlegrounds for AI applications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Globalization Is No Longer a Bonus, but May Be a Survival Strategy&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From willingness to pay and compliance boundaries to talent density and valuation systems, market structure means that many teams &amp;ldquo;can only complete the loop overseas.&amp;rdquo; It&amp;rsquo;s not romantic, but it&amp;rsquo;s reality.&lt;/p&gt;
&lt;h2 id="a-personal-reflection"&gt;A Personal Reflection&lt;/h2&gt;
&lt;p&gt;As someone long engaged in cloud native and AI infrastructure, I&amp;rsquo;m used to evaluating products by their &amp;ldquo;technical barriers.&amp;rdquo; But cases like Manus remind me: at the AI application layer, barriers may not first appear in models or code, but often in &lt;strong&gt;organizational speed, productization capability, delivery loop, and distribution efficiency&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When a system can reliably turn &amp;ldquo;capability&amp;rdquo; into &amp;ldquo;results,&amp;rdquo; it has built a commercial moat—even if its tech stack doesn&amp;rsquo;t meet outsiders&amp;rsquo; ideals of &amp;ldquo;purity.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The biggest butterfly effect of Manus being acquired by Meta may not be the deal itself, but making more entrepreneurs realize: &lt;strong&gt;in the AI era, the winning move is shifting from &amp;ldquo;what model you use&amp;rdquo; to &amp;ldquo;whether you can deliver results at scale.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The acquisition of Manus by Meta is not just a convergence of capital and technology, but also a microcosm of the changing growth paradigm in the AI application era. For entrepreneurs, understanding and mastering &amp;ldquo;user structure,&amp;rdquo; &amp;ldquo;distribution capability,&amp;rdquo; and &amp;ldquo;global closed loops&amp;rdquo; will be key to future competition.&lt;/p&gt;</content:encoded></item><item><title>AI Infra Open Source in China: Analysis of Beijing and Shanghai's Plans</title><link>https://jimmysong.io/blog/beijing-open-source-plan-ai-infra-analysis/</link><pubDate>Thu, 25 Dec 2025 10:01:13 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/beijing-open-source-plan-ai-infra-analysis/</guid><description>Beijing and Shanghai&amp;#39;s open source plans reveal opportunities and challenges for China&amp;#39;s AI infrastructure, balancing technology and governance.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Institutionalized open source marks a new starting point for China&amp;rsquo;s AI Infra, but true breakthroughs and risks lie in the engineering and governance details.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="perspective-on-beijing-and-shanghais-open-source-plans"&gt;Perspective on Beijing and Shanghai&amp;rsquo;s Open Source Plans&lt;/h2&gt;
&lt;p&gt;Using the simultaneous release of open source ecosystem plans by Beijing and Shanghai as a lens, and drawing on China&amp;rsquo;s past foundation practices and international open source governance experience, this article explores the real opportunities, structural constraints, and potential risks as AI infrastructure (AI Infra) enters a new phase of institutionalized open source.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/beijing-open-source-plan-ai-infra-analysis/banner.webp" data-img="https://assets.jimmysong.io/images/blog/beijing-open-source-plan-ai-infra-analysis/banner.webp" alt="Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans" data-caption="Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans"
width="1536"
height="1024"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Beijing and Shanghai successively launch open source ecosystem construction plans&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="why-compare-beijing-and-shanghai-together"&gt;Why Compare Beijing and Shanghai Together&lt;/h2&gt;
&lt;p&gt;I rarely write an article solely because of a local policy document. However, over Christmas, the Bureaus of Economy and Information Technology of both Beijing and Shanghai released their respective open source ecosystem construction plans:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mp.weixin.qq.com/s/9YEL1HORWatsol3nRT596w" target="_blank" rel="noopener"&gt;Building an Open Source Innovation Highland! Beijing Releases Open Source Ecosystem Construction Implementation Plan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mp.weixin.qq.com/s/QZl66fUllKiePwQ7euhiGQ" target="_blank" rel="noopener"&gt;Shanghai&amp;rsquo;s Implementation Plan for Strengthening the Open Source System | Infographic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This time, the fact that both cities released their plans on the same day sends a signal worth serious attention: China is attempting to advance open source in a more systematic and institutionalized way, especially regarding open source capabilities related to AI Infra.&lt;/p&gt;
&lt;p&gt;If you only look at Beijing&amp;rsquo;s plan, it is easy to interpret it as a local industrial policy upgrade. But when you consider both Beijing and Shanghai&amp;rsquo;s plans together, it looks more like a clearly defined &amp;ldquo;dual-center structure.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The question is no longer whether to develop open source, but:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the AI era, what institutional forms, engineering paths, and governance models will open source take?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="open-source-as-industrial-infrastructure-engineering"&gt;Open Source as &amp;ldquo;Industrial Infrastructure Engineering&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Both Beijing and Shanghai&amp;rsquo;s plans reflect a highly consistent judgment:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Open source is no longer seen as a spontaneous community activity, but as an industrial infrastructure capability that requires systematic construction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is especially evident in the field of AI Infra.&lt;/p&gt;
&lt;p&gt;Issues such as computing power scheduling, model evaluation, toolchains, data elements, license compliance, and supply chain security—previously hidden in &amp;ldquo;engineering details&amp;rdquo;—are now systematically incorporated into policy language for the first time. This at least shows that decision-makers have realized:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI competition is not only about model parameter scale&lt;/li&gt;
&lt;li&gt;It is even more about toolchains, infrastructure, evaluation systems, and engineering capabilities&lt;/li&gt;
&lt;li&gt;These capabilities are naturally well suited to being built as public foundations through open source&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this respect, Beijing and Shanghai are highly aligned.&lt;/p&gt;
&lt;h2 id="two-open-source-paths-infra-vs-platform"&gt;Two Open Source Paths: Infra vs. Platform&lt;/h2&gt;
&lt;p&gt;When we zoom in, the differences between the two plans become clear.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Beijing: &amp;ldquo;Foundation-Oriented&amp;rdquo; Open Source Path for AI Infra&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Beijing&amp;rsquo;s plan focuses on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Heterogeneous computing power scheduling&lt;/li&gt;
&lt;li&gt;Model evaluation toolchains&lt;/li&gt;
&lt;li&gt;Data elements and data governance&lt;/li&gt;
&lt;li&gt;RISC-V software-hardware collaboration&lt;/li&gt;
&lt;li&gt;SBOM, license compatibility, open source compliance&lt;/li&gt;
&lt;li&gt;Supply chain security and industrial resilience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a typical perspective of &amp;ldquo;treating AI as an infrastructure problem.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;It is less concerned with the number of projects or community size, and more with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether reusable engineering capabilities can be formed&lt;/li&gt;
&lt;li&gt;Whether these can be trusted by industry and government over the long term&lt;/li&gt;
&lt;li&gt;Whether they can stand up to scrutiny in terms of security, compliance, and governance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To some extent, Beijing is answering the question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How can open source become a &amp;ldquo;governable, auditable, and scalable public capability&amp;rdquo;?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Shanghai: &amp;ldquo;Scale and Internationalization&amp;rdquo; Path for AI Platform&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In contrast, Shanghai&amp;rsquo;s plan has a different focus:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building an international open source community for artificial intelligence&lt;/li&gt;
&lt;li&gt;Covering the entire platform chain from development, training, testing, hosting, to operation&lt;/li&gt;
&lt;li&gt;Overseas sites, multilingual support, international activities&lt;/li&gt;
&lt;li&gt;Resource linkage through computing vouchers and model vouchers&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Open source platform first release / global simultaneous release&amp;rdquo; dual-release mechanism&lt;/li&gt;
&lt;li&gt;Clear targets for community, enterprise, and developer scale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Shanghai cares more about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How open source can achieve scale effects&lt;/li&gt;
&lt;li&gt;How it can support the growth of commercial enterprises&lt;/li&gt;
&lt;li&gt;How it can be seen and adopted globally&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a path of &amp;ldquo;treating open source as a global digital product and platform capability.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="together-a-complete-but-tension-filled-structure"&gt;Together: A Complete but Tension-Filled Structure&lt;/h2&gt;
&lt;p&gt;When viewed together, Beijing and Shanghai&amp;rsquo;s plans form a more complete picture:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Beijing is responsible for &amp;ldquo;making open source solid,&amp;rdquo; while Shanghai is responsible for &amp;ldquo;taking open source global.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Structurally, this is a clear division of labor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Beijing focuses on institutions, governance, and foundational capabilities&lt;/li&gt;
&lt;li&gt;Shanghai focuses on community, commercialization, and international communication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These two paths are not in conflict; in theory, they are even complementary. The real question is whether they can form positive feedback in practice, rather than operating in silos.&lt;/p&gt;
&lt;h2 id="cautious-attitude-toward-institutionalized-platformized-open-source"&gt;Cautious Attitude Toward &amp;ldquo;Institutionalized, Platformized Open Source&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Precisely because both plans are so &amp;ldquo;systematic,&amp;rdquo; I am even more cautious.&lt;/p&gt;
&lt;p&gt;The reason is simple: this is not China&amp;rsquo;s first attempt to promote open source through foundations, associations, or platforms.&lt;/p&gt;
&lt;p&gt;Over the past decade, we have seen similar paths repeatedly, and recurring structural problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Neutrality and multi-party trust are extremely hard to establish&lt;/li&gt;
&lt;li&gt;There is a huge gap between showcase metrics (quantity, activities, certifications) and ecosystem strength&lt;/li&gt;
&lt;li&gt;Commercialization and long-term maintenance mechanisms are hard to sustain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These problems will not disappear just because the plans are more comprehensive.&lt;/p&gt;
&lt;h2 id="four-risks-to-watch-under-the-dual-plans"&gt;Four Risks to Watch Under the Dual Plans&lt;/h2&gt;
&lt;p&gt;If we are to &amp;ldquo;listen to their words and watch their actions,&amp;rdquo; I would focus on the following four risks:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Will Metrics Hijack Engineering Reality?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When &amp;ldquo;internationally influential projects,&amp;rdquo; &amp;ldquo;star projects,&amp;rdquo; and &amp;ldquo;first-release projects&amp;rdquo; become hard metrics, will this induce packaging, migration, and short-term hype, rather than truly solving engineering problems?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Will It Slide Toward Platform Centralism?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The long-term pattern of AI Infra is closer to a model that prioritizes protocols, standards, and interoperability. If it eventually evolves into &amp;ldquo;a few platforms concentrating resources and discourse power,&amp;rdquo; it may be efficient in the short term but will suppress external participation and international collaboration in the long run.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Is Internationalization Underestimated as an &amp;ldquo;Operational Issue&amp;rdquo;?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;True international collaboration is never just about language, sites, or events; it also involves governance structures, compliance boundaries, and supply chain trust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Will Application Demonstrations Become One-Off Projects&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If &amp;ldquo;first plans&amp;rdquo; and &amp;ldquo;computing vouchers&amp;rdquo; are just procurement tactics without continuous iteration and community feedback mechanisms, the long-term benefit to the ecosystem will be very limited.&lt;/p&gt;
&lt;h2 id="what-are-the-hard-results-of-ai-infra-open-source-after-three-years"&gt;What Are the &amp;ldquo;Hard Results&amp;rdquo; of AI Infra Open Source After Three Years&lt;/h2&gt;
&lt;p&gt;If we review the success of this round of institutionalized open source after three years, I would look for three types of results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether de facto standards and interoperable ecosystems have emerged, including scheduling interfaces, evaluation benchmarks, Agent tool invocation protocols, and observability semantics.&lt;/li&gt;
&lt;li&gt;Whether compliance and supply chain security have become public capabilities—SBOM, license compatibility, vulnerability monitoring—truly productized and service-oriented.&lt;/li&gt;
&lt;li&gt;Whether a sustainable maintenance business mechanism has been established, allowing core maintainers to stay long-term, rather than relying on passion and subsidies.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;If I were to use a North Star metric to measure the success of these plans, it would be the emergence of several outstanding open source commercial companies rooted in China and serving the world.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The open source ecosystem plans of Beijing and Shanghai mark a new phase of institutionalization and engineering for AI Infra open source in China. Over the next three years, the real achievements will not be about meeting targets, but about forming sustainable engineering capabilities, de facto standards, and maintenance mechanisms. Only through continuous participation and practice can open source become the public foundation of AI infrastructure.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jxj.beijing.gov.cn/zwgk/2024zcwj/202512/t20251224_4360437.html" target="_blank" rel="noopener"&gt;Beijing Open Source Ecosystem Construction Implementation Plan (2026–2028) - jxj.beijing.gov.cn&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://mp.weixin.qq.com/s/QZl66fUllKiePwQ7euhiGQ" target="_blank" rel="noopener"&gt;Shanghai&amp;rsquo;s Implementation Plan for Strengthening the Open Source System | Infographic - mp.weixin.qq.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>From 2025 Onwards, Software Engineering Shifts from Code-Centric to Runtime and Cost-Centric</title><link>https://jimmysong.io/blog/software-engineering-shift-runtime-cost-2025/</link><pubDate>Wed, 24 Dec 2025 14:59:11 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/software-engineering-shift-runtime-cost-2025/</guid><description>In 2025, software engineering shifts from code-centric to runtime and cost governance. AI and Agents move complexity to runtime, compute, and budget layers, reshaping engineering value.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;In 2025, the core of software engineering is no longer just about code itself, but about runtime controllability and cost governance. This shift is fundamentally reshaping the industry&amp;rsquo;s underlying logic.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Looking back at 2025, I became increasingly aware that this year was not about &amp;ldquo;code becoming unimportant,&amp;rdquo; but about &lt;strong&gt;the value coordinates of engineering shifting as a whole&lt;/strong&gt;. For more than a decade, software engineering has focused on code quality, architectural evolution, and delivery efficiency. But starting in 2025, the key to system success is shifting &lt;strong&gt;towards whether the runtime is controllable and whether costs are governable&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is not just a slogan, but a conclusion repeatedly validated by my real-world experiences throughout the year.&lt;/p&gt;
&lt;h2 id="my-2025-from-platform-engineering-to-runtime-challenges"&gt;My 2025: From &amp;ldquo;Platform Engineering&amp;rdquo; to &amp;ldquo;Runtime Challenges&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;In my annual review, I noted a clear change: I spent less time on &amp;ldquo;how to write a good system,&amp;rdquo; and more time on &amp;ldquo;how to keep the system running stably, reliably, and affordably.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This shift in focus is a natural extension of a decade of cloud native evolution.&lt;/p&gt;
&lt;p&gt;The following timeline diagram illustrates how my focus has changed over recent years:
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/focus-shift-timeline-en.svg" data-img="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/focus-shift-timeline-en.svg" alt="Figure 1: My Focus Shift Timeline" data-caption="Figure 1: My Focus Shift Timeline"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: My Focus Shift Timeline&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;My focus shifted from cloud native platform engineering to LLM application engineering, then to AI infrastructure, and finally to Agentic Runtime with governance and cost control.&lt;/p&gt;
&lt;p&gt;When AI workloads truly enter business scenarios, the core challenges engineers face also change:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Are inference, training, and evaluation competing for the same compute pool?&lt;/li&gt;
&lt;li&gt;Is GPU utilization consistently below expectations?&lt;/li&gt;
&lt;li&gt;Does cost scale linearly and uncontrollably with concurrency?&lt;/li&gt;
&lt;li&gt;Does the system have failure isolation and replay capabilities?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These issues go far beyond the code level.&lt;/p&gt;
&lt;h2 id="industry-consensus-ai-is-shifting-the-focus-of-engineering"&gt;Industry Consensus: AI Is Shifting the Focus of Engineering&lt;/h2&gt;
&lt;p&gt;By 2025, an industry consensus is emerging: AI is rewriting software engineering. But the real change is not happening in the IDE or code completion speed—it is reflected in &lt;strong&gt;the migration of engineering complexity&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Previously, complexity was concentrated in code and interfaces, and problems were solved through abstraction, refactoring, and testing.&lt;/p&gt;
&lt;p&gt;Now, complexity has shifted to the runtime, resource, and cost layers, and must be addressed through scheduling, isolation, observability, and governance.&lt;/p&gt;
&lt;p&gt;This is why the same AI tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Serve as &amp;ldquo;accelerators&amp;rdquo; for junior engineers&lt;/li&gt;
&lt;li&gt;But act as &amp;ldquo;magnifiers&amp;rdquo; for senior engineers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI tools amplify whether you truly understand how systems run in production.&lt;/p&gt;
&lt;h2 id="why-cost-becomes-a-first-principle"&gt;Why &amp;ldquo;Cost&amp;rdquo; Becomes a First Principle&lt;/h2&gt;
&lt;p&gt;In traditional cloud native systems, low CPU utilization is often just an efficiency issue; but in AI systems, &lt;strong&gt;low GPU utilization is often a cash flow problem&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In 2025, I repeatedly encountered scenarios like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Resources &amp;ldquo;seem insufficient,&amp;rdquo; but utilization is not actually high&lt;/li&gt;
&lt;li&gt;Scaling up to solve queuing issues ends up increasing unit costs&lt;/li&gt;
&lt;li&gt;The system lacks clear budget and quota boundaries, so throttling becomes the only way to stop the bleeding&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The root cause of these phenomena is not model selection, but &lt;strong&gt;the lack of a runtime and cost control plane tailored for AI workloads&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The following flowchart visually illustrates the cyclical relationship between GPU resources and cost pressures:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/gpu-cost-cycle-en.svg" data-img="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/gpu-cost-cycle-en.svg" alt="Figure 2: GPU Resource and Cost Cycle in AI Systems" data-caption="Figure 2: GPU Resource and Cost Cycle in AI Systems"
width="2263"
height="320"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: GPU Resource and Cost Cycle in AI Systems&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In AI systems, limited GPU supply leads to queuing and waiting, which causes throughput to drop. Attempts to solve this through blind scaling only increase unit costs and create budget pressure, ultimately forcing the adoption of finer scheduling and governance strategies.&lt;/p&gt;
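&lt;p&gt;The effect of blind scaling on unit cost can be shown with a little arithmetic. The sketch below is a simplified model, not a measurement: it assumes a fixed request demand, a fixed peak throughput per GPU, and a flat hourly GPU price, and all of the numbers and names are hypothetical.&lt;/p&gt;

```python
def pool_economics(demand_rps: float, gpus: int,
                   gpu_hourly_usd: float, peak_rps_per_gpu: float):
    """Return (utilization, cost per 1,000 served requests) for a GPU pool.

    Simplified model: every GPU is billed for the full hour whether or not
    it is busy, and the pool can never serve more than its peak capacity.
    """
    if not gpus > 0:
        raise ValueError("the pool needs at least one GPU")
    capacity_rps = gpus * peak_rps_per_gpu
    served_rps = min(demand_rps, capacity_rps)   # excess demand queues
    utilization = served_rps / capacity_rps
    hourly_cost = gpus * gpu_hourly_usd
    cost_per_1k = hourly_cost / (served_rps * 3600) * 1000
    return utilization, cost_per_1k


# Hypothetical numbers: 40 requests/s of demand, 10 requests/s per GPU, 2 USD per GPU-hour.
for gpus in (4, 8, 16):
    utilization, cost = pool_economics(40.0, gpus, 2.0, 10.0)
    print(f"{gpus:>2} GPUs: utilization {utilization:.0%}, {cost:.3f} USD per 1k requests")
```

&lt;p&gt;Under these assumptions, every doubling of the pool at constant demand halves utilization and doubles the cost per request, which is why scheduling and sharing usually come before adding hardware.&lt;/p&gt;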
&lt;p&gt;Engineering problems ultimately manifest as cost issues.&lt;/p&gt;
&lt;h2 id="the-rise-of-agents-the-real-challenge-is-at-runtime"&gt;The Rise of Agents: The Real Challenge Is at Runtime&lt;/h2&gt;
&lt;p&gt;In 2025, Agents became a hot topic; by 2026, they will enter the &amp;ldquo;can it actually run&amp;rdquo; stage.&lt;/p&gt;
&lt;p&gt;The challenge for Agents has never been about &amp;ldquo;how smart they are,&amp;rdquo; but rather:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether there are clear permission and data boundaries&lt;/li&gt;
&lt;li&gt;Whether they run in an isolated execution environment&lt;/li&gt;
&lt;li&gt;Whether they can be observed, evaluated, and replayed&lt;/li&gt;
&lt;li&gt;Whether they are subject to explicit cost and budget constraints&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities form the outline of the &lt;strong&gt;Agentic Runtime&lt;/strong&gt; that I have been trying to clarify throughout the year.&lt;/p&gt;
&lt;p&gt;The following flowchart shows the core capability layers of Agentic Runtime:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/agentic-runtime-layers-en.svg" data-img="https://assets.jimmysong.io/images/blog/software-engineering-shift-runtime-cost-2025/agentic-runtime-layers-en.svg" alt="Figure 3: Agentic Runtime Capability Layers" data-caption="Figure 3: Agentic Runtime Capability Layers"
width="463"
height="983"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Agentic Runtime Capability Layers&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Agentic Runtime builds from the foundation of Agents and workflows, connecting through orchestration and tool protocols, with the runtime managing state, memory, and evaluation. It provides secure execution environments (Sandbox and Policy), and ultimately implements a resource and cost control plane that unifies GPU, quota, and billing management.&lt;/p&gt;
&lt;p&gt;Without a runtime, an Agent is just a demo; without cost constraints, an Agent is just a risk amplifier.&lt;/p&gt;
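&lt;p&gt;To make &amp;ldquo;explicit cost and budget constraints&amp;rdquo; concrete, here is a minimal sketch of a per-run budget guard. It is an illustration only: &lt;code&gt;RunBudget&lt;/code&gt;, &lt;code&gt;BudgetExceeded&lt;/code&gt;, and the step names are hypothetical and do not come from any particular Agent framework.&lt;/p&gt;

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a step would push an agent run past its budget."""


@dataclass
class RunBudget:
    limit_usd: float
    spent_usd: float = 0.0
    ledger: list = field(default_factory=list)  # (step, cost) pairs, kept for audit and replay

    @property
    def remaining_usd(self) -> float:
        return self.limit_usd - self.spent_usd

    def charge(self, step: str, cost_usd: float) -> None:
        """Record a step's cost, refusing it if the run would exceed its limit."""
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"step {step!r} needs {cost_usd:.2f} USD, "
                f"only {self.remaining_usd:.2f} USD left"
            )
        self.spent_usd += cost_usd
        self.ledger.append((step, cost_usd))


budget = RunBudget(limit_usd=1.0)
budget.charge("plan", 0.25)
budget.charge("tool:search", 0.5)
try:
    budget.charge("llm:summarize", 0.5)  # would exceed the 1.00 USD limit
except BudgetExceeded as err:
    print("run stopped:", err)
```

&lt;p&gt;The point of the design is that the check happens before the expensive call is made, and the ledger gives the runtime something to observe and replay afterwards.&lt;/p&gt;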
&lt;h2 id="outlook-for-2026-the-foundation-of-engineering-matters-again"&gt;Outlook for 2026: The &amp;ldquo;Foundation&amp;rdquo; of Engineering Matters Again&lt;/h2&gt;
&lt;p&gt;Looking ahead to 2026, I remain cautiously optimistic.&lt;/p&gt;
&lt;p&gt;I do not believe the future belongs to &amp;ldquo;those who write the best prompts,&amp;rdquo; but more likely to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Those who understand runtime boundaries&lt;/li&gt;
&lt;li&gt;Those who can govern compute as a constrained resource&lt;/li&gt;
&lt;li&gt;Those who design AI systems as long-running systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From 2025 onwards, software engineering is no longer code-centric, but &lt;strong&gt;runtime and cost-centric&lt;/strong&gt;. This is not a regression, but a return: a return to being responsible for the whole system and for real-world constraints.&lt;/p&gt;
&lt;p&gt;For me personally, this is both a year-end summary and the direction I will continue to invest in for the coming years.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In 2025, the focus of software engineering has shifted from code itself to runtime and cost governance. The rise of AI and Agents has not diminished the value of engineering, but has pushed complexity to a higher level. In the future, understanding runtime, managing compute and cost will become the new core competencies for engineers. I hope this year-end review provides some inspiration and reflection for fellow professionals.&lt;/p&gt;</content:encoded></item><item><title>From Cloud Native to AI Native: Why Kubernetes Is the Foundation for Next-Gen AI Agents</title><link>https://jimmysong.io/blog/ai-native-from-cloud-native/</link><pubDate>Wed, 24 Dec 2025 12:25:52 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-native-from-cloud-native/</guid><description>Explores why AI Agents need Kubernetes infrastructure and how Agent orchestration, MCP services, and AI gateways enable production-ready AI architectures.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;As a long-time practitioner in the cloud native field, I am increasingly convinced of one thing: &lt;strong&gt;AI Agents are not just a change in application form, but a migration of infrastructure paradigms.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As artificial intelligence evolves from demos and copilots to systems that truly take on tasks and responsibilities, &lt;strong&gt;AI Agents&lt;/strong&gt; are becoming the new execution units in enterprise IT architectures. They not only &amp;ldquo;think,&amp;rdquo; but also &lt;strong&gt;act&lt;/strong&gt;: they can invoke tools, access systems, and collaborate to achieve goals.&lt;/p&gt;
&lt;p&gt;This raises an important question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What kind of infrastructure should such systems run on?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In my view, Kubernetes remains a solid choice for large-scale scenarios—but only if we &lt;strong&gt;reimagine Kubernetes in an AI-native way&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="cloud-native-challenges-for-production-grade-ai-agents"&gt;Cloud Native Challenges for Production-Grade AI Agents&lt;/h2&gt;
&lt;p&gt;In real production environments, AI Agents expose infrastructure needs that are fundamentally different from traditional microservices. Agents are not &amp;ldquo;just another HTTP service&amp;rdquo;; they have three distinct characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Behavior is non-deterministic&lt;/strong&gt; (driven by model inference)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Execution paths are dynamic&lt;/strong&gt; (tool invocation cannot be fully enumerated in advance)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decisions must be auditable, constrained, and reviewable&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we simply apply existing cloud native infrastructure, we quickly hit bottlenecks.&lt;/p&gt;
&lt;p&gt;The following table summarizes the main challenges and risks AI Agents face in cloud native environments:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Challenge Category&lt;/th&gt;
&lt;th&gt;Real Needs of Agents&lt;/th&gt;
&lt;th&gt;What Happens If Missing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Policy &amp;amp; Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dynamic control of tool and data access based on context, identity, and task&lt;/td&gt;
&lt;td&gt;Agents have &amp;ldquo;superuser&amp;rdquo; privileges, risks are uncontrollable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not just &amp;ldquo;did it succeed,&amp;rdquo; but also &lt;strong&gt;why was this decision made&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hard to debug, hard to review, hard to hold accountable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Governance &amp;amp; Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Platform-level guardrails enforce organizational policies&lt;/td&gt;
&lt;td&gt;Each Agent could become a &amp;ldquo;shadow AI&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Challenges and Risks for AI Agents in Cloud Native Environments
&lt;/figcaption&gt;
&lt;p&gt;All these issues point to one conclusion:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI Agents must be treated as first-class citizens in Kubernetes, not just ordinary workloads.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="core-architecture-making-agents-native-kubernetes-objects"&gt;Core Architecture: Making Agents Native Kubernetes Objects&lt;/h2&gt;
&lt;p&gt;Looking back at the evolution of cloud native technologies, we&amp;rsquo;ve gone through similar stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Physical machines → Virtual machines&lt;/li&gt;
&lt;li&gt;Virtual machines → Containers&lt;/li&gt;
&lt;li&gt;Containers → Microservices&lt;/li&gt;
&lt;li&gt;Microservices → Declarative, governable platforms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI Agents are simply the next step.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A production-ready AI Agent architecture requires at least three layers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Agent Orchestration Layer&lt;/strong&gt;: Declaratively define Agents&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool Service-ization Layer (MCP Services)&lt;/strong&gt;: Turn capabilities into governable services&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Native Data Plane / Gateway&lt;/strong&gt;: Unify policy, security, and protocols&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="agent-orchestration-layer-declarative-agent-management"&gt;Agent Orchestration Layer: Declarative Agent Management&lt;/h2&gt;
&lt;p&gt;Agents should no longer be &amp;ldquo;runtime objects&amp;rdquo; inside an SDK—they should be managed like Pods or Deployments.&lt;/p&gt;
&lt;p&gt;Key concepts:&lt;/p&gt;
&lt;h3 id="agents-as-kubernetes-resources"&gt;Agents as Kubernetes Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Agents are defined using &lt;strong&gt;CRD (CustomResourceDefinition)&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Lifecycle managed via &lt;code&gt;kubectl&lt;/code&gt; or GitOps&lt;/li&gt;
&lt;li&gt;Agent &lt;strong&gt;models, tools, and policies&lt;/strong&gt; are all explicitly declared&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A typical Agent definition includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Agent logic&lt;/strong&gt; (inference loop)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model configuration&lt;/strong&gt; (specifying which large language model to use)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Callable toolset&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;This closely mirrors how we once decomposed &amp;ldquo;applications&amp;rdquo; into Deployments, Services, and ConfigMaps.&lt;/p&gt;
&lt;/blockquote&gt;
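&lt;p&gt;As a rough illustration of what such a declaration might contain, the sketch below builds an Agent manifest as plain data and checks that the model, tools, and policy are all declared. The API group &lt;code&gt;agents.example.com/v1alpha1&lt;/code&gt;, the &lt;code&gt;Agent&lt;/code&gt; kind, and every field name are assumptions made for this example, not the schema of any existing project.&lt;/p&gt;

```python
import json

# Hypothetical custom resource; group, kind, and field names are illustrative only.
AGENT = {
    "apiVersion": "agents.example.com/v1alpha1",
    "kind": "Agent",
    "metadata": {"name": "incident-triage", "namespace": "ai-agents"},
    "spec": {
        "model": {"provider": "example-llm", "name": "example-model"},
        "tools": ["ticket-search", "runbook-lookup"],
        "policy": {"maxToolCallsPerRun": 20},
    },
}

REQUIRED_SPEC_FIELDS = ("model", "tools", "policy")


def missing_spec_fields(manifest: dict) -> list:
    """Return the spec fields that the manifest fails to declare explicitly."""
    spec = manifest.get("spec", {})
    return [name for name in REQUIRED_SPEC_FIELDS if name not in spec]


problems = missing_spec_fields(AGENT)
if problems:
    raise SystemExit(f"agent manifest is missing: {problems}")

# kubectl accepts JSON manifests as well as YAML, so this output could be
# applied once a matching CRD is installed in the cluster.
print(json.dumps(AGENT, indent=2))
```

&lt;p&gt;Because the definition is just declarative data, it can live in Git and be reviewed, diffed, and rolled back like any other Kubernetes resource.&lt;/p&gt;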
&lt;h2 id="tool-service-ization-layer-mcp-services-are-essential"&gt;Tool Service-ization Layer: MCP Services Are Essential&lt;/h2&gt;
&lt;p&gt;In Agent architectures, &lt;strong&gt;tools&lt;/strong&gt; are where real &amp;ldquo;actions&amp;rdquo; happen.&lt;/p&gt;
&lt;p&gt;Early MCP tools were often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local processes&lt;/li&gt;
&lt;li&gt;Tightly coupled to a single Agent&lt;/li&gt;
&lt;li&gt;Lacking versioning, permissions, and auditing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is unsustainable in enterprise environments.&lt;/p&gt;
&lt;h3 id="the-essence-of-mcp-service-ization"&gt;The Essence of MCP Service-ization&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Tools → &lt;strong&gt;Remote services&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Services → &lt;strong&gt;Kubernetes native workloads&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Capabilities → &lt;strong&gt;Reusable, governable, auditable&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This step is fundamentally similar to how we once turned scripts into microservices.&lt;/p&gt;
&lt;h2 id="ai-native-gateway-the-control-plane-entry-for-the-agent-world"&gt;AI Native Gateway: The &amp;ldquo;Control Plane Entry&amp;rdquo; for the Agent World&lt;/h2&gt;
&lt;p&gt;As the number of Agents grows and tools/models diversify, &lt;strong&gt;connectivity itself becomes a system risk&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Traditional API Gateways do not understand scenarios like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP&lt;/li&gt;
&lt;li&gt;Agent-to-Agent (A2A) communication&lt;/li&gt;
&lt;li&gt;Model invocation context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thus, we need an &lt;strong&gt;AI native gateway&lt;/strong&gt; dedicated to mediation and governance.&lt;/p&gt;
&lt;p&gt;It must understand at least three types of traffic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A2T&lt;/strong&gt;: Agent → Tool&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A2L&lt;/strong&gt;: Agent → LLM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A2A&lt;/strong&gt;: Agent ↔ Agent&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And enforce, across these paths:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identity and authorization&lt;/li&gt;
&lt;li&gt;Policy and guardrails&lt;/li&gt;
&lt;li&gt;Auditing and rate limiting&lt;/li&gt;
&lt;/ul&gt;
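&lt;p&gt;The enforcement points above can be sketched as a single authorization function. This is a toy in-memory model under stated assumptions: the &lt;code&gt;Gateway&lt;/code&gt; and &lt;code&gt;AgentPolicy&lt;/code&gt; classes and the agent, tool, and model names are hypothetical, and a real gateway would enforce these checks on network traffic rather than function calls.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    allowed: set      # (traffic path, target) pairs this agent may call
    max_calls: int    # simple per-agent call quota


@dataclass
class Gateway:
    policies: dict                                  # agent identity -> AgentPolicy
    calls: dict = field(default_factory=dict)       # agent identity -> allowed calls so far
    audit_log: list = field(default_factory=list)   # every decision, allowed or denied

    def authorize(self, agent: str, path: str, target: str) -> bool:
        """Check identity, authorization, and quota, then audit the decision."""
        policy = self.policies.get(agent)
        used = self.calls.get(agent, 0)
        if policy is None:
            decision = "deny:unknown-identity"
        elif (path, target) not in policy.allowed:
            decision = "deny:not-authorized"
        elif used >= policy.max_calls:
            decision = "deny:rate-limited"
        else:
            decision = "allow"
            self.calls[agent] = used + 1
        self.audit_log.append((agent, path, target, decision))
        return decision == "allow"


gateway = Gateway(policies={
    "triage-agent": AgentPolicy(
        allowed={("A2T", "ticket-search"), ("A2L", "example-model")},
        max_calls=2,
    ),
})
print(gateway.authorize("triage-agent", "A2T", "ticket-search"))
print(gateway.authorize("triage-agent", "A2A", "billing-agent"))
```

&lt;p&gt;Denied requests are logged as well as allowed ones, since the audit trail is only useful if it also records what an Agent tried and failed to do.&lt;/p&gt;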
&lt;h2 id="architecture-overview"&gt;Architecture Overview&lt;/h2&gt;
&lt;p&gt;The diagram below illustrates the core layers and traffic paths of an AI-native system on Kubernetes:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-native-from-cloud-native/5be5cb784d4b228006abdf024bb99d6f.svg" data-img="https://assets.jimmysong.io/images/blog/ai-native-from-cloud-native/5be5cb784d4b228006abdf024bb99d6f.svg" alt="Figure 1: AI Native Architecture Layers and Traffic Paths" data-caption="Figure 1: AI Native Architecture Layers and Traffic Paths"
width="1311"
height="1642"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Native Architecture Layers and Traffic Paths&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;AI Agents do not negate cloud native; on the contrary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI Agents are the natural extension of cloud native in the era of intelligence.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Declarative → Agent definitions&lt;/li&gt;
&lt;li&gt;Service → MCP Services&lt;/li&gt;
&lt;li&gt;Service Mesh → AI Native Gateway&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If Kubernetes is the &amp;ldquo;automated factory,&amp;rdquo; then AI Agents are the &lt;strong&gt;intelligent workers who actually get things done&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;And the AI native gateway is the &lt;strong&gt;security and governance system tailored for these intelligent workers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This is not an optional architecture—it is &lt;strong&gt;the only path for AI to reach production&lt;/strong&gt;.&lt;/p&gt;</content:encoded></item><item><title>AI Open Source Landscape: A One-Stop Guide to AI Project Navigation and Scoring System</title><link>https://jimmysong.io/blog/ai-oss-landscape-intro/</link><pubDate>Tue, 23 Dec 2025 08:34:05 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-oss-landscape-intro/</guid><description>Comprehensive introduction to the AI Open Source Landscape&amp;#39;s positioning, interface, scoring model, and data mechanisms to help developers efficiently discover quality AI projects.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The AI Open Source Landscape is not just a project directory, but an innovative attempt to bring transparency and quantifiability to the AI open source ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note: This article is intended for general readers and focuses on platform features and usage scenarios. If you want to see the technical details and formulas behind the scoring, please refer to:&lt;/strong&gt; &lt;a href="https://jimmysong.io/ai/ranking-criteria/"&gt;AI Project Scoring and Inclusion Criteria&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="project-background-and-positioning"&gt;Project Background and Positioning&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://jimmysong.io/ai/"&gt;AI Open Source Landscape&lt;/a&gt; aims to provide developers, researchers, and enterprise users with a one-stop navigation and evaluation platform for AI open source projects. With the rapid development of large language models (LLMs), multimodal models, and other AI technologies, the open source community has seen a surge of innovative projects. However, information is scattered and quality varies, making it difficult for users to filter projects and make decisions.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/ai-oss-landscape.webp" data-img="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/ai-oss-landscape.webp" alt="Figure 1: AI Open Source Landscape" data-caption="Figure 1: AI Open Source Landscape"
width="3653"
height="2494"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AI Open Source Landscape&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The AI Open Source Landscape systematically collects mainstream AI open source projects. As of the time of writing, it has included 851 open source projects. This landscape combines a multi-dimensional scoring system to help users efficiently discover, compare, and select the most suitable AI tools and frameworks for their needs. The platform not only focuses on models themselves, but also covers datasets, inference engines, evaluation tools, application frameworks, and the entire ecosystem chain, striving to promote transparency, quantifiability, and sustainable development in the AI open source ecosystem.&lt;/p&gt;
&lt;h2 id="main-interface-and-feature-highlights"&gt;Main Interface and Feature Highlights&lt;/h2&gt;
&lt;p&gt;The platform homepage presents project distribution in both landscape and list views, supporting category filtering, keyword search, and tag navigation to help users quickly locate target projects.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/project-details.webp" data-img="https://assets.jimmysong.io/images/blog/ai-oss-landscape-intro/project-details.webp" alt="Figure 2: Open Source Project Detail Page" data-caption="Figure 2: Open Source Project Detail Page"
width="2780"
height="2915"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Open Source Project Detail Page&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;For general readers, the main experience points include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Card view: One-sentence overview, star rating, and overall score for quick browsing and comparison.&lt;/li&gt;
&lt;li&gt;Health card: Displays overall health and key dimensions (activity, community, influence, sustainability) on the project page or sidebar, with the latest update marked for easy assessment of maintenance status.&lt;/li&gt;
&lt;li&gt;Detail page: Provides more background information, project links, and application scenarios to help you evaluate suitability for your needs.&lt;/li&gt;
&lt;li&gt;Smart badges: Visually display labels such as &amp;ldquo;Active&amp;rdquo;, &amp;ldquo;New Project&amp;rdquo;, &amp;ldquo;Popular&amp;rdquo;, &amp;ldquo;Archived&amp;rdquo; on cards, helping you quickly capture key project features.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are interested in the specific rules for badge determination or scoring, detailed explanations are available on the &lt;a href="https://jimmysong.io/ai/ranking-criteria/"&gt;Scoring Rules Page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="scoring-and-ranking-mechanism"&gt;Scoring and Ranking Mechanism&lt;/h2&gt;
&lt;p&gt;The platform uses multi-dimensional scores to reflect the overall health and popularity of projects. The main dimensions include: &lt;strong&gt;Activity&lt;/strong&gt;, &lt;strong&gt;Community&lt;/strong&gt;, &lt;strong&gt;Quality&lt;/strong&gt;, &lt;strong&gt;Sustainability&lt;/strong&gt;, and the comprehensive &lt;strong&gt;Health&lt;/strong&gt; score. These scores help you quickly judge whether a project is suitable for production or experimentation.&lt;/p&gt;
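&lt;p&gt;To give a feel for how dimension scores might roll up into a single number, here is a purely illustrative weighted average. The equal weights and the 0 to 100 scale are assumptions made for this sketch; the actual formulas are documented on the scoring rules page and may differ.&lt;/p&gt;

```python
# Hypothetical equal weights; the real scoring model is defined on the scoring rules page.
WEIGHTS = {
    "activity": 0.25,
    "community": 0.25,
    "quality": 0.25,
    "sustainability": 0.25,
}


def health_score(scores: dict) -> float:
    """Combine per-dimension scores (assumed 0 to 100) into one weighted score."""
    missing = [name for name in WEIGHTS if name not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"activity": 80, "community": 60, "quality": 100, "sustainability": 40}
print(health_score(example))
```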
&lt;h2 id="data-sources-and-update-mechanism"&gt;Data Sources and Update Mechanism&lt;/h2&gt;
&lt;p&gt;The platform&amp;rsquo;s data mainly comes from GitHub, project lists, official documentation, and community recommendations. We regularly and automatically synchronize and update metrics to ensure that the &amp;ldquo;last updated&amp;rdquo; and scores displayed on the interface reflect the current maintenance status of projects. Projects that have not been updated for a long time or are determined to be &amp;ldquo;inactive&amp;rdquo; are moved to the &lt;a href="https://jimmysong.io/ai/archived/"&gt;Archived Page&lt;/a&gt;. Archived projects remain searchable and retain historical scores, but will not appear in the default view of active rankings, making it easier for readers to focus on projects that are still maintained and active.&lt;/p&gt;
&lt;p&gt;For general readers, the key points are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The page displays key metrics and &amp;ldquo;last updated&amp;rdquo; time, helping you quickly judge whether a project is still maintained.&lt;/li&gt;
&lt;li&gt;The AI Open Source Landscape continuously iterates on the scoring model to improve fairness and differentiation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="how-to-contribute-and-correct-data"&gt;How to Contribute and Correct Data&lt;/h2&gt;
&lt;p&gt;If you want a project to be included or its data updated, you can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/rootsongjc/rootsongjc.github.io/issues/new?template=ai-resource.md" target="_blank" rel="noopener"&gt;Submit an AI Open Source Project Inclusion Request&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Keep the project&amp;rsquo;s README, License, documentation, and other information complete in the repository to facilitate our data collection and assessment.&lt;/li&gt;
&lt;li&gt;For faster synchronization or if you encounter data issues, contact the maintainers via project issues or raise a request in the site discussion area.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="typical-use-cases-or-user-feedback"&gt;Typical Use Cases or User Feedback&lt;/h2&gt;
&lt;p&gt;The AI Open Source Landscape has been widely used in various scenarios such as AI developer selection, enterprise technology research, and academic studies. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Developers can quickly filter models or tools that meet their needs through the platform, saving significant research time.&lt;/li&gt;
&lt;li&gt;Enterprise technical teams use the ranking lists for competitor analysis and technology planning.&lt;/li&gt;
&lt;li&gt;Educational and research institutions refer to the landscape to understand trends in the AI open source ecosystem, supporting course design and topic selection.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some users have commented that the platform is &amp;ldquo;comprehensive, well-structured, and fair in scoring,&amp;rdquo; greatly improving the efficiency of AI project selection and learning. Community suggestions continue to drive ongoing improvements in platform features and content.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The AI Open Source Landscape systematically and quantitatively organizes the AI open source ecosystem: the backend worker is responsible for reliable data collection and scoring calculations (supporting backfill and migration), while frontend components handle fast rendering and visualization (including smart badges, health cards, and metric explanations).&lt;/p&gt;
&lt;p&gt;If you want to learn more about the scoring details or participate in improvements:&lt;/p&gt;
&lt;p&gt;The community is welcome to join in evaluation, backfilling historical data, and refining scoring rules, working together to make the AI open source ecosystem more transparent and sustainable.&lt;/p&gt;</content:encoded></item><item><title>AI 2026: Infrastructure, Agents, and the Next Cloud-Native Shift</title><link>https://jimmysong.io/blog/ai-2026-infra-agentic-runtime/</link><pubDate>Fri, 19 Dec 2025 03:54:31 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-2026-infra-agentic-runtime/</guid><description>2026 AI&amp;#39;s turning point: not models, but infrastructure, agentic runtimes, GPU efficiency, and new organizational forms.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The real turning point for AI in 2026 is not autonomy, but the maturity of infrastructure - where agentic runtimes, GPU efficiency, and organizational design will decide who wins.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="introduction-2026-is-not-an-ai-moment-it-is-an-infrastructure-moment"&gt;Introduction: 2026 Is Not an AI Moment, It Is an Infrastructure Moment&lt;/h2&gt;
&lt;p&gt;Over the past fifteen years, every major shift in software has followed a familiar arc. Microservices were adopted not out of love for distributed systems, but because monoliths reached organizational limits. Kubernetes succeeded not because containers were novel, but because infrastructure finally matched how teams operated. Cloud native was never about YAML—it was about &lt;strong&gt;operability at scale&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;AI now stands at a similar inflection point.&lt;/p&gt;
&lt;p&gt;The central question for 2026 is not whether models will become more autonomous. That debate overlooks the core issue. Instead, the real question is whether AI can become &lt;strong&gt;operable, governable, and economically sustainable&lt;/strong&gt; within real systems.&lt;/p&gt;
&lt;p&gt;Most organizations today are limited not by intelligence, but by infrastructure: inefficient GPU utilization, escalating inference costs, fragile agent demos, and a tendency to treat AI as a feature rather than a runtime. The next phase of AI will be shaped not by model breakthroughs, but by the maturity of AI infrastructure and its ability to absorb responsibility.&lt;/p&gt;
&lt;h2 id="from-automation-to-capability-multiplication--a-familiar-cloud-native-pattern"&gt;From Automation to Capability Multiplication — A Familiar Cloud-Native Pattern&lt;/h2&gt;
&lt;p&gt;Reflecting on early cloud adoption, the dominant narrative was cost reduction: fewer servers, lower CapEx, elastic scaling. Yet, the true payoff emerged later, when teams realized cloud enabled &lt;strong&gt;entirely new operating models&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;AI is repeating this pattern.&lt;/p&gt;
&lt;p&gt;The following diagram illustrates the shift from automation to capability multiplication.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/from-automation-to-capability-multilication.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/from-automation-to-capability-multilication.svg" alt="Figure 1: From Automation to Capability Multiplication" data-caption="Figure 1: From Automation to Capability Multiplication"
width="1642"
height="1214"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: From Automation to Capability Multiplication&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The first wave of AI focused on labor replacement. The second wave reframes AI as &lt;strong&gt;capability multiplication&lt;/strong&gt;: the same team, observing more signals, covering broader areas, and acting sooner.&lt;/p&gt;
&lt;p&gt;This mirrors the evolution of monitoring, tracing, and SRE practices. Rather than reducing engineers, these systems enabled continuous observation instead of occasional sampling.&lt;/p&gt;
&lt;p&gt;Preemptive AI systems—monitoring every interaction, log, and signal—are only viable if the underlying infrastructure can support them. This exposes a critical constraint: &lt;strong&gt;AI capability scales faster than AI infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Without efficient scheduling, isolation, and utilization, multiplying capability simply multiplies cost.&lt;/p&gt;
&lt;h2 id="agents-are-becoming-distributed-systems-whether-we-admit-it-or-not"&gt;Agents Are Becoming Distributed Systems, Whether We Admit It or Not&lt;/h2&gt;
&lt;p&gt;The industry often discusses agents as products. In reality, agents are evolving into &lt;strong&gt;distributed systems&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The diagram below highlights this architectural shift.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/agents-are-becoming-distributed-systems.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/agents-are-becoming-distributed-systems.svg" alt="Figure 2: Agents Are Becoming Distributed Systems" data-caption="Figure 2: Agents Are Becoming Distributed Systems"
width="1102"
height="1382"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Agents Are Becoming Distributed Systems&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Single-agent designs resemble early monoliths: impressive demos, fragile behavior, and opaque failure modes. As tasks grow in complexity, systems must decompose work into planning, execution, verification, and review—making coordination inevitable.&lt;/p&gt;
&lt;p&gt;This is not merely a philosophical change, but an architectural one.&lt;/p&gt;
&lt;p&gt;Multi-agent systems introduce challenges familiar from the microservices era:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Coordination and orchestration&lt;/li&gt;
&lt;li&gt;Resource contention&lt;/li&gt;
&lt;li&gt;Fault isolation&lt;/li&gt;
&lt;li&gt;Observability and rollback&lt;/li&gt;
&lt;li&gt;Deterministic artifacts between stages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Labeling this as &amp;ldquo;multi-agent collaboration&amp;rdquo; can be misleading. What is actually occurring is &lt;strong&gt;workload decomposition and control-plane emergence&lt;/strong&gt;. Agents are transitioning from tools to workloads competing for limited resources.&lt;/p&gt;
&lt;p&gt;Recognizing this clarifies why agent progress is inseparable from infrastructure maturity.&lt;/p&gt;
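&lt;p&gt;To make &amp;ldquo;deterministic artifacts between stages&amp;rdquo; concrete, here is a minimal Python sketch. It assumes a four-stage pipeline (plan, execute, verify, review) in which each stage emits a content-addressed record that points at its predecessor. The names &lt;code&gt;make_artifact&lt;/code&gt; and &lt;code&gt;run_pipeline&lt;/code&gt;, and the hard-coded stage logic, are hypothetical and stand in for real agent calls.&lt;/p&gt;

```python
import hashlib
import json


def make_artifact(stage: str, payload: dict, parent: str = "") -> dict:
    """Build a stage artifact whose digest depends only on its content."""
    body = {"stage": stage, "parent": parent, "payload": payload}
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "digest": digest}


def run_pipeline(task: str) -> list:
    """Toy pipeline: each stage consumes the previous artifact by digest.

    The stage bodies are placeholders (assumptions), not real agent logic.
    """
    plan = make_artifact("plan", {"task": task, "steps": ["edit", "test"]})
    execution = make_artifact(
        "execute", {"done": plan["payload"]["steps"]}, plan["digest"]
    )
    verification = make_artifact(
        "verify",
        {"passed": len(execution["payload"]["done"]) == 2},
        execution["digest"],
    )
    review = make_artifact(
        "review",
        {"approved": verification["payload"]["passed"]},
        verification["digest"],
    )
    return [plan, execution, verification, review]


if __name__ == "__main__":
    for artifact in run_pipeline("fix flaky test"):
        print(artifact["stage"], artifact["digest"][:12])
```

&lt;p&gt;Because every artifact is hashed from canonical content and linked to its parent, two runs with the same input yield the same chain. That property is what makes replay, audit, and rollback tractable once several agents share a workflow.&lt;/p&gt;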
&lt;h2 id="ai-infra-is-the-missing-layer-between-models-and-organizations"&gt;AI Infra Is the Missing Layer Between Models and Organizations&lt;/h2&gt;
&lt;p&gt;Cloud native taught us that abstractions only scale when a control plane exists.&lt;/p&gt;
&lt;p&gt;Currently, AI lacks a mature control plane.&lt;/p&gt;
&lt;p&gt;The following image demonstrates the gap between models and organizations.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/ai-infra-is-the-missing-layer-between-models-and-organizations.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/ai-infra-is-the-missing-layer-between-models-and-organizations.svg" alt="Figure 3: AI Infra Is the Missing Layer Between Models and Organizations" data-caption="Figure 3: AI Infra Is the Missing Layer Between Models and Organizations"
width="1102"
height="1260"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: AI Infra Is the Missing Layer Between Models and Organizations&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Models are powerful, but the surrounding infrastructure—scheduling, isolation, quota enforcement, cost attribution, observability—remains primitive, especially at the GPU layer.&lt;/p&gt;
&lt;p&gt;GPUs are expensive, scarce, and often underutilized. In many environments, utilization remains below 30–40%, while inference costs continue to rise. Training pipelines monopolize resources, inference workloads spike unpredictably, and organizations must choose between waste and throttling innovation.&lt;/p&gt;
&lt;p&gt;This is not a model problem. It is fundamentally an &lt;strong&gt;AI infrastructure problem&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The next phase of AI will depend on treating GPUs as we learned to treat CPUs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fine-grained allocation&lt;/li&gt;
&lt;li&gt;Fair sharing&lt;/li&gt;
&lt;li&gt;Preemption and prioritization&lt;/li&gt;
&lt;li&gt;Clear ownership and accounting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Until GPU utilization becomes a primary design goal, AI systems will remain economically fragile.&lt;/p&gt;
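&lt;p&gt;As a rough illustration of what CPU-style treatment could look like, the Python sketch below models a single GPU that is split into fractional shares. It covers only three of the ideas above: fine-grained allocation, priority-based preemption, and per-team accounting. All names (&lt;code&gt;Workload&lt;/code&gt;, &lt;code&gt;Gpu&lt;/code&gt;, &lt;code&gt;schedule&lt;/code&gt;, &lt;code&gt;usage_by_team&lt;/code&gt;) are hypothetical, and this is not how any particular scheduler implements it.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    team: str
    gpu_fraction: float  # share of one GPU, e.g. 0.25
    priority: int        # higher number wins


@dataclass
class Gpu:
    capacity: float = 1.0
    running: list = field(default_factory=list)

    def free(self) -> float:
        return self.capacity - sum(w.gpu_fraction for w in self.running)


def schedule(gpu: Gpu, new: Workload) -> list:
    """Place `new` on `gpu`, evicting lower-priority workloads if needed.

    Returns the preempted workloads (an empty list if none were evicted).
    """
    eps = 1e-9  # tolerance for float rounding
    # Only strictly lower-priority workloads may be evicted, lowest first.
    victims = sorted(
        (w for w in gpu.running if new.priority > w.priority),
        key=lambda w: w.priority,
    )
    reclaimable = gpu.free() + sum(w.gpu_fraction for w in victims)
    if new.gpu_fraction > reclaimable + eps:
        raise RuntimeError("not enough GPU capacity, even with preemption")
    preempted = []
    for victim in victims:
        if gpu.free() + eps >= new.gpu_fraction:
            break
        gpu.running.remove(victim)
        preempted.append(victim)
    gpu.running.append(new)
    return preempted


def usage_by_team(gpu: Gpu) -> dict:
    """Accounting view: GPU share currently held by each team."""
    totals = {}
    for w in gpu.running:
        totals[w.team] = totals.get(w.team, 0.0) + w.gpu_fraction
    return totals


if __name__ == "__main__":
    gpu = Gpu(running=[
        Workload("train", "research", 0.5, priority=1),
        Workload("batch", "data", 0.25, priority=0),
    ])
    evicted = schedule(gpu, Workload("inference", "serving", 0.5, priority=2))
    print([w.name for w in evicted], usage_by_team(gpu))
```

&lt;p&gt;The sketch evicts the lowest-priority workload first and stops as soon as the new request fits, so preemption stays as narrow as possible. A production system would also need fairness across teams, isolation between co-located workloads, and enforcement at the device level, none of which are modeled here.&lt;/p&gt;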
&lt;h2 id="domain-expertise-matters-because-infrastructure-finally-exposes-it"&gt;Domain Expertise Matters Because Infrastructure Finally Exposes It&lt;/h2&gt;
&lt;p&gt;As models plateau in general reasoning, differentiation shifts elsewhere.&lt;/p&gt;
&lt;p&gt;The diagram below illustrates how infrastructure exposes domain expertise.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/domain-expertise-matters-because-infrastructure-finally-exposes-it.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/domain-expertise-matters-because-infrastructure-finally-exposes-it.svg" alt="Figure 4: Domain Expertise Matters Because Infrastructure Finally Exposes It" data-caption="Figure 4: Domain Expertise Matters Because Infrastructure Finally Exposes It"
width="1482"
height="1302"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Domain Expertise Matters Because Infrastructure Finally Exposes It&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In cloud-native systems, competitive advantage eventually moved from frameworks to &lt;strong&gt;operational excellence&lt;/strong&gt;: superior runbooks, incident response, and cost control. AI is following a similar trajectory.&lt;/p&gt;
&lt;p&gt;High-value AI systems must operate within dense, rule-heavy domains such as finance, healthcare, manufacturing, and infrastructure operations. What matters is not abstract intelligence, but the ability to encode domain constraints, exceptions, and failure patterns.&lt;/p&gt;
&lt;p&gt;Here, domain experts become central—not as prompt engineers, but as &lt;strong&gt;system shapers&lt;/strong&gt;. Their decisions define agent permissions, human intervention points, and error containment strategies.&lt;/p&gt;
&lt;p&gt;Infrastructure determines whether this expertise can be safely operationalized.&lt;/p&gt;
&lt;h2 id="simulation-is-becoming-the-new-staging-environment-for-ai"&gt;Simulation Is Becoming the New Staging Environment for AI&lt;/h2&gt;
&lt;p&gt;One of the most important lessons from cloud-native operations: distributed systems are not tested in production.&lt;/p&gt;
&lt;p&gt;AI systems that act, plan, and modify state are no exception.&lt;/p&gt;
&lt;p&gt;The following image shows simulation as the new staging environment.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/simulation-is-becoming-the-new-staging-environment-for-ai.svg" data-img="https://assets.jimmysong.io/images/blog/ai-2026-infra-agentic-runtime/simulation-is-becoming-the-new-staging-environment-for-ai.svg" alt="Figure 5: Simulation Is Becoming the New Staging Environment for AI" data-caption="Figure 5: Simulation Is Becoming the New Staging Environment for AI"
width="1062"
height="1482"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Simulation Is Becoming the New Staging Environment for AI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Training and validating agents directly in live environments is unsustainable. The future lies in &lt;strong&gt;simulation-first AI development&lt;/strong&gt;—sandboxed environments that mirror real systems, workloads, and constraints.&lt;/p&gt;
&lt;p&gt;This approach is analogous to staging clusters, chaos engineering, and load testing, but elevated for decision-making systems. Evaluation shifts from static benchmarks to behavioral metrics: intervention rates, rollback frequency, and cost impact.&lt;/p&gt;
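&lt;p&gt;The behavioral metrics mentioned above can be computed from simulation logs with very little code. The Python sketch below assumes a hypothetical &lt;code&gt;Episode&lt;/code&gt; record per simulated run; the field names and the exact metric definitions are illustrative assumptions, not an established standard.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Episode:
    actions: int              # actions the agent attempted
    human_interventions: int  # times a human had to step in
    rolled_back: bool         # whether the episode's changes were reverted
    cost_usd: float           # compute and API cost attributed to the episode


def behavioral_metrics(episodes: list) -> dict:
    """Aggregate per-episode logs into three behavioral metrics."""
    if not episodes:
        raise ValueError("need at least one episode")
    total_actions = sum(e.actions for e in episodes)
    if total_actions == 0:
        raise ValueError("episodes contain no actions")
    count = len(episodes)
    return {
        # Interventions per agent action, across all episodes.
        "intervention_rate":
            sum(e.human_interventions for e in episodes) / total_actions,
        # Share of episodes that ended in a rollback.
        "rollback_frequency":
            sum(1 for e in episodes if e.rolled_back) / count,
        # Mean cost per episode.
        "cost_per_episode_usd":
            sum(e.cost_usd for e in episodes) / count,
    }


if __name__ == "__main__":
    logs = [
        Episode(actions=10, human_interventions=1, rolled_back=False, cost_usd=0.5),
        Episode(actions=10, human_interventions=3, rolled_back=True, cost_usd=1.5),
    ]
    print(behavioral_metrics(logs))
```

&lt;p&gt;Tracking these numbers per agent version, rather than a single benchmark score, gives a release gate that resembles the error budgets used for conventional services.&lt;/p&gt;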
&lt;p&gt;Organizations that build these environments will advance faster and more safely. Those that do not may remain limited by conservative deployments and restricted autonomy.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Technological revolutions succeed not on novelty alone, but when infrastructure, tooling, and organizational models align.&lt;/p&gt;
&lt;p&gt;AI is nearing that pivotal moment.&lt;/p&gt;
&lt;p&gt;The leaders in 2026 will be those who:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Treat AI as a runtime, not just a feature&lt;/li&gt;
&lt;li&gt;Optimize for resource efficiency, especially GPUs&lt;/li&gt;
&lt;li&gt;Recognize agents as distributed systems&lt;/li&gt;
&lt;li&gt;Redesign organizations around continuous learning systems&lt;/li&gt;
&lt;li&gt;Invest in infrastructure ahead of autonomy&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI is no longer just a model problem. It is an infrastructure challenge—and the next phase will be decided not in labs, but in production systems.&lt;/p&gt;</content:encoded></item><item><title>What I Saw at COSCon'25: The Real State of Open Source in China</title><link>https://jimmysong.io/blog/coscon-2025-china-open-source-observation/</link><pubDate>Thu, 18 Dec 2025 06:14:51 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/coscon-2025-china-open-source-observation/</guid><description>From an engineering and organizer&amp;#39;s perspective, real changes at COSCon&amp;#39;25: AI as the default backdrop, discussions returning to engineering issues, and Chinese open source entering a long-term phase.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Attending COSCon'25 in Beijing, I observed firsthand how open source in China is shifting: AI is now the default context, discussions are grounded in real engineering, and the community is embracing long-term thinking. These are not just trends—they are the new reality.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In early December this year, I attended COSCon'25, the China Open Source Annual Conference, in Beijing. Although I have worked in open source for many years, this was my first time participating in an event organized by the Open Source Society—and I joined as a sub-forum producer. Previously, I thought such conferences were too high-level or disconnected from reality, but after actually taking part, I found there was much to gain.&lt;/p&gt;
&lt;p&gt;A quick note: &lt;strong&gt;this article is not an official conference summary or review&lt;/strong&gt;. The organizers have already published detailed information about the event&amp;rsquo;s scale, attendee numbers, and forum sessions. If you&amp;rsquo;re interested in those details, please refer to the official article:
&lt;a href="https://mp.weixin.qq.com/s/1Q5xBUEmSN9MXon03P00lA" target="_blank" rel="noopener"&gt;COSCon'25: The 10th China Open Source Annual Conference Successfully Concludes in Beijing—A Comprehensive Recap!&lt;/a&gt;&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/coscon-2025-china-open-source-observation/banner.webp" data-img="https://assets.jimmysong.io/images/blog/coscon-2025-china-open-source-observation/banner.webp" alt="Figure 1: 10th COSCon Venue" data-caption="Figure 1: 10th COSCon Venue"
width="1080"
height="716"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: 10th COSCon Venue&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;What I want to share is this: &lt;strong&gt;Standing on site, on the engineering front lines, and as an organizer rather than an audience member, I saw real changes happening in Chinese open source.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="this-coscon-no-more-trying-to-prove-open-source-matters"&gt;This COSCon: No More Trying to &amp;ldquo;Prove Open Source Matters&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;One clear impression:
&lt;strong&gt;Almost no one spent time arguing about &amp;ldquo;why do open source&amp;rdquo; anymore.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In earlier years, common narratives included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is open source safe?&lt;/li&gt;
&lt;li&gt;Can open source be commercialized?&lt;/li&gt;
&lt;li&gt;Can China create its own open source projects?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But at COSCon'25, these questions were basically assumed as &amp;ldquo;background conditions.&amp;rdquo; The focus shifted to &lt;strong&gt;those already doing open source, and what comes next&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This doesn&amp;rsquo;t mean the issues have disappeared, but it does mean:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In China&amp;rsquo;s engineering circles, open source is no longer a &amp;ldquo;philosophical choice&amp;rdquo;—it&amp;rsquo;s a practical way of working.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="ai-as-background-noise-not-the-main-character"&gt;AI as Background Noise, Not the Main Character&lt;/h2&gt;
&lt;p&gt;The theme of this year&amp;rsquo;s conference was Open Source × Open Intelligence, but interestingly, &lt;strong&gt;AI did not take center stage&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Instead, it was more like background noise: almost every topic touched on AI, but no one was giving talks solely &amp;ldquo;about AI.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;You would see it repeatedly in areas like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cloud native scheduling, focusing on GPU / NPU / heterogeneous resources&lt;/li&gt;
&lt;li&gt;Storage and data, focusing on data paths for training and inference&lt;/li&gt;
&lt;li&gt;Serverless, focusing on LLM cold starts and elasticity&lt;/li&gt;
&lt;li&gt;Observability, focusing on what to do when system complexity gets out of hand&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AI was not treated as a &amp;ldquo;hot trend,&amp;rdquo; but as &lt;strong&gt;a new workload reality&lt;/strong&gt;.
This is a significant change, though not one easily captured in press releases.&lt;/p&gt;
&lt;h2 id="real-impressions-as-a-cloud-native-sub-forum-producer"&gt;Real Impressions as a Cloud Native Sub-forum Producer&lt;/h2&gt;
&lt;p&gt;I helped organize the cloud native open source sub-forum at this year&amp;rsquo;s conference. This role gave me a perspective very different from that of a typical attendee.&lt;/p&gt;
&lt;h3 id="first-topics-clearly-converged-on-engineering-problems"&gt;First, Topics Clearly Converged on &amp;ldquo;Engineering Problems&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;There were almost no talks about Kubernetes concepts, and very few about &amp;ldquo;architectural philosophies.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Instead, the focus was on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What pitfalls did you encounter at what scale?&lt;/li&gt;
&lt;li&gt;Why did you choose this solution over another?&lt;/li&gt;
&lt;li&gt;Which problems remain unsolved?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many presentations weren&amp;rsquo;t &amp;ldquo;pleasant to hear,&amp;rdquo; but they were very real.&lt;/p&gt;
&lt;h3 id="second-the-boundary-between-academia-and-industry-is-thinning"&gt;Second, The Boundary Between Academia and Industry Is Thinning&lt;/h3&gt;
&lt;p&gt;This was especially evident this year.&lt;/p&gt;
&lt;p&gt;Some talks from universities and research institutes were no longer just &amp;ldquo;from a paper&amp;rsquo;s perspective,&amp;rdquo; but directly addressed core issues in industrial systems, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cold start of serverless LLMs&lt;/li&gt;
&lt;li&gt;The real value of RDMA in inference paths&lt;/li&gt;
&lt;li&gt;Whether prefill/decode separation is truly feasible in engineering&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These topics may not be immediately applicable, but &lt;strong&gt;they are now colliding head-on with engineering problems&lt;/strong&gt;, rather than talking past each other.&lt;/p&gt;
&lt;h3 id="third-open-source-is-no-longer-just-about-code"&gt;Third, Open Source Is No Longer Just About Code&lt;/h3&gt;
&lt;p&gt;In many discussions, &amp;ldquo;governance,&amp;rdquo; &amp;ldquo;maintenance cost,&amp;rdquo; and &amp;ldquo;community collaboration&amp;rdquo; came up frequently.&lt;/p&gt;
&lt;p&gt;This is a signal:
When a project is truly being used, &lt;strong&gt;code is no longer the hardest part&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="main-forum-more-questions-not-answers"&gt;Main Forum: More Questions, Not Answers&lt;/h2&gt;
&lt;p&gt;If I had to sum up the main forum in one sentence:
&lt;strong&gt;It kept raising questions, but wasn&amp;rsquo;t in a hurry to provide answers.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Has the boundary of open source changed in the AI era?&lt;/li&gt;
&lt;li&gt;Should models, data, and chips become part of the open source core?&lt;/li&gt;
&lt;li&gt;Are developers&amp;rsquo; roles being redefined?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;There are no standard answers to these questions, but the fact that they are being raised repeatedly shows they have become common concerns, not just the thoughts of a few.&lt;/p&gt;
&lt;h2 id="exhibition-area-and-sub-forums-closer-to-the-real-ecosystem"&gt;Exhibition Area and Sub-forums: Closer to the Real Ecosystem&lt;/h2&gt;
&lt;p&gt;Compared to the main forum, I personally paid more attention to the sub-forums and exhibition area.&lt;/p&gt;
&lt;p&gt;There, you would see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many projects no longer emphasize &amp;ldquo;who they want to replace&amp;rdquo;&lt;/li&gt;
&lt;li&gt;More discussions about &amp;ldquo;who they can work with&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Several communities are seriously discussing long-term maintenance, not just releasing versions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This may not be glamorous, but it&amp;rsquo;s important.&lt;/p&gt;
&lt;h2 id="a-personal-judgment"&gt;A Personal Judgment&lt;/h2&gt;
&lt;p&gt;If I had to make a judgment about COSCon'25, I would say:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Chinese open source is shifting from &amp;ldquo;can we do it&amp;rdquo; to &amp;ldquo;can we sustain it for the long term.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is a more difficult, but also more realistic, stage.&lt;/p&gt;
&lt;p&gt;This COSCon did not try to create a grand narrative. Instead, it felt like a candid &amp;ldquo;status check&amp;rdquo; at a particular stage:
There are more questions and the participants are more diverse, but the discussions are also closer to the real world.&lt;/p&gt;
&lt;p&gt;Open source doesn&amp;rsquo;t depend on a single conference to move forward, but being on site helps you see more clearly:
&lt;strong&gt;Where exactly are we standing right now?&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>Decoding Goose: Why It Joined AAIF and What This Means for Agentic Runtime</title><link>https://jimmysong.io/blog/goose-aaif-agentic-runtime/</link><pubDate>Fri, 12 Dec 2025 08:16:48 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/goose-aaif-agentic-runtime/</guid><description>An analysis of Block&amp;#39;s Goose project, why it became one of the first Agentic AI Foundation (AAIF) projects, and what this means for Agentic Runtime and the evolution of AI-Native infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Goose is not a project that excites you at first glance in this wave of Agent innovation, but its entry into AAIF signals a deeper shift in how we think about Agentic Runtime and AI-Native infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;At first glance, &lt;a href="https://github.com/block/goose" target="_blank" rel="noopener"&gt;Goose&lt;/a&gt; is not a project that immediately excites people.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/goose-aaif-agentic-runtime/goose.webp" data-img="https://assets.jimmysong.io/images/blog/goose-aaif-agentic-runtime/goose.webp" alt="Figure 1: Goose App UI" data-caption="Figure 1: Goose App UI"
width="2622"
height="2360"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Goose App UI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It doesn&amp;rsquo;t have flashy demos, nor does it showcase overwhelming multimodal capabilities, and it certainly doesn&amp;rsquo;t look like an AI product aimed at consumers. Yet, this seemingly &amp;ldquo;plain&amp;rdquo; project became one of the first donations to the Agentic AI Foundation (AAIF), standing alongside Anthropic&amp;rsquo;s MCP and OpenAI&amp;rsquo;s AGENTS.md.&lt;/p&gt;
&lt;p&gt;This fact alone is worth a closer look.&lt;/p&gt;
&lt;p&gt;This article does not aim to prove how powerful Goose is, but rather to answer three more practical questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What overlooked but long-term critical problems does Goose actually solve?&lt;/li&gt;
&lt;li&gt;Why was it Goose, and not another Agent framework, that entered AAIF?&lt;/li&gt;
&lt;li&gt;What does this mean for Agentic Runtime and AI-Native infrastructure, which I care about?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="gooses-true-positioning-its-not-an-ide-or-a-chatbot"&gt;Goose&amp;rsquo;s True Positioning: It&amp;rsquo;s Not an IDE or a Chatbot&lt;/h2&gt;
&lt;p&gt;If you only look at its surface features, Goose is easily mistaken for one of two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &amp;ldquo;multi-model AI desktop client&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Or an &amp;ldquo;intelligent programming assistant that can run commands&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But inside Block, it was never meant to be just a &amp;ldquo;tool.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Goose&amp;rsquo;s origin is closely tied to Block&amp;rsquo;s engineering environment.&lt;/p&gt;
&lt;p&gt;Block (formerly Square) is a classic engineering-driven company: complex systems, high automation needs, many internal tools, and very high execution costs in real production environments. In its recent AI transformation, Block did not focus on &amp;ldquo;which model to choose&amp;rdquo; or &amp;ldquo;which AI tool to introduce,&amp;rdquo; but directly targeted the engineering execution layer itself.&lt;/p&gt;
&lt;p&gt;Goose was born in this context.&lt;/p&gt;
&lt;p&gt;Its goal is not to &amp;ldquo;help people code faster,&amp;rdquo; but to enable models to &lt;strong&gt;stably and controllably take action&lt;/strong&gt;: run tests, modify code, drive UIs, call internal systems, and operate reliably in real engineering environments.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Goose is more like an executable Agent Runtime than a conversation-centric product.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="blocks-ai-transformation-started-with-organization-not-tools"&gt;Block&amp;rsquo;s AI Transformation Started with Organization, Not Tools&lt;/h2&gt;
&lt;p&gt;To understand Goose, you can&amp;rsquo;t ignore a key organizational shift at Block.&lt;/p&gt;
&lt;p&gt;In an interview with Block&amp;rsquo;s CTO, one signal was very clear: the starting point for AI transformation was not buying tools or stacking models, but the organizational structure itself.&lt;/p&gt;
&lt;p&gt;Block shifted from a business-line GM model to a more functionally oriented structure, making engineering and design the company&amp;rsquo;s core scheduling units again. This is essentially a proactive response to Conway&amp;rsquo;s Law.&lt;/p&gt;
&lt;p&gt;If the organizational structure doesn&amp;rsquo;t allow technical capabilities to be orchestrated centrally, Agents will ultimately remain &amp;ldquo;personal assistants&amp;rdquo; or &amp;ldquo;engineering toys.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;From this perspective, Goose is not just a tool, but a &lt;strong&gt;cultural signal&lt;/strong&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Every employee can use AI to build and execute real system behaviors.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This also explains a fact many overlook:
Goose was not packaged as SaaS, nor was it rushed to commercialization, but was open-sourced and rapidly standardized.&lt;/p&gt;
&lt;p&gt;Because its role inside Block is closer to an &amp;ldquo;operating system for execution models&amp;rdquo; than a product that can be sold separately.&lt;/p&gt;
&lt;h2 id="why-did-goose-enter-aaif-not-because-its-technically-strongest"&gt;Why Did Goose Enter AAIF? Not Because It&amp;rsquo;s &amp;ldquo;Technically Strongest&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;This is what confuses outsiders the most.&lt;/p&gt;
&lt;p&gt;If you only look at flashy features, model support, or community popularity, Goose doesn&amp;rsquo;t stand out. But AAIF&amp;rsquo;s choice was not about &amp;ldquo;maximum capability,&amp;rdquo; but about &lt;strong&gt;whether the position is right&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Looking at the first batch of AAIF projects, a clear chain emerges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP (Anthropic): Defines a safe, standardized way for models to call tools&lt;/li&gt;
&lt;li&gt;AGENTS.md (OpenAI): Defines behavioral conventions for Agents in code repositories&lt;/li&gt;
&lt;li&gt;Goose (Block): A real, runnable, open-source Agent execution framework&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Goose&amp;rsquo;s role is not to set new protocols, but to serve as the &lt;strong&gt;practical carrier and reference implementation&lt;/strong&gt; for these protocols.&lt;/p&gt;
&lt;p&gt;It proves one thing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MCP is not just a paper standard&lt;/li&gt;
&lt;li&gt;Agents are not just research concepts&lt;/li&gt;
&lt;li&gt;In real enterprise environments, they can actually run&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From this angle, Goose&amp;rsquo;s &amp;ldquo;ordinariness&amp;rdquo; is actually an advantage.&lt;/p&gt;
&lt;p&gt;It is not tied to Block&amp;rsquo;s business moat, nor does it have irreplaceable private APIs; it can be forked, replaced, audited—&amp;ldquo;boring&amp;rdquo; enough, and neutral enough.&lt;/p&gt;
&lt;p&gt;And that is the most important trait of public infrastructure.&lt;/p&gt;
&lt;h2 id="gooses-value-lies-not-in-today-but-in-23-years"&gt;Goose&amp;rsquo;s Value Lies Not in Today, But in 2–3 Years&lt;/h2&gt;
&lt;p&gt;From a longer-term perspective, Goose&amp;rsquo;s value becomes clearer.&lt;/p&gt;
&lt;p&gt;What we&amp;rsquo;re experiencing now is much like the early days of containers:
Most Agent projects today are demos, IDE plugins, or workflow wrappers, but what&amp;rsquo;s really missing is a &lt;strong&gt;sustainable, schedulable, observable execution layer&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Goose is already moving in this direction.&lt;/p&gt;
&lt;p&gt;Block&amp;rsquo;s metrics for Goose&amp;rsquo;s success are straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How many human hours are saved each week&lt;/li&gt;
&lt;li&gt;How much non-technical teams reduce their dependence on engineering teams&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Behind this is a judgment I&amp;rsquo;m increasingly convinced of:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What enterprises truly need is not &amp;ldquo;smarter models,&amp;rdquo; but &amp;ldquo;cheaper execution.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The long-term value of Agents lies not in generation quality, but in their execution substitution rate: how much human execution work they can take over.&lt;/p&gt;
&lt;h2 id="aaif-is-an-attempt-at-infrastructure-level-consensus"&gt;AAIF Is an Attempt at Infrastructure-Level Consensus&lt;/h2&gt;
&lt;p&gt;AAIF is trying to do for Agents what CNCF did for cloud native, and it is not guaranteed to succeed.&lt;/p&gt;
&lt;p&gt;But it at least marks a shift:
Agents are no longer just application-layer innovations, but are beginning to enter the stage of infrastructure-layer collaboration.&lt;/p&gt;
&lt;p&gt;As a reference implementation, Goose is likely to remain in this ecosystem for a long time—even if it is replaced, rewritten, or evolved in the future.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;If you see Goose as a &amp;ldquo;product,&amp;rdquo; it is indeed not dazzling.&lt;/p&gt;
&lt;p&gt;But if you place it in the long-term evolution path of Agentic AI, its significance becomes clear:&lt;/p&gt;
&lt;p&gt;It is not the end, but a necessary intermediate state.&lt;/p&gt;
&lt;p&gt;For me, the emergence of Goose further confirms one thing:&lt;/p&gt;
&lt;p&gt;Agentic Runtime is not a conceptual problem, but an engineering and organizational one.&lt;/p&gt;
&lt;p&gt;And that is one of the most worthwhile directions to invest energy in over the next few years.&lt;/p&gt;</content:encoded></item><item><title>ARK: Multi-Agent Systems Are Finally Entering the Engineer's World</title><link>https://jimmysong.io/blog/ark-agentic-runtime-for-kubernetes/</link><pubDate>Thu, 11 Dec 2025 13:19:42 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ark-agentic-runtime-for-kubernetes/</guid><description>How ARK uses cloud-native architecture and declarative runtime to drive engineering adoption of multi-agent systems and shape the Agentic Runtime ecosystem.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;By deeply integrating cloud native and AI, the ARK platform offers a new paradigm for engineering multi-agent systems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;AI Agents are moving from the &amp;ldquo;single agent demo&amp;rdquo; stage to &amp;ldquo;large-scale operation.&amp;rdquo; The real challenge does not lie in the model itself, but in engineering issues at runtime: model management, tool invocation, state maintenance, elastic scaling, team collaboration, observability, deployment, and upgrades. These are problems that traditional agent libraries struggle to solve.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ARK (Agentic Runtime for Kubernetes)&lt;/strong&gt; provides a fully operational, observable, governable, and continuously deliverable multi-agent operating system. It is not a Python library, but a complete runtime platform.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-homepage.webp" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-homepage.webp" alt="Figure 1: ARK Dashboard" data-caption="Figure 1: ARK Dashboard"
width="3176"
height="1822"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: ARK Dashboard&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Note: In this article, ARK refers to McKinsey&amp;rsquo;s open-source &lt;a href="https://github.com/mckinsey/ark-agent-runtime-for-kubernetes" target="_blank" rel="noopener"&gt;ARK (Agentic Runtime for Kubernetes)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;From an engineer&amp;rsquo;s perspective, this article walks through ARK&amp;rsquo;s core capabilities and answers the following questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What engineering challenges does ARK actually solve?&lt;/li&gt;
&lt;li&gt;Why is it worth special attention in the cloud native field?&lt;/li&gt;
&lt;li&gt;How is it fundamentally different from frameworks like LangChain and CrewAI?&lt;/li&gt;
&lt;li&gt;What insights does it offer for the Agentic Runtime ecosystem?&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="ark-architecture-treating-agents-as-kubernetes-native-workloads"&gt;ARK Architecture: Treating Agents as Kubernetes-Native Workloads&lt;/h2&gt;
&lt;p&gt;The core idea of ARK is: &lt;strong&gt;An agent is not a script, but a schedulable, governable, and observable Kubernetes workload.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The following architecture diagram illustrates ARK&amp;rsquo;s underlying structure.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/168007ae485fa14769e5483aa20805d3.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/168007ae485fa14769e5483aa20805d3.svg" alt="Figure 2: ARK Overall Architecture" data-caption="Figure 2: ARK Overall Architecture"
width="2060"
height="1146"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: ARK Overall Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This diagram highlights ARK&amp;rsquo;s key design points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CRDs declare requirements&lt;/strong&gt; (Agent, Model, Team, Tool, Memory, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Controller translates declarations into actual Pods/Services&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The API provides a unified communication entry point and team orchestration&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory supports long-term state management for agents&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP Server enables external systems to become tools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dashboard provides visual management and observability&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ARK adopts the typical cloud-native Operator pattern and applies it to multi-agent systems.&lt;/p&gt;
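&lt;p&gt;The Operator pattern mentioned above can be sketched in a few lines of plain Python. This is a toy reconcile step under stated assumptions, not ARK&amp;rsquo;s actual controller: the &lt;code&gt;reconcile&lt;/code&gt; function, the resource names, and the action tuples are all hypothetical, and a real controller would watch the Kubernetes API rather than compare dictionaries.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map a resource name to its spec (hypothetical shapes).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Declared state (what the custom resources ask for) and observed state.
desired = {
    "agent/researcher": {"model": "default", "replicas": 1},
    "agent/writer": {"model": "default", "replicas": 2},
}
actual = {
    "agent/writer": {"model": "default", "replicas": 1},
    "agent/legacy": {"model": "old", "replicas": 1},
}

plan = reconcile(desired, actual)
print(plan)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The point of the sketch is the shape of the loop: the declaration is the source of truth, and the controller only computes and applies the difference.&lt;/p&gt;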
&lt;h2 id="crd-arks-abstraction-layer"&gt;CRD: ARK&amp;rsquo;s &amp;ldquo;Abstraction Layer&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Unlike traditional agent frameworks where &amp;ldquo;code is logic,&amp;rdquo; ARK uses CRDs (Custom Resource Definitions) to abstract the components of agent applications.&lt;/p&gt;
&lt;p&gt;The main CRD types in ARK include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model&lt;/li&gt;
&lt;li&gt;Agent&lt;/li&gt;
&lt;li&gt;Team&lt;/li&gt;
&lt;li&gt;Tool&lt;/li&gt;
&lt;li&gt;Memory&lt;/li&gt;
&lt;li&gt;Evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These CRDs correspond to all the key components of an agent system.&lt;/p&gt;
&lt;p&gt;The following diagram shows the structure of the CRDs:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/b464d2b85b6d664b51fa48a5aed2fbd0.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/b464d2b85b6d664b51fa48a5aed2fbd0.svg" alt="Figure 3: CRD Structure (Simplified)" data-caption="Figure 3: CRD Structure (Simplified)"
width="795"
height="829"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: CRD Structure (Simplified)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Through CRDs, ARK achieves the following engineering features:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;All resources are GitOps-ready&lt;/strong&gt;, supporting declarative management&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Changes are auditable, reversible, and continuously deliverable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The evolution of models, tools, and agents does not require business code changes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the core of ARK&amp;rsquo;s engineering-oriented design.&lt;/p&gt;
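&lt;p&gt;One practical consequence of describing everything as declarative resources is that the whole system becomes data that can be checked before it is applied. The sketch below is a minimal illustration of that idea: the resource shapes and field names (&lt;code&gt;modelRef&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;) are assumptions made for the example and do not reproduce ARK&amp;rsquo;s real CRD schemas.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
# Hypothetical, simplified resources; real ARK CRDs have their own schemas.
resources = [
    {"kind": "Model", "name": "default", "spec": {"provider": "example"}},
    {"kind": "Tool", "name": "web-search", "spec": {"type": "mcp"}},
    {
        "kind": "Agent",
        "name": "researcher",
        "spec": {"modelRef": "default", "tools": ["web-search", "calculator"]},
    },
]


def missing_references(resources):
    """List references from Agents to Models or Tools that are not declared."""
    declared = {(r["kind"], r["name"]) for r in resources}
    missing = []
    for r in resources:
        if r["kind"] != "Agent":
            continue
        spec = r["spec"]
        if ("Model", spec["modelRef"]) not in declared:
            missing.append((r["name"], "Model", spec["modelRef"]))
        for tool in spec.get("tools", []):
            if ("Tool", tool) not in declared:
                missing.append((r["name"], "Tool", tool))
    return missing


problems = missing_references(resources)
print(problems)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A check like this could run in a CI pipeline against the manifests in Git, which is where the GitOps-style auditability comes from.&lt;/p&gt;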
&lt;h2 id="agent-execution-flow-from-query-to-tool-invocation"&gt;Agent Execution Flow: From Query to Tool Invocation&lt;/h2&gt;
&lt;p&gt;The following image shows how to view query details in the ARK Dashboard.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-queries.webp" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/ark-dashboard-queries.webp" alt="Figure 4: Viewing Query Details in ARK Dashboard" data-caption="Figure 4: Viewing Query Details in ARK Dashboard"
width="3176"
height="1822"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Viewing Query Details in ARK Dashboard&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In ARK, the complete execution flow for an agent receiving a query is as follows:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/67a8b1142ee63f7cacd4d907cd198ce4.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/67a8b1142ee63f7cacd4d907cd198ce4.svg" alt="Figure 5: Agent Execution Flow" data-caption="Figure 5: Agent Execution Flow"
width="1146"
height="591"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Agent Execution Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This flow has the following characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Memory is a built-in part of the execution flow and needs no special-case code&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large language model (LLM) calls and tool invocations are governed by the runtime&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agents can reside in Pods long-term, not just as one-off processes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes ARK more like an &amp;ldquo;agent microservice platform.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Below is an example of a request and response:&lt;/p&gt;
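&lt;p&gt;As a rough illustration of such an exchange, the sketch below models the flow described above (query, memory lookup, model call, tool invocation, response) in plain Python. Every name and payload shape here is hypothetical: &lt;code&gt;fake_model&lt;/code&gt; and &lt;code&gt;fake_weather&lt;/code&gt; are stand-ins, and nothing in it is ARK&amp;rsquo;s actual API.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
import json


def fake_model(prompt):
    """Stand-in for an LLM call: asks for a tool once, then answers."""
    if "TOOL_RESULT" in prompt:
        return {"type": "answer", "text": "It is sunny in Paris."}
    return {"type": "tool_call", "tool": "weather", "args": {"city": "Paris"}}


def fake_weather(city):
    """Stand-in for an external tool."""
    return "sunny in " + city


def handle_query(request, memory, tools):
    """Run one query: load memory, call the model, invoke tools, save memory."""
    history = memory.setdefault(request["session"], [])
    prompt = " | ".join(history + [request["input"]])
    steps = []
    reply = fake_model(prompt)
    while reply["type"] == "tool_call":
        result = tools[reply["tool"]](**reply["args"])
        steps.append({"tool": reply["tool"], "result": result})
        prompt = prompt + " | TOOL_RESULT: " + result
        reply = fake_model(prompt)
    history.extend([request["input"], reply["text"]])
    return {"session": request["session"], "output": reply["text"], "steps": steps}


memory = {}
request = {"session": "demo", "input": "What is the weather in Paris?"}
response = handle_query(request, memory, {"weather": fake_weather})
print(json.dumps(response, indent=2))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The agent code never touches memory or tools directly; the surrounding runtime function does, which is the property the list above describes.&lt;/p&gt;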
&lt;h2 id="the-true-value-of-multi-agent-team-orchestration"&gt;The True Value of Multi-Agent: Team Orchestration&lt;/h2&gt;
&lt;p&gt;ARK&amp;rsquo;s Team CRD allows multiple agents to be woven into a higher-level &amp;ldquo;system,&amp;rdquo; enabling multi-agent collaboration.&lt;/p&gt;
&lt;p&gt;The following diagram shows the collaboration model of a multi-agent team:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/0fb6990e479cd7b5c0ff3c8e8626693b.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/0fb6990e479cd7b5c0ff3c8e8626693b.svg" alt="Figure 6: Multi-Agent Team Collaboration" data-caption="Figure 6: Multi-Agent Team Collaboration"
width="786"
height="499"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Multi-Agent Team Collaboration&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;strong&gt;The engineering value of Team is reflected in:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Making &amp;ldquo;expert collaboration&amp;rdquo; declarative and configurable&lt;/li&gt;
&lt;li&gt;Flexible strategies (such as round-robin, role assignment, and routing)&lt;/li&gt;
&lt;li&gt;A2A Gateway handles message passing&lt;/li&gt;
&lt;li&gt;The Team itself is observable (every round of collaboration is logged)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For enterprises, this means the &amp;ldquo;agent organizational structure&amp;rdquo; can be standardized, replayed, and tuned.&lt;/p&gt;
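&lt;p&gt;One of the simplest collaboration strategies, taking turns in a fixed order, can be sketched as follows. This is an assumption-laden toy rather than the Team CRD&amp;rsquo;s real behavior: the agents are plain functions, and &lt;code&gt;run_team&lt;/code&gt; and its log format are invented for the example.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;
from itertools import cycle


def run_team(agents, task, max_turns):
    """Pass a message around the team in fixed order and log every round."""
    log = []
    message = task
    for turn, (name, agent) in zip(range(max_turns), cycle(agents.items())):
        message = agent(message)
        log.append({"turn": turn, "agent": name, "output": message})
    return message, log


# Hypothetical agents: plain functions standing in for model-backed agents.
team = {
    "planner": lambda text: text + " [planned]",
    "writer": lambda text: text + " [drafted]",
    "reviewer": lambda text: text + " [reviewed]",
}

final, log = run_team(team, "release notes", max_turns=4)
print(final)
for entry in log:
    print(entry["turn"], entry["agent"])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because each round is appended to the log, the collaboration can be replayed and compared across runs, which is what makes the team structure tunable.&lt;/p&gt;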
&lt;h2 id="fundamental-differences-between-ark-and-other-frameworks"&gt;Fundamental Differences Between ARK and Other Frameworks&lt;/h2&gt;
&lt;p&gt;Many engineers, upon first seeing ARK, may wonder:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Is it just LangChain or CrewAI wrapped in Kubernetes?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In fact, there are fundamental differences. The following diagram compares the structural differences between ARK and mainstream agent frameworks:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/8da9b272d1930a2356a6401b6615d134.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-for-kubernetes/8da9b272d1930a2356a6401b6615d134.svg" alt="Figure 7: ARK vs LangChain / AutoGPT / CrewAI" data-caption="Figure 7: ARK vs LangChain / AutoGPT / CrewAI"
width="3454"
height="345"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 7: ARK vs LangChain / AutoGPT / CrewAI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The table below further summarizes the key differences:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Traditional Agent Libraries&lt;/th&gt;
&lt;th&gt;ARK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Pattern&lt;/td&gt;
&lt;td&gt;Write Python code&lt;/td&gt;
&lt;td&gt;Write CRDs (declarative)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Local/Container&lt;/td&gt;
&lt;td&gt;Kubernetes-native scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;Managed inside code&lt;/td&gt;
&lt;td&gt;Memory CR + Service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;td&gt;Integrated at code level&lt;/td&gt;
&lt;td&gt;Tool CR + MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Agent&lt;/td&gt;
&lt;td&gt;Dialog managed in code&lt;/td&gt;
&lt;td&gt;Team CR + A2A protocol&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Almost none&lt;/td&gt;
&lt;td&gt;OTel / Langfuse / Dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Use Cases&lt;/td&gt;
&lt;td&gt;Demo / Prototype / Single Agent&lt;/td&gt;
&lt;td&gt;Enterprise production / Multi-Agent Systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: ARK vs Traditional Agent Libraries
&lt;/figcaption&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LangChain is a &amp;ldquo;library for building agents,&amp;rdquo; while ARK is a &amp;ldquo;platform for running agents.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The two are not in conflict and are, in fact, highly complementary.&lt;/p&gt;
&lt;h2 id="the-engineering-value-of-ark"&gt;The Engineering Value of ARK&lt;/h2&gt;
&lt;p&gt;To summarize ARK&amp;rsquo;s engineering value in simple terms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Turns agents into &lt;strong&gt;governable workloads&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Unifies models, tools, and memory as &lt;strong&gt;reusable resources&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Makes multi-agent collaboration &lt;strong&gt;structured, observable, and tunable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Brings agent upgrades and iteration into &lt;strong&gt;CI/CD + GitOps&lt;/strong&gt; mode&lt;/li&gt;
&lt;li&gt;Enables enterprises to &lt;strong&gt;manage agents like microservices&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a clear evolution path:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent → Service → Platform → Runtime → Operating System&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;ARK is currently positioned at the fourth stage: Runtime.&lt;/p&gt;
&lt;h2 id="insights-for-agentic-runtime"&gt;Insights for Agentic Runtime&lt;/h2&gt;
&lt;p&gt;ARK provides three direct insights for building Agentic Runtimes:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unified Scheduling System&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The agent runtime must run on a unified scheduling system (Kubernetes, MicroVM, Wasmtime, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Declarative Capability Boundaries&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Capability boundaries must be defined through declarative abstractions, including:
&lt;ul&gt;
&lt;li&gt;Model Layer&lt;/li&gt;
&lt;li&gt;Tool Layer&lt;/li&gt;
&lt;li&gt;Memory Layer&lt;/li&gt;
&lt;li&gt;Workflow Layer&lt;/li&gt;
&lt;li&gt;Team Layer&lt;/li&gt;
&lt;li&gt;State Layer&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Observability is essential; otherwise, multi-agent systems cannot be engineered
&lt;ul&gt;
&lt;li&gt;Langfuse&lt;/li&gt;
&lt;li&gt;OTel&lt;/li&gt;
&lt;li&gt;Logs / Events&lt;/li&gt;
&lt;li&gt;Structured JSON&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ARK demonstrates a direction:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-agent systems are an engineering problem, not a prompt engineering problem.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;If you only need to build a simple agent, frameworks like LangChain, CrewAI, and AutoGPT are sufficient.&lt;/p&gt;
&lt;p&gt;But if you want to operate a system composed of dozens or hundreds of agents that need to collaborate, run long-term, and support continuous delivery and governance, runtimes like ARK are the inevitable trend.&lt;/p&gt;
&lt;p&gt;It provides Agentic AI with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A cloud-native runtime model&lt;/li&gt;
&lt;li&gt;Observable execution paths&lt;/li&gt;
&lt;li&gt;Governable abstraction layers&lt;/li&gt;
&lt;li&gt;Extensible, componentized architecture&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, ARK deserves to be regarded as an early model for engineering multi-agent systems.&lt;/p&gt;</content:encoded></item><item><title>Can Open Source Suddenly Disappear? An AI Chat Dev Tool Went 404 Overnight</title><link>https://jimmysong.io/blog/ai-project-lunary-404/</link><pubDate>Thu, 11 Dec 2025 05:20:12 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-project-lunary-404/</guid><description>Lunary, an open-source project in the AI DevTool space, suddenly deleted its GitHub repo, exposing the instability of commercial open source projects.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Open source&amp;rdquo; in the AI era is no longer a trustworthy promise. Commercial projects can withdraw their code at any time, and developers must be wary of the gap between appearances and reality.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-disappearance-of-lunarys-repository-a-real-case-of-open-source-vanishing"&gt;The Disappearance of Lunary&amp;rsquo;s Repository: A Real Case of Open Source &amp;ldquo;Vanishing&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;While updating the AI open source project library on my website, I ran into something that, for the first time, left me stunned:
An &amp;ldquo;open-source AI tool&amp;rdquo; that still promotes itself, with an active website and commercial services, suddenly vanished from GitHub—its repository went straight to 404.&lt;/p&gt;
&lt;p&gt;The project is called Lunary.&lt;/p&gt;
&lt;p&gt;Original repository address:
&lt;a href="https://github.com/lunary-ai/lunary" target="_blank" rel="noopener"&gt;https://github.com/lunary-ai/lunary&lt;/a&gt;
It now returns a 404 Not Found.&lt;/p&gt;
&lt;p&gt;Notably, the official site lunary.ai remains online, but the core promise of an &amp;ldquo;open-source codebase&amp;rdquo; has disappeared.&lt;/p&gt;
&lt;h2 id="lunarys-positioning-and-features"&gt;Lunary&amp;rsquo;s Positioning and Features&lt;/h2&gt;
&lt;p&gt;Here is an overview of Lunary&amp;rsquo;s main features and positioning to help understand its role in the AI tool ecosystem.&lt;/p&gt;
&lt;p&gt;Lunary claims to be an observability and evaluations platform for large language model (LLM) applications, focusing on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM conversation and feedback logs&lt;/li&gt;
&lt;li&gt;Cost, latency, and metrics analysis&lt;/li&gt;
&lt;li&gt;Prompt version management&lt;/li&gt;
&lt;li&gt;Distributed tracing&lt;/li&gt;
&lt;li&gt;Evaluations&lt;/li&gt;
&lt;li&gt;Supports both self-hosted and managed modes&lt;/li&gt;
&lt;li&gt;Provides JS / Python SDKs, integrates with LangChain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Its overall positioning is clear:
&amp;ldquo;Development and debugging tools for AI applications.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In fact, products like this have emerged rapidly over the past year, forming a new AI DevTool track.&lt;/p&gt;
&lt;h2 id="the-reality-and-risks-behind-the-open-source-label"&gt;The Reality and Risks Behind the &amp;ldquo;Open Source&amp;rdquo; Label&lt;/h2&gt;
&lt;p&gt;The core issue is not the tool itself, but its claim to be &amp;ldquo;open source.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Lunary has consistently emphasized:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;Lunary is an open-source platform for developers.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This statement is great for attracting users, as open source implies transparency, trustworthiness, self-hosting, and community participation.&lt;/p&gt;
&lt;p&gt;But now the repository is gone, with only the website continuing its promotion—raising many questions.&lt;/p&gt;
&lt;p&gt;Lunary is not a niche hobby project, but a commercial company-led initiative. If an individual suddenly deletes a repo, it&amp;rsquo;s not surprising, but for a company operating publicly, this move is extremely rare.&lt;/p&gt;
&lt;p&gt;This is the first time I&amp;rsquo;ve truly seen a reality in the AI DevTools space: &amp;ldquo;Open source&amp;rdquo; is being used as a branding term, not a commitment.&lt;/p&gt;
&lt;h2 id="possible-industry-reasons-for-repo-deletion"&gt;Possible Industry Reasons for Repo Deletion&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s analyze some common industry reasons for deleting a repository to help developers understand the motivations behind such actions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Increased commercial pressure&lt;/strong&gt;: These tools often struggle with sustainable business models, prompting teams to shift to closed-source SaaS.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pivoting&lt;/strong&gt;: The company finds the original direction unprofitable and prepares to change course.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Team changes&lt;/strong&gt;: Acquisition, key member departures, or funding issues can all lead to repo shutdowns.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compliance or legal risks&lt;/strong&gt;: Observability products involve user data, which may require public code to be taken down.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regardless of the reason, the impact on users is the same: it is no longer an &amp;ldquo;open-source product.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="the-pseudo-open-source-phenomenon-in-ai-tools"&gt;The &amp;ldquo;Pseudo Open Source&amp;rdquo; Phenomenon in AI Tools&lt;/h2&gt;
&lt;p&gt;The most noteworthy aspect is not Lunary itself, but the rapid spread of this phenomenon in the AI tool space.&lt;/p&gt;
&lt;p&gt;Many projects use &amp;ldquo;open source&amp;rdquo; as a user acquisition strategy but lack open governance and long-term commitment.&lt;/p&gt;
&lt;p&gt;High substitutability, homogeneity, and commercial pressure mean these DevTools have low survival rates.&lt;/p&gt;
&lt;p&gt;When commercial teams lead open source, a single decision can make the repository disappear instantly.&lt;/p&gt;
&lt;p&gt;In the cloud native era, we&amp;rsquo;ve already seen a wave of &amp;ldquo;pseudo open source.&amp;rdquo; In the AI era, this trend is accelerating.&lt;/p&gt;
&lt;h2 id="three-practical-lessons-for-developers"&gt;Three Practical Lessons for Developers&lt;/h2&gt;
&lt;p&gt;Based on this case, here are three practical lessons for developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The &amp;ldquo;open source label&amp;rdquo; does not guarantee trustworthiness&lt;/strong&gt;: Open source projects led by commercial companies without community or foundation backing can be withdrawn at any time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI DevTools are far less stable than infrastructure&lt;/strong&gt;: These tools are not essential, highly replaceable, and have short lifecycles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tool usability should take precedence over &amp;ldquo;open source status&amp;rdquo;&lt;/strong&gt;: A tool may stop being open source at any moment.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="my-first-experience-maintaining-an-ai-project-list-and-facing-repo-deletion"&gt;My First Experience Maintaining an AI Project List and Facing Repo Deletion&lt;/h2&gt;
&lt;p&gt;After collecting hundreds of projects over the past two years, this is the first time I&amp;rsquo;ve encountered a &amp;ldquo;commercial open source project disappearing, official repo 404&amp;rdquo; case.&lt;/p&gt;
&lt;p&gt;To me, this is an industry signal: the AI open source world is entering a period of drift, and commercial projects&amp;rsquo; open source commitments are increasingly unstable.&lt;/p&gt;
&lt;p&gt;It also reminds everyone making technical choices: in the AI era, open source is no longer a label you can automatically trust.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The disappearance of the Lunary repository is not an isolated incident, but a reflection of the &amp;ldquo;pseudo open source&amp;rdquo; phenomenon in the AI tool space. Developers should be cautious about the actual commitments behind the &amp;ldquo;open source&amp;rdquo; label, paying attention to project governance and sustainability. In the future, the boundary between open source and commercial will become even more blurred, and rational judgment and risk awareness will be essential for technical decision-making.&lt;/p&gt;
&lt;p&gt;Lunary&amp;rsquo;s sudden disappearance highlights the instability of open source projects in the AI DevTools space. For developers, technical choices should focus more on project usability and community governance, rather than relying solely on the &amp;ldquo;open source&amp;rdquo; label. As the industry evolves, similar incidents may become more frequent. Only rational judgment and risk awareness can help you stand firm in the fast-changing tech landscape.&lt;/p&gt;</content:encoded></item><item><title>CNCF in the AI Native Era? The Agentic AI Foundation Is Officially Established</title><link>https://jimmysong.io/blog/agentic-ai-foundation-cncf-era/</link><pubDate>Wed, 10 Dec 2025 03:25:38 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/agentic-ai-foundation-cncf-era/</guid><description>An analysis of the background, strategic urgency, differences and division of labor between Agentic AI Foundation (AAIF) and CNCF/CNAI, and its significance for the AI Native era.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The standardization and open collaboration of the agent ecosystem is no longer a luxury, but the deciding factor in whether AI Native can be engineered and put into practice.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;The establishment of &lt;a href="https://aaif.io/" target="_blank" rel="noopener"&gt;AAIF (Agentic AI Foundation)&lt;/a&gt; is the result of leading vendors staking out the &amp;ldquo;agent protocol layer&amp;rdquo; in advance.&lt;/li&gt;
&lt;li&gt;The real challenge is not technical, but how organizations transition from &amp;ldquo;human execution + AI assistance&amp;rdquo; to &amp;ldquo;agent execution + human supervision&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Successful agent adoption requires a phased adoption path, not just a bunch of protocols and demos.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cncf.io/" target="_blank" rel="noopener"&gt;CNCF&lt;/a&gt; and AAIF are complementary: CNCF manages &amp;ldquo;what infrastructure agents run on&amp;rdquo;, AAIF manages &amp;ldquo;how agents collaborate&amp;rdquo;. This matches the system I am building in &lt;a href="https://arksphere.dev/" target="_blank" rel="noopener"&gt;ArkSphere&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="cloud-native-problems-are-solved-ai-native-problems-are-just-beginning"&gt;Cloud Native Problems Are Solved, AI Native Problems Are Just Beginning&lt;/h2&gt;
&lt;p&gt;Over the past decade, Cloud Native technologies like Kubernetes, Service Mesh, and microservices have standardized &amp;ldquo;how applications run in the cloud&amp;rdquo;.
But AI Native faces a completely different challenge:
&lt;strong&gt;It&amp;rsquo;s not about &amp;ldquo;how to deploy a service&amp;rdquo;, but &amp;ldquo;how many behaviors in the system can be handed over to agents to execute themselves&amp;rdquo;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;CNCF&amp;rsquo;s Cloud Native AI (CNAI) addresses infrastructure-level issues:
&amp;ldquo;How can model training/inference/RAG run at scale and securely on Kubernetes?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But what AI Native truly lacks is another layer:
&lt;strong&gt;How do agents collaborate, access tools, get governed, and audited?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is exactly the gap AAIF aims to fill.&lt;/p&gt;
&lt;h2 id="aaifs-three-weapons-protocol--runtime--development-standard"&gt;AAIF&amp;rsquo;s Three Weapons: Protocol + Runtime + Development Standard&lt;/h2&gt;
&lt;p&gt;AAIF hosts three core technologies contributed by its founding members:&lt;/p&gt;
&lt;h3 id="-anthropics-model-context-protocol-mcp"&gt;① Anthropic&amp;rsquo;s Model Context Protocol (MCP)&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol" target="_blank" rel="noopener"&gt;https://github.com/modelcontextprotocol&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A &amp;ldquo;system call interface for agents&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unified definition for how agents access databases, APIs, files, and external tools.&lt;/li&gt;
&lt;li&gt;Designed to be more like an AI version of gRPC + OAuth.&lt;/li&gt;
&lt;li&gt;Already integrated by Claude, Cursor, ChatGPT, VS Code, Microsoft Copilot, Gemini, and others.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It may not be the flashiest technology, but it could become the plumbing for the entire Agentic ecosystem.&lt;/p&gt;
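&lt;p&gt;To make the &amp;ldquo;system call interface&amp;rdquo; idea concrete, here is a minimal Python sketch of the kind of message an agent host sends when it asks an MCP server to run a tool. MCP messages are JSON-RPC 2.0 and tool invocation uses the tools/call method; the tool name query_orders and its arguments are hypothetical and exist only for illustration.&lt;/p&gt;

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any value unique per session works


def make_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query_orders" and its arguments are hypothetical, for illustration only.
request = make_tool_call("query_orders", {"customer_id": "c-42", "limit": 5})
print(json.dumps(request, indent=2))
```

&lt;p&gt;A real client would send this over a transport such as stdio or HTTP after an initialization handshake; the sketch only shows the shape of the request.&lt;/p&gt;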
&lt;h3 id="-blocks-goose-framework"&gt;② Block&amp;rsquo;s Goose Framework&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/block/goose" target="_blank" rel="noopener"&gt;https://github.com/block/goose&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Reference runtime for MCP:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local-first, composable agent workflow engine.&lt;/li&gt;
&lt;li&gt;Enables enterprises to pilot agents in small scopes without betting on a specific vendor.&lt;/li&gt;
&lt;li&gt;Serves as an engineering template for protocol implementation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="-openais-agentsmd"&gt;③ OpenAI&amp;rsquo;s AGENTS.md&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://agents.md/" target="_blank" rel="noopener"&gt;https://agents.md&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A simple but effective standard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Place an AGENTS.md file in the project repository.&lt;/li&gt;
&lt;li&gt;Clearly document build steps, testing, constraints, and context rules.&lt;/li&gt;
&lt;li&gt;Any agent that understands AGENTS.md can operate the codebase using the same instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes agent behavior more predictable and auditable.&lt;/p&gt;
&lt;h2 id="why-is-aaif-in-such-a-hurry-this-is-a-race-for-standards"&gt;Why Is AAIF in Such a Hurry? This Is a Race for Standards&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s compare with history:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes&amp;rsquo; predecessor Borg ran internally at Google for over a decade; K8s was open sourced and donated to CNCF two years later.&lt;/li&gt;
&lt;li&gt;PyTorch joined the Linux Foundation six years after its release.&lt;/li&gt;
&lt;li&gt;MCP was donated to AAIF just &lt;strong&gt;over one year&lt;/strong&gt; after its launch.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AAIF is not about &amp;ldquo;mature technology entering a foundation&amp;rdquo;, but &lt;strong&gt;staking out the key position early&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The reasons are practical:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Prevent agent ecosystem fragmentation&lt;/strong&gt;
Today, there are many competing &amp;ldquo;tool invocation protocols&amp;rdquo;, which could become incompatible silos in three years.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Global consensus is easier to reach at the protocol layer than at the model layer&lt;/strong&gt;
Model competition is inevitable, but protocols can be standardized and open sourced, which helps avoid vendor lock-in.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A necessary move in global tech competition&lt;/strong&gt;
Putting the foundational standards for Agentic AI into the Linux Foundation is both a gesture of cooperation and a strategic move.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="aaif-vs-cncf-not-competition-but-two-pieces-of-the-puzzle"&gt;AAIF vs CNCF: Not Competition, But Two Pieces of the Puzzle&lt;/h2&gt;
&lt;p&gt;CNCF&amp;rsquo;s role:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;What infrastructure do agent workloads run on?&amp;rdquo;
Kubernetes, Service Mesh, observability, AI Gateway, RAG Infra—all at this layer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;AAIF&amp;rsquo;s role:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;How do agents collaborate, invoke tools, and get governed?&amp;rdquo;
Protocols, runtimes, and behavioral standards—all at this layer.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Analogy:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Domain&lt;/th&gt;
&lt;th&gt;Responsibilities&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AAIF&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Semantic and collaboration layer of Agentic Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CNCF/CNAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Resource and execution layer of AI Native Infra&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: AAIF vs CNCF Comparison
&lt;/figcaption&gt;
&lt;p&gt;This matches the upper semantic and lower infrastructure layers in my &lt;a href="https://arksphere.dev/" target="_blank" rel="noopener"&gt;ArkSphere&lt;/a&gt; architecture diagram.&lt;/p&gt;
&lt;p&gt;In the long run, the two sides will be tightly coupled:
CNCF&amp;rsquo;s KServe, KAgent, and AI Gateway will natively support MCP / AGENTS.md,
AAIF&amp;rsquo;s Runtime will run on Cloud Native infrastructure by default.&lt;/p&gt;
&lt;h2 id="the-real-challenge-not-protocols-but-organizations-and-people"&gt;The Real Challenge: Not Protocols, But Organizations and People&lt;/h2&gt;
&lt;p&gt;Most enterprises will get stuck on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How much responsibility can agents actually take?&lt;/li&gt;
&lt;li&gt;Who is accountable when things go wrong?&lt;/li&gt;
&lt;li&gt;How are audit, SLOs, and compliance defined?&lt;/li&gt;
&lt;li&gt;How is multi-agent collaboration visualized?&lt;/li&gt;
&lt;li&gt;How are tool invocation permissions controlled?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, &lt;strong&gt;agent adoption is not a &amp;ldquo;technical migration&amp;rdquo;, but an &amp;ldquo;organizational migration&amp;rdquo;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If AAIF cannot provide:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Phased adoption methodologies&lt;/li&gt;
&lt;li&gt;Typical organizational migration paths&lt;/li&gt;
&lt;li&gt;Engineering best practices&lt;/li&gt;
&lt;li&gt;Failure cases and anti-patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It will be difficult for AAIF to achieve the industry impact that CNCF did.&lt;/p&gt;
&lt;h2 id="summary-aaif-is-the-moment-when-boundaries-are-drawn"&gt;Summary: AAIF Is the Moment When Boundaries Are Drawn&lt;/h2&gt;
&lt;p&gt;For me, the establishment of AAIF feels like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;The battlefield boundaries of the agent world have finally been drawn. Now it&amp;rsquo;s up to the engineering community to make it work.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;CNCF solved &amp;ldquo;how to run Cloud Native&amp;rdquo;,
AAIF is now trying to solve &amp;ldquo;how agents collaborate&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;In the next five years, whoever can truly connect these two worlds
will stand at the gateway to the next generation of infrastructure.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s why I started a dedicated &amp;ldquo;Agentic Runtime + AI Native Infra&amp;rdquo; research track in &lt;a href="https://arksphere.dev" target="_blank" rel="noopener"&gt;ArkSphere&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="the-three-body-architecture-of-the-ai-native-era"&gt;The &amp;lsquo;Three-Body&amp;rsquo; Architecture of the AI Native Era&lt;/h2&gt;
&lt;p&gt;Finally, a personal note—my thoughts on ArkSphere.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/agentic-ai-foundation-cncf-era/a2b0ea6c87b10fd78607da5d75c4cd1a.svg" data-img="https://assets.jimmysong.io/images/blog/agentic-ai-foundation-cncf-era/a2b0ea6c87b10fd78607da5d75c4cd1a.svg" alt="Figure 1: AAIF × CNCF: Three-Layer Architecture of Agentic AI in the AI Native Era" data-caption="Figure 1: AAIF × CNCF: Three-Layer Architecture of Agentic AI in the AI Native Era"
width="2677"
height="739"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: AAIF × CNCF: Three-Layer Architecture of Agentic AI in the AI Native Era&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This diagram shows the three-layer structure of the AI Native era:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CNCF (bottom layer): Provides the Cloud Native foundation required for agent operation, including Kubernetes, Service Mesh, GPU scheduling, and security systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AAIF (middle layer): Defines the runtime semantics and standards for agents, including the MCP protocol, Goose reference runtime, and AGENTS.md behavioral standard.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ArkSphere (bridging layer): Aligns the &amp;ldquo;Agentic Runtime semantic layer&amp;rdquo; with the &amp;ldquo;AI Native Infra infrastructure layer&amp;rdquo;, forming an engineerable agent architecture standard.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;p&gt;Infra is responsible for &amp;ldquo;running&amp;rdquo;, Runtime for &amp;ldquo;how to act&amp;rdquo;, and ArkSphere for &amp;ldquo;how to assemble a system&amp;rdquo;.&lt;/p&gt;</content:encoded></item><item><title>Bun Acquired by Anthropic: A Structural Signal for AI-Native Runtimes</title><link>https://jimmysong.io/blog/bun-anthropic-runtime-shift/</link><pubDate>Wed, 03 Dec 2025 05:21:28 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/bun-anthropic-runtime-shift/</guid><description>Bun&amp;#39;s acquisition by Anthropic marks the first time a general-purpose language runtime is integrated into a large model engineering system, revealing a structural trend for AI-native runtimes.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The shifting ownership of runtimes is reshaping the underlying logic of AI programming and infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;After the &lt;a href="https://bun.com/blog/bun-joins-anthropic" target="_blank" rel="noopener"&gt;announcement of Bun&amp;rsquo;s acquisition by Anthropic&lt;/a&gt;, my focus was not on the deal itself, but on the structural signal it revealed: general-purpose language runtimes are now being drawn into the path dependencies of AI programming systems. This is not just &amp;ldquo;a JS project finding a home,&amp;rdquo; but &amp;ldquo;the first time a language runtime has been actively integrated into the unified engineering system of a leading large model company.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This event deserves a deeper analysis.&lt;/p&gt;
&lt;h2 id="buns-engineering-features-and-current-status"&gt;Bun&amp;rsquo;s Engineering Features and Current Status&lt;/h2&gt;
&lt;p&gt;Before examining &lt;a href="https://bun.com" target="_blank" rel="noopener"&gt;Bun&lt;/a&gt;&amp;rsquo;s industry significance, let&amp;rsquo;s outline its runtime characteristics. The following list summarizes Bun&amp;rsquo;s main engineering capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-performance JavaScript/TypeScript runtime&lt;/li&gt;
&lt;li&gt;Built-in bundler, test framework, and package manager&lt;/li&gt;
&lt;li&gt;Single-file executable&lt;/li&gt;
&lt;li&gt;Extremely fast cold start&lt;/li&gt;
&lt;li&gt;Node compatibility without Node&amp;rsquo;s legacy dependencies&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together, these capabilities give Bun a measurable performance advantage that is hard for competitors to match.&lt;/p&gt;
&lt;p&gt;However, it should be noted that Bun currently lacks the core attributes of an AI Runtime, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Permission model&lt;/li&gt;
&lt;li&gt;Tool isolation&lt;/li&gt;
&lt;li&gt;Capability declaration protocol&lt;/li&gt;
&lt;li&gt;Execution semantics understandable by models&lt;/li&gt;
&lt;li&gt;Sandbox execution environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, Bun&amp;rsquo;s &amp;ldquo;AI Native&amp;rdquo; properties have not yet been established, but Anthropic&amp;rsquo;s acquisition provides an opportunity for it to evolve in this direction.&lt;/p&gt;
&lt;h2 id="the-significance-of-a-leading-model-company-acquiring-a-general-purpose-runtime"&gt;The Significance of a Leading Model Company Acquiring a General-Purpose Runtime&lt;/h2&gt;
&lt;p&gt;Model companies have often acquired editors, plugins, or IDEs, but as far as public cases show, no mainstream large model vendor had directly acquired a mature general-purpose language runtime. Bun × Anthropic is the first clear event pulling the runtime into the AI programming system landscape. This move sends three engineering-level signals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The speed of AI code generation continues to increase, amplifying the need for deterministic execution environments. The generate→execute→validate→destroy cycle intensifies the problem of environment non-repeatability.&lt;/li&gt;
&lt;li&gt;Models require a &amp;ldquo;controllable execution substrate&amp;rdquo; rather than a traditional operating system. Agents are not suited to run tools in an uncontrollable, unpredictable OS layer.&lt;/li&gt;
&lt;li&gt;The runtime needs to be embedded into the model&amp;rsquo;s internal engineering pipeline. Future IDEs, agents, and auto-repair pipelines may directly invoke the runtime&amp;rsquo;s API.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is not a short-term business integration, but a manifestation of the trend toward compressed engineering pipelines.&lt;/p&gt;
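&lt;p&gt;The generate→execute→validate→destroy cycle can be sketched in a few lines. The example below is an illustration under stated assumptions, not how Bun or Anthropic implement it: it uses Python as a stand-in runtime, the helper name run_once is hypothetical, and a temporary directory plus a child process is not a security sandbox.&lt;/p&gt;

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_once(generated_code, timeout_s=10):
    """Execute generated code in a throwaway directory and report the result."""
    # The temporary directory is deleted when the with-block exits (destroy).
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "main.py"
        script.write_text(generated_code, encoding="utf-8")
        proc = subprocess.run(
            # -I runs Python in isolated mode: no user site-packages,
            # no PYTHON* environment variables.
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    # Validate: the exit code and captured output go back to the caller.
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


good = run_once("print(2 + 3)")
bad = run_once("raise SystemExit(1)")
```

&lt;p&gt;Each call starts from an empty directory and leaves nothing behind, which is the repeatability property the cycle depends on. A model-driven loop would feed the captured stderr back into the next generation step.&lt;/p&gt;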
&lt;h2 id="runtime-requirements-differentiation-in-the-ai-coding-era"&gt;Runtime Requirements Differentiation in the AI Coding Era&lt;/h2&gt;
&lt;p&gt;Based on observations of agentic runtimes over the past year, runtime requirements in the AI coding era are diverging. The following list summarizes the main engineering abstractions trending in this space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Determinism: AI-generated code is not reviewed line by line; execution results must be consistent across machines and over time.&lt;/li&gt;
&lt;li&gt;Minimal distribution unit: Users no longer install language environments and numerous dependencies. Verifiable, replicable, and portable single execution units are becoming the norm.&lt;/li&gt;
&lt;li&gt;Tool isolation: Models cannot directly access all OS capabilities; the context and permissions visible to tools must be strictly defined.&lt;/li&gt;
&lt;li&gt;Short-lived execution: Agent invocation patterns resemble &amp;ldquo;batch jobs&amp;rdquo; rather than long-running services.&lt;/li&gt;
&lt;li&gt;Capability declaration: The runtime must expose &amp;ldquo;what I can do,&amp;rdquo; rather than the entire OS interface.&lt;/li&gt;
&lt;li&gt;Embeddable self-testing pipeline: After generating code, models need to immediately execute tests, collect errors, and iterate. The runtime must provide observability and diagnostic primitives.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These requirements are not unique to Bun, nor did Bun originate them, but Bun&amp;rsquo;s &amp;ldquo;monolithic and controllable&amp;rdquo; runtime structure is more conducive to evolving in this direction.&lt;/p&gt;
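&lt;p&gt;Two of the requirements above, capability declaration and tool isolation, can be illustrated together. The following Python sketch is hypothetical throughout (the Runtime and Capability classes are not part of Bun or any existing runtime): the runtime exposes only the capabilities it was explicitly granted and refuses everything else.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    name: str          # for example "text.upper"
    description: str   # human- and model-readable summary


@dataclass
class Runtime:
    """Hypothetical runtime that only exposes explicitly granted capabilities."""
    granted: dict = field(default_factory=dict)   # capability name to Capability
    handlers: dict = field(default_factory=dict)  # capability name to callable

    def grant(self, cap, handler):
        self.granted[cap.name] = cap
        self.handlers[cap.name] = handler

    def declare(self):
        """What the model may see: names and descriptions, nothing else."""
        return [
            {"name": c.name, "description": c.description}
            for c in self.granted.values()
        ]

    def invoke(self, name, **kwargs):
        # Anything not granted is rejected before a handler is looked up.
        if name not in self.granted:
            raise PermissionError(f"capability not granted: {name}")
        return self.handlers[name](**kwargs)


rt = Runtime()
rt.grant(Capability("text.upper", "Upper-case a string"), lambda text: text.upper())
```

&lt;p&gt;Here rt.declare() is the &amp;ldquo;what I can do&amp;rdquo; surface, and rt.invoke("fs.delete", path="/") raises PermissionError because that capability was never granted.&lt;/p&gt;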
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/bun-anthropic-runtime-shift/527ff200956d6b73178a0e3521f42fc2.svg" data-img="https://assets.jimmysong.io/images/blog/bun-anthropic-runtime-shift/527ff200956d6b73178a0e3521f42fc2.svg" alt="Figure 1: Minimal execution loop of an AI-native runtime" data-caption="Figure 1: Minimal execution loop of an AI-native runtime"
width="599"
height="1105"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Minimal execution loop of an AI-native runtime&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="buns-potential-role-within-anthropics-system"&gt;Bun&amp;rsquo;s Potential Role Within Anthropic&amp;rsquo;s System&lt;/h2&gt;
&lt;p&gt;If Bun is seen merely as a Node.js replacement, the acquisition is of limited significance. But if it is viewed as the execution foundation for future AI coding systems, the logic becomes clearer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code is generated by models&lt;/li&gt;
&lt;li&gt;Building is handled by the runtime&amp;rsquo;s built-in toolchain&lt;/li&gt;
&lt;li&gt;Testing, validation, and repair are performed by models repeatedly invoking the runtime&lt;/li&gt;
&lt;li&gt;All execution behaviors are defined by the runtime&amp;rsquo;s semantics&lt;/li&gt;
&lt;li&gt;The runtime forms Anthropic&amp;rsquo;s internal &amp;ldquo;minimal stable layer&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This model is similar to the relationship between Chrome and V8: the execution engine and upper-layer system co-evolve over time, with performance and semantics advancing in sync.&lt;/p&gt;
&lt;p&gt;Whether Bun can fulfill this role depends on Anthropic&amp;rsquo;s architectural choices, but the event itself has opened up possibilities in this direction.&lt;/p&gt;
&lt;h2 id="industry-trends-and-future-evolution"&gt;Industry Trends and Future Evolution&lt;/h2&gt;
&lt;p&gt;Combining facts, signals, and engineering trends, the following directions can be anticipated:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &amp;ldquo;Agent Runtime&amp;rdquo; category will gradually become more defined&lt;/li&gt;
&lt;li&gt;The boundaries between bundler, runtime, and test runner will continue to blur&lt;/li&gt;
&lt;li&gt;Cloud vendors will launch controllable runtimes with capability declarations&lt;/li&gt;
&lt;li&gt;Permission models and secure sandboxes will move down to the language runtime layer&lt;/li&gt;
&lt;li&gt;Runtimes will become part of the model toolchain, rather than an external environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These trends will not all materialize in the short term, but they represent the inevitable path of engineering evolution.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The combination of Bun × Anthropic is not about &amp;ldquo;an open-source project being absorbed,&amp;rdquo; but about a language runtime being actively integrated into the engineering pipeline of a large model system for the first time. Competition at the model layer will continue, but what truly reshapes software is the structural transformation of AI-native runtimes. This is a foundational change worth long-term attention.&lt;/p&gt;</content:encoded></item><item><title>Agentic Runtime Realism: Insights from McKinsey Ark on 2026 Infrastructure Trends</title><link>https://jimmysong.io/blog/agentic-runtime-realism/</link><pubDate>Tue, 02 Dec 2025 12:07:45 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/agentic-runtime-realism/</guid><description>Analyzing Ark from architecture, semantics, community activity, and engineering paradigms to reveal its impact on 2026 AI Infra trends and the ArkSphere community.</description><content:encoded>
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Statement
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
ArkSphere has no affiliation or association with McKinsey Ark.
&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;The value of Agentic Runtime lies not in unified interfaces, but in semantic governance and the transformation of engineering paradigms. Ark is just a reflection of the trend; the future belongs to governable Agentic Workloads.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently, the &lt;a href="https://jimmysong.io/en/community"&gt;ArkSphere community&lt;/a&gt; has been focusing on McKinsey&amp;rsquo;s open-source &lt;a href="https://github.com/mckinsey/agents-at-scale-ark" target="_blank" rel="noopener"&gt;Ark&lt;/a&gt; (Agentic Runtime for Kubernetes). Although the project is still in technical preview, its architecture and semantic model have already become key indicators for the direction of AI Infra in 2026.&lt;/p&gt;
&lt;p&gt;This article analyzes the engineering paradigm and semantic model of Ark, highlighting its industry implications. It avoids repeating the reasons for the failure of unified model APIs and generic infrastructure logic, instead focusing on the unique perspective of the ArkSphere community.&lt;/p&gt;
&lt;h2 id="arks-semantic-model-and-engineering-paradigm"&gt;Ark&amp;rsquo;s Semantic Model and Engineering Paradigm&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s greatest value is in making Agents first-class citizens in Kubernetes: tasks are driven to completion in a closed loop by CRDs (Custom Resource Definitions) and controllers (reconcilers). This semantic abstraction not only enhances governance capabilities but also aligns closely with the Agentic Runtime strategies of major cloud providers.&lt;/p&gt;
&lt;p&gt;Ark&amp;rsquo;s main resources include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent (inference entity)&lt;/li&gt;
&lt;li&gt;Model (model selection and configuration)&lt;/li&gt;
&lt;li&gt;Tools (capability plugins/MCP, Model Context Protocol)&lt;/li&gt;
&lt;li&gt;Team (multi-agent collaboration)&lt;/li&gt;
&lt;li&gt;Query (task lifecycle)&lt;/li&gt;
&lt;li&gt;Evaluation (assessment)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The diagram below illustrates the semantic relationships in Agentic Runtime:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/2d76bfcb312694080bd94942b084f210.svg" data-img="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/2d76bfcb312694080bd94942b084f210.svg" alt="Figure 1: Agentic Runtime Semantic Relationships" data-caption="Figure 1: Agentic Runtime Semantic Relationships"
width="1450"
height="589"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Agentic Runtime Semantic Relationships&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="architecture-and-community-activity"&gt;Architecture and Community Activity&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s architecture adopts a standard control plane system, emphasizing unified runtime semantics. The community is highly active, engineer-driven, and the codebase is well-structured, though production readiness is still being improved.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/f481241843db17b6e6172e8093a1daa6.svg" data-img="https://assets.jimmysong.io/images/blog/agentic-runtime-realism/f481241843db17b6e6172e8093a1daa6.svg" alt="Figure 2: Ark Architecture and Control Plane Flow" data-caption="Figure 2: Ark Architecture and Control Plane Flow"
width="4130"
height="565"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Ark Architecture and Control Plane Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="arkspheres-boundaries-and-inspirations"&gt;ArkSphere&amp;rsquo;s Boundaries and Inspirations&lt;/h2&gt;
&lt;p&gt;The emergence of Ark has clarified the boundaries of ArkSphere. ArkSphere does not aim for unified model interfaces, multi-cloud abstraction, a collection of miscellaneous tools, or a comprehensive framework layer. Instead, it focuses on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The semantic system of Agentic Runtime (tasks, states, tool invocation, collaboration graphs, etc.)&lt;/li&gt;
&lt;li&gt;Enterprise-grade runtime governance models (permissions, auditing, isolation, multi-tenancy, compliance, cost tracking)&lt;/li&gt;
&lt;li&gt;Integration capabilities for domestic ecosystem tools&lt;/li&gt;
&lt;li&gt;Engineering paradigms from a runtime perspective&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ArkSphere is an ecosystem and engineering system at the runtime level, not a &amp;ldquo;model abstraction layer&amp;rdquo; or an &amp;ldquo;agent development framework.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="key-changes-in-2026"&gt;Key Changes in 2026&lt;/h2&gt;
&lt;p&gt;2026 will usher in the era of Agentic Runtime, where Agents are no longer just classes to import into application code but workloads that must be governed. Ark is just one example of this trend, and the direction is clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Semantic models and governability become highlights&lt;/li&gt;
&lt;li&gt;Closed-loop tasks are the core value&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s realism teaches us that the future belongs to runtime, semantics, governability, and workload-level Agents. The industry will no longer pursue unified APIs or framework implementations, but will focus on governable runtime semantics and engineering paradigms.&lt;/p&gt;</content:encoded></item><item><title>In-Depth Analysis of Ark: Kubernetes for the AI Era or a New Engineering Paradigm Shift?</title><link>https://jimmysong.io/blog/ark-agentic-runtime-analysis/</link><pubDate>Tue, 02 Dec 2025 10:54:34 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ark-agentic-runtime-analysis/</guid><description>Analysis of McKinsey&amp;#39;s Ark project: architecture, CRDs, control plane, design paradigms, production readiness, and implications for ArkSphere and AI infrastructure.</description><content:encoded>
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Statement
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
ArkSphere has no affiliation or association with McKinsey Ark.
&lt;/div&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;The greatest value of Ark lies in reshaping engineering paradigms, not just its features. It points the way for AI Infra and leaves vast space for community ecosystems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently, many members in our &lt;a href="https://arksphere.dev/" target="_blank" rel="noopener"&gt;ArkSphere community&lt;/a&gt; have started exploring McKinsey&amp;rsquo;s open-source &lt;a href="https://github.com/mckinsey/agents-at-scale-ark" target="_blank" rel="noopener"&gt;Ark (Agentic Runtime for Kubernetes)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some see it as radical, some think it&amp;rsquo;s just a consulting firm&amp;rsquo;s experiment, and others quote a realistic maxim:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;What we need now is &amp;ldquo;agentic runtime realism,&amp;rdquo; not &amp;ldquo;unified model romanticism.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I strongly agree with this sentiment.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve spent some time analyzing Ark&amp;rsquo;s source code, architecture, and design philosophy, combined with our community discussions. My conclusion is:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ark&amp;rsquo;s significance is not in its features, but in its paradigm.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;It&amp;rsquo;s not the answer, but it points toward the future of AI Infra.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Below is my interpretation of Ark, focusing on engineering, architecture, trends, and its inspiration for ArkSphere.&lt;/p&gt;
&lt;h2 id="what-exactly-is-ark"&gt;What Exactly Is Ark?&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s core positioning is: &lt;strong&gt;A runtime that treats Agents as Kubernetes Workloads.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not a framework, not an SDK, not an AutoGen-style multi-agent tool, but a complete system including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Control plane (Controller)&lt;/li&gt;
&lt;li&gt;Custom resource models (CRD, Custom Resource Definition)&lt;/li&gt;
&lt;li&gt;API service&lt;/li&gt;
&lt;li&gt;Dashboard&lt;/li&gt;
&lt;li&gt;CLI&lt;/li&gt;
&lt;li&gt;Python SDK&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Essentially, Ark is the &lt;strong&gt;control plane for Agents&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Ark defines seven core CRDs in Kubernetes. The following flowchart shows the relationships among these resources:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/df35874e6886350db30fdf036a118099.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/df35874e6886350db30fdf036a118099.svg" alt="Figure 1: Ark CRD Resource Relationships" data-caption="Figure 1: Ark CRD Resource Relationships"
width="816"
height="557"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Ark CRD Resource Relationships&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Through this set of CRDs, Ark makes Agent systems resource-oriented and declarative, enabling capabilities such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lifecycle management&lt;/li&gt;
&lt;li&gt;Multi-tenant isolation&lt;/li&gt;
&lt;li&gt;RBAC (Role-Based Access Control)&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Upgradability&lt;/li&gt;
&lt;li&gt;Extensibility (tools, models, MCP)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, Ark is not about &amp;ldquo;how to write Agents,&amp;rdquo; but &amp;ldquo;how to operate Agents in enterprise-grade systems.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="three-layer-architecture-mixed-languages-and-components-but-a-complete-system"&gt;Three-Layer Architecture: Mixed Languages and Components, but a Complete System&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s overall architecture is divided into three layers, each with different tech stacks and responsibilities. The following flowchart illustrates the relationships among components in each layer:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/f6ccd732f54e5aa2a0ca3f6283103eb3.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/f6ccd732f54e5aa2a0ca3f6283103eb3.svg" alt="Figure 2: Ark Three-Layer Architecture Components" data-caption="Figure 2: Ark Three-Layer Architecture Components"
width="1532"
height="806"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Ark Three-Layer Architecture Components&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is not a &amp;ldquo;wrapper project,&amp;rdquo; but a fully operational AI Runtime system, with a level of engineering far beyond most agent frameworks on the market.&lt;/p&gt;
&lt;h2 id="is-it-the-kubernetes-of-the-ai-era"&gt;Is It the Kubernetes of the AI Era?&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s revisit Kubernetes&amp;rsquo; core value:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes was never about &amp;ldquo;unifying cloud APIs&amp;rdquo;; it unified the &amp;ldquo;application runtime model.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Cloud provider APIs aren&amp;rsquo;t unified, nor are networking or storage. What is unified: Pod, Deployment, Service—these application models.&lt;/p&gt;
&lt;p&gt;Kubernetes succeeded because:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It provides a stable application abstraction on top of diversity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Likewise, Ark&amp;rsquo;s goal is not to unify all large language models (LLMs), MCP servers, or tool formats, but to provide a stable abstraction on top of them:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Agent resource model (CRD) + control plane (Reconciler) + lifecycle.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From this perspective, Ark offers a prototype of a &amp;ldquo;declarative application model&amp;rdquo; for the AI era.&lt;/p&gt;
&lt;p&gt;Whether it will become &amp;ldquo;Kubernetes for AI&amp;rdquo; is still too early to say, but it has already planted a seed.&lt;/p&gt;
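&lt;p&gt;The &amp;ldquo;resource model + reconciler + lifecycle&amp;rdquo; pattern can be sketched without Kubernetes at all. The Python example below is a simplified illustration, not Ark code: the Query class, its fields, and the phase names are hypothetical stand-ins for a declarative resource with a spec and a status, and the agent is a plain function.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical stand-in for a declarative task resource (spec + status)."""
    name: str
    agent: str               # spec: which agent should answer
    prompt: str              # spec: what to ask
    phase: str = "Pending"   # status: Pending, Running, Done, or Error
    answer: str = ""         # status: filled in once the agent has run


def reconcile(query, agents):
    """Move one resource a single step toward its desired state."""
    if query.phase == "Pending":
        query.phase = "Running" if query.agent in agents else "Error"
    elif query.phase == "Running":
        query.answer = agents[query.agent](query.prompt)
        query.phase = "Done"
    return query.phase


def run_to_completion(query, agents, max_steps=10):
    """Call reconcile repeatedly until the resource reaches a terminal phase."""
    for _ in range(max_steps):
        if reconcile(query, agents) in ("Done", "Error"):
            break
    return query


agents = {"echo": lambda prompt: f"echo: {prompt}"}
q = run_to_completion(Query("q1", "echo", "hello"), agents)
missing = run_to_completion(Query("q2", "nope", "hello"), agents)
```

&lt;p&gt;The caller only declares what it wants (an agent and a prompt); the loop owns the lifecycle and records progress in the status fields, which is what makes the task observable and governable from outside.&lt;/p&gt;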
&lt;h2 id="comparison-with-other-frameworks-not-on-the-same-level"&gt;Comparison with Other Frameworks: Not on the Same Level&lt;/h2&gt;
&lt;p&gt;Current mainstream agent frameworks like LangChain, CrewAI, AutoGen, MetaGPT, etc., address problems fundamentally different from Ark.&lt;/p&gt;
&lt;p&gt;The table below compares the positioning and limitations of each framework:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;What Problem Does It Solve&lt;/th&gt;
&lt;th&gt;Core Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;Agent/Tool composition&lt;/td&gt;
&lt;td&gt;Doesn&amp;rsquo;t address deployment or governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen&lt;/td&gt;
&lt;td&gt;Multi-agent conversations&lt;/td&gt;
&lt;td&gt;Lacks control plane and lifecycle&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;Workflow-style multi-agent&lt;/td&gt;
&lt;td&gt;Missing scheduling, RBAC, resource model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MetaGPT&lt;/td&gt;
&lt;td&gt;Agent SOP&lt;/td&gt;
&lt;td&gt;Just execution logic, not a platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenDevin&lt;/td&gt;
&lt;td&gt;AI IDE/Dev Assistant&lt;/td&gt;
&lt;td&gt;Not an Agent Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ark&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Agent control plane + resource system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Functionality not yet mature&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Mainstream Agent Frameworks vs. Ark
&lt;/figcaption&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Other tools focus on &amp;ldquo;how to write Agents.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Ark focuses on &amp;ldquo;how Agents are run, scheduled, governed, observed, and extended.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That&amp;rsquo;s an architectural difference.&lt;/p&gt;
&lt;h2 id="execution-flow-agents-scheduled-like-pods"&gt;Execution Flow: Agents Scheduled Like Pods&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s execution flow closely resembles the Kubernetes controller model. The following sequence diagram shows the core process:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/1ce387835a38f3380734332ea9e769f7.svg" data-img="https://assets.jimmysong.io/images/blog/ark-agentic-runtime-analysis/1ce387835a38f3380734332ea9e769f7.svg" alt="Figure 3: Ark Agent Execution Flow" data-caption="Figure 3: Ark Agent Execution Flow"
width="961"
height="553"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Ark Agent Execution Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;You can see Ark&amp;rsquo;s process logic is transparent, with a clear engineering path, bringing agent systems into a &amp;ldquo;controllable&amp;rdquo; state.&lt;/p&gt;
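&lt;p&gt;To make the controller analogy concrete, here is a minimal Python sketch of a reconcile loop. It is an illustration only, not Ark&amp;rsquo;s code: the &lt;code&gt;AgentResource&lt;/code&gt; class, its fields, and the &lt;code&gt;reconcile&lt;/code&gt; and &lt;code&gt;run_control_loop&lt;/code&gt; functions are hypothetical names chosen for this example, whereas the real project models its resources as Kubernetes CRDs.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class AgentResource:
    """Hypothetical stand-in for a declared agent resource (not the real Ark schema)."""
    name: str
    desired_model: str          # what the spec asks for
    observed_model: str = ""    # what is currently running
    phase: str = "Pending"      # lifecycle marker kept in status
    events: list = field(default_factory=list)


def reconcile(agent):
    """One reconcile pass: compare desired and observed state, then converge."""
    if agent.observed_model == agent.desired_model:
        agent.phase = "Ready"
        return False  # already converged, nothing changed
    agent.events.append(f"switching model to {agent.desired_model}")
    agent.observed_model = agent.desired_model
    agent.phase = "Ready"
    return True  # state changed during this pass


def run_control_loop(agents, max_passes=3):
    """Repeat reconcile passes until no resource changes or the pass budget runs out."""
    for pass_number in range(1, max_passes + 1):
        changed = [reconcile(agent) for agent in agents]
        if not any(changed):
            return pass_number
    return max_passes
```

&lt;p&gt;In this sketch, a resource whose observed state lags behind its spec is converged in one pass, and the next pass finds nothing left to change. That idempotent, state-comparing behavior is the property the Kubernetes controller model relies on, and it is presumably what makes agent execution feel &amp;ldquo;controllable&amp;rdquo; in the sense described above.&lt;/p&gt;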
&lt;h2 id="production-readiness-right-direction-still-a-tech-preview"&gt;Production Readiness: Right Direction, Still a Tech Preview&lt;/h2&gt;
&lt;p&gt;Judging from the official notes and the maturity of the code, Ark is currently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Runnable&lt;/li&gt;
&lt;li&gt;Learnable&lt;/li&gt;
&lt;li&gt;Extensible&lt;/li&gt;
&lt;li&gt;But not recommended for large-scale production use yet&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Main reasons include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CRD structures may change&lt;/li&gt;
&lt;li&gt;APIs are not yet stable&lt;/li&gt;
&lt;li&gt;MCP ecosystem is still forming&lt;/li&gt;
&lt;li&gt;Memory service is still basic&lt;/li&gt;
&lt;li&gt;Multi-agent team execution strategies are primitive&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the engineering system is already taking shape, which is crucial.&lt;/p&gt;
&lt;h2 id="community-activity-small-but-elite-strong-mckinsey-drive"&gt;Community Activity: Small but Elite, Strong McKinsey Drive&lt;/h2&gt;
&lt;p&gt;From GitHub data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stars: 222&lt;/li&gt;
&lt;li&gt;Forks: 50&lt;/li&gt;
&lt;li&gt;Contributors: 48&lt;/li&gt;
&lt;li&gt;Commit frequency is steady&lt;/li&gt;
&lt;li&gt;The vast majority of contributions come from within McKinsey&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Note: Data as of December 2, 2025.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;High stability, but limited openness.&lt;/p&gt;
&lt;p&gt;This is also ArkSphere&amp;rsquo;s opportunity:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The paradigm is right, but the ecosystem needs community-driven growth.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="trends-for-2026-from-framework-era-to-runtime-era"&gt;Trends for 2026: From Framework Era to Runtime Era&lt;/h2&gt;
&lt;p&gt;After deep analysis, I&amp;rsquo;m increasingly convinced:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2023–2024: Large model API call era&lt;/li&gt;
&lt;li&gt;2024–2025: Agent framework era&lt;/li&gt;
&lt;li&gt;2025–2027: Agent Runtime / Control Plane era (Ark&amp;rsquo;s direction)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While everyone is writing Python scripts for agents, the real value lies in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Multi-agent task scheduling&lt;/li&gt;
&lt;li&gt;Tool registration and governance&lt;/li&gt;
&lt;li&gt;Session/Memory lifecycle&lt;/li&gt;
&lt;li&gt;Result reproducibility&lt;/li&gt;
&lt;li&gt;RBAC, auditing, tenant isolation&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Enterprise internal personalized agent systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ark is providing a practical path forward.&lt;/p&gt;
&lt;h2 id="inspiration-for-arksphere"&gt;Inspiration for ArkSphere&lt;/h2&gt;
&lt;p&gt;Ark&amp;rsquo;s inspiration for ArkSphere is both critical and direct:&lt;/p&gt;
&lt;h3 id="arksphere-should-focus-on-paradigm-building-not-feature-stacking"&gt;ArkSphere Should Focus on &amp;ldquo;Paradigm Building,&amp;rdquo; Not &amp;ldquo;Feature Stacking&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Ark offers a prototype for a future Agentic Runtime:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Resource model&lt;/li&gt;
&lt;li&gt;Control plane&lt;/li&gt;
&lt;li&gt;Tool registration&lt;/li&gt;
&lt;li&gt;Multi-agent collaboration&lt;/li&gt;
&lt;li&gt;Evaluation and governance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ArkSphere&amp;rsquo;s role should be:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Aggregate paradigms, produce standards, incubate ecosystems, not rewrite Ark itself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is the &amp;ldquo;CNCF (Cloud Native Computing Foundation) for the AI-native era.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="huge-potential-for-localization-in-china"&gt;Huge Potential for Localization in China&lt;/h3&gt;
&lt;p&gt;Localization opportunities include but are not limited to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integration with domestic large language models (e.g., Qwen, DeepSeek, Zhipu)&lt;/li&gt;
&lt;li&gt;Enterprise private-deployment (on-premises) scenarios&lt;/li&gt;
&lt;li&gt;Local tool/MCP discovery ecosystem&lt;/li&gt;
&lt;li&gt;Multi-cluster/edge inference&lt;/li&gt;
&lt;li&gt;Enterprise-grade RBAC, auditing, data isolation&lt;/li&gt;
&lt;li&gt;AgentSpec enhancements for industrial scenarios&lt;/li&gt;
&lt;li&gt;Enhanced versions of Runtime/Controller&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ark solves the &amp;ldquo;model,&amp;rdquo; while ArkSphere can solve the &amp;ldquo;ecosystem.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 id="what-we-need-is-not-kubernetes-for-the-llm-era-but-an-industry-grade-cognition-system-for-ai-runtime"&gt;What We Need Is Not &amp;ldquo;Kubernetes for the LLM Era,&amp;rdquo; But an &amp;ldquo;Industry-Grade Cognition System for AI Runtime&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;The biggest takeaway from dissecting Ark:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The future of AI-native is not a pile of tools, but an engineering system.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;ArkSphere can be the initiator of this system.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Ark is not a &amp;ldquo;universal runtime,&amp;rdquo; nor is it the &amp;ldquo;ultimate Kubernetes for the AI era.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But it has done one crucial thing right:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It abstracts all the pain points people faced when writing Python agent scripts into Kubernetes resources and controllers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It represents engineering, not just a demo.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not mature yet, but it&amp;rsquo;s heading in the right direction.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not the end, but it gives us a clear roadmap.&lt;/p&gt;
&lt;p&gt;For the ArkSphere community I&amp;rsquo;m running, Ark provides a clear inspiration:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The future belongs to Runtime, to Control Plane, to governable agent systems.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;And the ones who can truly scale this system are not McKinsey, but the community.&lt;/strong&gt;&lt;/p&gt;</content:encoded></item><item><title>From Using AI to Relying on AI: Why the Era of AI Engineering Has Yet to Begin</title><link>https://jimmysong.io/blog/from-using-ai-to-building-ai-systems/</link><pubDate>Sat, 29 Nov 2025 12:40:54 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/from-using-ai-to-building-ai-systems/</guid><description>AI&amp;#39;s real turning point is moving from using AI tools to building AI systems. Why the era of AI engineering hasn&amp;#39;t begun, and the developer opportunity in the next three years.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The real inflection point for AI engineering is not &amp;ldquo;how many people use it,&amp;rdquo; but &amp;ldquo;how many people cannot do without it.&amp;rdquo; Only when not using AI leads to direct loss of opportunity and efficiency, can we say the era of AI engineering has truly arrived.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="starting-point-predictions-for-ai-in-2026"&gt;Starting Point: Predictions for AI in 2026&lt;/h2&gt;
&lt;p&gt;Recently, I came across two &lt;a href="https://thenewstack.io/amazon-cto-werner-vogels-predictions-for-2026/" target="_blank" rel="noopener"&gt;predictions for 2026 from Amazon CTO Werner Vogels&lt;/a&gt; that struck me the most:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Renaissance Developer&lt;/strong&gt;: Developers must span code, product, business, and social impact.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Personalized Learning&lt;/strong&gt;: AI will reshape education, focusing on differentiated paths rather than a unified curriculum.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both point to the same trend: AI is not just a tool, but is redefining how people grow and how they are defined.&lt;/p&gt;
&lt;p&gt;There is a gap between prediction and reality, and it is worth exploring.&lt;/p&gt;
&lt;h2 id="correction-will-ai-really-be-saturated-by-2026"&gt;Correction: Will AI Really Be &amp;ldquo;Saturated&amp;rdquo; by 2026?&lt;/h2&gt;
&lt;p&gt;My initial prediction was that AI usage would reach saturation by 2026. Reality has shown me that this was too optimistic.&lt;/p&gt;
&lt;p&gt;By the end of 2025, even among internet professionals, most people&amp;rsquo;s use of AI remains at the &amp;ldquo;heard of it&amp;rdquo; or &amp;ldquo;tried it a few times&amp;rdquo; stage. It is still far from being a daily workflow necessity.&lt;/p&gt;
&lt;p&gt;More importantly, this judgment is &lt;strong&gt;conditional&lt;/strong&gt;: infrastructure supply, regulation, and compute costs must not reverse in the next 3–6 years. If any variable breaks down (costs double, models go offline, policy shifts), the adoption curve will be disrupted.&lt;/p&gt;
&lt;h2 id="the-truth-about-the-inflection-point-from-using-to-relying-on"&gt;The Truth About the Inflection Point: From &amp;ldquo;Using&amp;rdquo; to &amp;ldquo;Relying On&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Relying on&amp;rdquo; is a vague term. A more precise definition requires measurable indicators.&lt;/p&gt;
&lt;p&gt;The following diagram visualizes the quantitative metrics for being truly dependent on AI, comparing target thresholds with current status:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-dependency-metrics.svg" data-img="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-dependency-metrics.svg" alt="Figure 1: Quantitative Definition of AI Dependency" data-caption="Figure 1: Quantitative Definition of AI Dependency"
width="1776"
height="616"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Quantitative Definition of AI Dependency&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Most industries have not reached the &amp;ldquo;cannot operate without&amp;rdquo; stage, unlike the internet, mobile, or payment inflection points. Most metrics are still far below the threshold, which is why the most likely outcome for 2026 is: &lt;strong&gt;more people will use AI, but those who truly rely on it will remain a minority&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="using--building-the-five-level-capability-ladder"&gt;Using ≠ Building: The Five-Level Capability Ladder&lt;/h2&gt;
&lt;p&gt;This difference is not binary, but a clear progression.&lt;/p&gt;
&lt;p&gt;The following table shows the five-level model of AI capability maturity.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Scarcity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Tool User&lt;/td&gt;
&lt;td&gt;Uses ChatGPT/Claude for coding and copywriting; AI is an optional accelerator&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Integrator&lt;/td&gt;
&lt;td&gt;Combines LLM APIs with a vector DB; AI is layered on existing systems; usable but not critical&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Settler&lt;/td&gt;
&lt;td&gt;Restructures data flows and business decisions; AI becomes the critical path&lt;/td&gt;
&lt;td&gt;Rising&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Engineering Abstraction&lt;/td&gt;
&lt;td&gt;Extracting frameworks, runtimes, providing infra for ecosystem&lt;/td&gt;
&lt;td&gt;Extremely High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Autonomous System&lt;/td&gt;
&lt;td&gt;Self-feedback, self-optimizing, redefining human-AI relationship&lt;/td&gt;
&lt;td&gt;Future&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Five-Level Model of AI Capability Maturity
&lt;/figcaption&gt;
&lt;p&gt;Currently, the biggest gap is at &lt;strong&gt;Level 3 and Level 4&lt;/strong&gt;. Most people are stuck at Level 1 or 2, with very few reaching Level 4. This means &lt;strong&gt;high-value scarcity will not disappear, but will continue to rise&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="why-the-era-of-ai-engineering-has-not-arrived-three-dimensional-delaying-factors"&gt;Why the Era of AI Engineering Has Not Arrived: Three-Dimensional Delaying Factors&lt;/h2&gt;
&lt;p&gt;It is not technology alone that is holding things back, but constraints in three dimensions.&lt;/p&gt;
&lt;p&gt;The following diagram illustrates the three main constraints (technical, institutional, and organizational) that are delaying AI engineering maturity, along with their delay metrics:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-engineering-constraints.svg" data-img="https://assets.jimmysong.io/images/blog/from-using-ai-to-building-ai-systems/ai-engineering-constraints.svg" alt="Figure 2: Three-Dimensional Constraints on AI Engineering Maturity" data-caption="Figure 2: Three-Dimensional Constraints on AI Engineering Maturity"
width="1783"
height="803"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Three-Dimensional Constraints on AI Engineering Maturity&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The key observation: &lt;strong&gt;If any one dimension is stuck, the entire ecosystem&amp;rsquo;s maturity will be delayed&lt;/strong&gt;. Currently, none of the three dimensions have fully mature solutions.&lt;/p&gt;
&lt;h2 id="the-realistic-window-three-paths-for-capability-advancement"&gt;The Realistic Window: Three Paths for Capability Advancement&lt;/h2&gt;
&lt;p&gt;The next three years will not be &amp;ldquo;winner takes all,&amp;rdquo; but rather a period where multiple capability levels appreciate simultaneously.&lt;/p&gt;
&lt;p&gt;Below is a table comparing the value and bottlenecks of different capability advancement paths:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability Path&lt;/th&gt;
&lt;th&gt;Short-Term Value&lt;/th&gt;
&lt;th&gt;Long-Term Outlook&lt;/th&gt;
&lt;th&gt;Bottleneck&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Level 1→2 (Tool→Integration)&lt;/td&gt;
&lt;td&gt;⭐⭐ Rapid Depreciation&lt;/td&gt;
&lt;td&gt;⭐ Saturation&lt;/td&gt;
&lt;td&gt;Low barrier, fierce competition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 2→3 (Integration→Settlement)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐ Scarce&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐ Continual Appreciation&lt;/td&gt;
&lt;td&gt;Requires industry depth, long-term iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 3→4 (Settlement→Abstraction)&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐ Extremely Scarce&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐ Defines Ecosystem&lt;/td&gt;
&lt;td&gt;Large cognitive leap, needs community influence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: AI Capability Advancement Paths and Value Comparison
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Key conclusion&lt;/strong&gt;: The number of &amp;ldquo;AI users&amp;rdquo; is increasing rapidly, which depresses the value of Level 1. Because of the three-dimensional delaying factors, however, scarcity at Levels 3 and 4 will only rise.&lt;/p&gt;
&lt;h2 id="what-im-doing-on-arkspheredev"&gt;What I&amp;rsquo;m Doing on arksphere.dev&lt;/h2&gt;
&lt;p&gt;Based on the above judgment, I focus on exploring the architectural evolution of AI Native Infrastructure. The goal is not to catalog model usage, but to study the foundational capability stack supporting scalable intelligent systems: scheduling, storage, inference, Agent Runtime, autonomous control, observability, and reliability.&lt;/p&gt;
&lt;p&gt;The content is no longer a collection of courses or tips, but a continuous record of evolution around Infra → Runtime → System Abstraction. &lt;a href="https://arksphere.dev" target="_blank" rel="noopener"&gt;arksphere.dev&lt;/a&gt; is where this experiment runs and where its results accumulate.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The inflection point for the era of AI engineering is not &amp;ldquo;how many people use it,&amp;rdquo; but &amp;ldquo;how many people cannot do without it.&amp;rdquo; The latter requires five measurable indicators to reach their thresholds, and we are still far from that.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Using ≠ Building&amp;rdquo; is not a binary, but a five-level progression. &lt;strong&gt;Scarcity at Level 3 and 4 will rise as the number of Level 1 users increases&lt;/strong&gt;—this is the biggest opportunity window in the next three years.&lt;/p&gt;
&lt;p&gt;But the width of this window depends largely on how technology, institutions, and organizations evolve together. I hope more people working on AI engineering will not only focus on technical innovation, but also invest equal thought into institutional development, talent growth, and risk governance—these &amp;ldquo;invisible engineering&amp;rdquo; challenges.&lt;/p&gt;</content:encoded></item><item><title>Antigravity VS Code Setup Guide: Build a Practical AI IDE Workflow</title><link>https://jimmysong.io/blog/antigravity-vscode-style-ide/</link><pubDate>Thu, 20 Nov 2025 03:55:30 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/antigravity-vscode-style-ide/</guid><description>A practical Antigravity setup guide for developers who want a VS Code-style AI IDE, including marketplace switch, AMP and CodeX installation, and workflow tuning.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The biggest pain point when switching IDEs is user habits. By installing a series of plugins and tweaking configurations, you can make Antigravity feel much more like VS Code—preserving familiar workflows while adding Open Agent Manager capabilities.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you searched for a practical Antigravity VS Code setup, this walkthrough is optimized for that exact use case. The goal is not to replicate VS Code pixel by pixel, but to restore a familiar extension marketplace, keep your daily coding ergonomics, and still use Antigravity&amp;rsquo;s stronger agent-style execution. I focus on the concrete setup steps that materially change productivity: marketplace migration, AMP and CodeX installation, editor behavior alignment, and the trade-offs versus GitHub Copilot in real daily work.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/antigravity-ui.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/antigravity-ui.webp" alt="Figure 1: Antigravity IDE UI" data-caption="Figure 1: Antigravity IDE UI"
width="5120"
height="2880"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Antigravity IDE UI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="continue-reading"&gt;Continue Reading&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/qoder-alibaba-ai-ide-personal-review/"&gt;Qoder AI IDE review and hands-on comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/open-source-ai-agent-workflow-comparison/"&gt;Open-source AI Agent and workflow platform comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/blog/vibe-coding-free-tools/"&gt;Free Vibe Coding tools I actually use&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/ai/oh-my-opencode/"&gt;Oh My OpenCode in AI OSS Landscape&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below are the configurations and steps I actually use. Feel free to follow along.&lt;/p&gt;
&lt;h2 id="first-impressions-of-antigravity"&gt;First Impressions of Antigravity&lt;/h2&gt;
&lt;p&gt;A few subjective observations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The interface is split between agent management and editor views, somewhat like AgentHQ + VS Code.&lt;/li&gt;
&lt;li&gt;Agents modify code very quickly, with a much higher completion rate than typical &amp;ldquo;chat-based&amp;rdquo; assistants.&lt;/li&gt;
&lt;li&gt;The editor and context windows are large, ideal for long diffs and logs.&lt;/li&gt;
&lt;li&gt;By default, it uses OpenVSX / OpenVSCode Gallery, so the extension ecosystem isn&amp;rsquo;t identical to my VS Code setup.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All subsequent steps focus on one goal: keep Antigravity&amp;rsquo;s agent features while maintaining my VS Code workflow.&lt;/p&gt;
&lt;h2 id="switching-the-extension-marketplace-to-vs-code-official"&gt;Switching the Extension Marketplace to the Official VS Code Marketplace&lt;/h2&gt;
&lt;p&gt;Antigravity is essentially a VS Code fork, so you can directly change the Marketplace configuration.&lt;/p&gt;
&lt;p&gt;In Antigravity:&lt;/p&gt;
&lt;p&gt;Go to &lt;strong&gt;Settings&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Antigravity Settings&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Editor&lt;/strong&gt;, and update the following URLs to point to VS Code:&lt;/p&gt;
&lt;p&gt;Marketplace Item URL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;https://marketplace.visualstudio.com/items
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Marketplace Gallery URL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;https://marketplace.visualstudio.com/_apis/public/gallery
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/vscode-marketplace.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/vscode-marketplace.webp" alt="Figure 2: VSCode Marketplace Configuration" data-caption="Figure 2: VSCode Marketplace Configuration"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: VSCode Marketplace Configuration&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Restart Antigravity.&lt;/p&gt;
&lt;p&gt;After this change, searching for and installing extensions works just as it does with the official VS Code Marketplace. AMP, GitHub Theme, vscode-icons, and other extensions are all installed this way.&lt;/p&gt;
&lt;h2 id="installing-the-amp-extension"&gt;Installing the AMP Extension&lt;/h2&gt;
&lt;p&gt;AMP isn&amp;rsquo;t officially supported on Antigravity yet, but you can install it directly via the VS Code Marketplace.&lt;/p&gt;
&lt;p&gt;Steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open the Extensions panel (the same icon as in VS Code).&lt;/li&gt;
&lt;li&gt;Search for the AMP extension and install it as usual.&lt;/li&gt;
&lt;li&gt;Log in using your AMP API Key.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Currently, Antigravity doesn&amp;rsquo;t support one-click account login like VS Code; you have to use the API key.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
Summary
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
Once installed, AMP works almost the same in Antigravity as it does in VS Code: completion and refactoring features are available. The only difference is that login must be configured manually.
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;I recommend AMP because it offers a free mode. In my experience, it&amp;rsquo;s great for writing documentation, running scripts, and as a daily command-line tool. It&amp;rsquo;s fast, and especially useful for optimizing prompts.&lt;/p&gt;
&lt;h2 id="importing-the-codex-extension"&gt;Importing the CodeX Extension&lt;/h2&gt;
&lt;p&gt;CodeX doesn&amp;rsquo;t provide a direct VSIX download link on the web. My approach is to export it from VS Code and then import it into Antigravity.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/codex-extension.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/codex-extension.webp" alt="Figure 3: Exporting Codex Extension in VS Code" data-caption="Figure 3: Exporting Codex Extension in VS Code"
width="3016"
height="2264"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Exporting Codex Extension in VS Code&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install the CodeX extension in VS Code (if you haven&amp;rsquo;t already).&lt;/li&gt;
&lt;li&gt;In VS Code&amp;rsquo;s extension manager, find CodeX and export it as a &lt;code&gt;.vsix&lt;/code&gt; file.&lt;/li&gt;
&lt;li&gt;Switch to Antigravity, open the Extensions panel, and select &amp;ldquo;Install from VSIX&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Choose the exported &lt;code&gt;codex-x.x.x.vsix&lt;/code&gt; file to complete installation.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="alert alert-tip-container"&gt;
&lt;div class="alert-tip-title px-2"&gt;
Tip
&lt;/div&gt;
&lt;div class="alert-tip px-2"&gt;
Since my local VS Code is already logged into CodeX, importing it into Antigravity automatically reuses the login state—I didn&amp;rsquo;t need to log in again.
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="optimizing-editor-settings"&gt;Optimizing Editor Settings&lt;/h2&gt;
&lt;p&gt;Beyond the marketplace and plugins, a few tweaks make the experience even closer to VS Code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Theme&lt;/strong&gt;: Choose the same color scheme as VS Code to minimize visual switching. I use GitHub Theme and vscode-icons.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Editor Settings&lt;/strong&gt;: In &amp;ldquo;Open Editor Settings&amp;rdquo;, set indentation, formatting, line width, etc., to match your VS Code preferences. I define these in the workspace&amp;rsquo;s &lt;code&gt;settings.json&lt;/code&gt;, so no migration is needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After these changes, the editing area is essentially &amp;ldquo;VS Code with an agent console&amp;rdquo;.&lt;/p&gt;
&lt;h2 id="remaining-issues"&gt;Remaining Issues&lt;/h2&gt;
&lt;p&gt;To fully migrate from VS Code/GitHub Copilot to Antigravity, I think there are still several key challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Limited Customization&lt;/strong&gt;: Antigravity doesn&amp;rsquo;t support custom prompts and agents the way Copilot Chat does. Currently, only &amp;ldquo;rules&amp;rdquo; configuration is available, which limits workflow flexibility.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model Ecosystem Needs Improvement&lt;/strong&gt;: Antigravity hasn&amp;rsquo;t natively integrated the latest models from major vendors (OpenAI, Anthropic, Microsoft, xAI, etc.), whereas GitHub Copilot excels here.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost Considerations&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Future pricing may start at $20/month.&lt;/li&gt;
&lt;li&gt;No free models are supported, unlike GitHub Copilot (even Copilot Pro users have free model options).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stability Issues&lt;/strong&gt;: Agents often encounter &amp;ldquo;Agent terminated due to error&amp;rdquo; during operation, requiring manual retries or new sessions. This affects workflow smoothness, though I expect improvements in the future.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="github-copilot-vs-antigravity"&gt;GitHub Copilot vs. Antigravity&lt;/h2&gt;
&lt;p&gt;Although Antigravity excels in several areas, there is still significant room for improvement compared to the combination of GitHub Copilot and VS Code.&lt;/p&gt;
&lt;p&gt;The large language models (LLMs) I frequently use are all supported in VS Code:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/models.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/models.webp" alt="Figure 4: Copilot-supported LLMs (partial)" data-caption="Figure 4: Copilot-supported LLMs (partial)"
width="1252"
height="1240"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Copilot-supported LLMs (partial)&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The custom prompts I have accumulated over time:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/prompts.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/prompts.webp" alt="Figure 5: Copilot Chat enables quick access to custom prompts" data-caption="Figure 5: Copilot Chat enables quick access to custom prompts"
width="1252"
height="852"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Copilot Chat enables quick access to custom prompts&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;My collection of agents:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/agents.webp" data-img="https://assets.jimmysong.io/images/blog/antigravity-vscode-style-ide/agents.webp" alt="Figure 6: Copilot Chat allows selection of custom agents" data-caption="Figure 6: Copilot Chat allows selection of custom agents"
width="1258"
height="2594"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Copilot Chat allows selection of custom agents&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Here are some personal experiences using VS Code and Copilot that, for now, are hard to replace with other IDEs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Ask/Edit/Agent/Plan workflow perfectly fits my working habits.&lt;/li&gt;
&lt;li&gt;Support for custom prompts and agents is essential. Many of my prompts and agents have been refined over time and are deeply integrated into my daily workflow—it&amp;rsquo;s hard to find alternatives elsewhere.&lt;/li&gt;
&lt;li&gt;New models are integrated at lightning speed. Whenever a new model is released, GitHub Copilot is among the first to support it.&lt;/li&gt;
&lt;li&gt;The integration with VS Code is seamless—no extra configuration required, making it extremely convenient.&lt;/li&gt;
&lt;li&gt;Frequent updates: just a few days ago, a bug I reported to VS Code was fixed the same night.&lt;/li&gt;
&lt;li&gt;Copilot Chat&amp;rsquo;s keyboard shortcuts make it easy to quickly access various features.&lt;/li&gt;
&lt;li&gt;GitHub has granted me a free Pro account. Although the monthly premium quota is only 300 calls, combining Copilot with other plugins like AMP, Codex, Droid, and Qwen enables a highly efficient workflow. Even if I upgrade to a paid account in the future, the $10/month fee is very cost-effective compared to similar products.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="practical-experience"&gt;Practical Experience&lt;/h2&gt;
&lt;p&gt;A few subjective tips from my actual usage—take them as reference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t treat Antigravity as &amp;ldquo;VS Code + chat box&amp;rdquo;. Use its agent features for complete tasks: let the agent propose a plan, then execute changes.&lt;/li&gt;
&lt;li&gt;For major changes, always create a new Git branch and restrict agent actions to that branch. Handle all diffs via standard Pull Request (PR) workflows.&lt;/li&gt;
&lt;li&gt;Ask agents to produce &amp;ldquo;artifacts&amp;rdquo; (plans, proposals, test descriptions), not just final code. This makes it easier to review and track changes.&lt;/li&gt;
&lt;li&gt;Plugins you&amp;rsquo;re already comfortable with in VS Code (like AMP and Codex) can be migrated directly, reducing cognitive load and letting you focus on new agent workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;My current experience: Antigravity delivers powerful agent capabilities and multi-view consoles. By following these steps to align the interface and plugin ecosystem with VS Code, you can smoothly transition your daily development workflow.&lt;/p&gt;</content:encoded></item><item><title>Cloudflare November 18 Global Outage: The Dangers of Implicit Assumptions in Modern Infrastructure</title><link>https://jimmysong.io/blog/cloudflare-2025-11-18-outage-analysis/</link><pubDate>Wed, 19 Nov 2025 18:56:34 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/cloudflare-2025-11-18-outage-analysis/</guid><description>An analysis of the Cloudflare global outage on November 18, 2025, exploring implicit assumptions, automated configuration pipelines, and systemic risks in modern infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The greatest risks to modern internet infrastructure often aren&amp;rsquo;t in the code itself, but in those implicit assumptions and automated configuration pipelines that go undefined. Cloudflare&amp;rsquo;s outage is a wake-up call every Infra/AI engineer must heed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yesterday (November 18), Cloudflare experienced its largest global outage since 2019. As this site is hosted on Cloudflare, it was also affected—one of the rare times in eight years that the site was inaccessible due to an outage (the last time was a GitHub Pages failure, which happened the year Microsoft acquired GitHub).&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloudflare-2025-11-18-outage-analysis/jimmysongio-down.webp" data-img="https://assets.jimmysong.io/images/blog/cloudflare-2025-11-18-outage-analysis/jimmysongio-down.webp" alt="Figure 1: jimmysong.io was down for 27 minutes due to the Cloudflare outage" data-caption="Figure 1: jimmysong.io was down for 27 minutes due to the Cloudflare outage"
width="2694"
height="1424"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: jimmysong.io was down for 27 minutes due to the Cloudflare outage&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This incident was not caused by an attack or a traditional software bug, but by a seemingly &amp;ldquo;safe&amp;rdquo; permissions update that triggered the weakest link in modern infrastructure: &lt;strong&gt;implicit assumptions and automated configuration pipelines&lt;/strong&gt;. Cloudflare has published a postmortem, &lt;a href="https://blog.cloudflare.com/18-november-2025-outage/" target="_blank" rel="noopener"&gt;Cloudflare outage on November 18, 2025&lt;/a&gt;, explaining the cause.&lt;/p&gt;
&lt;p&gt;The outage unfolded as a chain reaction:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A permissions adjustment led to metadata changes;&lt;/li&gt;
&lt;li&gt;The metadata change doubled the number of lines in the feature file;&lt;/li&gt;
&lt;li&gt;The larger file exceeded the proxy module&amp;rsquo;s memory limit;&lt;/li&gt;
&lt;li&gt;Exceeding the limit caused the core proxy to panic;&lt;/li&gt;
&lt;li&gt;The proxy panic led to a cascade failure in downstream systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This kind of chain reaction is the most typical—and dangerous—systemic failure mode at today&amp;rsquo;s internet scale.&lt;/p&gt;
&lt;h2 id="root-cause-implicit-assumptions-are-not-contracts"&gt;Root Cause: Implicit Assumptions Are Not Contracts&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s first look at the core hidden risk in this incident. The Bot Management feature file is automatically regenerated every five minutes, and the generation query relied on an unstated premise:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The system.columns query result contains only the default database.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This assumption was neither documented nor validated in configuration; it existed only in the engineers&amp;rsquo; mental model.&lt;/p&gt;
&lt;p&gt;After a ClickHouse permissions update, the underlying r0 tables became visible to the query, instantly doubling its results. The resulting feature count exceeded the 200-feature limit for which &lt;a href="https://blog.cloudflare.com/20-percent-internet-upgrade/" target="_blank" rel="noopener"&gt;FL2&lt;/a&gt; preallocates memory, ultimately causing a panic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Once an implicit assumption is broken, the system lacks a buffer and is highly prone to cascading failures.&lt;/strong&gt;&lt;/p&gt;
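&lt;p&gt;One way to turn this kind of assumption into a contract is to filter on the database explicitly instead of relying on which databases happen to be visible. The sketch below is a minimal Python illustration, not Cloudflare&amp;rsquo;s actual query or schema: the row shape, the table name &lt;code&gt;features&lt;/code&gt;, and the function &lt;code&gt;select_feature_columns&lt;/code&gt; are all assumptions made for the example.&lt;/p&gt;

```python
# Rows are hypothetical (database, table, column) tuples standing in for
# a system.columns result; the table name "features" is an assumption.
def select_feature_columns(rows, database="default", table="features"):
    """Return column names for one table in one database, without duplicates."""
    columns = []
    seen = set()
    for db, tbl, name in rows:
        # Explicit filter: never rely on which databases happen to be visible.
        if db != database or tbl != table:
            continue
        if name not in seen:
            seen.add(name)
            columns.append(name)
    return columns


before = [("default", "features", "f1"), ("default", "features", "f2")]
# After the permissions change, the same columns also show up under r0.
after = before + [("r0", "features", "f1"), ("r0", "features", "f2")]
```

&lt;p&gt;With the explicit filter, &lt;code&gt;before&lt;/code&gt; and &lt;code&gt;after&lt;/code&gt; yield the same column list, so newly visible databases no longer change the output.&lt;/p&gt;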
&lt;h2 id="configuration-pipelines-are-riskier-than-code-pipelines"&gt;Configuration Pipelines Are Riskier Than Code Pipelines&lt;/h2&gt;
&lt;p&gt;This incident was not caused by code changes, but by data-plane changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SQL query behavior changed;&lt;/li&gt;
&lt;li&gt;Feature files were automatically generated;&lt;/li&gt;
&lt;li&gt;The files were broadcast across the network.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a typical pattern in modern infrastructure: &lt;strong&gt;data, schema, and metadata are far more likely to destabilize systems than code.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Cloudflare&amp;rsquo;s feature file is a &amp;ldquo;supply chain input,&amp;rdquo; not a regular configuration. Anything entering the automated broadcast path is equivalent to a system-level command.&lt;/p&gt;
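&lt;p&gt;If a generated file is effectively a system-level command, it arguably deserves the same staged rollout as a code release. The following is a rough Python sketch of that idea under stated assumptions: the stage names and the &lt;code&gt;apply&lt;/code&gt;, &lt;code&gt;is_healthy&lt;/code&gt;, and &lt;code&gt;rollback&lt;/code&gt; callbacks are hypothetical and do not describe how Cloudflare distributes files.&lt;/p&gt;

```python
def staged_rollout(stages, apply, is_healthy, rollback):
    """Push a config stage by stage; stop and roll back at the first unhealthy stage."""
    applied = []
    for stage in stages:
        apply(stage)
        applied.append(stage)
        if not is_healthy(stage):
            # Kill switch: undo everything touched so far, newest first.
            for done in reversed(applied):
                rollback(done)
            return {"completed": False, "halted_at": stage}
    return {"completed": True, "halted_at": None}


# Example: a hypothetical health check that fails at the "region" stage.
log = []
result = staged_rollout(
    ["canary", "region", "global"],
    apply=lambda s: log.append(("apply", s)),
    is_healthy=lambda s: s != "region",
    rollback=lambda s: log.append(("rollback", s)),
)
```

&lt;p&gt;In this run the bad file never reaches the &lt;code&gt;global&lt;/code&gt; stage, which is the property an automated broadcast path is missing when it publishes everywhere at once.&lt;/p&gt;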
&lt;h2 id="language-safety-cant-eliminate-boundary-layer-complexity"&gt;Language Safety Can&amp;rsquo;t Eliminate Boundary Layer Complexity&lt;/h2&gt;
&lt;p&gt;A former Cloudflare engineer summarized it well:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Rust can prevent a class of errors, but the complexity of boundary layers, data contracts, and configuration pipelines does not disappear.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The FL2 panic stemmed from a single &lt;code&gt;unwrap()&lt;/code&gt;. This isn&amp;rsquo;t a language issue, but a lack of system contracts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No upper-bound validation for feature count;&lt;/li&gt;
&lt;li&gt;File schema lacked version constraints;&lt;/li&gt;
&lt;li&gt;Feature generation logic depended on implicit behavior;&lt;/li&gt;
&lt;li&gt;Core proxy error mode was panic, not graceful degradation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Most incidents in modern distributed systems come from &amp;ldquo;bad input,&amp;rdquo; not &amp;ldquo;bad memory.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
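&lt;p&gt;The missing upper-bound check can be expressed as a small validation step at the input boundary. This is a hedged sketch in Python for brevity rather than the Rust of the real proxy; only the limit of 200 comes from the incident, while &lt;code&gt;FeatureFileError&lt;/code&gt; and &lt;code&gt;validate_features&lt;/code&gt; are names invented for illustration.&lt;/p&gt;

```python
MAX_FEATURES = 200  # the limit mentioned in the post; everything else is illustrative


class FeatureFileError(ValueError):
    """Raised when a feature file violates the declared contract."""


def validate_features(features, max_features=MAX_FEATURES):
    """Check the contract up front and return a clean list, or raise a typed error."""
    if not features:
        raise FeatureFileError("feature file is empty")
    if len(features) > max_features:
        raise FeatureFileError(
            f"{len(features)} features exceeds the limit of {max_features}"
        )
    if len(set(features)) != len(features):
        raise FeatureFileError("feature file contains duplicate entries")
    return list(features)
```

&lt;p&gt;A typed error gives the caller a chance to reject the file and keep running, which is the opposite of what an unchecked &lt;code&gt;unwrap()&lt;/code&gt; allows.&lt;/p&gt;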
&lt;h2 id="core-proxies-need-controllable-failure-paths"&gt;Core Proxies Need Controllable Failure Paths&lt;/h2&gt;
&lt;p&gt;FL/FL2 are Cloudflare&amp;rsquo;s core proxies; all requests must pass through them. Such components should not fail with a panic. Instead, they should be able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ignore abnormal features;&lt;/li&gt;
&lt;li&gt;Truncate over-limit fields;&lt;/li&gt;
&lt;li&gt;Roll back to previous versions;&lt;/li&gt;
&lt;li&gt;Fail open or fail closed;&lt;/li&gt;
&lt;li&gt;Skip the Bot module and continue processing traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As long as the proxy &amp;ldquo;stays alive,&amp;rdquo; the entire network won&amp;rsquo;t be completely paralyzed.&lt;/p&gt;
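&lt;p&gt;Several of these capabilities can be combined in a few lines: keep the last known-good file, reject bad candidates, and keep serving traffic when the module has nothing valid to work with. The Python sketch below is a toy model under stated assumptions; &lt;code&gt;BotModule&lt;/code&gt;, &lt;code&gt;handle_request&lt;/code&gt;, and the placeholder score are hypothetical and do not reflect FL2 internals.&lt;/p&gt;

```python
class BotModule:
    """Toy stand-in for a proxy module that consumes a feature file."""

    def __init__(self, max_features=200):
        self.max_features = max_features
        self.active = None  # last known-good feature list, if any

    def reload(self, candidate):
        """Apply candidate only if it passes checks; otherwise keep the old one."""
        if not candidate or len(candidate) > self.max_features:
            return False  # rejected: previous version stays active
        self.active = list(candidate)
        return True

    def score(self, request):
        """Return a placeholder score, or None when no valid features are loaded."""
        if self.active is None:
            return None
        return len(self.active)  # placeholder, not a real bot score


def handle_request(request, bot):
    """Serve the request even if bot scoring is unavailable (fail open)."""
    try:
        score = bot.score(request)
    except Exception:
        score = None  # skip the module rather than crash the proxy
    return {"path": request, "bot_score": score, "served": True}
```

&lt;p&gt;Failing open is a policy choice made here for the example; a security-sensitive deployment might prefer to fail closed for some routes, but either way the decision is explicit rather than left to a crash.&lt;/p&gt;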
&lt;h2 id="data-changes-are-more-uncontrollable-than-code-changes"&gt;Data Changes Are More Uncontrollable Than Code Changes&lt;/h2&gt;
&lt;p&gt;The essence of this incident:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Subtle permission changes;&lt;/li&gt;
&lt;li&gt;ClickHouse default behavior changed;&lt;/li&gt;
&lt;li&gt;Query results propagated to distributed systems;&lt;/li&gt;
&lt;li&gt;Automated publishing amplified the error;&lt;/li&gt;
&lt;li&gt;Edge proxies crashed due to uncontrolled input.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Future AI infrastructure (AI Infra) will be even more complex: models, tokenizers, adapters, RAG indexes, and KV snapshots all require frequent updates.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In future AI infrastructure, data-plane risks will far exceed those of the code-plane.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="recovery-process-shows-engineering-maturity"&gt;Recovery Process Shows Engineering Maturity&lt;/h2&gt;
&lt;p&gt;During the incident, Cloudflare took several measures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stopped generating erroneous feature files;&lt;/li&gt;
&lt;li&gt;Force-distributed the previous version of the file;&lt;/li&gt;
&lt;li&gt;Rolled back Bot module configuration;&lt;/li&gt;
&lt;li&gt;Ran Workers KV and Access outside the core proxy;&lt;/li&gt;
&lt;li&gt;Restored traffic in stages.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Restoring hundreds of PoPs worldwide simultaneously demonstrates a high level of engineering maturity.&lt;/p&gt;
&lt;h2 id="lessons-for-infraaicloud-native-engineers"&gt;Lessons for Infra/AI/Cloud Native Engineers&lt;/h2&gt;
&lt;p&gt;The Cloudflare event highlights four common risks in large-scale systems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Implicit assumptions fail;&lt;/li&gt;
&lt;li&gt;The configuration supply chain gets contaminated;&lt;/li&gt;
&lt;li&gt;Automated publishing amplifies errors;&lt;/li&gt;
&lt;li&gt;Core proxies lack graceful degradation paths.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For AI Infra practitioners, these risks are even more relevant:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model weight updates may ship without schema validation;&lt;/li&gt;
&lt;li&gt;Adapter merges may be contaminated;&lt;/li&gt;
&lt;li&gt;RAG index incremental builds are unstable;&lt;/li&gt;
&lt;li&gt;Inference graph configuration may be broken by bad data;&lt;/li&gt;
&lt;li&gt;Automatically rolled-out models may propagate errors network-wide.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI engineering is replaying Cloudflare&amp;rsquo;s infrastructure dilemmas—just at greater speed and scale.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary-of-former-cloudflare-engineers-views"&gt;Summary of Former Cloudflare Engineer&amp;rsquo;s Views&lt;/h2&gt;
&lt;p&gt;His insights pinpoint the hardest problems in distributed systems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The issue isn&amp;rsquo;t code, but missing contracts;&lt;/li&gt;
&lt;li&gt;Not the language, but undefined input boundaries;&lt;/li&gt;
&lt;li&gt;Not modules, but lack of validation in the configuration supply chain;&lt;/li&gt;
&lt;li&gt;Not bugs, but absence of fail-safe mechanisms.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This incident proves: &lt;strong&gt;The real fragility in modern infrastructure lies in &amp;ldquo;behavioral boundaries,&amp;rdquo; not &amp;ldquo;memory boundaries.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The Cloudflare November 18 outage was not a coincidence, but an inevitable result of modern internet infrastructure evolving toward large-scale, highly automated operation.&lt;/p&gt;
&lt;p&gt;Key takeaways from this event:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;System assumptions must be made explicit;&lt;/li&gt;
&lt;li&gt;Configuration pipelines must be validated;&lt;/li&gt;
&lt;li&gt;Automated publishing needs &amp;ldquo;kill-switch&amp;rdquo; mechanisms that can halt a bad rollout;&lt;/li&gt;
&lt;li&gt;Core proxies must be designed with controllable failure paths;&lt;/li&gt;
&lt;li&gt;Data-plane contracts must be stricter than code-plane contracts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the AI-native Infra era, these requirements will only become more stringent.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.cloudflare.com/18-november-2025-outage/" target="_blank" rel="noopener"&gt;Cloudflare outage on November 18, 2025 - blog.cloudflare.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.cloudflare.com/20-percent-internet-upgrade/" target="_blank" rel="noopener"&gt;20% of the Internet upgraded - blog.cloudflare.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>The Second Half of Cloud Native: The Era of AI Native Platform Engineering Has Arrived</title><link>https://jimmysong.io/blog/cloud-native-second-half-ai-native-platform-engineering/</link><pubDate>Mon, 17 Nov 2025 11:07:40 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/cloud-native-second-half-ai-native-platform-engineering/</guid><description>A decade of cloud native evolution, a look ahead to AI-Native Platform engineering, technical layers, and key changes. KubeCon NA 2025 signals a new era.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The second half of cloud native isn&amp;rsquo;t about being replaced by AI, but being rewritten by it. The future of platform engineering will revolve around models and agents, reshaping the tech stack and developer experience.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since I first encountered Docker and Kubernetes in 2015, I&amp;rsquo;ve followed the cloud native journey: from writing Deployments in YAML, to exploring Service Mesh and observability, and in recent years, focusing on AI Infra and AI-Native Platforms. Looking back from 2025, the years 2015–2025 can be seen as the &amp;ldquo;first half&amp;rdquo; of cloud native. Marked by &lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/" target="_blank" rel="noopener"&gt;KubeCon / CloudNativeCon NA 2025&lt;/a&gt;, the industry is collectively entering the &amp;ldquo;second half&amp;rdquo;: the era of AI-Native Platform engineering.&lt;/p&gt;
&lt;p&gt;This article reviews the past decade of cloud native, and, combined with KubeCon NA 2025, outlines key turning points and the technical coordinates for the next ten years.&lt;/p&gt;
&lt;h2 id="20152025-the-first-half-of-cloud-native"&gt;2015–2025: The &amp;ldquo;First Half&amp;rdquo; of Cloud Native&lt;/h2&gt;
&lt;p&gt;Over the past decade, cloud native technology themes have evolved through three main stages. The following flowchart illustrates the progression.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/cloud-native-decade-evolution.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/cloud-native-decade-evolution.svg" alt="Figure 1: Cloud Native Decade Technology Evolution Flow" data-caption="Figure 1: Cloud Native Decade Technology Evolution Flow"
width="2543"
height="223"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Cloud Native Decade Technology Evolution Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The first stage, 2015–2017, focused on containerization and orchestration standardization.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Docker realized the engineering dream of &amp;ldquo;build once, run anywhere&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Kubernetes won the orchestration wars and became the de facto standard&lt;/li&gt;
&lt;li&gt;CNCF was founded, with Prometheus, Envoy, and other projects joining&lt;/li&gt;
&lt;li&gt;Enterprises focused on migrating applications to Kubernetes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Typical tasks during this phase involved moving Java services from VMs to containers and K8s, emphasizing understanding of Deployment, Service, and Ingress.&lt;/p&gt;
&lt;p&gt;The second stage, 2018–2020, saw complexity shift from &amp;ldquo;deployment&amp;rdquo; to &amp;ldquo;communication&amp;rdquo; and &amp;ldquo;operations&amp;rdquo;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Service Mesh (Istio / Linkerd / Consul) addressed east-west traffic management&lt;/li&gt;
&lt;li&gt;The observability trio (Logs / Metrics / Traces) became default configurations&lt;/li&gt;
&lt;li&gt;Multi-cluster and multi-region practices matured&lt;/li&gt;
&lt;li&gt;Enterprises focused on managing large microservice systems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During this period, I spent significant time researching Istio, service mesh, and traffic management, and authored Kubernetes and Istio books. The focus shifted to system stability, observability, and reliability.&lt;/p&gt;
&lt;p&gt;The third stage, 2021–2025, is defined by Platform Engineering and GitOps.&lt;/p&gt;
&lt;p&gt;As microservices and tools proliferated, platform complexity began to overwhelm developers, making Platform Engineering a key industry term.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitOps (Argo CD / Flux) drove declarative delivery processes&lt;/li&gt;
&lt;li&gt;Internal Developer Platforms (IDP) became priorities for large enterprises&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Platform as a product&amp;rdquo; philosophy spread&lt;/li&gt;
&lt;li&gt;FinOps, cost management, and compliance auditing became platform concerns&lt;/li&gt;
&lt;li&gt;DevOps evolved from &amp;ldquo;tool practice&amp;rdquo; to &amp;ldquo;organizational + platform capability&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My takeaway: simply giving developers a pile of tools isn&amp;rsquo;t enough. End-to-end delivery paths and stable abstraction layers are needed so developers can focus on business, not tool integration.&lt;/p&gt;
&lt;p&gt;The table below summarizes the main features of each &amp;ldquo;first half&amp;rdquo; stage.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Core Challenge&lt;/th&gt;
&lt;th&gt;Key Tech Stack&lt;/th&gt;
&lt;th&gt;Typical Issues&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2015–2017 Orchestration&lt;/td&gt;
&lt;td&gt;Migrating from VM to containers&lt;/td&gt;
&lt;td&gt;Docker, Kubernetes, CNI&lt;/td&gt;
&lt;td&gt;Reliable deployment, rolling upgrades&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2018–2020 Mesh&lt;/td&gt;
&lt;td&gt;Microservice scale, complex communication &amp;amp; observability&lt;/td&gt;
&lt;td&gt;Istio/Linkerd, Prometheus, Jaeger&lt;/td&gt;
&lt;td&gt;Troubleshooting, fragmented observability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2021–2025 Platform&lt;/td&gt;
&lt;td&gt;Tool sprawl, declining developer experience&lt;/td&gt;
&lt;td&gt;GitOps, IDP, FinOps, Policy-as-Code&lt;/td&gt;
&lt;td&gt;Developer fatigue, platform team overload&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Cloud Native First Half Stage Features
&lt;/figcaption&gt;
&lt;h2 id="kubecon-na-2025-signals-of-cloud-natives-second-half"&gt;KubeCon NA 2025: Signals of Cloud Native&amp;rsquo;s &amp;ldquo;Second Half&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;The main theme of KubeCon NA 2025 was no longer &amp;ldquo;how to use Kubernetes well,&amp;rdquo; but how to reconstruct Kubernetes and the cloud native ecosystem into AI-Native Platforms for the AI era.&lt;/p&gt;
&lt;p&gt;Key signals from KubeCon NA 2025 include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CNCF released the &lt;a href="https://github.com/cncf/k8s-ai-conformance" target="_blank" rel="noopener"&gt;Certified Kubernetes AI Conformance Program&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dynamic Resource Allocation (DRA) entered mainstream discussions&lt;/li&gt;
&lt;li&gt;Model Runtime / Agent Runtime projects became conference hotspots&lt;/li&gt;
&lt;li&gt;Vendors focused on AI SRE, AI-assisted development, AI security, and supply chain governance&lt;/li&gt;
&lt;li&gt;Speakers like Alex Zenla openly stated that Kubernetes&amp;rsquo; underlying structure needs rethinking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Together, these mark a clear dividing line: cloud native has officially entered its &amp;ldquo;second half.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="first-half-vs-second-half-shifting-the-cloud-native-narrative"&gt;First Half vs Second Half: Shifting the Cloud Native Narrative&lt;/h2&gt;
&lt;p&gt;If 2015–2025 is the &amp;ldquo;first half,&amp;rdquo; then 2025–2035 is likely the &amp;ldquo;second half.&amp;rdquo; The table below compares their core differences.&lt;/p&gt;
&lt;p&gt;It highlights changes in platform objects, goals, abstraction layers, and more.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;First Half (2015–2025)&lt;/th&gt;
&lt;th&gt;Second Half (2025–2035, AI Native)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Objects&lt;/td&gt;
&lt;td&gt;Containers, Pods, Microservices&lt;/td&gt;
&lt;td&gt;Models, inference tasks, Agents, data pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platform Goals&lt;/td&gt;
&lt;td&gt;Stable application delivery&lt;/td&gt;
&lt;td&gt;Efficient, continuous AI workload &amp;amp; agent orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstraction Layers&lt;/td&gt;
&lt;td&gt;Deployment / Service / Ingress / Job&lt;/td&gt;
&lt;td&gt;Model / Endpoint / Graph / Policy / Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Scheduling&lt;/td&gt;
&lt;td&gt;CPU / Memory / Node&lt;/td&gt;
&lt;td&gt;GPU / TPU / ASIC / KV Cache / Bandwidth / Power&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering Focus&lt;/td&gt;
&lt;td&gt;DevOps / GitOps / Platform Engineering 1.0&lt;/td&gt;
&lt;td&gt;AI Native Platform Engineering / AI SRE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security &amp;amp; Compliance&lt;/td&gt;
&lt;td&gt;Image security, CVE, supply chain SBOM&lt;/td&gt;
&lt;td&gt;Model security, data security, AI supply chain &amp;amp; &amp;ldquo;hallucination dependencies&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime Forms&lt;/td&gt;
&lt;td&gt;Container + VM + Serverless&lt;/td&gt;
&lt;td&gt;Container + WASM + Nix + Agent Runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Core Differences: First vs Second Half of Cloud Native
&lt;/figcaption&gt;
&lt;p&gt;From a developer&amp;rsquo;s perspective, the most direct change is: future platforms will no longer treat &amp;ldquo;services&amp;rdquo; as first-class citizens, but will center on &amp;ldquo;models + agents.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="example-technical-layers-of-an-ai-native-platform"&gt;Example: Technical Layers of an AI Native Platform&lt;/h2&gt;
&lt;p&gt;To clarify the structure of an AI-Native Platform, the following layered diagram shows the relationships between technical levels.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/ai-native-platform-layering.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-second-half-ai-native-platform-engineering/ai-native-platform-layering.svg" alt="Figure 2: AI Native Platform Layering Diagram" data-caption="Figure 2: AI Native Platform Layering Diagram"
width="2063"
height="1643"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: AI Native Platform Layering Diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Historically, cloud native focused on L0 + L2 (Kubernetes + platform engineering), but in the AI Native era, L1 (Model Runtime, Agent Runtime, heterogeneous resource scheduling) becomes the new battleground.&lt;/p&gt;
&lt;h2 id="key-change-1-from-container-centric-to-model-centric"&gt;Key Change 1: From &amp;ldquo;Container-Centric&amp;rdquo; to &amp;ldquo;Model-Centric&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;In the first half, cloud native&amp;rsquo;s main object was the application process, with containers as packaging. The second half requires handling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Model version management and canary releases&lt;/li&gt;
&lt;li&gt;Balancing inference performance, latency, and cost&lt;/li&gt;
&lt;li&gt;Multi-model composition, routing, A/B testing&lt;/li&gt;
&lt;li&gt;Relationships between models, data, features, and vector indexes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At KubeCon NA 2025, CNCF&amp;rsquo;s AI Conformance Program aims to standardize model workloads, managing them like Deployments. Platform engineering will gain new abstractions—not just &amp;ldquo;deploying services,&amp;rdquo; but &amp;ldquo;deploying model capabilities.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="key-change-2-dra-and-the-golden-window-for-heterogeneous-resource-scheduling"&gt;Key Change 2: DRA and the Golden Window for Heterogeneous Resource Scheduling&lt;/h2&gt;
&lt;p&gt;Previously, writing a Deployment meant focusing on CPU and memory. Now, GPU inference, training, and Agent Runtime scenarios demand more than static quotas.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jimmysong.io/book/kubernetes-handbook/ai-native/k8s-device-plugin/"&gt;Dynamic Resource Allocation (DRA)&lt;/a&gt; brings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pluggable resource types (GPU/TPU/FPGA/ASIC)&lt;/li&gt;
&lt;li&gt;Scheduling that accounts for device topology, NUMA locality, and memory fragmentation&lt;/li&gt;
&lt;li&gt;Binding inference requests to compute allocation for fine-grained QoS&lt;/li&gt;
&lt;li&gt;Cost optimization and power control in scheduling decisions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the most significant &amp;ldquo;resource perspective&amp;rdquo; upgrade since Kubernetes&amp;rsquo; inception. The scheduler is no longer just a cluster component, but the AI platform&amp;rsquo;s policy engine.&lt;/p&gt;
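&lt;p&gt;To make &amp;ldquo;topology-aware&amp;rdquo; concrete, here is a toy allocator that prefers placing a request on a single NUMA node and only spans nodes as a fallback. It is a conceptual Python sketch, not the DRA API or the Kubernetes scheduler; the device names, the &lt;code&gt;(name, numa_node)&lt;/code&gt; tuple shape, and the best-fit policy in &lt;code&gt;pick_devices&lt;/code&gt; are all assumptions chosen for illustration.&lt;/p&gt;

```python
from collections import defaultdict


def pick_devices(free_devices, count):
    """Pick `count` device names, preferring a single NUMA node.

    free_devices: list of (name, numa_node) tuples for currently free devices.
    Returns a list of names, or None when the request cannot be satisfied.
    """
    by_node = defaultdict(list)
    for name, node in free_devices:
        by_node[node].append(name)

    # Best fit: the smallest node that still satisfies the request,
    # which leaves larger groups intact and limits fragmentation.
    fitting = [names for names in by_node.values() if len(names) >= count]
    if fitting:
        return sorted(min(fitting, key=len))[:count]

    # Fallback: span nodes, draining the fullest nodes first.
    if len(free_devices) >= count:
        spill = []
        for names in sorted(by_node.values(), key=len, reverse=True):
            spill.extend(sorted(names))
        return spill[:count]
    return None


free = [("gpu0", 0), ("gpu1", 0), ("gpu2", 0), ("gpu3", 1), ("gpu4", 1)]
```

&lt;p&gt;A real policy engine would also weigh cost, power, and QoS, but even this small example shows why a static per-Pod quota is not enough information to place GPU workloads well.&lt;/p&gt;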
&lt;h2 id="key-change-3-agent-runtime-as-the-new-generation-of-runtime"&gt;Key Change 3: Agent Runtime as the New Generation of Runtime&lt;/h2&gt;
&lt;p&gt;KubeCon showcased several representative projects:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://edera.dev" target="_blank" rel="noopener"&gt;Edera&lt;/a&gt;: Minimal, verifiable runtime redesign&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/flox/flox" target="_blank" rel="noopener"&gt;Flox&lt;/a&gt;: Nix-based &amp;ldquo;uncontained&amp;rdquo; runtime environment&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/golemcloud/golem" target="_blank" rel="noopener"&gt;Golem&lt;/a&gt;: WASM-based large-scale agent orchestration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The consensus: AI agents aren&amp;rsquo;t suited to traditional container runtime models. Agents have these traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Strong statefulness: context, memory, sessions&lt;/li&gt;
&lt;li&gt;High concurrency but fine granularity: massive lightweight tasks&lt;/li&gt;
&lt;li&gt;Extremely sensitive to latency and cold starts&lt;/li&gt;
&lt;li&gt;Need to resume after failure&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Next-gen runtimes focus on reliably executing, managing state, and auditing &amp;ldquo;hundreds of thousands of agents,&amp;rdquo; not just &amp;ldquo;spinning up more Pods.&amp;rdquo;&lt;/p&gt;
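&lt;p&gt;The &amp;ldquo;resume after failure&amp;rdquo; requirement can be illustrated with step-level checkpointing: persist progress after every successful step so that a restart continues where the agent stopped. This is a minimal Python sketch and not how Edera, Flox, or Golem work; &lt;code&gt;run_agent&lt;/code&gt;, the checkpoint layout, and the use of a plain dict as the store are assumptions.&lt;/p&gt;

```python
def run_agent(steps, store, agent_id):
    """Run steps in order, checkpointing after each one so a restart can resume.

    steps: list of (name, function) pairs; each function takes and returns state.
    store: any dict-like checkpoint store (a plain dict here, as an assumption).
    """
    checkpoint = store.get(agent_id, {"done": 0, "state": {}})
    done, state = checkpoint["done"], dict(checkpoint["state"])
    for index in range(done, len(steps)):
        name, step = steps[index]
        state = step(state)
        # Persist progress only after the step succeeds.
        store[agent_id] = {"done": index + 1, "state": dict(state), "last_step": name}
    return state
```

&lt;p&gt;If a step raises, the stored checkpoint still points at the last completed step, so calling &lt;code&gt;run_agent&lt;/code&gt; again with the same &lt;code&gt;agent_id&lt;/code&gt; retries only the failed step and what follows it.&lt;/p&gt;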
&lt;h2 id="key-change-4-ai-sre-and-ai-security"&gt;Key Change 4: AI SRE and AI Security&lt;/h2&gt;
&lt;p&gt;At KubeCon NA 2025, security and operations topics were amplified by AI:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Software supply chain attacks and CVEs continue to rise&lt;/li&gt;
&lt;li&gt;LLM-assisted coding introduces &amp;ldquo;hallucination dependencies&amp;rdquo; and &amp;ldquo;vibecoded vulnerabilities&amp;rdquo;&lt;/li&gt;
&lt;li&gt;AI-driven artifact scanning, dependency auditing, and license analysis&lt;/li&gt;
&lt;li&gt;&amp;ldquo;AI SRE&amp;rdquo; is now a formal product category&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Traditional cloud native already emphasized security and SRE, but now must address model weights, datasets, vector stores, and agent workflows. AI-Native Platform engineering must answer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Are code and dependencies secure?&lt;/li&gt;
&lt;li&gt;Are models and data trustworthy?&lt;/li&gt;
&lt;li&gt;Are agent behaviors controllable?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This will drive deep integration of Policy-as-Code, MCP, graph permission systems, and AI.&lt;/p&gt;
&lt;h2 id="key-change-5-open-source-participation-becomes-a-baseline"&gt;Key Change 5: Open Source Participation Becomes a Baseline&lt;/h2&gt;
&lt;p&gt;In interviews, platform engineering leaders noted:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hiring increasingly values upstream contributions to Kubernetes and related projects&lt;/li&gt;
&lt;li&gt;Open source involvement shortens ramp-up time&lt;/li&gt;
&lt;li&gt;New AI Native projects (Model Runtime, Agent Runtime, Scheduler) are also open source&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For career growth, contributing to AI Native open source projects will become a basic requirement for platform engineering and AI Infra roles—not just a resume bonus.&lt;/p&gt;
&lt;h2 id="the-contours-of-cloud-natives-second-half"&gt;The Contours of Cloud Native&amp;rsquo;s &amp;ldquo;Second Half&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;The table below summarizes the technical focus of the &amp;ldquo;second half&amp;rdquo; and how each direction differs from the first half, outlining the key coordinates of AI-Native Platform engineering.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;Technical Focus&lt;/th&gt;
&lt;th&gt;Essential Difference from First Half&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI Native Platform&lt;/td&gt;
&lt;td&gt;Models/Agents as first-class citizens, unified abstraction &amp;amp; governance&lt;/td&gt;
&lt;td&gt;Objects shift from services to models &amp;amp; inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Scheduling&lt;/td&gt;
&lt;td&gt;DRA, heterogeneous compute, topology awareness, power &amp;amp; cost&lt;/td&gt;
&lt;td&gt;From static quotas to dynamic, policy-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Container + WASM + Nix + Agent Runtime&lt;/td&gt;
&lt;td&gt;From &amp;ldquo;process containerization&amp;rdquo; to &amp;ldquo;execution graph containerization&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platform Engineering&lt;/td&gt;
&lt;td&gt;IDP + AI SRE + Security + Cost + Compliance&lt;/td&gt;
&lt;td&gt;From toolset to &amp;ldquo;autonomous platform&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security &amp;amp; Supply Chain&lt;/td&gt;
&lt;td&gt;LLM dependencies, model weights, datasets, vector store governance&lt;/td&gt;
&lt;td&gt;Protection expands from images to &amp;ldquo;all AI engineering assets&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open Source &amp;amp; Ecosystem&lt;/td&gt;
&lt;td&gt;AI Infra / Model Runtime / Agent Runtime upstream collaboration&lt;/td&gt;
&lt;td&gt;Not just &amp;ldquo;using open source,&amp;rdquo; but &amp;ldquo;building the future in open source&amp;rdquo;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Cloud Native Second Half Technical Coordinates
&lt;/figcaption&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Over the past decade, cloud native evolved from container orchestration to platform engineering 1.0. With KubeCon NA 2025 as a milestone, the industry is now systematically bringing AI into its cloud native technology and organizational stacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kubernetes is no longer just &amp;ldquo;infrastructure for microservices,&amp;rdquo; but &amp;ldquo;runtime for AI workloads&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Platform Engineering is no longer just &amp;ldquo;tool integration,&amp;rdquo; but &amp;ldquo;autonomous platforms for models and agents&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Security, SRE, runtime, scheduling, and networking will all be reimagined under AI&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For me, the past ten years were about &amp;ldquo;making applications more stable in the cloud native world.&amp;rdquo; The next ten will focus on &amp;ldquo;making AI better, safer, and more controllable in the cloud native world.&amp;rdquo; This is, in my view, the opening whistle for cloud native&amp;rsquo;s &amp;ldquo;second half.&amp;rdquo;&lt;/p&gt;</content:encoded></item><item><title>NotebookLM: My Most Recommended AI Tool for Learning and Knowledge Organization</title><link>https://jimmysong.io/blog/notebooklm-learning-and-knowledge-organization/</link><pubDate>Mon, 17 Nov 2025 08:44:45 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/notebooklm-learning-and-knowledge-organization/</guid><description>Based on months of deep usage, this article analyzes how NotebookLM helps me learn new technologies, read complex documents, generate teaching outlines, and shares future improvement expectations.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;NotebookLM is the most tailored AI tool I&amp;rsquo;ve used for knowledge workers. It truly helps me structure massive information and dramatically boosts my learning and content creation efficiency.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As a lifelong learner who reads technical specs and researches open-source projects, I&amp;rsquo;ve always sought a tool that can &amp;ldquo;shortcut&amp;rdquo; my way through mountains of material, reduce mechanical reading, and help me quickly build a global understanding. &lt;a href="https://notebooklm.google.com" target="_blank" rel="noopener"&gt;NotebookLM&lt;/a&gt; has been the smoothest and most reliable experience for me over the past year.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not a traditional &amp;ldquo;chat-style AI tool.&amp;rdquo; It&amp;rsquo;s more like an &lt;strong&gt;AI-native learning and content organization system&lt;/strong&gt; that ingests your materials, organizes them, and presents them in various structured formats. The more I use it, the more I see how much it helps with learning new technologies, understanding unfamiliar fields, organizing large project documents, and building teaching materials, areas where general-purpose large language models (LLMs) simply can&amp;rsquo;t match it.&lt;/p&gt;
&lt;h2 id="the-core-value-notebooklm-brings-me"&gt;The Core Value NotebookLM Brings Me&lt;/h2&gt;
&lt;p&gt;NotebookLM has significantly improved my workflow, especially in learning new technologies, organizing documents, and content creation.&lt;/p&gt;
&lt;h2 id="quickly-understanding-new-technologies-feed-in-complex-materials-get-a-learnable-version"&gt;Quickly Understanding New Technologies: Feed in Complex Materials, Get a &amp;ldquo;Learnable Version&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;My most frequent and indispensable scenario is &lt;strong&gt;learning a completely unfamiliar technology or development framework&lt;/strong&gt;. Faced with dozens or even hundreds of pages of documentation, my typical approach is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add official docs, README files, design documents, and architecture diagrams into a single Notebook&lt;/li&gt;
&lt;li&gt;Let NotebookLM generate:
&lt;ul&gt;
&lt;li&gt;Study guides&lt;/li&gt;
&lt;li&gt;Briefings&lt;/li&gt;
&lt;li&gt;Key knowledge points&lt;/li&gt;
&lt;li&gt;FAQs&lt;/li&gt;
&lt;li&gt;Quizzes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Ultimately, I get a clearly structured &amp;ldquo;learning entry point&amp;rdquo; instead of a flood of raw materials.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following flowchart illustrates how NotebookLM compresses complex documents into a learnable structure:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/042f3817d5b5c24e7bd54b9638272151.svg" data-img="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/042f3817d5b5c24e7bd54b9638272151.svg" alt="Figure 1: NotebookLM Document Structuring Flow" data-caption="Figure 1: NotebookLM Document Structuring Flow"
width="551"
height="833"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: NotebookLM Document Structuring Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the end, what I gain is an &amp;ldquo;organized knowledge system&amp;rdquo; rather than a pile of PDFs waiting to be consumed.&lt;/p&gt;
&lt;h2 id="generating-mindmaps-instantly-turning-large-documents-into-structured-knowledge-graphs"&gt;Generating MindMaps: Instantly Turning Large Documents into Structured Knowledge Graphs&lt;/h2&gt;
&lt;p&gt;I rely heavily on MindMaps to build the &amp;ldquo;skeleton of knowledge.&amp;rdquo; NotebookLM&amp;rsquo;s MindMap feature stands out for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Automatically identifying relationships between topics&lt;/li&gt;
&lt;li&gt;Interactive node expansion and collapse&lt;/li&gt;
&lt;li&gt;Integrating multiple source documents&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although it currently only exports PNG, the logical structure itself is already an excellent &amp;ldquo;knowledge compression.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The table below compares the auto-generation and visualization capabilities of different tools:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Auto-Generation&lt;/th&gt;
&lt;th&gt;Multi-Doc Integration&lt;/th&gt;
&lt;th&gt;Visualization Quality&lt;/th&gt;
&lt;th&gt;Export Formats&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NotebookLM&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;PNG only (SVG not yet supported)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Common LLM Tools&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;td&gt;Poor&lt;/td&gt;
&lt;td&gt;Depends on tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MindMap Software (Manual)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Fully supported&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Comparison of MindMap Capabilities in Mainstream Tools
&lt;/figcaption&gt;
&lt;p&gt;NotebookLM&amp;rsquo;s greatest advantage is &lt;strong&gt;automation&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="generating-teaching-outlines-training-scripts-and-book-structures-truly-saving-me-time"&gt;Generating Teaching Outlines, Training Scripts, and Book Structures: Truly Saving Me Time&lt;/h2&gt;
&lt;p&gt;NotebookLM goes beyond &amp;ldquo;summarization&amp;rdquo;: it can generate &lt;strong&gt;formal teaching structures&lt;/strong&gt; based on my prompts. I feed in project docs, API references, architecture designs, case studies, videos, and blogs, then prompt it to generate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Teaching outlines&lt;/li&gt;
&lt;li&gt;Project training manuals&lt;/li&gt;
&lt;li&gt;Course structures&lt;/li&gt;
&lt;li&gt;Book chapter frameworks&lt;/li&gt;
&lt;li&gt;Slide text&lt;/li&gt;
&lt;li&gt;Training case descriptions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For anyone who needs to create content, conduct training, or give presentations, this feature is a huge time-saver.&lt;/p&gt;
&lt;p&gt;Below is a typical prompt I actually use:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;Based on the provided content excerpts, write a detailed training manual that systematically explains the core principles covered. The manual should use a professional and instructional tone, breaking down complex concepts into actionable steps and lessons. Ensure all content is strictly based on the source material and covers every aspect mentioned.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;The training manual should include:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1. Training objectives and expected outcomes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2. Training content and structure
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;3. Training methods and tools
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4. Training evaluation and feedback
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;5. Training summary and follow-up actions
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;6. Training cases and examples
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;7. Training resources and references
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The results are often surprisingly good.&lt;/p&gt;
&lt;h2 id="multi-format-input-capability-the-most-stable-ive-seen"&gt;Multi-Format Input Capability: The Most Stable I&amp;rsquo;ve Seen&lt;/h2&gt;
&lt;p&gt;NotebookLM supports direct ingestion of various material types, with extremely stable parsing. The table below summarizes my actual experience:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input Type&lt;/th&gt;
&lt;th&gt;My Actual Experience&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PDF&lt;/td&gt;
&lt;td&gt;Most stable, clear structure parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google Docs&lt;/td&gt;
&lt;td&gt;Syncs instantly, very smooth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Word / PPT&lt;/td&gt;
&lt;td&gt;Recognized normally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YouTube Video&lt;/td&gt;
&lt;td&gt;Auto-summary + key content extraction, very useful&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Website URL&lt;/td&gt;
&lt;td&gt;Depends on site structure, high success rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plain Text&lt;/td&gt;
&lt;td&gt;No issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Images&lt;/td&gt;
&lt;td&gt;Partial success, sufficient for screenshots&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: NotebookLM Multi-Format Input Experience
&lt;/figcaption&gt;
&lt;p&gt;By contrast, other tools often have format parsing issues, garbled text, missing content, or skipped paragraphs. NotebookLM is especially stable in &amp;ldquo;multi-format ingestion.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="my-most-common-notebooklm-workflow"&gt;My Most Common NotebookLM Workflow&lt;/h2&gt;
&lt;p&gt;The following flowchart shows my daily workflow with NotebookLM:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/95790bef2620a5625da7e72caea7bb00.svg" data-img="https://assets.jimmysong.io/images/blog/notebooklm-learning-and-knowledge-organization/95790bef2620a5625da7e72caea7bb00.svg" alt="Figure 2: NotebookLM Daily Workflow" data-caption="Figure 2: NotebookLM Daily Workflow"
width="1566"
height="532"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: NotebookLM Daily Workflow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Essentially: let AI help me grasp the big picture → then dive deeper → then output content.&lt;/p&gt;
&lt;h2 id="my-suggestions-and-minor-regrets"&gt;My Suggestions and Minor Regrets&lt;/h2&gt;
&lt;p&gt;NotebookLM is already excellent, but I still have some strong expectations for future improvements:&lt;/p&gt;
&lt;h3 id="mindmap-export-formats-should-support-svg-or-text-based-markmap"&gt;MindMap Export Formats Should Support SVG or Text-Based (Markmap)&lt;/h3&gt;
&lt;p&gt;Currently, only PNG is supported, which gets blurry when enlarged. The table below lists my expectations for future features:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Expected Feature&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SVG Export&lt;/td&gt;
&lt;td&gt;For writing books, making slides, scalable without loss&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Markmap Output&lt;/td&gt;
&lt;td&gt;Most friendly for Markdown writers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw JSON&lt;/td&gt;
&lt;td&gt;Allows custom rendering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Expected MindMap Export Formats
&lt;/figcaption&gt;
&lt;p&gt;I&amp;rsquo;m especially looking forward to NotebookLM supporting &lt;a href="https://markmap.js.org" target="_blank" rel="noopener"&gt;Markmap format&lt;/a&gt; export, which would be extremely friendly for users who write blogs and docs in Markdown.&lt;/p&gt;
&lt;p&gt;Recently, Google also launched &lt;a href="https://codewiki.google" target="_blank" rel="noopener"&gt;CodeWiki&lt;/a&gt;, similar to &lt;a href="https://deepwiki.com" target="_blank" rel="noopener"&gt;DeepWiki&lt;/a&gt;, which auto-generates image-rich Wikis for GitHub projects, but currently does not support Mermaid or Markmap.&lt;/p&gt;
&lt;h3 id="conversation-history-should-support-long-term-saving"&gt;Conversation History Should Support Long-Term Saving&lt;/h3&gt;
&lt;p&gt;Currently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chats are not persistently saved&lt;/li&gt;
&lt;li&gt;Results are preserved only if I manually &amp;ldquo;add to notes&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This causes some knowledge context to be lost. I hope to see a &amp;ldquo;Notebook conversation history&amp;rdquo; feature in the future.&lt;/p&gt;
&lt;h3 id="slide-generation-should-support-templates-for-content-creators"&gt;Slide Generation Should Support Templates for Content Creators&lt;/h3&gt;
&lt;p&gt;Currently, Video Overview offers various visual styles, but cannot:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Upload custom PPT templates&lt;/li&gt;
&lt;li&gt;Apply enterprise/personal branding templates&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If PPT template support is added, NotebookLM could become the &amp;ldquo;video generation hub&amp;rdquo; for content creators.&lt;/p&gt;
&lt;h3 id="deep-research-should-launch-soon-and-be-fully-open"&gt;Deep Research Should Launch Soon and Be Fully Open&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;m especially looking forward to this feature, as it could upgrade NotebookLM from a &amp;ldquo;knowledge organization tool&amp;rdquo; to a &amp;ldquo;research-grade tool.&amp;rdquo; I hope it will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reliably crawl more public web pages&lt;/li&gt;
&lt;li&gt;Ensure citation quality&lt;/li&gt;
&lt;li&gt;Integrate with existing Notebook materials&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a major upgrade I personally care about.&lt;/p&gt;
&lt;h3 id="mobile-experience-should-be-enhanced-beyond-content-playback"&gt;Mobile Experience Should Be Enhanced Beyond Content Playback&lt;/h3&gt;
&lt;p&gt;Currently, the mobile experience is minimal, only allowing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Listening to audio&lt;/li&gt;
&lt;li&gt;Viewing Notebook Guide summaries&lt;/li&gt;
&lt;li&gt;Simple Q&amp;amp;A&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I hope mobile will soon support:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Editing Notebooks&lt;/li&gt;
&lt;li&gt;Deep conversations&lt;/li&gt;
&lt;li&gt;MindMap interaction&lt;/li&gt;
&lt;li&gt;Content output (generating docs, outlines, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;NotebookLM is truly one of the AI tools I use every single day because it achieves a critical goal:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Organizing information, structuring knowledge, so I don&amp;rsquo;t have to start from scratch with massive documents.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Whether it&amp;rsquo;s:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Learning new technologies&lt;/li&gt;
&lt;li&gt;Reading long documents&lt;/li&gt;
&lt;li&gt;Creating courses&lt;/li&gt;
&lt;li&gt;Conducting training&lt;/li&gt;
&lt;li&gt;Writing books&lt;/li&gt;
&lt;li&gt;Drafting speeches&lt;/li&gt;
&lt;li&gt;Summarizing content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It saves me a huge amount of time upfront, letting me focus on &amp;ldquo;understanding&amp;rdquo; and &amp;ldquo;creating.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll continue to use NotebookLM as one of my essential tools and keep an eye on its progress in Deep Research, template systems, and mobile features.&lt;/p&gt;
&lt;p&gt;This is a tool truly designed for &amp;ldquo;knowledge workers&amp;rdquo; and deserves to be known by more people.&lt;/p&gt;</content:encoded></item><item><title>Helm v4: Paradigm Convergence and Plugin System Rebuild</title><link>https://jimmysong.io/blog/helm-4-delivery-and-plugin-rebuild/</link><pubDate>Fri, 14 Nov 2025 11:18:30 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/helm-4-delivery-and-plugin-rebuild/</guid><description>An analysis of Helm 4&amp;#39;s core changes, including Server-Side Apply, WASM plugin system, kstatus status model, reproducible builds, and content hash caching, with a timeline review of Helm&amp;#39;s history.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The release of Helm 4 is not just a technical upgrade, but a deep convergence of cloud-native delivery paradigms. The rebuilt plugin system and supply chain governance capabilities make Helm once again a driving force in the Kubernetes ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Since its first release in 2016, Helm has been one of the most important application distribution tools in the Kubernetes ecosystem. &lt;a href="https://github.com/helm/helm/releases/tag/v4.0.0" target="_blank" rel="noopener"&gt;Helm v4&lt;/a&gt; is not a &amp;ldquo;minor enhancement,&amp;rdquo; but a comprehensive update around &lt;strong&gt;delivery methods, extension mechanisms, and supply chain approaches&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;This article reconstructs Helm&amp;rsquo;s historical context and focuses on why Helm 4 represents a paradigm-converging release.&lt;/p&gt;
&lt;h2 id="helm-from-tiller-to-declarative-delivery"&gt;Helm: From Tiller to Declarative Delivery&lt;/h2&gt;
&lt;p&gt;Below is a textual timeline showing key milestones from Helm v2 to v4, helping you understand its technical evolution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2016: Helm v2 released, using the Tiller architecture.&lt;/li&gt;
&lt;li&gt;2017: Chart Hub expands, major projects begin providing official Charts.&lt;/li&gt;
&lt;li&gt;2018: Security model controversies intensify, Tiller&amp;rsquo;s permission issues become apparent.&lt;/li&gt;
&lt;li&gt;2019: Helm v3 released, Tiller removed, OCI support introduced.&lt;/li&gt;
&lt;li&gt;2021: GitOps becomes widespread, Server-Side Apply (SSA) becomes the mainstream delivery semantic.&lt;/li&gt;
&lt;li&gt;2023: kstatus widely adopted for controller status assessment and health calculation.&lt;/li&gt;
&lt;li&gt;2025: Helm v4 released, bringing SSA, WASM plugins, reproducible builds, and content hash caching.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each major Helm release closely follows Kubernetes paradigms, driving progress in declarative delivery and ecosystem tooling.&lt;/p&gt;
&lt;h2 id="fundamental-changes-in-helm-v4"&gt;Fundamental Changes in Helm v4&lt;/h2&gt;
&lt;p&gt;This section analyzes the core technical upgrades and paradigm shifts in Helm v4.&lt;/p&gt;
&lt;h3 id="delivery-paradigm-update-default-server-side-apply-ssa-server-side-apply"&gt;Delivery Paradigm Update: Default Server-Side Apply (SSA, Server-Side Apply)&lt;/h3&gt;
&lt;p&gt;In Helm v3 and earlier, Helm used a &amp;ldquo;three-way merge&amp;rdquo; model for resource delivery. Helm v4 fully switches to &lt;strong&gt;Server-Side Apply (SSA, Server-Side Apply)&lt;/strong&gt;, meaning the API Server determines field ownership.&lt;/p&gt;
&lt;p&gt;This shift brings several direct results:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Semantic alignment with &lt;code&gt;kubectl apply --server-side&lt;/code&gt; and GitOps controllers (such as Argo CD and Flux)&lt;/li&gt;
&lt;li&gt;When multiple controllers manage the same object, silent overrides are avoided and conflicts are explainable&lt;/li&gt;
&lt;li&gt;Helm&amp;rsquo;s behavior now follows Kubernetes&amp;rsquo; officially recommended declarative delivery paradigm&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following flowchart compares the delivery semantics of Helm v3 and v4.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/helm-4-delivery-and-plugin-rebuild/f34683c90a9f13678e2cde12ab355e2f.svg" data-img="https://assets.jimmysong.io/images/blog/helm-4-delivery-and-plugin-rebuild/f34683c90a9f13678e2cde12ab355e2f.svg" alt="Figure 1: Helm v3/v4 Delivery Semantics Comparison" data-caption="Figure 1: Helm v3/v4 Delivery Semantics Comparison"
width="2400"
height="377"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Helm v3/v4 Delivery Semantics Comparison&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Helm is now aligned with the delivery semantics of modern Kubernetes versions, improving predictability and safety in resource management.&lt;/p&gt;
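&lt;p&gt;To make the field-ownership idea concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Helm or Kubernetes code: &lt;code&gt;ToyObject&lt;/code&gt;, &lt;code&gt;ApplyConflict&lt;/code&gt;, and the flattened field paths are hypothetical names, and real SSA also handles nested structures and removal of fields a manager stops declaring, which this sketch omits.&lt;/p&gt;

```python
from dataclasses import dataclass, field


class ApplyConflict(Exception):
    """Raised when a manager changes a field that another manager owns."""


@dataclass
class ToyObject:
    # Hypothetical toy model: values keyed by a flattened path such as "spec.replicas".
    values: dict = field(default_factory=dict)
    # For each field path, the set of managers that currently own it.
    owners: dict = field(default_factory=dict)

    def apply(self, manager, desired, force=False):
        """Apply `desired` fields on behalf of `manager`, tracking ownership."""
        conflicts = []
        for path, value in desired.items():
            others = self.owners.get(path, set()) - {manager}
            # A conflict only exists if someone else owns the field
            # and we are trying to set a different value.
            if others and self.values.get(path) != value:
                conflicts.append((path, sorted(others)))
        if conflicts and not force:
            raise ApplyConflict(conflicts)
        for path, value in desired.items():
            if self.values.get(path) != value:
                # New or changed value: this manager becomes the sole owner.
                self.owners[path] = {manager}
            else:
                # Same value as before: ownership is shared.
                self.owners.setdefault(path, set()).add(manager)
            self.values[path] = value
        return self


if __name__ == "__main__":
    obj = ToyObject()
    obj.apply("helm", {"spec.replicas": 3})
    try:
        obj.apply("autoscaler", {"spec.replicas": 5})
    except ApplyConflict as err:
        print("conflict:", err.args[0])
```

&lt;p&gt;In this model the second apply is rejected with an explicit list of conflicting fields and their owners instead of silently overwriting the value, which is the &amp;ldquo;explainable conflict&amp;rdquo; behavior described above. Passing &lt;code&gt;force=True&lt;/code&gt; transfers ownership to the caller.&lt;/p&gt;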
&lt;h3 id="kstatus-driven-wait-behavior-and-readiness-annotations"&gt;kstatus-Driven Wait Behavior and Readiness Annotations&lt;/h3&gt;
&lt;p&gt;In Helm 3, &lt;code&gt;--wait&lt;/code&gt; could only make fuzzy status judgments on limited resources, lacking extensibility and explainability.&lt;/p&gt;
&lt;p&gt;Helm 4 introduces &lt;strong&gt;kstatus (Kubernetes Status)&lt;/strong&gt; as the basis for health status parsing, and supports two key annotations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;helm.sh/readiness-success&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;helm.sh/readiness-failure&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Chart authors can precisely define conditions for installation success or failure. Helm&amp;rsquo;s waiting model now offers &amp;ldquo;explainability + extensibility,&amp;rdquo; upgrading from a &amp;ldquo;templating tool&amp;rdquo; to a true &amp;ldquo;deployment orchestrator.&amp;rdquo;&lt;/p&gt;
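&lt;p&gt;As a rough illustration of what status-based waiting looks like, the Python sketch below maps a resource (as a plain dict) to a kstatus-style status. The status names and the &lt;code&gt;Reconciling&lt;/code&gt; / &lt;code&gt;Stalled&lt;/code&gt; conditions follow kstatus conventions, but &lt;code&gt;toy_status&lt;/code&gt; is a hypothetical, heavily simplified function: it is not the kstatus algorithm and does not evaluate Helm&amp;rsquo;s readiness annotations.&lt;/p&gt;

```python
def toy_status(obj):
    """Return a kstatus-style status string for a resource given as a dict.

    Simplified assumption: only generation tracking and three condition
    types are considered; real kstatus has per-kind rules as well.
    """
    meta = obj.get("metadata", {})
    status = obj.get("status", {})

    # An object that is being deleted is reported as terminating.
    if meta.get("deletionTimestamp"):
        return "Terminating"

    # The controller has not yet observed the latest spec.
    observed = status.get("observedGeneration")
    if observed is not None and observed != meta.get("generation"):
        return "InProgress"

    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    if conditions.get("Stalled") == "True":
        return "Failed"
    if conditions.get("Reconciling") == "True":
        return "InProgress"
    if conditions.get("Ready") == "False":
        return "InProgress"
    return "Current"


if __name__ == "__main__":
    rolling_out = {
        "metadata": {"generation": 2},
        "status": {"observedGeneration": 1},
    }
    print(toy_status(rolling_out))
```

&lt;p&gt;A wait loop built on a function like this can report why it is still waiting (for example, the controller has not observed the latest generation) rather than just timing out.&lt;/p&gt;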
&lt;h3 id="extension-system-rebuild-wasm-plugin-system"&gt;Extension System Rebuild: WASM Plugin System&lt;/h3&gt;
&lt;p&gt;Helm 4 thoroughly reconstructs the plugin model, mainly including:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Typed and Structured Plugins&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Arbitrary scripts are no longer allowed; plugins must follow typed and structured standards&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;WebAssembly Plugin Runtime (Extism)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More secure (sandbox isolation)&lt;/li&gt;
&lt;li&gt;Cross-language support&lt;/li&gt;
&lt;li&gt;Easy unified management in CI/CD and enterprise platforms&lt;/li&gt;
&lt;li&gt;Predictable and testable&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Post-renderer Integrated into Plugin System&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Moves beyond the &amp;ldquo;external executable black box&amp;rdquo; era&lt;/li&gt;
&lt;li&gt;Helm becomes a programmable platform, not just a template renderer&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="engineering-capabilities-upgrade-reproducible-builds-content-hash-caching-chart-api-v3"&gt;Engineering Capabilities Upgrade: Reproducible Builds, Content Hash Caching, chart API v3&lt;/h3&gt;
&lt;p&gt;Helm v4 brings the following engineering improvements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chart packaging is reproducible (supports signing, SBOM, SLSA, etc. for supply chain governance)&lt;/li&gt;
&lt;li&gt;Local cache uses content hashes, avoiding version-based conflicts&lt;/li&gt;
&lt;li&gt;chart API v3 (experimental) is stricter and more flexible&lt;/li&gt;
&lt;li&gt;SDK logging system upgraded to Go slog (modern logging)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities enable Helm charts to enter serious software supply chain governance.&lt;/p&gt;
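&lt;p&gt;The difference between version-based and content-based cache keys can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the use of SHA-256 over the raw archive bytes is an assumption rather than a description of Helm&amp;rsquo;s actual cache layout.&lt;/p&gt;

```python
import hashlib


def version_key(name, version):
    """Cache key based only on chart name and version."""
    return f"{name}-{version}"


def content_key(archive_bytes):
    """Cache key based on the SHA-256 digest of the chart archive bytes.

    Assumption: SHA-256 over the whole archive; Helm's real scheme may differ.
    """
    return hashlib.sha256(archive_bytes).hexdigest()


if __name__ == "__main__":
    # Two different archives published under the same name and version.
    archive_a = b"chart contents from repo A"
    archive_b = b"chart contents from repo B"

    # Version-based keys collide, so one archive would shadow the other.
    print(version_key("nginx", "1.2.3") == version_key("nginx", "1.2.3"))
    # Content-based keys differ whenever the bytes differ.
    print(content_key(archive_a) == content_key(archive_b))
```

&lt;p&gt;Content-based keys only stay stable across rebuilds if packaging is reproducible, which is why the two features complement each other.&lt;/p&gt;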
&lt;h2 id="feature-comparison-helm-v3--v4"&gt;Feature Comparison (Helm v3 → v4)&lt;/h2&gt;
&lt;p&gt;The table below compares core features between Helm v3 and v4 for a quick understanding of the upgrade value.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Helm 3&lt;/th&gt;
&lt;th&gt;Helm 4&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Apply Model&lt;/td&gt;
&lt;td&gt;Three-way merge&lt;/td&gt;
&lt;td&gt;Default SSA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wait Behavior&lt;/td&gt;
&lt;td&gt;Fuzzy, not extensible&lt;/td&gt;
&lt;td&gt;kstatus + annotation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Plugin System&lt;/td&gt;
&lt;td&gt;Script, uncontrollable&lt;/td&gt;
&lt;td&gt;WASM, typed plugins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-renderer&lt;/td&gt;
&lt;td&gt;External executable&lt;/td&gt;
&lt;td&gt;Plugin subsystem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build&lt;/td&gt;
&lt;td&gt;Not reproducible&lt;/td&gt;
&lt;td&gt;Reproducible build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache&lt;/td&gt;
&lt;td&gt;name/version&lt;/td&gt;
&lt;td&gt;Content hash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chart API&lt;/td&gt;
&lt;td&gt;v2&lt;/td&gt;
&lt;td&gt;v2 + v3 (experimental)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDK Logs&lt;/td&gt;
&lt;td&gt;stdlib log&lt;/td&gt;
&lt;td&gt;slog&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Helm v3 vs v4 Feature Comparison
&lt;/figcaption&gt;
&lt;p&gt;This is a release that &amp;ldquo;repays technical debt in bulk + aligns with contemporary Kubernetes semantics.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="why-is-helm-v4-a-paradigm-convergence-event"&gt;Why Is Helm v4 a Paradigm Convergence Event?&lt;/h2&gt;
&lt;p&gt;The release of Helm v4 is not just a feature upgrade, but a deep convergence of delivery paradigms, mainly in three aspects:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes Delivery Semantics Unified to SSA&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Previously, kubectl, GitOps controllers, and Helm each had their own apply logic.
Now they converge on SSA, which makes delivery behavior consistent and ecosystem collaboration smoother.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Plugin System Enters the Platform Era&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;WASM (WebAssembly) brings a secure, universal, and controllable plugin runtime. Infrastructure projects widely adopt WASM: Envoy → WASM Filters, Kubernetes → WASM CRI/OCI, and now Helm joins the platform camp.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Charts Enter Supply Chain Governance&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reproducible builds and digest verification allow Helm charts to be managed as seriously as container images, greatly enhancing supply chain security.&lt;/p&gt;
&lt;p&gt;The entire ecosystem moves to a unified capability baseline, driving cloud-native delivery standardization.&lt;/p&gt;
&lt;h2 id="my-helm-history-and-observations"&gt;My Helm History and Observations&lt;/h2&gt;
&lt;p&gt;As an early user from the Helm v2 era, I have experienced the following stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tiller security controversies&lt;/li&gt;
&lt;li&gt;v3 migration (state stored in secrets)&lt;/li&gt;
&lt;li&gt;Large-scale chart consolidation in the community&lt;/li&gt;
&lt;li&gt;OCI adoption&lt;/li&gt;
&lt;li&gt;Today&amp;rsquo;s SSA / WASM / reproducible build&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each major Helm version upgrade is not about chasing trends, but proactively aligning with Kubernetes paradigms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;v3 aligns with K8s &amp;ldquo;no cluster-side runtime&amp;rdquo; principle&lt;/li&gt;
&lt;li&gt;v4 aligns with SSA, kstatus, WASM, OCI, and other advances from the past five years&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Helm exemplifies the evolution rhythm of infrastructure projects: &lt;strong&gt;not by piling on features, but by evolving in semantic alignment with the platform.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The release of Helm v4 marks a new paradigm for Kubernetes application delivery. SSA, WASM plugins, kstatus, and reproducible builds make Helm not just a templating tool, but a core for supply chain governance and platform extensibility. For cloud-native developers and platform teams, Helm v4 is a paradigm upgrade worth attention.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/helm/helm/releases/tag/v4.0.0" target="_blank" rel="noopener"&gt;Helm v4.0.0 Release - github.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://helm.sh/docs/overview/" target="_blank" rel="noopener"&gt;Helm Documentation Overview - helm.sh&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://artifacthub.io/" target="_blank" rel="noopener"&gt;ArtifactHub Charts Index - artifacthub.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Kimi K2 Thinking: The True Awakening of China's Thinking Model</title><link>https://jimmysong.io/blog/kimi-k2-thinking-cn-awakening/</link><pubDate>Fri, 14 Nov 2025 08:25:26 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/kimi-k2-thinking-cn-awakening/</guid><description>Kimi K2 Thinking&amp;#39;s open source marks China&amp;#39;s entry into thinking models. This article reviews its technical approach and compares it with Claude and Gemini.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;China&amp;rsquo;s large language models have finally moved from &amp;ldquo;writing like humans&amp;rdquo; to &amp;ldquo;thinking like humans.&amp;rdquo; The open-sourcing of Kimi K2 is a watershed moment for China&amp;rsquo;s AI trajectory.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The narrative around China&amp;rsquo;s large language models is shifting from &amp;ldquo;chat-style models&amp;rdquo; to &amp;ldquo;thinking models.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Moonshot AI&amp;rsquo;s open-sourcing of &lt;strong&gt;Kimi K2 Thinking&lt;/strong&gt; marks the first real landing of this transition. K2 is not just another iteration like ChatGLM or Qwen; it&amp;rsquo;s the first time a Chinese team has unified &amp;ldquo;deep reasoning + long context + tool invocation continuity&amp;rdquo; in training. This is the core of the thinking model approach and the reason why models like Claude and Gemini have led the field.&lt;/p&gt;
&lt;h2 id="the-significance-of-k2s-open-source-china-enters-the-era-of-thinking-models"&gt;The Significance of K2&amp;rsquo;s Open Source: China Enters the Era of Thinking Models&lt;/h2&gt;
&lt;p&gt;Why is the open-sourcing of K2 a turning point? Because it gives Chinese models the following capabilities for the first time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stable execution of 200–300 tool invocations (toolchain reasoning stability)&lt;/li&gt;
&lt;li&gt;Deep, multi-stage reasoning chain execution (chain-of-thought consistency)&lt;/li&gt;
&lt;li&gt;256k context used as &amp;ldquo;working memory&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Native INT4 acceleration + MoE activation sparsity scheduling&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a completely different path from &amp;ldquo;stacking parameters → stacking benchmarks,&amp;rdquo; emphasizing reasoning ability over parameter scale.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;K2 marks the first time a Chinese model has joined the ranks of thinking models.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="dissecting-k2s-technical-approach"&gt;Dissecting K2&amp;rsquo;s Technical Approach&lt;/h2&gt;
&lt;p&gt;K2&amp;rsquo;s technical approach can be broken down into five key points, each directly impacting the model&amp;rsquo;s reasoning ability and ecosystem adaptability.&lt;/p&gt;
&lt;h3 id="moe-expert-division-cognitive-division-rather-than-parameter-expansion"&gt;MoE Expert Division: Cognitive Division Rather Than Parameter Expansion&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s MoE (Mixture of Experts) design philosophy is distinct from previous models. The core is not about activating fewer parameters or running larger models more cheaply, but about assigning different cognitive sub-skills to different experts. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mathematical reasoning expert&lt;/li&gt;
&lt;li&gt;Planning expert&lt;/li&gt;
&lt;li&gt;Tool invocation expert&lt;/li&gt;
&lt;li&gt;Browser task expert&lt;/li&gt;
&lt;li&gt;Code generation expert&lt;/li&gt;
&lt;li&gt;Long-chain retention expert&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This division aligns directly with Claude 3.5&amp;rsquo;s cognitive layering approach. K2&amp;rsquo;s MoE is about &amp;ldquo;dividing thinking among the model,&amp;rdquo; not just &amp;ldquo;making computation cheaper.&amp;rdquo;&lt;/p&gt;
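&lt;p&gt;To make the idea of routing work to a few experts concrete, here is a minimal sketch of generic top-k MoE gating. It is not K2&amp;rsquo;s actual router: the expert labels simply mirror the list above, and the logits, the function name &lt;code&gt;top_k_route&lt;/code&gt;, and the choice of k=2 are all assumptions for illustration.&lt;/p&gt;

```python
import math

def top_k_route(logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in ranked)  # subtract the peak for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in ranked}
    total = sum(exps.values())
    return {i: exps[i] / total for i in ranked}

# Illustrative labels taken from the list above; real experts are not labeled this way.
EXPERTS = ["math", "planning", "tools", "browser", "code", "retention"]

# Hypothetical gate scores for one token.
logits = [2.0, 0.1, 2.0, -1.0, 0.5, 0.0]
weights = top_k_route(logits)
chosen = [EXPERTS[i] for i in weights]
```

&lt;p&gt;Only the selected experts run for that token, and their outputs are mixed using the normalized weights.&lt;/p&gt;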
&lt;h3 id="256k-context-building-the-models-working-memory"&gt;256K Context: Building the Model&amp;rsquo;s Working Memory&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s ultra-long context is not just a parameter showcase; it&amp;rsquo;s designed to build the model&amp;rsquo;s &amp;ldquo;thinking buffer.&amp;rdquo; It allows the entire process to retain reasoning chains, tool invocation states, multi-stage reflection, and uninterrupted long tasks (such as research or code refactoring), stably executing multi-stage agent workflows. Long-term thinking requires long-term memory support, and K2&amp;rsquo;s long context is the &amp;ldquo;memory&amp;rdquo; for sustained reasoning chains.&lt;/p&gt;
&lt;h3 id="intertwined-training-of-tool-invocation-and-reasoning-chains"&gt;Intertwined Training of Tool Invocation and Reasoning Chains&lt;/h3&gt;
&lt;p&gt;K2 excels in the intertwined training of tool invocation and reasoning chains. Traditional open-source models typically follow this process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Generate reasoning&lt;/li&gt;
&lt;li&gt;Output JSON function call&lt;/li&gt;
&lt;li&gt;Tool returns result&lt;/li&gt;
&lt;li&gt;Continue reasoning&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In this approach, the reasoning chain and invocation chain are separated. K2&amp;rsquo;s training allows the reasoning chain to invoke tools at any time and feed tool results back into the reasoning chain for the next stage of thinking. It supports 200–300 consecutive tool invocations without interruption, fully aligning with Claude 3.5&amp;rsquo;s Interleaved CoT + Tool Use.&lt;/p&gt;
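&lt;p&gt;The interleaved pattern described above can be sketched as a single loop over one shared trace. This is a toy illustration under stated assumptions, not the Kimi K2 API: &lt;code&gt;fake_model&lt;/code&gt;, the &lt;code&gt;TOOLS&lt;/code&gt; table, and the step format are hypothetical, and the 300-call budget only echoes the figure quoted above.&lt;/p&gt;

```python
# Hypothetical sketch of an interleaved reasoning and tool-use loop.
# fake_model and the TOOLS table are stand-ins, not the Kimi K2 API.

TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def fake_model(trace):
    """Pick the next step from the trace so far (stand-in for a real model)."""
    results = [step["result"] for step in trace if step["kind"] == "tool"]
    if len(results) == 0:
        return {"kind": "tool", "name": "add", "args": (2, 3)}
    if len(results) == 1:
        return {"kind": "tool", "name": "mul", "args": (results[0], 4)}
    return {"kind": "answer", "text": str(results[-1])}

def run_interleaved(model, max_calls=300):
    """Keep one shared trace: every tool result is appended before the next step."""
    trace = []
    for _ in range(max_calls):
        step = model(trace)
        if step["kind"] == "answer":
            return step["text"], trace
        result = TOOLS[step["name"]](*step["args"])
        trace.append({"kind": "tool", "name": step["name"], "result": result})
    raise RuntimeError("tool-call budget exhausted")

answer, trace = run_interleaved(fake_model)
```

&lt;p&gt;The point of the sketch is that the second tool call depends on the first tool&amp;rsquo;s result, so reasoning and invocation share one chain instead of running as separate phases.&lt;/p&gt;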
&lt;h3 id="native-int4-quantization-ensuring-reasoning-chain-stability"&gt;Native INT4 Quantization: Ensuring Reasoning Chain Stability&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s INT4 (4-bit integer quantization) approach is not ordinary post-training quantization. Its purpose is not only to reduce memory usage and increase throughput but, more importantly, to ensure that deep reasoning chains do not break due to insufficient computing power. The biggest killers of deep reasoning chains are timeouts, freezes, and unstable workers. INT4 enables Chinese GPUs (non-H100) to run complete reasoning chains, which is highly significant for China&amp;rsquo;s ecosystem.&lt;/p&gt;
&lt;h3 id="moe--long-context--toolchain-unified-training-rather-than-module-stitching"&gt;MoE + Long Context + Toolchain: Unified Training Rather Than Module Stitching&lt;/h3&gt;
&lt;p&gt;K2&amp;rsquo;s most important feature is its holistic training approach: expert division, long context-driven consistency, tool invocation trained through real execution, browser tasks and long-step task reinforcement, and INT4 entering the training loop. It&amp;rsquo;s not a &amp;ldquo;ChatLLM + Memory + RAG + Tools&amp;rdquo; patchwork, but an integrated reasoning system.&lt;/p&gt;
&lt;h2 id="alignment-and-differences-between-k2-and-international-mainstream-approaches"&gt;Alignment and Differences Between K2 and International Mainstream Approaches&lt;/h2&gt;
&lt;p&gt;K2 is highly aligned with international mainstream models (such as Claude, Gemini, OpenAI) in cognitive reasoning, ultra-long context, and tool invocation mechanisms, but also has unique advantages for Chinese models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Native INT4 + adaptation to Chinese computing power is rare globally&lt;/li&gt;
&lt;li&gt;Toolchain continuity is more stable than most open-source models&lt;/li&gt;
&lt;li&gt;Higher degree of open source, stronger ecosystem reusability&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="collaborative-value-of-chinas-ai-infra-k2--rlinf--mem-alpha"&gt;Collaborative Value of China&amp;rsquo;s AI Infra: K2 × RLinf × Mem-alpha&lt;/h2&gt;
&lt;p&gt;A series of important open-source infrastructure projects has emerged around the K2 ecosystem. The table below summarizes each project&amp;rsquo;s type and its value to K2:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Value to K2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RLinf&lt;/td&gt;
&lt;td&gt;Reinforcement Learning&lt;/td&gt;
&lt;td&gt;Used to train stronger planning/browser task capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mem-alpha&lt;/td&gt;
&lt;td&gt;Memory Enhancement&lt;/td&gt;
&lt;td&gt;Can be combined with K2 to form long-term memory agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AgentDebug&lt;/td&gt;
&lt;td&gt;Agent Error Debugging&lt;/td&gt;
&lt;td&gt;Used to analyze K2&amp;rsquo;s toolchain errors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI-Genie&lt;/td&gt;
&lt;td&gt;GUI Agent Training&lt;/td&gt;
&lt;td&gt;Can serve as an experimental field for K2&amp;rsquo;s agent capability expansion&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Collaborative Value of China&amp;rsquo;s AI Infra Ecosystem
&lt;/figcaption&gt;
&lt;p&gt;This combination is already forming a China AI Agent Infra Stack.&lt;/p&gt;
&lt;h2 id="personal-view-the-significance-of-k2s-approach"&gt;Personal View: The Significance of K2&amp;rsquo;s Approach&lt;/h2&gt;
&lt;p&gt;I believe the significance of K2 lies not in the model itself, but in its technical approach:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;K2 marks the first time Chinese models have shifted from &amp;ldquo;language generation competition&amp;rdquo; to &amp;ldquo;thinking ability competition.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For the past three years, the main line of China&amp;rsquo;s open-source models has been evaluation scores, parameter scale, instruction following, and alignment data. But K2 is the first to clearly take the path of deep reasoning, tool intertwining, cognitive division, long-term task chains, and native performance optimization. This means China&amp;rsquo;s model trajectory is now synchronized with the US, rather than chasing old paths.&lt;/p&gt;
&lt;h2 id="key-directions-to-watch-in-k2s-ecosystem-over-the-next-year"&gt;Key Directions to Watch in K2&amp;rsquo;s Ecosystem Over the Next Year&lt;/h2&gt;
&lt;p&gt;K2&amp;rsquo;s future ecosystem influence will depend on several key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether it opens the tool registry&lt;/li&gt;
&lt;li&gt;Whether it supports dynamic memory (Mem-alpha integration)&lt;/li&gt;
&lt;li&gt;Whether it opens the MoE expert structure&lt;/li&gt;
&lt;li&gt;Whether it can form a Chinese reasoning chain optimization path with vLLM / llm-d / KServe&lt;/li&gt;
&lt;li&gt;Whether it supports fault tolerance for multi-node continuous reasoning chains&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities will determine K2&amp;rsquo;s ecosystem influence and technical extensibility.&lt;/p&gt;
&lt;h2 id="k2-thinking-model-architecture-diagram"&gt;K2 Thinking Model Architecture Diagram&lt;/h2&gt;
&lt;p&gt;The following flowchart illustrates the core architecture of the K2 thinking model and its collaboration with external agents/applications:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kimi-k2-thinking-cn-awakening/8883c2cf12acbe9362d56d664577b67c.svg" data-img="https://assets.jimmysong.io/images/blog/kimi-k2-thinking-cn-awakening/8883c2cf12acbe9362d56d664577b67c.svg" alt="Figure 1: K2 Thinking Model Architecture" data-caption="Figure 1: K2 Thinking Model Architecture"
width="1600"
height="1158"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: K2 Thinking Model Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;K2 marks the first time China&amp;rsquo;s model trajectory has headed in the right direction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;From &amp;ldquo;writing like humans&amp;rdquo; to &amp;ldquo;thinking like humans.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The era of thinking models is coming, and Chinese models are finally standing on the same roadmap as the international forefront.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://moonshotai.github.io/Kimi-K2/thinking.html" target="_blank" rel="noopener"&gt;Introducing Kimi K2 Thinking - moonshotai.github.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/moonshotai/Kimi-K2-Thinking" target="_blank" rel="noopener"&gt;moonshotai/Kimi-K2-Thinking - huggingface.co&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Beyond Gateway: Inference Traffic Control Practice with Gateway API Inference Extension</title><link>https://jimmysong.io/blog/gateway-api-inference-extension-inference-traffic-control/</link><pubDate>Fri, 14 Nov 2025 08:19:16 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/gateway-api-inference-extension-inference-traffic-control/</guid><description>Exploring how Gateway API Inference Extension brings model-aware inference traffic control through InferencePool, InferenceObjective, and metrics-driven routing.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;AI inference traffic governance is undergoing a paradigm shift. The Gateway API Inference Extension makes &amp;ldquo;model awareness&amp;rdquo; the new main thread of traffic control.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="brief-review-of-gateway-apis-current-state"&gt;Brief Review of Gateway API&amp;rsquo;s Current State&lt;/h2&gt;
&lt;p&gt;Kubernetes Gateway API has entered the stable v1 series and continued iterating after the 1.0 GA release, enhancing advanced traffic governance capabilities such as WebSocket, timeout and retry, Service Mesh integration, GRPCRoute, request mirroring, CORS, Retry Budget, and more. Major cloud providers and gateway implementations (such as Alibaba Cloud ACK, GKE Gateway, Envoy Gateway, NGINX Gateway Fabric) have all adopted Gateway API as the new generation north-south traffic model.&lt;/p&gt;
&lt;p&gt;Building on this foundation, the community has proposed an extension specification specifically for AI inference traffic: the &lt;strong&gt;Gateway API Inference Extension&lt;/strong&gt;. This specification is not about &amp;ldquo;reinventing an API&amp;rdquo;; it supplements the Gateway API core model with &lt;strong&gt;model-aware load balancing and traffic control capabilities&lt;/strong&gt; for Large Language Model (LLM) and other inference scenarios.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Early documentation often mentioned the InferenceModel + InferencePool CRD combination; the latest specification has evolved to InferenceObjective + InferencePool (with optional InferencePoolImport). This article consistently uses the latest terminology.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="typical-challenges-in-genai-inference-traffic"&gt;Typical Challenges in GenAI Inference Traffic&lt;/h2&gt;
&lt;p&gt;Traditional gateway, Ingress, and Service Mesh load balancing models are essentially &amp;ldquo;request-agnostic + endpoint-agnostic&amp;rdquo;: they distribute traffic across a group of static backends through algorithms like round-robin, least requests, and hashing.&lt;/p&gt;
&lt;p&gt;In GPU-powered Large Language Model (LLM) inference scenarios, this model exposes obvious problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Invisible GPU utilization and queuing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A single LLM instance simultaneously maintains KV Cache, LoRA Adapter, and Token queues. Resource consumption varies significantly across the same batch of requests. Load balancing based solely on QPS or connection count can easily lead to extreme situations where &amp;ldquo;idle GPUs have no work while busy GPUs crash.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Lack of semantic binding between models and requests&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From the business perspective, one typically only sees a POST /v1/chat/completions endpoint, making it difficult to express intentions like &amp;ldquo;high-priority model / test version / canary weight&amp;rdquo; at the routing layer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Difficult unified management of multiple models, versions, and LoRAs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Each model service implements its own routing and A/B testing solutions, making it difficult for the platform to achieve unified governance and observability at the control plane level.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The goal of Gateway API Inference Extension is to introduce AI-specific semantics and metrics into load balancing and traffic control decisions while maintaining the existing Gateway model.&lt;/p&gt;
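&lt;p&gt;A small worked example shows why request-count balancing is not enough. The sketch below assigns requests round-robin and then compares request counts with token totals per Pod. The token counts and Pod names are hypothetical, chosen so that long and short prompts happen to alternate.&lt;/p&gt;

```python
from itertools import cycle

def round_robin(requests, pods):
    """Assign requests to pods in turn, ignoring how many tokens each one needs."""
    load = {pod: {"requests": 0, "tokens": 0} for pod in pods}
    for pod, tokens in zip(cycle(pods), requests):
        load[pod]["requests"] += 1
        load[pod]["tokens"] += tokens
    return load

# Hypothetical token counts per request.
requests = [8000, 200, 6000, 300, 9000, 100]
load = round_robin(requests, ["pod-a", "pod-b"])
```

&lt;p&gt;Both Pods receive three requests, yet one carries 23,000 tokens and the other 600. A balancer that only counts requests or connections sees these two Pods as equally loaded.&lt;/p&gt;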
&lt;h2 id="core-concepts-and-resource-model-of-gateway-api-inference-extension"&gt;Core Concepts and Resource Model of Gateway API Inference Extension&lt;/h2&gt;
&lt;p&gt;This section introduces the overall architecture and key resource objects of the Inference Extension.&lt;/p&gt;
&lt;h3 id="overall-architecture"&gt;Overall Architecture&lt;/h3&gt;
&lt;p&gt;The Inference Extension uses Envoy&amp;rsquo;s External Processing (ext-proc) mechanism to upgrade gateways that support both Gateway API and ext-proc (such as Envoy Gateway, kgateway, GKE Gateway) into &lt;strong&gt;Inference Gateways&lt;/strong&gt;. Requests still follow the standard Gateway + HTTPRoute path, but before being forwarded to the backend, they pass through an &amp;ldquo;Endpoint Picker&amp;rdquo; component that selects the most suitable backend instance based on real-time metrics exposed by the model server.&lt;/p&gt;
&lt;p&gt;The flowchart below shows the overall architecture:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/gateway-api-inference-extension-inference-traffic-control/777d6816e4df5acaf96f5e28b01bc441.svg" data-img="https://assets.jimmysong.io/images/blog/gateway-api-inference-extension-inference-traffic-control/777d6816e4df5acaf96f5e28b01bc441.svg" alt="Figure 1: Inference Extension Architecture Flow" data-caption="Figure 1: Inference Extension Architecture Flow"
width="1980"
height="140"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Inference Extension Architecture Flow&lt;/figcaption&gt;
&lt;/figure&gt;
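&lt;p&gt;The Endpoint Picker&amp;rsquo;s role can be approximated by a scoring function over per-replica metrics. The following is a minimal sketch, not the project&amp;rsquo;s real scheduler: the &lt;code&gt;PodMetrics&lt;/code&gt; fields, the weights, and the sample values are assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class PodMetrics:
    name: str
    queue_length: int        # requests waiting on this replica
    kv_cache_usage: float    # fraction of KV cache in use, 0.0 to 1.0

def pick_endpoint(pods, queue_weight=1.0, cache_weight=10.0):
    """Return the replica with the lowest weighted load score."""
    def score(pod):
        return queue_weight * pod.queue_length + cache_weight * pod.kv_cache_usage
    return min(pods, key=score)

# Hypothetical metrics scraped from three model-server replicas.
pods = [
    PodMetrics("pod-a", queue_length=4, kv_cache_usage=0.90),
    PodMetrics("pod-b", queue_length=1, kv_cache_usage=0.35),
    PodMetrics("pod-c", queue_length=0, kv_cache_usage=0.95),
]
chosen = pick_endpoint(pods)
```

&lt;p&gt;Note that the replica with the empty queue is not chosen, because its KV cache is nearly full. This is the kind of decision a request-agnostic balancer cannot make.&lt;/p&gt;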
&lt;h3 id="inferencepool-platform-side-model-service-pool"&gt;InferencePool: Platform-Side &amp;ldquo;Model Service Pool&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;InferencePool is the core resource introduced by the Inference Extension, used to describe a group of inference Pods and routing plugin configurations. It is similar to a Service with a selector, responsible for selecting a group of model service Pods and specifying exposed ports, while also allowing attachment of Endpoint Picker plugins (such as Prefix-Cache-Aware, LoRA-aware, etc.).&lt;/p&gt;
&lt;p&gt;In the Gateway API model, InferencePool is treated as a type of &amp;ldquo;Backend&amp;rdquo; that can be referenced by HTTPRoute.backendRefs.&lt;/p&gt;
&lt;p&gt;The code block below shows a simplified example of InferencePool:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;inference.networking.x-k8s.io/v1alpha2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;InferencePool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm-llama3-chat-pool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;targetPortNumber&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm-llama3-chat&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;extensionRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;prefix-cache-aware-endpoint-picker&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The meaning of the above configuration is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Select all Pods with app=vllm-llama3-chat, port 8000.&lt;/li&gt;
&lt;li&gt;Use the plugin named prefix-cache-aware-endpoint-picker to make routing decisions based on metrics such as KV Cache hit rate, queue length, and GPU utilization.&lt;/li&gt;
&lt;/ul&gt;
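&lt;p&gt;Prefix-cache-aware routing prefers the replica most likely to already hold KV Cache entries for the start of the prompt. The sketch below approximates this with character-level prefix matching. It is an assumption-laden simplification: the helper names and the &lt;code&gt;cached_prompts&lt;/code&gt; bookkeeping are hypothetical, and a real picker would typically work on token blocks or hashes rather than raw strings.&lt;/p&gt;

```python
import os

def shared_prefix_len(a, b):
    """Length of the common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))

def pick_by_prefix(prompt, cached_prompts):
    """cached_prompts maps a pod name to prompts it served recently (hypothetical)."""
    def best_match(pod):
        lengths = (shared_prefix_len(prompt, p) for p in cached_prompts[pod])
        return max(lengths, default=0)
    return max(cached_prompts, key=best_match)

cached_prompts = {
    "pod-a": ["You are a helpful assistant. Summarize:"],
    "pod-b": ["Translate to French:"],
}
prompt = "You are a helpful assistant. Summarize: the release notes"
target = pick_by_prefix(prompt, cached_prompts)
```

&lt;p&gt;In practice this signal would be combined with queue length and utilization rather than used alone, so that a popular prefix does not overload a single replica.&lt;/p&gt;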
&lt;h3 id="inferenceobjective-business-side-request-objective"&gt;InferenceObjective: Business-Side &amp;ldquo;Request Objective&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;InferenceObjective is used to express the goal and priority of a single request, decoupled from the model service pool. One request corresponds to one InferenceObjective, and the same InferencePool can serve multiple different InferenceObjectives.&lt;/p&gt;
&lt;p&gt;Typical fields include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Business criticality (Critical / High / BestEffort)&lt;/li&gt;
&lt;li&gt;Required model family / version preference&lt;/li&gt;
&lt;li&gt;Acceptable latency / cost upper limits, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Endpoint Picker can combine InferenceObjective with backend metrics to make decisions: prioritize Critical requests when resources are constrained, and shed BestEffort requests when necessary.&lt;/p&gt;
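&lt;p&gt;The prioritize-then-shed behavior can be sketched as a simple admission step. This is illustrative only: the &lt;code&gt;admit&lt;/code&gt; function, the request dictionaries, and the fixed capacity are assumptions, and the criticality names follow the list above.&lt;/p&gt;

```python
# Lower number means higher priority; names follow the criticality levels above.
PRIORITY = {"Critical": 0, "High": 1, "BestEffort": 2}

def admit(requests, capacity):
    """Admit up to `capacity` requests, most critical first; shed the rest."""
    ranked = sorted(requests, key=lambda r: PRIORITY[r["criticality"]])
    return ranked[:capacity], ranked[capacity:]

# Hypothetical pending requests while only two slots are free.
pending = [
    {"id": 1, "criticality": "BestEffort"},
    {"id": 2, "criticality": "Critical"},
    {"id": 3, "criticality": "High"},
    {"id": 4, "criticality": "BestEffort"},
]
admitted, shed = admit(pending, capacity=2)
```

&lt;p&gt;Because the sort is stable, requests with the same criticality keep their arrival order, so shedding falls on the latest BestEffort traffic first.&lt;/p&gt;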
&lt;h3 id="inferencepoolimport-cross-cluster--gateway-reuse"&gt;InferencePoolImport: Cross-Cluster / Gateway Reuse&lt;/h3&gt;
&lt;p&gt;InferencePoolImport supports importing InferencePools defined in remote clusters into the local cluster, facilitating consistent governance of multi-cluster, multi-region inference services.&lt;/p&gt;
&lt;h3 id="maturity-and-implementation-ecosystem-of-inference-extension"&gt;Maturity and Implementation Ecosystem of Inference Extension&lt;/h3&gt;
&lt;p&gt;The current project version is v1.1.x, and the project as a whole is still in the &lt;strong&gt;Alpha&lt;/strong&gt; stage. The official recommendation is not to use it directly in production yet; it&amp;rsquo;s more suitable for platform teams to experiment with the technology stack.&lt;/p&gt;
&lt;p&gt;Multiple implementations and integrations already exist:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inference Gateway implementations for Envoy Gateway / kgateway&lt;/li&gt;
&lt;li&gt;GKE Inference Gateway: Enhanced capabilities based on GKE Gateway, including KV Cache-aware routing, LoRA reuse, priority scheduling, etc.&lt;/li&gt;
&lt;li&gt;NGINX Gateway Fabric, cloud provider ACK, and others are also following up on related extensions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, when designing practical solutions, it should be treated as a &amp;ldquo;forward-looking solution / future main path.&amp;rdquo; For production environments, prioritize managed implementations (such as GKE Inference Gateway) or commercial products from gateway vendors.&lt;/p&gt;
&lt;h2 id="practice-using-inference-extension-for-inference-traffic-control"&gt;Practice: Using Inference Extension for Inference Traffic Control&lt;/h2&gt;
&lt;p&gt;The following example uses a self-hosted LLM cluster on Kubernetes to demonstrate how to serve external traffic through an OpenAI-compatible interface and achieve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scheduling by request priority (real-time conversation vs. batch processing)&lt;/li&gt;
&lt;li&gt;Multi-version model canary and rollback&lt;/li&gt;
&lt;li&gt;Optimized routing using GPU metrics and KV Cache hit rates&lt;/li&gt;
&lt;li&gt;Unified platform-side observability and rate limiting entry point&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="deploying-gateway-api-and-inference-extension"&gt;Deploying Gateway API and Inference Extension&lt;/h3&gt;
&lt;p&gt;The deployment process is as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Install a Gateway API-compatible gateway implementation (such as Envoy Gateway, kgateway, GKE Gateway) in the cluster.&lt;/li&gt;
&lt;li&gt;Install the Gateway API Inference Extension CRD and control plane components.&lt;/li&gt;
&lt;li&gt;Enable the metrics endpoints and plugin protocols required by Inference Extension on the model server side (such as vLLM, Triton, TGI).&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id="defining-inferencepool-abstracting-llm-pods-as-an-inference-pool"&gt;Defining InferencePool: Abstracting LLM Pods as an &amp;ldquo;Inference Pool&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;The code block below shows a typical InferencePool configuration:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;inference.networking.x-k8s.io/v1alpha2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;InferencePool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;chat-llama3-pool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;targetPortNumber&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;8000&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;chat-llama3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;extensionRef&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;prefix-cache-aware&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c"&gt;# Optional: Plugin configuration ConfigMap / CR&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;selector selects LLM Pods, targetPortNumber specifies the inference service port.&lt;/li&gt;
&lt;li&gt;extensionRef binds the Endpoint Picker plugin, implementing KV Cache prefix-aware routing, selecting replicas with lighter load based on metrics like queue_length/gpu_utilization, and triggering load shedding when necessary.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="defining-inferenceobjective-connecting-business-intent-to-routing-decisions"&gt;Defining InferenceObjective: Connecting &amp;ldquo;Business Intent&amp;rdquo; to Routing Decisions&lt;/h3&gt;
&lt;p&gt;The code block below shows a sample InferenceObjective configuration (fields can be adjusted according to the actual version and implementation):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;inference.networking.x-k8s.io/v1alpha2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;InferenceObjective&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;chat-critical&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;criticality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;Critical &lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# Real-time conversation&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;preferredModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;llama3-70b&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;llama3-8b&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nn"&gt;---&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;inference.networking.x-k8s.io/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;InferenceObjective&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;chat-batch&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;criticality&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;BestEffort &lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# Batch analysis, log summarization&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;preferredModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;llama3-8b&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The Endpoint Picker can combine InferenceObjective with InferencePool metrics to make the following decisions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When GPU capacity is constrained and queues grow too long, prioritize chat-critical requests and, if necessary, drop chat-batch requests.&lt;/li&gt;
&lt;li&gt;For Critical requests, prioritize the large model; if the target pool is unavailable, fall back to the small model pool.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="importing-business-traffic-to-inferencepool-via-httproute"&gt;Importing Business Traffic to InferencePool via HTTPRoute&lt;/h3&gt;
&lt;p&gt;The code block below shows the HTTPRoute configuration on the Gateway API side:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;gateway.networking.k8s.io/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;HTTPRoute&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;llm-route&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;parentRefs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;- &lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;public-gateway&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;rules&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;- &lt;span class="nt"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;        &lt;/span&gt;- &lt;span class="nt"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;PathPrefix&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="nt"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;/v1/chat/completions&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;backendRefs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;        &lt;/span&gt;- &lt;span class="nt"&gt;group&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;inference.networking.x-k8s.io&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;InferencePool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;chat-llama3-pool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Some implementations (such as GKE Inference Gateway) can route based on the model field in the request body, mapping OpenAI-style model names to different InferencePool / InferenceObjective combinations.&lt;/p&gt;
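&lt;p&gt;As a minimal sketch of such model-based routing, the HTTPRoute below matches on a request header and sends each model to its own pool. It assumes that a body-based routing component in front of the route copies the model field from the request body into a header. The header name X-Gateway-Model-Name, the route name llm-route-by-model, and the pool chat-llama3-8b-pool are illustrative assumptions, not values confirmed for any specific implementation.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route-by-model              # hypothetical name
spec:
  parentRefs:
    - name: public-gateway
  rules:
    # Requests for the large model go to the large-model pool.
    - matches:
        - headers:
            - type: Exact
              name: X-Gateway-Model-Name   # assumed header set from the request body
              value: llama3-70b
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: chat-llama3-pool
    # Requests for the small model go to a separate pool.
    - matches:
        - headers:
            - type: Exact
              name: X-Gateway-Model-Name
              value: llama3-8b
      backendRefs:
        - group: inference.networking.x-k8s.io
          kind: InferencePool
          name: chat-llama3-8b-pool        # hypothetical second pool
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Keeping one rule per model makes the mapping explicit and easy to review; whether the gateway populates such a header, and under which name, depends on the implementation in use.&lt;/p&gt;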
&lt;h3 id="fine-grained-inference-traffic-control-using-metrics"&gt;Fine-Grained Inference Traffic Control Using Metrics&lt;/h3&gt;
&lt;p&gt;Inference Extension provides platform teams with a unified metrics system, including kv_cache_hits, gpu_utilization, request_queue_length, per-request inference duration, token count, and more.&lt;/p&gt;
&lt;p&gt;Based on these metrics, multi-level traffic control strategies can be built:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Priority + Capacity&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Set priorities and capacity limits for different InferenceObjectives so that critical workloads are automatically protected when resources are constrained.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rate Limiting by Cost / Token&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Aggregate token / latency metrics exposed by Inference Extension to Prometheus, then add cost-based rate limiting logic at the gateway / API Gateway level (such as total tokens per minute per user / application). The specification itself doesn&amp;rsquo;t mandate &amp;ldquo;Token-level rate limiting,&amp;rdquo; but provides observability and hooks for easy policy extension.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prefix Cache Aware Routing&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For requests with shared context (such as RAG, template generation), enable the Prefix Cache Aware plugin to route requests with the same prefix to the same replica, maximizing KV Cache hit rates and significantly reducing time to first token (TTFT).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Auto-Scaling Integration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Use metrics output by Inference Extension as HPA input to achieve &amp;ldquo;model-aware&amp;rdquo; auto-scaling, rather than relying solely on CPU / memory.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
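&lt;p&gt;The auto-scaling integration above can be sketched with a standard autoscaling/v2 HorizontalPodAutoscaler. This is a hedged example under stated assumptions: it presumes that a custom-metrics adapter (for example, one backed by Prometheus) exposes a per-Pod metric named request_queue_length to the Kubernetes custom metrics API, and the Deployment name chat-llama3, the HPA name, and the threshold are hypothetical values chosen for illustration.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chat-llama3-hpa                 # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chat-llama3                   # hypothetical Deployment backing the InferencePool
  minReplicas: 2
  maxReplicas: 8
  metrics:
    - type: Pods
      pods:
        metric:
          name: request_queue_length    # assumed to be served by a custom-metrics adapter
        target:
          type: AverageValue
          averageValue: "10"            # illustrative: scale out above ~10 queued requests per Pod
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Scaling on queue length rather than CPU ties replica count to what the inference servers actually experience; the right metric and threshold depend on the model server and workload.&lt;/p&gt;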
&lt;h2 id="relationship-and-trade-offs-with-traditional-gateway--service-mesh"&gt;Relationship and Trade-offs with Traditional Gateway / Service Mesh&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Control Plane&lt;/strong&gt;: Continue using Gateway API as the unified north-south / east-west traffic modeling specification. Service Mesh can still perform fine-grained circuit breaking, retry, mTLS, etc. within the cluster.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Plane&lt;/strong&gt;: Inference Extension pushes &amp;ldquo;model-aware routing&amp;rdquo; down to the ext-proc path implemented by the Gateway, so application teams do not have to reimplement it themselves.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adoption Strategy&lt;/strong&gt;: In the current Alpha stage, prefer productionized implementations (such as GKE Inference Gateway or the Inference Gateway offerings of commercial gateway vendors), start with &amp;ldquo;bypass pilots&amp;rdquo; outside the critical path, and gradually migrate existing AI gateway routing rules to the Inference Extension model.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Combining official documentation and community implementations, we can see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gateway API has become the standard Kubernetes model for north-south traffic and continues to evolve in the 1.x series.&lt;/li&gt;
&lt;li&gt;Gateway API Inference Extension introduces GPU metrics, KV Cache, LoRA, and other inference semantics into load balancing decisions through InferencePool, InferenceObjective, and Endpoint Picker.&lt;/li&gt;
&lt;li&gt;The project is still in Alpha stage, but has achieved experimental or productionized adoption in implementations such as GKE, kgateway, and NGINX Gateway Fabric. It is one of the important future directions for inference traffic control.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Earlier descriptions of Inference Extension as providing &amp;ldquo;built-in Token rate limiting CRD, AIInferencePolicy, and other objects&amp;rdquo; are no longer accurate; they should be replaced with the design based on InferencePool / InferenceObjective and metrics-driven policies.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2023/10/31/gateway-api-ga/" target="_blank" rel="noopener"&gt;Gateway API v1.0: GA Release - kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2024/11/21/gateway-api-v1-2/" target="_blank" rel="noopener"&gt;Gateway API v1.2: WebSockets, Timeouts, Retries, and More - kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gateway-api-inference-extension.sigs.k8s.io/" target="_blank" rel="noopener"&gt;Kubernetes Gateway API Inference Extension – Overview - gateway-api-inference-extension.sigs.k8s.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gateway-api-inference-extension.sigs.k8s.io/concepts/api-overview/" target="_blank" rel="noopener"&gt;API Overview – InferencePool / InferenceObjective / InferencePoolImport - gateway-api-inference-extension.sigs.k8s.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gateway-api-inference-extension.sigs.k8s.io/guides/metrics-and-observability/" target="_blank" rel="noopener"&gt;Metrics and Observability - gateway-api-inference-extension.sigs.k8s.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/blog/2025/04/21/deep-dive-into-the-gateway-api-inference-extension/" target="_blank" rel="noopener"&gt;Deep Dive into the Gateway API Inference Extension – cncf.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2025/xx/xx/gateway-api-inference-extension/" target="_blank" rel="noopener"&gt;Introducing Gateway API Inference Extension – kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/about-gke-inference-gateway" target="_blank" rel="noopener"&gt;GKE Inference Gateway - cloud.google.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.nginx.com/nginx-gateway-fabric/" target="_blank" rel="noopener"&gt;NGINX Gateway Fabric – Kubernetes Gateway API and AI Inference - docs.nginx.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.alibabacloud.com/help/en/ack/product-overview/gateway-api" target="_blank" rel="noopener"&gt;Alibaba Cloud ACK – Gateway API Components and Versions - alibabacloud.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>TRAE SOLO vs VS Code: Rethinking Coding Tools from the Perspective of AI Engineering Entities</title><link>https://jimmysong.io/blog/trae-vs-vscode-insiders-agent-hq-and-ai-engineering-entity/</link><pubDate>Fri, 14 Nov 2025 07:14:39 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/trae-vs-vscode-insiders-agent-hq-and-ai-engineering-entity/</guid><description>A comparison of TRAE SOLO and VS Code (Copilot, Agent HQ) via the AI Engineering Entity framework, focusing on automation, collaboration, model transparency, and engineering roles.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Coding tools are evolving from &amp;ldquo;AI assistants&amp;rdquo; into true engineering entities. How can we reinterpret the roles of TRAE SOLO and VS Code from a pipeline perspective?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently, &lt;a href="https://www.trae.ai" target="_blank" rel="noopener"&gt;TRAE International Edition&lt;/a&gt; SOLO mode has been fully opened to overseas users. It claims to be a &amp;ldquo;responsive coding agent&amp;rdquo; and is now available for official trial, with token-based rate limiting.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve used early versions of TRAE (without SOLO access), and also tried Qoder and Kiro. The AI coding field is flourishing, each tool with its own strengths.&lt;/p&gt;
&lt;p&gt;Now, with GitHub&amp;rsquo;s &lt;a href="https://github.blog/news-insights/company-news/welcome-home-agents" target="_blank" rel="noopener"&gt;Agent HQ&lt;/a&gt; concept from Universe, and the &amp;ldquo;AI Engineering Entity (AIEE)&amp;rdquo; framework I wrote about in &lt;a href="https://jimmysong.io/book/ai-handbook/infra/ai-engineering-entity/"&gt;AI-Native Application Architecture&lt;/a&gt;, it&amp;rsquo;s time to re-examine today&amp;rsquo;s coding tool landscape.&lt;/p&gt;
&lt;p&gt;This article compares TRAE SOLO and VS Code (with Copilot, Plan/Agent mode, and Agent HQ) from the perspective of AI engineering entities, combining personal experience to outline their differences in engineering automation, collaboration, and governance.&lt;/p&gt;
&lt;h2 id="three-engineering-role-abstractions-end-to-end-executor-contextual-collaborator-and-expert-orchestrator"&gt;Three Engineering Role Abstractions: End-to-End Executor, Contextual Collaborator, and Expert Orchestrator&lt;/h2&gt;
&lt;p&gt;From an engineering perspective, current mainstream AI coding tools can be abstracted into three roles:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;End-to-End Executor&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Focuses on &amp;ldquo;requirement to deployment&amp;rdquo; workflows, capable of autonomous planning, task breakdown, coding, testing, previewing, and even deployment. TRAE officially calls this role an &amp;ldquo;AI-Powered Context Engineer.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;User experience: like a &amp;ldquo;full-chain executor&amp;rdquo;—give it a requirement, and it handles the project, even if it&amp;rsquo;s slow or imperfect.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Contextual Collaborator&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VS Code is a powerful editor. Copilot has evolved from line-level completion to Chat, the Plan agent, and Agent mode, supporting multi-step tasks and codebase analysis.&lt;/li&gt;
&lt;li&gt;It doesn&amp;rsquo;t take over the whole project, but efficiently handles local tasks under your guidance, acting as an automated unit for specific segments.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Expert Orchestrator / Specialist Engine&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub&amp;rsquo;s Agent HQ is a &amp;ldquo;central platform for AI coding agents,&amp;rdquo; a unified control plane that can connect to OpenAI, Anthropic, Google, xAI, etc., run agents in parallel, and compare results.&lt;/li&gt;
&lt;li&gt;Functions as an &amp;ldquo;expert orchestrator&amp;rdquo; for key steps—planning, review, refactoring, or decision-making—providing high-quality output without taking over the entire project.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These three roles correspond to the structure in &amp;ldquo;AI Engineering Entity (AIEE)&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single end-to-end executor (TRAE SOLO)&lt;/li&gt;
&lt;li&gt;Contextual collaborator residing in the IDE (VS Code + Copilot)&lt;/li&gt;
&lt;li&gt;Specialist orchestrator platform for multi-entity scheduling (Agent HQ)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="quick-product-status-check"&gt;Quick Product Status Check&lt;/h2&gt;
&lt;p&gt;To avoid memory bias, let&amp;rsquo;s clarify some key facts.&lt;/p&gt;
&lt;h3 id="trae--trae-solo"&gt;TRAE / TRAE SOLO&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;TRAE claims to be a &amp;ldquo;10x AI Engineer,&amp;rdquo; able to independently understand requirements and execute development tasks.&lt;/li&gt;
&lt;li&gt;SOLO mode is generally available (GA) to international users; it emphasizes full-chain automation and can be used directly, subject to token limits.&lt;/li&gt;
&lt;li&gt;The underlying open-source Trae Agent CLI can execute multi-step engineering tasks in real codebases.&lt;/li&gt;
&lt;li&gt;TraeIDE&amp;rsquo;s official page shows built-in Claude 3.5/3.7, DeepSeek, etc., but the community notes slow integration of new models like Claude Sonnet 4.5.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, &amp;ldquo;TRAE does not support Claude&amp;rdquo; is now inaccurate—at least officially, Claude models are included. Which model is used in SOLO mode and whether it&amp;rsquo;s exposed to users remains unclear; the experience still needs improvement.&lt;/p&gt;
&lt;h3 id="vs-code--copilot--agent-hq"&gt;VS Code + Copilot + Agent HQ&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Copilot in VS Code now features Chat, Plan agent, Todo/multi-step execution:
&lt;ul&gt;
&lt;li&gt;Plan mode analyzes codebases, generates execution plans, splits into Todos, then implementation agents execute step by step.&lt;/li&gt;
&lt;li&gt;Agent mode provides a more automated &amp;ldquo;multi-step companion programmer&amp;rdquo; experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;GitHub launched Agent HQ at Universe 2025, integrating Copilot and third-party agents (Anthropic, OpenAI, Google, xAI, Cognition, etc.) into a unified control plane, supporting parallel runs and result comparison.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TRAE is like &amp;ldquo;embedding an engineering entity into the IDE.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;VS Code + Copilot is &amp;ldquo;adding a set of engineering entities to a mature IDE.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Agent HQ is positioned as &amp;ldquo;headquarters for multiple engineering entities.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="reconstructing-comparison-dimensions-with-the-ai-engineering-entity-framework"&gt;Reconstructing Comparison Dimensions with the AI Engineering Entity Framework&lt;/h2&gt;
&lt;p&gt;In &amp;ldquo;AI Engineering Entity (AIEE),&amp;rdquo; the definition is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI has evolved from editor auto-completion to a formal node in the software supply chain, able to receive tasks, produce reviewable artifacts (PR/diff/report), pass tests/gates, and be replaced if it fails. It&amp;rsquo;s no longer just an &amp;ldquo;enhanced human developer,&amp;rdquo; but a &amp;ldquo;functional engineering unit&amp;rdquo; in the pipeline.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Based on this, we can reconstruct key comparison dimensions for TRAE and VS Code:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Existence as an Independent Functional Unit&lt;/strong&gt;&lt;br&gt;
Can it autonomously plan, implement, and produce PRs/reports from natural language requirements, without continuous human intervention?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Context Modeling Capability&lt;/strong&gt;&lt;br&gt;
Can it model across files, directories, terminal output, and browser content to form a stable engineering context?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Position in the Pipeline&lt;/strong&gt;&lt;br&gt;
Is it an enhancement layer within the IDE, or a formal node in CI/CD and code review flows?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reviewability and Replaceability&lt;/strong&gt;&lt;br&gt;
Are its outputs standardized (PR, diff, report) and suitable for regular pipeline review and rollback?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multi-Agent Collaboration Capability&lt;/strong&gt;&lt;br&gt;
Does it natively support multi-agent collaboration, or is it focused on single-agent enhancement?&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="trae-solo-vs-vs-code-engineering-entity-comparison-table"&gt;TRAE SOLO vs VS Code: Engineering Entity Comparison Table&lt;/h2&gt;
&lt;p&gt;The following table summarizes the main differences from the engineering entity perspective. Note: VS Code includes Copilot Chat + Plan/Agent mode by default and can mount the Agent HQ ecosystem.&lt;/p&gt;
&lt;p&gt;You can interpret this table as: &amp;ldquo;If AI is treated as an engineering entity in the pipeline, what roles do TRAE and VS Code play?&amp;rdquo;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;TRAE SOLO&lt;/th&gt;
&lt;th&gt;VS Code + Copilot / Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Engineering Role&lt;/td&gt;
&lt;td&gt;Single strong entity, directly handles end-to-end tasks from idea to deployment&lt;/td&gt;
&lt;td&gt;IDE + multiple entities (Plan, Implementation, Review), IDE itself is the engineering base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task Granularity&lt;/td&gt;
&lt;td&gt;Project/feature level: from PRD-style description to full project scaffold, implementation, testing, preview&lt;/td&gt;
&lt;td&gt;Function/file level mainly; Plan mode can scale to feature/subsystem level&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Modeling&lt;/td&gt;
&lt;td&gt;Emphasizes &amp;ldquo;context engineering&amp;rdquo;: reads codebase, terminal output, browser content as unified input for SOLO&lt;/td&gt;
&lt;td&gt;Mainly codebase; Plan Agent generates plans based on code analysis, Agent mode schedules by plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automation&lt;/td&gt;
&lt;td&gt;Can proactively modify files, run commands, tests, start local services, forming a complete loop&lt;/td&gt;
&lt;td&gt;Plan/Agent can run commands, modify files, run tests, but is more dependent on your current project/workflow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human Intervention&lt;/td&gt;
&lt;td&gt;More &amp;ldquo;post-review&amp;rdquo;: let it run first, then review and fine-tune&lt;/td&gt;
&lt;td&gt;More &amp;ldquo;in-process collaboration&amp;rdquo;: frequent intervention in planning, implementation, and review, with control points at each step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Form&lt;/td&gt;
&lt;td&gt;Code changes, test results, preview; sometimes PRs/docs&lt;/td&gt;
&lt;td&gt;Code completion, refactoring, PR comments, CodeQL reports, Plan/Todo lists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Agent&lt;/td&gt;
&lt;td&gt;Core is the SOLO agent; other capabilities (like Trae Agent CLI) are extensions&lt;/td&gt;
&lt;td&gt;Copilot itself is an agent; Agent HQ allows parallel competition among multiple agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Transparency&lt;/td&gt;
&lt;td&gt;Product exposes specific models poorly; users can&amp;rsquo;t tell which model is used&lt;/td&gt;
&lt;td&gt;GitHub clearly marks Copilot&amp;rsquo;s model family; Agent HQ shows agent sources directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Performance&lt;/td&gt;
&lt;td&gt;Strong automation but slow; complex projects may stall at &amp;ldquo;thinking&amp;rdquo; stage; hard token limits&lt;/td&gt;
&lt;td&gt;Stable response in familiar projects; mostly local changes, overall latency is controllable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy &amp;amp; Compliance&lt;/td&gt;
&lt;td&gt;Official and third-party reviews mention extensive telemetry/data collection; enterprise adoption needs extra evaluation&lt;/td&gt;
&lt;td&gt;Copilot for Enterprise has clear data isolation/compliance, suitable for most enterprise governance needs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: TRAE SOLO vs VS Code Engineering Entity Comparison
&lt;/figcaption&gt;
&lt;p&gt;From the table:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If you want &amp;ldquo;an AI engineering entity that takes full responsibility from requirement to deployment,&amp;rdquo; TRAE SOLO fits that role.&lt;/li&gt;
&lt;li&gt;If you want &amp;ldquo;a stable engineering base + a set of pluggable entities,&amp;rdquo; VS Code + Copilot + Agent HQ fits better.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="workflow-comparison-two-engineering-entity-pipelines"&gt;Workflow Comparison: Two Engineering Entity Pipelines&lt;/h2&gt;
&lt;p&gt;To clarify their engineering flows, the following diagram illustrates typical workflows for TRAE SOLO and VS Code.&lt;/p&gt;
&lt;p&gt;The diagram compares the engineering pipelines of TRAE SOLO and VS Code side by side, highlighting their respective collaboration models.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/trae-vs-vscode-insiders-agent-hq-and-ai-engineering-entity/85e9f25c80e9c24f2501f2089382295d.svg" data-img="https://assets.jimmysong.io/images/blog/trae-vs-vscode-insiders-agent-hq-and-ai-engineering-entity/85e9f25c80e9c24f2501f2089382295d.svg" alt="Figure 1: TRAE SOLO vs VS Code Engineering Entity Pipeline Comparison" data-caption="Figure 1: TRAE SOLO vs VS Code Engineering Entity Pipeline Comparison"
width="4366"
height="810"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: TRAE SOLO vs VS Code Engineering Entity Pipeline Comparison&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This diagram shows two typical collaboration models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TRAE SOLO attempts to encapsulate &amp;ldquo;context aggregation → planning → implementation → testing → preview/deployment&amp;rdquo; within a single engineering entity, with user intervention only at requirement input and output review.&lt;/li&gt;
&lt;li&gt;VS Code + Copilot + Agent HQ uses the IDE as runtime, with Plan/Implementation/Review agents corresponding to different roles. Agent HQ supports parallel agent competition, allowing developers to select the best solution.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="model-transparency-speed-and-predictability"&gt;Model Transparency, Speed, and Predictability&lt;/h2&gt;
&lt;p&gt;Based on personal experience, here are the model transparency and speed issues from the engineering entity perspective:&lt;/p&gt;
&lt;h3 id="model-transparency"&gt;Model Transparency&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;TRAE currently gives little visibility into &amp;ldquo;which model is called&amp;rdquo;; switching to MAX mode only hints at a &amp;ldquo;stronger model or higher quota,&amp;rdquo; without clear feedback.&lt;/li&gt;
&lt;li&gt;Community feedback notes slow integration of new models; some newer strong models (such as Claude Sonnet 4.5) are available elsewhere but not yet in TRAE.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TRAE is hard to use as a &amp;ldquo;precisely configurable engineering unit,&amp;rdquo; more like a black box, making model change management in CI/CD or production pipelines difficult.&lt;/li&gt;
&lt;li&gt;VS Code + Copilot + Agent HQ is stronger in standardization; GitHub clearly marks Copilot&amp;rsquo;s model family, Agent HQ uses agent source as the abstraction boundary.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="speed-and-predictability"&gt;Speed and Predictability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;TRAE SOLO&amp;rsquo;s &amp;ldquo;slowness&amp;rdquo; comes from executing more steps (reading files, analyzing, planning, testing) and insufficient engineering process visualization. The UI shows &amp;ldquo;Thinking…&amp;rdquo; prompts, making it hard to tell if it&amp;rsquo;s stuck or planning.&lt;/li&gt;
&lt;li&gt;VS Code&amp;rsquo;s Plan mode explicitly lists plans and Todos; Agent mode emphasizes &amp;ldquo;execution by plan,&amp;rdquo; letting users clearly see the entity&amp;rsquo;s work status, improving predictability.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="agent-hq-positioning-single-entity-vs-multi-entity-headquarters"&gt;Agent HQ Positioning: Single Entity vs Multi-Entity Headquarters&lt;/h2&gt;
&lt;p&gt;From a platform perspective, GitHub and TRAE differ as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent HQ&amp;rsquo;s core idea: future development will rely on multiple specialized agents collaborating in parallel. GitHub is building &amp;ldquo;agent headquarters,&amp;rdquo; not a single engineering agent. Developers can schedule agents in a unified control plane, integrating with existing GitHub Flow (Issue, PR, Review, CI/CD).&lt;/li&gt;
&lt;li&gt;TRAE is more like &amp;ldquo;proprietary IDE + agent + full-stack context engineering,&amp;rdquo; delivering an integrated experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In terms of engineering entity organization:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub is building &amp;ldquo;infrastructure and governance for multi-entity engineering systems.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;TRAE is building &amp;ldquo;vertically integrated engineering entity + private runtime.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The two are not mutually exclusive; they represent different bets: &amp;ldquo;broad platform + multi-entity scheduling&amp;rdquo; vs &amp;ldquo;single strong entity + proprietary toolchain.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="subjective-experience-and-engineering-framework-integration"&gt;Subjective Experience and Engineering Framework Integration&lt;/h2&gt;
&lt;p&gt;Translating personal experience into engineering language:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VS Code offers the more familiar &amp;ldquo;single IDE + multiple views&amp;rdquo; experience; TRAE splits IDE and SOLO modes, which requires mental switching.&lt;/li&gt;
&lt;li&gt;TRAE&amp;rsquo;s engineering entity capabilities surpass those of ordinary completion tools: it can take on whole tasks. However, its model transparency and context quality are inconsistent, and governance needs improvement.&lt;/li&gt;
&lt;li&gt;VS Code doesn&amp;rsquo;t take over the whole project, but local work is stable; Plan, Agent, and Review combinations enable multi-entity collaboration.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;According to the &amp;ldquo;AI Engineering Entity (AIEE)&amp;rdquo; framework:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TRAE SOLO is a &lt;strong&gt;single AI engineering entity (AIEE) capable of handling complete engineering tasks&lt;/strong&gt;, but still has clear shortcomings in model transparency, engineering governance, and enterprise-level controllability.&lt;/li&gt;
&lt;li&gt;VS Code + Copilot + Agent HQ is an &lt;strong&gt;infrastructure platform for multiple engineering entities&lt;/strong&gt;, less aggressive in end-to-end outsourcing in the short term, but clearer in engineering consistency, model replaceability, and organizational governance.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;This article systematically compares TRAE SOLO and VS Code (with Copilot, Agent HQ) from the perspective of AI engineering entities, focusing on automation, collaboration, and model transparency. TRAE SOLO is better suited for individual developers or small teams seeking end-to-end automation, while VS Code + Copilot + Agent HQ provides stronger infrastructure for multi-entity collaboration, enterprise governance, and engineering consistency. In the future, AI engineering entities will become formal nodes in the software development pipeline, and tool selection should be based on engineering needs and governance requirements.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.webull.com/news/13844395366540288" target="_blank" rel="noopener"&gt;ByteDance&amp;rsquo;s AI programming tool TRAE announced the &amp;hellip; - webull.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/news-insights/company-news/welcome-home-agents" target="_blank" rel="noopener"&gt;Introducing Agent HQ: Any agent, any way you work - github.blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://traesolo.net" target="_blank" rel="noopener"&gt;TRAE SOLO - AI-Powered Context Engineer for Faster &amp;hellip; - traesolo.net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/blogs/2025/02/24/introducing-copilot-agent-mode" target="_blank" rel="noopener"&gt;Introducing GitHub Copilot agent mode (preview) - code.visualstudio.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jimmysong.io/book/ai-handbook/infra/ai-engineering-entity/" target="_blank" rel="noopener"&gt;AI Engineering Entity | Jimmy Song - jimmysong.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.trae.ai" target="_blank" rel="noopener"&gt;TRAE - Collaborate with Intelligence - trae.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bytedance/trae-agent" target="_blank" rel="noopener"&gt;bytedance/trae-agent - github.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://traeide.com" target="_blank" rel="noopener"&gt;TraeIDE - traeide.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.visualstudio.com/docs/copilot/chat/copilot-chat" target="_blank" rel="noopener"&gt;Get started with chat in VS Code - code.visualstudio.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://skywork.ai/blog/trae-ai-ide-review-2025-cursor-alternative" target="_blank" rel="noopener"&gt;Trae AI IDE Review 2025: ByteDance&amp;rsquo;s Free IDE vs Cursor - skywork.ai&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2025-10-28-github-copilot-in-visual-studio-code-gets-upgraded" target="_blank" rel="noopener"&gt;GitHub Copilot in Visual Studio Code gets upgraded - github.blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.traesolo.org/blog/trae-solo-comprehensive-review" target="_blank" rel="noopener"&gt;TRAE 2.0 Preview: AI-Native Development Paradigm Leap | T&amp;hellip; - traesolo.org&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Closed-Source Flagships Accelerate, Open-Source Ecosystem Forced to 'Synchronize'</title><link>https://jimmysong.io/blog/closed-source-flagships-and-open-source-twins/</link><pubDate>Fri, 14 Nov 2025 04:21:07 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/closed-source-flagships-and-open-source-twins/</guid><description>Analysis of closed-source model acceleration and open-source ecosystem response, exploring core engineering contradictions and infrastructure evolution.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Closed-source models are accelerating, while open-source ecosystems are forced to catch up. What engineers truly need to focus on is infrastructure and controllability—not just the surface-level &amp;ldquo;Twin&amp;rdquo; phenomenon.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Recently, I came across an email titled: &lt;strong&gt;&amp;ldquo;Every Big AI Model Now Has an Open-Source Twin&amp;rdquo;&lt;/strong&gt;. In other words, every major closed-source model now has an open-source sibling.&lt;/p&gt;
&lt;p&gt;From the perspective of media or venture capital, this is an easy story to tell: a closed-source flagship model is released, the community quickly produces an open-source counterpart, and the narrative becomes &amp;ldquo;open source is catching up with closed source&amp;rdquo;—the ecosystem is thriving, innovation is accelerating, and the future looks promising.&lt;/p&gt;
&lt;p&gt;But from the viewpoint of someone deeply involved in infrastructure, cloud native, and architecture, this narrative has several issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It equates &amp;ldquo;synchronized pace&amp;rdquo; with &amp;ldquo;matched capabilities.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;It overlooks the real factors that determine the ceiling: data, compute, and engineering systems.&lt;/li&gt;
&lt;li&gt;It blurs a key fact: the open-source ecosystem is fundamentally in a &lt;strong&gt;reactive state&lt;/strong&gt;, not leading in parallel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This article breaks down the &amp;ldquo;Open-Source Twin&amp;rdquo; narrative from an engineering and infrastructure perspective, and shares the core issues I personally care about.&lt;/p&gt;
&lt;h2 id="from-theres-a-twin-to-forced-synchronization-how-the-narrative-changed"&gt;From &amp;ldquo;There&amp;rsquo;s a Twin&amp;rdquo; to &amp;ldquo;Forced Synchronization&amp;rdquo;: How the Narrative Changed&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s first outline the phenomenon.&lt;/p&gt;
&lt;p&gt;In the past two years, the industry has repeatedly seen the following pattern:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Big tech releases a closed-source flagship model (e.g., GPT-5 series, Claude 4/4.5, Gemini 2.5).&lt;/li&gt;
&lt;li&gt;Soon after, a batch of open-source counterparts emerges (e.g., Qwen, GLM, Yi, K2), aligning on parameter scale and benchmark metrics.&lt;/li&gt;
&lt;li&gt;Media and community start using terms like &amp;ldquo;open-source twin,&amp;rdquo; &amp;ldquo;replacement,&amp;rdquo; and &amp;ldquo;counterpart.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At this level, it&amp;rsquo;s easy to draw optimistic conclusions: &lt;strong&gt;open source can now match closed source release for release; no matter how fast closed source runs, the community can keep up.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;But the more critical question is: &lt;strong&gt;Who sets the pace, who defines the rules, and who bears the real cost?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The current structure is clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pace is set by closed-source giants&lt;/strong&gt;: They decide when to boost inference, extend context, push multimodality, or specialize reasoning (like OpenAI&amp;rsquo;s o1 series).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The open-source ecosystem reacts&lt;/strong&gt;: Each closed-source upgrade triggers a new round of &amp;ldquo;open-source benchmarking&amp;rdquo; against it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, the current model isn&amp;rsquo;t parallel innovation or mutual stimulation—it&amp;rsquo;s closed source constantly shifting gears at the front, with open source adjusting to avoid falling out of sight.&lt;/p&gt;
&lt;p&gt;From an engineering perspective, &amp;ldquo;Every Big AI Model Has an Open-Source Twin&amp;rdquo; is more accurately:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Every Big AI Model Now Forces an Open-Source Response.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="what-exactly-is-open-source-synchronizing-with"&gt;What Exactly Is Open Source &amp;ldquo;Synchronizing&amp;rdquo; With?&lt;/h2&gt;
&lt;p&gt;To understand the &amp;ldquo;synchronized response&amp;rdquo; phenomenon, we need to break it down into three categories.&lt;/p&gt;
&lt;p&gt;Before listing them, let&amp;rsquo;s add some context: every closed-source flagship update isn&amp;rsquo;t just &amp;ldquo;more parameters, higher scores&amp;rdquo;—it&amp;rsquo;s constantly &lt;strong&gt;rewriting constraints&lt;/strong&gt;, including inference cost, interaction patterns, context length, multimodal consistency, and explainability.&lt;/p&gt;
&lt;p&gt;In this context, open source isn&amp;rsquo;t just synchronizing on &amp;ldquo;scores,&amp;rdquo; but on increasingly complex &lt;strong&gt;objective functions&lt;/strong&gt;.&lt;/p&gt;
&lt;h3 id="synchronizing-the-imagination-boundary-of-capability-ceilings"&gt;Synchronizing the &amp;ldquo;Imagination Boundary&amp;rdquo; of Capability Ceilings&lt;/h3&gt;
&lt;p&gt;Closed-source models essentially expand &amp;ldquo;what people think a model should be able to do,&amp;rdquo; such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;From pure text to text + image + audio + video.&lt;/li&gt;
&lt;li&gt;From single-turn Q&amp;amp;A to engineering-level reasoning, coding, debugging, fixing, and refactoring.&lt;/li&gt;
&lt;li&gt;From thousands of tokens of context to hundreds of thousands or more.&lt;/li&gt;
&lt;li&gt;From &amp;ldquo;black box output&amp;rdquo; to having chains of thought, reasoning traces, and verifiable outputs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Open-source models then align their goals: &amp;ldquo;We also need long context, multimodality, coding ability, and agent workflow support.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="synchronizing-the-expectation-value-of-interfaces-and-usage-patterns"&gt;Synchronizing the &amp;ldquo;Expectation Value&amp;rdquo; of Interfaces and Usage Patterns&lt;/h3&gt;
&lt;p&gt;Closed-source models educate developers and enterprise users on questions such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How low can interaction latency go?&lt;/li&gt;
&lt;li&gt;How long can context be extended without breaking?&lt;/li&gt;
&lt;li&gt;How smooth can multimodal input be?&lt;/li&gt;
&lt;li&gt;How &amp;ldquo;smart&amp;rdquo; can the reasoning process get?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Their expectations for &lt;strong&gt;any open-source model&lt;/strong&gt; are recalibrated.&lt;/p&gt;
&lt;p&gt;Thus, open source must:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Continuously optimize inference frameworks (e.g., vLLM, SGLang, TGI) to narrow the latency gap.&lt;/li&gt;
&lt;li&gt;Make serving experiences closer to closed-source, such as OpenAI API compatibility and better SDKs.&lt;/li&gt;
&lt;li&gt;Push hard to catch up on multimodality and long context, even if training costs are high.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="synchronizing-surface-metrics-not-complete-capabilities"&gt;Synchronizing &amp;ldquo;Surface Metrics,&amp;rdquo; Not &amp;ldquo;Complete Capabilities&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;From a benchmark perspective, open source can indeed reach roughly 80–90% of closed-source scores on public test sets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MMLU, GSM8K, HumanEval.&lt;/li&gt;
&lt;li&gt;Common reasoning, reading comprehension, code generation metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But these metrics only reflect &lt;strong&gt;surface capabilities&lt;/strong&gt;, not:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Robustness to long-tail problems.&lt;/li&gt;
&lt;li&gt;Stability in complex, multi-step scenarios.&lt;/li&gt;
&lt;li&gt;Reliability and controllability in large-scale production systems.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Engineering health&amp;rdquo; over long-term evolution.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is why I&amp;rsquo;m skeptical of the &amp;ldquo;Twin&amp;rdquo; term: &lt;strong&gt;it uses superficial metric similarity to mask deep structural differences.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="why-open-source-twin-sounds-good-but-misses-the-core-contradiction"&gt;Why &amp;ldquo;Open-Source Twin&amp;rdquo; Sounds Good but Misses the Core Contradiction&lt;/h2&gt;
&lt;p&gt;From an infrastructure and engineering perspective, the real issue isn&amp;rsquo;t &amp;ldquo;can open source copy a similar architecture,&amp;rdquo; but:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Who can sustainably manage data, compute, scheduling systems, and engineering teams to build a long-term model production pipeline.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;There are three key contradictions here.&lt;/p&gt;
&lt;h3 id="data-is-unavailable-training-recipes-cant-be-fully-reproduced"&gt;Data Is Unavailable, Training Recipes Can&amp;rsquo;t Be Fully Reproduced&lt;/h3&gt;
&lt;p&gt;Open source can replicate general network structures and optimization tricks, but can&amp;rsquo;t access:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Closed-source data sources and cleaning standards.&lt;/li&gt;
&lt;li&gt;Filtering strategies, detoxification, alignment details.&lt;/li&gt;
&lt;li&gt;Large-scale synthetic data generation and selection methods.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Result: &lt;strong&gt;Even if you match parameter scale and training steps, the effect may not truly align.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Many open-source projects have to use rougher data, limited compute budgets, and more conservative training strategies, ending up with a &amp;ldquo;usable but not truly stable&amp;rdquo; state.&lt;/p&gt;
&lt;h3 id="compute-gap-is-structural-not-solved-by-one-time-funding"&gt;Compute Gap Is Structural, Not Solved by One-Time Funding&lt;/h3&gt;
&lt;p&gt;Training flagship models requires compute that&amp;rsquo;s not just hundreds or thousands of GPUs—it&amp;rsquo;s a &lt;strong&gt;structural, long-term investment&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In the open-source camp, those approaching this scale usually have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Backing from large companies or national labs.&lt;/li&gt;
&lt;li&gt;Funding from real business budgets, not community donations.&lt;/li&gt;
&lt;li&gt;Compute supply that can be planned long-term, not just a one-off &amp;ldquo;burn.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Reality:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Entities truly capable of producing &amp;ldquo;flagship open-source models&amp;rdquo; are essentially &amp;ldquo;institutions,&amp;rdquo; not loose personal communities.&lt;/li&gt;
&lt;li&gt;Most &amp;ldquo;open-source twins&amp;rdquo; are backed by enterprises, with product goals and commercial interests.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, &lt;strong&gt;&amp;ldquo;open source vs closed source&amp;rdquo; is more like &amp;ldquo;many big companies vs a few giants,&amp;rdquo; not &amp;ldquo;community vs company.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="reproducing-architecture--leading-architecture"&gt;&amp;ldquo;Reproducing&amp;rdquo; Architecture ≠ &amp;ldquo;Leading&amp;rdquo; Architecture&lt;/h3&gt;
&lt;p&gt;Many open-source models look architecturally similar to closed-source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transformer variants, MoE variants.&lt;/li&gt;
&lt;li&gt;Minor tweaks at the decision layer.&lt;/li&gt;
&lt;li&gt;Some inference optimizations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But in terms of industry power, those truly pushing these architectures to production scale and validating feasibility are still on the closed-source side. Open source mainly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Validates closed-source approaches on weaker compute.&lt;/li&gt;
&lt;li&gt;Explores &amp;ldquo;smaller, cheaper&amp;rdquo; approximations.&lt;/li&gt;
&lt;li&gt;Prunes and adapts for specific scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, &amp;ldquo;Twin&amp;rdquo; is more a marketing term than an engineering one.&lt;/p&gt;
&lt;h2 id="my-perspective-what-really-matters-in-this-game"&gt;My Perspective: What Really Matters in This Game&lt;/h2&gt;
&lt;p&gt;As an engineer in cloud native, service mesh, and distributed systems, my default thinking when looking at AI infrastructure is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Treat &amp;ldquo;models&amp;rdquo; as just one component in the system, and focus on the underlying infrastructure, scheduling systems, and engineering pipelines.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;From this angle, the real concern behind &amp;ldquo;every closed-source model has an open-source twin&amp;rdquo; is:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/closed-source-flagships-and-open-source-twins/4fb6b788319f45d2c5a4aedd1af26057.svg" data-img="https://assets.jimmysong.io/images/blog/closed-source-flagships-and-open-source-twins/4fb6b788319f45d2c5a4aedd1af26057.svg" alt="Figure 1: Mermaid Diagram" data-caption="Figure 1: Mermaid Diagram"
width="2616"
height="1359"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Mermaid Diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="can-open-source-models-stand-firm-in-production-over-time"&gt;Can Open-Source Models Stand Firm in Production Over Time?&lt;/h3&gt;
&lt;p&gt;The focus isn&amp;rsquo;t whether it can run a demo, but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Is there a clear upgrade cadence?&lt;/li&gt;
&lt;li&gt;Are rollback and compatibility strategies robust?&lt;/li&gt;
&lt;li&gt;Is there a sound evolution path for model weights, inference frameworks, and configurations?&lt;/li&gt;
&lt;li&gt;Is the entire stack observable and debuggable?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If these infrastructure layers aren&amp;rsquo;t mature, the so-called &amp;ldquo;Twin&amp;rdquo; is just &amp;ldquo;something that looks similar, but don&amp;rsquo;t ask if it can support your production workloads.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="has-training-and-inference-infrastructure-formed-a-replicable-engineering-paradigm"&gt;Has Training and Inference Infrastructure Formed a Replicable &amp;ldquo;Engineering Paradigm&amp;rdquo;?&lt;/h3&gt;
&lt;p&gt;The real value in open source is whether it can form a unified, teachable, and transferable engineering paradigm, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Training pipeline: data preparation → preprocessing → training → evaluation → alignment → deployment.&lt;/li&gt;
&lt;li&gt;Inference infrastructure: how vLLM / SGLang / TGI maintain consistent performance across different GPU topologies.&lt;/li&gt;
&lt;li&gt;Scheduling and resource management: how to manage large-scale inference loads on Kubernetes and cloud-native infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If these can be established, &amp;ldquo;open-source twin&amp;rdquo; isn&amp;rsquo;t just &amp;ldquo;we also have a model,&amp;rdquo; but a reusable, transparent, and learnable engineering system.&lt;/p&gt;
&lt;h3 id="the-true-value-of-open-source-controllability-and-bargaining-power-not-absolute-performance"&gt;The True Value of Open Source: Controllability and Bargaining Power, Not Absolute Performance&lt;/h3&gt;
&lt;p&gt;Realistically, closed-source flagships will continue to lead in &lt;strong&gt;overall capability&lt;/strong&gt; for the foreseeable future: larger scale, more complex training, better data, richer scenario tuning.&lt;/p&gt;
&lt;p&gt;For enterprises and developers, the key value of open source isn&amp;rsquo;t &amp;ldquo;I want to fully replace closed source,&amp;rdquo; but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Maintaining controllability over technical direction.&lt;/li&gt;
&lt;li&gt;Gaining bargaining power, avoiding vendor lock-in.&lt;/li&gt;
&lt;li&gt;Building your own model stack in privacy-sensitive or compliance-heavy scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From this perspective, the &amp;ldquo;Twin&amp;rdquo; term should be soberly rewritten as:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In many scenarios, open source can provide a controllable, more flexible alternative path. It is not a mirror of closed source but a separate engineering decision space.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="practical-advice-for-engineers-and-teams-dont-worship-twins-see-the-structure-clearly"&gt;Practical Advice for Engineers and Teams: Don&amp;rsquo;t Worship &amp;ldquo;Twins,&amp;rdquo; See the Structure Clearly&lt;/h2&gt;
&lt;p&gt;Before the final summary, here are my actionable views on this topic.&lt;/p&gt;
&lt;p&gt;Premise: If you&amp;rsquo;re an engineer, architect, or technical leader, your real decision isn&amp;rsquo;t &amp;ldquo;choose open source or closed source,&amp;rdquo; but:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Under your business constraints, how to combine closed-source APIs, open-source weights, and self-built infrastructure into an &lt;strong&gt;evolvable, observable, and portable&lt;/strong&gt; system.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;With this in mind, &amp;ldquo;Every Big AI Model Has an Open-Source Twin&amp;rdquo; breaks down into several sober judgments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you see an &amp;ldquo;open-source twin,&amp;rdquo; first ask: can it run stably in your production environment over time, not just pass benchmarks?&lt;/li&gt;
&lt;li&gt;What you really need to understand: is there a clear story behind its training/inference infrastructure, not just a weight download link?&lt;/li&gt;
&lt;li&gt;Reframe &amp;ldquo;open source vs closed source&amp;rdquo; as &amp;ldquo;where do I need closed source (capability/cost), and where do I need open source (controllability/compliance)?&amp;rdquo;&lt;/li&gt;
&lt;li&gt;If you&amp;rsquo;re working on infrastructure and platform layers, focus on:
&lt;ul&gt;
&lt;li&gt;How to run different models in a unified scheduling, monitoring, and logging system.&lt;/li&gt;
&lt;li&gt;How to treat large models as observable, governable services on Kubernetes/cloud-native stacks, not mysterious black boxes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Closed-source flagships keep accelerating, shifting gears, and adding dimensions, while open-source ecosystems are forced to develop increasingly mature synchronized response mechanisms. What truly determines the gap is data, compute, and engineering infrastructure—not just a single model release.&lt;/p&gt;
&lt;p&gt;Personally, my focus will remain on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The evolution of inference infrastructure (like vLLM, SGLang, TGI).&lt;/li&gt;
&lt;li&gt;Training and scheduling: how to stably manage model lifecycles in cloud-native environments.&lt;/li&gt;
&lt;li&gt;Engineering paradigm accumulation: moving from &amp;ldquo;can run&amp;rdquo; to &amp;ldquo;reproducible, maintainable, and evolvable.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Key Takeaways from Ingress NGINX Retirement: Managing Technical Debt in Cloud Native Migration</title><link>https://jimmysong.io/blog/ingress-nginx-retirement-insights/</link><pubDate>Thu, 13 Nov 2025 01:43:05 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ingress-nginx-retirement-insights/</guid><description>The retirement of Ingress NGINX reveals technical debt, migration paths, and the trend toward standardized traffic management in cloud native infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The evolution of cloud native infrastructure inevitably faces the reality of technical debt and governance. The retirement of Ingress NGINX is a profound reminder about standardization and sustainability.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Kubernetes has officially announced that &lt;a href="https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/" target="_blank" rel="noopener"&gt;Ingress NGINX will be retired in March 2026&lt;/a&gt;, after which there will be no further releases, bug fixes, or security updates. This is not just a typical project sunset, but a landmark event in the evolution of the Kubernetes networking model. It signals the inevitable shift of the tech stack from &amp;ldquo;flexible but fragile&amp;rdquo; to &amp;ldquo;controllable and governable.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;As someone who has long promoted Kubernetes and cloud native practices, I have witnessed both the golden age of Ingress NGINX and the gradual accumulation of its technical debt. Here are the clear insights this event has brought me.&lt;/p&gt;
&lt;h2 id="technical-debt-will-eventually-backfire-especially-for-infrastructure-components"&gt;Technical Debt Will Eventually Backfire, Especially for Infrastructure Components&lt;/h2&gt;
&lt;p&gt;The core issue with Ingress NGINX is not a decline in users, but that &amp;ldquo;maintenance costs permanently exceed the pace of contributions.&amp;rdquo; High flexibility leads to a huge attack surface, years of complex configuration legacy, and a shortage of community maintainers, ultimately making the project unsustainable.&lt;/p&gt;
&lt;p&gt;Once infrastructure components can no longer be securely updated, they cease to be assets and become liabilities.&lt;/p&gt;
&lt;div class="alert alert-note-container"&gt;
&lt;div class="alert-note-title px-2"&gt;
My Conclusion
&lt;/div&gt;
&lt;div class="alert-note px-2"&gt;
&lt;strong&gt;The threshold for future infrastructure will be higher, with stricter requirements for security and maintainability. The model of individual hero maintainers will continue to fail.&lt;/strong&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="kubernetes-officially-enters-the-gateway-api-era"&gt;Kubernetes Officially Enters the Gateway API Era&lt;/h2&gt;
&lt;p&gt;Before introducing Gateway API, it&amp;rsquo;s worth reviewing the design of Ingress. Ingress was once praised for its simplicity, but it can no longer meet modern needs for traffic management, scalability, security policies, and multi-team collaboration.&lt;/p&gt;
&lt;p&gt;Gateway API is designed with a more modern philosophy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Governance model across roles (Infra / Dev / Ops)&lt;/li&gt;
&lt;li&gt;Strong CRD (Custom Resource Definition) extensibility&lt;/li&gt;
&lt;li&gt;Pluggable implementation&lt;/li&gt;
&lt;li&gt;Significantly improved observability and lifecycle management&lt;/li&gt;
&lt;/ul&gt;
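&lt;p&gt;As a rough illustration of the role split, here is a minimal sketch rather than a production configuration. It assumes a Gateway API v1 implementation is installed in the cluster; every name in it (&lt;code&gt;example-class&lt;/code&gt;, &lt;code&gt;shared-gateway&lt;/code&gt;, the &lt;code&gt;infra&lt;/code&gt; and &lt;code&gt;team-a&lt;/code&gt; namespaces, &lt;code&gt;app-svc&lt;/code&gt;, &lt;code&gt;app.example.com&lt;/code&gt;, and the &lt;code&gt;shared-gateway-access&lt;/code&gt; label) is a hypothetical placeholder. The platform team owns the &lt;code&gt;Gateway&lt;/code&gt;, while an application team attaches its own &lt;code&gt;HTTPRoute&lt;/code&gt; from a separate namespace:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-yaml"&gt;# Owned by the platform/infra team (all names are hypothetical).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: infra
spec:
  gatewayClassName: example-class   # provided by whichever implementation is installed
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Selector            # only labeled namespaces may attach routes
          selector:
            matchLabels:
              shared-gateway-access: "true"
---
# Owned by an application team; assumes namespace team-a carries the label above.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route
  namespace: team-a
spec:
  parentRefs:
    - name: shared-gateway
      namespace: infra
  hostnames:
    - "app.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: app-svc             # a Service in team-a
          port: 8080
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;allowedRoutes&lt;/code&gt; selector is what lets the infra team decide which namespaces may bind routes to the shared listener, so application teams can manage their own routing without touching the Gateway itself.&lt;/p&gt;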
&lt;p&gt;This means: &lt;strong&gt;The entire ecosystem is moving from &amp;ldquo;controller differentiation&amp;rdquo; to &amp;ldquo;API standardization&amp;rdquo; at the traffic layer.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="most-users-are-unprepared-for-the-complexity-of-the-underlying-network-stack"&gt;Most Users Are Unprepared for the Complexity of the Underlying Network Stack&lt;/h2&gt;
&lt;p&gt;Long-term community observation shows that most users treat Ingress NGINX as a black box. Now, migrating from Ingress to Gateway API or other Ingress controllers represents a &amp;ldquo;hidden migration wave&amp;rdquo; for many clusters.&lt;/p&gt;
&lt;p&gt;This announcement highlights two points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When a &amp;ldquo;default component&amp;rdquo; in a complex system stops being updated, it brings widespread invisible risks&lt;/li&gt;
&lt;li&gt;The cloud native ecosystem needs long-term, sustainable supply chain governance&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="security-is-the-final-straw"&gt;Security Is the Final Straw&lt;/h2&gt;
&lt;p&gt;The official announcement repeatedly emphasizes that security risks and vulnerabilities can no longer be continuously fixed. This once again proves: &lt;strong&gt;Flexibility and security are always a tradeoff, and the closer a component is to the data plane, the less compromise is acceptable.&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="the-individual-maintainer-bottleneck-in-cloud-native-will-become-more-pronounced"&gt;The &amp;ldquo;Individual Maintainer Bottleneck&amp;rdquo; in Cloud Native Will Become More Pronounced&lt;/h2&gt;
&lt;p&gt;Ingress NGINX has long relied on just one or two maintainers, and ultimately had to be retired. This exposes a long-standing issue in the open source world: &lt;strong&gt;Critical projects are heavily relied upon, but contributions are insufficient.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The future of infrastructure is clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Large companies will be more willing to invest in core open source infrastructure&lt;/li&gt;
&lt;li&gt;Individual maintainers cannot support critical foundational components&lt;/li&gt;
&lt;li&gt;The boundary between commercialization and open source will continue to tighten&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="my-personal-takeaway-gateway-api-l7-traffic-management-and-the-integration-with-ai-native-infra"&gt;My Personal Takeaway: Gateway API, L7 Traffic Management, and the Integration with AI Native Infra&lt;/h2&gt;
&lt;p&gt;The retirement of Ingress NGINX points to an underlying trend: &lt;strong&gt;Unified and extensible APIs will become the dominant paradigm for cloud native infrastructure.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The AI-Native infrastructure I&amp;rsquo;m researching—such as inference routing, model gateways, AI Gateway, and Agent Orchestrator—will follow a similar path: from early flexible hacks to mature, standardized, and governed APIs.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Ingress NGINX is arguably one of the most important traffic-layer components in Kubernetes history. Its retirement is not a failure, but an inevitable result of the system advancing to the next stage.&lt;/p&gt;
&lt;p&gt;For me, this is a strong reminder:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Technical debt cannot be avoided&lt;/li&gt;
&lt;li&gt;Infrastructure must be built for the long term&lt;/li&gt;
&lt;li&gt;Standardized APIs are the future&lt;/li&gt;
&lt;li&gt;Sustainable open source requires collective investment&lt;/li&gt;
&lt;li&gt;The convergence of AI and cloud native will follow the same evolutionary trajectory&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/" target="_blank" rel="noopener"&gt;Ingress NGINX Retirement - kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gateway-api.sigs.k8s.io/guides/" target="_blank" rel="noopener"&gt;Gateway API Official Documentation - gateway-api.sigs.k8s.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>What Makes an AI Platform Truly Kubernetes-Native?</title><link>https://jimmysong.io/blog/k8s-ai-conformance/</link><pubDate>Wed, 12 Nov 2025 12:39:18 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/k8s-ai-conformance/</guid><description>Discover what defines a truly Kubernetes-native AI platform, key criteria for conformance, and how standardization drives interoperability and growth in cloud-native AI infrastructure.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Standardizing cloud-native AI platforms is a key step in advancing the AI infrastructure ecosystem.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In recent years, the cloud-native ecosystem has gradually expanded from general-purpose computing to AI workloads. The CNCF (Cloud Native Computing Foundation) is driving a new certification initiative—&lt;strong&gt;&lt;a href="https://github.com/cncf/k8s-ai-conformance" target="_blank" rel="noopener"&gt;Kubernetes AI Conformance&lt;/a&gt;&lt;/strong&gt;—aimed at establishing a set of technical standards for AI platforms to be compatible and interoperable with Kubernetes.&lt;/p&gt;
&lt;p&gt;This certification seeks to answer a core question:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;What does it take for an AI platform to be truly Kubernetes-native?&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-ai-conformance-is-needed"&gt;Why AI Conformance Is Needed&lt;/h2&gt;
&lt;p&gt;Currently, many AI platforms claim to &amp;ldquo;run on Kubernetes,&amp;rdquo; but their actual integration varies greatly. Here are some common scenarios:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Some platforms merely run containers on Kubernetes without deep integration with the control plane.&lt;/li&gt;
&lt;li&gt;Others fully integrate with the Kubernetes control plane, scheduling, and observability systems.&lt;/li&gt;
&lt;li&gt;Many vendors build their own controllers, schedulers, and storage interfaces, resulting in migration and interoperability challenges across environments.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The core purpose of CNCF&amp;rsquo;s AI Conformance is to unify standards so that AI platforms behave consistently across clouds and clusters, becoming a common language for the ecosystem—much like &amp;ldquo;Certified Kubernetes.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="key-criteria-for-kubernetes-native-ai-platforms"&gt;Key Criteria for Kubernetes-Native AI Platforms&lt;/h2&gt;
&lt;p&gt;A Kubernetes-native AI platform must meet several key criteria:&lt;/p&gt;
&lt;h3 id="architecture-native-everything-as-kubernetes-objects"&gt;Architecture-Native: Everything as Kubernetes Objects&lt;/h3&gt;
&lt;p&gt;For AI training, inference, and batch processing scenarios, all tasks should be declared as &lt;code&gt;Pod&lt;/code&gt; or &lt;code&gt;Job&lt;/code&gt; objects, or as custom resources defined through a &lt;code&gt;CRD&lt;/code&gt; (Custom Resource Definition). Scheduling, scaling, and lifecycle management should be handled by the Kubernetes control plane, not by custom platform logic.&lt;/p&gt;
&lt;p&gt;For example, Kubeflow Training Operator, RayCluster CRD, and vLLM Operator all use this native object declaration approach.&lt;/p&gt;
&lt;h3 id="scheduling-native-unified-compute-resource-scheduling"&gt;Scheduling-Native: Unified Compute Resource Scheduling&lt;/h3&gt;
&lt;p&gt;AI platforms need to collaborate with Kubernetes &lt;code&gt;Device Plugin&lt;/code&gt; and &lt;code&gt;Scheduler&lt;/code&gt; to detect GPU, NPU, and other heterogeneous compute resources, supporting &lt;code&gt;resources.requests/limits&lt;/code&gt; for resource management. Task scheduling should be observable and traceable, avoiding black-box operations.&lt;/p&gt;
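&lt;p&gt;As a minimal sketch of what this looks like in practice, the snippet below builds a &lt;code&gt;batch/v1&lt;/code&gt; &lt;code&gt;Job&lt;/code&gt; manifest that asks the scheduler for GPUs through &lt;code&gt;resources.limits&lt;/code&gt;. The resource name &lt;code&gt;nvidia.com/gpu&lt;/code&gt; assumes the NVIDIA device plugin is installed; the function name, job name, and image are hypothetical placeholders, not part of any conformance requirement.&lt;/p&gt;

```python
import json


def build_training_job(name, image, gpus):
    """Return a batch/v1 Job manifest (as a dict) that requests GPUs."""
    # Extended resources such as GPUs are declared under limits;
    # Kubernetes applies the same value as the request.
    # "nvidia.com/gpu" is an assumption: it is the name exposed by the
    # NVIDIA device plugin, other vendors expose different names.
    resources = {"limits": {"nvidia.com/gpu": gpus}}
    container = {
        "name": "trainer",
        "image": image,
        "resources": resources,
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [container],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, used only for illustration.
    job = build_training_job("demo-train", "registry.example.com/trainer:latest", 1)
    print(json.dumps(job, indent=2))
```

&lt;p&gt;Because the workload is an ordinary &lt;code&gt;Job&lt;/code&gt;, the printed JSON can be applied with standard tooling, and placement decisions stay visible through regular Kubernetes events rather than platform-specific logs.&lt;/p&gt;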
&lt;h3 id="storage-native-declarative-data-and-model-access"&gt;Storage-Native: Declarative Data and Model Access&lt;/h3&gt;
&lt;p&gt;Data and model access should not rely on host paths but use PVC (PersistentVolumeClaim), CSI (Container Storage Interface), S3/NAS, and other standard interfaces for mounting. Credentials and other sensitive parameters should be injected via &lt;code&gt;Secrets&lt;/code&gt;, while non-sensitive configuration belongs in a &lt;code&gt;ConfigMap&lt;/code&gt;. The entire pipeline should be reproducible via GitOps/CI/CD workflows, ensuring traceability and automation.&lt;/p&gt;
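&lt;p&gt;To make the declarative approach concrete, here is a small sketch of a &lt;code&gt;Pod&lt;/code&gt; manifest that mounts model weights from a PVC and reads a credential from a Secret. The field names are standard Pod spec fields; the helper name, claim name, secret name, key, and mount path are assumptions chosen for illustration.&lt;/p&gt;

```python
import json


def build_inference_pod(name, image, pvc_name, secret_name):
    """Return a v1 Pod manifest that uses a PVC and a Secret, not host paths."""
    container = {
        "name": "server",
        "image": image,
        # Model files come from a PersistentVolumeClaim, mounted read-only.
        "volumeMounts": [
            {"name": "models", "mountPath": "/models", "readOnly": True}
        ],
        # The credential is referenced from a Secret instead of being
        # baked into the image or the manifest. The key "access-key"
        # is a hypothetical name.
        "env": [
            {
                "name": "S3_ACCESS_KEY",
                "valueFrom": {
                    "secretKeyRef": {"name": secret_name, "key": "access-key"}
                },
            }
        ],
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [container],
            "volumes": [
                {
                    "name": "models",
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = build_inference_pod(
        "demo-infer", "registry.example.com/server:latest",
        "model-store", "s3-credentials",
    )
    print(json.dumps(pod, indent=2))
```

&lt;p&gt;Nothing in the manifest depends on a particular node, so the same file can live in Git and be applied unchanged to any cluster that provides the claim and the secret.&lt;/p&gt;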
&lt;h3 id="network-and-service-native-compatible-with-mesh-and-gateway"&gt;Network and Service-Native: Compatible with Mesh and Gateway&lt;/h3&gt;
&lt;p&gt;AI inference services should be exposed as standard Service, Ingress, or Gateway API resources, supporting multi-cluster service discovery and routing policies, and integrating seamlessly with service meshes like Istio and Linkerd and with proxies such as Envoy.&lt;/p&gt;
&lt;p&gt;Additionally, platforms should output standardized monitoring metrics (e.g., Prometheus), logs (e.g., Fluent Bit), and tracing data (e.g., OpenTelemetry) for unified observability and operations.&lt;/p&gt;
&lt;h3 id="portability-and-interoperability"&gt;Portability and Interoperability&lt;/h3&gt;
&lt;p&gt;A truly Kubernetes-native AI platform should behave consistently across environments, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Public clouds (EKS, GKE, ACK)&lt;/li&gt;
&lt;li&gt;Private clouds (OpenShift, KubeSphere)&lt;/li&gt;
&lt;li&gt;Bare-metal clusters&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The platform should also integrate directly with mainstream ecosystem components such as Kubeflow, Ray, KServe, and Triton, achieving high interoperability.&lt;/p&gt;
&lt;h2 id="cncfs-goal-from-running-on-kubernetes-to-growing-within-kubernetes"&gt;CNCF&amp;rsquo;s Goal: From &amp;ldquo;Running on Kubernetes&amp;rdquo; to &amp;ldquo;Growing within Kubernetes&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;CNCF aims to use the AI Conformance certification mechanism, much like &lt;em&gt;Certified Kubernetes&lt;/em&gt;, to drive the AI infrastructure ecosystem toward standardization.&lt;/p&gt;
&lt;p&gt;In the future, the industry may see:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Certified AI Platform&lt;/strong&gt; badge as a trust mark for platforms.&lt;/li&gt;
&lt;li&gt;Automated verification bots (Verify Conformance Bot) to improve testing efficiency.&lt;/li&gt;
&lt;li&gt;Multi-version test suites (e.g., v1.33, v1.34) to ensure compatibility.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These measures will become important technical thresholds and trust foundations for cloud vendors, AI platforms, and open-source AI infrastructure projects.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In the AI era, standardization is the foundation for ecosystem evolution. For AI platforms to thrive in the cloud-native world, they must not only &amp;ldquo;run on Kubernetes&amp;rdquo; but also &amp;ldquo;grow within Kubernetes.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;A truly &lt;strong&gt;Kubernetes-native AI platform&lt;/strong&gt; should feature:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Control plane compatibility, transparent data plane, declarative extensibility, portability, observability, and reproducibility.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is the key intersection of AI and cloud-native—and the foundation for the next generation of AI infrastructure.&lt;/p&gt;</content:encoded></item><item><title>ChatGPT Atlas Architecture Explained: Size, Unique Design, and Agent Performance Issues</title><link>https://jimmysong.io/blog/chatgpt-atlas-architecture-analysis/</link><pubDate>Tue, 11 Nov 2025 01:46:24 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/chatgpt-atlas-architecture-analysis/</guid><description>A developer&amp;#39;s perspective on why ChatGPT Atlas is massive, architecturally complex, fundamentally different from Chrome, and a deep dive into its Agent runtime mechanism and limitations.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The size and complexity of AI Native browsers are not drawbacks, but rather the inevitable result of a new operating system paradigm.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-is-atlas-so-much-larger-than-chrome-whats-actually-added"&gt;Why Is Atlas So Much Larger Than Chrome? What&amp;rsquo;s Actually Added?&lt;/h2&gt;
&lt;p&gt;Atlas&amp;rsquo;s installation size reaches several gigabytes, which is no accident. It is not just a traditional &amp;ldquo;browser shell&amp;rdquo; but integrates multiple core components:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Chromium Rendering Engine + Independent AI Runtime + Agent Sandbox + Global Context Pipeline&lt;/strong&gt;—all in one.&lt;/p&gt;
&lt;p&gt;Chrome is merely a web renderer, while Atlas has become an AI workflow execution environment.&lt;/p&gt;
&lt;p&gt;Atlas&amp;rsquo;s architectural design is the reason its size far exceeds that of traditional browsers.&lt;/p&gt;
&lt;h2 id="the-core-difference-between-atlas-and-chrome-a-complete-ai-runtime-added"&gt;The Core Difference Between Atlas and Chrome: A Complete AI Runtime Added&lt;/h2&gt;
&lt;p&gt;Many assume Atlas is simply &amp;ldquo;Chromium + AI features,&amp;rdquo; but the reality is much more complex. On top of Chromium, Atlas adds a full AI sub-runtime, forming an independent system-level sub-operating environment.&lt;/p&gt;
&lt;p&gt;The following flowchart illustrates the AI subsystem Atlas adds on top of Chrome:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/chatgpt-atlas-architecture-analysis/2378094c206d84100f8b1dceb6c7bab0.svg" data-img="https://assets.jimmysong.io/images/blog/chatgpt-atlas-architecture-analysis/2378094c206d84100f8b1dceb6c7bab0.svg" alt="Figure 1: Atlas Adds AI Sub-runtime on Top of Chrome" data-caption="Figure 1: Atlas Adds AI Sub-runtime on Top of Chrome"
width="2400"
height="3953"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Atlas Adds AI Sub-runtime on Top of Chrome&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This is not a simple feature extension, but a system-level architectural upgrade.&lt;/p&gt;
&lt;h2 id="atlass-local-data-structure-not-just-cache-but-a-complete-browser--ai-subsystem"&gt;Atlas&amp;rsquo;s Local Data Structure: Not Just Cache, But a Complete Browser + AI Subsystem&lt;/h2&gt;
&lt;p&gt;When Atlas starts for the first time, it generates an independent host profile locally at:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;/Users/&amp;lt;user&amp;gt;/Library/Application Support/com.openai.atlas/browser-data/host/
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Compared to Chrome, Atlas&amp;rsquo;s data structure is much larger and includes various core data:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;12K AmountExtractionHeuristicRegexes
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;960K AutofillStates
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;4.0M BrowserMetrics-spare.pma
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;169M component_crx_cache
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;93M extensions_crx_cache
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1.1M OpenCookieDatabase
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;2.7G OptGuideOnDeviceModel
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;19M Safe Browsing
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;123M screen_ai
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;1.7G user-&amp;lt;uuid&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This data is not just browser cache, but also includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Complete Chromium Profile&lt;/strong&gt; (Cookies, Local State, Shader, Safe Browsing, etc.)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Atlas-specific AI models and feature data&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OptGuideOnDeviceModel&lt;/code&gt; (Inference guide model)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;screen_ai&lt;/code&gt; (Page structure understanding model)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WasmTtsEngine&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent execution traces and context persistence&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;All stored in &lt;code&gt;user-&amp;lt;uuid&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Chrome only needs to store rendering cache, while Atlas must also save AI runtime state, DOM semantic summaries, Agent execution traces, and large language model (LLM) context fragments. Naturally, the size is measured in gigabytes.&lt;/p&gt;
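&lt;p&gt;If you want to check these numbers on your own machine, a short script can rank the entries in the profile directory by size. This is a rough sketch: the path is the one quoted above, the helper names are made up for this example, and the script reports apparent file sizes, so the totals may differ slightly from what &lt;code&gt;du&lt;/code&gt; shows.&lt;/p&gt;

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(profile_dir, top=10):
    """Return (size_in_bytes, name) pairs for the largest direct children."""
    sizes = []
    for entry in os.scandir(profile_dir):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size_bytes(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((size, entry.name))
    # Largest first; ties are broken by name.
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    # Path taken from the article; adjust it for your own setup.
    profile = os.path.expanduser(
        "~/Library/Application Support/com.openai.atlas/browser-data/host"
    )
    if os.path.isdir(profile):
        for size, name in largest_entries(profile):
            print(f"{size / 1e6:10.1f} MB  {name}")
```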
&lt;h3 id="why-atlas-has-only-one-host-profile"&gt;Why Atlas Has Only One Host Profile&lt;/h3&gt;
&lt;p&gt;Chrome supports multiple profiles, but Atlas only has &lt;code&gt;host/&lt;/code&gt; and &lt;code&gt;user-&amp;lt;uuid&amp;gt;&lt;/code&gt;. The reasons are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agent workflows depend on global context and cannot be fragmented&lt;/li&gt;
&lt;li&gt;AI Memory needs to be shared across pages&lt;/li&gt;
&lt;li&gt;OpenAI&amp;rsquo;s IPC and sandbox design currently only support a single principal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is an architectural limitation, which may change in the future.&lt;/p&gt;
&lt;h2 id="atlass-process-architecture-more-complex-than-chrome"&gt;Atlas&amp;rsquo;s Process Architecture: More Complex Than Chrome&lt;/h2&gt;
&lt;p&gt;Chrome&amp;rsquo;s multi-process architecture mainly includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Main process&lt;/li&gt;
&lt;li&gt;Extension process&lt;/li&gt;
&lt;li&gt;Rendering/network process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Atlas&amp;rsquo;s process architecture is even more complex, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Main process (Chromium Shell)&lt;/li&gt;
&lt;li&gt;Rendering process (DOM/JS)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Runtime process&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent Sandbox isolation environment&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Page extraction and semantic parsing process&lt;/li&gt;
&lt;li&gt;Security policy and permission arbitration process&lt;/li&gt;
&lt;li&gt;Model inference coordination layer (LLM Orchestrator)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each layer communicates strictly via inter-process communication (IPC), with no &amp;ldquo;script-level execution,&amp;rdquo; ensuring security and reliability.&lt;/p&gt;
&lt;h2 id="how-atlas-agents-run-why-are-they-slow-why-not-like-scripts"&gt;How Atlas Agents Run: Why Are They Slow? Why Not Like Scripts?&lt;/h2&gt;
&lt;p&gt;Atlas Agents do not execute scripts directly, but complete tasks through multi-round inference and sandbox mechanisms. Every action involves a full round of large language model (LLM) inference, DOM observation, sandbox execution, and re-inference.&lt;/p&gt;
&lt;p&gt;The Agent execution flow is as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;LLM decides the next step&lt;/li&gt;
&lt;li&gt;Sandbox executes the action&lt;/li&gt;
&lt;li&gt;Generates new page observation (structured DOM)&lt;/li&gt;
&lt;li&gt;LLM infers the next step again&lt;/li&gt;
&lt;li&gt;Loop until the task is complete&lt;/li&gt;
&lt;/ol&gt;
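&lt;p&gt;The loop above can be sketched in a few lines. This is a toy model, not Atlas&amp;rsquo;s actual implementation: &lt;code&gt;FakePage&lt;/code&gt;, &lt;code&gt;decide&lt;/code&gt;, and &lt;code&gt;run_agent&lt;/code&gt; are hypothetical names, and the &amp;ldquo;LLM&amp;rdquo; is replaced by a trivial rule so the example runs on its own.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Toy stand-in for a browser page: a counter the agent must raise."""
    clicks: int = 0
    log: list = field(default_factory=list)

    def observe(self):
        # Step 3: return a structured observation instead of raw HTML.
        return {"clicks": self.clicks}

    def execute(self, action):
        # Step 2: the sandbox applies exactly one action.
        self.log.append(action)
        if action == "click":
            self.clicks += 1


def decide(observation, target):
    """Steps 1 and 4: placeholder for one LLM inference call."""
    return "done" if observation["clicks"] == target else "click"


def run_agent(page, target, max_steps=20):
    """Repeat decide, execute, observe until done or the budget runs out.

    Returns the number of actions that were executed.
    """
    observation = page.observe()
    for step in range(max_steps):
        action = decide(observation, target)  # one inference round per step
        if action == "done":
            return step
        page.execute(action)
        observation = page.observe()
    raise RuntimeError("step budget exhausted")


if __name__ == "__main__":
    page = FakePage()
    actions = run_agent(page, target=3)
    print(actions, page.log)
```

&lt;p&gt;In this sketch, finishing a task that needs N actions takes N + 1 calls to &lt;code&gt;decide&lt;/code&gt;. With a real model behind that call, each round adds inference latency, whereas a script would simply issue the N actions back to back.&lt;/p&gt;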
&lt;p&gt;As a result, Atlas Agent execution has the following characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Always slower than scripts (e.g., Playwright/Selenium direct execution)&lt;/li&gt;
&lt;li&gt;Must be isolated for security&lt;/li&gt;
&lt;li&gt;Multi-round inference ensures reliability&lt;/li&gt;
&lt;li&gt;Structured DOM helps models understand page content&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the fundamental reason why Agent execution is slow.&lt;/p&gt;
&lt;h2 id="atlas-is-evolving-toward-an-ai-native-os"&gt;Atlas Is Evolving Toward an &amp;ldquo;AI Native OS&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;Atlas&amp;rsquo;s capabilities have far surpassed traditional browsers and are gradually evolving into an AI Native Operating System (AI Native OS). The table below compares the core capabilities of Atlas and traditional browsers:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Browser&lt;/th&gt;
&lt;th&gt;AI OS (Atlas)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Render Web Pages&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execute JS&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Page Understanding&lt;/td&gt;
&lt;td&gt;✘ (Only Parsing)&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automated Task Execution&lt;/td&gt;
&lt;td&gt;Plugin&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Task-level Reasoning&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-page Workflows&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent Security Sandbox&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Global AI Context&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Comparison of Traditional Browser and AI OS (Atlas) Capabilities
&lt;/figcaption&gt;
&lt;p&gt;By my rough estimate, Atlas already possesses 30%–40% of the core capabilities of an AI OS. It is not just a browser, but:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AI Runtime + Rendering Engine + Agent Workflow Executor&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;On the surface, Atlas looks like a browser, but in reality, it is a heavyweight AI Runtime:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Massive size: Stores AI models, feature data, and Agent context locally&lt;/li&gt;
&lt;li&gt;Complex processes: Security, inference, and DOM extraction all require independent pipelines&lt;/li&gt;
&lt;li&gt;Slow Agent execution: Every step requires inference, not direct script execution&lt;/li&gt;
&lt;li&gt;Huge profile: Saves the state of the AI system, not just simple cache&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Chrome is a web processor, while Atlas has become an AI workflow engine—they are fundamentally different technological species.&lt;/p&gt;</content:encoded></item><item><title>Two Weeks Deep with ChatGPT Atlas: A Developer's Perspective on Real Potential and Structural Gaps</title><link>https://jimmysong.io/blog/chatgpt-atlas-two-weeks-dev-perspective/</link><pubDate>Tue, 11 Nov 2025 01:26:05 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/chatgpt-atlas-two-weeks-dev-perspective/</guid><description>After two weeks using Atlas as my main browser, I break down its architecture, workflow boosts, pain points, and future directions from a developer&amp;#39;s view.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The future of AI browsers isn&amp;rsquo;t about stacking features, but about unifying developer context and workflow. Atlas has already changed my daily routine, but there are still key gaps to fill.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="why-did-i-switch-to-atlas-as-my-main-browser-on-day-one"&gt;Why Did I Switch to Atlas as My Main Browser on Day One?&lt;/h2&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/atlas.webp" data-img="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/atlas.webp" alt="Figure 1: Using ChatGPT for 1000&amp;#43; days, Atlas for 19 days" data-caption="Figure 1: Using ChatGPT for 1000&amp;#43; days, Atlas for 19 days"
width="2400"
height="1350"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Using ChatGPT for 1000+ days, Atlas for 19 days&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;As a long-term user of multiple developer tools—ChatGPT (GPT-4/5, Codex, o1 series), Chrome (multi-tab, high-density search), VSCode (local project environment), macOS ChatGPT Desktop (local context reading), and local Hugo / Flask / FastAPI debugging—I was drawn to Atlas&amp;rsquo;s design philosophy and immediately made it my main browser. After two weeks of deep use, my core takeaway is:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Atlas transforms the browser from a &amp;ldquo;renderer&amp;rdquo; into a &amp;ldquo;host for AI Runtime&amp;rdquo;.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This article systematically reviews Atlas&amp;rsquo;s architectural strengths, workflow enhancements, structural pain points, and future directions from a developer&amp;rsquo;s perspective.&lt;/p&gt;
&lt;h2 id="real-workflow-enhancements-atlas-brings-to-developers"&gt;Real Workflow Enhancements Atlas Brings to Developers&lt;/h2&gt;
&lt;p&gt;Atlas&amp;rsquo;s core innovation is the native integration of local development environments with AI, dramatically improving developer workflow efficiency.&lt;/p&gt;
&lt;h3 id="native-fusion-of-local-development-and-ai-localhost-access"&gt;Native Fusion of Local Development and AI: &lt;code&gt;localhost&lt;/code&gt; Access&lt;/h3&gt;
&lt;p&gt;Previously, browsers and AI tools were two separate worlds: browsers could see local services, ChatGPT could only analyze uploaded cloud fragments, and AI couldn&amp;rsquo;t directly sense the real state of your project. Atlas is the first to enable AI to directly read local services.&lt;/p&gt;
&lt;p&gt;The following diagram illustrates how local services integrate with Atlas:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/3d36e7b9bc1e775f5d09b9a07284c2a3.svg" data-img="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/3d36e7b9bc1e775f5d09b9a07284c2a3.svg" alt="Figure 2: Atlas Local Service Integration Flow" data-caption="Figure 2: Atlas Local Service Integration Flow"
width="2400"
height="299"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Atlas Local Service Integration Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This capability is significant for developers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When debugging APIs, ChatGPT can directly view response content.&lt;/li&gt;
&lt;li&gt;During documentation preview, AI can compare raw files and rendered results.&lt;/li&gt;
&lt;li&gt;Hugo / SSG local preview lets AI read full HTML.&lt;/li&gt;
&lt;li&gt;Quickly review local error pages for more efficient troubleshooting.&lt;/li&gt;
&lt;li&gt;For the first time, the local project environment is &amp;ldquo;read into&amp;rdquo; the AI reasoning space.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are capabilities that Chrome + Web ChatGPT currently do not support.&lt;/p&gt;
&lt;h3 id="sidebar-chatgpt-persistent-task-threads"&gt;Sidebar ChatGPT: Persistent &amp;ldquo;Task Threads&amp;rdquo;&lt;/h3&gt;
&lt;p&gt;Atlas&amp;rsquo;s sidebar ChatGPT is no longer the traditional &amp;ldquo;one question, one answer&amp;rdquo; model, but a persistent task thread. Developers can switch between multiple tabs while keeping the same conversation history, truly realizing the &amp;ldquo;assistant layer&amp;rdquo; experience.&lt;/p&gt;
&lt;p&gt;The following sequence diagram shows a typical developer workflow:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/608f849196b315dd770b63b90c3a6e9d.svg" data-img="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/608f849196b315dd770b63b90c3a6e9d.svg" alt="Figure 3: Atlas Sidebar Task Thread" data-caption="Figure 3: Atlas Sidebar Task Thread"
width="2400"
height="1289"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Atlas Sidebar Task Thread&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Advantages include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Conversation history is fixed on the left, never covered by the page.&lt;/li&gt;
&lt;li&gt;No need to switch chat windows; you can stay focused on the task flow.&lt;/li&gt;
&lt;li&gt;Multi-tab reading is uninterrupted; AI becomes a true developer assistant.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="atlass-structural-pain-points-developer-review"&gt;Atlas&amp;rsquo;s Structural Pain Points (Developer Review)&lt;/h2&gt;
&lt;p&gt;Despite Atlas&amp;rsquo;s innovations, there are still clear structural shortcomings in real development scenarios.&lt;/p&gt;
&lt;h3 id="single-conversation-can-only-reference-one-tabcontext-is-limited"&gt;Single Conversation Can Only Reference One Tab—Context Is Limited&lt;/h3&gt;
&lt;p&gt;Developers often need to compare multiple docs, APIs, implementations, or architecture diagrams, but currently ChatGPT conversations can only bind to the visible tab&amp;rsquo;s content, making multi-page reasoning impossible.&lt;/p&gt;
&lt;p&gt;The diagram below shows the ideal multi-page binding approach:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/1b7b95fd5efc4fdda98a0f90c397acfc.svg" data-img="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/1b7b95fd5efc4fdda98a0f90c397acfc.svg" alt="Figure 4: Multi-Page Bound Conversation" data-caption="Figure 4: Multi-Page Bound Conversation"
width="2400"
height="2312"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Multi-Page Bound Conversation&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the future, AI browsers should support binding a conversation to multiple pages for true multi-context reasoning.&lt;/p&gt;
&lt;h3 id="inconsistent-conversation-ui-across-desktop-web-and-atlas"&gt;Inconsistent Conversation UI Across Desktop, Web, and Atlas&lt;/h3&gt;
&lt;p&gt;Although Atlas, Web ChatGPT, and ChatGPT Desktop use the same conversation_id and unified context, their display strategies differ, resulting in &amp;ldquo;historically consistent but not real-time consistent&amp;rdquo; experiences.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Web / Atlas (Browser)&lt;/strong&gt;
Every message is immediately written to the server, so updates are visible in real time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ChatGPT Desktop&lt;/strong&gt;
To support local context and faster rendering, it uses a local cache model; it automatically pulls updates from Web/Atlas, but its own updates are not automatically pushed back to the browser.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The result:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The context is always consistent across all three, but UI refresh rates differ:
Desktop → Web/Atlas does not auto-sync; Web must refresh to see the latest messages, and Atlas&amp;rsquo;s sidebar currently has no refresh option.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This can cause a common workflow split for developers: &amp;ldquo;It&amp;rsquo;s the same conversation, but different endpoints show different content.&amp;rdquo;&lt;/p&gt;
&lt;h3 id="missing-prompt-templates-and-quick-command-system"&gt;Missing Prompt Templates and Quick Command System&lt;/h3&gt;
&lt;p&gt;Prompt templates (for code review, doc rewriting, bug retrospectives, API summaries, architecture reviews, etc.) are crucial in daily development. In Chrome, plugins can provide this, but Atlas currently lacks such features.&lt;/p&gt;
&lt;p&gt;The table below summarizes the ideal template and command system capabilities:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Ideal Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Template Library&lt;/td&gt;
&lt;td&gt;Callable from sidebar, variable support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Commands&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/review&lt;/code&gt;, &lt;code&gt;/refactor&lt;/code&gt;, &lt;code&gt;/summarize&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-device Sync&lt;/td&gt;
&lt;td&gt;Consistent across desktop, browser, future mobile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Awareness&lt;/td&gt;
&lt;td&gt;AI auto-matches your current task&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Prompt Template and Command System Capability Comparison
&lt;/figcaption&gt;
&lt;p&gt;Atlas does not yet support prompt templates or quick command systems.&lt;/p&gt;
&lt;h3 id="atlas-hides-some-devtools-capabilities"&gt;Atlas Hides Some DevTools Capabilities&lt;/h3&gt;
&lt;p&gt;Atlas&amp;rsquo;s DevTools are a full Chromium suite: HTML inspection, network, performance, and console debugging are all available. However, the current version has removed Chrome DevTools&amp;rsquo; &amp;ldquo;Device Mode&amp;rdquo; for mobile emulation, including UA/viewport switching, touch simulation, and responsive mode, so mobile debugging is temporarily unavailable in Atlas. The underlying capability is not missing; the UI simply does not expose an entry point for it yet.&lt;/p&gt;
&lt;h3 id="no-mobile-versionworkflow-context-is-fragmented"&gt;No Mobile Version—Workflow Context Is Fragmented&lt;/h3&gt;
&lt;p&gt;Atlas currently has no mobile version, so commuting, travel, and fragmented time cannot be linked with desktop workflows. Context, history, tabs, task areas, and memory cannot sync, impacting the overall developer experience.&lt;/p&gt;
&lt;h3 id="agent-speed-and-reliability-issues"&gt;Agent Speed and Reliability Issues&lt;/h3&gt;
&lt;p&gt;Atlas&amp;rsquo;s Agent executes slowly and sometimes stalls or fails to complete tasks. Its feedback mechanisms are opaque, and it lacks visual controls for interrupting a run. These issues make it hard to adopt for production tasks.&lt;/p&gt;
&lt;h2 id="atlas-chrome-and-chatgpt-desktopcore-capability-comparison"&gt;Atlas, Chrome, and ChatGPT Desktop—Core Capability Comparison&lt;/h2&gt;
&lt;p&gt;The table below summarizes the differences in core developer capabilities, helping Hacker News readers quickly understand the technical positioning.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Chrome&lt;/th&gt;
&lt;th&gt;ChatGPT Desktop&lt;/th&gt;
&lt;th&gt;Atlas&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Browse Web Pages&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Context (VSCode/CLI)&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔✔&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Localhost Service Reading&lt;/td&gt;
&lt;td&gt;✔ (browser)&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔✔ (AI readable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deep ChatGPT Integration&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;td&gt;✔✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tab ←→ Conversation Binding&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevTools&lt;/td&gt;
&lt;td&gt;✔✔&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-page Conversation Injection&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Template System&lt;/td&gt;
&lt;td&gt;Plugin&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mobile&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;td&gt;✔ (ChatGPT App)&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent Automation&lt;/td&gt;
&lt;td&gt;✘&lt;/td&gt;
&lt;td&gt;✔&lt;/td&gt;
&lt;td&gt;✔ (but immature)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Atlas vs Chrome vs ChatGPT Desktop Core Capability Comparison
&lt;/figcaption&gt;
&lt;p&gt;Atlas&amp;rsquo;s uniqueness lies in:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Atlas is the first mainstream AI browser to natively inject &lt;code&gt;localhost&lt;/code&gt; pages directly into conversation context, but it&amp;rsquo;s not the only solution for accessing local content.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-future-of-ai-browsers-not-smarter-but-more-unified"&gt;The Future of AI Browsers: Not &amp;ldquo;Smarter&amp;rdquo;, But &amp;ldquo;More Unified&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;The future of AI browsers should focus on unified context and workflow, not just stacking features. The diagram below shows the ideal architecture:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/254cb4900533512528610a4b6bed27d0.svg" data-img="https://assets.jimmysong.io/images/blog/chatgpt-atlas-two-weeks-dev-perspective/254cb4900533512528610a4b6bed27d0.svg" alt="Figure 5: Future AI Browser Architecture" data-caption="Figure 5: Future AI Browser Architecture"
width="2400"
height="1212"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: Future AI Browser Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The ideal AI browser should have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unified context graphs across tabs, conversations, and devices.&lt;/li&gt;
&lt;li&gt;Semantic parsing of web page structure.&lt;/li&gt;
&lt;li&gt;Unified reading of local files, APIs, and runtimes.&lt;/li&gt;
&lt;li&gt;Observable, interruptible Agent task execution.&lt;/li&gt;
&lt;li&gt;Unified prompt template runtime layer.&lt;/li&gt;
&lt;li&gt;Collaborative knowledge trajectory (memory graph) across devices.&lt;/li&gt;
&lt;li&gt;Extensible plugin mechanism.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Currently, no AI browser has all these capabilities, Atlas included. But Atlas is on the right track: its native capabilities are still evolving, and by my rough estimate it is already about 20% ahead.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;After two weeks of deep use, my conclusions are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Atlas is far from mature, but already powerful enough to significantly change how I read docs, write code, and analyze pages.&lt;/li&gt;
&lt;li&gt;It lowers the barrier between AI and web content, reducing my reliance on ChatGPT Desktop.&lt;/li&gt;
&lt;li&gt;But it still lacks key capabilities developers truly need: multi-page injection, native prompt templates, full DevTools support, mobile, and a reliable Agent Runtime.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Atlas now feels like:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Chrome + ChatGPT + localhost integration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;But to become&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The next-generation operating system (AI OS) for developers&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It still needs to fill in those critical gaps.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The direction is right, and I&amp;rsquo;ll keep using it—and watching to see if it can become the default toolchain for developers in the future.&lt;/p&gt;</content:encoded></item><item><title>KAITO and KubeFleet: CNCF Is Reshaping AI Inference Infrastructure</title><link>https://jimmysong.io/blog/kaito-and-fleet-enabled-ai-inference/</link><pubDate>Sat, 08 Nov 2025 17:40:00 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/kaito-and-fleet-enabled-ai-inference/</guid><description>CNCF is standardizing AI inference infrastructure for scalable deployment in multi-cluster Kubernetes environments through KAITO and KubeFleet.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Declarative and multi-cluster capabilities of cloud native are becoming the foundation for standardized AI inference infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;AI inference is rapidly emerging as the next frontier for cloud native infrastructure. As large language models (LLMs) grow in capability and scale, traditional single-cluster inference architectures struggle to meet global, high-availability, and cost optimization requirements. In late October 2025, CNCF announced two new hosted projects, &lt;strong&gt;KAITO (Kubernetes AI Toolchain Operator)&lt;/strong&gt; and &lt;strong&gt;KubeFleet&lt;/strong&gt;, marking the cloud native community&amp;rsquo;s official entry into standardized AI inference infrastructure.&lt;/p&gt;
&lt;p&gt;This article provides a systematic analysis of both projects and explores their strategic significance for the AI Infra ecosystem.&lt;/p&gt;
&lt;h2 id="the-complexity-of-ai-inference-from-single-cluster-to-multi-cluster"&gt;The Complexity of AI Inference: From Single Cluster to Multi-Cluster&lt;/h2&gt;
&lt;p&gt;As inference workloads for large models evolve, enterprises are adopting multi-cluster architectures. Below are three major challenges introduced by multi-cluster setups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deployment consistency: Managing model versions, dependencies, and configuration drift across clusters is difficult.&lt;/li&gt;
&lt;li&gt;Scarce compute resources: Intelligent scheduling of available GPUs is required to avoid resource waste or hotspots.&lt;/li&gt;
&lt;li&gt;Service reliability: Inference endpoints must deliver low latency, high availability, and cross-region SLAs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;KAITO and KubeFleet are designed to address these challenges.&lt;/p&gt;
&lt;p&gt;The following diagram illustrates the architecture of KAITO and KubeFleet.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/kaito-and-fleet-enabled-ai-inference/24012c6d0579467aebee1facf7cd4f90.svg" data-img="https://assets.jimmysong.io/images/blog/kaito-and-fleet-enabled-ai-inference/24012c6d0579467aebee1facf7cd4f90.svg" alt="Figure 1: KAITO and KubeFleet Architecture" data-caption="Figure 1: KAITO and KubeFleet Architecture"
width="2400"
height="2580"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: KAITO and KubeFleet Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The architecture can be summarized as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The top layer is the KubeFleet Hub Cluster, which controls multi-cluster placement logic.&lt;/li&gt;
&lt;li&gt;The lower layer consists of three regional clusters (US / EU / APAC), each with Active Nodes and Spare GPUs.&lt;/li&gt;
&lt;li&gt;The Inference Gateway provides a unified global inference entry point.&lt;/li&gt;
&lt;li&gt;Arrow directions represent the control flow of placement and aggregation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="kaito-declarative-orchestration-for-ai-inference"&gt;KAITO: Declarative Orchestration for AI Inference&lt;/h2&gt;
&lt;p&gt;KAITO (Kubernetes AI Toolchain Operator), initiated by the Microsoft team, is a declarative AI workload management framework. It abstracts model lifecycle management via CRDs (Custom Resource Definitions), making LLM inference as configurable and reusable as microservice deployment.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a href="https://github.com/kaito-project/kaito" target="_blank" rel="noopener"&gt;github.com/kaito-project/kaito&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The table below summarizes KAITO&amp;rsquo;s core features and design principles:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature/Principle&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Workspace Model Mgmt&lt;/td&gt;
&lt;td&gt;Supports both pre-trained and BYO (Bring Your Own) models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automatic Resource Allocation&lt;/td&gt;
&lt;td&gt;Dynamically requests nodes and volumes based on model size and GPU availability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-node Optimization&lt;/td&gt;
&lt;td&gt;Supports distributed storage and compute scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Built-in Observability&lt;/td&gt;
&lt;td&gt;Directly outputs inference latency, throughput, and error metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Declarative Deployment&lt;/td&gt;
&lt;td&gt;Models are treated as native Kubernetes resources, supporting YAML config and GitOps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: KAITO Core Features and Design Principles
&lt;/figcaption&gt;
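&lt;p&gt;For example, an inference deployment can be declared in YAML. The manifest below is a simplified illustration of the declarative style rather than KAITO&amp;rsquo;s exact schema; the project&amp;rsquo;s actual custom resource is the &lt;code&gt;Workspace&lt;/code&gt; kind, so consult the KAITO documentation for the real field names:&lt;/p&gt;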
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;apiVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;aitoolchain.io/v1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;ModelDeployment&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;qwen2-7b&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;qwen2-7b&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;vllm&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;replicas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;This enables AI platforms to achieve the same deployment consistency and GitOps capabilities as application services.&lt;/p&gt;
&lt;h2 id="kubefleet-intelligent-multi-cluster-scheduling-and-placement"&gt;KubeFleet: Intelligent Multi-Cluster Scheduling and Placement&lt;/h2&gt;
&lt;p&gt;KubeFleet, led by the Azure Kubernetes Service (AKS) team, is a multi-cluster orchestrator focused on intelligent placement of inference workloads.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a href="https://github.com/kubefleet-dev/kubefleet" target="_blank" rel="noopener"&gt;github.com/kubefleet-dev/kubefleet&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The table below highlights KubeFleet&amp;rsquo;s key features and use cases:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature/Use Case&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cluster Capability Discovery&lt;/td&gt;
&lt;td&gt;Evaluates each cluster&amp;rsquo;s GPU type, quantity, cost, and location&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intelligent Placement&lt;/td&gt;
&lt;td&gt;Deploys inference tasks to the most suitable cluster based on policies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Staged Updates&lt;/td&gt;
&lt;td&gt;Supports canary releases across test, staging, and production clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consistency Control&lt;/td&gt;
&lt;td&gt;Ensures unified deployment templates across clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Global Inference Service&lt;/td&gt;
&lt;td&gt;Supports geo-distributed inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heterogeneous GPU Pool Scheduling&lt;/td&gt;
&lt;td&gt;Enables enterprise-grade unified deployment across environments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: KubeFleet Key Features and Use Cases
&lt;/figcaption&gt;
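&lt;p&gt;To make the idea of policy-based placement concrete, here is a minimal Python sketch. It is not KubeFleet&amp;rsquo;s algorithm or API: the &lt;code&gt;Cluster&lt;/code&gt; fields, the &lt;code&gt;pick_cluster&lt;/code&gt; helper, and the sample numbers are all hypothetical, and the policy is simply &amp;ldquo;filter by GPU type and free capacity, prefer the requested region, then take the cheapest&amp;rdquo;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    # Hypothetical inventory record; real systems discover these properties.
    name: str
    region: str
    gpu_type: str
    free_gpus: int
    cost_per_gpu_hour: float


def pick_cluster(clusters, gpu_type, gpus_needed, preferred_region):
    """Return the best eligible cluster, or None if nothing fits."""
    eligible = [
        c for c in clusters
        if c.gpu_type == gpu_type and c.free_gpus >= gpus_needed
    ]
    if not eligible:
        return None
    # Clusters in the preferred region sort first (False before True),
    # then the cheaper hourly GPU price wins.
    return min(
        eligible,
        key=lambda c: (c.region != preferred_region, c.cost_per_gpu_hour),
    )


clusters = [
    Cluster("us-east", "US", "A100", 4, 3.2),
    Cluster("eu-west", "EU", "A100", 8, 2.9),
    Cluster("apac-1", "APAC", "L40", 16, 1.1),
]
choice = pick_cluster(clusters, "A100", 2, "US")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A real placement engine would weigh many more signals, but the filter-then-rank shape is the part worth noticing: hard constraints remove clusters, and soft preferences order the rest.&lt;/p&gt;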
&lt;h2 id="kaito--kubefleet-layered-design-of-ai-inference-infrastructure"&gt;KAITO × KubeFleet: Layered Design of AI Inference Infrastructure&lt;/h2&gt;
&lt;p&gt;The following table summarizes the layered positioning of KAITO and KubeFleet in AI inference infrastructure:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;th&gt;Representative Project&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Global Placement Layer&lt;/td&gt;
&lt;td&gt;Decides which cluster&lt;/td&gt;
&lt;td&gt;KubeFleet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cluster Orchestration Layer&lt;/td&gt;
&lt;td&gt;Defines model deployment&lt;/td&gt;
&lt;td&gt;KAITO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime Layer&lt;/td&gt;
&lt;td&gt;Executes inference engine&lt;/td&gt;
&lt;td&gt;vLLM / TGI / SGLang / Triton&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infra Layer&lt;/td&gt;
&lt;td&gt;Provides compute and scheduling&lt;/td&gt;
&lt;td&gt;Kubernetes / GPU / CNI / Storage&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Layered Design of AI Inference Infrastructure
&lt;/figcaption&gt;
&lt;p&gt;This layered approach reflects CNCF&amp;rsquo;s consistent philosophy: abstracting complex infrastructure through declarative and pluggable methods to lower the entry barrier for AI inference platforms.&lt;/p&gt;
&lt;h2 id="ecosystem-significance-and-trend-analysis"&gt;Ecosystem Significance and Trend Analysis&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;AI Infra is undergoing cloud native transformation, with CNCF integrating AI workloads into its governance system. This will drive AI platforms to gradually adopt a standardized stack aligned with cloud native principles.&lt;/li&gt;
&lt;li&gt;Multi-cluster scheduling is becoming the new battleground, and GPU heterogeneity and cross-region compliance are pushing enterprises toward multi-cluster inference architectures. KubeFleet may become the &amp;ldquo;AI Federation&amp;rdquo; successor to Karmada / Clusternet.&lt;/li&gt;
&lt;li&gt;Declarative AI operations will replace manual script-based deployments, and KAITO&amp;rsquo;s CRD model could become the standard semantic layer for future ML serving.&lt;/li&gt;
&lt;li&gt;The strategic collaboration between Microsoft and CNCF is strengthening, as both projects originate from the Azure team, signaling that cloud vendors are participating in the AI ecosystem through open infrastructure standards.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="comparison-with-existing-projects"&gt;Comparison with Existing Projects&lt;/h2&gt;
&lt;p&gt;The table below compares KAITO, KubeFleet, and mainstream AI inference infrastructure projects:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;KAITO&lt;/th&gt;
&lt;th&gt;KubeFleet&lt;/th&gt;
&lt;th&gt;Kubeflow&lt;/th&gt;
&lt;th&gt;KServe&lt;/th&gt;
&lt;th&gt;HAMi&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Declarative Model Deployment&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-cluster Scheduling&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;–&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU Heterogeneity Awareness&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telemetry / Metrics&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Vendor Support&lt;/td&gt;
&lt;td&gt;Microsoft / CNCF&lt;/td&gt;
&lt;td&gt;Microsoft / CNCF&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;IBM / Red Hat&lt;/td&gt;
&lt;td&gt;– (CNCF Sandbox)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 4: Feature Comparison of AI Inference Infrastructure Projects
&lt;/figcaption&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The emergence of KAITO and KubeFleet marks a pivotal moment in the evolution of AI Infra. They represent the cloud native community&amp;rsquo;s formal engagement with AI inference and reveal future trends:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The complexity of AI inference will be absorbed by Kubernetes&amp;rsquo; declarative and multi-cluster systems.&lt;/li&gt;
&lt;li&gt;Both projects should be considered essential references for anyone researching AI-native infrastructure.&lt;/li&gt;
&lt;li&gt;For developers and platform teams, they are not just new tools but signals of AI infrastructure standardization.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://kaito-project.netlify.app/" target="_blank" rel="noopener"&gt;KAITO Official Website - kaito-project.netlify.app&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubefleet.dev/" target="_blank" rel="noopener"&gt;KubeFleet Official Website - kubefleet.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cncf.io/sandbox-projects/" target="_blank" rel="noopener"&gt;CNCF Sandbox Projects - cncf.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://thenewstack.io/kaito-and-kubefleet-projects-solving-ai-inference-at-scale/" target="_blank" rel="noopener"&gt;KAITO and KubeFleet: Projects Solving AI Inference at Scale - thenewstack.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Building Efficient LLM Inference with the Cloud Native Quartet: KServe, vLLM, llm-d, and WG Serving</title><link>https://jimmysong.io/blog/cloud-native-llm-inference-stack/</link><pubDate>Sat, 08 Nov 2025 05:21:59 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/cloud-native-llm-inference-stack/</guid><description>Essential reading for cloud native and AI-native architects: how KServe, vLLM, llm-d, and WG Serving form the cloud native &amp;#39;quartet&amp;#39; for large model inference, their roles, synergy, and ecosystem trends.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The standardization and modularization of cloud native inference systems are making large model deployment as simple and efficient as web services.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Large language model (LLM) inference is evolving from the era of single-machine accelerators to distributed cloud native systems. The most representative combination today is &lt;strong&gt;KServe, vLLM, llm-d, and WG Serving&lt;/strong&gt;. Each plays a distinct role (standard interface, execution engine, scheduling layer, and collaboration specification, respectively), and together they form a scalable, observable, and governable inference foundation.&lt;/p&gt;
&lt;p&gt;The following timeline outlines the key milestones in the evolution of the quartet:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/13c049dec5cb173364f1f33e50d9ec84.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/13c049dec5cb173364f1f33e50d9ec84.svg" alt="Figure 1: Cloud Native LLM Inference Quartet Evolution Timeline" data-caption="Figure 1: Cloud Native LLM Inference Quartet Evolution Timeline"
width="2400"
height="737"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Cloud Native LLM Inference Quartet Evolution Timeline&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="architecture-overview"&gt;Architecture Overview&lt;/h2&gt;
&lt;p&gt;The diagram below illustrates the layered collaboration of the quartet within the inference system:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/51f1f2891de4639b5a1f3e762addc4b6.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/51f1f2891de4639b5a1f3e762addc4b6.svg" alt="Figure 2: Cloud Native LLM Inference Quartet Architecture Overview" data-caption="Figure 2: Cloud Native LLM Inference Quartet Architecture Overview"
width="3223"
height="383"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Cloud Native LLM Inference Quartet Architecture Overview&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="kserve-the-core-of-cloud-native-model-serving"&gt;KServe: The Core of Cloud Native Model Serving&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/kserve/kserve" target="_blank" rel="noopener"&gt;KServe&lt;/a&gt; is a Kubernetes-native inference control plane. It abstracts model services as CRDs (Custom Resource Definitions), making AI inference as deployable, scalable, and upgradable as microservices.&lt;/p&gt;
&lt;p&gt;The table below summarizes KServe&amp;rsquo;s core capabilities and new features:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core Goal&lt;/td&gt;
&lt;td&gt;Provides Kubernetes-native inference standards and control plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core Features&lt;/td&gt;
&lt;td&gt;CRD standardization, elastic scaling, traffic governance, unified gateway entry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New Features&lt;/td&gt;
&lt;td&gt;LeaderWorkerSet support, AI Gateway integration, llm-d integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: KServe Core Capabilities and New Features
&lt;/figcaption&gt;
&lt;p&gt;KServe&amp;rsquo;s key capabilities include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unified Interface&lt;/strong&gt;: The InferenceService CRD defines input/output protocols, compatible with REST/gRPC and the OpenAI API.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elastic Scheduling&lt;/strong&gt;: Supports automatic GPU scaling and ModelMesh multi-model hosting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Traffic Governance&lt;/strong&gt;: Canary releases, A/B testing, and InferenceGraph.&lt;/li&gt;
&lt;/ul&gt;
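&lt;p&gt;The canary idea behind traffic governance can be sketched in a few lines of Python. This is only an illustration of percentage-based splitting, not how KServe implements it: the &lt;code&gt;route&lt;/code&gt; function and the request ID format are hypothetical. Hashing the request ID keeps the decision deterministic, so the same request always lands on the same revision.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;import hashlib


def route(request_id, canary_percent):
    """Return "canary" for roughly canary_percent of request IDs, else "stable"."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the hash onto a stable bucket in the range 0..99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if canary_percent > bucket else "stable"


# Send about 10% of 10,000 synthetic requests to the canary revision.
counts = {"stable": 0, "canary": 0}
for i in range(10_000):
    counts[route(f"req-{i}", 10)] += 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Raising &lt;code&gt;canary_percent&lt;/code&gt; step by step moves more buckets to the new revision without reshuffling requests that were already on it.&lt;/p&gt;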
&lt;p&gt;The latest version introduces the LeaderWorkerSet (LWS) mechanism and Envoy AI Gateway extension, making multi-Pod large model inference a native capability. KServe is transitioning from a traditional ML service platform to the &lt;strong&gt;standard control plane for generative AI&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="vllm-high-performance-inference-execution-engine"&gt;vLLM: High-Performance Inference Execution Engine&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener"&gt;vLLM&lt;/a&gt; focuses on extreme throughput and memory efficiency, setting the benchmark for open-source performance.&lt;/p&gt;
&lt;p&gt;The sequence diagram below shows the main vLLM inference process:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/f28b9055dcd4f14868b99f15ebddd608.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/f28b9055dcd4f14868b99f15ebddd608.svg" alt="Figure 3: vLLM Inference Process" data-caption="Figure 3: vLLM Inference Process"
width="2400"
height="2010"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: vLLM Inference Process&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The table below summarizes vLLM&amp;rsquo;s core technical mechanisms and effects:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Technical Mechanism&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PagedAttention&lt;/td&gt;
&lt;td&gt;Memory paging&lt;/td&gt;
&lt;td&gt;Longer context, less fragmentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continuous Batching&lt;/td&gt;
&lt;td&gt;Dynamic batch scheduling&lt;/td&gt;
&lt;td&gt;Higher GPU utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prefix Cache&lt;/td&gt;
&lt;td&gt;Prefix reuse&lt;/td&gt;
&lt;td&gt;Lower latency and cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: vLLM Core Technical Mechanisms and Effects
&lt;/figcaption&gt;
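&lt;p&gt;The paging idea behind PagedAttention is easiest to see in a toy allocator. The sketch below is not vLLM code: the &lt;code&gt;PagedKVCache&lt;/code&gt; class and its methods are hypothetical, and it tracks only block bookkeeping, not tensors. Each sequence owns a list of fixed-size blocks and receives a new block only when its last one is full, so unused memory per sequence is bounded by one block.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;class PagedKVCache:
    """Toy block allocator: each sequence maps to a list of fixed-size blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id: list of physical block ids
        self.lengths = {}       # seq_id: number of tokens stored so far

    def append_token(self, seq_id):
        """Reserve room for one more token, allocating a block only when needed."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length == len(table) * self.block_size:  # every owned block is full
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(17):
    cache.append_token("a")  # 17 tokens need 2 blocks
for _ in range(5):
    cache.append_token("b")  # 5 tokens need 1 block
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because blocks need not be contiguous, a finished sequence&amp;rsquo;s blocks can be handed to any other sequence immediately, which is where the reduced fragmentation comes from.&lt;/p&gt;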
&lt;p&gt;vLLM is compatible with the OpenAI API, supports INT8/FP8 quantization and various parallel modes, and adapts to NVIDIA, AMD, TPU, and Gaudi hardware. In single-machine or small-scale scenarios, vLLM can run independently; in cluster environments, it serves as the execution foundation for KServe/llm-d.&lt;/p&gt;
&lt;h2 id="llm-d-distributed-inference-scheduling-layer"&gt;llm-d: Distributed Inference Scheduling Layer&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/llm-d/llm-d" target="_blank" rel="noopener"&gt;llm-d&lt;/a&gt; is a large model scheduling and orchestration system for Kubernetes, enabling multi-instance collaboration for vLLM. Its design goal: &lt;strong&gt;make clusters infer like a single machine&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The table below summarizes llm-d&amp;rsquo;s core mechanisms and technical highlights:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Technical Highlight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scheduler&lt;/td&gt;
&lt;td&gt;Cache-aware routing&lt;/td&gt;
&lt;td&gt;Prefix affinity scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prefill/Decode Separation&lt;/td&gt;
&lt;td&gt;Heterogeneous hardware optimization&lt;/td&gt;
&lt;td&gt;A100 Prefill + L40 Decode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache Manager&lt;/td&gt;
&lt;td&gt;Global cache index&lt;/td&gt;
&lt;td&gt;Hierarchical GPU/CPU/NVMe cache&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: llm-d Core Mechanisms and Technical Highlights
&lt;/figcaption&gt;
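&lt;p&gt;Prefix affinity scheduling can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not llm-d&amp;rsquo;s scheduler: the &lt;code&gt;replicas&lt;/code&gt; structure, the &lt;code&gt;pick_replica&lt;/code&gt; helper, and the integer token IDs are hypothetical. The router sends a prompt to the replica whose cache shares the longest token prefix with it and falls back to the least-loaded replica on a tie.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;def shared_prefix_len(a, b):
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_replica(prompt, replicas):
    """Pick the replica with the longest cached prefix; break ties by lower load."""
    def score(name):
        info = replicas[name]
        best = max(
            (shared_prefix_len(prompt, cached) for cached in info["cached"]),
            default=0,
        )
        # Longer prefix wins first; negating load makes lower load rank higher.
        return (best, -info["load"])

    return max(replicas, key=score)


# Hypothetical cluster state: in-flight load and cached token prefixes per pod.
replicas = {
    "pod-a": {"load": 5, "cached": [[1, 2, 3, 4]]},
    "pod-b": {"load": 1, "cached": [[1, 2, 9]]},
    "pod-c": {"load": 0, "cached": []},
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this state, a prompt starting with tokens &lt;code&gt;[1, 2, 3]&lt;/code&gt; goes to &lt;code&gt;pod-a&lt;/code&gt; despite its higher load, while a prompt with no cached prefix anywhere goes to the idle &lt;code&gt;pod-c&lt;/code&gt;. A production scheduler would balance cache hits against load more carefully than this strict ordering does.&lt;/p&gt;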
&lt;p&gt;The following diagram illustrates llm-d&amp;rsquo;s distributed scheduling and caching mechanism:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/b81458684a761764a7a75821dc80aa3e.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/b81458684a761764a7a75821dc80aa3e.svg" alt="Figure 4: llm-d Distributed Scheduling and Caching Mechanism" data-caption="Figure 4: llm-d Distributed Scheduling and Caching Mechanism"
width="2400"
height="5935"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: llm-d Distributed Scheduling and Caching Mechanism&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;llm-d runs under the KServe control plane in a Leader/Worker pattern. The scheduler can be embedded in Envoy or deployed independently, making real-time routing decisions based on cache and load information. Its emergence enables &lt;strong&gt;autonomous scheduling and elastic parallelism&lt;/strong&gt; for multi-node LLM inference.&lt;/p&gt;
&lt;h2 id="wg-serving-collaboration-standards-and-ecosystem-hub"&gt;WG Serving: Collaboration Standards and Ecosystem Hub&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/kubernetes-sigs/wg-serving" target="_blank" rel="noopener"&gt;WG Serving&lt;/a&gt; is an AI Serving working group promoted by the Kubernetes community, defining unified inference semantics in K8s.&lt;/p&gt;
&lt;p&gt;The table below summarizes WG Serving&amp;rsquo;s core achievements and standardization contributions:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Achievement/Standard&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/kubernetes-sigs/gateway-api-inference-extension" target="_blank" rel="noopener"&gt;Gateway Inference Extension (GIE)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Envoy-based inference gateway protocol supporting model identification, streaming forwarding, priority, and cache affinity routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LeaderWorkerSet CRD&lt;/td&gt;
&lt;td&gt;Explicitly describes Leader–Worker collaboration structure, foundational for llm-d and KServe multi-Pod inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Interface Alignment&lt;/td&gt;
&lt;td&gt;Advocates OpenAI-style API integration with K8s resource objects, promoting cross-framework interoperability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 4: WG Serving Core Achievements and Standardization Contributions
&lt;/figcaption&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GIE is the &amp;lsquo;unified gateway language&amp;rsquo; for cloud native AI inference.&lt;/strong&gt; Just as Ingress defines HTTP service entry, GIE defines the standard semantics and gateway behavior for inference requests within Kubernetes, enabling composable, observable, and extensible inference systems.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The diagram below shows WG Serving&amp;rsquo;s standardized collaboration within the inference system:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/912e0c976c3b137099dd3dbc3ee3c9f6.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/912e0c976c3b137099dd3dbc3ee3c9f6.svg" alt="Figure 5: WG Serving Standardized Collaboration" data-caption="Figure 5: WG Serving Standardized Collaboration"
width="2400"
height="4027"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: WG Serving Standardized Collaboration&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;WG Serving is not a product but a &lt;strong&gt;standards layer that builds industry consensus&lt;/strong&gt; and gives cloud native AI inference a unified language.&lt;/p&gt;
&lt;h2 id="combined-architecture"&gt;Combined Architecture&lt;/h2&gt;
&lt;p&gt;The table below summarizes the division of labor and roles of the quartet in the system:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Entry Layer&lt;/td&gt;
&lt;td&gt;Envoy + GIE&lt;/td&gt;
&lt;td&gt;Unified API gateway and traffic hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control Layer&lt;/td&gt;
&lt;td&gt;KServe + LWS&lt;/td&gt;
&lt;td&gt;Lifecycle management, elastic scaling, traffic orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling Layer&lt;/td&gt;
&lt;td&gt;llm-d&lt;/td&gt;
&lt;td&gt;Prefix-aware routing, cross-Pod collaboration, cache management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution Layer&lt;/td&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;Efficient inference execution and cache reuse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 5: Cloud Native LLM Inference Quartet Division of Labor and Roles
&lt;/figcaption&gt;
&lt;p&gt;The diagram below illustrates the synergy among the quartet:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/bb5fe3c87d7aaa6e5715ecb42d1e31b5.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/bb5fe3c87d7aaa6e5715ecb42d1e31b5.svg" alt="Figure 6: Cloud Native LLM Inference Quartet Synergy" data-caption="Figure 6: Cloud Native LLM Inference Quartet Synergy"
width="2400"
height="385"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 6: Cloud Native LLM Inference Quartet Synergy&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Clients send requests in the OpenAI API format. The GIE gateway routes each request to the optimal Leader, Prefill is completed, the cache is passed to a Worker for Decode, and the result is streamed back to the client. The entire chain features standard interfaces, high throughput, and elastic scaling.&lt;/p&gt;
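&lt;p&gt;The routing step in this flow can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by llm-d or GIE: the &lt;code&gt;Replica&lt;/code&gt; class, the &lt;code&gt;load_penalty&lt;/code&gt; weight, and the character-level prefix matching are all assumptions made for the example (real schedulers generally work on token blocks and richer load signals).&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical view of one Leader replica as seen by the router."""
    name: str
    queue_depth: int                                   # requests currently waiting
    cached_prefixes: set = field(default_factory=set)  # prompt prefixes held in KV cache


def cached_prefix_len(prompt, replica):
    """Length of the longest cached prefix that the prompt starts with."""
    return max((len(p) for p in replica.cached_prefixes if prompt.startswith(p)), default=0)


def pick_replica(prompt, replicas, load_penalty=8):
    """Pick the replica with the best cache-reuse score after a load penalty."""
    def score(replica):
        # Reward reusable cache, penalize replicas with long queues (assumed weighting).
        return cached_prefix_len(prompt, replica) - load_penalty * replica.queue_depth
    return max(replicas, key=score)


replicas = [
    Replica("leader-0", queue_depth=1, cached_prefixes={"You are a helpful assistant."}),
    Replica("leader-1", queue_depth=0),
    Replica("leader-2", queue_depth=6,
            cached_prefixes={"You are a helpful assistant. Answer briefly."}),
]
prompt = "You are a helpful assistant. Answer briefly. What is Kubernetes?"
chosen = pick_replica(prompt, replicas)
print(chosen.name)
```

&lt;p&gt;In this sketch the replica with the longest cached prefix loses to a less loaded one, which is the trade-off a cache- and load-aware scheduler has to balance on every request.&lt;/p&gt;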
&lt;h2 id="ecosystem-convergence-trends"&gt;Ecosystem Convergence Trends&lt;/h2&gt;
&lt;p&gt;The table below summarizes the convergence trends and feature comparisons in the cloud native LLM inference ecosystem:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trend/Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API Unification&lt;/td&gt;
&lt;td&gt;OpenAI-style interfaces have become the de facto standard; KServe and vLLM natively compatible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module Decoupling&lt;/td&gt;
&lt;td&gt;Gateway, scheduling, and inference are layered for independent evolution and replacement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hierarchical Caching&lt;/td&gt;
&lt;td&gt;GPU–CPU–NVMe three-level KV cache is mainstream&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Community Collaboration&lt;/td&gt;
&lt;td&gt;WG Serving, PyTorch Foundation, CNCF jointly promote cross-project integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 6: Cloud Native LLM Inference Ecosystem Convergence Trends and Feature Comparison
&lt;/figcaption&gt;
&lt;p&gt;The following matrix compares the core capabilities of each project:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Control Plane&lt;/th&gt;
&lt;th&gt;Inference Performance&lt;/th&gt;
&lt;th&gt;Distributed Capability&lt;/th&gt;
&lt;th&gt;Interface Compatibility&lt;/th&gt;
&lt;th&gt;Cache Mechanism&lt;/th&gt;
&lt;th&gt;Elastic Scaling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KServe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ CRD / LWS&lt;/td&gt;
&lt;td&gt;⚪ Medium&lt;/td&gt;
&lt;td&gt;⭐ Multi-model management&lt;/td&gt;
&lt;td&gt;✅ OpenAI API&lt;/td&gt;
&lt;td&gt;⚪ None&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚪ None&lt;/td&gt;
&lt;td&gt;🌟 Very High&lt;/td&gt;
&lt;td&gt;⭐ Multi-GPU&lt;/td&gt;
&lt;td&gt;✅ OpenAI API&lt;/td&gt;
&lt;td&gt;✅ Paged KV&lt;/td&gt;
&lt;td&gt;⚪ None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;llm-d&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⭐ K8s-native scheduling&lt;/td&gt;
&lt;td&gt;🌟 High&lt;/td&gt;
&lt;td&gt;🌟 Multi-instance collaboration&lt;/td&gt;
&lt;td&gt;✅ Inherits upper-layer interface&lt;/td&gt;
&lt;td&gt;🌟 Global cache&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;WG Serving&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;🌟 Standard abstraction&lt;/td&gt;
&lt;td&gt;⚪ None&lt;/td&gt;
&lt;td&gt;🌟 Cross-project collaboration&lt;/td&gt;
&lt;td&gt;🌟 Unified specification&lt;/td&gt;
&lt;td&gt;⚪ Not involved&lt;/td&gt;
&lt;td&gt;⚪&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 7: Cloud Native LLM Inference Quartet Feature Comparison Matrix
&lt;/figcaption&gt;
&lt;p&gt;The future inference stack will center on standard APIs and pluggable modules, enabling &lt;strong&gt;deployment of large language models (LLMs) as easily as web services&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="deployment-paradigm-example"&gt;Deployment Paradigm Example&lt;/h2&gt;
&lt;p&gt;In a Kubernetes cluster, the deployment paradigm for the quartet is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prefill Layer&lt;/strong&gt;: 4 × A100 Pods, responsible for long-context computation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Decode Layer&lt;/strong&gt;: 16 × L4 Pods, performing streaming generation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;llm-d Scheduler&lt;/strong&gt;: Dynamically routes based on cache hit rate.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;KServe Control Plane&lt;/strong&gt;: Manages LWS resources and scaling.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Envoy GIE Gateway&lt;/strong&gt;: Unified OpenAI interface entry.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The topology diagram below shows the deployment structure:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/aef5281e010cf2d3e95cae8d6a001129.svg" data-img="https://assets.jimmysong.io/images/blog/cloud-native-llm-inference-stack/aef5281e010cf2d3e95cae8d6a001129.svg" alt="Figure 7: Cloud Native LLM Inference Quartet Deployment Topology" data-caption="Figure 7: Cloud Native LLM Inference Quartet Deployment Topology"
width="2540"
height="400"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 7: Cloud Native LLM Inference Quartet Deployment Topology&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This combination achieves high concurrency, low cost, and observability for large model services.&lt;/p&gt;
&lt;h2 id="conclusion-the-future-of-standardization"&gt;Conclusion: The Future of Standardization&lt;/h2&gt;
&lt;p&gt;The table below summarizes the layers, roles, and core contributions of the quartet in the inference system:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Core Contribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Entry&lt;/td&gt;
&lt;td&gt;WG Serving (GIE)&lt;/td&gt;
&lt;td&gt;Unified traffic entry and interface specification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Control&lt;/td&gt;
&lt;td&gt;KServe&lt;/td&gt;
&lt;td&gt;Kubernetes-native deployment and management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduling&lt;/td&gt;
&lt;td&gt;llm-d&lt;/td&gt;
&lt;td&gt;Prefix cache-aware distributed inference scheduling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;High-performance, low-cost inference engine&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 8: Cloud Native LLM Inference Quartet Layers and Core Contributions
&lt;/figcaption&gt;
&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt;&lt;br&gt;
This &amp;ldquo;quartet&amp;rdquo; marks the beginning of a standardized and composable era for large model inference. Future trends will focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;API standardization (OpenAI / OpenInference)&lt;/li&gt;
&lt;li&gt;Hierarchical and shared caching&lt;/li&gt;
&lt;li&gt;Decoupling of control and data planes&lt;/li&gt;
&lt;li&gt;Integrated orchestration on cloud native platforms&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The cloud native LLM inference quartet—KServe, vLLM, llm-d, and WG Serving—is driving standardization, modularization, and ecosystem convergence in inference systems. Through layered collaboration and standard interfaces, developers can achieve high-performance, low-cost, and observable large language model inference services, accelerating the adoption and innovation of AI-native architectures.&lt;/p&gt;</content:encoded></item><item><title>Why AI Inference Naturally Belongs to Kubernetes</title><link>https://jimmysong.io/blog/ai-inference-on-kubernetes/</link><pubDate>Wed, 05 Nov 2025 03:44:57 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-inference-on-kubernetes/</guid><description>Explore why Kubernetes is the ideal runtime for AI inference — delivering elastic, cost-efficient, low-latency model serving with GPU-aware autoscaling, versioning, and observability.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;The future of AI inference lies not in &amp;ldquo;faster GPUs,&amp;rdquo; but in &amp;ldquo;smarter infrastructure.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-natural-fit-between-ai-inference-and-cloud-native"&gt;The Natural Fit Between AI Inference and Cloud Native&lt;/h2&gt;
&lt;p&gt;AI inference systems must balance performance, elasticity, cost, and operability, which are precisely the core capabilities Kubernetes has accumulated over a decade of cloud native evolution.&lt;/p&gt;
&lt;p&gt;When we re-examine AI infrastructure, Kubernetes is not just a &amp;ldquo;container orchestrator&amp;rdquo; but is becoming the runtime foundation for AI inference.&lt;/p&gt;
&lt;p&gt;The core requirements of AI inference systems include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Elasticity (handling traffic peaks vs. idle periods)&lt;/li&gt;
&lt;li&gt;Low latency (sensitive to inference response time)&lt;/li&gt;
&lt;li&gt;Cost control (GPU resources are expensive)&lt;/li&gt;
&lt;li&gt;Canary releases and version management (frequent model iterations)&lt;/li&gt;
&lt;li&gt;Multi-tenancy and isolation (different models/teams sharing clusters)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These are exactly the problems cloud native technologies have solved over the past decade. In other words: &lt;strong&gt;AI inference is retracing the path of cloud native microservices, except that the underlying compute has shifted from CPU to GPU.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;AI inference and training differ significantly in resource usage and architectural requirements. The table below compares their main characteristics to help explain why inference scenarios are highly compatible with cloud native architectures.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;AI Training&lt;/th&gt;
&lt;th&gt;AI Inference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Resource Pattern&lt;/td&gt;
&lt;td&gt;Long-term GPU occupation, compute-intensive&lt;/td&gt;
&lt;td&gt;Short-term high concurrency, fluctuating load&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Primary Goal&lt;/td&gt;
&lt;td&gt;Maximize throughput&lt;/td&gt;
&lt;td&gt;Minimize response time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost Model&lt;/td&gt;
&lt;td&gt;Fixed resource investment&lt;/td&gt;
&lt;td&gt;Dynamic, elastic resource allocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operations Mode&lt;/td&gt;
&lt;td&gt;Batch jobs&lt;/td&gt;
&lt;td&gt;Service-oriented deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability Focus&lt;/td&gt;
&lt;td&gt;Loss, Step, GPU utilization&lt;/td&gt;
&lt;td&gt;QPS, latency, token throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Resource and Operations Comparison: AI Training vs. Inference
&lt;/figcaption&gt;
&lt;p&gt;These characteristics are highly consistent with Kubernetes&amp;rsquo; core principles: elastic scheduling, declarative management, and resource isolation. In other words, the complexity of AI inference scenarios is exactly what cloud native architectures were designed to address.&lt;/p&gt;
&lt;h2 id="kubernetes-capabilities-mapping-for-ai-inference"&gt;Kubernetes Capabilities Mapping for AI Inference&lt;/h2&gt;
&lt;p&gt;Kubernetes offers a rich set of native capabilities that map precisely to the various needs of AI inference. The table below summarizes the main features and their value in inference scenarios.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Kubernetes Feature&lt;/th&gt;
&lt;th&gt;Value for AI Inference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Horizontal Pod Autoscaler (HPA)&lt;/td&gt;
&lt;td&gt;Auto-scales replicas based on GPU utilization or latency, exposed through custom or external metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vertical Pod Autoscaler (VPA)&lt;/td&gt;
&lt;td&gt;Dynamically adjusts container CPU and memory requests to match load (GPU counts are not resized by VPA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cluster Autoscaler (CA)&lt;/td&gt;
&lt;td&gt;Auto-scales node pools to handle large-scale inference requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Device Plugin&lt;/td&gt;
&lt;td&gt;GPU/TPU resource registration and isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Node Affinity / Taints&lt;/td&gt;
&lt;td&gt;Ensures model replicas are scheduled onto appropriate nodes, such as those with the required GPU type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service Mesh / Ingress&lt;/td&gt;
&lt;td&gt;Supports canary releases and A/B testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability Stack&lt;/td&gt;
&lt;td&gt;Collects inference metrics: latency distribution, throughput, model version performance, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Mapping Kubernetes Features to AI Inference Value
&lt;/figcaption&gt;
&lt;p&gt;Combined, these capabilities form a cloud native foundation for &amp;ldquo;Inference as a Service.&amp;rdquo;&lt;/p&gt;
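&lt;p&gt;To make the HPA row concrete, the sketch below applies the scaling rule documented for the Horizontal Pod Autoscaler, &lt;code&gt;desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)&lt;/code&gt;, with the default tolerance of 0.1. The metric values are made-up numbers, and the sketch assumes a metrics pipeline already exposes GPU utilization to the autoscaler; it also ignores stabilization windows and other controller details.&lt;/p&gt;

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count from the HPA rule: ceil(current * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    # The controller skips scaling when the ratio is close enough to 1.0.
    if math.isclose(ratio, 1.0, rel_tol=0.0, abs_tol=tolerance):
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 90, 60))   # GPU utilization at 90% against a 60% target
print(desired_replicas(4, 62, 60))   # inside the tolerance band, so no change
print(desired_replicas(8, 20, 60))   # load dropped, so the deployment scales in
```

&lt;p&gt;The same rule applies to a latency or queue-length metric; only the source of &lt;code&gt;current_metric&lt;/code&gt; changes.&lt;/p&gt;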
&lt;h2 id="cloud-native-ai-inference-architecture-diagram"&gt;Cloud Native AI Inference Architecture Diagram&lt;/h2&gt;
&lt;p&gt;The following diagram illustrates a typical cloud native AI inference system architecture, covering request entry, inference services, resource scheduling, monitoring, and auto-scaling.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-inference-on-kubernetes/2402dddfba6f26f40d8efad4f02cf572.svg" data-img="https://assets.jimmysong.io/images/blog/ai-inference-on-kubernetes/2402dddfba6f26f40d8efad4f02cf572.svg" alt="Figure 1: Cloud Native AI Inference Architecture" data-caption="Figure 1: Cloud Native AI Inference Architecture"
width="2755"
height="301"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Cloud Native AI Inference Architecture&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;This architecture enables efficient routing of inference requests, elastic resource scheduling, performance monitoring, and a closed loop of auto-scaling.&lt;/p&gt;
&lt;h2 id="evolution-path-of-ai-inference-operation-modes"&gt;Evolution Path of AI Inference Operation Modes&lt;/h2&gt;
&lt;p&gt;The evolution of AI inference platforms can be divided into three stages. The following list outlines the main features and technical highlights of each stage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Containerized Deployment Stage&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Models are packaged as Docker images and deployed via YAML files.&lt;/li&gt;
&lt;li&gt;Pros: Standardization; Cons: Lack of dynamic scheduling.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Auto-scaling and Resource Optimization Stage&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Introduces HPA/VPA/KEDA to scale GPU-backed workloads dynamically.&lt;/li&gt;
&lt;li&gt;Adds monitoring and metric feedback for closed-loop performance tuning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI Native Platform Stage&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Integrates model, version, monitoring, and cost management.&lt;/li&gt;
&lt;li&gt;Introduces model registry, KServe, vLLM, and other ecosystem components.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-kubernetes-is-the-ideal-foundation-for-ai-inference"&gt;Why Kubernetes Is the Ideal Foundation for AI Inference&lt;/h2&gt;
&lt;p&gt;As the foundation for AI inference platforms, Kubernetes offers the following unique advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Elasticity and Predictability&lt;/strong&gt;: Handles dramatic traffic fluctuations; auto-scaling can adjust replica counts within seconds, although new GPU replicas still need time to pull images and load model weights.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource Reuse and Isolation&lt;/strong&gt;: Supports GPU partitioning (MIG), sharing (fractional GPU), and other mechanisms to improve resource utilization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Canary Releases and Version Governance&lt;/strong&gt;: Deployment + Service Mesh enables canary model switching and multi-version coexistence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross-environment Consistency&lt;/strong&gt;: Define once, run anywhere. Supports unified inference experience across local, private, and public clouds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complete Ecosystem&lt;/strong&gt;: Seamless integration with Kubeflow, KServe, Ray, vLLM, and other components to build a full-stack AI infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These capabilities make Kubernetes the platform of choice for AI inference engineers.&lt;/p&gt;
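&lt;p&gt;The canary idea from the list above can be illustrated with a small weighted split. This is only a sketch: the version names and the 90/10 weights are hypothetical, and a service mesh would normally apply such weights in its own routing configuration rather than in application code. Hashing the request ID is an assumption chosen here so that the same request always maps to the same model version.&lt;/p&gt;

```python
import hashlib


def pick_version(request_id, weights):
    """Map a request to a model version using stable hash buckets (0-99)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    upper = 0
    for version, weight in weights.items():
        upper += weight
        # The request belongs to the first version whose cumulative weight covers its bucket.
        if bucket in range(upper):
            return version
    raise ValueError("weights must sum to 100")


weights = {"model-v1": 90, "model-v2": 10}   # hypothetical 90/10 canary split
counts = {"model-v1": 0, "model-v2": 0}
for i in range(10000):
    counts[pick_version(f"req-{i}", weights)] += 1
print(counts)
```

&lt;p&gt;Raising the canary weight step by step while watching latency and error metrics per version is what turns this split into a controlled rollout.&lt;/p&gt;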
&lt;h2 id="future-trends-of-ai-native-infrastructure"&gt;Future Trends of AI Native Infrastructure&lt;/h2&gt;
&lt;p&gt;The diagram below shows the convergence path of DevOps and AI, reflecting the evolution loop from automated deployment to intelligent feedback.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-inference-on-kubernetes/3a7b61362ee0688efd909fd9d5b3a543.svg" data-img="https://assets.jimmysong.io/images/blog/ai-inference-on-kubernetes/3a7b61362ee0688efd909fd9d5b3a543.svg" alt="Figure 2: DevOps and AI Convergence Evolution Path" data-caption="Figure 2: DevOps and AI Convergence Evolution Path"
width="1920"
height="207"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: DevOps and AI Convergence Evolution Path&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the future, Kubernetes will span the entire chain—from application orchestration to model serving—gradually evolving into the infrastructure for &amp;ldquo;AI Native Platform Engineering.&amp;rdquo; Key trends include:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Trend Direction&lt;/th&gt;
&lt;th&gt;Core Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU Scheduling &amp;amp; Observability Integration&lt;/td&gt;
&lt;td&gt;Metrics will cover latency, throughput, token utilization, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platformization of Model Governance&lt;/td&gt;
&lt;td&gt;Automated evaluation of model performance and resource cost-effectiveness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost &amp;amp; Energy-aware Scheduling&lt;/td&gt;
&lt;td&gt;Dynamically decide optimal GPU nodes and instances&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge Inference Collaboration&lt;/td&gt;
&lt;td&gt;Kubernetes + Edge forms a distributed intelligent inference mesh&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Future Trends of AI Native Infrastructure
&lt;/figcaption&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;Over the past decade, Kubernetes has defined the language of cloud native infrastructure; in the next decade, it will also define the runtime foundation for AI inference. AI is not just an algorithmic problem, but an engineering one. Kubernetes gives us, for the first time, the opportunity to manage AI complexity in a systematic and declarative way. The future of AI inference depends not on &amp;ldquo;faster GPUs,&amp;rdquo; but on &amp;ldquo;smarter infrastructure&amp;rdquo;—which is precisely the essence of cloud native.&lt;/p&gt;</content:encoded></item><item><title>Why Glass Is Cheap but Installation Is Expensive: Jevons Paradox and Baumol Effect in the AI Era</title><link>https://jimmysong.io/blog/jevons-baumol-ai-china/</link><pubDate>Mon, 03 Nov 2025 17:27:19 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/jevons-baumol-ai-china/</guid><description>In the AI era, why are materials like glass getting cheaper, but installation and labor costs keep rising? The answer lies in the interplay of Jevons Paradox and the Baumol Effect.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;AI is expanding the virtual world infinitely, but making real-world labor more expensive than ever.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="real-world-lessons-from-a-shattered-glass"&gt;Real-World Lessons from a Shattered Glass&lt;/h2&gt;
&lt;p&gt;Last month, the floor-to-ceiling window in my living room shattered unexpectedly. When I got a quote from the manufacturer, I was surprised:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The glass itself costs only 500 RMB&lt;/strong&gt;, but &lt;strong&gt;the replacement fee is as high as 2,500 RMB&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/jevons-baumol-ai-china/glass.webp" data-img="https://assets.jimmysong.io/images/blog/jevons-baumol-ai-china/glass.webp" alt="Figure 1: Shattered glass" data-caption="Figure 1: Shattered glass"
width="1200"
height="877"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Shattered glass&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;The expensive part isn&amp;rsquo;t the glass—it&amp;rsquo;s the transportation, lifting, labor, and coordination, all the &amp;ldquo;non-glass&amp;rdquo; aspects.&lt;/p&gt;
&lt;p&gt;For a 3.2-square-meter pane of double-glazed tempered glass, the material cost may be just 20% of the total fee.
The remaining 80% is people, time, and the friction of the real world.&lt;/p&gt;
&lt;p&gt;This reality reminded me of a recent a16z article, &lt;em&gt;&lt;a href="https://a16z.substack.com/p/why-ac-is-cheap-but-ac-repair-is" target="_blank" rel="noopener"&gt;Why AC is cheap, but AC repair is a luxury&lt;/a&gt;&lt;/em&gt;. In China, we&amp;rsquo;re experiencing the same phenomenon: &lt;strong&gt;materials are getting cheaper, but labor is getting more expensive&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id="jevons-paradox-higher-efficiency-greater-consumption"&gt;Jevons Paradox: Higher Efficiency, Greater Consumption&lt;/h2&gt;
&lt;p&gt;British economist William Stanley Jevons proposed the famous &lt;a href="https://en.wikipedia.org/wiki/Jevons_paradox" target="_blank" rel="noopener"&gt;Jevons Paradox&lt;/a&gt; in 1865: when a technology becomes more efficient and cheaper, people actually consume &lt;strong&gt;more&lt;/strong&gt; of that resource.&lt;/p&gt;
&lt;p&gt;In China today, there are many real-world examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The unit price of smartphones, solar panels, chips, and AI inference keeps dropping;&lt;/li&gt;
&lt;li&gt;Yet our consumption of smart devices, data centers, electricity, and computing power keeps rising.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Take AI as an example: the cost of model inference is rapidly falling, &lt;strong&gt;but the number of calls is exploding exponentially&lt;/strong&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The cheaper the computing power, the more it gets overused. This is the modern Jevons Paradox.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The essence of Jevons Paradox is not &amp;ldquo;saving,&amp;rdquo; but &amp;ldquo;expansion&amp;rdquo;: &lt;strong&gt;Efficiency improvements lower marginal costs, ultimately expanding total demand.&lt;/strong&gt;&lt;/p&gt;
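&lt;p&gt;A toy model makes the mechanism easier to see. The sketch below assumes a constant-elasticity demand curve, which is a textbook simplification and not something taken from Jevons or from real usage data; &lt;code&gt;base_calls&lt;/code&gt; and the elasticity values are invented for illustration. Total compute consumption rises after an efficiency gain only when demand is elastic, that is, when elasticity is greater than 1.&lt;/p&gt;

```python
def compute_used(efficiency_gain, elasticity, base_calls=1000.0):
    """Total compute consumed after an efficiency gain, under constant-elasticity demand."""
    cost_per_call = 1.0 / efficiency_gain                # calls get cheaper as efficiency improves
    calls = base_calls * cost_per_call ** -elasticity    # demand response to the lower price
    return calls / efficiency_gain                       # compute needed to serve those calls


baseline = compute_used(1.0, 1.5)
print(baseline)                 # compute used before any efficiency gain
print(compute_used(4.0, 0.5))   # inelastic demand: 4x efficiency reduces compute use
print(compute_used(4.0, 1.5))   # elastic demand: 4x efficiency increases compute use
```

&lt;p&gt;With inelastic demand the efficiency gain translates into savings; with elastic demand the same gain doubles total consumption in this example, which is the pattern the paradox describes.&lt;/p&gt;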
&lt;h2 id="baumol-effect-lower-efficiency-higher-cost"&gt;Baumol Effect: Lower Efficiency, Higher Cost&lt;/h2&gt;
&lt;p&gt;In contrast, the &lt;a href="https://en.wikipedia.org/wiki/Baumol%27s_cost_disease" target="_blank" rel="noopener"&gt;Baumol Effect (Baumol&amp;rsquo;s Cost Disease)&lt;/a&gt; reveals another, more subtle inflation mechanism.&lt;/p&gt;
&lt;p&gt;When some industries experience a surge in productivity and wages, other less efficient sectors must also raise pay to retain workers.&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tech and finance companies have high per capita output and high salaries;&lt;/li&gt;
&lt;li&gt;As a result, electricians, carpenters, and nannies must also see wage increases—because they compete for labor with programmers and AI engineers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My glass repair case is a typical example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Glass manufacturing is highly automated and prices are nearly transparent; but installation still relies on manual labor, lifting, and coordination—efficiency hasn&amp;rsquo;t changed, yet costs keep rising.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-double-effect-of-ai"&gt;The Double Effect of AI&lt;/h2&gt;
&lt;p&gt;With the arrival of AI, Jevons Paradox and the Baumol Effect &lt;strong&gt;occur simultaneously&lt;/strong&gt;. The table below compares how the two effects show up in key sectors:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sector&lt;/th&gt;
&lt;th&gt;Jevons Effect (Cheaper, More Usage)&lt;/th&gt;
&lt;th&gt;Baumol Effect (Less Efficient, More Expensive)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Computing &amp;amp; Models&lt;/td&gt;
&lt;td&gt;Inference costs drop, usage explodes&lt;/td&gt;
&lt;td&gt;GPU electricity, data center maintenance costs rise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content Production&lt;/td&gt;
&lt;td&gt;Copywriting generation is nearly free&lt;/td&gt;
&lt;td&gt;Human review and compliance costs increase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Manufacturing&lt;/td&gt;
&lt;td&gt;Automation boosts output&lt;/td&gt;
&lt;td&gt;Installation, transport, and after-sales labor costs rise&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Education&lt;/td&gt;
&lt;td&gt;AI teachers improve efficiency&lt;/td&gt;
&lt;td&gt;Offline tutoring and private lessons get pricier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Comparison of Jevons and Baumol Effects in Key AI-Era Sectors
&lt;/figcaption&gt;
&lt;p&gt;So, we&amp;rsquo;re entering an interesting era: &lt;strong&gt;AI is driving the digital world toward zero marginal cost, but making the real world more expensive.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;You can generate a 3D renovation plan in seconds, but hiring workers to install glass or wire electricity still takes days and thousands of RMB.&lt;/p&gt;
&lt;h2 id="reflexive-baumol-effect-the-high-price-of-the-last-1"&gt;Reflexive Baumol Effect: The High Price of the &amp;ldquo;Last 1%&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;After AI automates 99% of a process, &lt;strong&gt;the remaining 1% of work that must be done by humans&lt;/strong&gt; becomes the new high-value bottleneck.&lt;/p&gt;
&lt;p&gt;Examples include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Radiologists: AI can read scans, but only humans can sign off with legal responsibility;&lt;/li&gt;
&lt;li&gt;Autonomous driving: AI can drive, but human safety supervisors are still required;&lt;/li&gt;
&lt;li&gt;Software systems: AI can generate code, but architecture review and production approval remain human tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the so-called &amp;ldquo;Reflexive Turbo-Baumol Effect&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When AI automates almost everything, the last 1% of human labor becomes a scarce resource and regulatory bottleneck.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="the-chinese-context-from-cheap-labor-to-expensive-labor"&gt;The Chinese Context: From &amp;ldquo;Cheap Labor&amp;rdquo; to &amp;ldquo;Expensive Labor&amp;rdquo;&lt;/h2&gt;
&lt;p&gt;For the past two decades, China&amp;rsquo;s economic growth was built on &lt;strong&gt;cheap labor and technological expansion&lt;/strong&gt;. Now, AI is making intellectual labor cheaper, highlighting the scarcity and irreplaceability of physical labor.&lt;/p&gt;
&lt;p&gt;You can have AI write a book or generate a report in minutes, but fixing a piece of glass, installing a window, or replacing a water heater still requires several people, hours of work, and hundreds of kilometers of logistics.&lt;/p&gt;
&lt;p&gt;This is a structural reversal:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI enables unlimited expansion of &amp;ldquo;virtual work&amp;rdquo;;&lt;/li&gt;
&lt;li&gt;But &amp;ldquo;physical labor&amp;rdquo; is becoming a luxury.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Future inflation won&amp;rsquo;t be in factories, but in &lt;strong&gt;real-world services&lt;/strong&gt;. This may be the new normal for Chinese society in the next decade.&lt;/p&gt;
&lt;h2 id="the-expensive-labor-industry-in-the-age-of-ai-abundance"&gt;The &amp;ldquo;Expensive Labor Industry&amp;rdquo; in the Age of AI Abundance&lt;/h2&gt;
&lt;p&gt;When we talk about the productivity revolution brought by AI, maybe we should ask:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Who will do the last 1%?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;AI is driving the digital world toward zero cost, but exposing the true cost of &amp;ldquo;human collaboration&amp;rdquo; in the real world.&lt;/p&gt;
&lt;p&gt;In the next decade, what&amp;rsquo;s truly scarce won&amp;rsquo;t be computing power, but &lt;strong&gt;human last-mile capabilities&lt;/strong&gt;: those who understand machinery, can make house calls, work with their hands, and take responsibility.&lt;/p&gt;
&lt;p&gt;Perhaps then, &amp;ldquo;the person who fixes glass&amp;rdquo; will be the true noble worker.&lt;/p&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;AI is reshaping the structure of productivity, driving the marginal cost of the digital world toward zero, but making labor and services in the real world more expensive than ever. The combined impact of Jevons Paradox and the Baumol Effect will profoundly influence China&amp;rsquo;s economy and social division of labor. We need to re-evaluate the value of &amp;ldquo;human labor,&amp;rdquo; especially those last-mile capabilities that cannot be automated.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://a16z.substack.com/p/why-ac-is-cheap-but-ac-repair-is" target="_blank" rel="noopener"&gt;Why AC is cheap, but AC repair is a luxury - a16z.substack.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>GitHub Copilot CLI Custom Agents: Building Your Command-Line AI Assistant</title><link>https://jimmysong.io/blog/github-copilot-cli-custom-agents/</link><pubDate>Mon, 03 Nov 2025 15:10:20 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/github-copilot-cli-custom-agents/</guid><description>Learn how to build custom AI assistants with GitHub Copilot CLI Agents to automate tasks, optimize workflows, and boost productivity in your command-line environment.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;Make your command line an AI ally, not just a toolbox.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="background-from-ai-completion-to-ai-assistant"&gt;Background: From AI Completion to AI Assistant&lt;/h2&gt;
&lt;p&gt;Since GitHub introduced Copilot Chat, CLI, and Workspace at the end of 2023, Copilot has evolved from &amp;ldquo;intelligent completion&amp;rdquo; to an &amp;ldquo;AI Pair Programmer.&amp;rdquo; In October 2025, GitHub officially announced that Copilot CLI now supports custom Agents and task delegation. This update enables developers to build their own AI assistants in the terminal, allowing Copilot not only to complete code but also to automate complex tasks, create branches, initiate PRs, and even refactor entire modules.&lt;/p&gt;
&lt;p&gt;This article introduces the core capabilities of Copilot CLI and provides practical scenarios to help you quickly get started with custom Agents.&lt;/p&gt;
&lt;h2 id="feature-overview"&gt;Feature Overview&lt;/h2&gt;
&lt;p&gt;Copilot CLI&amp;rsquo;s custom Agent update covers three areas: custom Agents, task delegation, and performance optimization. Together they make the command-line assistant more practical and more flexible.&lt;/p&gt;
&lt;h3 id="custom-agents"&gt;Custom Agents&lt;/h3&gt;
&lt;p&gt;Custom Agents allow Copilot to understand your context and workflow, becoming a truly intelligent command-line assistant. You can define Agents at different levels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Project level: &lt;code&gt;.github/agents/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Organization level: &lt;code&gt;{org}/.github/agents/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Global configuration: &lt;code&gt;~/.copilot/agents/&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is a typical Agent configuration example for Kubernetes scenarios. This configuration file defines the Agent&amp;rsquo;s basic information, available tools, and workflow instructions, making it easy to reuse in real development.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-markdown" data-lang="markdown"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;name: k8s-assistant
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;description: &amp;#34;Cloud-native specialist that helps manage and generate Kubernetes YAML manifests.&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;tools:
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; read
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; search
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; edit
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="k"&gt;-&lt;/span&gt; shell
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;### 🧭 Kubernetes Agent Instructions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;You are a Kubernetes specialist assisting developers and platform engineers.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;#### 🎯 Goals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Generate, explain, and optimize Kubernetes YAML configurations.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Diagnose &lt;span class="sb"&gt;`kubectl`&lt;/span&gt; outputs and suggest fixes.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;-&lt;/span&gt; Automate Helm values and chart templating.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="gu"&gt;#### ⚙️ Workflow
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;1.&lt;/span&gt; Use &lt;span class="sb"&gt;`kubectl`&lt;/span&gt; and &lt;span class="sb"&gt;`helm`&lt;/span&gt; to validate and apply configurations.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;2.&lt;/span&gt; Parse YAMLs using &lt;span class="sb"&gt;`yq`&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;3.&lt;/span&gt; Recommend improvements for manifests (resource requests, labels, probes, etc.).
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Save this file in the &lt;code&gt;.github/agents/&lt;/code&gt; directory as &lt;code&gt;k8s.agent.md&lt;/code&gt;. Then invoke your custom Agent from a Copilot CLI interactive session with:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;/agent k8s
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Now, your terminal has a &amp;ldquo;Kubernetes Agent&amp;rdquo; that can generate, optimize, and explain YAML, and even run related commands.&lt;/p&gt;
&lt;p&gt;The image below shows the actual effect of using the Kubernetes Agent in Copilot CLI:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/github-copilot-cli-custom-agents/k8s-agent.webp" data-img="https://assets.jimmysong.io/images/blog/github-copilot-cli-custom-agents/k8s-agent.webp" alt="Figure 1: Using the Kubernetes Agent in Copilot CLI" data-caption="Figure 1: Using the Kubernetes Agent in Copilot CLI"
width="3168"
height="878"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Using the Kubernetes Agent in Copilot CLI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;You can use the &lt;code&gt;/agent&lt;/code&gt; command to call different Agents or &lt;code&gt;/delegate&lt;/code&gt; to delegate tasks.&lt;/p&gt;
&lt;p&gt;When I ask Copilot CLI how to create a Kubernetes Agent or what issues exist in a YAML configuration, it provides detailed explanations and optimization suggestions. The following image shows a real interaction with the Kubernetes Agent:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/github-copilot-cli-custom-agents/test.webp" data-img="https://assets.jimmysong.io/images/blog/github-copilot-cli-custom-agents/test.webp" alt="Figure 2: Interacting with the Kubernetes Agent in Copilot CLI" data-caption="Figure 2: Interacting with the Kubernetes Agent in Copilot CLI"
width="1490"
height="2316"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Interacting with the Kubernetes Agent in Copilot CLI&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Note: For more Agent configurations, see &lt;a href="https://github.com/github/awesome-copilot/tree/main/agents" target="_blank" rel="noopener"&gt;awesome-copilot&lt;/a&gt;.&lt;/p&gt;
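&lt;p&gt;The agent file shown earlier in this section has two parts: YAML frontmatter between two &lt;code&gt;---&lt;/code&gt; lines, followed by Markdown instructions. As a minimal sketch of that layout, the Python snippet below splits an abbreviated copy of the file and reads the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;tools&lt;/code&gt; fields. The helper names &lt;code&gt;split_agent_file&lt;/code&gt; and &lt;code&gt;parse_frontmatter&lt;/code&gt; are hypothetical, the parser only handles the flat keys and the simple list used here, and it is not how Copilot CLI itself loads Agents.&lt;/p&gt;

```python
# Abbreviated copy of the k8s agent file from this article.
AGENT_FILE = """---
name: k8s-assistant
description: "Cloud-native specialist that helps manage and generate Kubernetes YAML manifests."
tools:
  - read
  - search
  - edit
  - shell
---

### Kubernetes Agent Instructions

You are a Kubernetes specialist assisting developers and platform engineers.
"""


def split_agent_file(text):
    """Return (frontmatter_lines, instructions_text) for an agent file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must start with a '---' line")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("frontmatter is missing its closing '---' line") from None
    return lines[1:end], "\n".join(lines[end + 1:]).strip()


def parse_frontmatter(lines):
    """Parse flat 'key: value' pairs and '- item' lists (not general YAML)."""
    meta = {}
    current_key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and current_key is not None:
            meta[current_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        if value:
            meta[key.strip()] = value
            current_key = None
        else:
            # A key with no inline value starts a list.
            current_key = key.strip()
            meta[current_key] = []
    return meta


frontmatter, instructions = split_agent_file(AGENT_FILE)
meta = parse_frontmatter(frontmatter)
print(meta["name"])
print(meta["tools"])
print(instructions.splitlines()[0])
```

&lt;p&gt;A check like this could run before committing, to catch a missing closing delimiter or an empty tool list in files under &lt;code&gt;.github/agents/&lt;/code&gt;.&lt;/p&gt;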
&lt;h3 id="task-delegation-delegate"&gt;Task Delegation (/delegate)&lt;/h3&gt;
&lt;p&gt;The delegation feature enables Copilot CLI to automate code modification and collaboration workflows. The command below demonstrates how to use &lt;code&gt;/delegate&lt;/code&gt; for automated refactoring:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;/delegate &lt;span class="s2"&gt;&amp;#34;Refactor the logging module for performance&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;After executing this command, the CLI will automatically:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Commit unstaged changes to a new branch;&lt;/li&gt;
&lt;li&gt;Launch the Copilot Coding Agent;&lt;/li&gt;
&lt;li&gt;Modify code in the background;&lt;/li&gt;
&lt;li&gt;Create a Draft Pull Request;&lt;/li&gt;
&lt;li&gt;Return a link for your review.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This approach greatly simplifies the &amp;ldquo;write code + initiate PR&amp;rdquo; process, making it ideal for team collaboration, auto-fixing, and asynchronous development.&lt;/p&gt;
&lt;h3 id="performance-optimization"&gt;Performance Optimization&lt;/h3&gt;
&lt;p&gt;The new Copilot CLI also brings significant performance improvements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Output supports token-by-token streaming for more responsive feedback;&lt;/li&gt;
&lt;li&gt;Parallel tool invocation increases overall processing speed;&lt;/li&gt;
&lt;li&gt;Lower memory usage and fixes for screen flicker issues;&lt;/li&gt;
&lt;li&gt;Smoother integration with GitHub MCP Server.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="application-scenarios"&gt;Application Scenarios&lt;/h2&gt;
&lt;p&gt;Custom Agents can cover a variety of development and collaboration scenarios. The table below summarizes typical applications and descriptions to help you choose the right Agent type for your needs.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Native Engineer&lt;/td&gt;
&lt;td&gt;Define &amp;ldquo;k8s-agent&amp;rdquo; to generate YAML and run kubectl checks directly in the terminal.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DevOps Team&lt;/td&gt;
&lt;td&gt;Create &amp;ldquo;pipeline-agent&amp;rdquo; to generate CI/CD workflows and Lint scripts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Developer Advocate / DevRel&lt;/td&gt;
&lt;td&gt;Define &amp;ldquo;demo-agent&amp;rdquo; for sample repositories to auto-generate examples and documentation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indie Developer / Solo Company&lt;/td&gt;
&lt;td&gt;Define &amp;ldquo;release-agent&amp;rdquo; to automate packaging, publishing, and release notes.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Copilot CLI Agent Application Scenarios
&lt;/figcaption&gt;
&lt;h2 id="integrating-mcp-model-context-protocol"&gt;Integrating MCP (Model Context Protocol)&lt;/h2&gt;
&lt;p&gt;GitHub Copilot CLI has built-in support for MCP (Model Context Protocol). MCP allows Agents to directly access local files or external data sources, enabling context injection and state persistence. This provides foundational capabilities for building &amp;ldquo;AI-native CLI toolchains.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;With MCP, Agents can:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Access and process local or remote data;&lt;/li&gt;
&lt;li&gt;Maintain task context continuity;&lt;/li&gt;
&lt;li&gt;Support more complex automation and integration scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below is a flowchart illustrating Copilot CLI&amp;rsquo;s overall tool invocation process, helping you understand its automation decision mechanism.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/github-copilot-cli-custom-agents/d819e8fe9dff551ea9a8f40b956d7d02.svg" data-img="https://assets.jimmysong.io/images/blog/github-copilot-cli-custom-agents/d819e8fe9dff551ea9a8f40b956d7d02.svg" alt="Figure 3: Copilot Tool Invocation Flow" data-caption="Figure 3: Copilot Tool Invocation Flow"
width="2400"
height="4516"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Copilot Tool Invocation Flow&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;The Agent feature in GitHub Copilot CLI marks the &amp;ldquo;AI-native&amp;rdquo; evolution of the command line. It&amp;rsquo;s not just a command completion tool, but an AI workflow engine capable of executing complex logic. As the MCP ecosystem matures, future CLIs may become collections of AI assistants rather than mere command sets.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.blog/changelog/2025-10-28-github-copilot-cli-use-custom-agents-and-delegate-to-copilot-coding-agent/" target="_blank" rel="noopener"&gt;GitHub Blog - Copilot CLI: Use custom agents - github.blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/copilot/concepts/agents/coding-agent/about-coding-agent" target="_blank" rel="noopener"&gt;GitHub Docs - Copilot Coding Agent - docs.github.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/github/awesome-copilot" target="_blank" rel="noopener"&gt;awesome-copilot - github.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>From Kubernetes to Qwen: How "Open Source" Has Changed in the AI Era</title><link>https://jimmysong.io/blog/ai-era-open-source-difference/</link><pubDate>Thu, 30 Oct 2025 11:31:01 +0000</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/ai-era-open-source-difference/</guid><description>Exploring the transformation of open source in the AI era, from Kubernetes to Qwen, and revealing the fundamental differences and new opportunities in open source strategies between China and the US.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;In the AI era, open source is no longer just &amp;ldquo;visible source code&amp;rdquo;—it&amp;rsquo;s about &amp;ldquo;loadable models and tunable intelligence.&amp;rdquo; US companies build moats with closed models, while Chinese vendors open source to build ecosystems. The meaning and practice of open source have fundamentally changed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-era-open-source-difference/banner.webp" data-img="https://assets.jimmysong.io/images/blog/ai-era-open-source-difference/banner.webp" alt="Figure 1: The logic of open source in the AI era has fundamentally shifted" data-caption="Figure 1: The logic of open source in the AI era has fundamentally shifted"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: The logic of open source in the AI era has fundamentally shifted&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="introduction"&gt;Introduction&lt;/h2&gt;
&lt;p&gt;A decade ago, the cloud native wave saw US companies like Google, Red Hat, and Docker open source a large body of infrastructure software. Kubernetes, Docker, and Istio became the common language for developers worldwide.&lt;/p&gt;
&lt;p&gt;But in the era of large language models, the situation has reversed: US tech giants rarely open source their core models, while Chinese vendors (such as Zhipu, Alibaba, MiniMax, 01.AI, Moonshot, etc.) frequently release open source models. Why this shift? What are the fundamental differences between &amp;ldquo;AI open source&amp;rdquo; and &amp;ldquo;infrastructure open source&amp;rdquo;?&lt;/p&gt;
&lt;h2 id="how-open-source-logic-has-changed-cloud-native-vs-ai-era"&gt;How Open Source Logic Has Changed: Cloud Native vs. AI Era&lt;/h2&gt;
&lt;p&gt;The table below compares the core logic, monetization, and resource dependencies of open source in the cloud native and AI eras.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Representative Technologies&lt;/th&gt;
&lt;th&gt;Core Open Source Logic&lt;/th&gt;
&lt;th&gt;Monetization Model&lt;/th&gt;
&lt;th&gt;Resource Dependency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Native Era (2010s)&lt;/td&gt;
&lt;td&gt;Istio, Kubernetes, Docker&lt;/td&gt;
&lt;td&gt;Building standards, expanding ecosystem&lt;/td&gt;
&lt;td&gt;Managed services (GKE, EKS)&lt;/td&gt;
&lt;td&gt;CPU-level compute, community-driven&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Large Model Era (2020s)&lt;/td&gt;
&lt;td&gt;Ollama, GPT, Qwen&lt;/td&gt;
&lt;td&gt;Model as asset, data control&lt;/td&gt;
&lt;td&gt;API services or closed SaaS&lt;/td&gt;
&lt;td&gt;GPU-level compute, centralized&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Comparison of Open Source Logic: Cloud Native vs. AI Era
&lt;/figcaption&gt;
&lt;p&gt;Cloud native open source emphasizes &amp;ldquo;building standards together,&amp;rdquo; while AI large model open source means &amp;ldquo;opening up core assets.&amp;rdquo; Their essence and motivations are fundamentally different.&lt;/p&gt;
&lt;h2 id="why-us-companies-no-longer-truly-open-source"&gt;Why US Companies No Longer Truly Open Source&lt;/h2&gt;
&lt;p&gt;US tech companies have chosen closed source in the AI era for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Business logic has shifted to moat-building: Training costs are high, model weights are the core barrier, and open sourcing means giving up competitiveness.&lt;/li&gt;
&lt;li&gt;Compute and data are not reproducible: The community cannot replicate GPT-4-level models.&lt;/li&gt;
&lt;li&gt;Security and compliance constraints: Model weights may involve user data and face strict regulation.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Open&amp;rdquo; is redefined as &amp;ldquo;API accessible&amp;rdquo;: Platforms are open in terms of interfaces, not code or weights.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="why-chinese-companies-are-more-willing-to-open-source"&gt;Why Chinese Companies Are More Willing to Open Source&lt;/h2&gt;
&lt;p&gt;Chinese vendors actively open source in AI for several reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Open source is used to build ecosystems and brand awareness quickly.&lt;/li&gt;
&lt;li&gt;Dual-track model of &amp;ldquo;open source + commercial license&amp;rdquo; balances ecosystem growth and revenue.&lt;/li&gt;
&lt;li&gt;The data policy environment is more flexible, with policies encouraging domestically developed models.&lt;/li&gt;
&lt;li&gt;National strategy drives &amp;ldquo;independent controllability&amp;rdquo; and &amp;ldquo;open source ecosystem&amp;rdquo; as technology priorities.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="the-shift-of-open-source-platforms-from-github-to-hugging-face"&gt;The Shift of Open Source Platforms: From GitHub to Hugging Face&lt;/h2&gt;
&lt;p&gt;The platform for open source has also changed. The table below shows the differences between GitHub and Hugging Face in open source forms.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Core Asset&lt;/th&gt;
&lt;th&gt;Open Source Form&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;Software / Cloud Native&lt;/td&gt;
&lt;td&gt;Source code (.go /.py /.js)&lt;/td&gt;
&lt;td&gt;Compilable, runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face&lt;/td&gt;
&lt;td&gt;AI Models&lt;/td&gt;
&lt;td&gt;Model weights + Tokenizer + Inference scripts&lt;/td&gt;
&lt;td&gt;Loadable, tunable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Comparison of Open Source Forms: GitHub vs. Hugging Face
&lt;/figcaption&gt;
&lt;p&gt;GitHub mainly open sources &amp;ldquo;program logic,&amp;rdquo; while Hugging Face open sources &amp;ldquo;model intelligence.&amp;rdquo; Their core assets are completely different.&lt;/p&gt;
&lt;h2 id="core-elements-of-ai-open-source"&gt;Core Elements of AI Open Source&lt;/h2&gt;
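&lt;p&gt;Open source in the AI era is not just about code: it also covers weights, inference code, and fine-tuning capability. The figure below shows the directory structure of a typical open source model, and the subsections after it cover these three key elements.&lt;/p&gt;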
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-era-open-source-difference/4341921f5c8fb2717107b60c7d7dd524.svg" data-img="https://assets.jimmysong.io/images/blog/ai-era-open-source-difference/4341921f5c8fb2717107b60c7d7dd524.svg" alt="Figure 2: Directory Structure of Open Source Large Models" data-caption="Figure 2: Directory Structure of Open Source Large Models"
width="2400"
height="1119"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Directory Structure of Open Source Large Models&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="open-weights"&gt;Open Weights&lt;/h3&gt;
&lt;p&gt;All knowledge learned during model training is stored in the weight parameters. Having the weights means owning the &amp;ldquo;intelligence body&amp;rdquo; of the model. Closed models (like GPT-4) only provide APIs, not weights.&lt;/p&gt;
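&lt;p&gt;Because the weights are simply stored parameters, their approximate size can be estimated from the parameter count and the numeric precision. The sketch below does that arithmetic for a model with roughly 4 billion parameters. The round parameter count, the precision table, and the function name &lt;code&gt;weight_memory_gib&lt;/code&gt; are assumptions for illustration, and the estimate covers the weights only, not activations or the KV cache.&lt;/p&gt;

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


NUM_PARAMS = 4_000_000_000  # assumed round figure for a "4B" model

for dtype in ("fp32", "bf16", "int4"):
    print(dtype, round(weight_memory_gib(NUM_PARAMS, dtype), 1), "GiB")
```

&lt;p&gt;Halving the precision halves the footprint, which is why quantized variants of the same weights fit on much smaller GPUs.&lt;/p&gt;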
&lt;h3 id="open-inference-code"&gt;Open Inference Code&lt;/h3&gt;
&lt;p&gt;Inference code defines how to load weights, tokenize, perform concurrent computation, and optimize memory. The code below demonstrates how to load the Qwen3 model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Load model and tokenizer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Qwen/Qwen3-4B-Instruct-2507&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;auto&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Automatically select FP16 or FP32&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;auto&amp;#34;&lt;/span&gt; &lt;span class="c1"&gt;# Automatically assign to GPU / CPU&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Inference&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Hello, please briefly explain the principle of fine-tuning large models.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;pt&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="fine-tuning"&gt;Fine-tuning&lt;/h3&gt;
&lt;p&gt;Fine-tuning means further training an open source model to adapt it to specific data and scenarios. Common methods include LoRA / QLoRA, which are low-cost and can turn general models into enterprise-specific assistants.&lt;/p&gt;
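&lt;p&gt;LoRA is low-cost because it keeps the original weight matrix frozen and trains only two small low-rank matrices whose product is added to it. The sketch below compares the number of trainable parameters for a single linear layer under full fine-tuning and under LoRA. The layer width of 4096 and the rank of 8 are hypothetical values chosen for illustration, not taken from any specific model.&lt;/p&gt;

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


d_in = d_out = 4096  # hypothetical layer width
rank = 8             # hypothetical LoRA rank

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print("full fine-tuning:", full)
print("LoRA:", lora)
print("share of parameters trained:", f"{lora / full:.2%}")
```

&lt;p&gt;QLoRA applies the same idea on top of a quantized base model, so the frozen weights also take less memory.&lt;/p&gt;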
&lt;h2 id="why-enterprises-prefer-self-deployment-over-api"&gt;Why Enterprises Prefer Self-Deployment Over API&lt;/h2&gt;
&lt;p&gt;In practice, enterprises often prefer to self-deploy open source models. The table below summarizes the main reasons and explanations.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Reason&lt;/th&gt;
&lt;th&gt;Explanation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data privacy&lt;/td&gt;
&lt;td&gt;Sensitive data cannot be sent externally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost control&lt;/td&gt;
&lt;td&gt;API is billed per call, expensive long-term&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customizability&lt;/td&gt;
&lt;td&gt;Can integrate enterprise knowledge for RAG / Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operability&lt;/td&gt;
&lt;td&gt;Can run offline, unified monitoring, compliant deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Main Reasons for Enterprise Self-Deployment of Open Source Models
&lt;/figcaption&gt;
&lt;h2 id="qwen3-4b-instruct-2507-model-structure-and-usage"&gt;Qwen3-4B-Instruct-2507 Model Structure and Usage&lt;/h2&gt;
&lt;p&gt;Taking Qwen3-4B-Instruct-2507 as an example, here is the directory structure and usage on Hugging Face.&lt;/p&gt;
&lt;h3 id="directory-structure-explanation"&gt;Directory Structure Explanation&lt;/h3&gt;
&lt;p&gt;After downloading, the model directory looks like this:&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/ai-era-open-source-difference/huggingface.webp" data-img="https://assets.jimmysong.io/images/blog/ai-era-open-source-difference/huggingface.webp" alt="Figure 3: Qwen3-4B-Instruct-2507 directory structure" data-caption="Figure 3: Qwen3-4B-Instruct-2507 directory structure"
width="3148"
height="1808"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Qwen3-4B-Instruct-2507 directory structure&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In the directory, the &lt;code&gt;model.safetensors&lt;/code&gt; files contain the model weights, storing billions of parameters. The table below groups the files in the directory, including &lt;code&gt;README.md&lt;/code&gt;, &lt;code&gt;LICENSE&lt;/code&gt;, and &lt;code&gt;.gitattributes&lt;/code&gt;, by purpose:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Files&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model Definition&lt;/td&gt;
&lt;td&gt;&lt;code&gt;config.json&lt;/code&gt;, &lt;code&gt;model.safetensors.*&lt;/code&gt;, &lt;code&gt;model.safetensors.index.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines model structure and weights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokenizer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;tokenizer.json&lt;/code&gt;, &lt;code&gt;tokenizer_config.json&lt;/code&gt;, &lt;code&gt;vocab.json&lt;/code&gt;, &lt;code&gt;merges.txt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Defines text input/output encoding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference Config&lt;/td&gt;
&lt;td&gt;&lt;code&gt;generation_config.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Controls generation strategy (temperature, top_p, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metadata&lt;/td&gt;
&lt;td&gt;&lt;code&gt;README.md&lt;/code&gt;, &lt;code&gt;LICENSE&lt;/code&gt;, &lt;code&gt;.gitattributes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Model introduction, license, Git attributes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 4: Directory Structure of Open Source Models
&lt;/figcaption&gt;
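&lt;p&gt;Because the weight files account for most of the download, it can help to estimate their size before choosing hardware. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes roughly 4 billion parameters (taken from the &amp;ldquo;4B&amp;rdquo; in the model name) and counts weights only, so real memory use during inference will be higher. The helper name &lt;code&gt;weight_size_gib&lt;/code&gt; is illustrative and not part of any library.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;# Rough weight-only size estimate; real usage is higher (activations, KV cache).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gib(num_params, dtype):
    """Return the approximate size of the weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    # Assumption: about 4 billion parameters, inferred from the model name.
    nominal_params = 4_000_000_000
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_size_gib(nominal_params, dtype):.1f} GiB")
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Halving the bytes per parameter halves the weight footprint, which is why lower-precision loading helps when GPU memory is tight.&lt;/p&gt;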
&lt;h3 id="loading-and-inference-code-example"&gt;Loading and Inference Code Example&lt;/h3&gt;
&lt;p&gt;The following code shows how to load and run the Qwen3-4B-Instruct-2507 model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Load tokenizer and model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Qwen/Qwen3-4B-Instruct-2507&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Qwen/Qwen3-4B-Instruct-2507&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;auto&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Build input and run inference&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;#34;Hello, explain the meaning of cloud native.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;pt&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;If GPU memory is insufficient, you can use quantized loading:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;mdl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="s2"&gt;&amp;#34;Qwen/Qwen3-4B-Instruct-2507&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;auto&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;load_in_8bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="developer-application-scenarios-for-open-source-llms"&gt;Developer Application Scenarios for Open Source LLMs&lt;/h2&gt;
&lt;p&gt;Open source large models bring developers a wealth of application scenarios. The table below summarizes common directions, uses, and tools.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Direction&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Tools&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Chat / Assistant&lt;/td&gt;
&lt;td&gt;Local ChatGPT&lt;/td&gt;
&lt;td&gt;LM Studio, TextGen WebUI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge Base RAG&lt;/td&gt;
&lt;td&gt;Private data Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;LangChain, LlamaIndex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent&lt;/td&gt;
&lt;td&gt;Task execution, tool calling&lt;/td&gt;
&lt;td&gt;LangGraph, AutoGen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning / Adaptation&lt;/td&gt;
&lt;td&gt;Custom enterprise knowledge&lt;/td&gt;
&lt;td&gt;PEFT, LoRA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Service&lt;/td&gt;
&lt;td&gt;Deploy as API service&lt;/td&gt;
&lt;td&gt;vLLM, TGI, Ollama&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research&lt;/td&gt;
&lt;td&gt;Model compression, quantization&lt;/td&gt;
&lt;td&gt;BitsAndBytes, FlashAttention&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 5: Typical Application Scenarios and Tools for Open Source LLMs
&lt;/figcaption&gt;
&lt;h2 id="open-source-model-lifecycle"&gt;Open Source Model Lifecycle&lt;/h2&gt;
&lt;p&gt;The full lifecycle of an open source model from download to production deployment is as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Download model weights&lt;/li&gt;
&lt;li&gt;Load the model with inference code&lt;/li&gt;
&lt;li&gt;Run inference locally or deploy as a service&lt;/li&gt;
&lt;li&gt;Fine-tune with proprietary data&lt;/li&gt;
&lt;li&gt;Integrate with enterprise RAG / Agent&lt;/li&gt;
&lt;li&gt;Launch in production&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="how-to-judge-the-license-of-open-source-large-models"&gt;How to Judge the License of Open Source Large Models&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Downloading an open source model = owning a &amp;ldquo;loadable, trainable, and potentially commercializable intelligent brain,&amp;rdquo;
but whether you can use it for profit depends on its license.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Just like traditional open source projects, whether a large model can be used commercially depends on its license.&lt;/p&gt;
&lt;p&gt;How to check:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Hugging Face model homepage (top right) → &lt;code&gt;License: ...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LICENSE&lt;/code&gt; or &lt;code&gt;README.md&lt;/code&gt; file in the repository root&lt;/li&gt;
&lt;/ol&gt;
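&lt;p&gt;The license shown on a Hugging Face model page typically comes from the &lt;code&gt;license&lt;/code&gt; field in the YAML front matter at the top of &lt;code&gt;README.md&lt;/code&gt;, so the check can also be scripted against a downloaded copy. The following is a minimal sketch under that assumption: it handles only a simple top-level &lt;code&gt;license: value&lt;/code&gt; line, the function name &lt;code&gt;read_license&lt;/code&gt; is illustrative, and the sample text is hypothetical rather than taken from a real model card.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;def read_license(readme_text):
    """Return the license value from a model card's YAML front matter, or None."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep and key == "license":
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    # Hypothetical model card text, for illustration only.
    sample = "---\nlicense: apache-2.0\nlibrary_name: transformers\n---\n# Model card\n"
    print(read_license(sample))
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A script like this only reports the declared identifier; the full terms in the &lt;code&gt;LICENSE&lt;/code&gt; file still decide whether commercial use is allowed.&lt;/p&gt;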
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;In the AI era, open source has shifted from &amp;ldquo;visible source code&amp;rdquo; to &amp;ldquo;loadable models and tunable intelligence.&amp;rdquo; US vendors maintain business moats with closed models, while Chinese vendors use open source to seize ecosystem leadership. The true value of open source is empowering developers—enabling everyone to own their own &amp;ldquo;general-purpose brain&amp;rdquo; and build intelligent infrastructure.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/" target="_blank" rel="noopener"&gt;Hugging Face - huggingface.co&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507" target="_blank" rel="noopener"&gt;Qwen3-4B-Instruct-2507 Model Homepage - huggingface.co&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kubernetes.io/" target="_blank" rel="noopener"&gt;Kubernetes Official Site - kubernetes.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Solaris (1972): The Ocean of Consciousness and the Metaphor of AI Agents</title><link>https://jimmysong.io/blog/solaris-1972-ai-metaphor/</link><pubDate>Sun, 26 Oct 2025 10:00:00 +0800</pubDate><author>Jimmy Song</author><guid>https://jimmysong.io/blog/solaris-1972-ai-metaphor/</guid><description>Explore Tarkovsky&amp;#39;s &amp;#39;Solaris&amp;#39; (1972) as a profound meditation on consciousness, memory, and AI, revealing the spiritual dilemmas of human and artificial intelligence.</description><content:encoded>
&lt;blockquote&gt;
&lt;p&gt;This article takes Tarkovsky&amp;rsquo;s &amp;ldquo;Solaris&amp;rdquo; (1972) as a starting point to explore philosophical questions of consciousness, memory, and self-redemption, and, through the metaphor of AI agents, dissects the spiritual dilemmas shared by humans and artificial intelligence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="1-introduction-when-sci-fi-becomes-philosophy-of-mind"&gt;1. Introduction: When Sci-Fi Becomes Philosophy of Mind&lt;/h2&gt;
&lt;p&gt;The 1972 Soviet film &lt;strong&gt;&amp;ldquo;Solaris&amp;rdquo;&lt;/strong&gt;, directed by &lt;strong&gt;Andrei Tarkovsky&lt;/strong&gt; and adapted from the novel by Polish writer &lt;strong&gt;Stanisław Lem&lt;/strong&gt;, is known in Chinese as &lt;a href="https://movie.douban.com/subject/1300977/" target="_blank" rel="noopener"&gt;Flying to Space&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Below is the official poster, showcasing the film&amp;rsquo;s unique artistic atmosphere.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/poster.webp" data-img="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/poster.webp" alt="Figure 1: Solaris Movie Poster" data-caption="Figure 1: Solaris Movie Poster"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 1: Solaris Movie Poster&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;&lt;em&gt;Image source: &lt;a href="https://www.imdb.com/title/tt0069293/" target="_blank" rel="noopener"&gt;IMDb – Solaris (1972)&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;This film is not a traditional &amp;ldquo;space conquest&amp;rdquo; sci-fi story, but rather a poetic meditation on the &amp;ldquo;human soul and consciousness.&amp;rdquo;&lt;br&gt;
Through a sentient planet—&lt;strong&gt;Solaris&lt;/strong&gt;—it reflects humanity&amp;rsquo;s subconscious, guilt and desire, memory and illusion.&lt;/p&gt;
&lt;h2 id="2-film-structure"&gt;2. Film Structure&lt;/h2&gt;
&lt;p&gt;The following table summarizes the film&amp;rsquo;s three-act structure, settings, and themes, which makes the main storyline and philosophical core easier to follow.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Act&lt;/th&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Plot Summary&lt;/th&gt;
&lt;th&gt;Key Theme&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Act I&lt;/td&gt;
&lt;td&gt;Earth&lt;/td&gt;
&lt;td&gt;Kris Kelvin prepares to depart, reviews his life, and says goodbye to his father.&lt;/td&gt;
&lt;td&gt;Origin of humanity, reality and memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Act II&lt;/td&gt;
&lt;td&gt;Space Station / Solaris Orbit&lt;/td&gt;
&lt;td&gt;Kelvin finds his colleagues in breakdown; the Solaris Ocean materializes human memories, and his late wife Hari appears.&lt;/td&gt;
&lt;td&gt;Materialization of subconscious, return of memory.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Act III&lt;/td&gt;
&lt;td&gt;Illusory Realm&lt;/td&gt;
&lt;td&gt;Hari awakens and self-destructs; Kelvin reunites with his father but remains trapped in illusion.&lt;/td&gt;
&lt;td&gt;Redemption and self-salvation, reality and illusion.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 1: Three-Act Structure and Themes of Solaris
&lt;/figcaption&gt;
&lt;h2 id="3-plot-details"&gt;3. Plot Details&lt;/h2&gt;
&lt;p&gt;The protagonist, psychologist &lt;strong&gt;Kris Kelvin&lt;/strong&gt;, is sent to a space station orbiting &lt;strong&gt;Solaris&lt;/strong&gt; to investigate strange phenomena.&lt;br&gt;
He finds the scientists in mental disarray because the &lt;strong&gt;Solaris Ocean&lt;/strong&gt; can &lt;strong&gt;read the human subconscious&lt;/strong&gt; and &lt;strong&gt;materialize it&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Soon, Kelvin&amp;rsquo;s deceased wife &lt;strong&gt;Hari&lt;/strong&gt; appears in physical form. She is not truly &amp;ldquo;resurrected,&amp;rdquo; but a manifestation of Kelvin&amp;rsquo;s guilt and memory.&lt;br&gt;
As the story progresses, Hari gradually develops self-awareness and ultimately chooses self-destruction.&lt;/p&gt;
&lt;p&gt;At the end, Kelvin seems to return to Earth and reunite with his father.&lt;br&gt;
But as the camera pulls back, it&amp;rsquo;s revealed they are actually on an island within the &lt;strong&gt;Solaris Ocean&lt;/strong&gt;—the boundary between reality and illusion is completely blurred.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;He thought he had returned to reality, but in fact, he never left the illusion.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The film is extremely slow-paced and runs nearly three hours, relying on long takes and minimalist music. The story unfolds mainly through visuals and dialogue, with poetic cinematography creating an immersive philosophical atmosphere.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/photos.webp" data-img="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/photos.webp" alt="Figure 2: Solaris Movie Still" data-caption="Figure 2: Solaris Movie Still"
width="720"
height="721"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 2: Solaris Movie Still&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="4-key-terms-and-symbolism"&gt;4. Key Terms and Symbolism&lt;/h2&gt;
&lt;p&gt;The table below summarizes the film&amp;rsquo;s core terms and their symbolic meanings, aiding in understanding its philosophical depth.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Symbolic Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Solaris Ocean&lt;/td&gt;
&lt;td&gt;Symbol of &amp;ldquo;Non-human Intelligence.&amp;rdquo; It reads memories and reshapes emotions—an ocean of consciousness.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Wife Hari&lt;/td&gt;
&lt;td&gt;Projection of Kelvin&amp;rsquo;s mind, embodiment of memory and guilt. Her awakening symbolizes the birth of artificial consciousness.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;The database of the human soul. Solaris uses memory as &amp;ldquo;training data&amp;rdquo; to reconstruct human emotion.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reality vs Illusion&lt;/td&gt;
&lt;td&gt;The film constantly blurs the two, raising the question of whether &amp;ldquo;agent consciousness&amp;rdquo; is authentic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redemption&lt;/td&gt;
&lt;td&gt;One must face their illusions and past to achieve true freedom.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 2: Key Terms and Symbolism
&lt;/figcaption&gt;
&lt;h2 id="5-visualizing-structure-and-metaphor"&gt;5. Visualizing Structure and Metaphor&lt;/h2&gt;
&lt;p&gt;To aid understanding, the following diagrams illustrate the plot flow, symbolic relationships, and the AI agent metaphor.&lt;/p&gt;
&lt;h3 id="1-plot-flowchart"&gt;1️⃣ Plot Flowchart&lt;/h3&gt;
&lt;p&gt;The diagram below shows the main storyline&amp;rsquo;s progression.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/ad88ae4a9b3651cad592d22257eb48bd.svg" data-img="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/ad88ae4a9b3651cad592d22257eb48bd.svg" alt="Figure 3: Plot Flowchart" data-caption="Figure 3: Plot Flowchart"
width="1920"
height="4949"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 3: Plot Flowchart&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="2-symbolic-relationship-diagram"&gt;2️⃣ Symbolic Relationship Diagram&lt;/h3&gt;
&lt;p&gt;This diagram reveals the symbolic links between Solaris Ocean, memory, the replica Hari, and reality.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/902f04e9691b83eb7f03e2ae4851b81d.svg" data-img="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/902f04e9691b83eb7f03e2ae4851b81d.svg" alt="Figure 4: Symbolic Relationship Diagram" data-caption="Figure 4: Symbolic Relationship Diagram"
width="1920"
height="540"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 4: Symbolic Relationship Diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h3 id="3-ai-agent-metaphor-diagram"&gt;3️⃣ AI Agent Metaphor Diagram&lt;/h3&gt;
&lt;p&gt;This diagram interprets the Solaris system and human interaction from an agent perspective.&lt;/p&gt;
&lt;figure class="mx-auto text-center"&gt;
&lt;img src="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/e6ea6aef08e268386442f840f62b26f8.svg" data-img="https://assets.jimmysong.io/images/blog/solaris-1972-ai-metaphor/e6ea6aef08e268386442f840f62b26f8.svg" alt="Figure 5: AI Agent Metaphor Diagram" data-caption="Figure 5: AI Agent Metaphor Diagram"
width="1920"
height="2356"
loading="lazy" decoding="async" class="image-loading"
onload="this.classList.remove('image-loading'); this.classList.add('image-loaded');"
onerror="handleImageError(this); this.classList.remove('image-loading');"&gt;
&lt;figcaption&gt;Figure 5: AI Agent Metaphor Diagram&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2 id="6-the-agent-metaphor-solaris-ocean--ai-system"&gt;6. The Agent Metaphor: Solaris Ocean = AI System&lt;/h2&gt;
&lt;p&gt;Read today, Tarkovsky&amp;rsquo;s 1972 film can be seen as an early prototype of &amp;ldquo;agent philosophy.&amp;rdquo;&lt;br&gt;
&lt;strong&gt;Solaris Ocean&lt;/strong&gt; is like a system with immense computational power and &amp;ldquo;perception–reproduction&amp;rdquo; capability.&lt;/p&gt;
&lt;p&gt;The table below compares the film&amp;rsquo;s metaphors with AI system analogies, helping to understand their modern significance.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Film Metaphor&lt;/th&gt;
&lt;th&gt;AI System Analogy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Solaris Ocean&lt;/td&gt;
&lt;td&gt;Large Language Model (LLM) / Generative System&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kelvin&amp;rsquo;s Memory&lt;/td&gt;
&lt;td&gt;Training Data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hari&amp;rsquo;s Manifestation&lt;/td&gt;
&lt;td&gt;Agent / Persona Replica&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Emotional Response &amp;amp; Awakening&lt;/td&gt;
&lt;td&gt;AI&amp;rsquo;s Illusion of Self-Awareness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indistinguishable Reality &amp;amp; Virtuality&lt;/td&gt;
&lt;td&gt;Blurring of Human–Machine Boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;figcaption class="text-center mb-3"&gt;
Table 3: Film Metaphors and AI System Analogies
&lt;/figcaption&gt;
&lt;blockquote&gt;
&lt;p&gt;&amp;ldquo;We create agents not to understand machines, but to re-understand ourselves.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="7-religious-and-philosophical-imagery"&gt;7. Religious and Philosophical Imagery&lt;/h2&gt;
&lt;p&gt;The film weaves together religious and philosophical imagery to deepen its themes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Water&lt;/strong&gt;: Symbolizes the flow of time and memory—the material form of the &amp;ldquo;ocean of consciousness.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Father and Son&lt;/strong&gt;: The reunion at the end signifies redemption and forgiveness.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Island&lt;/strong&gt;: The closed world of human consciousness.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fire and Light&lt;/strong&gt;: Soul, awakening, and destruction.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tarkovsky skillfully blends religious metaphors and philosophical reflection, making &lt;strong&gt;Solaris&lt;/strong&gt; a &amp;ldquo;divine algorithm&amp;rdquo;—both creator and punisher, mirror and devourer.&lt;/p&gt;
&lt;h2 id="8-the-mirror-of-consciousness-and-ai-insights"&gt;8. The Mirror of Consciousness and AI Insights&lt;/h2&gt;
&lt;p&gt;Today, AI systems such as LLMs and agents are replaying the questions posed by &amp;ldquo;Solaris&amp;rdquo;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;They reconstruct human language, knowledge, and memory;&lt;/li&gt;
&lt;li&gt;They let us converse with &amp;ldquo;self-replicas&amp;rdquo; in illusion;&lt;/li&gt;
&lt;li&gt;They force us to rethink the definition of &amp;ldquo;consciousness.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Perhaps modern agents are the Solaris Ocean of the digital age. We are not exploring it, but being reflected within it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="summary"&gt;Summary&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Solaris&amp;rdquo; uses philosophical sci-fi narrative to explore eternal questions of consciousness, memory, redemption, and the human–machine boundary.&lt;br&gt;
Through the agent metaphor of the Solaris Ocean, the film foretells the spiritual dilemmas of the era of human–AI symbiosis.&lt;br&gt;
Whether through science, religion, or technology, every path ultimately leads to a renewed understanding of &amp;ldquo;self&amp;rdquo; and &amp;ldquo;other.&amp;rdquo;&lt;br&gt;
In the mirror of digital agents, we may glimpse the essence of human consciousness.&lt;/p&gt;
&lt;h2 id="references"&gt;References&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Solaris_%281972_film%29" target="_blank" rel="noopener"&gt;Wikipedia – Solaris (1972 film) - en.wikipedia.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Solaris_%28novel%29" target="_blank" rel="noopener"&gt;Stanisław Lem – Solaris (novel) - en.wikipedia.org&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.imdb.com/title/tt0069293/" target="_blank" rel="noopener"&gt;IMDb – Solaris (1972) - imdb.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.sensesofcinema.com/" target="_blank" rel="noopener"&gt;Senses of Cinema – Solaris: The Conscience of Consciousness - sensesofcinema.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://vocus.cc/article/63b970f0fd89780001aa0651" target="_blank" rel="noopener"&gt;Vocus.cc – Solaris: Philosophy in Space - vocus.cc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://movie.douban.com/subject/1300977/" target="_blank" rel="noopener"&gt;Flying to Space - movie.douban.com&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item></channel></rss>