System Diagnosis Principles: Criteria for Health Status

To maintain the long-term healthy evolution of AI infrastructure, post-mortem summaries are far from sufficient. We need a set of system diagnosis principles to detect hidden risks early and correct deviations.

Based on the Yin-Yang Five Elements Yun model, diagnosis can be conducted along the following five dimensions:

Five-Dimensional Diagnosis Framework

Figure 1: Five-Dimensional Diagnosis Framework Diagram

Five Elements Balance Check

Assess the current status of five aspects: Data (Water), Models (Wood), Compute (Fire), Platform (Earth), and Hardware (Metal).

Diagnosis Method

Checklist:

  • Can data pipelines keep up with demands? (Water)
  • Are model capabilities fully utilized? (Wood)
  • Are compute resources effectively used? (Fire)
  • Can the platform support current load? (Earth)
  • Is hardware becoming a bottleneck? (Metal)

Identify Problems

Problem Type | Manifestation | Solution
Short Board | One element significantly weaker than others | Prioritize strengthening that element
Overload | One element consumes excessive resources or frequently becomes a bottleneck | Introduce limits or expand other elements to share pressure
Table 1: Problem Types and Solutions

Typical Symptoms

  • Water Level Too Low: Data pipelines always lag behind training needs → Replenish data processing capacity
  • Metal Overload: Hardware often runs at full capacity or even triggers limit alarms → Expand capacity or impose constraints on upper layers

Most failures stem not from missing components but from a long-term imbalance among the roles those components play.
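The Short Board and Overload checks from Table 1 can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the 0-to-1 health scores, the utilization figures, and the `gap` and `limit` thresholds are all hypothetical values chosen for the example.

```python
from statistics import mean

# The five aspects named in the text; the lowercase keys are illustrative.
ELEMENTS = ("water", "wood", "fire", "earth", "metal")


def find_short_boards(scores, gap=0.25):
    """Return elements whose score trails the mean of the other four by at least `gap`.

    `scores` maps each element to a hypothetical health score in [0, 1].
    """
    flagged = []
    for name in ELEMENTS:
        others = [scores[e] for e in ELEMENTS if e != name]
        if mean(others) - scores[name] >= gap:
            flagged.append(name)
    return flagged


def find_overloads(utilization, limit=0.9):
    """Return elements whose utilization has reached the (assumed) overload limit."""
    return [e for e in ELEMENTS if utilization[e] >= limit]


# Made-up numbers mirroring the two typical symptoms:
# a lagging data pipeline (Water) and saturated hardware (Metal).
scores = {"water": 0.4, "wood": 0.8, "fire": 0.75, "earth": 0.8, "metal": 0.85}
utilization = {"water": 0.5, "wood": 0.6, "fire": 0.7, "earth": 0.65, "metal": 0.97}

short_boards = find_short_boards(scores)
overloads = find_overloads(utilization)
print("short boards:", short_boards)
print("overloads:", overloads)
```

Comparing each element against the mean of the other four, rather than against a fixed bar, reflects the idea that the problem is relative imbalance and not an absolute shortfall.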

Qi Flow Smoothness Check

Use end-to-end (full-link) monitoring to analyze whether Qi flows smoothly through the system.

Diagnosis Method

Key Metrics:

  • Latency distribution of key processes
  • Queue backlogs
  • Resource utilization curves

Qi Smooth vs. Qi Not Smooth

State | Characteristics
Qi Smooth | Processing rates across stages basically match, without long-term backlogs or idle resources
Qi Not Smooth | One stage remains a bottleneck for long periods, or large amounts of resources sit idle
Table 2: Qi Flow: Smooth vs Obstructed

Diagnosis Points

Distinguish temporary fluctuations from persistent trends: a brief peak does not necessarily indicate Qi blockage, but a persistent deviation must be addressed.
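One simple way to separate a brief peak from a persistent deviation is to look at how long a metric stays above its threshold. The sketch below assumes a queue-backlog series sampled at regular intervals; the function name, the threshold, and the `min_run` parameter are illustrative choices, not part of the model.

```python
def classify_backlog(samples, threshold, min_run):
    """Classify a metric series by its longest consecutive run above `threshold`.

    Returns "smooth" if the threshold is never exceeded, "temporary" if it is
    exceeded only in runs shorter than `min_run` samples, and "persistent"
    otherwise.
    """
    longest = run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        longest = max(longest, run)
    if longest == 0:
        return "smooth"
    return "persistent" if longest >= min_run else "temporary"


# Hypothetical queue depths for one pipeline stage.
spike = [10, 12, 95, 11, 9, 10]        # a single brief peak
blocked = [10, 60, 70, 85, 90, 120]    # backlog keeps building
print(classify_backlog(spike, threshold=50, min_run=3))
print(classify_backlog(blocked, threshold=50, min_run=3))
```

In practice the threshold and run length would be tuned per stage, since a run that is alarming for an online service may be routine for a batch pipeline.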

Tool Support:

  • Dashboards and automated alerts
  • Timely location of “stagnant Qi” points
  • Follow-up investigation of the cause (which Five Elements imbalance it corresponds to)

Yin-Yang Dynamics Check

Assess whether the current strategy and state lean toward Yang Excess with Yin Deficiency, or toward Yin Excess with Yang Deficiency.

Diagnosis Method

Qualitative Analysis:

  • Do recent architecture decisions overly favor one extreme?
  • Have you been continuously expanding and adding new features while ignoring stability?
  • Or, conversely, are there multiple layers of approval and strict constraints but little innovation momentum?

Quantitative Metrics:

Metric | Yang Excess | Yin Excess
Change Frequency | Extremely high | Extremely low
Incident Rate | Frequent | Extremely low, but nothing changes
Release Rhythm | Continuous | Long-term stagnation
Table 3: Yin-Yang Status

Balance Strategy

State | Symptoms | Solution
Yang Excess Yin Deficiency | Frequent changes with frequent incidents | Pause releases, focus on addressing hazards (replenish Yin)
Yin Excess Yang Deficiency | Long-term no change and stagnation | Introduce challenges and innovation (add Yang)
Table 4: Balance Strategies
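The quantitative metrics in Table 3 can be turned into a rough classifier. The sketch below is only illustrative: the `PeriodStats` fields and every threshold are assumptions, and real cutoffs would depend on team size and release cadence.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical counts collected over one review period."""
    changes: int     # deployments or configuration changes
    incidents: int   # production incidents
    releases: int    # user-visible releases


def yin_yang_state(stats, high_changes=50, low_changes=2, high_incidents=5):
    """Map change, incident, and release counts to a coarse Yin-Yang state.

    All thresholds are illustrative defaults, not recommended values.
    """
    if stats.changes >= high_changes and stats.incidents >= high_incidents:
        return "yang_excess"   # frequent changes with frequent incidents
    if stats.changes <= low_changes and stats.releases == 0:
        return "yin_excess"    # long-term no change and stagnation
    return "balanced"


print(yin_yang_state(PeriodStats(changes=80, incidents=9, releases=20)))
print(yin_yang_state(PeriodStats(changes=1, incidents=0, releases=0)))
```

A classifier like this is best treated as a prompt for the qualitative questions above rather than as a verdict on its own.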

Yun Alignment Check

Determine whether the organization’s actions match the system’s current stage, preventing counter-Yun operation.

Diagnosis Method

Consider business development and technical maturity together, and watch for these error patterns:

Error Pattern | Manifestation | Consequences
Premature Standardization | Spending a great deal of effort on process management and cost optimization for emerging projects | These are typically scale-stage concerns, but the project is still in the exploration stage
Counter-Yun Exploration | Frequently changing the underlying architecture of widely used platforms without rigorous testing | Inconsistent with the scale stage
Table 5: Error Patterns

Stage-Strategy Reference Table

Stage | Should Focus On | Should Not Do
Exploration Stage | Diversity, flexibility, rapid trial and error | Premature pursuit of efficiency
Platform Stage | Standardization, process norms | Frequent arbitrary changes
Scale Stage | Optimization, stability, efficiency | Still growing wildly
Rebalancing Stage | Transformation, breakthrough, innovation | Clinging to the past
Table 6: Stage-Strategy Mapping

Checklist:

  • Which stage are we currently in?
  • Do our actions match the stage?
  • Do we need to adjust strategy?

When you find that actions do not match the stage, adjust the strategy immediately to avoid working at cross-purposes.
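The stage-strategy mapping in Table 6 lends itself to a small lookup that flags counter-Yun activities. In this sketch the stage keys and activity tags (such as `efficiency_optimization`) are hypothetical labels invented to encode the table; a team would substitute its own vocabulary.

```python
# Table 6 encoded with illustrative tags; the tag names are assumptions.
STAGE_POLICY = {
    "exploration": {
        "focus": {"diversity", "flexibility", "rapid_trial"},
        "avoid": {"efficiency_optimization"},
    },
    "platform": {
        "focus": {"standardization", "process_norms"},
        "avoid": {"arbitrary_change"},
    },
    "scale": {
        "focus": {"optimization", "stability", "efficiency"},
        "avoid": {"unchecked_growth"},
    },
    "rebalancing": {
        "focus": {"transformation", "breakthrough", "innovation"},
        "avoid": {"clinging_to_past"},
    },
}


def counter_yun(stage, activities):
    """Return the activities that the given stage says should not be done."""
    avoid = STAGE_POLICY[stage]["avoid"]
    return [a for a in activities if a in avoid]


# An emerging project that is already optimizing for efficiency.
print(counter_yun("exploration", ["rapid_trial", "efficiency_optimization"]))
```

The lookup only answers the second checklist question; deciding which stage the system is in remains a judgment call.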

Yang Runaway Warning

Pay special attention to whether there are signs of Yang state runaway in the system.

What is Yang Runaway?

Exponential explosion or collapse risk caused by unconstrained positive feedback.

Typical Scenarios

Scenario | Mechanism | Risk
Service Call Volume Surge | Bug or abuse → Resource strain → Queuing and retry storms → Further increase in calls | Resource exhaustion
Training Task Self-Replication | Tasks replicate themselves without limit to speed up → Cluster resource exhaustion | System collapse
Table 7: Typical Scenarios

Diagnosis Signals

  • A metric grows exponentially
  • No mechanism exists to slow the growth
  • A vicious cycle has formed
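The first signal can be approximated by checking whether a metric keeps multiplying from one sample to the next. The sketch below uses a deliberately crude test on consecutive ratios; `min_ratio` and `min_steps` are assumed parameters, and a production detector would likely need smoothing to cope with noisy data.

```python
def looks_exponential(series, min_ratio=1.5, min_steps=4):
    """Return True if each of the last `min_steps` steps grew by at least `min_ratio`x.

    Sustained multiplicative growth is the signature of unconstrained positive
    feedback; steady linear growth fails the test because its ratios shrink.
    """
    if len(series) < min_steps + 1:
        return False
    tail = series[-(min_steps + 1):]
    if any(value <= 0 for value in tail[:-1]):
        return False  # ratios are undefined for non-positive values
    return all(b / a >= min_ratio for a, b in zip(tail, tail[1:]))


# Hypothetical calls-per-minute samples.
retry_storm = [100, 110, 220, 450, 900, 1900]
steady_growth = [100, 200, 300, 400, 500, 600]
print(looks_exponential(retry_storm))
print(looks_exponential(steady_growth))
```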

Response Strategy

Strategy | Means | Effect
Establish Hard Limits | Metal’s constraints | Immediate shutdown
Introduce Negative Feedback | Earth’s governance (rate limiting, quotas) | Braking and deceleration
Break Positive Feedback Chain | Activate emergency plan | Pull back to steady state
Table 8: Response Strategies
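Rate limiting is one common form of the negative feedback listed in Table 8. The following token bucket is a minimal sketch of that idea, with an explicit `now` argument so the behavior is easy to follow; the class name and parameters are illustrative, and the text does not prescribe any particular limiter.

```python
class TokenBucket:
    """A minimal token-bucket rate limiter (illustrative, not thread-safe)."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # hard upper bound on burst size
        self.tokens = capacity
        self.last = now

    def allow(self, now):
        """Refill according to elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A burst of four retries at t=0 against a bucket that holds two tokens.
bucket = TokenBucket(rate=1.0, capacity=2)
burst = [bucket.allow(0.0) for _ in range(4)]
later = bucket.allow(1.0)  # one token has been refilled after one second
print(burst, later)
```

The capacity acts as a hard limit while the refill rate provides the braking effect: rejected retries cannot feed back into further load.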

When a metric shows exponential growth and no mechanism exists to slow it down, intervene immediately.

Diagnosis Implementation Process

Regular Diagnosis Mechanism

We recommend establishing a periodic diagnosis process:

Figure 2: Regular Diagnosis Mechanism Flowchart

Diagnosis Meeting Agenda

A fixed session in the weekly operations review meeting:

  • Check Five Elements scores for each module
  • Browse global Qi flow diagram
  • Analyze Yin-Yang dynamics
  • Discuss current Yun

This systematic examination leaves hidden risks nowhere to hide, so that problems can be prevented before they occur.

Diagnosis Action Matrix

Diagnosis Result | Action Recommendation
Five Elements: One Element Too Weak | Concentrate resources to strengthen the weakness
Five Elements: One Element Overloaded | Expand capacity or introduce constraints
Qi Stagnation at One Stage | Clear bottlenecks, optimize processes
Yang Excess Yin Deficiency | Strengthen governance and stability mechanisms
Yin Excess Yang Deficiency | Activate innovation and boost vitality
Counter-Yun Operation | Adjust strategy and go with the flow
Yang Runaway Warning | Immediate intervention, break positive feedback
Table 9: Diagnosis Action Matrix
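The action matrix can be kept as a plain lookup so that review findings translate directly into recommendations. In this sketch the `Finding` enum and the `recommend` helper are hypothetical names, and placing Yang Runaway first is an assumption drawn from the instruction to intervene immediately, not an ordering the matrix itself defines.

```python
from enum import Enum


class Finding(Enum):
    ELEMENT_TOO_WEAK = "Five Elements: One Element Too Weak"
    ELEMENT_OVERLOADED = "Five Elements: One Element Overloaded"
    QI_STAGNATION = "Qi Stagnation at One Stage"
    YANG_EXCESS = "Yang Excess Yin Deficiency"
    YIN_EXCESS = "Yin Excess Yang Deficiency"
    COUNTER_YUN = "Counter-Yun Operation"
    YANG_RUNAWAY = "Yang Runaway Warning"


# Table 9 as a dictionary from finding to recommended action.
ACTIONS = {
    Finding.ELEMENT_TOO_WEAK: "Concentrate resources to strengthen the weakness",
    Finding.ELEMENT_OVERLOADED: "Expand capacity or introduce constraints",
    Finding.QI_STAGNATION: "Clear bottlenecks, optimize processes",
    Finding.YANG_EXCESS: "Strengthen governance and stability mechanisms",
    Finding.YIN_EXCESS: "Activate innovation and boost vitality",
    Finding.COUNTER_YUN: "Adjust strategy and go with the flow",
    Finding.YANG_RUNAWAY: "Immediate intervention, break positive feedback",
}


def recommend(findings):
    """Return (finding, action) pairs, deduplicated, with Yang Runaway first."""
    ordered = sorted(
        set(findings),
        key=lambda f: (f is not Finding.YANG_RUNAWAY, f.name),
    )
    return [(f.value, ACTIONS[f]) for f in ordered]


plan = recommend([Finding.QI_STAGNATION, Finding.YANG_RUNAWAY, Finding.QI_STAGNATION])
for finding, action in plan:
    print(f"{finding}: {action}")
```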

Summary

With the diagnosis principles above, architects and operations teams can periodically take the pulse of their infrastructure, much as a Traditional Chinese Medicine (TCM) practitioner does in pulse diagnosis.

When diagnosis indicates an imbalance in some aspect, prescribe a remedy promptly based on the theory: replenish what needs replenishing, purge what needs purging.

Long-term adherence helps keep the system on a healthy evolutionary trajectory.

Created on Feb 10, 2026. Updated on Feb 10, 2026.