Site Glossary

A single-page index of cloud native and AI terminology with fast lookup and grouping.

386 Terms
25 Groups

Terms are the doorway into complex systems, not the end of learning.

This page consolidates the site’s key concepts and unique terminology for fast browsing and cross-reference.

A

A2A

Agent-to-Agent, the collaboration and communication pattern between agents.

A2L

Agent-to-LLM, the interaction between agents and language models.

A2T

Agent-to-Tool, the ability of an agent to call external tools.

AARRR

Acquisition-Activation-Retention-Revenue-Referral funnel model.

Agent

An entity or software component that perceives its environment and acts to achieve goals.

Agentic AI

AI that can plan, act, and use tools autonomously.

Agentic Runtime

The execution environment that supports agents.

Agentic web

A web of interacting agents.

AGI

Artificial General Intelligence, a type of artificial intelligence (AI) that matches or exceeds human capabilities across a wide range of cognitive tasks.

AI Agent

An autonomous entity that perceives and acts to achieve goals.

Ambient Mesh

A sidecar-less service mesh mode in Istio that implements traffic management through node-level ztunnel proxies for L4 and optional waypoint proxies for L7.

Apache-2.0

Permissive license with patent grant and notice requirements.

API Gateway

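A server that acts as an API front-end, receiving API requests, enforcing throttling and security policies, passing requests to the back-end service, and then returning the response to the requester.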

At-least-once

Delivery semantics in which each message is delivered one or more times, so duplicates are possible; consumers are often made idempotent to tolerate them.

Attention Mechanism

A mechanism that allows models to focus on important parts of input data, improving model performance.

Attention Visualization

A technique for visualizing model attention distributions to understand model focus.

Authentication

The process of verifying the identity of a user or system, typically via credentials such as passwords, tokens, or certificates.

Authorization

The function of specifying access rights/privileges to resources.

B

Backpropagation

A widely used algorithm for training feedforward neural networks.

Batch Size

The number of samples used in one training iteration, affecting training speed, memory usage, and model performance.

Beachhead Market

Initial niche to win for references and momentum.

Blameless Postmortem

Post-incident review focused on system fixes, not blame.

Blockchain

A system in which a record of transactions, such as those made in bitcoin or another cryptocurrency, is maintained across several computers linked in a peer-to-peer network.

Blue-Green Deployment

A zero-downtime deployment strategy using two environments, enabling rapid traffic switching.

BM25

A classic ranking function used to evaluate document relevance to queries.

Bowling Alley

Stage of expanding from one niche to adjacent segments.

Burn Rate

The rate at which the error budget is consumed relative to the SLO, used for alerting and release decisions.

C

CAC

Average cost to acquire a customer, used for channel efficiency.

Canary Deployment

A deployment strategy that gradually directs traffic to the new version, reducing risk and allowing problems to be detected and rolled back quickly.

CANN

Huawei Ascend's heterogeneous computing architecture (Compute Architecture for Neural Networks), providing neural network computing engines and libraries for Ascend AI processors.

Canonical URL

The preferred URL of a page, declared (e.g. with rel="canonical") to avoid duplicate content and split ranking signals.

CAP

Tradeoff: consistency, availability, partition tolerance cannot all be met.

CDC

Change Data Capture; captures database changes for downstream sync and streaming pipelines.

CDN

Content Delivery Network.

Certificate Authority

An organization that issues and manages digital certificates, responsible for verifying the identities of the entities the certificates are issued to.

cgroup

A mechanism for limiting, accounting for, and isolating the resource usage of a process group.

Chain of Thought

A prompting technique that enables large language models to solve complex tasks by generating a series of intermediate reasoning steps.

Chaos Engineering

An engineering method that improves system resilience by actively injecting failures, helping to uncover weaknesses before they cause outages in production.

Checkpoint

A snapshot of the model training state, used to resume after a training interruption or to reload the model later for evaluation or deployment.

Chiplet

A design approach that decomposes large chips into multiple smaller dies (chiplets), connected through high-speed interconnects in advanced packaging.

CI/CD

Continuous Integration and Continuous Deployment/Delivery.

Circuit Breaker

A design pattern used in software development to detect failures and encapsulate the logic of preventing a failure from constantly recurring.
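
A minimal Python sketch of the pattern, assuming a single-threaded caller and an injectable clock so it can be tested; `CircuitBreaker`, `failure_threshold`, and `reset_timeout` are hypothetical names for illustration. After enough consecutive failures the breaker opens and fails fast. Once the timeout passes, one trial call is allowed through (half-open), and a success closes the circuit again.

```python
import time


class CircuitBreaker:
    """Minimal breaker: closed -> open after repeated failures -> half-open trial."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of hitting the struggling dependency.
                raise RuntimeError("circuit open: failing fast")
            # Timeout elapsed: half-open, so this call is a trial.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]  # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: now[0])


def flaky():
    raise ConnectionError("backend down")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)  # rejected without calling flaky
except RuntimeError as err:
    print(err)  # circuit open: failing fast

now[0] = 11.0
print(breaker.call(lambda: "ok"))  # ok
```

Production breakers usually add thread safety, a limit on concurrent half-open trials, and failure-rate windows instead of a plain counter.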

CLA

Contributor License Agreement; clarifies the copyright permissions a contributor grants to a project.

CLIP

Contrastive Language-Image Pre-training, a model that connects text and images.

CLS

Cumulative Layout Shift; measures visual stability.

CNCF

Cloud Native Computing Foundation, a sub-foundation of the Linux Foundation.

CNI

Container Network Interface, a project to write specifications and libraries for configuring network interfaces in Linux containers.

CNN

Convolutional Neural Network, a class of deep neural networks most commonly applied to analyzing visual imagery.

Cohort Analysis

Cohort-based analysis of retention and behavior changes.

ColBERT

A token-level vector retrieval method that retains fine-grained matching information.

ConfigMap

A Kubernetes resource for storing non-sensitive configuration data, separating configuration from container images.

Consistent Hashing

Sharding/cache strategy that minimizes remapping on node changes.
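
The remapping property can be sketched with a hash ring in Python. This is an illustration, not a reference implementation: using MD5 as the ring hash and 100 virtual nodes per server are arbitrary assumptions, and `HashRing` is a hypothetical class. Each key goes to the first virtual node clockwise from its hash, so removing a server only moves the keys that server owned.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (a minimal sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per physical node
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._points, h)
            self._owner[h] = node

    def remove_node(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self._points.remove(h)
            del self._owner[h]

    def get_node(self, key):
        # First point at or after the key's hash, wrapping around the ring.
        h = self._hash(key)
        idx = bisect.bisect_left(self._points, h) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("cache-b")
moved = [k for k in keys if ring.get_node(k) != before[k]]
# Only keys that lived on cache-b were remapped.
print(all(before[k] == "cache-b" for k in moved))  # True
```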

Container

A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

Context

The information that surrounds a piece of text and helps to determine its meaning.

Context Engineering

The practice of selecting, structuring, and managing the information placed in a model's context window.

Context Window

The maximum number of tokens a model can process at once, determining how much context the model can take into account.

Continuous Batching

A batching technique that dynamically merges requests at each generation step to improve GPU utilization, also known as iteration-level scheduling or in-flight batching.

Copyleft

License model requiring derivatives to remain open source.

Coreference Resolution

The task of identifying reference relationships in text, such as pointing 'he' to a specific person.

CoT

Chain of Thought, a prompting technique that enables large language models to solve complex tasks by breaking them down into intermediate reasoning steps.

Crawl Budget

Crawl quota that affects indexing and refresh rates.

CRD

Custom Resource Definition, a mechanism to extend the Kubernetes API.

CRDT

Conflict-free replicated data types with convergent merges.

CRI

Container Runtime Interface, a plugin interface that enables the kubelet to use a wide variety of container runtimes without recompiling cluster components.

Crossing the Chasm

Market gap between early adopters and early majority.

CSI

Container Storage Interface, a standard for exposing file and block storage systems to containerized workloads on container orchestration systems such as Kubernetes.

CUDA

NVIDIA's parallel computing platform and programming model that allows developers to use GPUs for general-purpose computing.

CV

Computer Vision, a field of artificial intelligence that trains computers to interpret and understand the visual world.

CWV

Core Web Vitals set measuring page experience.

Cloud Native

Application development and deployment methods that fully leverage the advantages of the cloud computing delivery model.

Containerization

The technology of packaging applications and their dependencies into containers.

Container Orchestration

Technology for automating the deployment, scaling, and connection of containers.

Continuous Delivery

A development practice that keeps code in a state ready to be deployed to production at any time.

Continuous Deployment

The practice of automatically deploying tested code changes to production.

Continuous Integration

A development practice of frequently integrating code changes into the main branch.

Canary Release

Gradually releasing a new version to a subset of users to verify its stability and performance.

Compute Unified Device Architecture

NVIDIA's parallel computing platform and programming model that allows developers to use GPUs for general-purpose computing.

Configuration Management

The process of managing system configurations, including creating, updating, and maintaining them in a consistent, known state.

Chain Invocation

A programming pattern that chains multiple operations or function calls together.

D

DaemonSet

A Kubernetes resource that ensures a copy of a Pod runs on all (or some) nodes, commonly used for system-level daemons such as log collectors and monitoring agents.

Data Lakehouse

Architecture combining data lake and warehouse capabilities.

DCO

Developer Certificate of Origin; a sign-off asserting the lawful origin of contributions.

Deep Learning

Part of a broader family of machine learning methods based on artificial neural networks with representation learning.

Dependency Parsing

The task of analyzing dependency relationships between words in sentences.

DestinationRule

Configuration in Istio that defines policies for service destinations, implementing load balancing, connection pool settings, and outlier detection.

Device Plugin

A plugin mechanism in Kubernetes for hardware device resource extension, supporting specialized hardware such as GPUs, FPGAs, and high-performance NICs.

DevOps

A set of practices that combines software development (Dev) and IT operations (Ops).

Differential Privacy

A statistical method that protects individual privacy by adding noise.

Diffusion Model

A generative model that produces data by gradually denoising.

Distillation

A technique that transfers knowledge from large models to small models, maintaining much of the performance while reducing model size and inference cost.

Distributed Tracing

A technology that tracks the propagation path of requests between microservices, used for performance analysis and troubleshooting.

DPO

Direct Preference Optimization; optimizes models directly from preference data without training a separate reward model.

DRA

Dynamic Resource Allocation, a mechanism that assigns compute resources on demand to workloads.

Dynamic Graph

A graph whose nodes or edges change over time to represent dynamic relationships.

Dropout

A regularization technique that randomly drops some neurons during training, preventing overfitting.

Daemon

A process that runs in the background and performs system-level tasks, usually starting automatically at system boot.

Decoder

The part of a neural network responsible for converting internal representations into output.

E

E-E-A-T

Google's search quality framework: Experience, Expertise, Authoritativeness, and Trustworthiness.

Early Adopters

Visionary users who try new tech and drive adoption.

Early Majority

Pragmatic users needing proven, reliable solutions.

eBPF

A technology with origins in the Linux kernel that can run sandboxed programs in a privileged context, such as the operating system kernel.

ELT

Pipeline that extracts, loads, then transforms in target system.

Embedding Model

A model that converts text into numerical vectors.

Entity Recognition

The task of identifying named entities from text, such as person names, place names, etc.

Epoch

One complete pass through the training dataset, a basic unit of model training.

Error Budget

The amount of unreliability an SLO permits (1 minus the SLO target), used for release decisions.
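
The arithmetic behind error budgets and burn rate can be sketched in a few lines of Python. The sketch assumes a request-based SLO and a 30-day window. The function names are hypothetical, and burn rate is taken as the observed error rate divided by the budgeted error rate, so a burn rate of 1.0 spends the budget exactly by the end of the window.

```python
def error_budget(slo_target):
    """Fraction of requests (or time) allowed to fail, e.g. 0.999 -> 0.001."""
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target, window_days=30):
    """Error budget expressed as minutes of full outage over the window."""
    return window_days * 24 * 60 * error_budget(slo_target)


def burn_rate(observed_error_rate, slo_target):
    """How fast the budget is being spent relative to the SLO."""
    return observed_error_rate / error_budget(slo_target)


def days_until_exhausted(rate, window_days=30):
    """At a constant burn rate, how long the full budget would last."""
    return window_days / rate


slo = 0.999
print(round(allowed_downtime_minutes(slo), 1))                   # 43.2
print(round(burn_rate(0.002, slo), 2))                           # 2.0
print(round(days_until_exhausted(burn_rate(0.002, slo)), 1))     # 15.0
```

A 99.9% SLO therefore allows about 43 minutes of full outage per 30 days. An error rate of 0.2% burns the budget twice as fast and would exhaust it in about 15 days.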

etcd

A strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines.

ETL

Pipeline that extracts, transforms, then loads data.

Exactly-once

Semantics where messages are processed exactly once.

Embedding

A representation method that maps discrete data (such as words) to a continuous vector space.

Elastic Scaling

The ability to automatically adjust resources based on load, including horizontal and vertical scaling.

Encoder

The part of a neural network responsible for converting input into an internal representation.

Edge Computing

A computing paradigm that performs computation at the network edge close to data sources, reducing latency and bandwidth usage.

F

Fault Tolerance

The property that enables a system to continue operating properly in the event of the failure of one or more of its components.

Feature Importance

A metric evaluating the contribution of each feature to model predictions.

Federated Learning

A privacy-preserving technique that trains models on distributed devices without sharing raw data.

Few-shot Learning

The ability to learn new tasks with only a few samples.

Fine-tuning

Additional training on top of a pre-trained model to adapt the model to specific tasks or domains.

Fixed-size Chunking

A chunking strategy that splits documents by fixed size, simple but may break semantics.

FOSS

Free and open source software emphasizing user freedom.

Function Calling

The mechanism for LLMs to call external functions, enabling integration with external systems.

Function as a Service

A cloud computing service model that allows running code without managing servers.

G

GAN

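Generative Adversarial Network, a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014, in which a generator network and a discriminator network are trained against each other.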

Gateway API

Next-gen Kubernetes gateway spec for L4/L7 ingress control.

GitOps

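An operational model that takes DevOps best practices used for application development, such as version control, collaboration, compliance, and CI/CD, and applies them to infrastructure automation.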

Goodput

Throughput counted only for requests that meet their SLOs, a metric that reflects a system's effective capacity better than raw throughput.

Gossip Protocol

Randomized state dissemination for membership and config propagation.

GPL

Strong copyleft license requiring derivatives to be open source.

GPUDirect

NVIDIA technology that allows GPUs to directly access network or storage device data, bypassing the CPU and system memory.

GPUDirect RDMA

Combining GPUDirect and RDMA technologies so that network adapters can read and write GPU memory directly, enabling high-speed data transfer between GPUs on different nodes.

Gradient

The direction that guides parameter updates in optimization algorithms, representing the direction of steepest increase of the loss function.

Gradient Descent

An optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient.
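
A minimal one-dimensional sketch in Python. The objective f(x) = (x − 3)², the learning rate of 0.1, and the step count are illustrative choices. Each update subtracts the learning rate times the gradient, so x moves toward the minimum at 3.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # 3.0
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step. A learning rate above 1.0 would make this particular example diverge.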

Grafana

An open-source visualization monitoring platform supporting multiple data sources and rich panel types.

Graph RAG

A RAG technique combined with knowledge graphs, providing more structured contextual information.

Growth Loop

Loop where output feeds the next growth cycle.

Guardrails

Constraint mechanisms that limit the output range of AI models, ensuring output meets expectations and safety requirements.

H

Hallucination

A phenomenon where a large language model generates content that is fluent but factually incorrect, nonsensical, or unsupported by its input.

HBM

High Bandwidth Memory, high-speed memory used in GPUs, providing higher bandwidth than traditional GDDR.

Helm

The package manager for Kubernetes.

Helm Chart

A package of templates and configuration used to define, install, and upgrade Kubernetes applications.

HNSW

Hierarchical Navigable Small World, an efficient vector indexing algorithm.

Homomorphic Encryption

An encryption method that allows computation directly on encrypted data.

Horizontal Pod Autoscaler

A mechanism that automatically adjusts the number of Pods based on load, achieving elastic scaling of workloads.

HPA

Automatically scales pod replicas based on metrics.
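
The core of the HPA algorithm, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Scaling is skipped when that ratio falls within a tolerance of 1.0, which defaults to 0.1. The Python sketch below covers only that rule. It ignores min/max replica bounds, stabilization windows, and missing metrics, and `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Core HPA scaling rule: ceil(current * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Skip scaling when the ratio is within the tolerance band around 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(3, current_metric=90, target_metric=60))  # 5
print(desired_replicas(4, current_metric=62, target_metric=60))  # 4
print(desired_replicas(4, current_metric=30, target_metric=60))  # 2
```

For example, 3 Pods at 90% CPU against a 60% target gives a ratio of 1.5, and ceil(4.5) = 5 replicas.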

Hreflang

Language/region tags to avoid incorrect search targeting.

Hybrid Search

A retrieval strategy combining keyword search and semantic search.

Health Check

A method for periodically checking whether an application or service is running normally.

High Bandwidth Memory

High-speed memory used in GPUs, providing higher bandwidth than traditional GDDR.

I

Idempotency

Property where repeated requests yield the same result.
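
A common way to get idempotency over at-least-once delivery is to have the client send an idempotency key with each request. This Python sketch stores the first result under that key and replays it on retries. `PaymentService` and `charge` are hypothetical, and the in-memory dict stands in for what would need to be durable, atomic storage in a real system.

```python
import uuid


class PaymentService:
    """Toy service that makes `charge` idempotent using client-supplied keys."""

    def __init__(self):
        self.total_charged = 0
        self._results = {}  # idempotency key -> result of the first call

    def charge(self, idempotency_key, amount):
        # A retry with the same key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged += amount
        result = {"charge_id": str(uuid.uuid4()), "amount": amount}
        self._results[idempotency_key] = result
        return result


svc = PaymentService()
first = svc.charge("order-42", 100)
retry = svc.charge("order-42", 100)  # e.g. the client retried after a timeout
print(first == retry, svc.total_charged)  # True 100
```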

Image-to-Text

The task of generating text descriptions based on images.

Inference

The process of using a trained machine learning model to make predictions.

Inference Engine

Software or hardware accelerators specifically designed for model inference, optimizing inference speed and efficiency.

InfiniBand

A high-performance computer network communication standard, providing high-bandwidth, low-latency interconnects for HPC and AI clusters.

Ingress

A Kubernetes API object that manages external access, providing HTTP and HTTPS routing rules.

Init Container

A helper container that runs before the main container starts, used for initializing configuration or performing setup tasks.

INP

Interaction to Next Paint; measures responsiveness.

Intent Detection

A classification task that identifies user query intent.

Infrastructure as Code

A method of managing and configuring infrastructure using code.

Infrastructure as a Service

Cloud computing services that provide virtualized computing resources.

J

JWT

JSON Web Token is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object.
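
The compact format is three base64url segments (header, payload, signature) joined by dots. This Python sketch signs and verifies an HS256 token using only the standard library, to show the structure. The helper names and demo secret are hypothetical. It skips claim validation such as `exp`, so real applications should use a vetted JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    # Restore the padding stripped during encoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build a compact JWT: base64url(header).base64url(payload).base64url(signature)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Check the HS256 signature and return the decoded payload claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Reject tokens that claim a different algorithm (e.g. "none").
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))


secret = b"demo-secret"  # hypothetical shared key
token = sign_hs256({"sub": "alice", "role": "admin"}, secret)
print(verify_hs256(token, secret))  # {'sub': 'alice', 'role': 'admin'}
```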

K

K8s

Common abbreviation for Kubernetes, derived from the 8 letters between 'K' and 's'.

Knowledge Distillation

A technique that transfers knowledge from large models to small models.

Knowledge Graph

A knowledge representation method that uses graph structures to represent entities and their

Knowledge Graph Embedding

A technique that maps entities and relations in knowledge graphs to vector space.

kubectl

The command line tool for communicating with a Kubernetes cluster's control plane.

Kubelet

Node agent managing pod lifecycle and container runtime.

KV Cache

A cache of the attention key and value tensors computed for previous tokens, reused during autoregressive decoding to avoid recomputing them.

L

Lamport Clock

Logical clocks for ordering events with causal consistency.
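
A Lamport clock is a single counter per process, and the rules are small enough to show directly. This is a minimal Python sketch with a hypothetical `LamportClock` class: increment on each local event or send, and on receipt set the clock to one more than the larger of the local value and the message timestamp.

```python
class LamportClock:
    """Logical clock whose timestamps respect the happened-before relation."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # Sending is a local event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # On receipt, jump past both the local clock and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()             # a: 1
t = a.send()         # a: 2, message stamped 2
b.tick()             # b: 1
print(b.receive(t))  # 3, so the receive is ordered after the send
```

The guarantee runs one way only. If event x happened before y, then L(x) < L(y), but a smaller timestamp alone does not prove causality. Vector clocks are needed for that.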

Late Interaction

A retrieval method that interacts between all vector embeddings of queries and documents.
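
ColBERT-style late interaction is usually scored with "MaxSim". For each query token embedding, take the highest similarity against any document token embedding, then sum over query tokens. The Python sketch below uses toy 2-D vectors and a plain dot product as stand-ins for real token embeddings. `maxsim_score` is a hypothetical name.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


query = [[1.0, 0.0], [0.0, 1.0]]              # two query token embeddings
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # matches both query tokens well
doc_b = [[0.9, 0.1], [0.8, 0.2]]              # matches only the first
print(maxsim_score(query, doc_a) > maxsim_score(query, doc_b))  # True
```

Document vectors can be precomputed and indexed. Query and document interact only at this cheap max-and-sum step, which is what "late" refers to.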

Late Majority

Conservative users relying on standards and low risk.

LCP

Largest Contentful Paint; measures main content load time.

Learning Rate

A hyperparameter that controls the step size of model parameter updates, affecting training speed and stability.

LIME

Local Interpretable Model-agnostic Explanations, a method that explains individual predictions of any model by fitting a simple interpretable model locally around each prediction.

Liveness Probe

A health check that detects whether a container is alive, restarting the container if it fails.

LLM

Large Language Model.

Load Balancer

A device that acts as a reverse proxy and distributes network or application traffic across a number of servers.

Load Balancing

The process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient.

LoRA

Low-Rank Adaptation, a parameter-efficient fine-tuning technique for large language models.

Low-Rank Factorization

A compression technique that decomposes weight matrices into the product of two smaller matrices.

LTV

Customer lifetime value used to assess payback potential.

Large Language Model

Deep learning models with huge parameter scales, typically referring to language models with billions of parameters or more.

M

Machine Learning

A field of inquiry devoted to understanding and building methods that 'learn'.

Machine Translation

The task of automatically translating text from one language to another.

Main Street

Mature market phase prioritizing stability, cost, standards.

Markov Decision Process

A mathematical framework in reinforcement learning describing agent-environment interaction.

Maximal Marginal Relevance

A reranking strategy that balances relevance and diversity.
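
MMR builds the result list greedily. At each step it picks the candidate that maximizes λ·sim(d, q) − (1 − λ)·max sim(d, selected), so relevance is rewarded and redundancy with already-chosen results is penalized. This Python sketch assumes the similarities are already computed (for example, cosine similarity of embeddings). `mmr`, `query_sim`, `doc_sim`, and `lam` are hypothetical names.

```python
def mmr(query_sim, doc_sim, k, lam=0.7):
    """Maximal Marginal Relevance reranking.

    query_sim[i]  : similarity of candidate i to the query
    doc_sim[i][j] : similarity between candidates i and j
    lam           : trade-off; 1.0 = pure relevance, 0.0 = pure diversity
    """
    selected = []
    remaining = list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((doc_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but different.
query_sim = [0.9, 0.88, 0.7]
doc_sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(mmr(query_sim, doc_sim, k=2))           # [0, 2]
print(mmr(query_sim, doc_sim, k=2, lam=1.0))  # [0, 1]
```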

MCP

Model Context Protocol, an open protocol that standardizes how models exchange context with external tools and data sources.

Metadata Filtering

A technique for filtering results through metadata in vector retrieval.

Metrics

Numerical measurable data points used for monitoring and alerting.

Microservices

An architectural style that structures an application as a collection of services.

MIG

Multi-Instance GPU, a technique that divides a single GPU into multiple instances.

Mixture of Experts

A model architecture that processes inputs by activating a subset of expert networks, increasing model capacity without a proportional increase in computation.

Model Compression

A collection of techniques for reducing model size and computational overhead.

MoE

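Mixture of Experts, a machine learning technique where a model is composed of multiple 'expert' networks, each handling part of the input space, with a gating network selecting which experts process each input.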

MPS

Multi-Process Service, a technique that allows multiple processes to share GPU resources.

MTBF

Mean time between failures; measures stability.

mTLS

Mutual Transport Layer Security, in which both client and server present certificates, ensuring service authentication and secure data transmission.

MTTR

Mean time to recovery; measures repair speed.

Multi-Cluster

A deployment architecture that involves multiple clusters.

Multi-Head Attention

A mechanism that executes multiple attention operations in parallel, capturing different representation subspaces.

Multi-Mesh

An architecture involving multiple service meshes.

Multi-vector Retrieval

Generating and retrieving vectors for different parts of a document (such as title and body).

Multimodal

Models or systems that process multiple data types (text, images, audio, etc.).

MUSA

Moore Threads' Unified System Architecture, supporting general-purpose computing on their GPUs.

Mutual Authentication

A security mechanism where both parties verify each other's identity, enhancing security.

Model Pruning

A technique that removes unimportant connections or neurons in neural networks, reducing model size and computation.

N

Named Entity Recognition

The task of identifying and classifying named entities from text.

Namespace

A virtual cluster in Kubernetes for resource isolation, enabling multi-tenancy and resource quota management.

Nash Equilibrium

A state in game theory where no player wants to unilaterally change strategy.

Neural Architecture Search

A technique for automatically searching for optimal neural network architectures.

Neural Network

A network or circuit of neurons, or in a modern sense, an artificial neural network, composed of artificial neurons or nodes.

Neuware

Cambricon's AI software stack, including development tools, runtime, and drivers.

NLP

Natural Language Processing, a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language.

North Star Metric

Core metric guiding long-term growth and alignment.

NUMA

Non-Uniform Memory Access, a computer architecture where memory access speed depends on the memory location relative to the processor.

NVLink

NVIDIA's high-speed serial communication interface used to connect GPUs.

O

OAuth

An open standard for access delegation, commonly used as a way for Internet users to grant websites or applications access to their information on other websites without giving them their passwords.

OCI

Open Container Initiative, an open governance structure for the express purpose of creating open industry standards around container formats and runtimes.

Okapi BM25

The full name of BM25, after the Okapi information retrieval system in which it was first implemented; widely used in information retrieval systems.
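
A compact Python sketch of Okapi BM25 scoring over pre-tokenized documents. It uses the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant ln(1 + (N − n + 0.5) / (n + 0.5)) used by Lucene. Several IDF variants exist, and `bm25_scores` is a hypothetical helper. k1 controls how quickly repeated terms saturate, and b controls how strongly long documents are penalized.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized doc against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "kubernetes schedules pods onto nodes".split(),
    "bm25 ranks documents for a query".split(),
    "pods share a network namespace".split(),
]
scores = bm25_scores("pods nodes".split(), docs)
print(max(range(len(docs)), key=scores.__getitem__))  # 0
```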

OLAP

Analytical processing for reporting and data warehousing.

OLTP

Transactional processing focused on low latency and consistency.

On-call

On-call rotation to ensure rapid incident response.

Once-for-All

A neural architecture search method that trains once to adapt to multiple deployment scenarios.

OOM

Out of Memory, an error that occurs when a program runs out of memory.

OPA

Open Policy Agent; a general-purpose policy engine using Rego for access and compliance rules.

OpenTelemetry

An open standard for observability data collection, unifying the collection of traces, metrics, and logs.

Operator

A controller for encapsulating and managing application operational knowledge in Kubernetes, typically built on custom resources.

Orca

An optimizer for large-scale distributed training.

Orchestration

The automated configuration, coordination, and management of computer systems and software.

OSI

Open Source Initiative; the organization maintaining the Open Source Definition and approving licenses.

Overfitting

A phenomenon where a model performs well on the training set but has poor generalization ability, performing poorly on unseen data.

Overprovisioning

Allocating more resources than currently needed in advance to meet burst demands or ensure high availability.

Oversubscription

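A situation where the total allocated resources exceed the physically available resources, commonly used in virtualization and cloud environments on the assumption that not all workloads peak at once.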

Observability

The ability to understand the internal state of a system through its external outputs, including logs, metrics, and traces.

P

2PC

Two-phase commit protocol for distributed transactions.

PACELC

If there is a Partition, choose between Availability and Consistency; Else, choose between Latency and Consistency.

PagedAttention

A technique that manages the attention KV cache in fixed-size blocks, similar to virtual memory paging in an operating system, reducing memory fragmentation during inference.

Part-of-Speech Tagging

The task of tagging the part of speech for each word in text.

Pause Container

The container responsible for sharing the network namespace in a Pod, also known as the sandbox container.

Paxos

Classic consensus algorithm for agreement over unreliable networks.

Payback Period

Time required to recover acquisition cost.

PCIe

A high-speed serial computer expansion bus standard.

PDB

Limits allowable pod disruptions to protect availability.

PEFT

Parameter-efficient fine-tuning with fewer trainable weights.

Permissive License

Open source license allowing proprietary redistribution.

PLG

Product-led growth driven by self-serve product experience.

PMF

Product-market fit: the degree to which a product satisfies strong market demand, and the signals that indicate it.

Pod Disruption Budget

A mechanism that controls the number of simultaneous Pod interruptions, guaranteeing minimum availability during voluntary disruptions.

Podcast

A series of audio programs published online and typically consumed via subscription.

Pragmatists

Customers focused on reliability and proven value.

Pre-training

The foundational phase of training a model on a large-scale dataset, learning general knowledge.

Prometheus

An open-source monitoring and alerting system that uses a pull model to collect time-series data.

Prompt

The input provided to a model to generate a response.

Prompt Engineering

The process of structuring text that can be interpreted and understood by a generative AI model.

Prompt Tuning

Optimizes prompt vectors while keeping model weights frozen.

PromptOps

Prompt Operations.

Pruning

A technique that removes unimportant parameters or neurons from the model.

Positional Encoding

A technique that adds positional information to each position in a sequence, enabling the model to take token order into account.

Parameter-Efficient Fine-tuning

A method that only fine-tunes a small number of model parameters, dramatically reducing training cost and memory requirements.

Platform as a Service

Cloud computing services that provide environments for application development and deployment.

Q

QoS

Quality of Service, a metric used to describe the performance and reliability of a system.

Quantization

A technique that reduces model precision (such as FP32 to INT8) to decrease computational load and memory usage.

Query Understanding

The step of analyzing query intent and semantics, improving retrieval accuracy.

Question Answering

The task of answering questions based on given context.

Quorum

Read/write succeeds with a majority to ensure consistency.
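
The reason majorities work is overlap: any read quorum must share at least one replica with any write quorum. This Python sketch states that as the Dynamo-style condition R + W > N, a generalization of the majority rule. N replicas, read quorum size R, write quorum size W, and the helper names are illustrative.

```python
def majority(n):
    """Smallest group size that guarantees any two groups of that size overlap."""
    return n // 2 + 1


def quorums_overlap(n, r, w):
    """Every read quorum intersects every write quorum when R + W > N."""
    return r + w > n


n = 5
print(majority(n))                   # 3
print(quorums_overlap(n, r=3, w=3))  # True: a read always reaches a replica with the latest write
print(quorums_overlap(n, r=2, w=2))  # False: a read may miss the latest write
```

Requiring W > N/2 in addition stops two conflicting writes from both gathering a quorum.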

R

Raft

Consensus algorithm for log replication and state machine consistency.

RAG

A method that retrieves external knowledge and combines it with generation to improve accuracy and reduce hallucinations.

Rate Limiting

A strategy for limiting network traffic by capping how many requests a client can make within a given time period.
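
Rate limiting can be implemented in several ways. The token bucket is a common one, and this Python sketch shows it (sliding-window counters are another option). Tokens refill at a fixed rate up to a capacity, and each request spends a token. The capacity lets short bursts through while the long-run rate stays bounded. `TokenBucket` and its parameters are hypothetical, and the injectable clock exists only to make the demo deterministic.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]
print(burst)  # [True, True, True, False]
fake_now[0] = 1.0  # one second later: 2 tokens refilled
later = [bucket.allow() for _ in range(3)]
print(later)  # [True, True, False]
```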

RBAC

Role-Based Access Control, a permission management system that defines user permissions through roles.

RDMA

Remote Direct Memory Access, a direct memory access technique that bypasses the operating system kernel, reducing network latency and CPU overhead.

ReAct

Reasoning + Acting, an agent framework that combines reasoning and action.

Readiness Probe

A health check that detects whether a container is ready to serve requests, removing it from the Service endpoints if it fails.

Recursive Character Splitting

A document chunking method that recursively splits at the paragraph, sentence, and word levels.

Reflexion

A self-reflection mechanism that enables agents to learn from failures.

Reinforcement Learning

A machine learning method that trains agents through trial and error to maximize rewards.

ReplicaSet

A Kubernetes controller that maintains a set of running Pod replicas, ensuring a specified number of identical Pods are running at any given time.

Reranking

A technique that performs secondary sorting on initial retrieval results to improve relevance.

RL

A machine learning method based on rewarding desired behaviors and/or punishing undesired ones.

RLAIF

Alignment via reinforcement learning from AI feedback.

RLHF

Reinforcement Learning from Human Feedback, a technique that trains a reward model directly from human feedback and uses that model to optimize an agent's policy with reinforcement learning.

RNN

A class of artificial neural networks where connections between nodes can create a cycle, allowing output from some nodes to affect subsequent input to the same nodes.

Robots.txt

Rules file controlling crawler access.

Robust

Able to withstand or recover from disturbances and unexpected input; see Robustness.

Robustness

The ability of a system to maintain function and performance under disturbances or input variation.

ROCm

AMD's open GPU computing platform, providing a CUDA-like development experience and supporting AMD GPUs.

Rolling Update

An update strategy that gradually replaces old version Pods, achieving zero-downtime deployment.

RotatE

A knowledge graph embedding method that models relations as rotations in complex space.

RSS

A syndication format for subscribing to and aggregating website updates.

Retrieval-Augmented Generation

A technique combining information retrieval and generative models to improve the accuracy and reliability of generated content.

Residual Connection

A connection method that skips certain layers, helping gradients propagate better through deep networks.

Resource Quota

Policies that limit resource usage in namespaces, including quotas for CPU, memory, storage, and object counts.

Resource Limit

Policies or mechanisms that set upper bounds on resource usage.

Remote Direct Memory Access

A direct memory access technique that bypasses the operating system kernel, reducing network latency and CPU overhead.

S

Saga Pattern

Long transactions split into local steps with compensations.

Saliency Map

A heatmap showing the importance of each part of the input image to the model output.

SBOM

Software Bill of Materials; describes the components of a piece of software for security and compliance.

Schema Markup

Structured data markup enabling rich search results.

SDD

Specification-Driven Development.

Secure Multi-Party Computation

Protocols where multiple parties jointly compute a function without revealing their inputs.

Self-Attention

The core mechanism in Transformer that computes relationships between elements within a sequence.

Semantic Chunking

A chunking strategy that splits documents based on semantic boundaries, maintaining semantic coherence within each chunk.

Sentiment Analysis

The task of identifying text sentiment polarity, such as positive, negative, neutral.

SERP

Search engine results page; optimized for visibility and clicks.

Serverless

A cloud computing execution model in which the cloud provider runs the server and dynamically manages the allocation of machine resources.

Service Identity

A mechanism for identifying microservice identities, used for authentication and authorization between services.

Service Mesh

A dedicated infrastructure layer for handling service-to-service communication.

SHAP

SHapley Additive exPlanations, a model interpretation method based on Shapley values from cooperative game theory.

Sidecar

A design pattern where a helper container runs alongside the main application container in the same Pod.

Sitemap.xml

Index file helping crawlers discover and update pages.

SLA

Service Level Agreement, a formal agreement between service providers and customers.

SLI

Service level indicator that quantifies performance.

Sliding Window

A document chunking strategy that maintains overlap between adjacent chunks.
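
A minimal Python sketch of sliding-window chunking over a token list. Splitting on whitespace and the `size` and `overlap` values are illustrative assumptions; real pipelines usually count model tokens. Each window starts `size - overlap` tokens after the previous one, so text cut at one boundary still appears whole in a neighboring chunk. With `overlap=0` this reduces to fixed-size chunking.

```python
def sliding_window_chunks(tokens, size, overlap):
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end
    return chunks


words = "one two three four five six seven".split()
for chunk in sliding_window_chunks(words, size=4, overlap=2):
    print(chunk)
# ['one', 'two', 'three', 'four']
# ['three', 'four', 'five', 'six']
# ['five', 'six', 'seven']
```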

SLO

Service Level Objective, defining specific targets for service performance.

SM

Streaming Multiprocessor, the basic processing unit of an NVIDIA GPU, containing many CUDA cores that execute threads in parallel.

Smart Contract

A self-executing contract with the terms of the agreement between buyer and seller being directly written into lines of code.

SPDX

Standard for software license and component identifiers.

Speculative Decoding

A small draft model proposes tokens, and the large model verifies them in parallel.

SPIFFE

Service identity standard for workload authentication.

SPIFFE/SPIRE

Standards for providing identities in dynamic environments, with SPIRE being a production-ready implementation of the SPIFFE APIs.

SPIRE

SPIFFE runtime for issuing and rotating identities.

SSL/TLS

Cryptographic protocols designed to provide communications security over a computer network.

Stable Diffusion

A text-to-image generation model based on diffusion models.

StatefulSet

A Kubernetes workload resource used to manage stateful applications, providing stable identities and persistent storage for each Pod.

Stateless Application

Applications that do not save any session state and can scale instances up or down at any time.

Sidecar-less Mesh

A service mesh architecture that does not require deploying proxies next to each application, such as Istio's ambient mode.

Stateful Application

Applications that need to maintain state data, such as databases, where each instance has a unique identity.

Service Discovery

The mechanism for automatically detecting and locating available service instances on a network.

Service Level Agreement

A formal agreement between service providers and customers, defining service quality and responsibilities.

Semantic Search

A search method based on semantic understanding rather than keyword matching.

Software as a Service

A cloud computing service model that provides software applications over the internet.

Sidecar Pattern

A design pattern that deploys auxiliary functions alongside the main application, commonly used in service meshes.

T

Technology Adoption Lifecycle

Adoption model from innovators to laggards.

Telemetry

The technology for remotely collecting and transmitting data, used for system monitoring and analysis.

Temporal Graph

A graph structure where nodes and edges change over time.

Tensor

A multidimensional array, the basic data structure for AI computing, used to represent data and model parameters.

Text Classification

The task of assigning text to predefined categories.

Text Generation

The task of automatically generating text content.

Text Summarization

The task of generating a brief summary from a long text.

Text-to-Image

The task of generating images based on text descriptions.

Text-to-Text

A model task type where both input and output are text.

TF-IDF

Term Frequency-Inverse Document Frequency, a weight that measures how important a term is to a document relative to a collection of documents.
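The following is a minimal sketch that assumes documents are already tokenized into word lists. It uses length-normalized term frequency and IDF = log(N / df). This is one of several common variants, and libraries such as scikit-learn add smoothing by default, so their numbers will differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return a term -> TF-IDF weight mapping for each (non-empty) tokenized document."""
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # TF = count / doc length; IDF = log(N / df), which is 0 for terms in every doc.
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return weights
```

For `[["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]`, "the" gets weight 0 because it appears in every document. In the second document, "dog" outweighs "sat" because it is rarer across the collection.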

TFLOPS

Trillion Floating Point Operations Per Second, a metric used to measure computational power.

Time-slicing

A GPU-sharing technique in which different processes take turns using the GPU in successive time slots.

TOIL

Manual, repetitive operational work that SREs aim to reduce.

Token

The basic unit of text processed by large language models, which can be a word, a subword, or a character.

Tokenization

The process of splitting text into tokens, which determines how much text fits in the context window and how much a request costs.
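As an illustration, here is a simplified greedy longest-match subword tokenizer that uses a small hypothetical vocabulary. Production tokenizers such as BPE, WordPiece, and SentencePiece learn their vocabularies from data and add continuation markers and unknown-token handling. This sketch only shows why one word can become several tokens.

```python
from typing import List, Set


def greedy_tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Split a word into the longest matching vocabulary pieces, left to right.

    Falls back to a single character when no longer piece is in the vocabulary.
    """
    tokens: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens
```

With the vocabulary `{"token", "iz", "ation"}`, `greedy_tokenize("tokenization", ...)` yields `["token", "iz", "ation"]`. One word therefore consumes three tokens of context.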

Tool Calling

The ability of agents to perform external operations, expanding the functional boundaries of AI.

TOPS

Trillion Operations Per Second, a metric of AI accelerator throughput, usually quoted for low-precision integer operations such as INT8.

Tornado

Hyper-growth phase focused on scale and market share.

TPOT

Time Per Output Token, the average time between consecutive output tokens during generation, a metric of generation speed.

TPU

Tensor Processing Unit, a specialized hardware accelerator developed by Google for machine learning.

TransE

A simple knowledge graph embedding method that treats relations as translation vectors.
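The following is a minimal TransE scoring sketch in plain Python. It assumes the embeddings have already been learned, and the vectors in the usage note are made-up toy values. A triple (head, relation, tail) is plausible when head + relation lands close to tail. This sketch uses the L2 distance (L1 is also common), and training typically minimizes a margin ranking loss against corrupted triples.

```python
import math
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """Distance ||h + r - t||_2; lower means the triple is more plausible."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))


def margin_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Zero once the true triple is at least `margin` closer than the corrupted one."""
    return max(0.0, margin + pos_score - neg_score)
```

With toy vectors `paris = [1.0, 0.0]`, `capital_of = [0.0, 1.0]`, `france = [1.0, 1.0]`, and `germany = [3.0, 1.0]`, the true triple scores 0.0. The corrupted triple with `germany` scores 2.0, so the margin loss is already 0.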

Transformer

A deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data.
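The core weighting step is scaled dot-product attention, softmax(QKᵀ / √d_k)·V, sketched below in plain Python. This is a simplified single-head version: a full Transformer layer also applies learned Q/K/V projections, runs multiple heads, and adds feed-forward layers, residual connections, and normalization. None of those are shown here.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)  # how strongly this position attends to each position
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out
```

In self-attention, Q, K, and V all come from the same sequence. For example, `scaled_dot_product_attention(x, x, x)` with `x = [[1.0, 0.0], [0.0, 1.0]]` makes each position weight itself more heavily than the other.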

Tree of Thoughts

A reasoning method that extends chain of thought into a tree structure, exploring multiple possible reasoning paths.

TTFT

Time To First Token, a metric of inference responsiveness: the time from sending a request to receiving the first output token.
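TTFT and TPOT can both be derived from timestamps, as in the sketch below. It assumes a client records the request time and the arrival time of every streamed token, all in seconds. Here TPOT is taken as the average gap between consecutive tokens after the first. Benchmarks differ in details, such as whether they report a mean or percentiles.

```python
from typing import List, Optional, Tuple


def latency_metrics(request_time: float,
                    token_times: List[float]) -> Tuple[float, Optional[float]]:
    """Return (TTFT, TPOT) from a request timestamp and per-token arrival times."""
    if not token_times:
        raise ValueError("no tokens were generated")
    # TTFT: request sent -> first token received.
    ttft = token_times[0] - request_time
    if len(token_times) == 1:
        return ttft, None  # TPOT is undefined with a single token
    # TPOT: time spent after the first token, spread over the remaining tokens.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot
```

For a request at t = 0.0 with tokens arriving at 0.5, 0.6, 0.7, and 0.8 s, TTFT is 0.5 s and TPOT is 0.1 s.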

TTS

Text-to-Speech, the technology of converting text into spoken audio.

Time Slice

The basic unit of CPU scheduling: the period during which each process, in turn, uses the CPU.
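The sketch below simulates round-robin scheduling with a fixed time slice (quantum). It assumes integer burst times and no priorities or I/O. Real operating-system schedulers are considerably more elaborate.

```python
from collections import deque
from typing import Dict, List, Tuple


def round_robin(bursts: Dict[str, int], quantum: int) -> List[Tuple[str, int, int]]:
    """Simulate round-robin scheduling and return (process, start, end) slices.

    `bursts` maps each process name to its remaining CPU time. A process runs for
    at most `quantum` units, then goes to the back of the queue if unfinished.
    """
    queue = deque(bursts.items())
    timeline: List[Tuple[str, int, int]] = []
    clock = 0
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        timeline.append((name, clock, clock + run))
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # preempted: rejoin at the back
    return timeline
```

With bursts `{"A": 5, "B": 2, "C": 3}` and a quantum of 2, the CPU runs A, B, C, A, C, A, and the work finishes at t = 10.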

Training

The process of adjusting model parameters using a dataset so that the model learns patterns in the data.
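At its smallest scale, training is the loop sketched below. It fits a one-variable linear model by batch gradient descent on mean squared error. The learning rate and epoch count are illustrative assumptions. Neural network training follows the same compute-loss, compute-gradient, update cycle with many more parameters.

```python
from typing import List, Tuple


def train_linear(xs: List[float], ys: List[float], lr: float = 0.05,
                 epochs: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On points sampled from y = 2x + 1, such as `xs = [0, 1, 2, 3]` and `ys = [1, 3, 5, 7]`, the learned parameters converge to approximately w = 2 and b = 1.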

U

1

Underfitting

A phenomenon where a model fails to fully learn the features of the training data, usually because the model is too simple or has not been trained long enough.

V

12

Vector Clock

Vector timestamps for causality tracking and conflict detection.
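The sketch below represents each clock as a dict from node name to counter. The node names in the usage note are hypothetical. A node ticks its own counter on local events. On receive, it takes the element-wise maximum and then ticks. Comparing two clocks shows whether one event happened before the other or whether they are concurrent, which signals a possible conflict.

```python
from typing import Dict

VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Local event or send: increment this node's own counter."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(local: VectorClock, received: VectorClock, node: str) -> VectorClock:
    """Receive: element-wise max of both clocks, then tick the receiving node."""
    merged = {n: max(local.get(n, 0), received.get(n, 0))
              for n in local.keys() | received.keys()}
    return tick(merged, node)


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a -> b iff no counter in a exceeds b's and at least one is strictly smaller."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """For distinct events: neither precedes the other, so they may conflict."""
    return not happened_before(a, b) and not happened_before(b, a)
```

If nodes "A" and "B" each record an independent event, `{"A": 1}` and `{"B": 1}` are concurrent. After B receives A's message, B's clock becomes `{"A": 1, "B": 2}`, which both earlier events happened before.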

Vector Database

A database that indexes and stores vector embeddings for fast retrieval and similarity search.
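The query a vector database answers can be shown with an exact, brute-force baseline: score every stored vector against the query by cosine similarity and keep the top k. The document IDs and vectors in the usage note are made up. Real vector databases use approximate nearest-neighbor indexes, such as HNSW or IVF, to avoid scanning every vector.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Exact nearest-neighbor search: the k most similar (id, score) pairs."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

With `index = {"cat": [1.0, 0.1], "kitten": [0.9, 0.2], "car": [0.0, 1.0]}`, the query `[1.0, 0.0]` returns "cat" and then "kitten".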

Vertical Pod Autoscaler

A mechanism that automatically adjusts Pod resource requests, optimizing resource utilization.

vGPU

GPU virtualization technology that divides a physical GPU into multiple virtual GPUs for use by multiple virtual machines.

Vibe Coding

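A style of programming in which the developer describes the desired result to an AI model in natural language and accepts the generated code with little line-by-line review, iterating by running it.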

Viral Coefficient

The average number of new users each existing user brings in; a value above 1 indicates self-sustaining growth.
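A common way to estimate the coefficient is K = invites sent per user × invite conversion rate. The projection below is a simplified model that assumes each cycle's newcomers invite at the same rate, with no churn. It shows why K below 1 levels off while K above 1 keeps compounding.

```python
def viral_coefficient(invites_per_user: float, conversion_rate: float) -> float:
    """K = average invites sent per user x fraction of invites that become users."""
    return invites_per_user * conversion_rate


def project_users(initial: int, k: float, cycles: int) -> float:
    """Total users after `cycles` viral cycles, assuming a constant K and no churn."""
    total = new = float(initial)
    for _ in range(cycles):
        new *= k  # users recruited by the previous cycle's newcomers
        total += new
    return total
```

Five invites per user at a 20% conversion rate gives K = 1.0. Starting from 1,000 users with K = 0.5, three cycles add 500, 250, and 125 users, for 1,875 users in total.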

VirtualService

An Istio resource that defines traffic routing rules, implementing traffic management features such as request routing, traffic splitting, retries, and fault injection.

Vision Transformer

A model that applies Transformer architecture to computer vision tasks.

Visionaries

Customers seeking disruptive innovation and advantage.

VPA

Vertical Pod Autoscaler, which adjusts Pod resource requests and limits to optimize resource usage.

Vectorization

The process of converting data into vector representations, used in machine learning and semantic search.

Virtualization

The technology of creating virtual versions of computer system resources.

W

6

WASM

WebAssembly, a binary instruction format that runs in browsers, and increasingly outside them, with near-native performance.

Waypoint

A proxy component in Istio Ambient mode that handles L7 traffic management and policy enforcement.

Webhook

A method of augmenting or altering the behavior of a web page or web application with custom callbacks, typically HTTP requests sent when an event occurs.

Weight Sharing

A technique that reuses the same parameters across different parts of a model, reducing the number of parameters.

Whole Product

Complete solution including delivery, services, and complements.

Weight Decay

A regularization technique that adds a penalty on the size of the weights, typically their squared L2 norm, to the loss function, discouraging large weights and reducing overfitting.
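The sketch below shows the L2 form with plain SGD. Adding (λ/2)·‖w‖² to the loss adds λ·w to each gradient, so every update also shrinks the weights toward zero by a factor of (1 − lr·λ). The values of `lam` and `lr` are illustrative. Adaptive optimizers such as AdamW apply the decay as a separate step rather than through the loss.

```python
from typing import List


def l2_penalized_loss(data_loss: float, weights: List[float], lam: float) -> float:
    """Total loss = data loss + (lam / 2) * ||w||^2."""
    return data_loss + 0.5 * lam * sum(w * w for w in weights)


def sgd_step_with_weight_decay(weights: List[float], grads: List[float],
                               lr: float, lam: float) -> List[float]:
    """One SGD step on the penalized loss: each gradient gains a lam * w term."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]
```

With zero data gradient, `lr = 0.1`, and `lam = 0.5`, the weights `[1.0, -2.0]` shrink to about `[0.95, -1.9]` in one step.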

X

2

x509 Certificate

A digital certificate in the X.509 standard format, which binds a public key to an identity and is widely used for TLS and service authentication.

XPU

Baidu's AI chip architecture, specifically designed for deep learning training and inference.

Z

4

Zero Trust

A network security model that does not trust any user or device by default and requires verification for every access request.

Zero-shot Learning

The ability of a model to perform a task without any task-specific training examples.

Zero-sum Game

A game scenario where one player's gain equals another player's loss.

ztunnel

The tunnel proxy in Istio Ambient mode, responsible for L4 traffic forwarding and mTLS encryption.

Summary

This page collects all of the site's terms to keep writing and translation consistent.