Site Glossary
A single-page index of cloud native and AI terminology with fast lookup and grouping.
Terms are the doorway into complex systems, not the end of learning.
This page consolidates the site’s key concepts and unique terminology for fast browsing and cross-reference.
Agent-to-Agent, the collaboration and communication pattern between agents.
Agent-to-LLM, the interaction between agents and language models.
Agent-to-Tool, the ability of an agent to call external tools.
Acquisition-Activation-Retention-Revenue-Referral funnel model.
An entity or software component that perceives its environment and acts to achieve goals.
AI that can plan, act, and use tools autonomously.
The execution environment that supports agents.
A web of interacting agents.
Artificial General Intelligence, a type of artificial intelligence (AI) that matches or exceeds
An autonomous entity that perceives and acts to achieve goals.
A sidecar-less service mesh mode in Istio that implements traffic management through node-level
Permissive license with patent grant and notice requirements.
A server that acts as an API front-end, receiving API requests, enforcing throttling and security
Delivery semantics in which a message may be delivered more than once, usually paired with idempotent processing.
A mechanism that allows models to focus on important parts of input data, improving model
A technique for visualizing model attention distributions to understand model focus.
The process of verifying the identity of a user or system, typically via credentials and
The function of specifying access rights/privileges to resources.
A widely used algorithm for training feedforward neural networks.
The number of samples used in one training iteration, affecting training speed and model
The initial niche market to win first, building references and momentum.
Post-incident review focused on system fixes, not blame.
A system in which a record of transactions made in bitcoin or another cryptocurrency is maintained
A zero-downtime deployment strategy using two environments, enabling rapid traffic switching.
A classic ranking function used to evaluate document relevance to queries.
Stage of expanding from one niche to adjacent segments.
Rate at which the error budget is being spent, used for alerting and release decisions.
Average cost to acquire a customer, used for channel efficiency.
A deployment strategy that gradually directs traffic to the new version, reducing risk and quickly
Huawei Ascend's heterogeneous computing architecture, providing neural network computing engines
Canonical URL to avoid duplicate content and split ranking signals.
Tradeoff: consistency, availability, and partition tolerance cannot all be guaranteed at once.
Captures DB changes for downstream sync and streaming pipelines.
Content Delivery Network.
An organization that issues and manages digital certificates, responsible for verifying identities
A mechanism for limiting, accounting for, and isolating the resource usage of a process group.
A prompting technique that enables large language models to solve complex tasks by generating a
An engineering method that improves system resilience by actively injecting failures, helping to
A snapshot of the model training state, used for recovery after training interruption or model
A design approach that decomposes large chips into multiple smaller chips, achieving high-speed
Continuous Integration and Continuous Deployment/Delivery.
A design pattern used in software development to detect failures and encapsulate the logic of
Agreement clarifying contributor copyright permissions.
Contrastive Language-Image Pre-training, a model that connects text and images.
Cumulative Layout Shift; measures visual stability.
Cloud Native Computing Foundation, a sub-foundation of the Linux Foundation.
Container Network Interface, a project to write specifications and libraries for configuring
A class of deep neural networks, most commonly applied to analyzing visual imagery.
Cohort-based analysis of retention and behavior changes.
A token-level vector retrieval method that retains fine-grained matching information.
A Kubernetes resource for storing non-sensitive configuration data, separating configuration from
Sharding/cache strategy that minimizes remapping on node changes.
A standard unit of software that packages up code and all its dependencies so the application runs
The information that surrounds a piece of text and helps to determine its meaning.
Optimizing the use of the context window.
The maximum number of tokens a model can process, determining the model's context understanding
A batching technique that dynamically merges requests to improve GPU utilization, also known as
License model requiring derivatives to remain open source.
The task of identifying reference relationships in text, such as resolving 'he' to a specific person.
A prompting technique that enables large language models to solve complex tasks by breaking them
Crawl quota that affects indexing and refresh rates.
Custom Resource Definition, a mechanism to extend the Kubernetes API.
Conflict-free replicated data types with convergent merges.
Container Runtime Interface, a plugin interface which enables kubelet to use a wide variety of
Market gap between early adopters and early majority.
Container Storage Interface, a standard for exposing file and block storage systems to
NVIDIA's parallel computing platform and programming model that allows developers to use GPUs for
A field of artificial intelligence that trains computers to interpret and understand the visual
Core Web Vitals set measuring page experience.
Application development and deployment methods that fully leverage the advantages of cloud
The technology of packaging applications and their dependencies into containers.
Technology for automating the deployment, scaling, and connection of containers.
A development practice that keeps code in a state ready to be deployed to production at any time.
The practice of automatically deploying tested code changes to production.
A development practice of frequently integrating code changes into the main branch.
Gradually releasing a new version to a subset of users to verify its stability and performance.
NVIDIA's parallel computing platform and programming model that allows developers to use GPUs for
The process of managing system configurations, including creating, updating, and maintaining
A deployment strategy that gradually directs traffic to the new version, reducing risk and quickly
A programming pattern that chains multiple operations or function calls together.
A Kubernetes resource that ensures a Pod copy runs on each node, commonly used for system-level
Architecture combining data lake and warehouse capabilities.
Sign-off asserting lawful origin of contributions.
Part of a broader family of machine learning methods based on artificial neural networks with
The task of analyzing dependency relationships between words in sentences.
Configuration in Istio that defines policies for service destinations, implementing load balancing,
A plugin mechanism in Kubernetes for hardware device resource extension, supporting specialized
A set of practices that combines software development (Dev) and IT operations (Ops).
A statistical method that protects individual privacy by adding noise.
A generative model that produces data by gradually denoising.
A technique that transfers knowledge from large models to small models, maintaining performance
A technology that tracks the propagation path of requests between microservices, used for
Optimizes models directly from preference data.
Dynamic Resource Allocation, a mechanism that assigns compute resources on demand to workloads.
A graph whose nodes or edges change over time to represent dynamic relationships.
A regularization technique that randomly drops some neurons during training, preventing overfitting.
Processes that run in the background and perform system-level tasks, usually starting automatically
The part of a neural network responsible for converting internal representations into output.
Search quality framework for experience, expertise, authority, trust.
Visionary users who try new tech and drive adoption.
Pragmatic users needing proven, reliable solutions.
A revolutionary technology with origins in the Linux kernel that can run sandboxed programs in a privileged
Pipeline that extracts, loads, then transforms in target system.
A model that converts text into numerical vectors.
The task of identifying named entities from text, such as person names, place names, etc.
One complete pass through the training dataset, a basic unit of model training.
Allowed SLO failure budget used for release decisions.
A strongly consistent, distributed key-value store that provides a reliable way to store data that
Pipeline that extracts, transforms, then loads data.
Semantics where messages are processed exactly once.
A representation method that maps discrete data (such as words) to a continuous vector space.
The ability to automatically adjust resources based on load, including horizontal and vertical
The part of a neural network responsible for converting input into an internal representation.
A computing paradigm that performs computation at the network edge close to data sources, reducing
The property that enables a system to continue operating properly in the event of the failure of
A metric evaluating the contribution of each feature to model predictions.
A privacy-preserving technique that trains models on distributed devices without sharing raw data.
The ability to learn new tasks with only a few samples.
Additional training on top of a pre-trained model to adapt the model to specific tasks or domains.
A chunking strategy that splits documents by fixed size, simple but may break semantics.
Free and open source software emphasizing user freedom.
The mechanism for LLMs to call external functions, enabling integration with external systems.
A cloud computing service model that allows running code without managing servers.
A class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014.
Next-gen Kubernetes gateway spec for L4/L7 ingress control.
An operational model that takes DevOps best practices used for application development, such as
Actual usable throughput under the premise of meeting SLO, a metric that better reflects the true
Randomized state dissemination for membership and config propagation.
Strong copyleft license requiring derivatives to be open source.
NVIDIA technology that allows GPUs to directly access network or storage device data, bypassing the
Combining GPUDirect and RDMA technologies to achieve direct high-speed data transfer between GPUs.
The direction that guides parameter updates in optimization algorithms, representing the direction
An optimization algorithm used to minimize some function by iteratively moving in the direction of
An open-source visualization and monitoring platform supporting multiple data sources and rich panel
A RAG technique combined with knowledge graphs, providing more structured contextual information.
Loop where output feeds the next growth cycle.
Constraint mechanisms that limit the output range of AI models, ensuring output meets expectations
A phenomenon where a large language model perceives patterns or objects that are nonexistent or
High Bandwidth Memory, high-speed memory used in GPUs, providing higher bandwidth than traditional
The package manager for Kubernetes.
A package of templates and configuration used to define, install, and upgrade Kubernetes
Hierarchical Navigable Small World, an efficient vector indexing algorithm.
An encryption method that allows computation directly on encrypted data.
A mechanism that automatically adjusts the number of Pods based on load, achieving elastic scaling
Automatically scales pod replicas based on metrics.
Language/region tags to avoid incorrect search targeting.
A retrieval strategy combining keyword search and semantic search.
A method for periodically checking whether an application or service is running normally.
High-speed memory used in GPUs, providing higher bandwidth than traditional GDDR.
Property where repeated requests yield the same result.
The task of generating text descriptions based on images.
The process of using a trained machine learning model to make predictions.
Software or hardware accelerators specifically designed for model inference, optimizing inference
A high-performance computer network communication standard, providing high-bandwidth, low-latency
A Kubernetes API object that manages external access, providing HTTP and HTTPS routing rules.
A helper container that runs before the main container starts, used for initializing configuration
Interaction to Next Paint; measures responsiveness.
A classification task that identifies user query intent.
A method of managing and configuring infrastructure using code.
Cloud computing services that provide virtualized computing resources.
A high-performance computer network communication standard, providing high-bandwidth, low-latency
JSON Web Token is an open standard (RFC 7519) that defines a compact and self-contained way for
Common abbreviation for Kubernetes, derived from the 8 letters between 'K' and 's'.
A technique that transfers knowledge from large models to small models.
A knowledge representation method that uses graph structures to represent entities and their
A technique that maps entities and relations in knowledge graphs to vector space.
The command line tool for communicating with a Kubernetes cluster's control plane.
Node agent managing pod lifecycle and container runtime.
A data structure used to store and retrieve key-value pairs.
Logical clocks for ordering events with causal consistency.
A retrieval method that computes interactions between all vector embeddings of queries and documents.
Conservative users relying on standards and low risk.
Largest Contentful Paint; measures main content load time.
A hyperparameter that controls the step size of model parameter updates, affecting training
Local Interpretable Model-agnostic Explanations, a method that explains individual predictions by fitting a simple surrogate model locally.
A health check that detects whether a container is alive, restarting the container if it fails.
Large Language Model.
A device that acts as a reverse proxy and distributes network or application traffic across a
The process of distributing a set of tasks over a set of resources (computing units), with the aim
Low-Rank Adaptation, a parameter-efficient fine-tuning technique for large language models.
A compression technique that decomposes weight matrices into the product of two smaller matrices.
Customer lifetime value used to assess payback potential.
Deep learning models with huge parameter scales, typically referring to language models with
A field of inquiry devoted to understanding and building methods that 'learn'.
The task of automatically translating text from one language to another.
Mature market phase prioritizing stability, cost, standards.
A mathematical framework in reinforcement learning describing agent-environment interaction.
A reranking strategy that balances relevance and diversity.
A protocol that standardizes how models exchange context with external tools and data sources,
A technique for filtering results through metadata in vector retrieval.
Numerical, measurable data points used for monitoring and alerting.
An architectural style that structures an application as a collection of services.
Multi-Instance GPU, a technique that divides a single GPU into multiple instances.
A model architecture that processes inputs by activating a subset of expert networks, improving
A collection of techniques for reducing model size and computational overhead.
A machine learning technique where a model is composed of multiple 'expert' networks, each
Multi-Process Service, a technique that allows multiple processes to share GPU resources.
Mean time between failures; measures stability.
Mutual Transport Layer Security, ensuring service authentication and secure data transmission
Mean time to recovery; measures repair speed.
A deployment architecture that involves multiple clusters.
A mechanism that executes multiple attention operations in parallel, capturing different feature
An architecture involving multiple service meshes.
Generating and retrieving vectors for different parts of a document (such as title, body)
Models or systems that process multiple data types (text, images, audio, etc.).
Moore Threads' Unified System Architecture, supporting general-purpose computing on their GPUs.
A security mechanism where both parties verify each other's identity, enhancing security.
A technique that removes unimportant connections or neurons in neural networks, reducing model size
The task of identifying and classifying named entities from text.
A virtual cluster in Kubernetes for resource isolation, enabling multi-tenancy and resource quota
A state in game theory where no player wants to unilaterally change strategy.
A technique for automatically searching for optimal neural network architectures.
A network or circuit of neurons, or in a modern sense, an artificial neural network, composed of
Cambricon's AI software stack, including development tools, runtime, and drivers.
A subfield of linguistics, computer science, and artificial intelligence concerned with the
Core metric guiding long-term growth and alignment.
Non-Uniform Memory Access, a computer architecture where memory access speed depends on the memory
NVLink, a high-speed serial communication interface used to connect GPUs.
An open standard for access delegation, commonly used as a way for Internet users to grant websites
Open Container Initiative, an open governance structure for the express purpose of creating open
The original implementation of the BM25 algorithm, widely used in information retrieval systems.
Analytical processing for reporting and data warehousing.
Transactional processing focused on low latency and consistency.
On-call rotation to ensure rapid incident response.
A neural architecture search method that trains once to adapt to multiple deployment scenarios.
Out of Memory, an error that occurs when a program runs out of memory.
General policy engine using Rego for access/compliance rules.
An open standard for observability data collection, unifying the collection of traces, metrics, and
A controller for encapsulating and managing application operational knowledge in Kubernetes,
Orca, an optimizer for large-scale distributed training.
The automated configuration, coordination, and management of computer systems and software.
Organization maintaining the Open Source Definition and licenses.
A phenomenon where a model performs well on the training set but has poor generalization ability,
Allocating more resources than currently needed in advance to meet burst demands or ensure high
A situation where the total allocated resources exceed the physically available resources, commonly
The ability to understand the internal state of a system through its external outputs, including
Two-phase commit protocol for distributed transactions.
When partitioned, choose between availability and consistency; otherwise, choose between latency and consistency.
A technique that improves the efficiency of attention mechanisms by using a paging mechanism.
The task of tagging the part of speech for each word in text.
The container responsible for sharing the network namespace in a Pod, also known as the sandbox
Classic consensus algorithm for agreement over unreliable networks.
Time required to recover acquisition cost.
A high-speed serial computer expansion bus standard.
Limits allowable pod disruptions to protect availability.
Parameter-efficient fine-tuning with fewer trainable weights.
Open source license allowing proprietary redistribution.
Product-led growth driven by self-serve product experience.
Degree of product-market fit and its signals.
A mechanism that controls the number of simultaneous Pod interruptions, guaranteeing minimum
A series of audio programs published online and typically consumed via subscription.
Customers focused on reliability and proven value.
The foundational phase of training a model on a large-scale dataset, learning general knowledge.
An open-source monitoring and alerting system that uses a pull model to collect time-series data.
The input provided to a model to generate a response.
The process of structuring text that can be interpreted and understood by a generative AI model.
Optimizes prompt vectors while keeping model weights frozen.
Prompt Operations.
A technique that removes unimportant parameters or neurons from the model.
A technique that adds positional information to each position in a sequence, enabling the model to
A method that only fine-tunes a small number of model parameters, dramatically reducing training
Cloud computing services that provide environments for application development and deployment.
Quality of Service, a metric used to describe the performance and reliability of a system.
A technique that reduces model precision (such as FP32 to INT8) to decrease computational load and
The step of analyzing query intent and semantics, improving retrieval accuracy.
The task of answering questions based on given context.
A read or write succeeds once a majority of replicas acknowledge it, ensuring consistency.
Consensus algorithm for log replication and state machine consistency.
A method that retrieves external knowledge and combines it with generation to improve accuracy and
A strategy for limiting network traffic.
Role-Based Access Control, a permission management system that defines user permissions through
Remote Direct Memory Access, a direct memory access technique that bypasses the operating system
Reasoning + Acting, an agent framework that combines reasoning and action.
A health check that detects whether a container is ready to serve requests, removing it from the
A document chunking method that recursively splits at the paragraph, sentence, and word levels.
A self-reflection mechanism that enables agents to learn from failures.
A machine learning method that trains agents through trial and error to maximize rewards.
A Kubernetes controller that maintains a set of running Pod replicas, ensuring a specified number
A technique that performs secondary sorting on initial retrieval results to improve relevance.
A machine learning method based on rewarding desired behaviors and/or punishing undesired ones.
Alignment via reinforcement learning from AI feedback.
A technique that trains a reward model directly from human feedback and uses the model to optimize
A class of artificial neural networks where connections between nodes can create a cycle, allowing
Rules file controlling crawler access.
Strong and healthy; vigorous.
The ability of a system to maintain function and performance under disturbances or input variation.
AMD's open GPU computing platform, providing a CUDA-like development experience and supporting AMD
An update strategy that gradually replaces old version Pods, achieving zero-downtime deployment.
A knowledge graph embedding method that models relations as rotations in complex space.
A syndication format for subscribing to and aggregating website updates.
A technique combining information retrieval and generative models to improve the accuracy and
A connection method that skips certain layers, helping gradients propagate better through deep
Policies that limit resource usage in namespaces, including quotas for CPU, memory, storage, and
Policies or mechanisms that set upper bounds on resource usage.
A direct memory access technique that bypasses the operating system kernel, reducing network
A long-running transaction split into local steps, each with a compensating action.
A heatmap showing the importance of each part of the input image to the model output.
Bill of materials describing components for security/compliance.
Structured data markup enabling rich search results.
Specification-Driven Development.
Protocols where multiple parties jointly compute a function without revealing their inputs.
The core mechanism in Transformer that computes relationships between elements within a sequence.
A chunking strategy that splits documents based on semantic boundaries, maintaining semantic
The task of identifying text sentiment polarity, such as positive, negative, neutral.
Search engine results page; optimized for visibility and clicks.
A cloud computing execution model in which the cloud provider runs the server, and dynamically
A mechanism for identifying microservice identities, used for authentication and authorization
A dedicated infrastructure layer for handling service-to-service communication.
SHapley Additive exPlanations, a model interpretation method.
A design pattern where a helper container runs alongside the main application container in the same
Index file helping crawlers discover and update pages.
Service Level Agreement, a formal agreement between service providers and customers.
Service level indicator that quantifies performance.
A document chunking strategy that maintains overlap between adjacent chunks.
Service Level Objective, defining specific targets for service performance.
Streaming Multiprocessor, the basic compute unit of an NVIDIA GPU, containing multiple CUDA cores.
A self-executing contract with the terms of the agreement between buyer and seller being directly
Standard for software license and component identifiers.
Small model proposes tokens; large model verifies.
Service identity standard for workload authentication.
Standards for providing identities in dynamic environments, with SPIRE being the implementation of
SPIFFE runtime for issuing and rotating identities.
Cryptographic protocols designed to provide communications security over a computer network.
A text-to-image generation model based on diffusion models.
A Kubernetes workload resource used to manage stateful applications, providing stable identities
Applications that do not save any session state and can scale instances up or down at any time.
A service mesh architecture that does not require deploying proxies next to each application, such
Applications that need to maintain state data, such as databases, where each instance has a unique
The mechanism for automatically detecting and locating available service instances on a network.
A formal agreement between service providers and customers, defining service quality and
A search method based on semantic understanding rather than keyword matching.
A cloud computing service model that provides software applications over the internet.
A design pattern that deploys auxiliary functions alongside the main application, commonly used in
Adoption model from innovators to laggards.
The technology for remotely collecting and transmitting data, used for system monitoring and
A graph structure where nodes and edges change over time.
A multidimensional array, the basic data structure for AI computing, used to represent data and
The task of assigning text to predefined categories.
The task of automatically generating text content.
The task of generating a brief summary from a long text.
The task of generating images based on text descriptions.
A model task type where both input and output are text.
Term Frequency-Inverse Document Frequency, a metric measuring the importance of terms in documents.
Trillion Floating Point Operations Per Second, a metric used to measure computational power.
A technique for GPU sharing through time-slicing, where different processes use the GPU in
Manual, repetitive operations work that SREs aim to reduce.
The basic unit of text processed by large language models, which can be words, subwords, or
Splits text into tokens, affecting context length and cost.
The ability of agents to perform external operations, expanding the functional boundaries of AI.
Trillion Operations Per Second, a metric for measuring AI accelerator performance, indicating the
Hyper-growth phase focused on scale and market share.
Time Per Output Token, the time interval per token during generation, a metric measuring generation
Tensor Processing Unit, a specialized hardware accelerator developed by Google for machine learning.
A simple knowledge graph embedding method that treats relations as translation vectors.
A deep learning model that adopts the mechanism of self-attention, differentially weighting the
A reasoning method that extends chain of thought into a tree structure, exploring multiple possible
Time To First Token, a metric measuring inference response speed, representing the time from
The technology of converting text to speech.
The basic unit of CPU scheduling, the time period during which each process takes turns using the
The process of adjusting model parameters using a dataset, enabling the model to learn patterns in
A phenomenon where a model fails to fully learn the features of the training data, usually caused
Vector timestamps for causality tracking and conflict detection.
A database that indexes and stores vector embeddings for fast retrieval and similarity search.
A mechanism that automatically adjusts Pod resource requests, optimizing resource utilization.
GPU virtualization technology that divides a physical GPU into multiple virtual GPUs for use by
A coding style emphasizing atmosphere and flow.
Ratio of new users generated per existing user.
An Istio resource that defines traffic routing rules, implementing traffic management features such
A model that applies Transformer architecture to computer vision tasks.
Customers seeking disruptive innovation and advantage.
Adjusts pod resource requests/limits to optimize usage.
The process of converting data into vector representations, used in machine learning and
The technology of creating virtual versions of computer system resources.
A binary instruction format that can run in browsers, providing near-native performance.
A proxy component in Istio Ambient mode that handles L7 traffic management and policy enforcement.
A method of augmenting or altering the behavior of a web page or web application with custom
A technique that shares the same parameters across different parts of the model, reducing
Complete solution including delivery, services, and complements.
A regularization technique that adds weight norms to the loss function, preventing overfitting.
A digital certificate standard for service authentication, defining the format and distribution of
Baidu's AI chip architecture, specifically designed for deep learning training and inference.
A network security model that does not trust any user or device by default, requiring verification
The ability to complete new tasks without any samples.
A game scenario where one player's gain equals another player's loss.
The tunnel proxy in Istio Ambient mode, responsible for L4 traffic forwarding and mTLS encryption.
All terms on this page for consistent writing and translation.
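As a worked illustration of two of the SRE entries above (the error budget and the rate at which it is spent), here is a minimal Python sketch. The function names and sample numbers are assumptions made for illustration, not part of the site's definitions, and the sketch treats the error budget simply as the fraction of requests an SLO target allows to fail.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO (0.999 -> about 0.001)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is being spent at exactly the pace
    the SLO allows; values above 1.0 mean it is being spent faster.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return (failed / total) / error_budget(slo_target)


if __name__ == "__main__":
    # Hypothetical window: 50 failed requests out of 10,000 against a 99.9% SLO.
    rate = burn_rate(failed=50, total=10_000, slo_target=0.999)
    print(f"burn rate: {rate:.2f}")
```

With these hypothetical numbers the observed error rate is 0.5% against a 0.1% budget, so the budget is being spent roughly five times faster than the SLO allows. The threshold at which such a value should trigger an alert or block a release is a policy choice and is not something this sketch decides.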