ZML

A high-performance inference and compilation stack designed for production deployments across diverse hardware and platforms.

Author: ZML

Added Date: 2025-09-30

Open Source Since: 2024-09-17


Overview

ZML is a production-oriented, high-performance inference and compilation stack built with Zig, MLIR, and Bazel. It targets efficient execution across heterogeneous hardware (NVIDIA and AMD GPUs, TPUs, and other accelerators) and provides examples, tooling, and documentation for integration in both research and engineering contexts.

Key Features

High-performance runtime with optimized support for multiple accelerator backends (CUDA, ROCm, TPU).
Portable builds through Bazel, enabling cross-compilation and reproducible deployments.
Comprehensive examples and tooling, including example models and benchmarking suites.

Use Cases

Deploying high-throughput inference services in production environments.
Compiling and benchmarking models across heterogeneous accelerator fleets.
Research on high-performance inference and collaborative execution across devices.

Technical Details

Core components implemented in Zig for low overhead and portability.
Integrates the MLIR and OpenXLA toolchains for compilation and multi-backend targeting.
Uses Bazel to provide reproducible builds and manage complex dependencies.
