
ZML

A high-performance inference and compilation stack designed for production deployments across diverse hardware and platforms.

Overview

ZML is a production-oriented, high-performance inference and compilation stack built with Zig, MLIR, and Bazel. It targets efficient execution across heterogeneous hardware (NVIDIA, AMD, TPU, etc.) and provides examples, tooling, and documentation for integration in both research and engineering contexts.

Key Features

  • High-performance runtime with backend-specific optimizations for multiple accelerators (CUDA, ROCm, TPU).
  • Portable builds through Bazel, enabling cross-compilation and reproducible deployments.
  • Comprehensive examples and tooling, including example models and benchmarking suites.

Use Cases

  • Deploying high-throughput inference services in production environments.
  • Compiling and benchmarking models across heterogeneous accelerator fleets.
  • Research on high-performance inference and cross-device collaborative execution.

Technical Details

  • Core components implemented in Zig for low overhead and portability.
  • Integrates MLIR/OpenXLA toolchains for compilation and multi-backend targeting.
  • Uses Bazel to provide reproducible builds and manage complex dependencies.
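
To make the Bazel-centric workflow concrete, here is a minimal sketch of how building and running a model target typically looks. The labels `//models:example` and `//platforms:linux_amd64` are hypothetical placeholders, not actual targets from the ZML repository:

```shell
# Build a hypothetical example model target with optimizations enabled.
bazel build -c opt //models:example

# Run the same target locally, passing arguments after the "--" separator.
bazel run -c opt //models:example -- --help

# Cross-compile for another platform by selecting a Bazel platform label
# (hypothetical); this is how a single checkout targets a heterogeneous fleet.
bazel build -c opt --platforms=//platforms:linux_amd64 //models:example
```

Because Bazel pins toolchains and dependencies, the same invocation reproduces the same artifacts on any machine, which is what enables the reproducible deployments described above.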

ZML
Resource Info
🌱 Open Source 🔮 Inference 🖥️ ML Platform