Overview
ZML is a production-oriented, high-performance inference and compilation stack built with Zig, MLIR, and Bazel. It targets efficient execution across heterogeneous hardware (NVIDIA and AMD GPUs, Google TPUs, and other accelerators) and ships with examples, tooling, and documentation for use in both research and engineering settings.
Key Features
- High-performance runtime with optimized support for multiple accelerator backends (CUDA, ROCm, TPU).
- Portable builds through Bazel, enabling cross-compilation and reproducible deployments.
- Examples and tooling, including sample models and benchmarking suites.
Use Cases
- Deploying high-throughput inference services in production environments.
- Compiling and benchmarking models across heterogeneous accelerator fleets.
- Research on high-performance inference and cross-device collaborative execution.
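To make the benchmarking use case concrete, here is a minimal, framework-agnostic sketch of a latency harness in Python. It does not use ZML's API: `benchmark` and `fake_model` are hypothetical names introduced only for illustration, and `fake_model` stands in for whatever inference call is being measured on a given device.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run_once: Callable[[], object], warmup: int = 3, iters: int = 20) -> Dict[str, float]:
    """Time a zero-argument callable and summarize its latency in milliseconds."""
    # Warm-up runs are discarded so one-time costs (compilation, cache fills)
    # do not skew the measured samples.
    for _ in range(warmup):
        run_once()

    samples_ms: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    samples_ms.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    mean_ms = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "p50_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "throughput_per_s": 1e3 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_model() -> int:
    # Hypothetical stand-in for a real inference call on some device.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(benchmark(fake_model))
```

Running the same harness against each target device and comparing the resulting summaries is one simple way to approach cross-accelerator comparisons. A real benchmarking suite would also need to account for device synchronization and batch size.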
Technical Details
- Core components implemented in Zig for low overhead and portability.
- Integrates MLIR/OpenXLA toolchains for compilation and multi-backend targeting.
- Uses Bazel to provide reproducible builds and manage complex dependencies.