ONNX Runtime

ONNX Runtime is a cross-platform, high-performance machine learning inference and training accelerator that runs models exported from PyTorch, TensorFlow/Keras and traditional ML libraries across diverse hardware.

Author: Microsoft

Since: 2017-05-01


Overview

ONNX Runtime, maintained by Microsoft, is a cross-platform accelerator for model inference and training. It improves performance through graph optimizations and integrations with hardware-specific backends, so that models exported from PyTorch, TensorFlow/Keras and classical ML libraries run efficiently across CPUs, GPUs and other accelerators.

Key features

Cross-platform support, with hardware acceleration through pluggable execution providers for backends such as CUDA, TensorRT, OpenVINO and DirectML.
Graph-level transformations and optimizations for better runtime performance.
Support for both inference and distributed training acceleration.
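
The backend selection and graph optimization features listed above can be illustrated with a minimal Python sketch. It assumes the `onnxruntime` package is installed. The preference order in `PREFERRED` and the helper names `pick_providers` and `make_session` are hypothetical, chosen only for this example.

```python
# Preference order is an assumption for illustration; adjust it to the target hardware.
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred execution providers that are available, in preference order."""
    chosen = [p for p in preferred if p in available]
    # Fall back to the CPU provider when none of the preferred ones is present.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path):
    """Create an InferenceSession with all graph optimizations enabled."""
    # Imported lazily so that pick_providers can be used without onnxruntime installed.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, sess_options=opts, providers=providers)
```

Calling `make_session("model.onnx")`, where the file name is a placeholder, would return a session that uses a GPU provider when the installed build exposes one and the CPU provider otherwise.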

Use cases

Production model serving with reduced latency and increased throughput.
Heterogeneous hardware deployments to optimize cost and performance.
Large-scale batch inference and preprocessing for ML pipelines.
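
As a rough sketch of the batch inference use case, the helpers below split a dataset into fixed-size batches and run each one through a session. The names `iter_batches` and `run_in_batches` and the default batch size are hypothetical. The only ONNX Runtime call assumed is `InferenceSession.run(output_names, input_feed)`.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_in_batches(session, input_name, items, batch_size=32):
    """Run `session` once per batch and collect the first output of each run."""
    results = []
    for batch in iter_batches(items, batch_size):
        # run() takes the requested output names (None means all outputs)
        # and a dict that maps input names to input data.
        outputs = session.run(None, {input_name: batch})
        results.append(outputs[0])
    return results
```

With a real session, `items` would typically be a NumPy array whose dtype and shape match the model input, and `input_name` can be read from `session.get_inputs()[0].name`.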

Technical notes

ONNX Runtime is natively compatible with the ONNX ecosystem, and extensive deployment examples are available to simplify integration.

