Overview
Transformer Engine provides optimized kernels and FP8/mixed-precision support to accelerate Transformer training and inference on NVIDIA hardware.
Key features
- FP8 recipes (scaling strategies intended to keep low-precision training converging) and optimized kernels.
- Integrations with PyTorch, JAX, and other frameworks.
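To make the idea of an FP8 recipe more concrete, here is a minimal pure-Python sketch of per-tensor scaling driven by a history of observed absolute maxima. It is a conceptual illustration only and does not use or reproduce Transformer Engine's API: the names `compute_scale`, `scale_and_clip`, and `margin` are hypothetical, rounding to the FP8 grid is omitted, and the only format fact relied on is that 448 is the largest finite value in FP8 E4M3.

```python
# Conceptual sketch only: NOT Transformer Engine's API.
# All function and parameter names here are hypothetical.

E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def compute_scale(amax_history, fp8_max=E4M3_MAX, margin=0):
    """Pick a scale that maps the largest recently seen magnitude onto fp8_max."""
    amax = max(amax_history)
    if amax <= 0.0:
        return 1.0  # nothing observed yet; leave values unscaled
    return fp8_max / amax / (2 ** margin)


def scale_and_clip(values, scale, fp8_max=E4M3_MAX):
    """Scale values into the FP8 range and saturate anything outside it.

    Rounding to representable FP8 values is deliberately left out.
    """
    return [max(-fp8_max, min(fp8_max, v * scale)) for v in values]


# Absolute maxima recorded over earlier steps (made-up numbers).
amax_history = [0.5, 2.0, 1.25]
scale = compute_scale(amax_history)

# A new tensor whose largest value exceeds anything seen in the history.
scaled = scale_and_clip([1.0, -2.0, 4.0], scale)
dequantized = [s / scale for s in scaled]
```

In this sketch the value 4.0 is larger than any magnitude in the history, so it saturates at the format maximum and dequantizes to 2.0 rather than 4.0. That trade-off between range and precision is the kind of behavior a scaling recipe has to manage.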
Use cases
- High-performance Transformer training and inference.
Technical highlights
- Low-level kernel optimizations targeting NVIDIA accelerators.