
Megatron-LM

Reference implementation for large-scale model training and inference with distributed optimizations.

Overview

Megatron-LM is NVIDIA’s reference implementation for training large language models, focusing on GPU-optimized kernels, tensor/pipeline parallelism, and end-to-end training utilities.

Key features

  • Flexible parallelism strategies (tensor, pipeline, context, FSDP); see the sketch after this list.
  • Optimized kernels and mixed-precision support (FP16/BF16/FP8).
  • End-to-end training scripts and examples.
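
To make the parallelism strategies concrete, here is a minimal sketch of how tensor/pipeline groups are set up through Megatron Core's parallel_state module. The tensor/pipeline sizes are illustrative assumptions, and a real job would be launched with torchrun across multiple GPUs:

```python
# Minimal sketch (not Megatron-LM's full training entry point):
# initialize Megatron Core's tensor/pipeline parallel groups.
# Assumes launch via torchrun, which sets RANK/WORLD_SIZE/LOCAL_RANK.
import os

import torch
from megatron.core import parallel_state

torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Carve the world size into tensor- and pipeline-parallel groups;
# the remaining ranks form the data-parallel dimension.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,   # illustrative: 2-way tensor parallelism
    pipeline_model_parallel_size=2, # illustrative: 2-way pipeline parallelism
)

print("TP rank:", parallel_state.get_tensor_model_parallel_rank())
print("PP rank:", parallel_state.get_pipeline_model_parallel_rank())
print("DP rank:", parallel_state.get_data_parallel_rank())
```

With a world size of 8, this layout yields 2-way data parallelism, since the data-parallel size is the world size divided by the tensor and pipeline sizes (8 / (2 × 2)).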

Use cases

  • Research and engineering for training large-scale LLMs.
  • Performance tuning and kernel validation on NVIDIA GPUs.

Technical highlights

  • Built on PyTorch with modular Megatron Core components.
  • Integrates with acceleration libraries such as Transformer Engine, as sketched below.
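
A minimal sketch of what that integration looks like at the module level: a linear layer run under Transformer Engine's FP8 autocast, which is the mechanism Megatron-LM uses when FP8 training is enabled. The layer size and recipe settings are illustrative assumptions, and FP8 execution requires Hopper-class or newer GPUs:

```python
# Minimal sketch of the Transformer Engine integration: run a linear
# layer under FP8 autocast. Sizes and recipe values are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

layer = te.Linear(1024, 1024, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)

# Inside the autocast region, matmuls execute in FP8 with delayed scaling;
# outside it, the same module runs in ordinary BF16.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```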

Resource Info

  • Author: NVIDIA
  • Added: 2025-10-02
  • Open source since: 2019-03-21
  • Tags: ML Platform, Open Source