
PatrickStar

A framework that enables parallel training of large pretrained models via chunk-based dynamic memory management.

Tencent · Since 2021-04-02

Detailed Introduction

PatrickStar is a PyTorch-based parallel training framework for large pretrained models (PTMs). It uses chunk-based dynamic memory management and a heterogeneous training strategy to offload non-critical data to CPU memory, allowing larger models to be trained on fewer GPUs and reducing the risk of out-of-memory (OOM) errors. Maintained by Tencent’s NLP / WeChat AI teams, PatrickStar aims to democratize access to large-scale model training.
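As a rough sketch of the intended workflow: the `initialize_engine` entry point, the `model_func` callback, and config fields such as `default_chunk_size` follow the style of the project's published examples, but exact names may vary between releases; `MyTransformer` and `dataloader` are hypothetical placeholders.

```python
# Minimal usage sketch, assuming a DeepSpeed-style config dict and the
# initialize_engine entry point; field names may differ between releases.
from patrickstar.runtime import initialize_engine

config = {
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-3, "betas": (0.9, 0.999), "eps": 1e-6,
                   "weight_decay": 0, "use_hybrid_adam": True},
    },
    "fp16": {"enabled": True, "loss_scale": 0, "initial_scale_power": 10},
    "default_chunk_size": 64 * 1024 * 1024,   # elements per managed chunk
    "release_after_init": True,
}

def model_func():
    # The model is built inside a callback so the runtime can place its
    # parameters into chunks as they are created.
    return MyTransformer()  # hypothetical torch.nn.Module

model, optimizer = initialize_engine(
    model_func=model_func, local_rank=0, config=config
)

for batch in dataloader:  # hypothetical DataLoader
    optimizer.zero_grad()
    loss = model(batch)
    model.backward(loss)   # engine-managed backward pass, as in DeepSpeed
    optimizer.step()
```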

Main Features

  • Chunk-based dynamic memory scheduling: manage activations and parameters by computation window to lower GPU memory usage (see the sketch after this list).
  • Heterogeneous offloading: move non-immediate data to CPU to support mixed CPU/GPU memory usage.
  • Efficient communication and scalability: optimized collective operations for multi-GPU and multi-node setups.
  • PyTorch compatibility: configuration style similar to DeepSpeed for easier migration.
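
The chunk-based scheduling idea from the first bullet can be pictured as a pool that keeps a bounded number of chunks on the GPU and spills the least recently used ones back to CPU memory. The following is a conceptual, illustrative-only sketch, not PatrickStar's actual data structures:

```python
# Illustrative sketch of chunk-based scheduling (not PatrickStar's code):
# keep chunks needed by the current compute window on GPU, spill the rest to CPU.
import torch

class ChunkPool:
    def __init__(self, chunk_size, gpu_budget_chunks):
        self.chunk_size = chunk_size          # elements per chunk
        self.gpu_budget = gpu_budget_chunks   # max chunks resident on GPU
        self.chunks = {}                      # chunk_id -> tensor (on CPU or GPU)
        self.resident = []                    # chunk ids on GPU, in LRU order

    def allocate(self, chunk_id):
        # Chunks start in pinned CPU memory so copies to GPU can overlap compute.
        self.chunks[chunk_id] = torch.empty(self.chunk_size, pin_memory=True)

    def fetch(self, chunk_id):
        # Bring a chunk onto the GPU, evicting least-recently-used chunks
        # back to CPU if the GPU budget would be exceeded.
        t = self.chunks[chunk_id]
        if not t.is_cuda:
            while len(self.resident) >= self.gpu_budget:
                victim = self.resident.pop(0)
                self.chunks[victim] = self.chunks[victim].to("cpu")
            self.chunks[chunk_id] = t = t.to("cuda", non_blocking=True)
        if chunk_id in self.resident:
            self.resident.remove(chunk_id)
        self.resident.append(chunk_id)
        return t
```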

Use Cases

Suitable for pretraining and large-scale fine-tuning, especially when hardware is constrained and teams need to train models with tens to hundreds of billions of parameters. Also useful for benchmarking, framework research, and academic research or teaching environments.

Technical Features

PatrickStar implements chunk-based memory management and runtime dynamic scheduling to keep only the chunks needed for the current computation on the GPU while asynchronously migrating the others. It optimizes collective communication for multi-GPU efficiency and provides benchmarks and examples for V100/A100 clusters. The project is released under the BSD 3-Clause license.
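The "asynchronously migrating" idea can be approximated in plain PyTorch with pinned host memory and a side CUDA stream that overlaps copies with compute. This is only a sketch of the general technique, not the project's implementation; names such as `prefetch` and `wait_for_prefetch` are hypothetical.

```python
# Illustrative sketch (not PatrickStar's implementation) of overlapping
# chunk migration with computation using a separate CUDA stream.
import torch

copy_stream = torch.cuda.Stream()

def prefetch(chunk_cpu):
    # Issue the host-to-device copy on a side stream so it can overlap
    # with kernels running on the default stream (requires pinned memory).
    with torch.cuda.stream(copy_stream):
        return chunk_cpu.to("cuda", non_blocking=True)

def wait_for_prefetch():
    # Make the compute stream wait for the copy stream before it
    # touches the prefetched chunk.
    torch.cuda.current_stream().wait_stream(copy_stream)

chunk_cpu = torch.empty(64 * 1024 * 1024, pin_memory=True)
gpu_chunk = prefetch(chunk_cpu)   # copy starts in the background
# ... run kernels that do not need gpu_chunk yet ...
wait_for_prefetch()               # synchronize before using gpu_chunk
```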
