EPLB

Expert Parallelism Load Balancer for dynamically distributing expert requests and compute load in expert-parallel training to improve cluster utilization and performance.

Author: DeepSeek

Added Date: 2025-10-06

Open Source Since: 2025-02-26


Overview

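EPLB is an Expert Parallelism Load Balancer. In expert-parallel (EP) training, different experts are assigned to different GPUs, and because some experts receive far more tokens than others, the GPUs hosting them become hotspots. EPLB counters this by duplicating heavily loaded experts and packing the resulting replicas onto GPUs so that per-GPU compute load is as even as possible. This improves resource utilization and helps maintain stable throughput under imbalanced loads.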
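The replication step described above can be sketched as follows. This is a minimal illustration of the redundant-experts idea, not EPLB's own code: the function name `replicate_experts` and its inputs (a list of per-expert loads and a total number of physical expert slots) are assumptions made for this example. Each expert starts with one replica, and every spare slot goes to the expert whose load per replica is currently highest.

```python
import heapq
from typing import List


def replicate_experts(loads: List[float], num_slots: int) -> List[int]:
    """Hypothetical helper: decide how many replicas each expert receives.

    loads[i] is the estimated load (e.g. token count) of logical expert i;
    num_slots is the total number of physical expert slots available.
    """
    num_experts = len(loads)
    if num_slots < num_experts:
        raise ValueError("need at least one slot per logical expert")
    counts = [1] * num_experts
    # Max-heap on per-replica load, stored negated because heapq is a min-heap.
    heap = [(-load, idx) for idx, load in enumerate(loads)]
    heapq.heapify(heap)
    for _ in range(num_slots - num_experts):
        _, idx = heapq.heappop(heap)
        counts[idx] += 1
        # Re-insert with the load each replica of this expert now carries.
        heapq.heappush(heap, (-loads[idx] / counts[idx], idx))
    return counts


print(replicate_experts([90, 30, 20, 10], 6))
```

With loads `[90, 30, 20, 10]` and six slots, both spare slots go to the hot expert, giving `[3, 1, 1, 1]` and a per-replica load of 30 for that expert instead of 90.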

Key Features

Dynamic load distribution strategies to mitigate imbalance in expert-parallel setups.
Lightweight Python implementation for rapid integration and experimentation.
Designed to pair with existing expert-parallel training frameworks as a performance baseline and tuning tool.
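One way to distribute load once expert replicas exist is a balanced greedy packing: place the heaviest replicas first, each onto the currently lightest GPU that still has a free slot, so every GPU ends up with the same number of replicas. The sketch below is an assumed illustration of that heuristic; the name `balanced_packing` and its interface are hypothetical and should not be read as the EPLB repository's API.

```python
from typing import List, Sequence, Tuple


def balanced_packing(
    weights: Sequence[float], num_packs: int
) -> Tuple[List[List[int]], List[float]]:
    """Hypothetical helper: pack items into num_packs bins of equal item count.

    Returns (packs, pack_loads), where packs[p] lists the item indices
    assigned to bin p and pack_loads[p] is the summed weight of that bin.
    """
    if len(weights) % num_packs != 0:
        raise ValueError("item count must be divisible by num_packs")
    capacity = len(weights) // num_packs
    packs: List[List[int]] = [[] for _ in range(num_packs)]
    pack_loads = [0.0] * num_packs
    # Heaviest items first; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for item in order:
        # Only bins that still have a free slot are eligible.
        candidates = [p for p in range(num_packs) if len(packs[p]) < capacity]
        target = min(candidates, key=lambda p: pack_loads[p])
        packs[target].append(item)
        pack_loads[target] += weights[item]
    return packs, pack_loads


packs, loads = balanced_packing([30, 30, 30, 30, 20, 10], num_packs=2)
print(packs, loads)
```

Packing six replicas with loads `[30, 30, 30, 30, 20, 10]` onto two GPUs yields `[[0, 2, 4], [1, 3, 5]]` with loads `[80.0, 70.0]`, which is the best achievable split when each GPU must hold exactly three replicas.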

Use Cases

Load scheduling and balancing when expert-parallel training shows hotspots or imbalance.
Improving overall throughput and reducing single-node bottlenecks in multi-node/multi-GPU clusters.
Experimental platform for research and engineering teams evaluating load balancing strategies.
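To decide whether a cluster actually shows the hotspots mentioned above, a simple signal is the ratio of the busiest GPU's load to the average GPU load. The helper below is an assumed, illustrative metric (the name `imbalance_ratio` is hypothetical and is not part of EPLB); a value of 1.0 means perfectly even load, and larger values indicate a bottleneck GPU.

```python
from statistics import mean
from typing import Sequence


def imbalance_ratio(gpu_loads: Sequence[float]) -> float:
    """Hypothetical metric: max-to-mean load ratio across GPUs (1.0 = balanced)."""
    if not gpu_loads:
        raise ValueError("need at least one GPU load")
    avg = mean(gpu_loads)
    if avg == 0:
        # No load anywhere counts as trivially balanced.
        return 1.0
    return max(gpu_loads) / avg


print(imbalance_ratio([90, 30, 20, 10]))
print(imbalance_ratio([80, 70]))
```

Tracking this ratio before and after rebalancing gives a quick, framework-independent way to compare load balancing strategies.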

Technical Details

Policy-driven placement: a hierarchical policy when the number of server nodes divides the number of expert groups, and a global policy otherwise.
Python-based implementation for quick iteration and integration into training pipelines.
Lightweight at runtime, so computing a new placement adds little scheduling overhead.
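The policy choice described above depends only on the cluster shape. The sketch below shows that decision rule; the function name `choose_policy` and its string return values are hypothetical, chosen for illustration rather than taken from the EPLB code.

```python
def choose_policy(num_groups: int, num_nodes: int) -> str:
    """Hypothetical helper: pick a placement policy from the cluster shape."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Expert groups split evenly across nodes: pack groups onto nodes first,
    # so each group's experts can stay on one node (less inter-node traffic).
    if num_groups % num_nodes == 0:
        return "hierarchical"
    # Otherwise replicate and pack experts across all GPUs, ignoring groups.
    return "global"


print(choose_policy(num_groups=8, num_nodes=4))
print(choose_policy(num_groups=8, num_nodes=3))
```

Keeping the rule this simple is consistent with the low-overhead goal: selecting a policy costs a single modulo check, and the heavier work is confined to replication and packing.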

