verl

A reinforcement learning training framework for large models, designed for scalable RLHF and agent training.

Author: ByteDance

Since: 2024-10-31

Introduction

verl is a reinforcement learning (RL) training framework for large models. It provides high-performance pipelines for RLHF and agent training and supports distributed training backends such as FSDP and Megatron.

Key Features

Supports multiple RL algorithms and training recipes, including PPO, GRPO, and DAPO
Integrates with inference engines and model ecosystems such as vLLM, SGLang, and Hugging Face
Scales to large multi-GPU deployments, including expert parallelism
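
To make one of the algorithms named above concrete, here is a minimal sketch of the group-relative advantage that GRPO is built around: several responses are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group. This illustrates the idea only and is not verl's implementation; the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration; not part of verl's API.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group (an assumption)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above their group's average receive a positive advantage and the rest a negative one, so no separate learned value model is needed to provide a baseline.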

Use Cases

Aligning LLMs with RLHF and training LLM-based agents
Researching and reproducing RL training recipes and baselines
Optimizing model performance and throughput on large clusters

Technical Highlights

Supports FSDP/FSDP2 and Megatron training backends, vLLM for inference, and hybrid parallel strategies
Extensible recipes and modular training pipelines
Extensive examples, documentation, and community contributions, making it suitable for adaptation to production use

Related Resources

Trae Agent

MineContext

Eino