Overview
Petals is a community-run system for distributed inference and fine-tuning of large language models: it splits a model's layers across multiple machines, BitTorrent-style. It supports models from the Hugging Face Hub and provides tutorials, Docker images, and Colab notebooks for easy experimentation.
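The layer-splitting idea can be illustrated with a toy partitioner. This is a minimal sketch, not Petals' actual assignment logic; the function name and the contiguous-block strategy are illustrative assumptions.

```python
def partition_layers(num_layers: int, num_servers: int) -> list[range]:
    """Split a model's layer indices into contiguous blocks, one per server.

    A toy illustration of BitTorrent-style layer sharding: each server
    hosts a consecutive slice of the transformer stack. (Hypothetical
    helper, not part of the Petals API.)
    """
    base, extra = divmod(num_layers, num_servers)
    blocks, start = [], 0
    for i in range(num_servers):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        blocks.append(range(start, start + size))
        start += size
    return blocks

# Example: an 80-layer model shared by 3 servers
print(partition_layers(80, 3))  # → [range(0, 27), range(27, 54), range(54, 80)]
```

In a real swarm, assignment also accounts for server capacity and churn; the point here is only that each machine needs to hold a small slice of the full stack, which is what lets commodity hardware participate.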
Key Features
- Distributed inference: split model computation across a network to enable running large models on commodity hardware.
- Multi-model support: run Llama 3.1, Falcon, BLOOM, and other large pretrained models.
- Portable tooling: Docker images, examples, and Colab demos make it easy to get started on Linux, macOS, or WSL.
Use Cases
- Interactive chatbots and research experiments on resource-constrained hardware.
- Collaborative GPU sharing and fine-tuning among volunteers or private swarms.
- Building public or private swarms to host models and improve availability.
Technical Details
- Implemented on PyTorch and Hugging Face Transformers, compatible with existing model weights and tooling.
- Partitions model computation by distributing transformer layers across servers and running them with pipeline parallelism.
- Provides Docker support, monitoring tools, and a public swarm health dashboard (https://health.petals.dev/).
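The pipeline-parallel flow described above can be sketched as activations hopping sequentially through layer blocks hosted on different servers. This is a simplified single-request illustration under assumed toy "stages"; a real system overlaps micro-batches and handles server failures.

```python
from typing import Callable

# Hypothetical stand-ins for layer blocks hosted on separate servers.
# Each "stage" transforms the activations and forwards them onward.
Stage = Callable[[list[float]], list[float]]

def make_stage(bias: float) -> Stage:
    # A toy layer block: add a per-stage bias to every activation.
    return lambda acts: [a + bias for a in acts]

def pipeline_forward(stages: list[Stage], activations: list[float]) -> list[float]:
    """Run activations through each stage in order, as if the request
    hopped from one server in the swarm to the next."""
    for stage in stages:
        activations = stage(activations)
    return activations

stages = [make_stage(1.0), make_stage(2.0), make_stage(3.0)]
print(pipeline_forward(stages, [0.0, 0.5]))  # → [6.0, 6.5]
```

Each client request only ever exchanges activations, not weights, which keeps per-hop network traffic small compared to the size of the model itself.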