Volcano

Volcano is a Kubernetes-native batch scheduling system (a CNCF project) that complements the default kube-scheduler with advanced features for batch, HPC, and AI workloads.

Introduction

Volcano extends Kubernetes scheduling beyond what the default kube-scheduler provides, adding support for batch jobs, elastic training, and high-performance computing (HPC) scenarios. Its scheduling policies and plugin ecosystem target large-scale AI/ML and big data jobs, with the goal of using cluster resources efficiently.

Key Features

  • Pluggable scheduling policies, including topology awareness, priority, and preemption.
  • Seamless integration with frameworks such as Spark, Flink, MPI, and Horovod.
  • Single-command installation via Helm, or quick deployment from a YAML manifest.
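
To make the features above more concrete, here is a minimal sketch of a Volcano Job manifest, built in Python as a plain dict and printed as JSON (which `kubectl apply -f -` accepts). The helper name `volcano_job`, the job name, the image, and the command are illustrative assumptions; the `batch.volcano.sh/v1alpha1` API group, `schedulerName`, `queue`, `minAvailable`, and `tasks` fields follow the Volcano Job CRD as commonly documented, and should be checked against the version installed in your cluster.

```python
import json


def volcano_job(name, image, replicas, min_available, queue="default"):
    """Build a Volcano Job manifest as a dict (hypothetical helper, not part of Volcano)."""
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Hand the job's pods to the Volcano scheduler instead of the default one.
            "schedulerName": "volcano",
            "queue": queue,
            # Pods are only started once at least this many can be placed together.
            "minAvailable": min_available,
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    # Ordinary Kubernetes pod template.
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "worker",
                                    "image": image,  # illustrative image
                                    "command": ["sh", "-c", "echo hello from volcano"],
                                }
                            ],
                        }
                    },
                }
            ],
        },
    }


manifest = volcano_job("demo-job", "busybox:1.36", replicas=3, min_available=3)
print(json.dumps(manifest, indent=2))
```

Setting `min_available` equal to `replicas` asks for all workers to be scheduled as a group, which is the typical choice for MPI or distributed training jobs where a partial start is useless.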

Use Cases

  • Unified scheduling for large-scale offline training and batch processing jobs.
  • Improved GPU/CPU resource utilization and reduced fragmentation.
  • Integration with cloud providers or in-house platforms as a custom scheduler.

Technical Highlights

  • Built on Kubernetes CRDs and controllers, fully compatible with the cloud-native ecosystem.
  • Production-ready design with high availability and scalability.
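
Because Volcano is built on CRDs, its scheduling objects are ordinary Kubernetes resources. As a rough illustration, the sketch below builds a Queue manifest in Python. The helper name `volcano_queue`, the queue name, and the resource figures are assumptions made for this example; the `scheduling.volcano.sh/v1beta1` API group and the `weight`, `reclaimable`, and `capability` fields reflect the Queue CRD as commonly documented and may differ between Volcano releases.

```python
import json


def volcano_queue(name, weight, cpu, memory, reclaimable=True):
    """Build a Volcano Queue manifest as a dict (hypothetical helper, not part of Volcano)."""
    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {"name": name},
        "spec": {
            # Relative share of cluster resources compared with other queues.
            "weight": weight,
            # Whether resources used beyond the queue's share may be reclaimed.
            "reclaimable": reclaimable,
            # Upper bound on what jobs in this queue may consume.
            "capability": {"cpu": cpu, "memory": memory},
        },
    }


queue = volcano_queue("training", weight=2, cpu="16", memory="64Gi")
print(json.dumps(queue, indent=2))
```

A job is placed in this queue by setting `spec.queue` to the queue's name in its own manifest.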
