Introduction
Volcano is a Kubernetes-native batch scheduling system. It goes beyond what the default kube-scheduler offers in order to support batch jobs, elastic training, and high-performance computing (HPC) workloads. It provides a set of scheduling policies and a plugin ecosystem for large-scale AI/ML and big data jobs, with the goal of using cluster resources efficiently.
Key Features
- Comprehensive scheduling strategies and a pluggable design, supporting topology awareness, priority, preemption, and more.
- Seamless integration with frameworks such as Spark, Flink, MPI, and Horovod.
- Installation via a Helm chart, or quick deployment by applying YAML manifests.
Use Cases
- Unified scheduling for large-scale offline training and batch processing jobs.
- Improved GPU/CPU resource utilization and reduced fragmentation.
- Integration with cloud providers or in-house platforms as a custom scheduler.
Technical Highlights
- Built on Kubernetes CRDs and controllers, fully compatible with the cloud-native ecosystem.
- Production-ready design with high availability and scalability.
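
To make the CRD-based design more concrete, the following is a minimal sketch of how a batch job might be described for Volcano. It assumes the `batch.volcano.sh/v1alpha1` `Job` resource and a queue named `default`. The helper `build_volcano_job`, the job name, the container image, and the command are hypothetical and chosen only for illustration. The script builds the manifest as a plain dictionary and prints it as JSON.

```python
import json


def build_volcano_job(name, image, replicas, queue="default"):
    """Return a Volcano Job manifest as a plain dict.

    Hypothetical helper for illustration; the field names assume the
    batch.volcano.sh/v1alpha1 Job CRD.
    """
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Ask Volcano, not the default kube-scheduler, to place the pods.
            "schedulerName": "volcano",
            # Queue name is an assumption; adjust to the cluster's setup.
            "queue": queue,
            # Require all replicas to be schedulable before the job starts.
            "minAvailable": replicas,
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "worker",
                                    "image": image,
                                    "command": ["sh", "-c", "echo hello from worker"],
                                }
                            ],
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_volcano_job("demo-job", "busybox:1.36", replicas=3)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` on a cluster where Volcano and its CRDs are already installed. Setting `minAvailable` equal to the replica count is one possible choice here: it asks for all workers to be placed together rather than a few at a time.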