Polyaxon is an MLOps platform designed to help teams reproduce, automate and scale machine learning workloads.
Key features
- Job orchestration and scheduling: container-native DAG/workflow engine supporting parallel and distributed training.
- Experiment tracking and comparison: centralized logging of metrics and resource usage with dashboards and comparison views.
- Automation and hyperparameter tuning: built-in grid search, random search, Hyperband and Bayesian optimization.
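
To make the difference between the first two search strategies concrete, here is a minimal sketch in plain Python. It does not use Polyaxon's own API; the search space, parameter names, and helper functions are hypothetical and only illustrate how grid search enumerates every combination while random search samples a fixed number of trials.

```python
import itertools
import random

# Hypothetical search space; parameter names and values are illustrative only.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}


def grid_search(space):
    """Return every combination of parameter values (exhaustive)."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in itertools.product(*space.values())
    ]


def random_search(space, num_trials, seed=0):
    """Return num_trials configurations, each sampled independently."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(num_trials)
    ]


grid_trials = grid_search(search_space)
random_trials = random_search(search_space, num_trials=4)
print(len(grid_trials))  # 6 combinations: 3 learning rates x 2 batch sizes
```

Grid search grows multiplicatively with each added parameter, which is why sampling-based and adaptive strategies tend to be preferred for larger spaces.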
Use cases
- Large-scale distributed training and hyperparameter optimization.
- CI/CD-driven training pipelines and reproducible experiments.
- Multi-tenant resource sharing and team-level experiment management.
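
Training pipelines of the kind listed above are commonly modeled as a DAG, where independent steps can run in parallel. The following is a rough sketch using only the Python standard library (Python 3.9+), not Polyaxon's workflow engine; the step names and the `execution_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
pipeline = {
    "preprocess": set(),
    "train_a": {"preprocess"},
    "train_b": {"preprocess"},
    "evaluate": {"train_a", "train_b"},
}


def execution_waves(dag):
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # All steps whose dependencies have completed are ready together.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(pipeline)
```

Here `train_a` and `train_b` land in the same wave, so a scheduler would be free to run them concurrently once `preprocess` has finished.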
Technical notes
- Flexible deployment: self-hosted (Kubernetes/Helm), cloud-hosted or Polyaxon-managed services.
- CLI and SDK: the polyaxon CLI, polyaxonfile configurations and SDKs for integration and automation.
- Modular architecture: submodules and plugins (e.g., hypertune, traceml) to extend functionality.