Detailed Introduction
Below is a time-traveling resource monitor for modern Linux systems: it records system data over time so that past states can be viewed later. It supports live observation (live), continuous recording (record), and replay of historical snapshots (replay) for post-mortem analysis of performance events, resource usage trends, and cgroup hierarchies. A fourth mode, dump, exports scriptable output (JSON, CSV, OpenMetrics) suitable for integration with monitoring stacks such as Prometheus and Grafana.
Main Features
- Multiple operation modes: live (real-time viewing), record (persistent collection), replay (playback of historical data), and dump (script-friendly exports such as JSON/CSV/OpenMetrics).
- Records process and cgroup-level metrics, including Pressure Stall Information (PSI).
- Snapshot and replay functionality enables reproducible offline analysis and debugging.
- Implemented in Rust for performance and reliability; available as distribution packages and Docker images.
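For context on the PSI metrics listed above: the Linux kernel exposes them as small text files such as /proc/pressure/cpu, /proc/pressure/memory, and /proc/pressure/io. The following is a minimal Python sketch of parsing that kernel text format. It is not how Below itself collects PSI, and the names parse_psi and SAMPLE are made up for illustration.

```python
from typing import Dict

# Example contents in the kernel's PSI text format (values are illustrative).
SAMPLE = (
    "some avg10=1.50 avg60=0.25 avg300=0.00 total=12345\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=678\n"
)


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI file contents into {"some": {...}, "full": {...}}.

    Each line starts with the stall kind ("some" or "full") followed by
    key=value pairs: avg10, avg60, avg300 (percentages) and total (microseconds).
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind, pairs = fields[0], fields[1:]
        result[kind] = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            result[kind][key] = float(value)
    return result


psi = parse_psi(SAMPLE)
print(psi["some"]["avg10"])
```

On a kernel built with PSI support, the same function could be applied to the text read from /proc/pressure/memory instead of SAMPLE.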
Use Cases
- Incident investigation: replay recorded data to locate transient or intermittent issues.
- Performance regression: compare recordings from different points in time to identify regressions.
- Cluster and container monitoring: complement Prometheus/Grafana with event replay capabilities.
- Automated testing: capture snapshots for CI-based replay and validation.
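The regression and CI use cases above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes each run has been exported as a JSON array of sample objects, and the field name cpu_pct, the 10% tolerance, and the function names are all hypothetical. The actual export schema should be checked before adapting it.

```python
import json
from statistics import mean
from typing import List


def field_mean(json_text: str, field: str) -> float:
    """Average one numeric field over a JSON array of sample objects."""
    samples: List[dict] = json.loads(json_text)
    return mean(float(s[field]) for s in samples)


def regressed(baseline: str, candidate: str, field: str,
              tolerance: float = 0.10) -> bool:
    """Return True if the candidate mean exceeds the baseline mean
    by more than `tolerance` (a fraction, so 0.10 means 10%)."""
    base = field_mean(baseline, field)
    cand = field_mean(candidate, field)
    return cand > base * (1.0 + tolerance)


# Hypothetical exported samples; "cpu_pct" is an illustrative field name.
BASELINE = '[{"cpu_pct": 20.0}, {"cpu_pct": 22.0}]'
CANDIDATE = '[{"cpu_pct": 30.0}, {"cpu_pct": 32.0}]'

print(regressed(BASELINE, CANDIDATE, "cpu_pct"))  # True
```

A CI job could fail the build when regressed(...) returns True. Comparing means is the simplest possible check; percentiles or per-cgroup comparisons may suit noisy workloads better.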
Technical Features
- Written primarily in Rust with a focus on low overhead and robustness.
- Scriptable export formats include JSON, CSV, and OpenMetrics for downstream processing.
- Targets cgroup v2 and modern kernel metrics; cgroup v1 is not supported.
- Distributed via Docker images and packaged for common Linux distributions for easy deployment and integration.
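Because cgroup v1 is not supported, it can be useful to check which cgroup version a host uses before deploying. The sketch below relies on one kernel convention: the root of a cgroup v2 (unified) hierarchy contains a cgroup.controllers file. The function name is_cgroup_v2 is illustrative, and the default path /sys/fs/cgroup assumes the usual mount point.

```python
import os


def is_cgroup_v2(root: str = "/sys/fs/cgroup") -> bool:
    """Return True if `root` looks like the root of a cgroup v2 hierarchy.

    A cgroup v2 root contains a `cgroup.controllers` file; a cgroup v1
    layout (one directory per controller) does not have it at this path.
    """
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


print(is_cgroup_v2())
```

On hosts running in hybrid mode the v2 hierarchy is typically mounted elsewhere (commonly /sys/fs/cgroup/unified), so this check returns False for the default path there.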