Detailed Introduction
CellARC is an open-source toolkit for generating and loading datasets of ARC-style cellular-automaton (CA) episodes. The project publishes dataset snapshots on the Hugging Face Hub (for example mireklzicar/cellarc_100k) and provides APIs such as EpisodeDataset and EpisodeDataLoader to download, cache, and batch episodes. Built-in visualization helpers replay CA rollouts and render episode cards for inspection.
Main Features
- Dataset snapshots: ready-to-use 100k dataset with fixed 100-episode subsets for fast iteration.
- Simulation & visualization: CA rollouts and episode card rendering for debugging and analysis.
- Optional generation stack: install cellarc[all] to enable JAX/Flax/CAX-based generation and advanced sampling tools.
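To make the term "CA rollout" concrete, here is a minimal sketch of a one-dimensional, two-state cellular automaton in plain Python. It is a generic textbook illustration and does not use CellARC's own simulator. The function names (step, rollout, render) are hypothetical, and the binary radius-1 rule format with wraparound boundaries is an assumption chosen for brevity; CellARC's rule families may differ.

```python
from typing import List


def step(cells: List[int], rule: int) -> List[int]:
    """Apply one update of an elementary (binary, radius-1) CA with wraparound."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (centre << 1) | right  # neighbourhood as a 3-bit number
        nxt.append((rule >> idx) & 1)  # read that bit of the Wolfram rule number
    return nxt


def rollout(initial: List[int], rule: int, steps: int) -> List[List[int]]:
    """Return the initial row followed by `steps` successive updates."""
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


def render(rows: List[List[int]]) -> str:
    """Draw a rollout as text, one row per time step."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    # Rule 90 sets each cell to the XOR of its two neighbours.
    print(render(rollout([0, 0, 0, 1, 0, 0, 0], rule=90, steps=3)))
```

Each row of the rendered output is one time step. An ARC-style episode can be thought of as asking a model to infer the hidden rule from a few such input/output rows, although the exact episode layout is defined by the dataset itself.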
Use Cases
- ML research: benchmark tasks for models on structured reasoning and CA dynamics.
- Teaching and reproducibility: classroom examples and baseline experiments with easy dataset access.
- Data analysis: tools for studying rule-space coverage, episode difficulty, and dataset statistics.
Technical Features
- Lightweight Python API: EpisodeDataset.from_huggingface and EpisodeDataLoader support on-demand downloads and caching for integration with training loops.
- Flexible storage: JSONL and Parquet artifacts with data_files.json and dataset_stats.json for quick split enumeration.
- Packaging & compatibility: published as a PyPI package; full generation/simulation features require Python 3.11+ and extra dependencies.
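As a rough illustration of what streaming a JSONL artifact into batches involves, the sketch below uses only the Python standard library. It is not the CellARC API: EpisodeDataset.from_huggingface and EpisodeDataLoader are the supported entry points, and their signatures are not reproduced here. The helper names (iter_jsonl, batched) and the record fields ("id", "rule") are hypothetical placeholders, not the real episode schema.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


def iter_jsonl(path: Path) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open("r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def batched(records: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Group records into lists of `batch_size`; the last batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    # Write a tiny stand-in file; the field names are illustrative only.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "episodes.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for i in range(5):
                fh.write(json.dumps({"id": i, "rule": 90}) + "\n")
        for batch in batched(iter_jsonl(path), batch_size=2):
            print([record["id"] for record in batch])
```

Reading line by line keeps memory use flat regardless of file size, which is one reason JSONL is a common choice for large episode collections.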