
CellARC

An open-source toolkit for generating, publishing, and loading CellARC datasets (cellular-automaton episodes), with Hugging Face snapshots and visualization utilities.

Detailed Introduction

CellARC is an open-source toolkit for generating and loading ARC-style cellular-automaton (CA) episodes. The project publishes dataset snapshots on the Hugging Face Hub (for example, mireklzicar/cellarc_100k) and provides the EpisodeDataset and EpisodeDataLoader APIs to download, cache, and batch episodes. Built-in visualization helpers replay CA rollouts and render episode cards for inspection.

Main Features

  • Dataset snapshots: a ready-to-use 100k dataset, plus fixed 100-episode subsets for fast iteration.
  • Simulation & visualization: CA rollouts and episode card rendering for debugging and analysis.
  • Optional generation stack: install cellarc[all] to enable JAX/Flax/CAX-based generation and advanced sampling tools.
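
To make the idea of a "CA rollout" concrete, here is a minimal sketch in plain Python. It is not CellARC's implementation: it assumes the simplest possible case, a one-dimensional, two-state, radius-1 automaton (a Wolfram "elementary" rule) on a ring, and the function names step, rollout, and render are hypothetical. CellARC's own episodes and renderers may use different rule families and layouts.

```python
def step(cells, rule):
    """Apply one synchronous update of an elementary CA with wraparound.

    `cells` is a list of 0/1 values; `rule` is a Wolfram rule number (0-255).
    Assumption: binary states and radius 1, chosen only for illustration.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left = cells[(i - 1) % n]
        center = cells[i]
        right = cells[(i + 1) % n]
        # The three-cell neighbourhood is read as a 3-bit index into the rule number.
        idx = (left << 2) | (center << 1) | right
        out.append((rule >> idx) & 1)
    return out


def rollout(initial, rule, steps):
    """Return the initial row followed by `steps` successive updates."""
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


def render(rows):
    """Render a rollout as text: '#' for live cells, '.' for dead cells."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    start = [0, 0, 0, 1, 0, 0, 0]
    print(render(rollout(start, rule=90, steps=3)))
```

Run as a script, this prints four rows for rule 90 starting from a single live cell: "...#...", "..#.#..", ".#...#.", and "#.#.#.#". Stacking rows over time like this is the same kind of space-time picture a rollout visualizer would draw.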

Use Cases

  • ML research: benchmark tasks that test models on structured reasoning and CA dynamics.
  • Teaching and reproducibility: classroom examples and baseline experiments with easy dataset access.
  • Data analysis: tools for studying rule-space coverage, episode difficulty, and dataset statistics.

Technical Features

  • Lightweight Python API: EpisodeDataset.from_huggingface and EpisodeDataLoader support on-demand downloads and caching for integration with training loops.
  • Flexible storage: JSONL and Parquet artifacts with data_files.json and dataset_stats.json for quick split enumeration.
  • Packaging & compatibility: published as a PyPI package; full generation/simulation features require Python 3.11+ and extra dependencies.
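
As a rough illustration of what loading and batching JSONL episodes involves, the following sketch uses only the Python standard library. It deliberately does not call EpisodeDataset or EpisodeDataLoader, whose exact signatures are not described here. The record fields ("id", "query", "solution"), the file name, and the helper names are all hypothetical stand-ins, not CellARC's actual schema.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def batched(records, batch_size):
    """Group an iterable of records into lists of at most `batch_size` items."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def write_demo_file(path, n):
    """Write `n` toy episodes. The field names are assumptions for this demo."""
    with open(path, "w", encoding="utf-8") as fh:
        for i in range(n):
            episode = {"id": i, "query": [i % 2, 1], "solution": [1, i % 2]}
            fh.write(json.dumps(episode) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "episodes.jsonl"
        write_demo_file(path, 5)
        for batch in batched(read_jsonl(path), batch_size=2):
            print([episode["id"] for episode in batch])
```

Because both helpers are generators, episodes are read one line at a time and never held in memory all at once, which is the usual reason to pair a line-oriented format like JSONL with a batching loader.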
Resource Info
🌱 Open Source 💾 Data 🕹️ Simulator 🏋️ Training