Overview
nanochat is an open-source, full-stack project by Andrej Karpathy that implements a compact, readable pipeline to train, fine-tune, evaluate, and serve a ChatGPT-like conversational model. It aims to enable reproducible experiments and interactive demos on constrained budgets, with scripts such as speedrun.sh running the complete loop from training through inference.
Key Features
- End-to-end pipeline covering tokenization, pretraining, fine-tuning, evaluation, and a simple web UI for conversational interaction.
- Reproducible “speedrun” scripts to quickly bootstrap training runs and produce evaluation report cards.
- Minimal, hackable codebase that is easy to read and modify for experiments or teaching purposes.
Use Cases
- Research and teaching: serves as a practical baseline for LLM courses and small-scale experiments.
- Rapid prototyping: build and test conversational models on limited budgets.
- Reproducible benchmarking: generate report cards to document training outcomes and baselines.
Technical Highlights
- PyTorch-based training that runs in either single-GPU or multi-GPU mode.
- Includes a Rust-based BPE (byte pair encoding) tokenizer and keeps dependencies lightweight for easier deployment.
- Configurable model depth and batch parameters to scale training cost and capability.
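
To make the BPE tokenizer mentioned above more concrete, the following is a minimal pure-Python sketch of byte-level byte pair encoding. It is an illustration of the general technique only, not nanochat's Rust implementation: the function names (train_bpe, encode, decode, merge_pair, most_frequent_pair) are hypothetical, and real tokenizers typically add pre-tokenization with a regex split, special tokens, and much faster data structures.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the most common adjacent pair of ids, or None if no pair exists."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Ids 0..255 are raw bytes; each learned merge gets the next free id.
    Returns the merge table and the training text encoded with it.
    """
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:  # fewer than two tokens left, nothing to merge
            break
        new_id = 256 + step
        merges[pair] = new_id
        ids = merge_pair(ids, pair, new_id)
    return merges, ids


def encode(text, merges):
    """Encode text by applying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand every id back to its byte string and decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

As a usage example, train_bpe("aaabdaaabac", 3) learns three merges and shortens the 11-byte input to 5 tokens, and decode(encode(text, merges), merges) returns the original text. Starting from raw bytes means any input string can be encoded without an unknown-token fallback, at the cost of longer sequences until enough merges have been learned.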