DeepFabric

DeepFabric is a framework for generating high-quality training datasets and exporting multiple formats to train agentic small language models.

Author: Luke Hinds

Since: 2024-10-25

Visit Website GitHub

Overview

DeepFabric is designed to streamline dataset generation and fine-tuning pipelines for training small language models as capable agents. By combining hierarchical topic generation, structured reasoning templates, and multi-format exporters, DeepFabric helps engineers and researchers produce model-ready datasets that support tool-calling and multi-step reasoning.

Key Features

Hierarchical topic generation for broad domain coverage;
Multi-format exporters (TRL, XLAM, GRPO, etc.) to avoid conversion overhead;
Tool-calling support with function-schema examples to train function-invoking models;
Built-in quality controls such as deduplication and schema validation;
Multi-provider compatibility (OpenAI, Anthropic, Google, Ollama).

Use Cases

Training agentic chatbots and assistants with tool integration;
Distilling complex decision-making into smaller local models for cost efficiency;
Rapid dataset generation for research experiments and reproducible pipelines;
Building private, auditable agent training workflows within enterprise infrastructure.

Technical Highlights

Structured Chain-of-Thought traces enforced with Pydantic and Outlines;
Plug-in formatter engine to support custom output formats;
Direct integration with popular training toolchains (TRL, Unsloth, Axolotl);
Opinionated quality checks to improve dataset reliability.

DeepFabric

Overview

Key Features

Use Cases

Technical Highlights

Resource Info

Related Resources

OpenMCP Client

Swarms

karpathy