Overview
DeepFabric is designed to streamline dataset generation and fine-tuning pipelines for training small language models as capable agents. By combining hierarchical topic generation, structured reasoning templates, and multi-format exporters, DeepFabric helps engineers and researchers produce model-ready datasets that support tool-calling and multi-step reasoning.
Key Features
- Hierarchical topic generation for broad domain coverage;
- Multi-format exporters (TRL, XLAM, GRPO, etc.) to avoid conversion overhead;
- Tool-calling support with function-schema examples to train function-invoking models;
- Built-in quality controls such as deduplication and schema validation;
- Multi-provider compatibility (OpenAI, Anthropic, Google, Ollama).
Use Cases
- Training agentic chatbots and assistants with tool integration;
- Distilling complex decision-making into smaller local models for cost efficiency;
- Rapid dataset generation for research experiments and reproducible pipelines;
- Building private, auditable agent training workflows within enterprise infrastructure.
Technical Highlights
- Structured Chain-of-Thought traces enforced with Pydantic and Outlines;
- Plug-in formatter engine to support custom output formats;
- Direct integration with popular training toolchains (TRL, Unsloth, Axolotl);
- Opinionated quality checks to improve dataset reliability.