Detailed Introduction
VibeThinker is an open-source family of small reasoning models from WeiboAI. VibeThinker-1.5B applies a post-training recipe called the “Spectrum-to-Signal Principle (SSP)” to reach strong reasoning ability at only 1.5B parameters. Training follows a two-stage pipeline: diversity-exploring distillation during supervised fine-tuning (SFT) broadens the spectrum of candidate solutions, and MaxEnt-Guided Policy Optimization (MGPO) during reinforcement learning (RL) then strengthens the signal of correct solutions. The result is competitive performance on mathematical and coding benchmarks at low resource cost.
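To make the entropy-guided idea concrete, the sketch below weights each training problem by how close its rollout pass rate sits to the maximum-entropy point of 0.5, so problems of intermediate difficulty dominate the policy update. This is an illustrative reading of MGPO rather than the exact formulation from the technical report; the function names and the specific weighting rule are assumptions.

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def maxent_weight(pass_rate: float) -> float:
    """Illustrative MaxEnt-style weight: 1.0 when the rollout pass rate is 0.5
    (maximum uncertainty), falling toward 0.0 as a problem becomes trivially
    easy or hopelessly hard. This is an assumption, not the published MGPO rule."""
    return bernoulli_entropy(pass_rate) / bernoulli_entropy(0.5)

# Example: a problem solved in 3 of 8 rollouts gets a high weight,
# while one solved in 8 of 8 contributes no further learning signal.
for solved, total in [(3, 8), (8, 8), (1, 8)]:
    print(solved, total, round(maxent_weight(solved / total), 3))
```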
Main Features
- Parameter-efficient: strong benchmark performance with only 1.5B parameters.
- Multi-stage training: diversity-exploring distillation combined with MGPO to amplify correct-solution signals.
- Reproducible and open: model weights and technical report are publicly available for community validation and downstream work.
Use Cases
- Research and evaluation on competition-level mathematics and complex reasoning tasks.
- Verifying reasoning capability in coding and code-generation scenarios.
- Inference deployment in resource-constrained environments and fast research iteration (a minimal vLLM sketch follows this list).
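For the deployment use case, a minimal offline-inference sketch with vLLM is shown below. The repository id `WeiboAI/VibeThinker-1.5B` and the sampling settings are assumptions; check the official model card for the exact name and recommended values.

```python
from vllm import LLM, SamplingParams

# Model id is assumed; confirm the exact repository name on Hugging Face / ModelScope.
llm = LLM(model="WeiboAI/VibeThinker-1.5B")

# Sampling settings here are placeholders, not officially recommended values.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompts = ["Prove that the sum of two even integers is even."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```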
Technical Features
- Built on standard Large Language Model (LLM) architecture, with post-training optimized to substantially strengthen reasoning ability at a small parameter scale.
- Two-stage diversity-focused distillation to generate a broad solution spectrum, followed by entropy-driven policy optimization to amplify correct answers.
- Model weights and evaluation toolchains are distributed via Hugging Face and ModelScope; standard inference stacks such as Transformers and vLLM are supported (a minimal Transformers example follows this list).
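For reference, here is a minimal chat-style generation sketch with Transformers. The model id and generation hyperparameters are assumptions, and the snippet presumes the tokenizer ships a chat template; consult the model card for the intended prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repository id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation hyperparameters are placeholders, not official recommendations.
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```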