
VibeThinker

An open-source model project that improves small-model reasoning via multi-stage distillation and optimization.

Detailed Introduction

VibeThinker is an open-source small-scale reasoning model family from WeiboAI. VibeThinker-1.5B employs a post-training approach called the “Spectrum-to-Signal Principle (SSP)” to achieve strong reasoning ability at a 1.5B parameter scale. The project uses a two-stage pipeline—diversity-exploring distillation in SFT and MaxEnt-Guided Policy Optimization (MGPO) in RL—to improve signal strength for correct solutions, achieving competitive results on mathematical and coding benchmarks while keeping resource costs low.
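As a rough illustration of the entropy-guided idea behind MGPO (the weighting below is an assumption for illustration only, not the published objective; see the technical report for the actual formulation), a maximum-entropy heuristic might prioritize prompts whose sampled answers are most uncertain:

```python
# Toy sketch of entropy-guided prompt weighting (illustrative assumption,
# not the official MGPO objective). Prompts whose sampled answers are closest
# to maximum uncertainty receive the largest training weight.
import math
from collections import Counter

def answer_entropy(sampled_answers):
    """Shannon entropy (nats) of the empirical distribution over final answers."""
    counts = Counter(sampled_answers)
    total = len(sampled_answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def maxent_weight(sampled_answers):
    """Weight in [0, 1]: highest when rollouts disagree the most."""
    k = len(set(sampled_answers))
    if k <= 1:
        return 0.0  # all rollouts agree: little learning signal from this prompt
    return answer_entropy(sampled_answers) / math.log(k)  # normalize by max entropy

# Example: 8 rollouts for one math prompt, grouped by final answer string.
rollouts = ["42", "42", "41", "42", "7", "41", "42", "7"]
print(round(maxent_weight(rollouts), 3))  # higher weight -> prompt prioritized in RL
```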

Main Features

  • Parameter-efficient: strong benchmark performance with only 1.5B parameters.
  • Multi-stage training: diversity-exploring distillation combined with MGPO to amplify correct-solution signals.
  • Reproducible and open: model weights and technical report are publicly available for community validation and downstream work.

Use Cases

  • Research and evaluation for competitive mathematical problems and complex reasoning tasks.
  • Verifying reasoning capability in coding and code-generation scenarios.
  • Low-cost inference deployment in resource-constrained environments and fast research iteration.

Technical Features

  • Built on standard Large Language Model (LLM) architecture, with post-training optimizations that significantly enhance reasoning ability at a small parameter scale.
  • Two-stage diversity-focused distillation to generate a broad solution spectrum, followed by entropy-driven policy optimization to amplify correct answers.
  • Provides model downloads and evaluation toolchains via Hugging Face and ModelScope; supports standard inference stacks such as Transformers and vLLM (see the loading sketch after this list).
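
Since the weights are distributed through Hugging Face and ModelScope and work with standard stacks, a minimal Transformers loading sketch might look like the following; the repository id "WeiboAI/VibeThinker-1.5B" and the sampling settings are assumptions, so consult the model card for the exact values:

```python
# Minimal inference sketch using Hugging Face Transformers.
# The repo id and generation settings below are assumptions; check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "If 3x + 7 = 22, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```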
Resource Info
🧬 LLM 🏗️ Model 🌱 Open Source