Detailed Introduction
VibeThinker is an open-source family of small reasoning models from WeiboAI. VibeThinker-1.5B applies a post-training recipe called the “Spectrum-to-Signal Principle (SSP)” to reach strong reasoning ability at only 1.5B parameters. Training follows a two-stage pipeline: diversity-exploring distillation during supervised fine-tuning (SFT) broadens the spectrum of candidate solutions, and MaxEnt-Guided Policy Optimization (MGPO) during reinforcement learning (RL) then strengthens the signal of correct solutions. The result is competitive performance on mathematical and coding benchmarks at low resource cost.
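To make the entropy-guided idea concrete, the sketch below weights each training problem by how close its rollout pass rate sits to the maximum-entropy point of 0.5, so problems of intermediate difficulty dominate the policy update. This is an illustrative reading of MGPO rather than the exact formulation from the technical report; the function names and the specific weighting rule are assumptions.

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def maxent_weight(pass_rate: float) -> float:
    """Illustrative MaxEnt-style weight: 1.0 when the rollout pass rate is 0.5
    (maximum uncertainty), falling toward 0.0 as a problem becomes trivially
    easy or hopelessly hard. This is an assumption, not the published MGPO rule."""
    return bernoulli_entropy(pass_rate) / bernoulli_entropy(0.5)

# Example: a problem solved in 3 of 8 rollouts gets a high weight,
# while one solved in 8 of 8 contributes no further learning signal.
for solved, total in [(3, 8), (8, 8), (1, 8)]:
    print(solved, total, round(maxent_weight(solved / total), 3))
```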
Main Features
- Parameter-efficient: strong benchmark performance with only 1.5B parameters.
- Multi-stage training: diversity-exploring distillation combined with MGPO to amplify correct-solution signals.
- Reproducible and open: model weights and technical report are publicly available for community validation and downstream work.
Use Cases
- Research and evaluation on competition-level mathematics and complex reasoning tasks.
- Verifying reasoning capability in coding and code-generation scenarios.
- Inference deployment in resource-constrained environments and fast research iteration (a minimal vLLM sketch follows this list).
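For the deployment use case, a minimal offline-inference sketch with vLLM is shown below. The repository id `WeiboAI/VibeThinker-1.5B` and the sampling settings are assumptions; check the official model card for the exact name and recommended values.

```python
from vllm import LLM, SamplingParams

# Model id is assumed; confirm the exact repository name on Hugging Face / ModelScope.
llm = LLM(model="WeiboAI/VibeThinker-1.5B")

# Sampling settings here are placeholders, not officially recommended values.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)

prompts = ["Prove that the sum of two even integers is even."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```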
Technical Features
- Built on standard Large Language Model (LLM) architecture, with post-training optimized to substantially strengthen reasoning ability at a small parameter scale.
- Two-stage diversity-focused distillation to generate a broad solution spectrum, followed by entropy-driven policy optimization to amplify correct answers.
- Model weights and evaluation toolchains are distributed via Hugging Face and ModelScope; standard inference stacks such as Transformers and vLLM are supported (a minimal Transformers example follows this list).
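For reference, here is a minimal chat-style generation sketch with Transformers. The model id and generation hyperparameters are assumptions, and the snippet presumes the tokenizer ships a chat template; consult the model card for the intended prompt format.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repository id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: if 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation hyperparameters are placeholders, not official recommendations.
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```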