
huggingface/open-r1

Open R1 is Hugging Face's fully open reproduction of DeepSeek-R1, the 671-billion-parameter Mixture-of-Experts reasoning model that matches OpenAI o1 on math, code, and reasoning benchmarks. While DeepSeek released the model weights under an MIT license, the datasets and training code were never published; Open R1 fills that gap by reconstructing the entire pipeline so anyone can reproduce, study, and extend state-of-the-art reasoning capabilities.

The project ships three core scripts: grpo.py for Group Relative Policy Optimization (GRPO) reinforcement learning, sft.py for supervised fine-tuning, and generate.py for synthetic data generation via Distilabel. Key dataset releases include OpenR1-Math-220k (220,000 math reasoning traces), CodeForces-CoTs (10,000 competitive programming problems with 100,000 chain-of-thought solutions), and Mixture-of-Thoughts (350,000 verified reasoning traces spanning mathematics, coding, and science, distilled from DeepSeek-R1). A 7B Qwen model trained on the CodeForces-CoTs dataset outperforms Claude 3.5 Sonnet on the IOI 2024 benchmark, demonstrating that small models trained on high-quality reasoning data can punch far above their weight.

The three-step roadmap progresses from distilling R1's reasoning into compact models, to replicating the pure-RL pipeline behind R1-Zero, to demonstrating full base-model-to-SFT-to-RL multi-stage training. With 25.9k GitHub stars and active community contributions, Open R1 has become a central hub for open reasoning model research.
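For a sense of how the GRPO stage works, here is a minimal Python sketch built on TRL's GRPOTrainer, which open-r1's grpo.py wraps with its own configs and reward functions. The dataset column names, model choice, and the toy string-match reward below are illustrative assumptions, not the repository's actual recipe (open-r1 uses verifier-based accuracy and format rewards defined in its training recipes).

    # Minimal GRPO sketch in the spirit of open-r1's grpo.py (assumptions noted above).
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Load the math dataset and expose a "prompt" column, which GRPOTrainer expects.
    # Column names ("problem", "answer") are assumed from the OpenR1-Math-220k card.
    dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")
    dataset = dataset.map(lambda row: {"prompt": row["problem"]})

    def accuracy_reward(completions, answer, **kwargs):
        # Toy stand-in for a verifier: reward 1.0 when the reference answer string
        # appears verbatim in the completion, 0.0 otherwise.
        return [1.0 if str(a) in c else 0.0 for c, a in zip(completions, answer)]

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-1.5B-Instruct",  # any causal LM checkpoint id
        reward_funcs=accuracy_reward,
        args=GRPOConfig(output_dir="qwen-grpo-math", num_generations=8, logging_steps=10),
        train_dataset=dataset,
    )
    trainer.train()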

AI Framework
Python

Why It Matters

DeepSeek-R1 proved that reinforcement learning alone can teach language models to reason step-by-step, self-verify, and solve complex problems -- but without open datasets and training code, the broader community could not replicate or build on this breakthrough. Open R1 changes that by providing every missing piece: curated reasoning datasets, training scripts, and reproducible recipes that let any research lab or developer train their own reasoning models from scratch. This democratization is critical because reasoning capability is the key differentiator in the current generation of frontier models, and concentrating that knowledge in a few closed labs limits progress. By releasing 350,000 verified reasoning traces and proving that a 7B model can beat much larger proprietary models on competitive programming, Open R1 shows that high-quality open data matters more than raw scale, lowering the barrier to entry for the entire field.

Repository Stats

Stars: 25.9k
Forks: 2.4k
Last Commit: 11/24/2025
