Turns out reinforcement learning is all you need.

Check out my prior video on RL:

Resources:
Code:
Model:
DeepSeek-R1 Paper:
DeepSeek Math Paper:
Unsloth Reasoning Blog:
Willccbb’s GRPO Demo:

Chapters:
00:00 - LLM Reasoning
01:44 - PPO Context
05:07 - GRPO Algorithm
07:24 - DeepSeek-R1-Zero Training
10:41 - DeepSeek-R1 Training
14:41 - Training: Model Loading
19:17 - Training: Dataset Prep
21:24 - Training: Reward Functions
23:11 - Training: GRPO Trainer
24:05 - Training: Outcome and Inference

#ai #datascience #programming











