How to fine-tune LLMs with Tunix

Unlock the full potential of your large language models with Tunix, an innovative open-source JAX-based library for post-training. This video explains the two-stage LLM training process, focusing on how Tunix excels in the post-training phase to instill strong reasoning capabilities. See a practical example of using Tunix with reinforcement learning to improve math problem-solving, leveraging its efficiency on accelerators like Google TPUs. Improve your LLM performance with this powerful tool.

Resources:
GitHub for Tunix
Tunix GRPO example
Additional examples
DeepSeekMath (GRPO) paper

Chapters:
0:00 - Introduction to Tunix
0:17 - Understanding LLM training stages
0:35 - Tunix: A JAX-based LLM post-training library
0:50 - Exploring Tunix's capabilities and supported models
1:05 - Reinforcement learning for LLMs overview
1:25 - RLVR for math reasoning demo (GSM8K dataset)
1:50 - Setting up and training with GRPO
2:05 - Tunix performance results and benefits
2:20 - Getting involved with Tunix

Speaker: Wei Wei
Products Mentioned: Google AI
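For readers who want a feel for the RLVR and GRPO chapters before watching, here is a minimal sketch in plain Python of the two ideas they rest on: a verifiable reward (checking a completion's final number against a known answer, as GSM8K-style math problems allow) and group-relative advantages (each sampled completion's reward normalized by the mean and standard deviation of its group). This is not Tunix code and does not use the Tunix API. The function names, the "last number in the text" parsing rule, the use of the population standard deviation, and the sample completions are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev


def extract_final_number(text):
    """Return the last number that appears in text, or None if there is none.

    Assumption: the final number in a completion is its answer.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def verifiable_reward(completion, reference_answer):
    """Return 1.0 if the completion's final number matches the reference, else 0.0."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - reference_answer) < 1e-6 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and standard deviation of its group.

    Assumption: population standard deviation, with eps added so that a
    group of identical rewards yields zero advantages instead of dividing by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of completions sampled for one prompt whose answer is 7.
completions = [
    "She has 3 + 4 = 7 apples. The answer is 7",
    "3 * 4 = 12, so the answer is 12",
    "Adding them gives 7",
    "I am not sure.",
]
rewards = [verifiable_reward(c, 7.0) for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"reward={reward:.0f} advantage={advantage:+.3f} | {completion}")
```

Completions that reach the correct answer receive positive advantages and the others negative ones, so a policy update pushes the model toward the better samples in each group without needing a separate learned value model. A real training run would add the policy-gradient loss, clipping, and a KL penalty on top of this.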
