The practical, end-to-end course that teaches you how RL is used to fine-tune LLMs, through clear explanations and hands-on labs.


Join us to learn how to:

  • Explain the full LLM training stack and where RL fits
  • Choose between PPO/DPO/GRPO/GSPO for real use cases
  • Design reward signals (including verifiable rewards)
  • Build a simple tool-using agent and fine-tune it with RL
  • Debug RL fine-tuning with practical diagnostics


Join our waitlist and be the first to know when we open admissions


A course for:

  • ML / LLM engineers and applied scientists who want practical RL-for-LLMs skills
  • Data scientists who can code but want to understand and implement RLHF-style training
  • Builders working on agents, tool use, reasoning, or reliability improvements 

You should be comfortable with Python and basic deep learning concepts.


Program Overview

Live virtual sessions

Learn directly from five well-known instructors who teach clearly and build things that work.


Access to labs and a learning platform

Resources available for you to learn on your own schedule.


Community 

You won’t be learning alone. Expect structured support and a place to ask questions as you go.

You’ll also get access to:

  • Step-by-step coding labs (including an agent you’ll build and improve)
  • Templates, notebooks, and reference implementations
  • Recommended readings + “cheat sheets” for key methods
  • Replays (so you can review anything you missed)

Course Curriculum


Module 1: The Essential Concepts of Reinforcement Learning

with Josh Starmer

A clean foundation: environments, rewards, policies, and how RL differs from supervised learning. You’ll see RL in action and code a simple example to make optimal decisions under uncertainty.


Module 2: The Evolution of RL for LLMs: RLHF → Verifiable Rewards

with Maarten Grootendorst

Understand the modern landscape: PPO, DPO, GRPO, GSPO—what they are, why they exist, and when to use which. Also: rewards, reward models, and where multimodal RL fits.


Module 3: GRPO Deep Dive

with Luis Serrano

A focused, intuitive deep dive into GRPO—what’s happening under the hood, why it works well for verifiable rewards, and what tradeoffs to watch.


Module 4: Labs 1–2: Setting up the Calculator Agent

with Chris McCormick

You’ll build the full scaffold: prompts, tool interface, evaluation loop, and a reward signal that can be verified.
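
If you’re curious what “a reward signal that can be verified” looks like in practice, here is a minimal, purely illustrative Python sketch: the agent’s final answer is checked mechanically against the known result, so no learned reward model is needed. The function names and the “Answer:” marker are assumptions made for this sketch, not the course’s actual lab code.

from typing import Optional

def extract_answer(completion: str) -> Optional[str]:
    """Pull out the text after the final 'Answer:' marker, if any."""
    marker = "Answer:"
    if marker not in completion:
        return None
    return completion.rsplit(marker, 1)[-1].strip()

def calculator_reward(completion: str, expected: float) -> float:
    """Return 1.0 if the agent's final numeric answer matches the expected value, else 0.0."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0
    try:
        return 1.0 if abs(float(answer) - expected) < 1e-6 else 0.0
    except ValueError:
        return 0.0

# Example: a correct final answer earns the full reward.
print(calculator_reward("12 * 7 -> use the calculator tool. Answer: 84", expected=84.0))  # prints 1.0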


Module 5: Lab 3: Fine-tuning the Calculator with RL

with Jay Alammar

Put it all together: fine-tune with RL, analyze what changed, and learn how to debug training when results aren’t improving.


BONUS Module: Join us for an "Ask Us Anything" session. Whether you have additional technical or career questions, the full team will be available for you. 

Join our waitlist 

And be among the first to find out when we open admissions for our course. Don't miss out! 

By signing up, you’ll receive details about our upcoming course, along with valuable updates and invitations from ragpack.ai. You can unsubscribe at any time.

RAGPACK.ai

[email protected] 

All rights reserved, 2026