CS 224R Deep Reinforcement Learning

Spring 2025, Class: Wed, Fri 10:30am-11:50am @ Hewlett 200

Description:

Humans, animals, and robots must make decisions and take actions in the world. Moreover, the decisions they make affect the world they exist in, and those outcomes must be taken into account. This course is about algorithms for deep reinforcement learning: methods for learning behavior from experience, with a focus on practical algorithms that use deep neural networks to learn behavior from high-dimensional observations. Topics will include methods for learning from demonstrations, both model-based and model-free deep RL methods, methods for learning from offline datasets, and more advanced techniques for learning multiple tasks, such as goal-conditioned RL, meta-RL, and unsupervised skill discovery. These methods will be instantiated with examples from domains with high-dimensional state and action spaces, such as robotics, visual navigation, and control. This course is complementary to CS234, with neither being a prerequisite for the other. In comparison to CS234, this course has a more applied, deep-learning focus and an emphasis on use cases in robotics and language modeling.

Format:

The course will consist of twice-weekly lectures, four homework assignments, and a final project. The lectures will cover fundamental topics in deep reinforcement learning, with a focus on methods that are applicable to domains such as robotics and language modeling. The assignments will focus on conceptual questions and coding problems that emphasize these fundamentals. Finally, students will present their projects at a poster session and in a final report at the end of the quarter.

Prerequisites:

Machine learning: CS229 or equivalent is a prerequisite. We will assume knowledge of concepts including, but not limited to, (stochastic) gradient descent and cross-validation, as well as background in probability theory, multivariable calculus, and basic linear algebra.

Some familiarity with deep learning: The course will build on deep learning concepts such as backpropagation, convolutional networks, and sequence models such as transformers. The assignments will involve programming in PyTorch. The first week will include a short PyTorch review tutorial.

Some familiarity with reinforcement learning: We will assume some familiarity with the basics of reinforcement learning. For introductory material on RL and Markov decision processes (MDPs), see CS221's lectures on MDPs and RL, or see Chapters 3 and 4 of Sutton & Barto.

Previous Offerings

Spring 2023

Staff

Prof. Chelsea Finn

Instructor

OH: Wed 5 – 6pm

Location: Gates 358

John Cho

Course Manager

Swati Batra

Course Advisor

Jubayer Ibn Hamid

Head Teaching Assistant

OH: Tues 7 – 9pm

Location: Huang Basement

Anikait Singh

Teaching Assistant

OH: Wed 1 – 3pm

Location: Huang Basement

Annie Chen

Teaching Assistant

OH: Thurs 10am – 12pm

Location: Huang Basement

Sergio Charles

Teaching Assistant

OH: Mon 1 – 3pm

Location: Huang Basement

Haoyi Duan

Teaching Assistant

OH: Fri 3 – 5pm

Location: Huang Basement

Yash Kankariya

Teaching Assistant

OH: Sun 1 – 3pm

Location: Huang Basement

Andy Tang

Teaching Assistant

OH: Tues 1 – 3pm

Location: Huang Basement

Marcel Torne

Teaching Assistant

OH: Thurs 1 – 3pm

Location: Huang Basement

Fengyu Li

Teaching Assistant

OH: Tues 9 – 11am

Location: Zoom (link on Canvas calendar)

Ashish Rao

Teaching Assistant

OH: Sat 9 – 11am

Location: Huang Basement

Shirley Wu

Teaching Assistant

OH: Sun 7 – 9pm

Location: Zoom (link on Canvas calendar)

Zhen Wu

Teaching Assistant

OH: Mon 3 – 5pm

Location: Zoom (link on Canvas calendar)

Jinny Chung

Teaching Assistant

OH: Mon 9 – 11am

Location: Huang Basement

Sirui Chen

Teaching Assistant

OH: Mon 9 – 11am

Location: Huang Basement

Pulkit Goel

Teaching Assistant

OH: Wed 3 – 5pm

Location: Huang Basement

Joy He Yueya

Teaching Assistant

OH: Fri 1 – 3pm

Location: Huang Basement

Jensen Gao

Teaching Assistant

OH: Sat 3 – 5pm

Location: Zoom (link on Canvas calendar)

Sri Jaladi

Teaching Assistant

OH: Thurs 3 – 5pm

Location: Zoom (link on Canvas calendar)

Daniel Shin

Teaching Assistant

OH: Tues 5 – 7pm

Location: Zoom (link on Canvas calendar)

Timeline

Date	Lecture	Deadlines	Notes & Optional Reading
Week 1 Wed, April 2	Lecture: Course intro + MDPs
Week 1 Fri, April 4	Lecture: Imitation Learning	Out: Homework 1	Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. Chi et al. (2024)
Week 1 Fri, April 4	TA Session: PyTorch Tutorial		Logistics: 1:30 - 2:30 pm in Gates B01
Week 2 Wed, April 9	Lecture: Policy Gradients		Simple statistical gradient-following algorithms for connectionist reinforcement learning. Williams (1992)
Week 2 Fri, April 11	Lecture: Actor-Critic Methods		Policy Gradient Methods for Reinforcement Learning with Function Approximation. Sutton et al. (1999); Proximal Policy Optimization Algorithms. Schulman et al. (2017)
Week 3 Wed, April 16	Lecture: Q-Learning	Due: Project Survey	Playing Atari with Deep Reinforcement Learning. Mnih et al. (2013)
Week 3 Fri, April 18	Lecture: Practical Deep RL Implementation Techniques	Due: Homework 1; Out: Homework 2	Deep Reinforcement Learning with Double Q-learning. van Hasselt et al. (2015); A Distributional Perspective on Reinforcement Learning. Bellemare et al. (2017)
Week 3 Fri, April 18	TA Session: Extra section on Q-learning		Logistics: 1:30 - 2:30 pm in Gates B01
Week 4 Wed, April 23	Lecture: Offline RL		Conservative Q-Learning for Offline Reinforcement Learning. Kumar et al. (2020); Offline Reinforcement Learning with Implicit Q-Learning. Kostrikov et al. (2021)
Week 4 Fri, April 25	Lecture: Reward Learning	Due: Project Proposal	Deep reinforcement learning from human preferences. Christiano et al. (2017)
Week 5 Wed, April 30	Guest Lecture: RL for LLMs: Preference Optimization		Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Rafailov et al. (2023)
Week 5 Fri, May 2	Guest Lecture: RL for LLMs: Reasoning	Due: Homework 2; Out: Homework 3
Week 6 Wed, May 7	Lecture: Model-based RL		When to Trust Your Model: Model-Based Policy Optimization. Janner et al. (2019)
Week 6 Fri, May 9	Lecture: Multi-Task and Goal-Conditioned RL		Hindsight Experience Replay. Andrychowicz et al. (2018); Rakelly et al. (2019)
Week 7 Wed, May 14	Lecture: Meta-RL		RL2: Fast Reinforcement Learning via Slow Reinforcement Learning. Duan et al. (2016)
Week 7 Fri, May 16	Lecture: Exploration	Due: Homework 3; Out: Homework 4	Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices. Liu et al. (2021)
Week 8 Wed, May 21	Lecture: Hierarchical RL and IL		Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. Ahn et al. (2022)
Week 8 Fri, May 23	Lecture: RL for Robots: Reset-Free RL	Due: Milestone
Week 9 Wed, May 28	Guest Lecture: RL for Robots: Sim-to-Real Transfer
Week 9 Fri, May 30	Lecture: Frontiers	Due: Homework 4
Week 10 Wed, June 4	Presentations: No Lecture (Poster Session)	Due: Poster	The poster session is 10:30 am - 1:30 pm at Burnham Pavilion.
Week 11 Mon, June 9		Due: Final Project Report

Course Calendar

A course calendar with details of lectures, TA sessions, office hours, and miscellaneous course events is available in a variety of formats.

Grading and Course Policies

Homeworks (50%): There are four graded homework assignments. To provide some flexibility, each student's lowest-scoring homework will be worth 5% of the final grade, while each of the remaining three will be worth 15%. Assignments will require training neural networks in PyTorch. All assignments are due on Gradescope at 11:59 pm Pacific Time on the respective due date.
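As a worked illustration of this weighting, the sketch below computes the homework portion of the final grade from four homework scores. It assumes each score is expressed as a fraction in [0, 1]; the function name `homework_component` is hypothetical, and this is not the course's official grading script.

```python
def homework_component(scores: list[float]) -> float:
    """Return the homework contribution to the final grade, as a fraction.

    Hypothetical helper, not an official grading script. Assumes `scores`
    holds exactly four homework scores, each a fraction in [0, 1].
    """
    if len(scores) != 4:
        raise ValueError("expected exactly four homework scores")
    ordered = sorted(scores)
    lowest, rest = ordered[0], ordered[1:]
    # The lowest-scoring homework counts for 5%; each of the other three counts for 15%.
    return 0.05 * lowest + sum(0.15 * s for s in rest)


# Example: homework scores of 90%, 80%, 100%, and 60%.
print(round(homework_component([0.9, 0.8, 1.0, 0.6]), 4))  # prints 0.435
```

In this example the student earns 43.5 of the 50 possible homework points; under an equal 12.5% weighting of all four assignments, the same scores would give 41.25.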

Project (50%): The course includes a research-level project on a topic of your choice. You may form groups of 1-3 students to complete the project, and you are encouraged to start early! Project guidelines will be posted in the first week of the class.

Late Days: You have 6 total late days to use across the homeworks, the project proposal, and the project milestone. Late days cannot be used for the project poster (due to its live nature) or the project report (due to the university grading deadline). You may use at most 2 late days on any single assignment. Late days used on a group project deliverable are charged to every member of the group.
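The late-day rules can be summarized as a small validity check on one student's planned usage. This is a hypothetical sketch: the assignment keys (`"hw1"`, `"proposal"`, and so on) and the function name `late_day_plan_is_valid` are illustrative assumptions, not part of any course tooling.

```python
# Hypothetical keys for the deliverables that accept late days.
LATE_DAY_ELIGIBLE = {"hw1", "hw2", "hw3", "hw4", "proposal", "milestone"}
TOTAL_LATE_DAYS = 6
MAX_PER_ASSIGNMENT = 2


def late_day_plan_is_valid(plan: dict[str, int]) -> bool:
    """Check one student's planned late-day usage against the policy.

    `plan` maps a deliverable name to the number of late days used on it.
    """
    for assignment, days in plan.items():
        if days < 0:
            return False
        # The poster and report (and anything not listed as eligible) accept no late days.
        if days > 0 and assignment not in LATE_DAY_ELIGIBLE:
            return False
        # At most 2 late days on any single assignment.
        if days > MAX_PER_ASSIGNMENT:
            return False
    # At most 6 late days in total.
    return sum(plan.values()) <= TOTAL_LATE_DAYS


print(late_day_plan_is_valid({"hw1": 2, "hw3": 2, "milestone": 2}))  # True
print(late_day_plan_is_valid({"hw2": 3}))  # False: more than 2 on one assignment
print(late_day_plan_is_valid({"report": 1}))  # False: the report accepts no late days
```

For a group project, a late day spent on the proposal or milestone would appear in every member's plan, since it is charged to each of them.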

Lecture Attendance: Lecture attendance is not required, but students are encouraged to attend the live lectures. To watch a lecture remotely, see the Panopto tab on Canvas. For those who cannot join live, lecture recordings will also be available on Canvas shortly after each lecture.

Honor Code: Students are free to form study groups and may discuss homework in groups. However, each student must write down the solutions and code from scratch independently, without referring to any written notes from the joint session. When debugging code together, you may only look at the input-output behavior of each other's programs, not the code itself. In other words, each student must understand the solution well enough to reconstruct it on their own. It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo. For more details about the honor code, see The Stanford Honor Code and The Stanford Honor Code Pertaining to CS Courses.

Using AI Tools: While we allow the use of AI tools to answer conceptual questions about the course content, substantial use of AI tools (e.g., ChatGPT, Cursor) is not allowed for the homeworks or for parts of the project. Doing so is a violation of the honor code. Please see the generative AI policy guidance for more information.

Academic Accommodations

If you need an academic accommodation based on a disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu. If you already have an Academic Accommodation Letter, please send your letter to cs224r-staff-spr2425@cs.stanford.edu. OAE Letters should be sent to us at the earliest possible opportunity so that the course staff can partner with you and OAE to make the appropriate accommodations.

Note on Financial Aid

All students should retain receipts for books and other course-related expenses, as these may be qualified educational expenses for tax purposes. If you are an undergraduate receiving financial aid, you may be eligible for additional financial aid for required books and course materials if these expenses exceed the aid amount in your award letter. For more information, review your award letter or visit the Student Budget website.

© Chelsea Finn 2025