CS 224R Deep Reinforcement Learning

Spring 2025, Class: Wed, Fri 10:30am-11:50am @ Hewlett 200


Description:

Humans, animals, and robots faced with the world must make decisions and take actions in the world. Moreover, the decisions they choose affect the world they exist in, and those outcomes must be taken into account. This course is about algorithms for deep reinforcement learning: methods for learning behavior from experience, with a focus on practical algorithms that use deep neural networks to learn behavior from high-dimensional observations. Topics will include methods for learning from demonstrations, both model-based and model-free deep RL methods, methods for learning from offline datasets, and more advanced techniques for learning multiple tasks such as goal-conditioned RL, meta-RL, and unsupervised skill discovery. These methods will be instantiated with examples from domains with high-dimensional state and action spaces, such as robotics, visual navigation, and control. This course is complementary to CS234, with neither being a prerequisite for the other. In comparison to CS234, this course will have a more applied and deep learning focus and an emphasis on use-cases in robotics and language modeling.

Format:

The course will consist of twice-weekly lectures, four homework assignments, and a final project. The lectures will cover fundamental topics in deep reinforcement learning, with a focus on methods that are applicable to domains such as robotics and language modeling. The assignments will focus on conceptual questions and coding problems that emphasize these fundamentals. Finally, students will present their projects at a poster session and through a final report at the end of the quarter.

Prerequisites:

Machine learning: CS229 or equivalent is a prerequisite. We will assume knowledge of concepts including, but not limited to, (stochastic) gradient descent and cross-validation, as well as prerequisites such as probability theory, multivariable calculus, and basic linear algebra.

Some familiarity with deep learning: The course will build on deep learning concepts such as backpropagation, convolutional networks, and sequence models such as transformers. The assignments will involve programming in PyTorch. The first week will include a short PyTorch review tutorial.

Some familiarity with reinforcement learning: We will assume some familiarity with the basics of reinforcement learning. For introductory material on RL and Markov decision processes (MDPs), see CS221's lectures on MDPs and RL, or see Chapters 3 and 4 of Sutton & Barto.

Previous Offerings


Staff

Prof. Chelsea Finn

Instructor
OH: Wed 5 – 6pm
Location: Gates 358
John Cho

Course Manager
Swati Batra

Course Advisor
Jubayer Ibn Hamid

Head Teaching Assistant
OH: Tues 7 – 9pm
Location: Huang Basement
Anikait Singh

Teaching Assistant
OH: Wed 1 – 3pm
Location: Huang Basement
Annie Chen

Teaching Assistant
OH: Thurs 10am – 12pm
Location: Huang Basement
Sergio Charles

Teaching Assistant
OH: Mon 1 – 3pm
Location: Huang Basement
Haoyi Duan

Teaching Assistant
OH: Fri 3 – 5pm
Location: Huang Basement
Yash Kankariya

Teaching Assistant
OH: Sun 1 – 3pm
Location: Huang Basement
Andy Tang

Teaching Assistant
OH: Tues 1 – 3pm
Location: Huang Basement
Marcel Torne

Teaching Assistant
OH: Thurs 1 – 3pm
Location: Huang Basement
Fengyu Li

Teaching Assistant
OH: Tues 9 – 11am
Location: Zoom (link on Canvas calendar)
Ashish Rao

Teaching Assistant
OH: Sat 9 – 11am
Location: Huang Basement
Shirley Wu

Teaching Assistant
OH: Sun 7 – 9pm
Location: Zoom (link on Canvas calendar)
Zhen Wu

Teaching Assistant
OH: Mon 3 – 5pm
Location: Zoom (link on Canvas calendar)
Jinny Chung

Teaching Assistant
OH: Mon 9 – 11am
Location: Huang Basement
Sirui Chen

Teaching Assistant
OH: Mon 9 – 11am
Location: Huang Basement
Pulkit Goel

Teaching Assistant
OH: Wed 3 – 5pm
Location: Huang Basement
Joy He Yueya

Teaching Assistant
OH: Fri 1 – 3pm
Location: Huang Basement
Jensen Gao

Teaching Assistant
OH: Sat 3 – 5pm
Location: Zoom (link on Canvas calendar)
Sri Jaladi

Teaching Assistant
OH: Thurs 3 – 5pm
Location: Zoom (link on Canvas calendar)
Daniel Shin

Teaching Assistant
OH: Tues 5 – 7pm
Location: Zoom (link on Canvas calendar)


Timeline

| Week | Date | Lecture | Deadlines | Notes & Optional Reading |
| --- | --- | --- | --- | --- |
| Week 1 | Wed, April 2 | Lecture: Course intro + MDPs | | |
| Week 1 | Fri, April 4 | Lecture: Imitation Learning | Homework 1 Out | |
| Week 1 | Fri, April 4 | TA Session: PyTorch Tutorial | | Logistics: 1:30 - 2:30 pm in Gates B01 |
| Week 2 | Wed, April 9 | Lecture: Policy Gradients | | |
| Week 2 | Fri, April 11 | Lecture: Actor-Critic Methods | | |
| Week 3 | Wed, April 16 | Lecture: Q-Learning | Due Project Survey | |
| Week 3 | Fri, April 18 | Lecture: Practical Deep RL Implementation Techniques | Due Homework 1; Homework 2 Out | |
| Week 3 | Fri, April 18 | TA Session: Extra section on Q-learning | | Logistics: 1:30 - 2:30 pm in Gates B01 |
| Week 4 | Wed, April 23 | Lecture: Offline RL | | |
| Week 4 | Fri, April 25 | Lecture: Reward Learning | Due Project Proposal | |
| Week 5 | Wed, April 30 | Guest Lecture: RL for LLMs: Preference Optimization | | |
| Week 5 | Fri, May 2 | Guest Lecture: RL for LLMs: Reasoning | Due Homework 2; Homework 3 Out | |
| Week 6 | Wed, May 7 | Lecture: Model-based RL | | |
| Week 6 | Fri, May 9 | Lecture: Multi-Task and Goal-Conditioned RL | | |
| Week 7 | Wed, May 14 | Lecture: Meta-RL | | |
| Week 7 | Fri, May 16 | Lecture: Exploration | Due Homework 3; Homework 4 Out | |
| Week 8 | Wed, May 21 | Lecture: Hierarchical RL and IL | | |
| Week 8 | Fri, May 23 | Lecture: RL for Robots: Reset-Free RL | Due Milestone | |
| Week 9 | Wed, May 28 | Guest Lecture: RL for Robots: Sim-to-Real Transfer | | |
| Week 9 | Fri, May 30 | Lecture: Frontiers | Due Homework 4 | |
| Week 10 | Wed, June 4 | Presentations: No Lecture (Poster Session) | Due Poster | The poster session is 10:30 am - 1:30 pm at Burnham Pavilion |
| Week 11 | Mon, June 9 | | Due Final Project Report | |



Course Calendar

A course calendar with details of lectures, TA sessions, office hours, and miscellaneous course events is available in a variety of formats:

Grading and Course Policies

Homeworks (50%): There are four graded homework assignments. To provide some flexibility, each student's lowest-scoring homework will be worth 5% of the grade, while the remaining three will each be worth 15% of the grade. Assignments will require training neural networks in PyTorch. All assignments are due on Gradescope at 11:59 pm Pacific Time on the respective due date.
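
As an illustration of the weighting, here is a minimal sketch of how an overall percentage could be computed from the stated weights (5% for the lowest homework, 15% for each of the other three, 50% for the project). The function name `course_grade`, the 0-100 score scale, and treating the project as a single combined score are assumptions made for this example; it is not the staff's official grading script, and it ignores late penalties.

```python
def course_grade(homework_scores, project_score):
    """Combine scores into an overall percentage using the stated weights.

    homework_scores: four homework scores, each assumed to be in [0, 100].
    project_score:   one combined project score, assumed to be in [0, 100].
    """
    if len(homework_scores) != 4:
        raise ValueError("expected exactly four homework scores")

    ranked = sorted(homework_scores)      # lowest score first
    lowest, remaining = ranked[0], ranked[1:]

    # Lowest homework counts for 5%; the other three count for 15% each.
    homework_part = 0.05 * lowest + 0.15 * sum(remaining)

    # The project counts for the remaining 50%.
    return homework_part + 0.50 * project_score


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(f"{course_grade([60, 90, 80, 100], 85):.1f}")  # 86.0
```

In this hypothetical case the weak homework (60) contributes only 3 points, while the other three contribute 40.5 and the project 42.5.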

Project (50%): There's a research-level project of your choice. You may form groups of 1-3 students to complete the project, and you are encouraged to start early! Project guidelines will be posted in the first week of the class.

Late Days: You have 6 total late days across homeworks, the project proposal, and the project milestone. Late days are not applicable to the project poster (due to its live nature) and the project report (due to the university grading deadline). You may use a maximum of 2 late days for any single assignment. Late days used for group projects apply to all members of the group.
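
The late-day rules above can be checked mechanically. The sketch below encodes the three stated constraints (6 days total, at most 2 per assignment, none for the poster or report). The deliverable names such as `"hw1"` and `"poster"` and the function `remaining_late_days` are hypothetical, chosen only for this example; how the staff actually tracks late days is not specified here.

```python
TOTAL_LATE_DAYS = 6                 # shared across homeworks, proposal, milestone
MAX_PER_ASSIGNMENT = 2              # cap for any single deliverable
NO_LATE_DAYS = {"poster", "report"} # hypothetical names for the excluded deliverables


def remaining_late_days(usage):
    """Return late days left, or raise ValueError if the policy is violated.

    usage: dict mapping a deliverable name (e.g. "hw1") to late days used.
    """
    for name, days in usage.items():
        if days < 0:
            raise ValueError(f"{name}: late days cannot be negative")
        if name in NO_LATE_DAYS and days > 0:
            raise ValueError(f"{name}: late days are not applicable")
        if days > MAX_PER_ASSIGNMENT:
            raise ValueError(f"{name}: at most {MAX_PER_ASSIGNMENT} late days")

    used = sum(usage.values())
    if used > TOTAL_LATE_DAYS:
        raise ValueError(f"used {used} late days, only {TOTAL_LATE_DAYS} allowed")
    return TOTAL_LATE_DAYS - used


if __name__ == "__main__":
    # Two days on Homework 1 and one on the proposal leave three days.
    print(remaining_late_days({"hw1": 2, "proposal": 1}))  # 3
```

For group deliverables, the same usage would be recorded for every member of the group, since late days used on group projects apply to all members.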

Lecture Attendance: While we do not require lecture attendance, students are encouraged to join the live lecture. If you want to watch the lecture remotely, please see the Panopto tab of Canvas. For those who cannot join the live lectures, lecture recordings will also be available on Canvas shortly following the lecture.

Honor Code: Students are free to form study groups and may discuss homework in groups. However, each student must write down the solutions and code from scratch independently, without referring to any written notes from the joint session. When debugging code together, you are only allowed to look at the input-output behavior of each other's programs and not the code itself. In other words, each student must understand the solution well enough to reconstruct it on their own. It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo. For more details about the honor code, see The Stanford Honor Code and The Stanford Honor Code Pertaining to CS Courses.

Using AI Tools: While we allow collaboration with AI tools to answer conceptual questions pertaining to the course content, substantial use of AI tools (e.g. ChatGPT, Cursor) is not allowed for the homeworks or for parts of the project. Doing so is a violation of the honor code. Please see the generative AI policy guidance for more information.

Academic Accommodations

If you need an academic accommodation based on a disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu. If you already have an Academic Accommodation Letter, please send your letter to cs224r-staff-spr2425@cs.stanford.edu. OAE Letters should be sent to us at the earliest possible opportunity so that the course staff can partner with you and OAE to make the appropriate accommodations.

Note on Financial Aid

All students should retain receipts for books and other course-related expenses, as these may be qualified educational expenses for tax purposes. If you are an undergraduate receiving financial aid, you may be eligible for additional financial aid for required books and course materials if these expenses exceed the aid amount in your award letter. For more information, review your award letter or visit the Student Budget website.




    © Chelsea Finn 2025