CS 224R Deep Reinforcement Learning

Spring 2023, Class: Mon, Wed 4:30pm-5:50pm, Gates B1



Description:

Humans, animals, and robots faced with the world must make decisions and take actions in it. Moreover, the decisions they make affect the world they exist in, and those outcomes must be taken into account. This course is about algorithms for deep reinforcement learning: methods for learning behavior from experience, with a focus on practical algorithms that use deep neural networks to learn behavior from high-dimensional observations. Topics will include methods for learning from demonstrations, both model-based and model-free deep RL methods, methods for learning from offline datasets, and more advanced techniques for learning multiple tasks, such as goal-conditioned RL, meta-RL, and unsupervised skill discovery. These methods will be instantiated with examples from domains with high-dimensional state and action spaces, such as robotics, visual navigation, and control. This course is complementary to CS234, with neither being a prerequisite for the other. In comparison to CS234, this course will have a more applied, deep learning focus and an emphasis on use cases in robotics and motor control.

Format:

The course will consist of twice weekly lectures, four homework assignments, and a final project. The lectures will cover fundamental topics in deep reinforcement learning, with a focus on methods that are applicable to domains such as robotics and control. The assignments will focus on conceptual questions and coding problems that emphasize these fundamentals. Finally, students will present their projects at a poster session and through a final report at the end of the quarter.

Prerequisites:

Machine learning: CS229 or equivalent is a prerequisite. We will assume knowledge of concepts including, but not limited to, (stochastic) gradient descent and cross-validation, as well as background in probability theory, multivariable calculus, and linear algebra.

Some familiarity with deep learning: The course will build on deep learning concepts such as backpropagation, convolutional networks, and recurrent neural networks. The assignments will involve programming in PyTorch. The first week will include a short PyTorch review tutorial.

Some familiarity with reinforcement learning: We will assume some familiarity with the basics of reinforcement learning. For introductory material on RL and Markov decision processes (MDPs), see CS221’s lectures on MDPs and RL, or see Chapters 3 and 4 of Sutton & Barto.

Lecture Slides and Videos:

Lecture slides will be posted on the course website one hour before each lecture. For students enrolled in the course, recorded lecture videos will be posted to Canvas after each lecture.


Staff

Prof. Chelsea Finn
Instructor
OH: Mon 10:30 – 11:30am
Location: Gates 358

Dr. Karol Hausman
Instructor
OH: Mon 10:30 – 11:30am
Location: Gates 358

Amelie Byun
Course Manager

John Cho
Course Coordinator

Rafael Rafailov
Head Teaching Assistant
OH: Thurs 5 – 7pm
Location: Gates 392

Dilip Arumugam
Teaching Assistant
OH: Thurs 10am – 12pm
Location: Gates 498

Annie Xie
Teaching Assistant
OH: Tues 3 – 5pm
Location: Gates 459

Regina Wang
Teaching Assistant
OH: Wed 9 – 11am
Location: Zoom (link on Canvas calendar)

Ansh Khurana
Teaching Assistant
OH: Fri 9 – 11am
Location: Gates 459 and Zoom (link on Canvas calendar)

Saurabh Kumar
Teaching Assistant
OH: Sun 3 – 5pm
Location: Zoom (link on Canvas calendar)

Jonathan Yang
Teaching Assistant
OH: Wed 6 – 8pm
Location: Gates 358

Max Sobol Mark
Teaching Assistant
OH: Tues 6 – 8pm
Location: Zoom (link on Canvas calendar)


Timeline

Week 1
  Mon, April 3. Lecture: Course introduction.
  Wed, April 5. Lecture: Imitation Learning. Homework 1 out [PDF, code, template].
  Thu, April 6. TA Session: PyTorch tutorial.
Week 2
  Mon, April 10. Lecture: MDPs and Policy Gradients.
  Wed, April 12. Lecture: Actor-Critic Methods.
Week 3
  Mon, April 17. Lecture: Q-Learning. Due: Project survey.
  Wed, April 19. Lecture: Practical Deep RL Implementation Techniques. Due: Homework 1. Homework 2 out [PDF, code, template].
Week 4
  Mon, April 24. Lecture: Model-Based RL.
  Wed, April 26. Lecture: Reward Learning. Due: Project proposal.
Week 5
  Mon, May 1. Lecture: Offline RL.
  Wed, May 3. Lecture: Offline RL Part 2. Due: Homework 2. Homework 3 out [PDF, code, template].
Week 6
  Mon, May 8. Lecture: Multi-Task RL and Goal-Conditioned RL.
  Wed, May 10. Guest Lecture: Transfer Learning in RL (Jie Tan).
Week 7
  Mon, May 15. Lecture: Meta-RL.
  Wed, May 17. Lecture: Meta-RL Part 2. Due: Homework 3. Homework 4 out [PDF, code, template].
Week 8
  Mon, May 22. Guest Lecture: Reset-Free RL (Archit Sharma).
  Wed, May 24. Lecture: Hierarchical RL and Skill Discovery. Due: Project milestone.
Week 9
  Mon, May 29. No lecture (Memorial Day).
  Wed, May 31. Guest Lecture: RL in the Real World (Anna Goldie). Due: Homework 4.
Week 10
  Mon, June 5. Lecture: Review and Frontiers.
  Wed, June 7. Presentations: Project Poster Session. The poster session will be held at the Gates AT&T Lawn from 4 – 7pm.
Week 11
  Mon, June 12. No class. Due: Final Project Report.
  Wed, June 14. No class.



Course Calendar

A course calendar with details of lectures, TA sessions, office hours, and miscellaneous course events is available in a variety of formats:

Grading and Course Policies

Homeworks (50%): There are four graded homework assignments. To provide some flexibility, each student's lowest-scoring homework will be worth 5% of the grade, while each of the remaining three will be worth 15%. Assignments will require training neural networks in PyTorch. All assignments are due on Gradescope at 11:59 pm Pacific Time on the respective due date.

Project (50%): The course includes a research-level project of your choice. You may form groups of 1-3 students to complete the project, and you are encouraged to start early! Detailed guidelines on the project can be found here.
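To make the weighting concrete, here is a minimal sketch of the arithmetic, assuming every homework and project score is normalized to a fraction between 0 and 1. The function name `course_grade` is hypothetical and is not official course tooling; it only illustrates how the lowest homework drops to 5% while the other three count 15% each, so homeworks total at most 5 + 3 × 15 = 50 points and the project the other 50.

```python
from typing import Sequence


def course_grade(homework_scores: Sequence[float], project_score: float) -> float:
    """Hypothetical helper: overall grade on a 0-100 scale.

    Assumes each score is a fraction in [0, 1]. The lowest homework is
    weighted 5%, each of the other three 15%, and the project 50%.
    """
    if len(homework_scores) != 4:
        raise ValueError("expected exactly four homework scores")
    ordered = sorted(homework_scores)          # ascending, so the lowest comes first
    lowest, remaining = ordered[0], ordered[1:]
    homework_part = 5 * lowest + 15 * sum(remaining)  # at most 5 + 45 = 50
    project_part = 50 * project_score                 # at most 50
    return homework_part + project_part


if __name__ == "__main__":
    # Lowest homework (0.6) counts 5%; 0.8, 0.9, and 1.0 count 15% each.
    print(round(course_grade([0.9, 0.6, 1.0, 0.8], 0.95), 2))
```

In the usage line, the homework part is 5 × 0.6 + 15 × 2.7 = 43.5 and the project part is 50 × 0.95 = 47.5, for a total of about 91. Sorting first means the 5% weight always lands on the weakest assignment, whichever one that happens to be.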

Late Days: You have 6 late days in total to use across homework assignments and project deliverables (anything worth a grade), except for the project poster, for which late days cannot be used. You may use a maximum of 2 late days for any single assignment. Late days used for group projects apply to all members of the group.
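The late-day rules can be read as three checks on one student's usage. The sketch below assumes a simple dictionary from deliverable name to late days used; the function name `late_days_ok` and the key `"project poster"` are hypothetical labels chosen for illustration. Because late days on a group project count against every member, each member's dictionary would record the same number for that deliverable.

```python
from typing import Dict

TOTAL_LATE_DAYS = 6                  # budget across all graded deliverables
MAX_PER_DELIVERABLE = 2              # cap for any single deliverable
NO_LATE_DAYS = {"project poster"}    # hypothetical key for the excluded deliverable


def late_days_ok(usage: Dict[str, int]) -> bool:
    """Hypothetical check: does one student's late-day usage follow the policy?

    `usage` maps a deliverable name to the late days used on it. For group
    project deliverables, record the group's late days for every member.
    """
    for name, days in usage.items():
        if days < 0:
            raise ValueError(f"negative late days for {name!r}")
        if name in NO_LATE_DAYS and days > 0:
            return False             # no late days allowed on the poster
        if days > MAX_PER_DELIVERABLE:
            return False             # more than 2 on a single deliverable
    return sum(usage.values()) <= TOTAL_LATE_DAYS


if __name__ == "__main__":
    print(late_days_ok({"hw1": 2, "hw2": 1, "project proposal": 2}))  # 5 days total
    print(late_days_ok({"hw1": 3}))                                   # over the per-item cap
```

The per-deliverable and poster checks run before the total, so a single violation is caught even when the overall budget of 6 days has not been exhausted.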

Lecture Attendance: While we do not require lecture attendance, students are encouraged to join the live lecture. To accommodate various circumstances, we will be live-streaming the in-person lecture via a Zoom link on Canvas. For those who cannot join the live lectures, lecture recordings will also be available on Canvas shortly after each lecture.

Honor Code: Students are free to form study groups and may discuss homework in groups. However, each student must write down the solutions and code from scratch independently, without referring to any written notes from the joint session. When debugging code together, you may only look at the input-output behavior of each other's programs, not the code itself. In other words, each student must understand the solution well enough to reconstruct it independently. It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo. For more details about the honor code, see The Stanford Honor Code and The Stanford Honor Code Pertaining to CS Courses.

Academic Accommodations

If you need an academic accommodation based on a disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu. If you already have an Academic Accommodation Letter, please send your letter to cs224r-spr2223-staff@lists.stanford.edu. OAE Letters should be sent to us at the earliest possible opportunity so that the course staff can partner with you and OAE to make the appropriate accommodations.

Note on Financial Aid

All students should retain receipts for books and other course-related expenses, as these may be qualified educational expenses for tax purposes. If you are an undergraduate receiving financial aid, you may be eligible for additional financial aid for required books and course materials if these expenses exceed the aid amount in your award letter. For more information, review your award letter or visit the Student Budget website.




    © Chelsea Finn 2023