Humans, animals, and robots faced with the world must make decisions and take actions in the world. Moreover, the decisions they choose affect the world they exist in, and those outcomes must be taken into account. This course is about algorithms for deep reinforcement learning – methods for learning behavior from experience, with a focus on practical algorithms that use deep neural networks to learn behavior from high-dimensional observations. Topics will include methods for learning from demonstrations, both model-based and model-free deep RL methods, methods for learning from offline datasets, and more advanced techniques for learning multiple tasks such as goal-conditioned RL, meta-RL, and unsupervised skill discovery. These methods will be instantiated with examples from domains with high-dimensional state and action spaces, such as robotics, visual navigation, and control. This course is complementary to CS234, with neither being a prerequisite for the other. In comparison to CS234, this course will have a more applied and deep learning focus and an emphasis on use cases in robotics and motor control.
The course will consist of twice-weekly lectures, four homework assignments, and a final project. The lectures will cover fundamental topics in deep reinforcement learning, with a focus on methods that are applicable to domains such as robotics and control. The assignments will focus on conceptual questions and coding problems that emphasize these fundamentals. Finally, students will present their projects at a poster session and through a final report at the end of the quarter.
Machine learning: CS229 or equivalent is a prerequisite. We will be assuming knowledge of concepts including, but not limited to, (stochastic) gradient descent and cross-validation, and prerequisites such as probability theory, multivariable calculus, and linear algebra.
Some familiarity with deep learning: The course will build on deep learning concepts such as backpropagation, convolutional networks, and recurrent neural networks. The assignments will involve programming in PyTorch. The first week will include a short PyTorch review tutorial.
Some familiarity with reinforcement learning: We will assume some familiarity with the basics of reinforcement learning. For introductory material on RL and Markov decision processes (MDPs), see CS221’s lectures on MDPs and RL, or see Chapters 3 and 4 of Sutton & Barto.
Lecture slides will be posted on the course website one hour before each lecture. For students enrolled in the course, recorded lecture videos will be posted to Canvas after each lecture.
A course calendar with details of lectures, TA sessions, office hours, and miscellaneous course events is available in a variety of formats:
Homeworks (50%): There are four graded homework assignments. To provide some flexibility, the lowest-scoring homework for each student will be worth 5% of the grade, while the remaining three will be worth 15% of the grade each. Assignments will require training neural networks in PyTorch. All assignments are due on Gradescope at 11:59 pm Pacific Time on the respective due date.
Project (50%): There's a research-level project of your choice. You may form groups of 1-3 students to complete the project, and you are encouraged to start early! Detailed guidelines on the project can be found here.
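As an informal illustration of how the weights above combine, the following sketch computes an overall score from four homework scores and a project score. It is not an official grading tool: the function name `course_grade` is hypothetical, scores are assumed to be normalized fractions between 0 and 1, the project is treated as a single score, and late-day effects are ignored.

```python
def course_grade(homework_scores, project_score):
    """Combine four homework scores and one project score (each in [0, 1]).

    Hypothetical helper for illustration only; assumes normalized scores
    and ignores late days and any breakdown of project deliverables.
    """
    if len(homework_scores) != 4:
        raise ValueError("expected exactly four homework scores")

    # Sort ascending so the lowest-scoring homework comes first.
    ranked = sorted(homework_scores)
    lowest, remaining = ranked[0], ranked[1:]

    # Lowest homework counts for 5%; each of the other three counts for 15%.
    homework_part = 0.05 * lowest + 0.15 * sum(remaining)

    # The project counts for the other 50%.
    project_part = 0.50 * project_score

    return homework_part + project_part


if __name__ == "__main__":
    total = course_grade([0.9, 0.6, 1.0, 0.8], project_score=0.85)
    print(f"overall: {total:.2%}")
```

For homework scores of 90%, 60%, 100%, and 80% and a project score of 85%, the 60% homework is down-weighted to 5%, giving 0.05 × 0.6 + 0.15 × 2.7 + 0.50 × 0.85 = 0.86, or roughly 86% overall.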
Late Days: You have 6 total late days across homeworks and project deliverables (anything worth a grade), except for the project poster. You may use a maximum of 2 late days for any single assignment. Late days used for group projects apply to all members of the group.
Lecture Attendance: While we do not require lecture attendance, students are encouraged to join the live lecture. To accommodate various circumstances, we will be live-streaming the in-person lecture via a Zoom link on Canvas. For those who cannot join the live lectures, lecture recordings will also be available on Canvas shortly following the lecture.
Honor Code: Students are free to form study groups and may discuss homework in groups. However, each student must write down the solutions and code from scratch independently, without referring to any written notes from the joint session. When debugging code together, you are only allowed to look at the input-output behavior of each other's programs and not the code itself. In other words, each student must understand the solution well enough to reconstruct it on their own. It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo. For more details about the honor code, see The Stanford Honor Code and The Stanford Honor Code Pertaining to CS Courses.
If you need an academic accommodation based on a disability, please register with the Office of Accessible Education (OAE). Professional staff will evaluate your needs, support appropriate and reasonable accommodations, and prepare an Academic Accommodation Letter for faculty. To get started, or to re-initiate services, please visit oae.stanford.edu. If you already have an Academic Accommodation Letter, please send your letter to cs224r-spr2223-staff@lists.stanford.edu. OAE Letters should be sent to us at the earliest possible opportunity so that the course staff can partner with you and OAE to make the appropriate accommodations.
All students should retain receipts for books and other course-related expenses, as these may be qualified educational expenses for tax purposes. If you are an undergraduate receiving financial aid, you may be eligible for additional financial aid for required books and course materials if these expenses exceed the aid amount in your award letter. For more information, review your award letter or visit the Student Budget website.
© Chelsea Finn 2023