CS224R Final Projects

Type Title Authors Mentor TA
Default Outstanding Project: Efficient Arithmetic Reasoning in Small LMs via Function Calling Alina Hu, Ron Wang Andy Tang
Default Outstanding Project: Reinforcement Learning Training for Dynamic Context Management in Mathematical Reasoning Batu El, Mehmet Hamza Erol, Hannah Park-Kaufmann Anikait Singh
Custom Outstanding Project: Reinforcement Learning for Automated Spectrometer Calibration Amine Lamouchi, Martina Del Gaudio Andy Tang
Custom Reinforcement Learning for Automatic Speech Recognition Ali Sartaz Khan, Prerana Rane Shirley Wu
Custom Divide, Embed, Conquer: Role-conditioned MAPPO with Continuous Latent Embeddings Avinash Singh Rajput Annie Chen
Default RL Fine-Tuning of Language Models with GenRM-CoT Prabhjot Singh Rai, Anirban Chatterjee Marcel Torne
Custom BallPy: Bug Analyzing Local LLM for Python TJ Jefferson, Eli LeChien, Ben Pekarek Jinny Chung
Custom Using RL to Generalize Robot Policies for Multiple Embodiments Ariel Bachman, Raúl Molina Gómez, Daniel Voxlin Jensen Gao
Custom 2048: Reinforcement Learning in a Delayed Reward Environment Prady Saligram, Tanvir Bhathal, Robby Manihani Sri Jaladi
Custom Better Goals, Better Policies: Goal Representation for Offline Hierarchical RL Yongce Li, Xiaoyue Wang Pulkit Goel
Default A Multi-Stage Self-Optimization Framework for LLM Reasoning: Exploration, Structured Improvement, and Robust Inference Virginia Chen, Sheng-Kai Huang, ChienTsung Huang Jensen Gao
Default Data-Augmented DPO: Comparing Enhancements of SFT-Trained LLMs Austin Bennett, Rishi Padmanabhan, Jared Weissberg Fengyu Li
Default Adaptive Test Time Compute for RL Fine Tuning James Chen, Grace Luo, Aarav Wattal Andy Tang
Default Galaxy: Fine-Tuning Large Language Models with Reinforcement Learning Feiyang Zhu, Shutong Zhang, Siyuan Wu Fengyu Li
Custom Deep Reinforcement Learning with Action Chunks: Fine-Tuning the ALOHA Robot Dmitry Usanov Marcel Torne
Default Improving Test Time Inference via Learned Self Correction, Backtracking, and Verification Jacob Householder Joy He-Yueya
Custom Reinforcement Learning in Mental Healthcare Dean Barrow Bassem Akoush
Custom Control Lyapunov Functions for Reinforcement Learning and Adaptive Control Alex Leonessa, Jordan Berg Xingze Dai
Default Exploration into RL-based Language Model Finetuning Rui Chen Pulkit Goel
Custom Understanding the effect of RL on the internal representation of LLMs Rahul Chand, Arpandeep Khatua Arpandeep Khatua
Custom Test-Time Stochasticity Estimation for Adaptive Action Chunk Selection Sarosh Khan, Ellie Tanimura Jubayer Ibn Hamid
Custom Garen-teed Not a Bot - Realtime League of Legends Agent Rohan Tan Bhowmik, Gabriel Tsou-Hsian Tsai Andy Tang
Custom Adaptive Multi-Agent Deep Reinforcement Learning for Unsupervised Online Optimization of Concurrent Asynchronous Dataflow Pipelines Raed Al Sabawi Jinny Chung
Custom Does Visual Latent Quality Improve Dreamer-Style Model-Based RL? Hazel Chen Yixin Li
Custom Optimizing Re-Masking Schedules for Reasoning in Discrete Diffusion Models Radostin Cholakov, Zeyneb N. Kaya, Nicole Ma Joy He-Yueya
Default Fine-Tuning Qwen-0.5B for Math Reasoning and Instruction Following Boyu Han, Haoran Jia, Shuchen Liu Fengyu Li
Default Beyond Strict Verification: Exploring Reinforcement Learning with Weak Verifiers Jon Saad-Falcon, Jordan Juravsky Jinny Chung
Default Reinforcement Learning-based Multi-Objective Optimization Methods for LLMs Chetan Nair, Ishaan Singh, Khanh Tran Daniel Shin
Custom Evolutionary Population-Based Policy Optimization via High-Throughput Parallel Simulation Charlotte Ka Yee Yan, Asanshay Gupta Jubayer Ibn Hamid
Default Improving Performance of a Large Language Model on Mathematical Reasoning Tasks through Various Fine-Tuning and Test-Time Compute Methods Witold Gawlikowicz Zhen Wu
Custom Reinforcement Learning for Pick(ing) and (Bank)roll: Applying Deep Q-Learning to NBA Regular Season Money Line Betting Charles Shaviro Jinny Chung
Default Exploring Curriculum Learning in Different Stages of Large-Language Model Fine-tuning Jiaqi Wu Anikait Singh
Default Critique-Guided Instruction Following on UltraFeedback Pooja Sethi, Marielle Baumgartner, Malisha Lutchmeea Jubayer Ibn Hamid
Custom 20 Questions: Multi-turn RLHF for Sparse Rewards Aditi Bhaskar Shirley Wu
Default Learning to Search with an Oracle: Finetuning for Countdown with a Classical Solver Rupanshu Soi, Masoud Charkhabi Prateek Varshney
Custom Generative Planning: Conditioning Vs Reward Forecasting Jon Frydman Xingze Dai
Custom Improving NL2SQL Capabilities of LLMs Using Direct Preference Optimization Bora Oztekin, Elizabeth Sinyavin, Sajid Farook Pulkit Goel
Custom RL Methods for Mitigating Catastrophic Forgetting in Continual SLM Translation Abhijit Devalapura, Riley Carlson Ashish Rao
Custom In-context Search: Efficiency Boost or Fundamentally New Capability? Angel Raychev, Yalcin Tur, Mihajlo Stojkovic Shirley Wu
Custom Learning New Biophysical Controls in Protein Language Models via Supervised and Preference-Based Fine-Tuning Nahum Maru Andy Tang
Custom Data-Guided Noise (DGN) for Online Exploration Alec Lessing Yash Kankariya
Custom To N Equals Infinity and Beyond: Generalization Trends in Post-Trained LLMs Sudharsan Sundar Annie Chen
Custom Personalized Pedagogically Aligned English Learning Chatbot via Preference Optimization and Curiosity Ziqi Shu, Samantha Liu Anikait Singh
Custom Bluff and Learn: Comparing CFR and NFSP in Liar's Bar Cici Hou, Louise Li, Phillip Miao Jubayer Ibn Hamid
Default CountUP: Improving LLM Reasoning with Reinforcement Learning and Synthetic Data Bruno de Moraes Dumont, Ethan Goodhart Shirley Wu
Default Improving LLM Instruction-Following Capabilities with Multi-objective Reinforcement Learning An Doan, Felicity Huang, Linda Liu Xingze Dai
Default Instruction Following via Self-Revision: Fine-Tuning Qwen with Teacher Feedback and Staged Curriculum Landon Choy, Tracy Li Fengyu Li
Default Reinforcement Learning Fine-Tuning with Calculator Tool Integration for Mathematical Reasoning Yahaya Ndutu Prateek Varshney
Custom Disentangling Knowledge and Reasoning in Medical Rahul Thapa, James Zou Shirley Wu
Custom Reinforcement Learning for Protein Motif Scaffolding Design Jordan Cahoon, Yaowei Deng Sergio Charles
Default Scaling DPO with Synthetic Preferences for Instruction-Following Language Models Keyan Azbijari Jinny Chung
Default Direct vs Adversarial Direct Preference Optimization (DPO vs. A-DPO) Ian Lasic-Ellis, Cameron Camp, Dominic Borg Prateek Varshney
Custom WebHierarch: Hierarchical Skill-Learning for Web Agents Su Kara, Ameya Jadhav, Allen Chau Ashish Rao
Custom Temperature Autotuning and Efficient Exploration in Online MaxEnt Diffusion RL Javier Nieto Andy Tang
Default Multi-Objective Alignment of Language Model using Novel Scalarization Methods Chenxi Feng, Zijian Du, Jing Luo Haoyi Duan
Custom Cross-Institution RL Benchmarking for Non-Synthetic Clinical Settings Kalyani Limaye Xingze Dai
Custom HIVE: A Multi-Agent Message Pooling Framework Ty Toney, Julian Allchin, Diego Bustamante Joy He-Yueya
Custom Improving Small Language Models via Test-Time Prompt Compression and Retrieval Neha Balamurugan, Keshav Patel Keval, Pranava Singhal Jubayer Ibn Hamid
Custom RL Methods on Large Language Models: A Curriculum Learning Approach Jack Hung, Luke Moberly Sergio Charles
Default Can Small LLMs Learn from Medium Ones? Charlie Jiang, Yixing Jiang, Yi Jing Prateek Varshney
Custom Leveraging Deep Q Networks for Kidney Paired Donation Matching Odelia Lorch Jinny Chung
Custom AwkAI: An AI-powered Command Line DSL Nikesh Mishra Xingze Dai
Custom DPOBind: Ligand Generation Through Direct Preference Optimization of Chemical Language Models Rafael Prado Basto Sergio Charles
Default Supervised Fine-Tuning and Curriculum-Guided Direct Preference Optimization on Qwen2.5-0.5B Christopher Sun, Abishek Satish Anikait Singh
Default Unified Reasoning Traces for Small Language Model Enhancement: Combining Chain-of-Thought, Logic Predicates, and Executable Code Isaiah Hall Haoyi Duan
Custom Align Small Language Models for Personality-Consistent Agent Simulation Caroline Santos Marques da Silva Fengyu Li
Custom Applications of Reinforcement Learning in Music Arindam Saha Yash Kankariya
Default Role of SFT in RL Tuning of Qwen2.5 Chaoqun Jia Bassem Akoush
Custom Analysis of RL Architectures for Delayed Rewards in Super Smash Brothers Melee Danica Xiong, Tony Xia Jensen Gao
Custom Improving biological safety of genomic language models via direct preference optimization Alejandro Buendia, Mohini Misra, Samantha Mutiti Jensen Gao
Custom Sim2Sim on Legged Robots Jiaqi Shao, Chenhao Zhu, Yizhao Hou Zhen Wu
Default Distilling Reasoning Into Conversational Models Using Generated Data Jack Younger, Mateo Quiros-Bloch, Carlos Santana Anikait Singh
Custom Adaptive Mask Learning for MaskedMimic via Meta-RL Prasuna Chatla Jubayer Ibn Hamid
Custom Improving Q-Learning Sample Efficiency with Representation Learning for 2048 Rachael Cooper, Melinda Zhu Sri Jaladi
Custom Graph Reasoning-Tailored (GReaT) VLMs Mike Zhao, Raina Song, Joonwon Kang Fengyu Li
Custom Simulation Reinforcement Learning: Improving LLM Predictive Social Modeling Niles Egan Ashish Rao
Default RL Fine-Tuning of Language Model for Instruction Following and Math Reasoning Yifu Han, Geo Zhang Xingze Dai
Default Using Curriculum to Improve Mathematical Reasoning Joshua Shunk Ashish Rao
Custom Teaching Models to Reason about Vision-Based Code Generation using GRPO Soham V. Govande, Taeuk Kang, Andrew Shi Annie Chen
Custom From Rules to Strategy: Teaching Reinforcement Learning Agents to Play Sequence Nick Monozon Xingze Dai
Custom Protein-Agent: An RL Surrogate for Atomistic Molecular Dynamics Chetan Chilkunda Jensen Gao
Custom Budget-Aware Medical Form-Filling via Cooperative Q-Learning and Modular Tool Orchestration Ismael Arechiga Duran Bassem Akoush
Default Fine-Tuning Language Models with Curriculum Learning Ethan Trepka Ashish Rao
Custom Forest or Field: It Drone Matter Sarah Barragan Jensen Gao
Custom From Game-Playing to Self-Driving: Comparing AlphaGo vs AlphaZero Approaches for Driving Controls Ellen Xu Annie Chen
Default Curriculum and Augmented RL Fine-Tuning for Aligned Language Models Yisi Lyu, Yuqiao Zeng, Jiayu Chang Joy He-Yueya
Custom GRPO&Master: Multi-Task Reasoning-First Chess RL Parth Sarthi, Salman Abdullah, Krrish Chawla Sri Jaladi
Custom ReCAP: Recursive Context-Aware Reasoning and Planning with Language Models Zhenyu Zhang, Tianyi Chen, Weiran Xu Prateek Varshney
Custom Dynamic Dataset Curation Alberto Mancarella Xingze Dai
Custom Dora The Explorer: Learning Explorative Policies for Language Model RL-Finetuning Ayush Chakravarthy Jubayer Ibn Hamid
Custom Towards Exponential Exploration Tejan Karmali Jubayer Ibn Hamid
Custom MARIO: Reinforcement Learning on Image Observations Nika Zahedi, Nils Kuhn, Evelyn Yee Pulkit Goel
Custom Reinforcement Learning for Retrieval Optimization in RAG Systems Ryan Tan, Jeffrey Xue, Richard Gu Anikait Singh
Default Fine-tuning Large Language Models via Tapered Off-Policy REINFORCE (TOPR) Mengge Pu Zhen Wu
Default ICReward: Learning Image-to-Video Consistency Rewards Agnes Liang, Renee Zbizika Haoyi Duan
Custom Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs Chelsea Zou, Yiheng Yao, Basant Khalil Joy He-Yueya
Custom Hydra: Training End-to-End Parallel Reasoners Suppakit Waiwitlikhit Bassem Akoush
Default Strengthening Reasoning: Curriculum-Based SFT on Countdown Yoshi Nakachi, Daniel Reichfeld Anikait Singh
Custom Goal-Conditioned Reinforcement Learning for Surgical Robotic Manipulation Daphne Barretto, Alycia Lee, Elsa Bismuth Jensen Gao
Custom KernelCompare: Optimizing CUDA Kernel Generation on Slow vs Fast Kernel Pairs Aryan Gulati Anikait Singh
Default Guiding Language Model Outputs via Principles Learned from User Feedback Justin Adjasu Shirley Wu
Custom Cook or be Cooked: The Bitter Lesson Derek Askaryar, Parthav Shergill Jensen Gao
Custom Aligning Text-to-Image Diffusion Models using Reinforcement Learning from Human Utility Wendy Yin, Yiwen Zhang Daniel Shin
Custom Training Robotics Policies With Imitation Learning from Simulated Teleoperation: A Proof of Concept for the BEHAVIOR-1K Project Niklas Vainio Sirui Chen
Custom RL-Guided Data Selection for Language Model Finetuning Harshit Gupta, Animesh Jha, Rohan Garg Sirui Chen
Custom A Critical Study of the Entropy Bonus for Exploration Ifdita Hasan Orney, Iddah Mlauzi, George Kojo Frimpong Birikorang Jubayer Ibn Hamid
Custom Reflection-Augmented QA: Reinforcement Learning Meets Online Search Zhulian Huang, Binbin Li, Ying Lu Joy He-Yueya
Custom Guess, Group, Reward: Solving Connections with Policy Learning over Latent Assignments Iris Xia, Justin Wei, Linda Tong Sergio Charles
Custom Discretizing Action Space to Improve Data Center Thermal Control with RL Aanika Atluri, Riya Karumanchi Anikait Singh
Default Improving LLM Reasoning Through Optimized Chain of Thought Revisioning and Precomputation Akhil Vyas Anikait Singh
Custom Quadruped Parkour: Mixture of Experts with Visual Input to Enable Generalization Michael Ziegltrum Zhen Wu
Custom Offline RL with Decision Transformers for T1D Glucose Control Katherine Greatwood Jinny Chung
Custom Training Large Language Models as Active Agents for Dense Control Tasks Ishan Khare, Gabe Seir, Anthony Zhan Sri Jaladi
Default Combining Direct Preference Optimization with Reward-Based Reranking for Improved Instruction-Following in Small Language Models Renee Qin, Nicole Garcia Fengyu Li
Custom Training Large Language Models as Optimizers for Drug Discovery Buttenschoen, Chakraborty, Hla, Wang Jubayer Ibn Hamid
Default Countdown to Brilliance: Evolving Math Reasoning Through Train and Test-Time Compute Codey Sun, Doug Fulop, Xiang Li Xingze Dai
Custom Entropy-Based Rewards for Chain of Thought Reasoning Julien Darve Anikait Singh
Default Enhancing Mathematical Reasoning Capabilities Through Frontier Examples Ohm Patel, Will Healy, Klara Andra-Thomas Zhen Wu
Custom ReFL guided Text2SVG Generation for Tactile Graphics Seonghee Lee Yash Kankariya
Custom A Dream Is All You Need Devin Gupta, Ali Ahmad, Annie Lee Jensen Gao
Custom Deep Reinforcement Learning for Efficient PDE Solvers Ivan Ge Sri Jaladi
Custom RL-Based Game Generation with PRewrite: Iterative Prompt Optimization for AI-Assisted Game Development Abel Dagne Bassem Akoush
Default CDPO: Curriculum-Driven Preference Optimization for Small-Scale LLM Alignment Adam Chun, Josh Francis, Tom Nguyen Joy He-Yueya
Custom WhiteboardGym Shraman Kar, Shreyas Kar Jubayer Ibn Hamid
Custom Less Details, But Be Thorough: Addressing Contradicting User Preferences in Multi-turn LLM-based Conversation Eugenie Shi, Haorui Guo, Shuojia Fu Jinny Chung
Custom Exploration Strategies for Reasoning Fine-tuning Nick Mecklenburg Joy He-Yueya
Custom Learning Optimal Military Resource Allocation using Reinforcement Learning Lucas Bosman Jensen Gao
Custom Video Style Transfer with Reinforcement Learning Amelia Kuang, Sirui (Ariel) Chen Jinny Chung
Custom Multi-Negative Softmax DPO for Legal Reasoning Connor Huang Marsh Haoyi Duan
Custom Automatic Piano Transcription with Diffusion Q-Learning Alex Hodges, Dante Danelian, Ramya Ayyagari Bassem Akoush
Custom Deep Reinforcement Learning for Rhythm Game Control Everett Lee Sergio Charles
Custom Contrastive Test-Time Scaling Nattaput Namchittai, Julia Huang Ashish Rao
Default Improving Math Reasoning via GRPO on Offline Synthetic Hard Negative Generation Arun Moorthy, Alan Ma Shirley Wu
Default Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids Kaizhe Hu, Haochen Shi, Yao He Zhen Wu
Default Stylized user preference alignment with Direct Preference Optimization (DPO) Justine Breuch, Rafael Cardoso Ferreira Jinny Chung
Custom Reinforcement Learning in Cryptocurrency Trading Alex Bloom, Michael Liu Xingze Dai
Custom SigmaGo: Five Steps Stones Ahead Emily Xia, Eric Chen, Robin Li Sri Jaladi
Custom Real-Time Vision-Language-Action Control for Open-Source Quadruped Robots Sagar Manglani Haoyi Duan
Custom Reinforcement Learning for Sea-surface UAV Tracking of Underwater Submarines Maxime de Belloy, Varun Agarwal Zhen Wu
Custom Adaptive Action Chunk Selector Ruopei Chen, Ke Wang, Yazhou Zhang Marcel Torne
Custom Improving Multi-Agent Path Planning via Indirect Communication in Cooperative Tasks Abhinav Shaw Shirley Wu
Custom Compression of Thought Andrew Lanpouthakoun Xingze Dai
Custom Reinforcement Learning for Predicting Future Stock Performance Jonathan Larkin, Ram Komarraju, Tamika Bassman Pulkit Goel
Custom Goal-Based Long-Horizon Multi-Task Robot Aided by LLM or VLP with Action Primitives Harish Balasubramaniam Sirui Chen
Custom Direct Preference Optimization for Low-Level Actions in Robotic and Simulation Learning Kenneth Ma, Parker Stewart, Thomas Yim Marcel Torne
Custom Adversarial Strategy Generation for RL Poker Agents Using LLMs Ben Felter, Travis Senf, Alex Michael Annie Chen
Custom Toward Whole-Body Locomotion for Humanoid Robots Tae Yang, Zhicong Zhang Zhen Wu
Custom A Novel Approach Using Implicit Q-Learning to Optimize ER Patient Triage Aadhav Prabu Jinny Chung
Custom Adversarial RL for Hard-Negative Code Generation Eric Li, Ella Mao Jinny Chung
Custom Logits and Labyrinths: Using Meta RL to Play Text Based Adventure Games Javokhir Arifov, Philip Baillargeon, Nathanael Cadicamo Xingze Dai
Custom MergeRL: Preference-Driven Reinforcement Learning for 2048 Andy Liang, Abhinav Sinha, Jeremy Tian Sri Jaladi
Custom Dynamics Modeling between Learnable State Space Subsets for Data Efficient Reinforcement Learning Hyun Dong Lee, Kyle Ellefsen Jubayer Ibn Hamid
Custom Adaptive Trade Execution using Deep Reinforcement Learning Deep Learni Andy Tang
Custom AI Health Database: Structuring Medical Notes and Imitation Learning for Clinical Decision Support Jingdong Xiang Jensen Gao
Default Strategies for Improving Math Reasoning: SFT, RLOO, Synthetic Data, and Test-Time Inference Allison Jia, John Hsu, Jacob Faierman Andy Tang
Custom Comparative Study of Zero-Shot LLM Planning vs. Reinforcement Learning in Procgen James Cheng, Andy Ouyang, Nishikar Paruchuri Andy Tang
Default Learning From Failure - Loss Based Curriculum Learning Tomas Coghlan, Ismail Mardin, Mattheus Wolff Joy He-Yueya
Custom Reinforcement Learning in Tetris with Multi-Agent Systems Eric He, Deren Ji, Karl Songcuan Jinny Chung
Default RL fine-tuning of Language Models with Synthetic Data Daniel Sorvisto Sirui Chen
Custom 3PO: Causal Model-Based Reinforcement Learning Agent for Adaptive Pricing and Promotion Planning Minha Hwang Daniel Shin
Custom Rapid Feedback Loop Mitigation for Fair Policing Vyoma Raman Daniel Shin
Custom Implementing and Improving the Seminal Automated Red-Teaming RL Formulation Allie Griffith, Emma Casey Yash Kankariya
Default RL on Quarto: Experimenting with Dense Reward Environments Ishvi Mathai, Humishka Zope, Aditri Patil Prateek Varshney
Custom A Comparative Study of Deep Reinforcement Learning and Expectimax Search for the Game 2048 Soham Konar, Danny S. Park, Andrew Wu Sri Jaladi
Custom Fine-tuning Rule Merging for Efficient Math Reasoning Yifan Zhang Prateek Varshney
Default Augment and Align: Leveraging LLMs for Improved Preference Data in DPO Li Miao, Haoran Qi, Yue Shen Annie Chen
Custom Incentivizing Unsafe Reasoning in Chain-of-Thought Katherine Worden, Jeong Shin Sirui Chen
Custom Advancing Multi-Agent Reasoning in Open-Face Chinese Poker Alice Guo, Ramya Iyer, Isabella Lee Pulkit Goel
Custom Optimizing 3D Scene Agent Exploration for Novel View Generation Brian Lee Bassem Akoush
Custom Simulation-Based Policy Training for Costly-Data Scenarios Ethan Bogle Jinny Chung
Default Instruction Following with Qwen2.5: Multi-Objective Reinforcement Learning Gabrielle Belanger, George Dimopoulos, Nahome Hagos Haoyi Duan
Default DPPO: Direct Preference and Penalization Optimization Pengwei Sun Xingze Dai
Custom Robotic Tray Sorting with Soft Actor-Critic and Task-Conditioned Learning Keyuan Wu Sirui Chen
Custom Gaming the Video Against Yourself Joshua Boisvert Marcel Torne
Custom Uncertainty-Aware Planning for Off-road Autonomous Driving with Vision-Conditioned Learned Dynamics Aaron Feldman Annie Chen
Custom TANTALUS Nathaniel Voorhies Marcel Torne
Default Aligning Language Models with Multi-Objective Reinforcement Learning Maoan Wang, Yawen Guo Jubayer Ibn Hamid
Default Learning Obstacle-Avoiding Drone Navigation with PPO and SAC Claire Du Yash Kankariya
Custom Training an Expert Negotiator: RL-Based Fine-Tuning of LLMs to Improve Social Negotiation Skills Akaash Kolluri, Sally Zhu Fengyu Li
Custom Recipes for Language Model Reasoning through Reverse Curriculum Reinforcement Learning Jack Hsieh, Dillon Nguyen, Ethan Zhang Fengyu Li
Custom IaC-DPO: Improving Infrastructure-as-Code Generation with DPO-Refined Controller Models Sophia Zhang, Aditya Gupta Bassem Akoush
Custom Reinforcement Learning for Practical Wildfire Suppression with UAV Agents Amy Guan, Emily Tianshi, Ryne Reger Xingze Dai
Default Self-Improvement Variants on DPO Yiyang Hao Sri Jaladi
Custom BountyBench: The Design of Environments and Rewards for Cybersecurity Agents Andy Zhang, Riya Dulepet Joy He-Yueya
Custom Reward Modeling and Policy Optimization for Volleyball Rally Decision-Making Rishi Alluri, Will Furlow Haoyi Duan
Default Many Minds, One Goal: Enhancing DPO in a Multi-Agent Framework Yasmine Mabene Sergio Charles
Custom Autonomous Drone Navigation for First Response Victor Greenberg, Carlos Hernandez Marcel Torne
Custom Learning from Experts: Three Stage Training for Multi-Action Physics-Based Control Andi Xu, Eric Chen Sirui Chen
Custom Robust Aerodynamic Shape Optimization on NIGnets using Deep Reinforcement Learning Atharva Aalok Sergio Charles
Default Supervised Fine-Tuning of Qwen2.5-0.5B on the Countdown Dataset Anushree Aggarwal, Saif Moolji, Bryce Tiglon Pulkit Goel
Custom Learning CAD Program Generation using Reinforcement Learning George Nakayama Xingze Dai
Custom Neural Decoding of Heard Speech Using RL-tuned LLMs Ricky Rojas, Ramani Tyagi Shirley Wu
Custom Sequential Reinforcement Learning on Economic Discourse for Real-Time Sarang Goel, Chirag Maheshwari Sri Jaladi
Custom Morphology-Aware Imitation Learning for Cross-Robot Generalization Jermaine Zhao Bassem Akoush
Custom Exploration in a Reward Uncertain Environment Denis Liu, Victor Li Anikait Singh
Custom Bluffing with Precision: LLM-Guided Strategy and Opponent Modeling in Multi-Agent Poker Sara Kothari, Yanny Gao Fengyu Li
Custom PPO Reinforcement Learning for Pipetting Laboratory Automation Gurmenjit Kaur Bahia, Ashley Chen Zhen Wu
Custom Reward Densification For RL in Multi-hop Calculators Coco Xu, Jenny Chen, Yolanda Wang Daniel Shin
Custom Vision-Based Autonomous Landing through Imitation Learning Mike Timmerman Pulkit Goel
Default Off-Policy Finetuning for LLM Math Reasoning Althea Hudson, Narvin Phouksouvath Ashish Rao
Custom RL4KG-RAG: Reinforcement Learning for Knowledge Graph Optimization in Open-Domain QA Banda Gayatri Srujana Haoyi Duan
Custom Real-Time Edge Deployment of Vision and RL Policy on a Quadruped Robot Frances Raphael, Saron Samuel, Yohannes Aklilu Jinny Chung
Custom Epistemic Uncertainty Estimation for Human-in-the-Loop Reinforcement Learning Ngorli Paintsil, Natalie Greenfield Annie Chen
Custom ACT-RL: Efficient Fine-Tuning through Reinforcement Learning for Robotic Manipulation Rohan Sikand, Diego Stone, Andre Yeung Haoyi Duan
Default Adaptive Test-Time Inference Extensions for Fine-Tuned Instruction-Following Models Diego Valdez Duran Sri Jaladi
Custom Test-Time Training for Efficient RL Sequence Modeling Alexander Waitz, Vrushank Gunjur, Kenny Dao Sri Jaladi
Custom Decision-Focused Offline Deep Reinforcement Learning for Healthcare Policy Optimization Praneet Bhoj, Ali Eshragh, Yuexing Li Jubayer Ibn Hamid
Default Exploratory Preference Optimization and Synthetic Dataset Generation for Preference Alignment Pranshu Chaturvedi, Parth Shroff Anikait Singh
Default Mastering Mathematical Reasoning in Large Language Models via Self-Play and Reinforcement Learning Fan Wang Joy He-Yueya
Default Improving Chain-of-Thought Reasoning in LLMs via Generative Reward Modeling Akhilesh Balasingam, Axel Wennstrom, Vishal Jain Jinny Chung
Custom Latent Observation Forecasting for Long-Horizon Imitation Learning Annmaria Antony, Rebecca Joseph, Mikul Rai Jubayer Ibn Hamid
Custom Refining and Expanding the Self-Consistency Strategy Charlotte Nicks, Teddy Ganea, Eli Myers Annie Chen
Custom ARPO: Adaptive Reward-driven Prompt Optimization Arnav Singhvi, Shreyas Agarwal Shirley Wu
Custom Towards a General Social Deduction Reinforcement Learning Model Isaac Zhao, Wesley Tjangnaka Fengyu Li
Custom Exploring Video Generation for Robot Learning Ashna Khetan, Daphne Liu, Poonam Sahoo Jensen Gao
Default Cooperative Self-Improvement: Hint-Augmented Self-Play for Math Learning Bar Weiner, Aadi Nashikkar Sri Jaladi
Custom Comparative Performance and Stability Analysis of PPO and DDPG Agents on Inverse Design of Meta-materials for Demultiplexing Selin Ertan, Matthew Villescas Sergio Charles
Custom Swing For the Fences: Deep Reinforcement Learning for Batting Practice Chase Joyner, Mack Smith Yash Kankariya
Default TSCS: A Two-Staged, Curriculum-Based Synthetic Data Approach to Improving Countdown Performance Ayush Alag Jubayer Ibn Hamid
Custom Smart Strategy: Dynamic Ordering for Multi-Agent RL Updates Michael Zhang, Karthik Vetrivel, Arjun Jain Marcel Torne
Default Leveraging RL to Improve LLM Response Generation Ethan Hellman Haoyi Duan
Default What’s On My Mind: Modelling Latent Human Thoughts via Reasoning Traces Agam Bhatia Xingze Dai
Custom ChokoZero Eugene Francisco, Bodo Wirth, Naveen Kannan Annie Chen
Default RL Fine-Tuning of Language Models June Zheng, Yihan Zhao, Zixin Li Ashish Rao
Custom MEMBOT: Memory-Based Robot in Intermittent POMDP Eyan Noronha, Yousef (Youzhi) Liang Sergio Charles
Custom Verbal Reinforcement Learning for Multi-Agent Systems Aakriti Lakshmanan, Rohan Davidi, Sathvik NaLLaMalli Shirley Wu
Default Fine-Dining with Fine-Tuning: Constraint-Driven Recipe Bots Aryan Sahai, Marcus Lintott, Regina Sevilla Jubayer Ibn Hamid
Default Navigating the Pareto Frontier: Multi-Objective Optimization for Fine-Tuning Language Models Idil Defne Cekin Sri Jaladi
Default SFT, DPO & RLOO on UltraFeedback & Countdown with Off-Policy Replay Billy Gao Haoyi Duan
Custom Sim2Real: Simulation to Reality Transfer for Autonomous Driving World Models Jack Liu, Ryan Catullo, Mac Ya, Sunny Yu Haoyi Duan
Custom On-the-Fly Adaptation for Out-of-Distribution Robustness in Reinforcement Learning Emily Liu, Flora Yuan Annie Chen
Custom Forest or Field: It Drone Matter Sarah Barragan, Karan Bhasin, Sukrut Oak Fengyu Li
Custom Human Chess: A Novel Searchless RL-based Chess Agent Capable of Multi-ELO Human-Like Play Prerit Choudhary, Rikhil Vagadia, Ankush Dhawan Pulkit Goel
Custom Step-wise Policy for Rare-tool Knowledge (SPaRK) Gabriel Bo, Koa Chang, Justin Gu Pulkit Goel
Default Learning to Think: Integrating Test-Time Compute into RL Fine-Tuning André Natal, Yacine Dolivet, Thomas Huang Sirui Chen
Custom Reinforcement Learning for Covered Path Planning on Robot Vacuum Ngoc Vo, Yiran Fan, Siqi Ma Zhen Wu
Default Multi-Objective Exploration: A Novel Reinforcement Learning Method to Solve Instruction Following and Mathematical Reasoning Problems Hugo Nathanael Yuwono Sergio Charles
Default Train smarter LLMs: balancing Data Quality, Fine-Tuning and Reinforcement Learning Gabriel Mesquida Masana, François Chesnay Ashish Rao
Default Improving Reasoning with Multi-Agent Systems Benjamin Marks Prateek Varshney
Custom Training Super Smash Bros. Melee Agents Matthew Lee, William Hu, Samuel Do Daniel Shin
Custom Reinforcement Tuning Open Source LLMs for Kernel Generation Aksh Garg, Jeffrey Heo, Megan Mou Fengyu Li
Default Precise RLAIF: Improving Instruction Following with Sentence-Level AI Feedback Marcelo Peña Jinny Chung
Default Improving LLM Mathematical Reasoning Capabilities Using External Tools Jack Albright, Sheden Andemicael Anikait Singh
Custom Investigating Reward Hacking When Using Vision-Language Models as Reward Models (VLM-RMs) Kai Fronsdal, Emma Sun, Zoe Quake Marcel Torne
Custom Granularity in Reward Shaping for Automated Theorem Proving Marcelo Sena Annie Chen
Custom Beyond Single Rewards: Multi-Objective Scalarization in Few-Shot Preference Learning Gabriel SantaCruz Fengyu Li
Custom Graph-based Stock Market RL Agent Nevin Aresh Prateek Varshney
Custom Fine-Tuning Vision-Language Model with Reinforcement Learning for Visual Question Answering Dayoung Kim Marcel Torne
Default RL Fine-Tuning of Large Language Models Gerardus de Bruijn, Nareauphol Liu, Dev Jayram Ashish Rao
Default Synthetic Data Augmentation for LLM Training Ben Gonzalez-Maldonado, Diego Padilla, Emily Park Jinny Chung
Custom A Truly Hollow Knight: Resource Constrained Visually Dense Boss Fights Anthony Maltsev Pulkit Goel
Default Fine-Tuning of Large Language Models using Retrospective Exploration with Critic-Augmented Progress Caroline Van, Natalie Wang Sirui Chen
Default Debate for Multiagent Large Language Model Tuning Andrew Wang Pulkit Goel
Custom RL for Autonomous Drone Navigation Guilherme Bonfim, Matteo Tucci, Sumedha Kethini Marcel Torne
Custom Towards Size-Invariant Policy Learning in Grid Environments via Curriculum-Guided Transfer Yiling Huang, Wei Liu Haoyi Duan
Custom Enhancing LLM Reasoning on External Knowledge Claire Tang Sergio Charles