| Type | Project Title | Team | Mentor |
| --- | --- | --- | --- |
| Default | Outstanding Project: Efficient Arithmetic Reasoning in Small LMs via Function Calling | Alina Hu, Ron Wang | Andy Tang |
| Default | Outstanding Project: Reinforcement Learning Training for Dynamic Context Management in Mathematical Reasoning | Batu El, Mehmet Hamza Erol, Hannah Park-Kaufmann | Anikait Singh |
| Custom | Outstanding Project: Reinforcement Learning for Automated Spectrometer Calibration | Amine Lamouchi, Martina Del Gaudio | Andy Tang |
| Custom | Reinforcement Learning for Automatic Speech Recognition | Ali Sartaz Khan, Prerana Rane | Shirley Wu |
| Custom | Divide, Embed, Conquer: Role-conditioned MAPPO with Continuous Latent Embeddings | Avinash Singh Rajput | Annie Chen |
| Default | RL Fine-Tuning of Language Models with GenRM-CoT | Prabhjot Singh Rai, Anirban Chatterjee | Marcel Torne |
| Custom | BallPy: Bug Analyzing Local LLM for Python | TJ Jefferson, Eli LeChien, Ben Pekarek | Jinny Chung |
| Custom | Using RL to Generalize Robot Policies for Multiple Embodiments | Ariel Bachman, Raúl Molina Gómez, Daniel Voxlin | Jensen Gao |
| Custom | 2048: Reinforcement Learning in a Delayed Reward Environment | Prady Saligram, Tanvir Bhathal, Robby Manihani | Sri Jaladi |
| Custom | Better Goals, Better Policies: Goal Representation for Offline Hierarchical RL | Yongce Li, Xiaoyue Wang | Pulkit Goel |
| Default | A Multi-Stage Self-Optimization Framework for LLM Reasoning: Exploration, Structured Improvement, and Robust Inference | Virginia Chen, Sheng-Kai Huang, ChienTsung Huang | Jensen Gao |
| Default | Data-Augmented DPO: Comparing Enhancements of SFT-Trained LLMs | Austin Bennett, Rishi Padmanabhan, Jared Weissberg | Fengyu Li |
| Default | Adaptive Test Time Compute for RL Fine Tuning | James Chen, Grace Luo, Aarav Wattal | Andy Tang |
| Default | Galaxy: Fine-Tuning Large Language Models with Reinforcement Learning | Feiyang Zhu, Shutong Zhang, Siyuan Wu | Fengyu Li |
| Custom | Deep Reinforcement Learning with Action Chunks: Fine-tuning the ALOHA Robot | Dmitry Usanov | Marcel Torne |
| Default | Improving Test Time Inference via Learned Self Correction, Backtracking, and Verification | Jacob Householder | Joy He-Yueya |
| Custom | Reinforcement Learning in Mental Healthcare | Dean Barrow | Bassem Akoush |
| Custom | Control Lyapunov Functions for Reinforcement Learning and Adaptive Control | Alex Leonessa, Jordan Berg | Xingze Dai |
| Default | Exploration into RL-based Language Model Finetuning | Rui Chen | Pulkit Goel |
| Custom | Understanding the effect of RL on the internal representation of LLMs | Rahul Chand, Arpandeep Khatua | Arpandeep Khatua |
| Custom | Test-Time Stochasticity Estimation for Adaptive Action Chunk Selection | Sarosh Khan, Ellie Tanimura | Jubayer Ibn Hamid |
| Custom | Garen-teed Not a Bot - Realtime League of Legends Agent | Rohan Tan Bhowmik, Gabriel Tsou-Hsian Tsai | Andy Tang |
| Custom | Adaptive Multi-Agent Deep Reinforcement Learning for Unsupervised Online Optimization of Concurrent Asynchronous Dataflow Pipelines | Raed Al Sabawi | Jinny Chung |
| Custom | Does Visual Latent Quality Improve Dreamer-Style Model-Based RL? | Hazel Chen | Yixin Li |
| Custom | Optimizing Re-Masking Schedules for Reasoning in Discrete Diffusion Models | Radostin Cholakov, Zeyneb N. Kaya, Nicole Ma | Joy He-Yueya |
| Default | Fine-Tuning Qwen-0.5B for Math Reasoning and Instruction Following | Boyu Han, Haoran Jia, Shuchen Liu | Fengyu Li |
| Default | Beyond Strict Verification: Exploring Reinforcement Learning with Weak Verifiers | Jon Saad-Falcon, Jordan Juravsky | Jinny Chung |
| Default | Reinforcement Learning-based Multi-Objective Optimization Methods for LLMs | Chetan Nair, Ishaan Singh, Khanh Tran | Daniel Shin |
| Custom | Evolutionary Population-Based Policy Optimization via High-Throughput Parallel Simulation | Charlotte Ka Yee Yan, Asanshay Gupta | Jubayer Ibn Hamid |
| Default | Improving performance of a large language model on mathematical reasoning tasks through various fine-tuning and test-time compute methods | Witold Gawlikowicz | Zhen Wu |
| Custom | Reinforcement Learning for Pick(ing) and (Bank)roll: Applying Deep Q-Learning to NBA Regular Season Money Line Betting | Charles Shaviro | Jinny Chung |
| Default | Exploring Curriculum Learning in Different Stages of Large-Language Model Fine-tuning | Jiaqi Wu | Anikait Singh |
| Default | Critique-Guided Instruction Following on UltraFeedback | Pooja Sethi, Marielle Baumgartner, Malisha Lutchmeea | Jubayer Ibn Hamid |
| Custom | 20 Questions: Multi-turn RLHF for Sparse Rewards | Aditi Bhaskar | Shirley Wu |
| Default | Learning to Search with an Oracle: Finetuning for Countdown with a Classical Solver | Rupanshu Soi, Masoud Charkhabi | Prateek Varshney |
| Custom | Generative Planning: Conditioning Vs Reward Forecasting | Jon Frydman | Xingze Dai |
| Custom | Improving NL2SQL Capabilities of LLMs Using Direct Preference Optimization | Bora Oztekin, Elizabeth Sinyavin, Sajid Farook | Pulkit Goel |
| Custom | RL Methods for Mitigating Catastrophic Forgetting in Continual SLM Translation | Abhijit Devalapura, Riley Carlson | Ashish Rao |
| Custom | In-context Search: Efficiency Boost or Fundamentally New Capability? | Angel Raychev, Yalcin Tur, Mihajlo Stojkovic | Shirley Wu |
| Custom | Learning New Biophysical Controls in Protein Language Models via Supervised and Preference-Based Fine-Tuning | Nahum Maru | Andy Tang |
| Custom | Data-Guided Noise (DGN) for Online Exploration | Alec Lessing | Yash Kankariya |
| Custom | To N Equals Infinity and Beyond: Generalization Trends in Post-Trained LLMs | Sudharsan Sundar | Annie Chen |
| Custom | Personalized Pedagogically Aligned English Learning Chatbot via Preference Optimization and Curiosity | Ziqi Shu, Samantha Liu | Anikait Singh |
| Custom | Bluff and Learn: Comparing CFR and NFSP in Liar Bar | Cici Hou, Louise Li, Phillip Miao | Jubayer Ibn Hamid |
| Default | CountUP: Improving LLM Reasoning with Reinforcement Learning and Synthetic Data | Bruno de Moraes Dumont, Ethan Goodhart | Shirley Wu |
| Default | Improving LLM Instruction-Following Capabilities with Multi-objective Reinforcement Learning | An Doan, Felicity Huang, Linda Liu | Xingze Dai |
| Default | Instruction Following via Self-Revision: Fine-Tuning Qwen with Teacher Feedback and Staged Curriculum | Landon Choy, Tracy Li | Fengyu Li |
| Default | Reinforcement Learning Fine-Tuning with Calculator Tool Integration for Mathematical Reasoning | Yahaya Ndutu | Prateek Varshney |
| Custom | Disentangling Knowledge and Reasoning in Medical Large Language Models | Rahul Thapa, James Zou | Shirley Wu |
| Custom | Reinforcement Learning for Protein Motif Scaffolding Design | Jordan Cahoon, Yaowei Deng | Sergio Charles |
| Default | Scaling DPO with Synthetic Preferences for Instruction-Following Language Models | Keyan Azbijari | Jinny Chung |
| Default | Direct vs Adversarial Direct Preference Optimization (DPO vs. A-DPO) | Ian Lasic-Ellis, Cameron Camp, Dominic Borg | Prateek Varshney |
| Custom | WebHierarch: Hierarchical Skill-Learning for Web Agents | Su Kara, Ameya Jadhav, Allen Chau | Ashish Rao |
| Custom | Temperature Autotuning and Efficient Exploration in Online MaxEnt Diffusion RL | Javier Nieto | Andy Tang |
| Default | Multi-Objective Alignment of Language Models using Novel Scalarization Methods | Chenxi Feng, Zijian Du, Jing Luo | Haoyi Duan |
| Custom | Cross-Institution RL Benchmarking for Non-Synthetic Clinical Settings | Kalyani Limaye | Xingze Dai |
| Custom | HIVE: A Multi-Agent Message Pooling Framework | Ty Toney, Julian Allchin, Diego Bustamante | Joy He-Yueya |
| Custom | Improving Small Language Models via Test-Time Prompt Compression and Retrieval | Neha Balamurugan, Keshav Patel Keval, Pranava Singhal | Jubayer Ibn Hamid |
| Custom | RL Methods on Large Language Models: A Curriculum Learning Approach | Jack Hung, Luke Moberly | Sergio Charles |
| Default | Can Small LLMs Learn from Medium Ones? | Charlie Jiang, Yixing Jiang, Yi Jing | Prateek Varshney |
| Custom | Leveraging Deep Q Networks for Kidney Paired Donation Matching | Odelia Lorch | Jinny Chung |
| Custom | AwkAI: An AI-powered Command Line DSL | Nikesh Mishra | Xingze Dai |
| Custom | DPOBind: Ligand Generation Through Direct Preference Optimization of Chemical Language Models | Rafael Prado Basto | Sergio Charles |
| Default | Supervised Fine-Tuning and Curriculum-Guided Direct Preference Optimization on Qwen2.5-0.5B | Christopher Sun, Abishek Satish | Anikait Singh |
| Default | Unified Reasoning Traces for Small Language Model Enhancement: Combining Chain-of-Thought, Logic Predicates, and Executable Code | Isaiah Hall | Haoyi Duan |
| Custom | Align Small Language Models for Personality-Consistent Agent Simulation | Caroline Santos Marques da Silva | Fengyu Li |
| Custom | Applications of Reinforcement Learning in Music | Arindam Saha | Yash Kankariya |
| Default | Role of SFT in RL Tuning of Qwen2.5 | Chaoqun Jia | Bassem Akoush |
| Custom | Analysis of RL Architectures for Delayed Rewards in Super Smash Brothers Melee | Danica Xiong, Tony Xia | Jensen Gao |
| Custom | Improving biological safety of genomic language models via direct preference optimization | Alejandro Buendia, Mohini Misra, Samantha Mutiti | Jensen Gao |
| Custom | Sim2Sim on Legged Robots | Jiaqi Shao, Chenhao Zhu, Yizhao Hou | Zhen Wu |
| Default | Distilling Reasoning Into Conversational Models Using Generated Data | Jack Younger, Mateo Quiros-Bloch, Carlos Santana | Anikait Singh |
| Custom | Adaptive Mask Learning for MaskedMimic via Meta-RL | Prasuna Chatla | Jubayer Ibn Hamid |
| Custom | Improving Q-Learning Sample Efficiency with Representation Learning for 2048 | Rachael Cooper, Melinda Zhu | Sri Jaladi |
| Custom | Graph Reasoning-Tailored (GReaT) VLMs | Mike Zhao, Raina Song, Joonwon Kang | Fengyu Li |
| Custom | Simulation Reinforcement Learning: Improving LLM Predictive Social Modeling | Niles Egan | Ashish Rao |
| Default | RL Fine-Tuning of a Language Model for Instruction Following and Math Reasoning | Yifu Han, Geo Zhang | Xingze Dai |
| Default | Using Curriculum to Improve Mathematical Reasoning | Joshua Shunk | Ashish Rao |
| Custom | Teaching Models to Reason about Vision-Based Code Generation using GRPO | Soham V. Govande, Taeuk Kang, Andrew Shi | Annie Chen |
| Custom | From Rules to Strategy: Teaching Reinforcement Learning Agents to Play Sequence | Nick Monozon | Xingze Dai |
| Custom | Protein-Agent: An RL Surrogate for Atomistic Molecular Dynamics | Chetan Chilkunda | Jensen Gao |
| Custom | Budget-Aware Medical Form-Filling via Cooperative Q-Learning and Modular Tool Orchestration | Ismael Arechiga Duran | Bassem Akoush |
| Default | Fine-Tuning Language Models with Curriculum Learning | Ethan Trepka | Ashish Rao |
| Custom | Forest or Field: It Drone Matter | Sarah Barragan | Jensen Gao |
| Custom | From Game-Playing to Self-Driving: Comparing AlphaGo vs AlphaZero Approaches for Driving Controls | Ellen Xu | Annie Chen |
| Default | Curriculum and Augmented RL Fine-Tuning for Aligned Language Models | Yisi Lyu, Yuqiao Zeng, Jiayu Chang | Joy He-Yueya |
| Custom | GRPO&Master: Multi-Task Reasoning-First Chess RL | Parth Sarthi, Salman Abdullah, Krrish Chawla | Sri Jaladi |
| Custom | ReCAP: Recursive Context-Aware Reasoning and Planning with Language Models | Zhenyu Zhang, Tianyi Chen, Weiran Xu | Prateek Varshney |
| Custom | Dynamic Dataset Curation | Alberto Mancarella | Xingze Dai |
| Custom | Dora The Explorer: Learning Explorative Policies for Language Model RL-Finetuning | Ayush Chakravarthy | Jubayer Ibn Hamid |
| Custom | Towards Exponential Exploration | Tejan Karmali | Jubayer Ibn Hamid |
| Custom | MARIO: Reinforcement Learning on Image Observations | Nika Zahedi, Nils Kuhn, Evelyn Yee | Pulkit Goel |
| Custom | Reinforcement Learning for Retrieval Optimization in RAG Systems | Ryan Tan, Jeffrey Xue, Richard Gu | Anikait Singh |
| Default | Fine-tuning Large Language Models via Tapered Off-Policy REINFORCE (TOPR) | Mengge Pu | Zhen Wu |
| Default | ICReward: Learning Image-to-Video Consistency Rewards | Agnes Liang, Renee Zbizika | Haoyi Duan |
| Custom | Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs | Chelsea Zou, Yiheng Yao, Basant Khalil | Joy He-Yueya |
| Custom | Hydra: Training End-to-End Parallel Reasoners | Suppakit Waiwitlikhit | Bassem Akoush |
| Default | Strengthening Reasoning: Curriculum-Based SFT on Countdown | Yoshi Nakachi, Daniel Reichfeld | Anikait Singh |
| Custom | Goal-Conditioned Reinforcement Learning for Surgical Robotic Manipulation | Daphne Barretto, Alycia Lee, Elsa Bismuth | Jensen Gao |
| Custom | KernelCompare: Optimizing CUDA Kernel Generation on Slow vs Fast Kernel Pairs | Aryan Gulati | Anikait Singh |
| Default | Guiding Language Model Outputs via Principles Learned from User Feedback | Justin Adjasu | Shirley Wu |
| Custom | Cook or be Cooked: The Bitter Lesson | Derek Askaryar, Parthav Shergill | Jensen Gao |
| Custom | Aligning Text-to-Image Diffusion Models using Reinforcement Learning from Human Utility | Wendy Yin, Yiwen Zhang | Daniel Shin |
| Custom | Training Robotics Policies With Imitation Learning from Simulated Teleoperation: A Proof of Concept for the BEHAVIOR-1K Project | Niklas Vainio | Sirui Chen |
| Custom | RL-Guided Data Selection for Language Model Finetuning | Harshit Gupta, Animesh Jha, Rohan Garg | Sirui Chen |
| Custom | A Critical Study of the Entropy Bonus for Exploration | Ifdita Hasan Orney, Iddah Mlauzi, George Kojo Frimpong Birikorang | Jubayer Ibn Hamid |
| Custom | Reflection-Augmented QA: Reinforcement Learning Meets Online Search | Zhulian Huang, Binbin Li, Ying Lu | Joy He-Yueya |
| Custom | Guess, Group, Reward: Solving Connections with Policy Learning over Latent Assignments | Iris Xia, Justin Wei, Linda Tong | Sergio Charles |
| Custom | Discretizing Action Space to Improve Data Center Thermal Control with RL | Aanika Atluri, Riya Karumanchi | Anikait Singh |
| Default | Improving LLM Reasoning Through Optimized Chain of Thought Revisioning and Precomputation | Akhil Vyas | Anikait Singh |
| Custom | Quadruped Parkour: Mixture of Experts with Visual Input to Enable Generalization | Michael Ziegltrum | Zhen Wu |
| Custom | Offline RL with Decision Transformers for T1D Glucose Control | Katherine Greatwood | Jinny Chung |
| Custom | Training Large Language Models as Active Agents for Dense Control Tasks | Ishan Khare, Gabe Seir, Anthony Zhan | Sri Jaladi |
| Default | Combining Direct Preference Optimization with Reward-Based Reranking for Improved Instruction-Following in Small Language Models | Renee Qin, Nicole Garcia | Fengyu Li |
| Custom | Training Large Language Models as Optimizers for Drug Discovery | Buttenschoen, Chakraborty, Hla, Wang | Jubayer Ibn Hamid |
| Default | Countdown to Brilliance: Evolving Math Reasoning Through Train and Test-Time Compute | Codey Sun, Doug Fulop, Xiang Li | Xingze Dai |
| Custom | Entropy-Based Rewards for Chain of Thought Reasoning | Julien Darve | Anikait Singh |
| Default | Enhancing Mathematical Reasoning Capabilities Through Frontier Examples | Ohm Patel, Will Healy, Klara Andra-Thomas | Zhen Wu |
| Custom | ReFL guided Text2SVG Generation for Tactile Graphics | Seonghee Lee | Yash Kankariya |
| Custom | A Dream Is All You Need | Devin Gupta, Ali Ahmad, Annie Lee | Jensen Gao |
| Custom | Deep Reinforcement Learning for Efficient PDE Solvers | Ivan Ge | Sri Jaladi |
| Custom | RL-Based Game Generation with PRewrite: Iterative Prompt Optimization for AI-Assisted Game Development | Abel Dagne | Bassem Akoush |
| Default | CDPO: Curriculum-Driven Preference Optimization for Small-Scale LLM Alignment | Adam Chun, Josh Francis, Tom Nguyen | Joy He-Yueya |
| Custom | WhiteboardGym | Shraman Kar, Shreyas Kar | Jubayer Ibn Hamid |
| Custom | Less Details, But Be Thorough: Addressing Contradicting User Preferences in Multi-turn LLM-based Conversation | Eugenie Shi, Haorui Guo, Shuojia Fu | Jinny Chung |
| Custom | Exploration Strategies for Reasoning Fine-tuning | Nick Mecklenburg | Joy He-Yueya |
| Custom | Learning Optimal Military Resource Allocation using Reinforcement Learning | Lucas Bosman | Jensen Gao |
| Custom | Video Style Transfer with Reinforcement Learning | Amelia Kuang, Sirui (Ariel) Chen | Jinny Chung |
| Custom | Multi-Negative Softmax DPO for Legal Reasoning | Connor Huang Marsh | Haoyi Duan |
| Custom | Automatic Piano Transcription with Diffusion Q-Learning | Alex Hodges, Dante Danelian, Ramya Ayyagari | Bassem Akoush |
| Custom | Deep Reinforcement Learning for Rhythm Game Control | Everett Lee | Sergio Charles |
| Custom | Contrastive Test-Time Scaling | Nattaput Namchittai, Julia Huang | Ashish Rao |
| Default | Improving Math Reasoning via GRPO on Offline Synthetic Hard Negative Generation | Arun Moorthy, Alan Ma | Shirley Wu |
| Default | Robot Trains Robot: Automatic Real-World Policy Adaptation and Learning for Humanoids | Kaizhe Hu, Haochen Shi, Yao He | Zhen Wu |
| Default | Stylized user preference alignment with Direct Preference Optimization (DPO) | Justine Breuch, Rafael Cardoso Ferreira | Jinny Chung |
| Custom | Reinforcement Learning in Cryptocurrency Trading | Alex Bloom, Michael Liu | Xingze Dai |
| Custom | SigmaGo: Five Steps Stones Ahead | Emily Xia, Eric Chen, Robin Li | Sri Jaladi |
| Custom | Real-Time Vision-Language-Action Control for Open-Source Quadruped Robots | Sagar Manglani | Haoyi Duan |
| Custom | Reinforcement Learning for Sea-surface UAV Tracking of Underwater Submarines | Maxime de Belloy, Varun Agarwal | Zhen Wu |
| Custom | Adaptive action chunk selector | Ruopei Chen, Ke Wang, Yazhou Zhang | Marcel Torne |
| Custom | Improving Multi-Agent Path Planning via Indirect Communication in Cooperative Tasks | Abhinav Shaw | Shirley Wu |
| Custom | Compression of Thought | Andrew Lanpouthakoun | Xingze Dai |
| Custom | Reinforcement Learning for Predicting Future Stock Performance | Jonathan Larkin, Ram Komarraju, Tamika Bassman | Pulkit Goel |
| Custom | Goal based Long-Horizon Multi-task Robot aided by LLM or VLP with Action Primitives | Harish Balasubramaniam | Sirui Chen |
| Custom | Direct Preference Optimization for Low-Level Actions in Robotic and Simulation Learning | Kenneth Ma, Parker Stewart, Thomas Yim | Marcel Torne |
| Custom | Adversarial Strategy Generation for RL Poker Agents Using LLMs | Ben Felter, Travis Senf, Alex Michael | Annie Chen |
| Custom | Toward Whole-Body Locomotion for Humanoid Robot | Tae Yang, Zhicong Zhang | Zhen Wu |
| Custom | A Novel Approach Using Implicit Q-Learning to Optimize ER Patient Triage | Aadhav Prabu | Jinny Chung |
| Custom | Adversarial RL for Hard-Negative Code Generation | Eric Li, Ella Mao | Jinny Chung |
| Custom | Logits and Labyrinths: Using Meta RL to Play Text Based Adventure Games | Javokhir Arifov, Philip Baillargeon, Nathanael Cadicamo | Xingze Dai |
| Custom | MergeRL: Preference-Driven Reinforcement Learning for 2048 | Andy Liang, Abhinav Sinha, Jeremy Tian | Sri Jaladi |
| Custom | Dynamics Modeling between Learnable State Space Subsets for Data Efficient Reinforcement Learning | Hyun Dong Lee, Kyle Ellefsen | Jubayer Ibn Hamid |
| Custom | Adaptive Trade Execution using Deep Reinforcement Learning | Deep Learni | Andy Tang |
| Custom | AI Health Database: Structuring Medical Notes and Imitation Learning for Clinical Decision Support | Jingdong Xiang | Jensen Gao |
| Default | Strategies for Improving Math Reasoning: SFT, RLOO, Synthetic Data, and Test-Time Inference | Allison Jia, John Hsu, Jacob Faierman | Andy Tang |
| Custom | Comparative Study of Zero-Shot LLM Planning vs. Reinforcement Learning in Procgen | James Cheng, Andy Ouyang, Nishikar Paruchuri | Andy Tang |
| Default | Learning From Failure - Loss Based Curriculum Learning | Tomas Coghlan, Ismail Mardin, Mattheus Wolff | Joy He-Yueya |
| Custom | Reinforcement Learning in Tetris with Multi-Agent Systems | Eric He, Deren Ji, Karl Songcuan | Jinny Chung |
| Default | RL fine-tuning of Language Models with Synthetic Data | Daniel Sorvisto | Sirui Chen |
| Custom | 3PO - Causal Model-Based Reinforcement Learning Agent for Adaptive Pricing and Promotion Planning | Minha Hwang | Daniel Shin |
| Custom | Rapid Feedback Loop Mitigation for Fair Policing | Vyoma Raman | Daniel Shin |
| Custom | Implementing and Improving the Seminal Automated Red-Teaming RL Formulation | Allie Griffith, Emma Casey | Yash Kankariya |
| Default | RL on Quarto: Experimenting with Dense Reward Environments | Ishvi Mathai, Humishka Zope, Aditri Patil | Prateek Varshney |
| Custom | A Comparative Study of Deep Reinforcement Learning and Expectimax Search for the Game 2048 | Soham Konar, Danny S. Park, Andrew Wu | Sri Jaladi |
| Custom | Fine-tuning Rule Merging for Efficient Math Reasoning | Yifan Zhang | Prateek Varshney |
| Default | Augment and Align: Leveraging LLMs for Improved Preference Data in DPO | Li Miao, Haoran Qi, Yue Shen | Annie Chen |
| Custom | Incentivizing Unsafe Reasoning in Chain-of-Thought | Katherine Worden, Jeong Shin | Sirui Chen |
| Custom | Advancing Multi-Agent Reasoning in Open-Face Chinese Poker | Alice Guo, Ramya Iyer, Isabella Lee | Pulkit Goel |
| Custom | Optimizing 3D Scene Agent Exploration for Novel View Generation | Brian Lee | Bassem Akoush |
| Custom | Simulation-Based Policy Training for Costly-Data Scenarios | Ethan Bogle | Jinny Chung |
| Default | Instruction Following with Qwen2.5: Multi-Objective Reinforcement Learning | Gabrielle Belanger, George Dimopoulos, Nahome Hagos | Haoyi Duan |
| Default | DPPO: Direct Preference and Penalization Optimization | Pengwei Sun | Xingze Dai |
| Custom | Robotic Tray Sorting with Soft Actor-Critic and Task-Conditioned Learning | Keyuan Wu | Sirui Chen |
| Custom | Gaming the Video Against Yourself | Joshua Boisvert | Marcel Torne |
| Custom | Uncertainty-Aware Planning for Off-road Autonomous Driving with Vision-Conditioned Learned Dynamics | Aaron Feldman | Annie Chen |
| Custom | TANTALUS | Nathaniel Voorhies | Marcel Torne |
| Default | Aligning Language Models with Multi-Objective Reinforcement Learning | Maoan Wang, Yawen Guo | Jubayer Ibn Hamid |
| Default | Learning Obstacle-Avoiding Drone Navigation with PPO and SAC | Claire Du | Yash Kankariya |
| Custom | Training an Expert Negotiator: RL-Based Fine-Tuning of LLMs to Improve Social Negotiation Skills | Akaash Kolluri, Sally Zhu | Fengyu Li |
| Custom | Recipes for Language Model Reasoning through Reverse Curriculum Reinforcement Learning | Jack Hsieh, Dillon Nguyen, Ethan Zhang | Fengyu Li |
| Custom | IaC-DPO: Improving Infrastructure-as-Code Generation with DPO-Refined Controller Models | Sophia Zhang, Aditya Gupta | Bassem Akoush |
| Custom | Reinforcement Learning for Practical Wildfire Suppression with UAV Agents | Amy Guan, Emily Tianshi, Ryne Reger | Xingze Dai |
| Default | Self-Improvement variants on DPO | Yiyang Hao | Sri Jaladi |
| Custom | BountyBench: The Design of Environments and Rewards for Cybersecurity Agents | Andy Zhang, Riya Dulepet | Joy He-Yueya |
| Custom | Reward Modeling and Policy Optimization for Volleyball Rally Decision-Making | Rishi Alluri, Will Furlow | Haoyi Duan |
| Default | Many Minds, One Goal: Enhancing DPO in a Multi-Agent Framework | Yasmine Mabene | Sergio Charles |
| Custom | Autonomous Drone Navigation for First Response | Victor Greenberg, Carlos Hernandez | Marcel Torne |
| Custom | Learning from Experts: Three Stage Training for Multi-Action Physics-Based Control | Andi Xu, Eric Chen | Sirui Chen |
| Custom | Robust Aerodynamic Shape Optimization on NIGnets using Deep Reinforcement Learning | Atharva Aalok | Sergio Charles |
| Default | Default Project: Supervised Fine Tuning of Qwen2.5-0.5B on the Countdown Dataset | Anushree Aggarwal, Saif Moolji, Bryce Tiglon | Pulkit Goel |
| Custom | Learning CAD Program Generation using Reinforcement Learning | George Nakayama | Xingze Dai |
| Custom | Neural Decoding of Heard Speech Using RL-tuned LLMs | Ricky Rojas, Ramani Tyagi | Shirley Wu |
| Custom | Sequential Reinforcement Learning on Economic Discourse for Real-Time | Sarang Goel, Chirag Maheshwari | Sri Jaladi |
| Custom | Morphology-Aware Imitation Learning for Cross-Robot Generalization | Jermaine Zhao | Bassem Akoush |
| Custom | Exploration in a Reward Uncertain Environment | Denis Liu, Victor Li | Anikait Singh |
| Custom | Bluffing with Precision: LLM-Guided Strategy and Opponent Modeling in Multi-Agent Poker | Sara Kothari, Yanny Gao | Fengyu Li |
| Custom | PPO Reinforcement Learning for Pipetting Laboratory Automation | Gurmenjit Kaur Bahia, Ashley Chen | Zhen Wu |
| Custom | Reward Densification For RL in Multi-hop Calculators | Coco Xu, Jenny Chen, Yolanda Wang | Daniel Shin |
| Custom | Vision-Based Autonomous Landing through Imitation Learning | Mike Timmerman | Pulkit Goel |
| Default | Off-Policy Finetuning for LLM Math Reasoning | Althea Hudson, Narvin Phouksouvath | Ashish Rao |
| Custom | RL4KG-RAG: Reinforcement Learning for Knowledge Graph Optimization in Open-Domain QA | Banda Gayatri Srujana | Haoyi Duan |
| Custom | Real-Time Edge Deployment of Vision and RL Policy on a Quadruped Robot | Frances Raphael, Saron Samuel, Yohannes Aklilu | Jinny Chung |
| Custom | Epistemic Uncertainty Estimation for Human-in-the-Loop Reinforcement Learning | Ngorli Paintsil, Natalie Greenfield | Annie Chen |
| Custom | ACT-RL: Efficient Fine-Tuning through Reinforcement Learning for Robotic Manipulation | Rohan Sikand, Diego Stone, Andre Yeung | Haoyi Duan |
| Default | Adaptive Test-Time Inference Extensions for Fine-Tuned Instruction-Following Models | Diego Valdez Duran | Sri Jaladi |
| Custom | Test-Time Training for Efficient RL Sequence Modeling | Alexander Waitz, Vrushank Gunjur, Kenny Dao | Sri Jaladi |
| Custom | Decision-Focused Offline Deep Reinforcement Learning for Healthcare Policy Optimization | Praneet Bhoj, Ali Eshragh, Yuexing Li | Jubayer Ibn Hamid |
| Default | Exploratory Preference Optimization and Synthetic Dataset Generation for Preference Alignment | Pranshu Chaturvedi, Parth Shroff | Anikait Singh |
| Default | Mastering Mathematical Reasoning in Large Language Models via Self-Play and Reinforcement Learning | Fan Wang | Joy He-Yueya |
| Default | Improving Chain-of-Thought Reasoning in LLMs via Generative Reward Modeling | Akhilesh Balasingam, Axel Wennstrom, Vishal Jain | Jinny Chung |
| Custom | Latent Observation Forecasting for Long-Horizon Imitation Learning | Annmaria Antony, Rebecca Joseph, Mikul Rai | Jubayer Ibn Hamid |
| Custom | Refining and Expanding the Self-Consistency Strategy | Charlotte Nicks, Teddy Ganea, Eli Myers | Annie Chen |
| Custom | ARPO: Adaptive Reward-driven Prompt Optimization | Arnav Singhvi, Shreyas Agarwal | Shirley Wu |
| Custom | Towards a General Social Deduction Reinforcement Learning Model | Isaac Zhao, Wesley Tjangnaka | Fengyu Li |
| Custom | Exploring Video Generation for Robot Learning | Ashna Khetan, Daphne Liu, Poonam Sahoo | Jensen Gao |
| Default | Cooperative Self-Improvement: Hint-Augmented Self-Play for Math Learning | Bar Weiner, Aadi Nashikkar | Sri Jaladi |
| Custom | Comparative Performance and Stability Analysis of PPO and DDPG Agents on Inverse Design of Meta-materials for Demultiplexing | Selin Ertan, Matthew Villescas | Sergio Charles |
| Custom | Swing For the Fences: Deep Reinforcement Learning for Batting Practice | Chase Joyner, Mack Smith | Yash Kankariya |
| Default | TSCS: A Two-Staged, Curriculum-Based Synthetic Data Approach to Improving Countdown Performance | Ayush Alag | Jubayer Ibn Hamid |
| Custom | Smart Strategy: Dynamic Ordering for Multi-Agent RL Updates | Michael Zhang, Karthik Vetrivel, Arjun Jain | Marcel Torne |
| Default | Leveraging RL to Improve LLM Response Generation | Ethan Hellman | Haoyi Duan |
| Default | What’s On My Mind: Modelling Latent Human Thoughts via Reasoning Traces | Agam Bhatia | Xingze Dai |
| Custom | ChokoZero | Eugene Francisco, Bodo Wirth, Naveen Kannan | Annie Chen |
| Default | RL Fine-Tuning of Language Models | June Zheng, Yihan Zhao, Zixin Li | Ashish Rao |
| Custom | MEMBOT: Memory-Based Robot in Intermittent POMDP | Eyan Noronha, Yousef (Youzhi) Liang | Sergio Charles |
| Custom | Verbal Reinforcement Learning for Multi-Agent Systems | Aakriti Lakshmanan, Rohan Davidi, Sathvik Nallamalli | Shirley Wu |
| Default | Fine-Dining with Fine-Tuning: Constraint-Driven Recipe Bots | Aryan Sahai, Marcus Lintott, Regina Sevilla | Jubayer Ibn Hamid |
| Default | Navigating the Pareto Frontier: Multi-Objective Optimization for Fine-Tuning Language Models | Idil Defne Cekin | Sri Jaladi |
| Default | SFT, DPO & RLOO on UltraFeedback & Countdown with Off-Policy Replay | Billy Gao | Haoyi Duan |
| Custom | Sim2Real: Simulation to Reality Transfer for Autonomous Driving World Models | Jack Liu, Ryan Catullo, Mac Ya, Sunny Yu | Haoyi Duan |
| Custom | On-the-Fly Adaptation for Out-of-Distribution Robustness in Reinforcement Learning | Emily Liu, Flora Yuan | Annie Chen |
| Custom | Forest or Field: It Drone Matter | Sarah Barragan, Karan Bhasin, Sukrut Oak | Fengyu Li |
| Custom | Human Chess: A Novel Searchless RL-based Chess Agent Capable of Multi-ELO Human-Like Play | Prerit Choudhary, Rikhil Vagadia, Ankush Dhawan | Pulkit Goel |
| Custom | Step-wise Policy for Rare-tool Knowledge (SPaRK) | Gabriel Bo, Koa Chang, Justin Gu | Pulkit Goel |
| Default | Learning to Think: Integrating Test-Time Compute into RL Fine-Tuning | André Natal, Yacine Dolivet, Thomas Huang | Sirui Chen |
| Custom | Reinforcement Learning for Covered Path Planning on Robot Vacuum | Ngoc Vo, Yiran Fan, Siqi Ma | Zhen Wu |
| Default | Multi-Objective Exploration: A Novel Reinforcement Learning Method to Solve Instruction Following and Mathematical Reasoning Problems | Hugo Nathanael Yuwono | Sergio Charles |
| Default | Train smarter LLMs: balancing Data Quality, Fine-Tuning and Reinforcement Learning | Gabriel Mesquida Masana, François Chesnay | Ashish Rao |
| Default | Improving Reasoning with Multi-Agent Systems | Benjamin Marks | Prateek Varshney |
| Custom | Training Super Smash Bros. Melee Agents | Matthew Lee, William Hu, Samuel Do | Daniel Shin |
| Custom | Reinforcement Tuning Open Source LLMs for Kernel Generation | Aksh Garg, Jeffrey Heo, Megan Mou | Fengyu Li |
| Default | Precise RLAIF: Improving Instruction Following with Sentence-Level AI Feedback | Marcelo Peña | Jinny Chung |
| Default | Improving LLM Mathematical Reasoning Capabilities Using External Tools | Jack Albright, Sheden Andemicael | Anikait Singh |
| Custom | Investigating Reward Hacking When Using Vision-Language Models as Reward Models (VLM-RMs) | Kai Fronsdal, Emma Sun, Zoe Quake | Marcel Torne |
| Custom | Granularity in Reward Shaping for Automated Theorem Proving | Marcelo Sena | Annie Chen |
| Custom | Beyond Single Rewards: Multi-Objective Scalarization in Few-Shot Preference Learning | Gabriel SantaCruz | Fengyu Li |
| Custom | Graph-based Stock Market RL Agent | Nevin Aresh | Prateek Varshney |
| Custom | Fine-Tuning Vision-Language Model with Reinforcement Learning for Visual Question Answering | Dayoung Kim | Marcel Torne |
| Default | RL Fine-Tuning of Large Language Models | Gerardus de Bruijn, Nareauphol Liu, Dev Jayram | Ashish Rao |
| Default | Synthetic Data Augmentation for LLM Training | Ben Gonzalez-Maldonado, Diego Padilla, Emily Park | Jinny Chung |
| Custom | A Truly Hollow Knight: Resource Constrained Visually Dense Boss Fights | Anthony Maltsev | Pulkit Goel |
| Default | Fine-Tuning of Large Language Models using Retrospective Exploration with Critic-Augmented Progress | Caroline Van, Natalie Wang | Sirui Chen |
| Default | Debate for Multiagent Large Language Model Tuning | Andrew Wang | Pulkit Goel |
| Custom | RL for Autonomous Drone Navigation | Guilherme Bonfim, Matteo Tucci, Sumedha Kethini | Marcel Torne |
| Custom | Towards Size-Invariant Policy Learning in Grid Environments via Curriculum-Guided Transfer | Yiling Huang, Wei Liu | Haoyi Duan |
| Custom | Enhancing LLM Reasoning on External Knowledge | Claire Tang | Sergio Charles |