| Custom |
A Semi-Decentralized Approach to Scalable Multiagent Control (Outstanding Project) |
Avi Singh, Mahdi Al-Husseini |
Rahul Chand |
| Custom |
EXPO-FT: Sample-Efficient Reinforcement Learning Finetuning for Vision-Language-Action Models (Outstanding Project) |
Kuo-Han Hung |
Perry Dong |
| Custom |
Hybrid Reinforcement Learning for Chip Macro Placement (Outstanding Project) |
Moritz Schreyoegg, Jack Herrmann, Daniel Hagenlocker |
Anushree Aggarwal |
| Default |
SFT Augmentation and Replay-Based RL for Countdown Reasoning (Outstanding Project) |
Aizada Nurdinova, Ellie Sampson, Adhi Daiv |
Anikait Singh |
| Custom |
Beyond Test Scores: Reward Design for Offline RL in Personal Finance Tutoring (Honorable Mention) |
Stella Wu, Daniel Mark Argento |
Alex Nam |
| Custom |
CARVE: Concept Avoidance via Reward-shaped Visual Erasure (Honorable Mention) |
Febie Jane Lin, Fabio Ibanez, Chris Stanulet |
Rahul Chand |
| Custom |
π-Drive: Reinforcement Post-Training Turns a Manipulation VLA into a Real-Time Driving Policy (Honorable Mention) |
Felipe Laufer Barbosa, Mark Music, Alex Jihun Kim |
Tian Gao |
| Default |
Frontier Curriculum and Adaptive Test-Time Compute for Efficient RLOO (Honorable Mention) |
Andy Sung Jae Kim, Marco Antonio Vizcarra Tovar |
Anikait Singh |
| Custom |
MARC: Multi-Agent Role Coordination (Honorable Mention) |
Olufeolu Oluwapelumi Kolawole, Karn Kaura, Nihar Mudigonda |
Marcel Torne |
| Custom |
Point and Pick: Bounding-Box Conditioned Diffusion Policies and Offline RL for Target-Specific Robot Manipulation (Honorable Mention) |
Raul Garreta Tompson, Joshua Alexander Bowden, Swaroop Pal |
Marcel Torne |
| Default |
Reading vs. Writing a Near-Oracle Internal Verifier: How RL Design Determines Whether a Correctness Probe Is Safe (Honorable Mention) |
Abraham Yeung, Anagha Ramaswamy |
Anubha Mahajan |
| Custom |
REFINE: Reinforcement-based EHR Feature Induction and Editing (Honorable Mention) |
Ayeeshi Lakshmi Poosarla, Ryan Arya Nayebi |
Anushree Aggarwal |
| Custom |
SciencePRM: Process-Reward RL (GRPO) for the Scientific Validity of Intermediate Reasoning Steps (Honorable Mention) |
Zijian (Carl) Ma |
Joy He-Yueya |
| Custom |
A Contextual Bandit for Cheap-vs-Expensive Code Generation |
Mauricio Berlanga Carrillo |
Rahul Chand |
| Default |
A Self-Play Algorithm for Countdown |
Nikhil Raman, Nadim Isaac |
Max Du |
| Custom |
Action-Token Pruning for Sample-Efficient RL Fine-Tuning of Robot Policies |
Elvin Fu, Cole Donald Van Hersett |
Tyler Lum |
| Default |
Active RLOO: Online Filtering with Adaptive K |
Zhengmao Liu |
Max Du |
| Default |
Adaptive Competence-Boundary Curricula for Countdown with RLOO |
Daniel Steven Schreck, Sheng-Yong Niu, Karishma Aggarwal |
Max Du |
| Default |
Adaptive Curriculum Learning for Reinforcement-Trained Reasoning |
Agam Iheanyi-Igwe, Olatayo Anthony Sobomehin |
Riya Karumanchi |
| Default |
Adaptive Curriculum Learning for RL-Based Arithmetic Reasoning |
Yu Chi Hsu, Ryan He |
Shengqu Cai |
| Default |
Adaptive Curriculum Learning for RLOO on Countdown |
Jerry Jiayu Li |
Yash Kankariya |
| Default |
Adaptive Curriculum Learning with Difficulty-Conditioned Entropy Regularization for RL Fine-Tuning of Large Language Models |
Mahmoud Elgenedy |
Shengqu Cai |
| Default |
Adaptive Curriculum RL via LLM-Guided Complexity Scoring |
Louis Weisdorf, Laszlo Bollyky |
Riya Karumanchi |
| Default |
Adaptive Curriculum RLOO for Verifiable Arithmetic Reasoning in Language Models |
Kelechi Onuoha |
Abhijnya Bhat |
| Default |
Adaptive Difficulty Scheduling: A Competency-Gated Curriculum for Small-LLM Reasoning |
Donna Choi |
Ke Wang |
| Default |
Adaptive Difficulty-Aware Curriculum Learning for RLOO |
Darren Chan, Jayna Grace Huang, Sophie Zhang |
Anikait Singh |
| Custom |
Adaptive Multi-Turn Red-Teaming for Mental Health Adjacent Language Model Safety |
Anya Han Zhang |
Anushree Aggarwal |
| Default |
Adaptive Test-Time Sampling for RL-Fine-Tuned Reasoning Policies |
Andrea Ji Woo Nam Song |
Anikait Singh |
| Default |
Advantage-Driven Synthetic Curriculum for Reinforcement Learning based Fine-Tuning of Large Language Models |
Nate Demchak, Pravin Ravishanker, Haosheng Li |
Anikait Singh |
| Default |
Adversarial Co-evolution of a Difficulty Judge and an LLM Generator with an RLOO Policy for Cold-Start Countdown Reasoning |
Viveak Ravichandiran |
Riya Karumanchi |
| Custom |
Algorithm Sequencing and Curriculum Learning in Deep Reinforcement Learning |
Ryan D'Cunha, Ethan Hersch, Abhinav Chinta |
Ifdita Hasan Orney |
| Custom |
Alpha-Informed Optimal Trading Execution: Reinforcement Learning with Domain Informed Priors |
Ryan Nathan Padnis |
Ethan Hsu |
| Custom |
An Actor-Critic Neural Reachability Solver for High-Dimensional Zero-Sum Games |
Zeyuan Feng |
Rahul Chand |
| Custom |
An RL Framework for Persistent Memory Attacks on LLM Agents |
Mihir S Menon, Zihan Wang, Aarav Arora |
Joonwon Kang |
| Custom |
Architecture-Dependent Effects of Data Curation in Robot Manipulation Imitation Learning |
Kailana Baker-Matsuoka |
Jonathan Yang |
| Custom |
Augmenting Cybersecurity Capabilities with Adversarial RL and Autoregressive Action Parameterization in Small LLMs |
Anna Wu, Ethan Jun-shen Ho |
Ethan Hsu |
| Custom |
Automated Red Teaming using Reinforcement Learning |
Nubia Elena Correa, Abi Lopez, Annie Gisselle Villalta |
Joonwon Kang |
| Custom |
Bandits for Cold-Start: Exploration vs. Exploitation in Recommendation |
Ela Naz Sigin, Anastasiya Masalava, Eva Casto |
Anushree Aggarwal |
| Custom |
Begging for DEI: Increasing LLM Math Performance by Increasing Diversity |
Tyler Kinh Ho |
Joy He-Yueya |
| Custom |
BLIND: Bipedal Locomotion with Intermittent Navigation Data for Environmental Hazards |
Iris Zixiao Xu, Jamin Jia-Ming Xie, Eric Liang |
Tian Gao |
| Custom |
Body Schema Pretraining for Sample-Efficient Reinforcement Learning in Robotic Control |
Rishabh Malviya |
Tian Gao |
| Default |
Bounded Proximity Rewards: Accelerating Post-Training Reinforcement Learning for Arithmetic Reasoning without Policy Divergence |
Ian Yue-Ran Chen |
Yash Kankariya |
| Default |
Bridging the Gap: Optimal Transport-Guided Curriculum Learning |
Cole A Citrenbaum, Noah Cowan, Aymen Echarghaoui |
Yash Kankariya |
| Default |
Budget-Conditioned Tool Use for Countdown Reasoning |
Luca De Donno |
Yash Kankariya |
| Custom |
Calibrated Reinforcement Learning for LLM-Guided Perturbation Screen Design |
Anton Thieme |
Megha Srivastava |
| Default |
Can Teacher-Generated Curriculum Strategies Improve LLM Reasoning? |
Amrita Malhotra, Giselle Rivera, Zofia Dudek |
Anikait Singh |
| Custom |
City Builder: Does Induced Demand Make Transit Network Design a Reinforcement Learning Problem? |
Daniel Marcelo Mottesi |
Joy He-Yueya |
| Custom |
CKR: A Novel Baselining Improvement for T2I-Model Reinforcement Learning Using Noisy Rewards |
Ethan Zhang, Tatiana Zhang, Jenny Wei |
Tyler Lum |
| Custom |
Collapse-Resistant Constitutional AI for Small Language Models through Synthetic Revision Filtering |
Xue Zhang |
Alex Nam |
| Custom |
Compressed Memory via LoRA Fine-Tuning for Vision-Language-Action Models |
Nathaniel D Laurent, Matthew Michael Musson |
Tian Gao |
| Custom |
Controlling a Fusion Reactor with Reinforcement Learning |
Deniz Zeren Yilmaz, Sameer Agrawal, Siddharth M. Bhatia |
Marcel Torne |
| Custom |
Cooperative Fine-Tuning of Pretrained Vision-Language-Action Policies: Centralization, Communication, and Inference Recipes on TwoArmTransport |
Kyler Shu, Purushotham Mani, John Tucker |
Joonwon Kang |
| Default |
Cost-Aware Tool-Integrated Reasoning for Countdown Arithmetic |
Patrick Wang |
Abhijnya Bhat |
| Default |
Countdown as an Agentic Optimization Problem |
Tao Sun |
Yash Kankariya |
| Custom |
Critic-Targeted Exploration: Learned Per-Rollout Targeting of Entropy Bonuses in GRPO |
Hlumelo Notshe |
Joy He-Yueya |
| Custom |
Curiosity-Driven Memory-Augmented Reinforcement Learning for Adaptive Robot Tasks |
Jevon Mao, Yifan Geng, Alexia Huang |
Marcel Torne |
| Default |
Curriculum Learning for Countdown Reasoning in RL Fine-Tuning: Static Schedules Help, Adaptive Frontiers Forget |
Fengzhou Li |
Yash Kankariya |
| Default |
Curriculum Learning with Success-Rate-Driven Adaptive Gaussian Scheduler |
Andrew Tin-Lok Lee |
Anubha Mahajan |
| Default |
Curriculum Sampling for RLOO Fine-Tuning on Countdown |
Catherine M Zhang, Nora Menon |
Anubha Mahajan |
| Custom |
Curriculum Strategies for Bimanual Dexterous Piano Playing in RoboPianist |
Gabrielle Marie Walrath, Irene Ai Lin, Ethan Tamer Farah |
Megha Srivastava |
| Custom |
Decentralized Q-Learning of the Social Optimum in Strategic Experimentation |
Farzad Pourbabaee |
Joy He-Yueya |
| Custom |
Deep Reinforcement Learning for Campus-Scale Vehicle-to-Grid Fleet Scheduling |
Lisa Li, Yaqi Fan |
Joy He-Yueya |
| Custom |
Deep Reinforcement Learning for User Welfare Optimization on Recommendation Systems with Competing Content Creators |
Julia T Isaac, Tanush Talati |
Alex Nam |
| Custom |
Dense Language Rewards For Reasoning |
Diya Janvi Kejriwal, Andrew Su, Gurmeher Kaur |
Marcel Torne |
| Default |
Dense Rewards for RLVR on Countdown via ASTs |
Ahmed Sherif Ahmed Elbakry Mohamed |
Ke Wang |
| Custom |
Dense Step-Level Rewards via Process Reward Models for Mathematical Reasoning |
Dhruv Arcot |
Joy He-Yueya |
| Custom |
Deployment-Time Risk Scoring for a Safe RL Agent under Distribution Shift |
Luis Marc Botin-Sanz de Sautuola |
Anushree Aggarwal |
| Custom |
DiSCo: Distilled Steering via Consolidation for Robot Diffusion Policies |
William Z Liu, Sakthivel Sivaraman, Jerry Gu |
Max Du |
| Default |
Diversity-Aware RLOO for Pass@k Reasoning in Countdown |
Jane Yang, Will Nathaniel Hansen |
Anubha Mahajan |
| Default |
Diversity-Preserving On-Policy Distillation and Test-Time Verification for Countdown Reasoning |
Juntao Cheng, Qi Wu, Zhuoang Tao |
Riya Karumanchi |
| Custom |
Do generated futures help robot policies through representation alignment or explicit conditioning? |
Minyeong Kim |
Jonathan Yang |
| Custom |
Do Persona Vectors Track Sycophancy Under RL Fine-Tuning? |
Artur Barbosa Carneiro |
Abhijnya Bhat |
| Custom |
Does LoRA Rank Requirement Scale with Reward Density? An Empirical Study of Policy-Gradient Post-Training |
Mark William Gernitis |
Tyler Lum |
| Custom |
Effect of Stigmergic Potential Fields on Drone Swarm Coordination |
Kevin Michael Porter |
Tyler Lum |
| Custom |
Emergent Cooperation in Multi-Agent Pandemic Resource Allocation |
Riya Karthik Narain, Brooke Hunter Ballhaus |
Alex Nam |
| Custom |
Encouraging Taxonomy-Based Diversity within RL Automated Multi-Turn Red-Teaming |
Melvin Liam Poon Keat, Kenna Zeng |
Alex Nam |
| Default |
Enhancing RLOO with Dense Symbolic Rewards and Bandit-Driven Curricula |
Irawadee Thawornbut, Rinnara Sangpisit |
Abhijnya Bhat |
| Custom |
Enhancing SINGER: Onboard Visual Drone Navigation through Iterative Imitation Learning and Reinforcement Fine-Tuning |
Erwin Giovanbattista Marcel R. Poussi |
Rahul Chand |
| Default |
Evaluating RL efficiency improvement methods including Synthetic Data Augmentation and Any-Generation Reward Optimization for Mathematical Reasoning on Countdown Tasks |
Lin Yuan, Chloe Yuri Jeon, Kelvin Kahiu Waititu |
Anubha Mahajan |
| Custom |
Exploratory Coding with Poly-EPO |
Eli Thomas Wandless |
Ifdita Hasan Orney |
| Custom |
Failing to Grasp the Point: Hierarchical Reinforcement Learning for Grasping Tasks |
Ashwin Mahendran, Anjali Sreenivas, Arianna Liang Cao |
Tian Gao |
| Custom |
Fault-Adaptive Locomotion via Implicit Damage Detection in MuJoCo Ant |
Sidhanth Mishra |
Tian Gao |
| Default |
Feature-Space Curriculum Learning for Countdown Reasoning |
Vishaal Samir Saraiya |
Yash Kankariya |
| Custom |
FlowMPC: Improving Flow Matching policies with World Models |
Chandon Robert Hamel |
Joonwon Kang |
| Custom |
Forced Grounding: Diagnosing and Repairing Language Neglect in Imitation-Learned Robot Policies via RL |
Elana N Chen, Hayden Kwan, William Charles Rose |
Tyler Lum |
| Custom |
Foveated Vision via Prediction Error–Augmented Reinforcement Learning |
Brion Qi Ye |
Joonwon Kang |
| Custom |
Fragment Assembly as Goal-Reaching: An RL Approach to Targeted Molecular Design |
Asmani Yamin, Megan Santhumayor, Katie Liu |
Perry Dong |
| Custom |
Frequency Reparameterization and Anchored Residuals for Flow Matching Robot Policies |
Alan Zhao, Juhyun Jung |
Joonwon Kang |
| Custom |
From Contact to Return: Curriculum and Predictive Shaping for Humanoid Table Tennis |
Kyle Ian Schmoyer, Hannah Gabriella Clay, Shane Robinson Mion |
Jonathan Yang |
| Default |
From Structured Backtracking to Targeted Failure Correction for Robust Mathematical Reasoning |
Christine Li, Eric Xia |
Riya Karumanchi |
| Custom |
From Text to Torque: Improving RL Tracking of Text-Generated Humanoid Motions |
Kuzey Kantarcioglu, Benji Warburton |
Tyler Lum |
| Default |
Frontier-Weighted SEC: A Case Study in Curriculum Learning for RL Fine-Tuning of Language Models |
Garrett Alarcon |
Riya Karumanchi |
| Custom |
Generative and Hierarchical Imitation Learning for Marine Trajectory Control in Stochastic Ocean Currents |
Omar Eduardo Jimenez Lopez |
Jonathan Yang |
| Custom |
Goal Conditioned Behavior Cloning for Robot Social Navigation |
Mete Gumusayak |
Tyler Lum |
| Custom |
GoodLiars: A Multi-Turn Extension of Reinforcement Learning-Based Belief Disruption |
Emma Marie Beharry, Abel Philip John, Elizabeth Michelle Gallagher |
Alex Nam |
| Default |
Granularity-Aware Off-Policy Fine-Tuning of LLMs via Expert Demonstrations |
Kevin Song, Sukeerth Ramkumar, Yina Jian |
Shengqu Cai |
| Custom |
Graphs and Meta Reinforcement Learning for Portfolio Management |
Dhruv Manani, Churan He |
Joy He-Yueya |
| Custom |
GRPO Didn’t Pass the Bar (But the Harness Did): Harness-Guided Post-Training for Legal Agents |
Leon Reilly, Duy Nguyen |
Joy He-Yueya |
| Custom |
Hierarchical RL for Cost Aware Protein Engineering Campaigns in Cloud Labs |
Emi Maria Mathew |
Rahul Chand |
| Custom |
How Does Scaffolding Affect Cross-Environment Generalization in LLM RL Fine-Tuning? |
Alfred Sven Bertil Sjoeqivst, Hana Mengyao Liu |
Alex Nam |
| Default |
How Much Stale Rollout Reuse Can Verifier-Based RLOO Tolerate? A Semi-Off-Policy Replay Study on Countdown Reasoning |
Amulya Parthasarathy |
Max Du |
| Custom |
Hybrid Advantage Shaping with Goal-Aware Attention for Per-Turn Credit Assignment in LLM-Agent Reinforcement Learning |
Max Luis Rodriguez, Samantha Malowane Leventis, Joseph Li |
Perry Dong |
| Custom |
Improving Fine-Grained Manipulation by World Action Models via Online World Model-Based Feedback Planning |
Albert Kui Lin |
Tyler Lum |
| Custom |
Improving Lean Premise Retrieval via RL and Distillation |
Alex Lopez, Kevin Rizk, Fred Rajasekaran |
Tyler Lum |
| Default |
Improving Mathematical Reasoning in Small Language Models via Curriculum Learning and Iterative Execution Feedback |
Xinyi Ai, Jiayu Sui |
Yash Kankariya |
| Default |
Improving Tool Internalization for Small Models: Annealed Tool Access for RLOO on Countdown |
Maty Bohacek, Jason Boxi Zhang |
Ke Wang |
| Custom |
Intelligence per Joule as a First-Class Post-Training Objective |
Cynthia Wang, Ravenor Carroll Davion |
Rahul Chand |
| Custom |
Interoception: Teaching LLMs to Reason on a Wallclock Budget |
Harshvardhan Singh |
Joy He-Yueya |
| Default |
IPO + RLOO Alignment Report |
Donnie Brooks Raymond, Matthew Darshan Torre |
Anubha Mahajan |
| Custom |
Is Exploration Helpful? Evaluating Transfer of Open-Ended Traces to Assignment-Based Code Edits |
Karthik Vinay Seetharaman, Tushar Dalmia, Aaryan Shah |
Megha Srivastava |
| Default |
Joint Optimization of Task Difficulty and Diversity for Fine-Tuning LLMs under Sparse Rewards |
Jiaming Shen, Jiaxin Fang, Jenny Jin |
Abhijnya Bhat |
| Default |
Learnable Curricula via Self-Play for Verifiable Reasoning Tasks |
Kai Wen |
Abhijnya Bhat |
| Custom |
Learned Forgetting: Task-Conditioned Visual Memory Selection via Reinforcement Learning |
Han Shaun Lee |
Marcel Torne |
| Custom |
Learning Adaptive Tutor Policies for Conversational Language Learning via Offline Reinforcement Learning |
Aditya Bora |
Anubha Mahajan |
| Custom |
Learning from Failure: Natural Language Feedback for Reusing Failed GRPO Trajectories |
Chenyue Li |
Ethan Hsu |
| Custom |
Learning from Heterogeneous Data |
Sydney Yan, Tracy Y Wei, Evy Zhu Shen |
Tyler Lum |
| Custom |
Learning Interpretable Code Explanations of LLM Behavior |
Joseph Tey, Nick Jiang |
Abhijnya Bhat |
| Custom |
Learning Latent Action World Models for Robot Control from Unlabeled Video |
Seraph Kai Yang |
Anubha Mahajan |
| Custom |
Learning Priority Functions for Graph-Based Exploration in ARC-AGI-3 |
Kyle Avery Feinstein, Steve Roy Mendeleev, Ryan Bookman |
Ethan Hsu |
| Custom |
Learning Structured Trust Policies from Uncertainty, Advisor Signals, and Agreement |
Jaden Chen, Gia Grace Ancone |
Ifdita Hasan Orney |
| Custom |
Learning to Explore Through Information-Directed Bayesian Optimal Experimental Design |
Lucia Zheng |
Ethan Hsu |
| Default |
Learning to Teach for Test-Time Reasoning |
Selim Emir Can, Mete Erdogan |
Ke Wang |
| Default |
Learning to Teach, Teaching to Learn |
Isaac Wooman Park, Sam Lustgarten |
Anikait Singh |
| Default |
Learning to Use Tools: Reinforcement Learning for Tool-Integrated Mathematical Reasoning |
Zi Wang, Minghui Xu |
Ke Wang |
| Custom |
Learning Transfer in Multitask Agents |
Malti Mohan John |
Joonwon Kang |
| Custom |
Learning When to Use Tools: Cost-Aware RL for Agentic Reasoning |
Vikram Garrett Srinivasan |
Anushree Aggarwal |
| Default |
Learning with a Curriculum: Enhancing LLM Math Reasoning via Hint-Based RL Fine-Tuning |
Luan Lam |
Anubha Mahajan |
| Custom |
Librarian Models: Anticipatory Filesystem Construction via Reinforcement Learning |
Teresa Zhang |
Ifdita Hasan Orney |
| Default |
Making RLOO Learn From Better Signals: Curriculum Scheduling and Verification Commit Contrast for Countdown Reasoning |
Zayn Malhotra, Ziyi Ding |
Anikait Singh |
| Default |
MaxRL with Imperfect Reward Signals on Countdown |
Petru Cristian Budianu, Nicolas Bejar Arambula |
Abhijnya Bhat |
| Custom |
Meaningless Trivia, Meaningful Compression: Near-Free Token Efficiency in RL for Reasoning |
Chung Fat Wong, Anna Grebenchtchikova |
Joonwon Kang |
| Default |
Memory as an Action Space: Adaptive Retrieval in Small Language Models for Medical Reasoning |
Renn Su, Summer Olivia Royal |
Riya Karumanchi |
| Custom |
Memory Pretraining for Vision-Language-Action Model |
Pengyu Mo, Haowen Wang, Zhen Jia |
Marcel Torne |
| Custom |
Memory-Augmented VLM Planners for Long-Horizon VLA Control via RL |
Krish Sharma, Lucas Burgett |
Marcel Torne |
| Custom |
MiniHedgemony: Asymmetric Reward Structures in Self-Play Wargames |
Alex Lin Wang, Dario Gaitzi Soatto |
Tian Gao |
| Custom |
MIRAGE: Model Imagined Reachability for Augmented Graph Expansion |
Abhinav Sattiraju, Samrat Sahoo |
Ifdita Hasan Orney |
| Custom |
Model-Based Reinforcement Learning for Particle Accelerator Control |
Ryan Wu |
Anushree Aggarwal |
| Custom |
Multi-Move Refinement Reinforcement Learning with D4 Spatial Equivariance for 3D Chip Placement |
Yize Liu |
Perry Dong |
| Default |
Multi-Stage Curriculum GRPO for Countdown |
Timothy Yu, Yash Ranjith |
Ke Wang |
| Custom |
Multi-Timescale Language Memory for a Frozen VLA Controller |
Po-Yun Cheng, Wayne Chu |
Marcel Torne |
| Default |
Multimodal LLM Self-Play |
Deepika Dandeboyina |
Shengqu Cai |
| Custom |
Multimodal RLPD for Industrial Robotic Cable Insertion |
Jehan Shah |
Rahul Chand |
| Custom |
Off Policy or On Policy? Multi-Agent Reinforcement Learning for Drone Swarm Coordination |
Jett Crist Carruth, Alex Tadken Shaffer |
Anushree Aggarwal |
| Default |
Off-Policy Sampling for RLOO: When Does Reusing Rollouts Help? |
Kennaissa Kebeto Nabi, Henok Mikael Tewolde |
Ke Wang |
| Custom |
Offline Model-Based Reinforcement Learning for Energy-Efficient GPU Data-Center Cooling |
Naomie Sandra Chien |
Perry Dong |
| Custom |
Offline RL for Adaptive Vocabulary Selection in Conversational Language Tutoring |
Onyinyechi Nichole Okoye |
Ifdita Hasan Orney |
| Custom |
Parallel Deep-RL Agents for Roblox Obstacle-Course Navigation: From Single-Course Memorization to Generalizing Across Procedurally-Composed Courses |
Alex Li, Cheney Sang, Aidan Whitedeer |
Joonwon Kang |
| Custom |
Personalizing Slide Layouts: A Case Study in RL Reward and Context Bottlenecks |
Elijah Song, Ryan Minh-Tri Le |
Anushree Aggarwal |
| Custom |
PFP: A Perception-Factored Policy for Robust and Efficient SO-101 Manipulation |
Shobhit Agarwal, Amirreza Zeinali |
Rahul Chand |
| Custom |
Physics-Blind Reward Hacking: Exposing and Mitigating Safety Failures in LLM-Generated Reward Functions for Robotic Manipulation |
Zichen Yuan, Sophia Huang |
Tian Gao |
| Custom |
PipelineRL: Limits of Asynchronous Reinforcement Learning for Long-Horizon Trajectories |
Henry Bosch, Shurui Liu |
Jonathan Yang |
| Custom |
PlayGrader: Coaching the Coaches with Deep RL |
JP Paul McAnally |
Jonathan Yang |
| Custom |
Pluralistic Alignment via Self-Distillation from Synthetic User Feedback |
Minsik Oh |
Rahul Chand |
| Custom |
Pose Under Pressure: Robustness of Pose-Derived Dense Rewards in Demonstration-Guided Reinforcement Learning |
Joseph Dehoney |
Ethan Hsu |
| Custom |
Preclinical HIV Drug Candidate Discovery with Reinforcement Learning |
Arda Dastan, Kevin Chen, Elijah Alexander Schacter |
Jonathan Yang |
| Default |
Preference Optimization and Curriculum RLOO for Countdown Reasoning |
Akshar Sarvesh |
Yash Kankariya |
| Custom |
Prisoner’s Lemma: Exploitability-Aware Reinforcement Learning for Online Strategic Adaptation |
Maanit Goel |
Marcel Torne |
| Default |
Process Rewards and Tool Use: Two Extensions to RLOO Fine-Tuning for Math Reasoning |
Michael Yang, Du Li |
Max Du |
| Default |
Process-Level Alignment and Value-Guided Stepwise Planning in Countdown Math Reasoning |
Dongyu Jia |
Ke Wang |
| Default |
Programmatic Reasoning for Countdown: Learning to Generate Executable Python-Style Verifications |
Henry Jingsong Zhou, Oleh Ivankiv |
Anubha Mahajan |
| Default |
Progress-Aware Prompt Sampling for Verifier-Based RL Fine-Tuning |
Jason Yan |
Yash Kankariya |
| Default |
Progressive Rationality: Enhancing LLM Mathematical Reasoning via Numerical-Target-Curriculum-SFT in the Countdown Task |
Meng-Chin Wang |
Anubha Mahajan |
| Default |
Prompt Distribution Design for RLOO Countdown Reasoning |
Yikai Cao, Zhibo Dai |
Anubha Mahajan |
| Custom |
RadOncReason: Reinforcement Learning with Verifiable Guideline Rewards for Clinical Reasoning in Radiation Oncology |
Hailemariam Teshome |
Jonathan Yang |
| Custom |
RECAP-Ψ: Advantage-Conditioned Fine-Tuning for Open-Source Humanoid VLAs |
Aaditya Shah, Karthik Pythireddi, Jonathan Manfu Lu |
Joonwon Kang |
| Custom |
Reducing Citation Hallucinations in Large Language Models |
Rushank Goyal |
Ifdita Hasan Orney |
| Custom |
Reducing Scalar Rewards to Binary Success: General Off-Policy Learning with Success Functions |
Armaan Alan Abraham |
Jonathan Yang |
| Custom |
Refine and Compose: Mahalanobis Action Barriers over Demo-free Contrastive RL Primitives |
Dylan Zhou, Anuva Banwasi, Jiaye Zou |
Tyler Lum |
| Custom |
Reinforcement Learning for Clinical Site-of-Care Triage in a Sepsis Simulator |
Yun Dong, Saimai Lau, Liane Ozoemelam |
Anushree Aggarwal |
| Custom |
Reinforcement Learning for Compiled-CNOT-Efficient VQE Circuits |
John William Carlson, Josh Joseph |
Joonwon Kang |
| Custom |
Reinforcement Learning for Dynamic Beam Steering in Plasma Metamaterials |
Susan Zhang |
Megha Srivastava |
| Custom |
Reinforcement Learning for Figgie: Learning Negotiation as a First-Class Skill |
Daniel Li Yang |
Ethan Hsu |
| Custom |
Reinforcement Learning for Fog of War Chess with Action Space Pruning |
Sandeep Sethuraman, Kuba Hashemian, Leon Junliang Liu |
Anushree Aggarwal |
| Custom |
Reinforcement Learning for Geothermal Drilling Optimization |
Devan Shaan Agrawal |
Perry Dong |
| Custom |
Reinforcement Learning for Mental Health Interventions Using Unlabeled Smartphone Data |
Elisabeth A Holm, Juan Pablo Gonzalez Pacheco, Alfred Yu |
Alex Nam |
| Custom |
Reinforcement Learning for Noise-Aware Quantum Circuit Compilation |
Vinav Shah, Abhishek A Shah |
Rahul Chand |
| Default |
Reinforcement Learning for Self-Guided Context Compression in Mathematical Reasoning |
Jerry Wang, Ryan Wang |
Ke Wang |
| Custom |
Reinforcement Learning for Terminal-Area Air Traffic Control |
Jerry Yin |
Anushree Aggarwal |
| Default |
Replay-Augmented RLOO: Restoring Within-Group Reward Variance in Sparse-Reward Policy Gradients |
Hao Xu |
Shengqu Cai |
| Custom |
Replay-Aware Curriculum Learning for RoboPianist |
Shekhar Sharma |
Megha Srivastava |
| Custom |
Reproducible Top-K PPO for S&P 500 Portfolio Selection: Risk-Adjusted Gains, Seed Variance, and the Limits of Return-Maximizing RL |
Andres Felipe Restrepo |
Rahul Chand |
| Custom |
Residual Reinforcement Learning for Robotic Manipulation of Wire-like Objects |
Bautista Guerra, Andrew Yuxuan Liang, Alexander Tarvo |
Joonwon Kang |
| Custom |
Reversing Emergent Misalignment Using Simple Self-Distillation |
Justin Yizhou Huang, Adam Joseph Banks |
Rahul Chand |
| Custom |
Reward Density and Process-Reward Hacking in a Code-Repair MDP: A Controlled Study of PPO Meta-Control over a Frozen Code LLM |
Jack Frederick Lofwall, Luca Thomas Wheeler |
Yash Kankariya |
| Custom |
Reward Design for Reinforcement-Learning Fine-Tuning of Navigation Policies |
Yunshan Wang |
Ethan Hsu |
| Custom |
Reward-Model-Calibrated Reinforcement Learning from Verifiable Rewards for Machine Learning Engineering |
Siddharth Sachdeva |
Joy He-Yueya |
| Custom |
RL & Domain Randomization for Volt-VAR Control in Electricity Distribution Grids |
Anish Chaudhuri, Aniket Mahajan |
Alex Nam |
| Default |
RL Fine-Tuning for Countdown Reasoning with Test-Time Verification and Curriculum Learning |
Howard Xiao, Weiwei Wu |
Abhijnya Bhat |
| Custom |
RL for Adaptive Tutoring: When Should a Tutor Intervene? |
Hoang D Nguyen, Peter Martin Alisky, Zhenghui Chen |
Anushree Aggarwal |
| Custom |
RL-Powered Hint Generation for Adaptive Math Tutoring: A Simulated Student Evaluation of RLOO and DPO Policies |
Prabu Ganesh Ravindren |
Rahul Chand |
| Default |
Sample Wide, Pick Smart: For a Fixed 0.5B Countdown Reasoner, Test-Time Selection Beats More Training |
Manat Kaur, Felipe Leite Teixeira |
Riya Karumanchi |
| Custom |
Sample-Efficient Atari RL with Self-Supervised Pretrained Visual Encoders |
Mark Mutugi Athiri |
Joonwon Kang |
| Default |
Sample-Efficient RLOO via Off-Policy Rollout Reuse |
Zhaoyang Li |
Abhijnya Bhat |
| Custom |
Scale-Based Curriculum Pretraining for Robotic Piano Performance |
Anna Luna Fisher Lopez, Justin Choo, Eric Martz |
Megha Srivastava |
| Default |
Scaling Test-Time Computation for Countdown Reasoning through Verifier-Guided Resampling |
Jiaxuan Sun, Angel Zhang |
Ke Wang |
| Default |
Scaling Test-Time Compute via Generative Verification in Constrained Parameter Regimes |
Mohammad Rehan Ghori |
Anubha Mahajan |
| Custom |
SEACTS: Sequential Evidence Acquisition for Cancer Target Selection |
Nathan Zhou, Ria Garg |
Megha Srivastava |
| Custom |
Seeking Disagreement: Online Credit Assignment with Delayed and Pseudo-Aggregated Rewards |
Haozhan Gao |
Perry Dong |
| Default |
Selective Entropy Shaping For RLOO: When Importance Weight Gating Hurts Exploration |
Syed Ashal Ali |
Shengqu Cai |
| Custom |
Self-improving Vision-Language Models: Reinforcement Learning over Visual Abstractions |
Khai Loong Aw, Zhang Bai-han |
Tyler Lum |
| Custom |
Self-Supervised Data Quality Scoring for Offline RL in Driving |
Roman Gasiorowski |
Ke Wang |
| Custom |
Sequential Injection Control for Optimal Stimulated Geologic Hydrogen Production through Deep Reinforcement Learning |
Spencer Zhang |
Riya Karumanchi |
| Custom |
Sequential Outfit Curation with Multi-Dimensional Aesthetic Rewards |
Nicole Cortes, Esidore Fajardo Eneinyang, Chloe Di Murdoch |
Anushree Aggarwal |
| Default |
SFT-Estimated Curriculum Learning for Rule-Based RLOO Fine-Tuning |
Vanessa Felix |
Anikait Singh |
| Custom |
SHIELD: Failure-Aware Policy Shielding for Frozen Vision-Language-Action Policies |
Tianhui Huang, Jacob Lokheen Lee, Daniel Contreras-Esquivel |
Ke Wang |
| Custom |
SimToolReal-RGB: Visuomotor Diffusion Policies for Dexterous Manipulation |
Cayden Gu, Karen Vo, Christine Zhang |
Tyler Lum |
| Custom |
SketchRL: Finetuning Generative Sketch Models with Visual Rewards |
Mallika Parulekar, Tia S Geri, Hannah Rachel Levin |
Ifdita Hasan Orney |
| Default |
Small Generative Verifiers for Inference-Time Scaling Across RL Training Frameworks |
Joshua Delgadillo, Jui Khankari |
Abhijnya Bhat |
| Custom |
SODA: Supervised Option Discovery for Dynamic Action Chunking |
Umar Padela, Neetish Sharma |
Tyler Lum |
| Custom |
SpecGen: RL-Driven Compiler Verification |
Monami Dutta Gupta |
Jonathan Yang |
| Default |
Static vs. Adaptive Curriculum Learning for RLOO Fine-Tuning of Language Models |
Norah Asemota |
Ke Wang |
| Custom |
Stay In Your Lane! Hierarchical RL for a Modified Asteroids Game |
Samantha Estrada |
Rahul Chand |
| Default |
Strategy-Diverse Synthetic Warm-Starts for RL Fine-Tuning on Countdown |
Shreyas C S, Anushka Rawat |
Max Du |
| Custom |
Strong Sub-Agents in a Monitored “Private” Channel Under a Weak RL Supervisor |
Andrew Samuel Park |
Perry Dong |
| Custom |
Structural Diversity Rewards for Verifiable Graph-Structured Reasoning |
Rutanshu Jhaveri |
Ifdita Hasan Orney |
| — |
Structured Action-Effect Observables for Residual RL under Hidden Actuator Drift |
Sarvesh R. Babu |
Abhijnya Bhat |
| Default |
Structured Q&A Reasoning for Language Models |
Anuj Jamwal, Srinidhi Bhat |
Ke Wang |
| Default |
Structured Search Traces for Process-Aware Countdown Training |
Parth Sheth, Yucheng Huang |
Max Du |
| Custom |
SUBTITLE-DPO: Verifiable-Reward Preference Optimization to Suppress Spurious Burned-in Subtitles in Audio-Visual Video Diffusion |
Yubo Ruan |
Perry Dong |
| Custom |
Survival Instinct: One-Staged RL for Quadruped Parkour Self-Play |
Edward Neo Lee, Shatong Zhu, Haoyue Xiao |
Tian Gao |
| Custom |
System-Aware Reward Shaping for the Pythia RL Prefetcher |
Esmee Cowing, Milly Wong, Tesvara Suliani Jiang |
Perry Dong |
| Default |
Systems-Aware Off-Policy RLOO: Amortizing Sampling Cost via K-Reuse |
Abhishek Bharani |
Abhijnya Bhat |
| Default |
Targeted Counterfactual Branching for Tool-Invocation Decisions in RL Fine-Tuning |
Andres Ernesto Garcia, Prakash Koukuntla, Joshua Hsieh |
Riya Karumanchi |
| Custom |
Teacher-Contrastive On-Policy Distillation |
Juntong Shi |
Perry Dong |
| Default |
Teaching Small Models to Teach |
Rosemary Mingrui Jiang |
Ke Wang |
| Default |
Test-Time Best-of-K Selection with Generative Verification for Countdown Reasoning |
Jingshu Liu |
Ke Wang |
| Default |
Test-Time Inference Scaling for RL Fine-Tuned Language Models |
Ava Kouhana, Julian Rodriguez Cardenas, Leah Balakrishnan |
Anubha Mahajan |
| Default |
Test-Time Selection and Curriculum Learning for RL Fine-Tuned Language Models on Countdown Reasoning |
Rydham Goyal, Rakshit Kaushik |
Anikait Singh |
| Custom |
The Affect of Opponent Pool Size on the Policy Stability of Compute Constrained Self-play Tasks |
Jonathan Andrew Lutch |
Tian Gao |
| Custom |
The Long Game: A Long-Horizon RL Study of Fairness in Financial Lending |
Brydie Sigg, Naomi Y Boneh, Christelle Chantal Millos-Lopez |
Riya Karumanchi |
| Default |
Tool-Integrated Reasoning for Countdown |
Mahmood Ishaq Alhusseini, Frank D'Agostino, Sebastian Beckett Fisher |
Anikait Singh |
| Default |
Tool-Integrated RLOO and Pass@K: Testing the Invisible Leash |
Isaiah Flores, Mia Xiao, Katherine Wang Xu |
Anubha Mahajan |
| Custom |
Topology-Aware Reinforcement Learning for Text-to-CAD Code Generation |
Gaurav Tyagi |
Ke Wang |
| Default |
Towards Advantage Shaping for Multi-tool Reasoning: A Preliminary Empirical Study |
Ricardo Alberto Carrillo Romero |
Anubha Mahajan |
| Custom |
Training Robot Policies with a Foundation Model Teacher |
Steven Feng, April Yang, Daniel Zou |
Tyler Lum |
| Default |
Using, Learning, and Removing Tools: A Study of Tool-Integrated Reasoning for Countdown |
Diego Sierra, Brianna Xie |
Max Du |
| Default |
Verifier-Based Reinforcement Learning for Countdown with Curriculum Training |
Yuyan Wu |
Riya Karumanchi |
| Default |
Verifier-Guided Recombination Search for Token-Efficient Test-Time Compute in Countdown |
Shyam Sai Bethina, Sahil Koita, Ananya Ganapathi |
Anubha Mahajan |
| Default |
When Can a Model Write Its Own Curriculum? A Diagnostic Study of Joint Conjecturer/Prover RLOO for Countdown |
Finn Staeblein, Nicholas Simon Allen |
Riya Karumanchi |
| Default |
When Does Curriculum Learning Help? A Knowledge-Aware Learning-Progress Curriculum for RLVR |
Louis de Germay de Cirfontaine, Arthur Gontier |
Riya Karumanchi |
| Custom |
When Does Execution Feedback Transfer? Minimal-Sufficient Feedback for Internalized RLVR |
Yucheng Yao |
Alex Nam |
| Custom |
When Does Privileged Self-Distillation Help GUI Grounding? A Teacher-Gap Analysis of Visual SDPO |
Ethan Charles Morgan |
Ethan Hsu |
| Custom |
When Does Specialist RL Help a Small-Model Multi-Agent Pipeline? A Cross-Benchmark Study on Long-Document QA |
Mert Karabiyik |
Alex Nam |
| Default |
Where RLOO Stops on Countdown: A Capability Ceiling at the Additive ↔ Multiplicative Boundary |
Aarohi Gupta |
Abhijnya Bhat |
| Default |
Zero Sum Game Framework for Group Advantage Simplification |
Joshua A Slagle |
Yash Kankariya |