news

Dec 30, 2025 Delighted to share that I will be interning as an Applied Scientist at Amazon Annapurna Labs starting mid-January, where I’ll be working on multi-turn reinforcement learning.
Dec 03, 2025 Our paper On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that Lazy Likelihood Displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL.
Sep 18, 2025 Our paper On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization has been accepted to NeurIPS 2025. In it, we delve into the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients.
Jul 10, 2025 Our work Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training has been selected as a Best Paper at the 2nd AI for Math Workshop @ ICML 2025. In this paper, we investigate how to guide the balance between exploitation and exploration in RL training using a free-lunch token-level reward.