| Jan 26, 2026 | Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving gradient explosion and vanishing in text-based models. Many thanks to my collaborators! |
| Dec 30, 2025 | Delighted to share that I will be interning as an Applied Scientist at Amazon Annapurna Labs starting mid-January, where I’ll be working on multi-turn reinforcement learning. |
| Dec 03, 2025 | Our paper On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that Lazy Likelihood Displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL. |
| Sep 18, 2025 | Our paper On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization has been accepted to NeurIPS 2025. In it, we examine the learning dynamics of GRPO and analyze negative gradients in depth. |