news | Wenlong Deng

Jan 26, 2026	Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving gradient explosion and vanishing in text-based models. Many thanks to my collaborators!
Dec 30, 2025	Delighted to share that I will be interning as an Applied Scientist at Amazon Annapurna Labs starting mid-January, where I’ll be working on multi-turn reinforcement learning.
Dec 03, 2025	Our paper "On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral" is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that Lazy Likelihood Displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL.
Sep 18, 2025	Our paper "On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization" has been accepted to NeurIPS 2025. In it, we study the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients.