News

May 04, 2026 Two papers were accepted to ICML 2026: one on training collapse in multi-turn reinforcement learning, and the other on mitigating attention distraction in vision-language models. Many thanks to all my collaborators for their support and contributions!
May 03, 2026 Our paper "On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral" is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that lazy likelihood displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL.
Apr 08, 2026 Our paper For-Value was accepted to the ACL 2026 Main Conference. We study the learning dynamics of SFT and introduce a forward-only data valuation framework that enables scalable and efficient value estimation for both LLMs and VLMs. Code is available on GitHub.
Jan 26, 2026 Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving exploding and vanishing gradients in text-based models. Many thanks to my collaborators!