| May 04, 2026 | Two papers were accepted to ICML 2026: one on training collapse in multi-turn reinforcement learning, and the other on mitigating attention distraction in vision-language models. Many thanks to all my collaborators for their support and contributions! |
| May 03, 2026 | Our paper "On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral" is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that lazy likelihood displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL. |
| Apr 08, 2026 | Our For-Value paper was accepted to the ACL 2026 main conference. In it, we delve into the learning dynamics of SFT and introduce a forward-only data valuation framework that enables scalable and efficient value estimation for both LLMs and VLMs. Code is available on GitHub. |
| Jan 26, 2026 | Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving gradient explosion and vanishing in text-based models. Many thanks to my collaborators! |