Announcement_2
Our paper On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that LLD is also the root cause of GRPO failure in multi-turn, tool-integrated RL.