news

Jul 10, 2025 Our work Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training has been selected as a Best Paper at the 2nd AI for Math Workshop @ ICML 2025. In this paper, we investigate how to steer the balance between exploitation and exploration in RL training using a free-lunch token-level reward.
Jun 03, 2025 Check out our latest findings in On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization, where we delve into the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients.
Jun 02, 2025 Delighted to share that I will be interning as a Research Scientist at Meta this summer.
Apr 01, 2025 Our paper MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs is online! We use a knowledge graph (KG) as a structured knowledge source to provide factual guidance for generating medical reasoning data.