news

Sep 18, 2025 Our paper On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization has been accepted to NeurIPS 2025. In it, we delve into the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients.
Jul 10, 2025 Our work Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training has been selected as a Best Paper at the 2nd AI for Math Workshop @ ICML 2025. In this paper, we investigate how to guide the balance between exploitation and exploration in RL training using a free-lunch token-level reward.
Jun 02, 2025 Delighted to share that I will be interning as a Research Scientist at Meta this summer.
Apr 01, 2025 Our paper MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs is online! We use a knowledge graph (KG) as a structured knowledge source to provide factual guidance for generating medical reasoning data.