Wenlong Deng
Open to collaborations and internships
My name is Deng Wenlong (邓文龙). I am a Ph.D. student in the Department of Electrical and Computer Engineering at the University of British Columbia, co-supervised by Prof. Xiaoxiao Li and Prof. Christos Thrampoulidis. I am broadly interested in machine learning and its applications in healthcare. I have conducted research on LLM efficiency and deep learning-based medical image analysis, and I am now working on improving the reasoning abilities of models for medical diagnosis and math problem solving.
Previously, I obtained my master’s degree in Electrical Engineering at EPFL in 2019, where I was fortunate to be supervised by Prof. Alexandre Alahi on stereo vision. I received my bachelor’s degree in Electronic and Information Engineering (Honors) at UESTC in 2017.
news
| Date | News |
|---|---|
| Dec 30, 2025 | Delighted to share that I will be interning as an Applied Scientist at Amazon Annapurna Labs starting in mid-January, where I will be working on multi-turn reinforcement learning. |
| Dec 03, 2025 | Our paper On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that lazy likelihood displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL. |
| Sep 18, 2025 | Our paper On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization has been accepted at NeurIPS 2025. In it, we delve into the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients. |
| Jul 10, 2025 | Our work Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training has been selected as a Best Paper at the 2nd AI for Math Workshop @ ICML 2025. In this paper, we investigate how to steer the balance between exploitation and exploration in RL training using a free-lunch token-level reward. |
selected publications
- [Agent Reasoning] On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral. arXiv preprint arXiv:2512.04220, 2025.
- [Reasoning] On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization. NeurIPS, 2025.
- [Reasoning] MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs. 2025. *Equal contribution
- [Efficiency] DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models. International Conference on Learning Representations (spotlight, 5%), 2025.
- [Data Value] GMValuator: Similarity-based Data Valuation for Generative Models. International Conference on Learning Representations, 2025. *Equal contribution
- Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning. The IEEE Conference on Computer Vision and Pattern Recognition, 2024.
- [Medical] LESS: Label-efficient Multi-scale Learning for Cytological Whole Slide Image Screening. Medical Image Analysis, 2024.
- [Medical] On Fairness of Medical Image Classification with Multiple Sensitive Attributes via Learning Orthogonal Representations. Information Processing in Medical Imaging (acceptance rate 25%), 2023.
