Wenlong Deng

Open to Collaboration and Internship

My name is Deng Wenlong (邓文龙). I am a Ph.D. student in the Electrical and Computer Engineering department at the University of British Columbia, co-supervised by Prof. Xiaoxiao Li and Prof. Christos Thrampoulidis. I am broadly interested in machine learning and its applications in healthcare. I have conducted research on LLM efficiency and deep learning-based medical image analysis, and I am now working on improving model reasoning abilities in medical diagnosis and math problem solving.

Previously: I obtained my master’s degree in Electrical Engineering at EPFL in 2019, where I was fortunate to be supervised by Prof. Alexandre Alahi on stereo vision. I received my bachelor’s degree in Electronic and Information Engineering (Honors) at UESTC in 2017.

news

Jan 26, 2026	Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving gradient explosion and vanishing in text-based models. Many thanks to my collaborators!
Dec 30, 2025	Delighted to share that I will be interning as an Applied Scientist at Amazon Annapurna Labs starting mid-January, where I’ll be working on multi-turn reinforcement learning.
Dec 03, 2025	Our paper On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that lazy likelihood displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL.
Sep 18, 2025	Our paper On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization was accepted to NeurIPS 2025. In it, we delve into the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients.

selected publications

Agent Reasoning

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Wenlong Deng, Yushu Li, Boying Gong, and 3 more authors

arXiv preprint arXiv:2512.04220, 2025
Reasoning

Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training

Wenlong Deng, Yi Ren, Danica J Sutherland, and 2 more authors

In ICLR 2026 (ICML AI4Math Best Paper), 2025
Reasoning

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization

Wenlong Deng, Yi Ren, Muchen Li, and 3 more authors

NeurIPS, 2025
Reasoning

MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs

Juncheng Wu*, Wenlong Deng*, Xingxuan Li, and 12 more authors

2025

* Equal Contribution

arXiv
Efficiency

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

Wenlong Deng, Yize Zhao, Vala Vakilian, and 3 more authors

International Conference on Learning Representations (spotlight 5%), 2025

arXiv
Data Value

GMValuator: Similarity-based Data Valuation for Generative Models

Jiaxi Yang*, Wenlong Deng*, Benlin Liu, and 2 more authors

International Conference on Learning Representations, 2025

* Equal Contribution

arXiv
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

Wenlong Deng, Christos Thrampoulidis, and Xiaoxiao Li

The IEEE Conference on Computer Vision and Pattern Recognition, 2024

arXiv
Medical

LESS: Label-efficient Multi-scale Learning for Cytological Whole Slide Image Screening

Beidi Zhao, Wenlong Deng, Zi Han, and 5 more authors

Medical Image Analysis, 2024

arXiv
Medical

On Fairness of Medical Image Classification with Multiple Sensitive Attributes via Learning Orthogonal Representations

Wenlong Deng, Yuan Zhong, Qi Dou, and 1 more author

In Information Processing in Medical Imaging (acceptance rate 25%), 2023

arXiv