Wenlong Deng
Open to Collaboration and Internship
My name is Deng Wenlong (邓文龙). I am a final-year Ph.D. student in the Electrical and Computer Engineering department at the University of British Columbia, co-supervised by Prof. Xiaoxiao Li and Prof. Christos Thrampoulidis. I am broadly interested in machine learning, in understanding models, and in their applications in healthcare. I have also conducted research on trustworthy machine learning (e.g., bias and efficiency) and deep learning-based medical image analysis. I am now working on improving model reasoning abilities for medical diagnosis and tool use.
Previously, I obtained my master’s degree in Electrical Engineering at EPFL in 2019, where I was fortunate to be supervised by Prof. Alexandre Alahi on stereo vision. I received my bachelor’s degree in Electronic and Information Engineering (Honors) at UESTC in 2017.
News
| May 04, 2026 | Two papers were accepted to ICML 2026: one on training collapse in multi-turn reinforcement learning, and the other on mitigating attention distraction in vision-language models. Many thanks to all my collaborators for their support and contributions! |
|---|---|
| May 03, 2026 | Our paper "On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral" is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that Lazy Likelihood-Displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL. |
| Apr 08, 2026 | Our paper For-Value was accepted to the ACL 2026 main conference. In it, we delve into the learning dynamics of SFT and introduce a forward-only data valuation framework that enables scalable and efficient value estimation for both LLMs and VLMs. Code is available on GitHub. |
| Jan 26, 2026 | Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving gradient explosion and vanishing in text-based models. Many thanks to my collaborators! |
Selected Publications
- [Agent Reasoning] On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement. ICML, 2026
- [Data Value] For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs. ACL, 2026
- [Reasoning] Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training. ICLR 2026 (ICML AI4Math Best Paper), 2025
- [Reasoning] On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization. NeurIPS, 2025
- [Reasoning] MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs. 2025 (* Equal Contribution)
- [Efficiency] DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models. International Conference on Learning Representations (spotlight 5%), 2025
- [Data Value] GMValuator: Similarity-based Data Valuation for Generative Models. International Conference on Learning Representations, 2025 (* Equal Contribution)
- Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning. The IEEE Conference on Computer Vision and Pattern Recognition, 2024
- [Medical] LESS: Label-efficient Multi-scale Learning for Cytological Whole Slide Image Screening. Medical Image Analysis, 2024
- [Medical] On Fairness of Medical Image Classification with Multiple Sensitive Attributes via Learning Orthogonal Representations. In Information Processing in Medical Imaging (Accept rate 25%), 2023
Talks
- Talk at Northeastern University CS7150 on on-policy and off-policy distillation. Thanks to Jiaji for the invitation!
Service
- 2024: PC member of FL@FM-IJCAI’24 and FL@FM-ICME’24
- 2023-now: Reviewer for NeurIPS, ICLR, ICML, TMLR, CVPR, ECCV, AISTATS, AAAI, and MICCAI
- 2026: Reviewer for ICML Position Paper
