Wenlong Deng
Open to Collaboration and Internship
My name is Deng Wenlong (邓文龙). I am a final-year Ph.D. student in the Electrical and Computer Engineering department at the University of British Columbia, co-supervised by Prof. Xiaoxiao Li and Prof. Christos Thrampoulidis. I am broadly interested in machine learning, understanding models, and their applications in healthcare. I have also conducted research on trustworthy machine learning (e.g., bias and efficiency) and deep learning-based medical image analysis, and I am now working on improving model reasoning abilities in medical diagnosis and tool use.
Previously: I obtained my master’s degree in Electrical Engineering at EPFL in 2019, where I was fortunate to be supervised by Prof. Alexandre Alahi on stereo vision. I received my bachelor’s degree in Electronic and Information Engineering (Honors) at UESTC in 2017.
News
| Apr 08, 2026 | Our paper For-Value was accepted to ACL 2026 (Main), where we delve into the learning dynamics of SFT and introduce a forward-only data valuation framework that enables scalable and efficient value estimation for both LLMs and VLMs. |
|---|---|
| Jan 26, 2026 | Two papers were accepted to ICLR 2026: one on token hidden rewards in reinforcement learning, and the other on resolving gradient explosion and vanishing in text-based models. Many thanks to my collaborators! |
| Dec 03, 2025 | Our paper On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral is online! We investigate why GRPO fails within Search-R1 (a recent multi-turn agentic workflow powered by DeepSeek-R1), showing that Lazy Likelihood Displacement (LLD) is also the root cause of GRPO failure in multi-turn, tool-integrated RL. |
| Sep 18, 2025 | Our paper On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization was accepted to NeurIPS 2025, where we delve into the learning dynamics of GRPO and conduct an in-depth analysis of negative gradients. |
Selected Publications
- [Agent Reasoning] On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral. arXiv preprint arXiv:2512.04220, 2025
- [Data Value] For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs. ACL, 2026
- [Reasoning] Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training. ICLR 2026 (ICML AI4Math Best Paper), 2025
- [Reasoning] On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization. NeurIPS, 2025
- [Reasoning] MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs. 2025 (* Equal Contribution)
- [Efficiency] DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models. International Conference on Learning Representations (spotlight 5%), 2025
- [Data Value] GMValuator: Similarity-based Data Valuation for Generative Models. International Conference on Learning Representations, 2025 (* Equal Contribution)
- Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning. The IEEE Conference on Computer Vision and Pattern Recognition, 2024
- [Medical] LESS: Label-efficient Multi-scale Learning for Cytological Whole Slide Image Screening. Medical Image Analysis, 2024
- [Medical] On Fairness of Medical Image Classification with Multiple Sensitive Attributes via Learning Orthogonal Representations. In Information Processing in Medical Imaging (Accept rate 25%), 2023
Talks
- Talk at Northeastern University CS7150 on on-policy and off-policy distillation. Thanks to Jiaji for the invitation!
Service
- 2024: PC member of FL@FM-IJCAI’24 and FL@FM-ICME’24
- 2023-now: Reviewer for NeurIPS, ICLR, ICML, TMLR, CVPR, ECCV, AISTATS, AAAI, and MICCAI
- 2026: Reviewer for ICML Position Paper
