Announcement_4

Our work "Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training" has been selected as a Best Paper at the 2nd AI for Math Workshop @ ICML 2025. In this paper, we investigate how to steer the balance between exploration and exploitation in reinforcement learning (RL) training using a "free-lunch" token-level reward.