News

Dec 18, 2025 The paper, Exploration vs exploitation: Rethinking RLVR through clipping, entropy, and spurious reward, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin and Xi Chen, was posted to arXiv.
Dec 15, 2025 The paper, A direct second-order method for solving two-player zero-sum games, coauthored with David Yang, Yuan Gao and Christian Kroer, was posted to arXiv.
Nov 20, 2025 The paper, Non-convex self-concordant functions: Practical algorithms and complexity analysis, coauthored with Donald Goldfarb, Lexiao Lai and Jiayu Zhang, was posted to arXiv.
Sep 20, 2025 The paper, Stepwise guided policy optimization: Coloring your incorrect reasoning in GRPO, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li and Xi Chen, was posted to arXiv.
Sep 18, 2025 The paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was accepted to NeurIPS 2025.