| Dec 18, 2025 | The paper, Exploration vs exploitation: Rethinking RLVR through clipping, entropy, and spurious reward, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin and Xi Chen, was posted to arXiv. |
| Dec 15, 2025 | The paper, A direct second-order method for solving two-player zero-sum games, coauthored with David Yang, Yuan Gao and Christian Kroer, was posted to arXiv. |
| Nov 20, 2025 | The paper, Non-convex self-concordant functions: Practical algorithms and complexity analysis, coauthored with Donald Goldfarb, Lexiao Lai and Jiayu Zhang, was posted to arXiv. |
| Sep 20, 2025 | The paper, Stepwise guided policy optimization: Coloring your incorrect reasoning in GRPO, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li and Xi Chen, was posted to arXiv. |
| Sep 18, 2025 | The paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was accepted to NeurIPS 2025. |