News

Sep 20, 2025 The new paper, Stepwise guided policy optimization: Coloring your incorrect reasoning in GRPO, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li and Xi Chen, was posted to arXiv.
Sep 18, 2025 The paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was accepted to NeurIPS 2025.
Jan 27, 2025 The paper, Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization, coauthored with Chi Jin and Michael I. Jordan, was accepted to the Journal of Machine Learning Research.