News

May 20, 2025 The new paper, Spectral policy optimization: Coloring your incorrect reasoning in GRPO, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li and Xi Chen, was posted to arXiv.
May 09, 2025 The new paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was posted to arXiv.
Jan 27, 2025 The paper, Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization, coauthored with Chi Jin and Michael I. Jordan, was accepted to the Journal of Machine Learning Research.