May 20, 2025 | The new paper, Spectral policy optimization: Coloring your incorrect reasoning in GRPO, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li and Xi Chen, was posted to arXiv. |
May 09, 2025 | The new paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was posted to arXiv. |
Jan 27, 2025 | The paper, Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization, coauthored with Chi Jin and Michael I. Jordan, was accepted to the Journal of Machine Learning Research. |