Sep 20, 2025 | The new paper, Stepwise guided policy optimization: Coloring your incorrect reasoning in GRPO, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li and Xi Chen, was posted to arXiv. |
---|---|
Sep 18, 2025 | The paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was accepted to NeurIPS 2025. |
Jan 27, 2025 | The paper, Two-timescale gradient descent ascent algorithms for nonconvex minimax optimization, coauthored with Chi Jin and Michael I. Jordan, was accepted to the Journal of Machine Learning Research. |