News

Feb 03, 2026 The paper, Reward-free alignment for conflicting objectives, coauthored with Peter Chen, Xiaopeng Li and Xi Chen, was posted to arXiv.
Jan 26, 2026 The paper, Exploration vs exploitation: Rethinking RLVR through clipping, entropy, and spurious reward, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin and Xi Chen, was accepted to ICLR 2026.
Sep 18, 2025 The paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was accepted to NeurIPS 2025.