| Feb 03, 2026 | The paper, Reward-free alignment for conflicting objectives, coauthored with Peter Chen, Xiaopeng Li and Xi Chen, was posted to arXiv. |
| Jan 26, 2026 | The paper, Exploration vs exploitation: Rethinking RLVR through clipping, entropy, and spurious reward, coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, Wotao Yin and Xi Chen, was accepted to ICLR 2026. |
| Sep 18, 2025 | The paper, ComPO: Preference alignment via comparison oracles, coauthored with Peter Chen, Xi Chen and Wotao Yin, was accepted to NeurIPS 2025. |