20250520_arxiv
The new paper, "Spectral policy optimization: Coloring your incorrect reasoning in GRPO," coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, and Xi Chen, was posted to arXiv.