20250920_arxiv

Created on September 20, 2025

2025

The new paper, "Stepwise guided policy optimization: Coloring your incorrect reasoning in GRPO," coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, and Xi Chen, was posted to arXiv.
