
Created on March 10, 2026


The paper "Stepwise guided policy optimization: Coloring your incorrect reasoning in GRPO," coauthored with Peter Chen, Xiaopeng Li, Ziniu Li, and Xi Chen, was accepted to TMLR.
