[Theory] NOW: [TTIC Talks] 4/23 Talks at TTIC: Wei Xiong, University of Illinois

Brandie Jones via Theory theory at mailman.cs.uchicago.edu
Wed Apr 23 13:55:00 CDT 2025


*When:*        Wednesday, April 23rd at *2PM CT*


*Where:*       Talk will be given *live, in-person* at

                       TTIC, 6045 S. Kenwood Avenue

                       5th Floor, Room 530


*Virtually:*  via Panopto (livestream <https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cb3c4fe4-0fba-4434-8f7a-b2bf015ac306>)


*Who:*          Wei Xiong, University of Illinois

*Title:*           Self-rewarding correction for mathematical reasoning
*Abstract:*  This talk presents self-rewarding reasoning large language
models (LLMs), which can simultaneously generate step-by-step reasoning and
evaluate the correctness of their outputs at inference time, without
external feedback. This integrated approach
allows a single model to independently guide its reasoning process,
offering computational advantages for model deployment. We particularly
focus on the representative task of self-correction, where models
autonomously detect errors in their responses, revise outputs, and decide
when to terminate iterative refinement loops.
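
The generate, self-evaluate, and revise loop described above can be illustrated with a minimal sketch. This is not the speaker's implementation: the function names, the `max_rounds` cap, and the toy stand-ins for the model are all assumptions made for illustration. In the setting of the abstract, both callables would be served by the same LLM.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, bool]]  # (attempted answer, model's own verdict)


def self_correct(
    generate: Callable[[str, History], str],
    self_evaluate: Callable[[str, str], bool],
    prompt: str,
    max_rounds: int = 3,  # assumed cap on refinement rounds
) -> Tuple[str, History]:
    """Generate an answer, let the model judge it, and revise until it accepts."""
    history: History = []
    answer = ""
    for _ in range(max_rounds):
        answer = generate(prompt, history)        # attempt, conditioned on past attempts
        accepted = self_evaluate(prompt, answer)  # self-reward: no external feedback
        history.append((answer, accepted))
        if accepted:                              # model decides to stop refining
            break
    return answer, history


def toy_generate(prompt: str, history: History) -> str:
    # Hypothetical stand-in: wrong on the first try, right after one rejection.
    return "5" if not history else "4"


def toy_self_evaluate(prompt: str, answer: str) -> bool:
    # Hypothetical stand-in for the model's verdict on its own answer.
    return answer == "4"


final, trace = self_correct(toy_generate, toy_self_evaluate, "What is 2 + 2?")
```

With these toy stand-ins the loop rejects the first attempt, accepts the second, and stops before reaching `max_rounds`.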

To enable this, we propose a two-stage algorithmic framework for
constructing self-rewarding reasoning models using only self-generated
data. In the first stage, we employ sequential rejection sampling to
synthesize long chain-of-thought trajectories that incorporate both
self-rewarding and self-correction mechanisms. Fine-tuning models on this
curated data allows them to learn the patterns of self-rewarding and
self-correction. In the second stage, we further enhance the models'
ability to assess response accuracy and refine outputs through
reinforcement learning with rule-based signals. Experiments with Llama-3
and Qwen-2.5 demonstrate that our approach surpasses intrinsic
self-correction capabilities and achieves performance comparable to systems
that rely on external reward models.
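
As a rough illustration of the two ingredients named above, the sketch below pairs a rule-based signal with a trajectory filter. The abstract does not specify either one, so everything here is an assumption: the exact-match rule, the acceptance criterion in `keep_trajectory`, and the toy samples are hypothetical and only convey the general idea of keeping self-generated trajectories whose self-verdicts agree with a rule check.

```python
from typing import List, Tuple

Trajectory = List[Tuple[str, bool]]  # (attempted answer, model's own verdict)


def rule_based_reward(answer: str, reference: str) -> float:
    """Assumed rule: 1.0 if the answer matches the reference after trimming, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def keep_trajectory(steps: Trajectory, reference: str) -> bool:
    """Keep a sampled trajectory only if every self-verdict agrees with the
    rule check and the last attempt is correct (an assumed criterion)."""
    if not steps:
        return False
    verdicts_ok = all(
        verdict == (rule_based_reward(answer, reference) == 1.0)
        for answer, verdict in steps
    )
    return verdicts_ok and rule_based_reward(steps[-1][0], reference) == 1.0


# Hypothetical sampled trajectories for a problem whose reference answer is "4".
samples: List[Trajectory] = [
    [("5", False), ("4", True)],  # wrong, correctly flagged, then corrected
    [("5", True)],                # wrong but self-accepted
    [("4", False), ("4", True)],  # correct answer wrongly rejected at first
]
kept = [s for s in samples if keep_trajectory(s, "4")]
```

Only the first sample survives the filter. The same `rule_based_reward` function could serve as the scalar signal in a reinforcement learning stage, again under the assumption that the rule is a simple answer match.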

*Short Bio*:  Wei Xiong is a second-year Ph.D. candidate in computer
science at UIUC, working with Tong Zhang and Nan Jiang. He also works
concurrently with the Gemini post-training team and the FAIR alignment team
as a full-time or part-time research intern. His research interests focus
on the theoretical understanding of decision-making problems and on
practical algorithm designs inspired by mathematical insights.

*Host: Zhiyuan Li <zhiyuanli at ttic.edu>*


-- 
*Brandie Jones*
*Executive Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL  60637
www.ttic.edu