<div dir="ltr"><div dir="ltr"><div><div class="gmail_default"><font face="georgia, serif" color="#000000"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"> Wednesday, April 23rd at <b style="background-color:rgb(255,255,0)">2PM CT</b><b> </b></font></font></font></div><div><div class="gmail_default"><div><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font color="#000000"><font style="vertical-align:inherit"><font face="georgia, serif" style="vertical-align:inherit"><b><span style="background-color:rgb(255,255,0)"><br></span></b></font></font></font></p><div class="gmail_default"><font face="georgia, serif" color="#000000"><b>Where: </b><span>Talk</span> will be given <font style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font>at</font></div><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"> <span>TTIC</span>, 6045 S. 
Kenwood Avenue</font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font color="#000000" face="georgia, serif"> 5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b><font face="georgia, serif" color="#000000"><br></font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Virtually:</b><span style="letter-spacing:0.2px"> via Panopto </span>(<a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cb3c4fe4-0fba-4434-8f7a-b2bf015ac306" target="_blank">livestream</a><span style="letter-spacing:0.2px">)</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> </font></font>Wei Xiong, University of Illinois</font></p><p class="MsoNormal" style="margin:0in 0in 
0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"></p><div><p style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><span style="letter-spacing:normal"><b>Title:</b> </span><span style="letter-spacing:normal"> </span><span style="letter-spacing:normal">Self-rewarding correction for mathematical reasoning</span></font></p><div><font face="georgia, serif" color="#000000"><b>Abstract: </b>We present self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback. This integrated approach allows a single model to independently guide its reasoning process, offering computational advantages for model deployment. We focus in particular on the representative task of self-correction, where models autonomously detect errors in their responses, revise outputs, and decide when to terminate iterative refinement loops.</font></div><font face="georgia, serif" color="#000000"><br>To enable this, we propose a two-stage algorithmic framework for constructing self-rewarding reasoning models using only self-generated data. In the first stage, we employ sequential rejection sampling to synthesize long chain-of-thought trajectories that incorporate both self-rewarding and self-correction mechanisms. Fine-tuning models on this curated data allows them to learn the patterns of self-rewarding and self-correction. In the second stage, we further enhance the models' ability to assess response accuracy and refine outputs through reinforcement learning with rule-based signals. 
Experiments with Llama-3 and Qwen-2.5 demonstrate that our approach surpasses intrinsic self-correction capabilities and achieves performance comparable to systems that rely on external reward models.</font></div><div><font face="georgia, serif" color="#000000"><br></font></div><div><div><font face="georgia, serif" color="#000000"><b>Short Bio</b>: Wei Xiong is a second-year Ph.D. candidate in computer science at UIUC, working with Tong Zhang and Nan Jiang. He also works concurrently with the Gemini post-training team and the FAIR alignment team as a full-time or part-time research intern. His research interests focus on the theoretical understanding of decision-making problems and on practical algorithm designs inspired by mathematical insights.</font></div><font face="georgia, serif" color="#000000"><br><b>Host: <a href="mailto:zhiyuanli@ttic.edu" target="_blank">Zhiyuan Li</a></b></font></div></div></div></div><br clear="all"></div><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><b style="background-color:rgb(255,255,255)"><font color="#3d85c6">Brandie Jones </font></b><div><div><div><font color="#3d85c6"><b><i>Executive </i></b></font><b style="color:rgb(61,133,198)"><i>Administrative Assistant</i></b></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Toyota Technological Institute</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">6045 S. Kenwood Avenue</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Chicago, IL 60637</font></span></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><a href="http://www.ttic.edu" target="_blank">www.ttic.edu</a> </font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><br></font></span></div></div></div></div>
</div>