<div dir="ltr"><div><div class="gmail_default" style=""><font face="georgia, serif" color="#000000" style=""><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b>    </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit">    Monday, January 6th at <b style="background-color:rgb(255,255,0)">11:30am CT</b><b> </b></font></font></font></div><div style=""><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font color="#000000"><font style="vertical-align:inherit"><font face="georgia, serif" style="vertical-align:inherit"><b><span style="background-color:rgb(255,255,0)"><br></span></b></font></font></font></p><div class="gmail_default" style=""><font face="georgia, serif" color="#000000"><b>Where:       </b>The talk will be given <font style="font-weight:bold"><u>live, in person</u></font><font style="font-weight:bold"> </font>at</font></div><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000">                       TTIC, 6045 S. 
Kenwood Avenue</font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font color="#000000" face="georgia, serif">                       5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b><font face="georgia, serif" color="#000000"><br></font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Virtually:</b><span style="letter-spacing:0.2px">  via Panopto </span>(<a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=f5426f91-dc51-405a-a960-b248011c2bb1" target="_blank">livestream</a><span style="letter-spacing:0.2px">)</span><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b>         </font></font><span class="gmail_default" style=""></span><span class="gmail_default" style=""></span>Kaifeng Lyu<span class="gmail_default" style="">, UC 
Berkeley</span></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"></p><div style=""><p style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><span style="letter-spacing:normal"><b>Title:</b>          </span><span style="letter-spacing:normal"><span class="gmail_default" style=""></span></span><span style="letter-spacing:normal">Scaling Hyperparameters in Training Large Models with Theoretical Insights</span></font></p><div style=""><font face="georgia, serif" color="#000000"><b>Abstract:  </b><span class="gmail_default" style=""></span>Training large models is both resource-intensive and time-consuming, making it crucial to understand the quantitative relationship between model performance and hyperparameters. In this talk, I will present our work that leverages theoretical insights to tackle this issue from several angles. First, distributed training requires large batch sizes to fully exploit data parallelism, but how should the learning rate be tuned as the batch size changes? Our work studies SDE approximations of large-batch RMSprop and Adam and derives the Square Root Scaling Rule (SRSR): learning rate ~ sqrt(batch size), so that, for example, quadrupling the batch size calls for doubling the learning rate. Second, training large models with many workers can introduce significant communication overhead. Local gradient methods, such as Local SGD, address this by letting each worker take H local steps before synchronizing with the others. I will discuss a curious case in ImageNet-scale supervised learning: if H is scaled up quadratically as the learning rate decays, Local SGD can reach higher test accuracy than standard SGD run for the same number of steps. This quadratic scaling is backed by a sharpness-based analysis of the implicit bias of Local SGD. 
Finally, for LLM pretraining, we explore how to optimize learning rate schedules for any given training horizon. Our recent paper proposes an empirical law that describes how pretraining loss evolves with different learning rate schedules. By minimizing the predicted final pretraining loss over feasible schedules, we identify a schedule that outperforms the widely used cosine schedule.</font></div><div style=""><font face="georgia, serif" color="#000000"><br></font></div><div style=""><font face="georgia, serif" color="#000000"><b>Short Bio</b>: <span class="gmail_default" style=""></span>Kaifeng Lyu is a Postdoctoral Research Fellow at the Simons Institute for the Theory of Computing at UC Berkeley. He completed his Ph.D. in Computer Science at Princeton University in 2024, where he was advised by Sanjeev Arora. His research explores the theoretical and scientific foundations of deep learning and large language models. In Fall 2025, he will join the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University as a Tenure-Track Assistant Professor.</font></div><font face="georgia, serif" style="" color="#000000"><br><b style="">Host: <a href="mailto:nati@ttic.edu" style="">Nati Srebro</a></b></font></div></div><br clear="all"></div><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><b style="background-color:rgb(255,255,255)"><font color="#3d85c6">Brandie Jones </font></b><div><div><div><font color="#3d85c6"><b><i>Executive </i></b></font><b style="color:rgb(61,133,198)"><i>Administrative Assistant</i></b></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Toyota Technological Institute</font></span></div><div><span 
style="background-color:rgb(255,255,255)"><font color="#3d85c6">6045 S. Kenwood Avenue</font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6">Chicago, IL  60637</font></span></div></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><a href="http://www.ttic.edu" target="_blank">www.ttic.edu</a> </font></span></div><div><span style="background-color:rgb(255,255,255)"><font color="#3d85c6"><br></font></span></div></div></div></div>