<div dir="ltr"><div><font face="arial, sans-serif" style="color:rgb(0,0,0)"><b>When:</b>      Friday, May 1st at 10:30am</font></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><br></font></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><b>Where:</b>     Zoom Virtual Talk (see details below)<br></font></div><div><div style="color:rgb(0,0,0)"><p style="margin-bottom:0.0001pt;line-height:normal"><font face="arial, sans-serif"><b>Who: </b>     <b>  </b><span>Shirley</span> Wu, University of Texas at Austin</font></p><table cellpadding="0" style="border-collapse:collapse;margin-top:0px;width:auto;letter-spacing:0.2px;display:block"></table></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><br></font></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><b>Title: </b>       Normalization Methods: Auto-tuning Stepsize and Implicit Regularization</font></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><br></font></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><b>Abstract:</b> Neural network optimization with stochastic gradient descent (SGD) requires many techniques, including normalization methods such as batch normalization (Ioffe and Szegedy, 2015) and adaptive gradient methods such as Adam (Kingma and Ba, 2014), to attain optimal performance. While these methods are successful, their theoretical understanding has only recently started to emerge. A significant challenge in understanding these methods is the highly non-convex and non-linear nature of neural networks.</font></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif"><br></font></div><p class="MsoNormal"><span style="color:black"><font face="arial, sans-serif">In this talk, I will present an interesting connection between normalization methods and adaptive gradient methods, and provide rigorous justification for why these methods require less hyperparameter tuning. 
I will also discuss convergence results for adaptive gradient methods in general non-convex landscapes and in two-layer over-parameterized neural networks.  Beyond convergence, I will show a new perspective on the implicit regularization in these normalization algorithms.</font></span></p><p class="MsoNormal"><span style="color:black"><font face="arial, sans-serif"><br></font></span></p><p class="MsoNormal"><font face="arial, sans-serif"><span style="color:black"><b>Bio:</b>         </span><span style="color:rgb(0,0,0)">Xiaoxia (<span>Shirley</span>) Wu is a Ph.D. student at The University of Texas at Austin, advised by Rachel Ward. Previously, she was a research intern at Facebook AI Research (FAIR), mentored by Léon Bottou, where she worked on batch/weight normalization. She was also a visiting student at the Simons Institute for the Theory of Computing (UC Berkeley) in Fall 2018 and Summer 2019, and at the Institute for Advanced Study (Princeton) in Fall 2019. Her primary research interests lie in the area of optimization, including stochastic and robust optimization. Her current research is on understanding and improving optimization methods for non-convex landscapes (neural networks), such as adaptive gradient methods and normalization methods. 
She was a recipient of the UT Austin Graduate School Fellowship.</span></font></p><div style="color:rgb(0,0,0)"><br></div><div style="color:rgb(0,0,0)"><b style="font-family:arial,sans-serif">Host:</b><span style="font-family:arial,sans-serif"> </span><a href="mailto:nati@ttic.edu" target="_blank" style="font-family:arial,sans-serif">Nati Srebro</a> </div><div style="color:rgb(0,0,0)"> <br></div><div style="color:rgb(0,0,0)"><font face="arial, sans-serif">----------------------------------------------------------------------------------------------------------</font></div></div><div style="color:rgb(0,0,0)"><br></div><div style="color:rgb(0,0,0)">Register in advance for this meeting:<br><a href="https://uchicago.zoom.us/meeting/register/tJwsf-CqpjMvG9E2RX-3yRQsaF-T-4eCeGQO" target="_blank">https://uchicago.zoom.us/meeting/register/tJwsf-CqpjMvG9E2RX-3yRQsaF-T-4eCeGQO</a><br><br>After registering, you will receive a confirmation email containing information about joining the meeting.</div><div></div><div><br></div>-- <br><div dir="ltr" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><b><font color="#0b5394">Alicia McClarin</font></b><div><div><font color="#0b5394"><i>Toyota Technological Institute at Chicago</i></font></div><div><div><font color="#0b5394"><i>6045 S. Kenwood Ave., </i></font><i style="color:rgb(11,83,148)">Office 504</i></div><div><i style="color:rgb(11,83,148)">Chicago, IL 60637</i><br></div></div><div><i style="color:rgb(11,83,148)">773-834-3321</i><i style="color:rgb(11,83,148)"><br></i></div><div><a href="http://www.ttic.edu/" target="_blank"><font color="#0b5394"><i>www.ttic.edu</i></font></a></div></div></div></div></div></div></div></div></div></div></div></div>