<div dir="ltr"><div class="gmail_default" style="font-family:georgia,serif;font-size:small;color:rgb(0,0,0)"><span style="letter-spacing:0.2px"><b>When:    </b>Wednesday, May 21st<b> at </b></span><b style="letter-spacing:0.2px"><span style="background-color:rgb(255,255,0)">11AM CT</span></b></div><div><div class="gmail_default"><div class="gmail_default"><b style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></b></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Where:   </b>Talk will be given<span style="background-color:rgb(255,255,0)"> </span><span style="background-color:rgb(255,255,0)"><font style="font-weight:bold"><u>live, in-person</u></font><font style="font-weight:bold"> </font></span>at</font></div><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font color="#000000" face="georgia, serif">                    TTIC, 6045 S. 
Kenwood Avenue</font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font color="#000000" face="georgia, serif">                    5th Floor, Room 530<b>  </b></font></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></b></p><p class="MsoNormal" style="margin:0in;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Virtually: </b><span style="letter-spacing:0.2px">via Panopto (</span><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=5f290475-5b5e-4666-a579-b2db01422f65" target="_blank" style="letter-spacing:0.2px">Livestream</a><span style="letter-spacing:0.2px">)</span><br></font></p><div class="gmail_default"><b style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></b></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><span style="letter-spacing:0.2px"><b>Who:      </b></span>Abhishek Panigrahi, Princeton</font></div><div class="gmail_default"><span style="letter-spacing:0.2px"><font face="georgia, serif" color="#000000"><br></font></span></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><b style="letter-spacing:0.2px">Title:</b>       Efficient “curriculum-based” training: Theoretical modeling through synthetic testbeds</font></div><div class="gmail_default"><font face="georgia, serif" color="#000000"><br><b style="letter-spacing:0.2px">Abstract: 
</b><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">In the current age of deep learning, more compute typically means better performance. However, alternate strategies have emerged for training smaller models more efficiently by introducing structured supervision during training. In this talk, I’ll explore how </span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;font-weight:700;vertical-align:baseline">synthetic testbeds</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> help uncover the effectiveness of such methods—and reveal the role of </span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;font-weight:700;vertical-align:baseline">curriculum</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> in accelerating learning.</span></font></div><span id="gmail-docs-internal-guid-d31d0531-7fff-fca6-0ed6-cf1b5c8e2d4d"><font face="georgia, serif" color="#000000"><p dir="ltr" style="line-height:1.656;margin-top:12pt;margin-bottom:12pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">I will present two recent works. 
The first investigates </span><span style="background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">progressive distillation</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">, where student models learn not only from a final teacher checkpoint but also from its intermediate checkpoints. Using </span><span style="background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">sparse parity</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> as a testbed, we identify an </span><span style="background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">implicit curriculum</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> available only through these intermediate checkpoints—leading to both empirical speedup and provable sample complexity gains. 
We extend the underlying curriculum ideas to pre-training transformers on real-world datasets (Wikipedia and Books), where intermediate checkpoints are found to progressively capture longer-range context dependencies.</span></p><p dir="ltr" style="line-height:1.656;margin-top:12pt;margin-bottom:12pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">The second part focuses on </span><span style="background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">context-enhanced learning</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">, a gradient-based analog of in-context learning (ICL) where models are trained with extra contextual information provided in-context but removed at evaluation, with no gradient computations on this extra information. 
In a </span><span style="background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">multi-step reasoning task</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">, we prove that context-enhanced learning can be </span><span style="background-color:transparent;font-weight:700;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">exponentially more sample-efficient</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> than standard training, provided the model is ICL-capable. We also demonstrate experimentally that learning materials used in the context during training appear hard to detect or recover from the trained model. This may have implications for data security as well as copyright.</span></p><p dir="ltr" style="line-height:1.656;margin-top:0pt;margin-bottom:0pt"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"><font size="1">References for the above works:</font></span></p><p dir="ltr" style="line-height:1.656;margin-top:0pt;margin-bottom:0pt"><font size="1"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">1. Progressive distillation induces an implicit curriculum. 
</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">ICLR’25</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> (Oral). Abhishek Panigrahi*, Bingbin Liu*, Sadhika Malladi, Andrej Risteski, Surbhi Goel</span></font></p><font size="1"><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">2. On the Power of Context-Enhanced Learning in LLMs. </span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline">ICML'25</span><span style="background-color:transparent;font-variant-numeric:normal;font-variant-east-asian:normal;font-variant-alternates:normal;vertical-align:baseline"> (Spotlight). Xingyu Zhu*, Abhishek Panigrahi*, Sanjeev Arora</span></font></font></span><div><font face="georgia, serif" color="#000000"><br></font></div><div><div class="gmail_default"><font face="georgia, serif" color="#000000"><b>Bio:</b> <span style="background-color:transparent">I’m a fifth-year Ph.D. student in Computer Science at Princeton University, advised by Prof. Sanjeev Arora. My research centers on developing mathematical models to understand and improve the efficiency and robustness of training deep learning models. I am an Apple AI/ML Ph.D. 
Scholar for 2025-26.</span></font></div><div><font color="#000000" face="georgia, serif"><br></font></div></div></div><div><div class="gmail_default"><b><font face="georgia, serif" color="#000000">Host: <a href="mailto:zhiyuanli@ttic.edu">Zhiyuan Li </a></font></b></div></div><font color="#888888"><br clear="all"></font></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><b><font color="#3d85c6">Brandie Jones </font></b><div><div><div><font color="#3d85c6"><b><i>Executive </i></b></font><b style="color:rgb(61,133,198)"><i>Administrative Assistant</i></b></div></div><div><font color="#3d85c6">Toyota Technological Institute</font></div><div><font color="#3d85c6">6045 S. Kenwood Avenue</font></div><div><font color="#3d85c6">Chicago, IL  60637</font></div></div><div><font color="#3d85c6"><a href="http://www.ttic.edu" target="_blank">www.ttic.edu</a> <br></font></div><div></div></div></div></div>