[Theory] TODAY: [Talks at TTIC] 5/21 Young Researcher Seminar Series: Abhishek Panigrahi, Princeton

Brandie Jones via Theory theory at mailman.cs.uchicago.edu
Wed May 21 09:00:00 CDT 2025


*When:*    Wednesday, May 21st at 11AM CT

*Where:*   Talk will be given *live, in person* at

                    TTIC, 6045 S. Kenwood Avenue

                    5th Floor, Room 530


*Virtually:* via Panopto (livestream:
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=5f290475-5b5e-4666-a579-b2db01422f65>)

*Who:*      Abhishek Panigrahi, Princeton

*Title:*       Efficient “curriculum-based” training: Theoretical modeling
through synthetic testbeds

*Abstract:* In the current age of deep learning, more compute typically
means better performance. However, alternative strategies have emerged for
training smaller models more efficiently by introducing structured
supervision during training. In this talk, I’ll explore how synthetic
testbeds help uncover the effectiveness of such methods—and reveal the role
of curriculum in accelerating learning.

I will present two recent works. The first investigates progressive
distillation, where student models learn not only from a final teacher
checkpoint but also from its intermediate checkpoints. Using sparse parity
as a testbed, we identify an implicit curriculum available only through
these intermediate checkpoints—leading to both empirical speedup and
provable sample complexity gains. We extend the underlying curriculum ideas
to pre-training transformers on real-world datasets (Wikipedia and Books),
where intermediate checkpoints are found to progressively capture
longer-range context dependencies.
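
To make the setup concrete, here is a minimal, illustrative sketch of
progressive distillation on sparse parity. This is not the paper's code: the
model widths, support size k, step counts, checkpoint schedule, and KL-based
distillation loss are all assumptions.

# Illustrative sketch only (not the paper's code): progressive distillation
# on sparse parity, where the student matches a sequence of saved teacher
# checkpoints instead of only the final one. Widths, k, d, step counts, and
# the KL distillation loss are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, n = 32, 4, 4096                       # input dim, parity support, samples
X = torch.randint(0, 2, (n, d)).float()
y = (X[:, :k].sum(dim=1) % 2).long()        # label = parity of the first k bits

def mlp(width):
    return nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 2))

# Train a wide teacher and save intermediate checkpoints along the way.
teacher = mlp(512)
opt = torch.optim.SGD(teacher.parameters(), lr=0.1)
checkpoints = []
for step in range(2000):
    loss = F.cross_entropy(teacher(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 500 == 0:
        checkpoints.append(copy.deepcopy(teacher).eval())
checkpoints.append(copy.deepcopy(teacher).eval())   # final checkpoint

# Progressive distillation: the student matches each checkpoint in turn, so
# the partially trained teachers act as an implicit curriculum.
student = mlp(64)
opt = torch.optim.SGD(student.parameters(), lr=0.1)
for ckpt in checkpoints:
    with torch.no_grad():
        soft_targets = F.softmax(ckpt(X), dim=-1)
    for _ in range(500):
        kl = F.kl_div(F.log_softmax(student(X), dim=-1), soft_targets,
                      reduction="batchmean")
        opt.zero_grad(); kl.backward(); opt.step()

acc = (student(X).argmax(dim=-1) == y).float().mean().item()
print(f"student train accuracy: {acc:.3f}")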

The second part focuses on context-enhanced learning, a gradient-based
analog of in-context learning (ICL) where models are trained with extra
contextual information provided in-context but removed at evaluation, with
no gradient computations on this extra information.  In a multi-step
reasoning task, we prove that context-enhanced learning can be exponentially
more sample-efficient than standard training, provided the model is
ICL-capable. We also experimentally demonstrate that it appears hard to
detect or recover learning materials that were used in the context during
training. This may have implications for data security as well as copyright.
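
As an illustration only (not the paper's implementation), the sketch below
shows the training-versus-evaluation asymmetry of context-enhanced learning
with a tiny causal transformer: the helper context appears in the training
input but is masked out of the loss, so no loss terms come from predicting
it, and it is dropped entirely at evaluation. The model, token shapes, and
ignore-index masking are assumptions.

# Illustrative sketch only (not the paper's implementation): context-enhanced
# learning with a tiny causal transformer. Helper context is part of the
# training input but is never a prediction target, so it contributes no loss
# terms; at evaluation the helper context is removed. Sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    def __init__(self, vocab=64, dim=128, heads=4, layers=2, max_len=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)
        block = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids):
        T = ids.size(1)
        h = self.emb(ids) + self.pos(torch.arange(T, device=ids.device))
        # Causal attention mask: -inf strictly above the diagonal.
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=ids.device), diagonal=1)
        return self.head(self.blocks(h, mask=causal))

def training_step(model, opt, context, query, answer):
    """One step: `context` is visible in the input but never a loss target."""
    ids = torch.cat([context, query, answer], dim=1)
    labels = ids.clone()
    labels[:, : context.size(1) + query.size(1)] = -100   # loss only on answer
    logits = model(ids[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels[:, 1:].reshape(-1), ignore_index=-100)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def evaluate(model, query, answer):
    """Evaluation sees only the query: the helper context is dropped."""
    ids = torch.cat([query, answer], dim=1)
    pred = model(ids[:, :-1])[:, query.size(1) - 1:].argmax(-1)
    return (pred == answer).float().mean().item()

# Hypothetical usage with random token ids:
model = TinyCausalLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
ctx = torch.randint(0, 64, (8, 32))    # helper material, seen only in training
qry = torch.randint(0, 64, (8, 8))
ans = torch.randint(0, 64, (8, 4))
training_step(model, opt, ctx, qry, ans)
print("answer-token accuracy without context:", evaluate(model, qry, ans))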

References for the above works:

1. Progressive distillation induces an implicit curriculum. ICLR’25 (Oral).
Abhishek Panigrahi*, Bingbin Liu*, Sadhika Malladi, Andrej Risteski, Surbhi
Goel
2. On the Power of Context-Enhanced Learning in LLMs. ICML'25 (Spotlight).
Xingyu Zhu*, Abhishek Panigrahi*, Sanjeev Arora

*Bio:* I’m a fifth-year Ph.D. student in Computer Science at Princeton
University, advised by Prof. Sanjeev Arora. My research centers on
developing mathematical models to understand and improve the efficiency and
robustness of training deep learning models. I am an Apple AI/ML Ph.D.
scholar for the year 2025-26.

*Host:* Zhiyuan Li <zhiyuanli at ttic.edu>

-- 
*Brandie Jones*
*Executive Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL  60637
www.ttic.edu