[Theory] [TTIC Talks] 1/23 Talks at TTIC: Will Merrill, New York University
Brandie Jones via Theory
theory at mailman.cs.uchicago.edu
Thu Jan 16 09:00:00 CST 2025
*When:* Thursday, January 23rd at *10 AM CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* via Panopto livestream:
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=de3ed27c-aea1-474e-ace4-b2610184c023>
*Who:* Will Merrill, New York University
*Title:* Theoretical Computer Science as a Lens to Understand and
Improve Large Language Models
*Abstract:* Scaling up large language models has enabled tremendous
progress in NLP and deep learning, but how far can this paradigm be pushed?
In this talk, I will discuss my body of theoretical results on the
expressive power of language modeling architectures, and how these results
bear on this question. I will start with my theoretical result that
transformers (without chain of thought) can only express problems in the
complexity class uniform TC0 and thus cannot express many simple
computational problems including state tracking, evaluating compositional
formulas, and graph connectivity. I will then discuss my work
characterizing how chain of thought approaches can expand the expressive
power of transformers, as well as my work comparing the expressive power of
state-space models and transformers. Overall, these findings reveal a
fundamental tradeoff between parallelism and expressive power: the
parallelism so essential for scaling up transformer language models also
precludes them from expressing many simple computational problems. These
insights let us more precisely understand the limitations of transformers
and also provide a strong foundation upon which to develop novel language
modeling architectures and inference methods, forming a key part of my
future research agenda.
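(Editor's note, not part of the abstract: one standard way to make the "state tracking" example concrete is the S5 word problem, i.e., maintaining the running composition of a sequence of permutations of five elements, a task that is NC^1-complete and hence conjectured to lie outside uniform TC0, the class the talk argues bounds transformers without chain of thought. The minimal Python sketch below is purely illustrative and is not drawn from the speaker's work; all names in it are made up for this example.)

# Illustrative sketch of "state tracking" as permutation composition
# (the S5 word problem); not code from the talk.
import itertools
import random

def compose(p, q):
    """Return the permutation 'apply q, then apply p' as a tuple."""
    return tuple(p[q[i]] for i in range(len(q)))

def track_state(perms):
    """Fold a sequence of permutations, left to right, into one composite state."""
    state = tuple(range(5))  # identity permutation on {0,...,4}
    for p in perms:
        state = compose(p, state)
    return state

if __name__ == "__main__":
    random.seed(0)
    s5 = list(itertools.permutations(range(5)))
    seq = [random.choice(s5) for _ in range(8)]
    print("composite state after 8 steps:", track_state(seq))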
*Short Bio*: Will is a PhD student at the Center for Data Science at NYU
advised by Tal Linzen and is funded by an NSF Graduate Research Fellowship
and a Two Sigma PhD Fellowship. Will has also worked and interned at the
Allen Institute for AI and Google Research. A major focus of Will’s
research has been to characterize the computational power and limitations
of transformers, with an eye towards understanding how transformer language
models represent linguistic structure and solve reasoning problems. He has
also worked on understanding the foundations of distributional semantics
and helped train OLMo, one of the best fully open large language models.
*Host:* Nati Srebro <nati at ttic.edu>
--
*Brandie Jones*
*Executive Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL 60637
www.ttic.edu