[Theory] REMINDER: [TTIC Talks] 12/6 Young Researcher Seminar Series: Dingli Yu, Princeton University
Brandie Jones
bjones at ttic.edu
Tue Dec 5 08:00:00 CST 2023
*When:* Wednesday, December 6th, 2023 at *10:30 am CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* via Panopto *livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=aa08dc21-4f8c-44de-b1c7-b0ce01208743>
*Who:* Dingli Yu, Princeton University
*Title:* Feature Learning in Infinite-Depth Neural Networks
*Abstract:* By classifying infinite-width neural networks and identifying
the optimal limit, Tensor Programs IV and V demonstrated a universal way,
called μP, for widthwise hyperparameter transfer, i.e., predicting optimal
hyperparameters of wide neural networks from narrow ones. Here we
investigate the analogous classification for depthwise parametrizations of
deep residual networks (resnets). We classify depthwise parametrizations of
block multiplier and learning rate by their infinite-width-then-depth
limits. In the case where each block has only one layer, we identify a
unique optimal parametrization, called Depth-μP, that extends μP, and we
show empirically that it admits depthwise hyperparameter transfer. We identify
feature diversity as a crucial factor in deep networks, and Depth-μP can be
characterized as maximizing both feature learning and feature diversity.
Exploiting this, we find that absolute value, among all homogeneous
nonlinearities, maximizes feature diversity and indeed empirically leads to
significantly better performance. However, if each block is deeper (such as
modern transformers), then we find fundamental limitations in all possible
infinite-depth limits of such parametrizations, which we illustrate both
theoretically and empirically on simple networks as well as a Megatron
transformer trained on Common Crawl.
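As a rough illustration only (not the speaker's code): the sketch below, in
PyTorch, shows the kind of one-layer-per-block resnet the abstract describes,
with a 1/sqrt(depth) block multiplier standing in for the Depth-μP scaling and
the absolute value nonlinearity the abstract highlights. The class name and
all hyperparameter values are hypothetical.

import math
import torch
import torch.nn as nn

class DepthMuPResNet(nn.Module):
    """Toy resnet sketch: one linear layer per residual block, with the
    branch output scaled by 1/sqrt(depth). This scaling is an assumption
    standing in for the Depth-muP parametrization named in the abstract."""

    def __init__(self, width: int, depth: int):
        super().__init__()
        self.depth = depth
        self.blocks = nn.ModuleList(
            [nn.Linear(width, width, bias=False) for _ in range(depth)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = 1.0 / math.sqrt(self.depth)  # depthwise block multiplier
        for block in self.blocks:
            # absolute value nonlinearity, which the abstract reports
            # maximizes feature diversity among homogeneous nonlinearities
            x = x + scale * torch.abs(block(x))
        return x

# Hypothetical usage: under depthwise transfer, hyperparameters tuned at
# small depth should remain near-optimal as depth grows.
net = DepthMuPResNet(width=256, depth=32)
out = net(torch.randn(8, 256))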
*Bio:* Dingli Yu is a final-year Ph.D. candidate in the Computer Science
Department at Princeton University, advised by Professor Sanjeev Arora. His
research focuses on deep learning theory with an emphasis on its practical
application. His work contributes to the line of research on training
dynamics of overparametrized neural networks, centering around neural
tangent kernel (NTK) and feature learning under the Tensor Program
framework. His recent work also provides practical applications of theory to
the efficient and robust development of Large Language Models (LLMs).
*Host:* Zhiyuan Li <zhiyuanli at ttic.edu>
**********************************************************************************
The *TTIC Young Researcher Seminar Series *(
http://www.ttic.edu/young-researcher.php) features talks by Ph.D. students
and postdocs whose research is of broad interest to the computer science
community. The series provides an opportunity for early-career researchers
to present their recent work to, and meet with, students and faculty at TTIC
and nearby universities.
--
*Brandie Jones *
*Executive **Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL 60637
www.ttic.edu
Working Remotely on Tuesdays