[Colloquium] TODAY Sadhika Malladi (Princeton) Deep Learning Theory in the Age of Generative AI

Tue Feb 4 08:54:28 CST 2025

Department of Computer Science and Data Science Institute Colloquium Presents

Sadhika Malladi
Princeton University
PhD Candidate, Computer Science

Tuesday, February 4th 
2:00pm - 3:00pm 
In-Person: John Crerar Library Rm 390

Title: Deep Learning Theory in the Age of Generative AI

Abstract: Large neural networks, like language models (LMs), have demonstrated remarkable success in executing complex tasks, but little is understood about why these models work and how various design choices affect model behavior. Performing thorough empirical ablations to understand modern-day training paradigms is generally computationally infeasible, underscoring the need for theory-driven insights and improvements. However, traditional theoretical analysis of deep networks usually requires restrictive assumptions that are far from practical settings.

In this talk, I will present flexible yet rigorous theoretical frameworks for understanding LM pre-training and fine-tuning, along with their algorithmic implications. For fine-tuning, I propose a formal account that motivates the design of MeZO, a zeroth-order optimizer that reduces memory consumption by up to 12x while preserving performance. I will also discuss recent work exposing surprising failure modes of preference learning, a specialized form of fine-tuning used to steer LMs to exhibit desired behaviors. In the pre-training regime, I use stochastic differential equations (SDEs) to design principled and efficient hyperparameter selection algorithms for highly distributed training settings. I will conclude by exploring promising directions for co-developing deep learning theory and practice.

Bio: Sadhika Malladi is a final-year PhD student in Computer Science at Princeton University advised by Sanjeev Arora. Her research advances deep learning theory to capture modern-day training settings, yielding practical training improvements and meaningful insights into model behavior. She has co-organized multiple workshops, including Mathematical and Empirical Understanding of Foundation Models at ICLR 2024 and Mathematics for Modern Machine Learning (M3L) at NeurIPS 2024. She was named a 2025 Siebel Scholar.

Host: Mike Franklin

---
Holly Santos
Executive Assistant to Hank Hoffmann, Liew Family Chair
Department of Computer Science
The University of Chicago
5730 S Ellis Ave-217, Chicago, IL 60637
P: 773-834-8977
hsantos at uchicago.edu
