[Colloquium] 2/25 Jiachen Wang (Princeton) Fueling Responsible AI Advancement with Data Attribution

Holly Santos via Colloquium colloquium at mailman.cs.uchicago.edu
Thu Jan 30 13:34:17 CST 2025


Department of Computer Science and Data Science Institute Present

Jiachen Wang
PhD Candidate
Princeton University

Tuesday, February 25th
2:00pm - 3:00pm 
In-Person: John Crerar Library Rm 390

Title: Fueling Responsible AI Advancement with Data Attribution

Abstract: As artificial intelligence (AI) systems expand across society, understanding how training data shapes model behavior has become fundamental to building trustworthy AI. Data attribution techniques quantify the influence of individual training samples on machine learning models, enabling us to address pressing challenges around data quality, training efficiency, copyright disputes, and interpretability. 
In this talk, I will present our advances in developing theoretically rigorous yet practical data attribution methods. First, I will introduce Data Banzhaf, a data value notion derived from cooperative game theory that provides provably robust data influence estimation for any learning algorithm. Beyond this general framework, we develop specialized techniques to analyze how data influence evolves during deep learning optimization. Through this lens, we find that examples from the early and late stages of training have an outsized impact on foundation model pretraining. This insight enables strategic data selection that reduces computational overhead while maintaining model performance.

Bio: Jiachen ("Tianhao") Wang is a Ph.D. student at Princeton University, advised by Prof. Prateek Mittal. His research focuses on developing theoretical foundations and practical tools for trustworthy machine learning from a data-centric perspective. Most recently, he has been developing scalable, theoretically grounded data attribution and curation techniques for foundation models. His contributions have been recognized through multiple fellowships and oral/spotlight presentations at top AI/ML venues. He was selected as a Rising Star in Data Science in 2024.

Host: Raul Castro Fernandez


—

Holly Santos
Executive Assistant to Hank Hoffmann, Liew Family Chair
Department of Computer Science
The University of Chicago
5730 S Ellis Ave-217, Chicago, IL 60637
P: 773-834-8977
hsantos at uchicago.edu