[Theory] [TTIC Talks] Talks at TTIC: Jiaxin Shi, DeepMind
Brandie Jones via Theory
theory at mailman.cs.uchicago.edu
Mon Feb 24 13:00:00 CST 2025
*When:* Friday, February 28th at *2pm CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* via Panopto (livestream:
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=74ac5e95-1dd5-4440-b01e-b28e0122793b>)
*Who:* Jiaxin Shi, DeepMind
*Title*: Discrete Generative Modeling with Masked Diffusions
*Abstract*: Modern generative AI has developed along two distinct paths:
autoregressive models for discrete data (such as text) and diffusion models
for continuous data (like images). Bridging this divide by adapting
diffusion models to handle discrete data represents a compelling avenue for
unifying these disparate approaches. However, existing work in this area
has been hindered by unnecessarily complex model formulations and unclear
relationships between different perspectives, leading to suboptimal
parameterization, training objectives, and ad hoc adjustments to counteract
these issues. In this talk, I will introduce masked diffusion models, a
simple and general framework that unlocks the full potential of diffusion
models for discrete data. We show that the continuous-time variational
objective of such models is a simple weighted integral of cross-entropy
losses. Our framework also enables training generalized masked diffusion
models with state-dependent masking schedules. When evaluated by
perplexity, our models trained on OpenWebText surpass prior diffusion
language models at GPT-2 scale and demonstrate superior performance on 4
out of 5 zero-shot language modeling tasks. Furthermore, our models vastly
outperform previous discrete diffusion models on pixel-level image
modeling, achieving 2.75 (CIFAR-10) and 3.40 (ImageNet 64×64) bits per
dimension, better than autoregressive models of similar sizes.
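
To make the "weighted integral of cross-entropy losses" concrete, here is a minimal sketch of a Monte Carlo estimate of such an objective. It is not the speaker's implementation. It assumes a linear masking schedule alpha_t = 1 - t, under which each token is masked independently with probability t and the cross-entropy weight reduces to 1/t. The names masked_diffusion_loss, predict_logits, and mask_id are hypothetical and chosen only for illustration.

```python
import numpy as np


def masked_diffusion_loss(tokens, predict_logits, mask_id, rng, num_samples=8):
    """Monte Carlo estimate of a time-weighted cross-entropy objective.

    Assumption (not taken from the announcement): linear schedule
    alpha_t = 1 - t, so a token is masked with probability t and the
    cross-entropy on masked positions is weighted by 1 / t.
    """
    tokens = np.asarray(tokens)
    total = 0.0
    for _ in range(num_samples):
        # Sample a time in (0, 1]; the lower bound avoids division by zero.
        t = rng.uniform(1e-3, 1.0)
        # Forward process: replace each token by the mask id with probability t.
        is_masked = rng.random(tokens.shape) < t
        noisy = np.where(is_masked, mask_id, tokens)
        # Hypothetical model call returning logits of shape (length, vocab).
        logits = predict_logits(noisy, t)
        # Numerically stable log-softmax over the vocabulary axis.
        logits = logits - logits.max(axis=-1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
        # Cross-entropy of the clean token at every position.
        token_ce = -log_probs[np.arange(tokens.size), tokens]
        # Only masked positions contribute, each weighted by 1 / t.
        total += (token_ce * is_masked).sum() / t
    return total / num_samples
```

With a model that predicts uniformly over a vocabulary of size V, the estimate should concentrate around (sequence length) × log V as num_samples grows, which gives a quick sanity check for an implementation along these lines.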
*Bio*: Jiaxin Shi is a research scientist at Google DeepMind. Previously,
he was a postdoctoral researcher at Stanford and Microsoft Research New
England. He obtained his Ph.D. from Tsinghua University. His research
interests broadly involve probabilistic and algorithmic models for learning
as well as the interface between them. Jiaxin served as an area chair for
NeurIPS and AISTATS. He is a recipient of a Microsoft Research PhD
Fellowship. His first-author paper received a NeurIPS 2022 outstanding
paper award.
--
*Brandie Jones*
*Executive Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL 60637
www.ttic.edu