[Colloquium] Xin Yuan Candidacy Exam/Mar 4, 2024
Megan Woodward
meganwoodward at uchicago.edu
Mon Feb 19 09:55:17 CST 2024
This is an announcement of Xin Yuan's Candidacy Exam.
===============================================
Candidate: Xin Yuan
Date: Monday, March 04, 2024
Time: 2 pm CST
Remote Location: https://uchicago.zoom.us/j/92873963052?pwd=VzdkbUhNazlJbDczQjJqVm1GMjVwQT09
Location: JCL 298
Title: Factorized Diffusion Architectures for Unsupervised Image Generation and Representation Learning
Abstract: Supervised deep learning yields powerful discriminative representations and has fundamentally advanced many computer vision tasks. Yet annotation effort, especially fine-grained labeling, becomes prohibitively expensive as dataset size grows. This motivates unsupervised methods for visual representation learning and generation, which require no annotated data during a large-scale pre-training phase.
We develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to both generate and segment images. Learning is driven entirely by the denoising diffusion objective, without any annotation or prior knowledge about regions during training. A computational bottleneck, built into the neural architecture, encourages the denoising network to partition an input into regions, denoise them in parallel, and combine the results. Our trained model generates both synthetic images and, by simple examination of its internal predicted partitions, a semantic segmentation of those images. Without any finetuning, we directly apply our unsupervised model to the downstream task of segmenting real images via noising and subsequently denoising them. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
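The core mechanism described above — a bottleneck that partitions the input into regions, denoises each region in parallel, and recombines the results — can be illustrated with a minimal numerical sketch. This is not the exam's actual architecture; the function names, shapes, and the use of random arrays in place of learned network outputs are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def factorized_denoise(region_logits, per_region_denoised):
    """Combine K parallel per-region denoisings via soft region masks.

    region_logits:       (K, H, W)    raw mask scores from the bottleneck
    per_region_denoised: (K, C, H, W) each region branch's denoised output
    """
    # Softmax over the K region channels gives a soft partition of the image;
    # reading these masks out directly yields an unsupervised segmentation.
    masks = softmax(region_logits, axis=0)                         # (K, H, W)
    # Mask-weighted sum recombines the parallel branches into one image.
    combined = (masks[:, None] * per_region_denoised).sum(axis=0)  # (C, H, W)
    return combined, masks

# Toy example: K=4 regions, a 3-channel 8x8 image, random stand-in tensors.
K, C, H, W = 4, 3, 8, 8
rng = np.random.default_rng(0)
logits = rng.normal(size=(K, H, W))
per_region = rng.normal(size=(K, C, H, W))
out, masks = factorized_denoise(logits, per_region)
# masks sum to 1 at every pixel, so they form a valid soft segmentation.
```

In training, the whole pipeline would be driven only by the standard denoising diffusion loss; the segmentation emerges as a side effect of the bottleneck, which is the point of the factorized design.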
We extend the factorized diffusion paradigm to another challenging application: training a Neural Radiance Field (NeRF) model without any camera pose annotations. Specifically, during NeRF training, the camera extrinsic parameters and 2D reconstructions are generated simultaneously through a carefully designed computational bottleneck and differentiable volume renderer in a unified denoising diffusion process. After training finishes, a learned 3D model is available for novel-view 2D image generation from either a manually designed camera path or pure 2D noise input.
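The differentiable volume renderer mentioned above is, at its core, standard alpha compositing of density and color samples along a ray. A minimal sketch of that compositing step follows; it shows only the generic NeRF rendering equation, not the pose-free diffusion training described in the abstract, and all names are illustrative.

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Standard NeRF-style volume rendering along one ray.

    sigmas: (N,)   densities at N samples along the ray
    colors: (N, 3) RGB at each sample
    deltas: (N,)   distances between consecutive samples
    """
    # Per-sample opacity: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans
    # Weighted sum of sample colors gives the rendered pixel.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Toy ray: three unit-spaced samples with unit density, primary colors.
sigmas = np.array([1.0, 1.0, 1.0])
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
deltas = np.ones(3)
rgb, w = composite(sigmas, colors, deltas)
# Weights decay with depth and sum to 1 - exp(-total optical depth).
```

Because every operation here is differentiable, gradients from a 2D reconstruction loss can flow back through the renderer into both the 3D model and, in the setting described above, the predicted camera extrinsics.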
Advisor: Michael Maire
Committee Members: Michael Maire, Rana Hanocka, Anand Bhattad, and Greg Shakhnarovich
Paper link: https://arxiv.org/abs/2309.15726