[Colloquium] Xin Yuan Candidacy Exam/Mar 4, 2024
Megan Woodward
meganwoodward at uchicago.edu
Mon Feb 19 09:55:17 CST 2024
This is an announcement of Xin Yuan's Candidacy Exam.
===============================================
Candidate: Xin Yuan
Date: Monday, March 04, 2024
Time: 2 pm CST
Remote Location: https://uchicago.zoom.us/j/92873963052?pwd=VzdkbUhNazlJbDczQjJqVm1GMjVwQT09
Location: JCL 298
Title: Factorized Diffusion Architectures for Unsupervised Image Generation and Representation Learning
Abstract: Supervised deep learning yields powerful discriminative representations and has fundamentally advanced many computer vision tasks. Yet annotation effort, especially fine-grained labeling, becomes prohibitively expensive as dataset size grows. This motivates unsupervised methods for visual representation learning and generation, which require no annotated data during a large-scale pre-training phase.
We develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to both generate and segment images. Learning is driven entirely by the denoising diffusion objective, without any annotation or prior knowledge about regions during training. A computational bottleneck, built into the neural architecture, encourages the denoising network to partition an input into regions, denoise them in parallel, and combine the results. Our trained model generates both synthetic images and, by simple examination of its internal predicted partitions, a semantic segmentation of those images. Without any finetuning, we directly apply our unsupervised model to the downstream task of segmenting real images via noising and subsequently denoising them. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
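The core mechanism described above — a bottleneck that partitions the input into regions, denoises each region in parallel, and recombines the results — can be illustrated with a minimal numerical sketch. This is not the exam's actual architecture; the function names, shapes, and the use of random arrays in place of learned network outputs are illustrative assumptions only.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def factorized_denoise(region_logits, per_region_denoised):
    """Combine K parallel per-region denoisings via soft region masks.

    region_logits:       (K, H, W)    raw mask scores from the bottleneck
    per_region_denoised: (K, C, H, W) each region branch's denoised output
    """
    # Softmax over the K region channels gives a soft partition of the image;
    # reading these masks out directly yields an unsupervised segmentation.
    masks = softmax(region_logits, axis=0)                         # (K, H, W)
    # Mask-weighted sum recombines the parallel branches into one image.
    combined = (masks[:, None] * per_region_denoised).sum(axis=0)  # (C, H, W)
    return combined, masks

# Toy example: K=4 regions, a 3-channel 8x8 image, random stand-in tensors.
K, C, H, W = 4, 3, 8, 8
rng = np.random.default_rng(0)
logits = rng.normal(size=(K, H, W))
per_region = rng.normal(size=(K, C, H, W))
out, masks = factorized_denoise(logits, per_region)
# masks sum to 1 at every pixel, so they form a valid soft segmentation.
```

In training, the whole pipeline would be driven only by the standard denoising diffusion loss; the segmentation emerges as a side effect of the bottleneck, which is the point of the factorized design.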
We extend the factorized diffusion paradigm to another challenging application: training a Neural Radiance Field (NeRF) model without any camera pose annotations. Specifically, during NeRF training, the camera extrinsic parameters and 2D reconstructions are generated simultaneously through a carefully designed computational bottleneck and differentiable volume renderer in a unified denoising diffusion process. After training finishes, a learned 3D model is available for novel-view 2D image generation from either a manually designed camera path or pure 2D noise input.
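The differentiable volume renderer mentioned above is, at its core, standard alpha compositing of density and color samples along a ray. A minimal sketch of that compositing step follows; it shows only the generic NeRF rendering equation, not the pose-free diffusion training described in the abstract, and all names are illustrative.

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Standard NeRF-style volume rendering along one ray.

    sigmas: (N,)   densities at N samples along the ray
    colors: (N, 3) RGB at each sample
    deltas: (N,)   distances between consecutive samples
    """
    # Per-sample opacity: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * trans
    # Weighted sum of sample colors gives the rendered pixel.
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Toy ray: three unit-spaced samples with unit density, primary colors.
sigmas = np.array([1.0, 1.0, 1.0])
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
deltas = np.ones(3)
rgb, w = composite(sigmas, colors, deltas)
# Weights decay with depth and sum to 1 - exp(-total optical depth).
```

Because every operation here is differentiable, gradients from a 2D reconstruction loss can flow back through the renderer into both the 3D model and, in the setting described above, the predicted camera extrinsics.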
Advisor: Michael Maire
Committee Members: Michael Maire, Rana Hanocka, Anand Bhattad, and Greg Shakhnarovich
Paper link: https://arxiv.org/abs/2309.15726