[Colloquium] Xin Yuan Dissertation Defense/Jun 5, 2024

Megan Woodward via Colloquium colloquium at mailman.cs.uchicago.edu
Tue Jun 4 08:55:17 CDT 2024


This is an announcement of Xin Yuan's Dissertation Defense.
===============================================
Candidate: Xin Yuan

Date: Wednesday, June 05, 2024

Time: 3:00 pm CT

Remote Location: https://uchicago.zoom.us/j/94758197575?pwd=c2krT2JWUGVJM2Z1eXNmLzNVNzZ4Zz09

Location: JCL 298

Title: Interpretable Unsupervised Generative Learning via Factorized Architectures and Structured Bottlenecks

Abstract: In this thesis, we propose a new paradigm for constructing generative models, fundamentally rethinking the conventional framework used in image generation and representation learning. Our approach centers on designing a domain-specific architecture that enables unified, unsupervised image generation and representation learning. This architecture incorporates a carefully engineered bottleneck data structure, crafted around the specific requirements of the task at hand, the characteristics of the data involved, and the computational constraints inherent to the problem. This bottleneck is pivotal: it directly addresses the tasks to be solved by shaping a learning process that produces useful outputs without reliance on direct supervision. This stands in stark contrast to traditional methodologies, which typically train large-scale foundation models in a self-supervised manner and subsequently fine-tune them on annotated data for specific downstream tasks. Our method eliminates the need for such fine-tuning and requires no annotated data at any stage of pre-training.

To demonstrate the effectiveness and robustness of our proposed design, we conduct extensive validation across a variety of challenging tasks, each chosen to test a different facet of the model under diverse experimental conditions. These tests demonstrate the versatility and real-world applicability of our approach, showcasing its potential to handle complex unsupervised learning tasks in two experimental settings:

For the first experimental setting, we develop a neural network architecture which, trained in an unsupervised manner as a denoising diffusion model, simultaneously learns to generate and segment images. Learning is driven entirely by the denoising diffusion objective, without any annotation or prior knowledge about regions during training. A computational bottleneck, built into the neural architecture, encourages the denoising network to partition an input into regions, denoise them in parallel, and combine the results. Our trained model generates both synthetic images and, by simple examination of its internal predicted partitions, a semantic segmentation of those images. Without any fine-tuning, we apply our unsupervised model directly to the downstream task of segmenting real images by noising and subsequently denoising them.
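
To make the bottleneck mechanism above concrete, here is a minimal PyTorch sketch of the idea: a mask head softly partitions the noisy input into regions, a shared denoiser processes each region in parallel, and the soft masks recombine the per-region outputs. Everything here (module names, layer sizes, the per-region one-hot conditioning, and the omission of diffusion timestep conditioning) is an illustrative assumption, not the thesis's actual architecture.

    # Minimal sketch of a region-factorized denoising step.
    # NOT the thesis architecture: names, sizes, and conditioning
    # scheme are assumptions; timestep conditioning is omitted.
    import torch
    import torch.nn as nn

    class FactorizedDenoiser(nn.Module):
        def __init__(self, channels=3, num_regions=4, hidden=64):
            super().__init__()
            # Mask head: predicts a soft partition of the image into regions.
            self.mask_head = nn.Sequential(
                nn.Conv2d(channels, hidden, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(hidden, num_regions, 1),
            )
            # Shared denoiser, conditioned on a one-hot region indicator
            # and applied in parallel to each region.
            self.denoiser = nn.Sequential(
                nn.Conv2d(channels + num_regions, hidden, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(hidden, channels, 3, padding=1),
            )
            self.num_regions = num_regions

        def forward(self, x_noisy):
            # Soft region assignment: (B, K, H, W), sums to 1 over K.
            masks = torch.softmax(self.mask_head(x_noisy), dim=1)
            outs = []
            for k in range(self.num_regions):
                # One-hot region indicator selects which region to denoise.
                cond = torch.zeros_like(masks)
                cond[:, k] = 1.0
                outs.append(self.denoiser(torch.cat([x_noisy, cond], dim=1)))
            # Recombine per-region predictions, weighted by the soft masks.
            denoised = sum(masks[:, k:k + 1] * outs[k]
                           for k in range(self.num_regions))
            return denoised, masks

    model = FactorizedDenoiser()
    x = torch.rand(2, 3, 32, 32)                 # toy batch of images
    x_noisy = x + 0.1 * torch.randn_like(x)      # single noise level, for brevity
    denoised, masks = model(x_noisy)
    segmentation = masks.argmax(dim=1)           # per-pixel region labels

The returned masks are the point of the design: after purely unsupervised training, reading them off (here via argmax) yields a per-pixel segmentation as a byproduct of generation.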

For the second experimental setting, we develop a new framework for generative lifting from 2D to 3D by training a Neural Radiance Field (NeRF) model without any camera pose annotations. Specifically, during NeRF training, the camera extrinsic parameters, serving as the local latent information, are generated along with 2D reconstructions through a carefully designed computational bottleneck and a differentiable volume renderer in a unified denoising diffusion process. After training, the learned 3D model, serving as the global latent information, enables novel-view 2D image generation from either a manually designed camera path or pure 2D noise input.
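
To make the two-level latent structure concrete, below is a minimal, hedged PyTorch sketch of one training step: a pose head infers camera extrinsics from the noisy 2D observation (the local latent), and a toy differentiable volume renderer draws the shared radiance field (the global latent) from that pose, trained with a simple reconstruction loss. The ray parameterization, the single fixed noise level, and the stand-in "image" of ray colors are all illustrative assumptions, not the thesis's actual design.

    # Toy sketch of pose-free generative lifting; NOT the thesis's design.
    import torch
    import torch.nn as nn

    class RadianceField(nn.Module):
        """Tiny NeRF-style MLP mapping a 3D point to (RGB, density)."""
        def __init__(self, hidden=64):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(),
                nn.Linear(hidden, 4),
            )

        def forward(self, pts):                    # pts: (N, 3)
            out = self.mlp(pts)
            rgb = torch.sigmoid(out[:, :3])        # colors in [0, 1]
            sigma = torch.relu(out[:, 3])          # non-negative density
            return rgb, sigma

    def render(field, cam_pos, num_rays=64, num_samples=32, near=0.5, far=2.5):
        """Toy differentiable volume renderer: random rays from cam_pos."""
        dirs = torch.randn(num_rays, 3)
        dirs = dirs / dirs.norm(dim=-1, keepdim=True)
        t = torch.linspace(near, far, num_samples)               # (S,)
        pts = cam_pos + t[None, :, None] * dirs[:, None, :]      # (R, S, 3)
        rgb, sigma = field(pts.reshape(-1, 3))
        rgb = rgb.reshape(num_rays, num_samples, 3)
        sigma = sigma.reshape(num_rays, num_samples)
        alpha = 1.0 - torch.exp(-sigma * (far - near) / num_samples)
        trans = torch.cumprod(                                   # transmittance
            torch.cat([torch.ones(num_rays, 1), 1.0 - alpha + 1e-10], dim=-1),
            dim=-1)[:, :-1]
        weights = alpha * trans                                  # compositing weights
        return (weights[..., None] * rgb).sum(dim=1)             # (R, 3) ray colors

    field = RadianceField()
    pose_head = nn.Sequential(                   # bottleneck: image -> extrinsics
        nn.Linear(64 * 3, 32), nn.ReLU(), nn.Linear(32, 3))
    opt = torch.optim.Adam(
        list(field.parameters()) + list(pose_head.parameters()), lr=1e-3)

    x_clean = torch.rand(64, 3)                  # stand-in "image": 64 ray colors
    x_noisy = x_clean + 0.1 * torch.randn_like(x_clean)   # one fixed noise level
    opt.zero_grad()
    cam_pos = pose_head(x_noisy.reshape(-1))     # local latent: inferred camera position
    x_pred = render(field, cam_pos)              # render shared 3D model from that pose
    loss = ((x_pred - x_clean) ** 2).mean()      # denoising-style reconstruction loss
    loss.backward()
    opt.step()

The key design choice this sketch illustrates is that gradients flow through the inferred pose into both the pose head and the shared 3D field, so neither camera annotations nor a separate pose-estimation stage is needed.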


Advisor: Michael Maire

Committee Members: Michael Maire, Rana Hanocka, Greg Shakhnarovich, and Anand Bhattad


