[CS] Jacob Williams Dissertation Defense/Apr 29, 2025

via cs cs at mailman.cs.uchicago.edu
Mon Apr 28 17:17:11 CDT 2025


This is an announcement of Jacob Williams's Dissertation Defense.
===============================================
Candidate: Jacob Williams

Date: Tuesday, April 29, 2025

Time: 3 pm CDT

Remote Location: https://uchicago.zoom.us/j/97524950961?pwd=a0W6jKtqWIKAlA26FmabuiRfUU6raK.1

Location: JCL 298

Title: Deep Learning and Generative Methods for NMR Spectroscopy

Abstract: In the era of big data and deep learning, the joining of machine learning (ML) methods with scientific inquiry is one of the most interesting and exciting frontiers. These techniques are already being used to create larger and more complex models, speed up experimentation, and identify new paths of discovery. The study of molecular structures has been a particular focal point for ML in science. As part of progress in drug discovery, protein function, and materials science, computational methods are being applied to important tasks in understanding molecular interactions, identifying new molecules, and refining the structures of known molecules. A blend of experimental work and deep learning will bring tremendous advancements in these fields.

This dissertation is composed of four works that investigate deep learning and generative methods in their application to the study of molecular structure, particularly through the collection of data from Nuclear Magnetic Resonance (NMR) spectroscopy. NMR is a molecular measurement technique in which a molecule is placed in a large magnetic field and perturbed with radio-frequency (RF) waves. Measuring a molecule's NMR spectrum can be crucial for determining and verifying its structure. Within these works, we make significant contributions to the study of molecular structures:

New machine learning models for generating molecular conformers and for predicting NMR parameters, leading to state-of-the-art performance.

An innovative training method to incorporate multiple sources of data, which allows models to correct for systematic errors in different sources of data. 

A new dataset for the study and future development of techniques in protein structure generation, including more accurate baselines for studying generated structures. 

A novel approach to generating small molecules conditioned on NMR spectra.

Throughout, we demonstrate how improvements in ML for science come not just from more advanced ML techniques, but also from the careful design of experiments and data collection that enhance these techniques.


Advisor: Rebecca Willett

Committee: Rebecca Willett, Risi Kondor, Ian Foster
