[Colloquium] Mudrakarta/Dissertation Defense/Aug 19, 2019

Margaret Jaffey margaret at cs.uchicago.edu
Thu Aug 1 13:26:24 CDT 2019



       Department of Computer Science/The University of Chicago

                     *** Dissertation Defense ***


Candidate:  Pramod Kaushik Mudrakarta

Date:  Monday, August 19, 2019

Time:  1:00 PM

Place:  John Crerar Library (JCL) 298

Title: Challenges in Modern Machine Learning: Multiresolution
Structure, Model Understanding and Transfer Learning

Abstract:
Recent advances in Artificial Intelligence (AI) are characterized by
ever-increasing sizes of datasets and the reemergence of neural
network models. The modern AI pipeline begins with building datasets,
followed by designing and training machine-learning models, and
finally by deploying them in the real world. We tackle three challenges posed by this
technological advancement, one from each part of the pipeline: 1)
efficiently manipulating large matrices arising in real-world datasets
(e.g., graph Laplacians from social network datasets), 2) interpreting
deep-neural-network models, and 3) efficiently deploying hundreds of
deep-neural-network models on embedded devices.

Matrices arising in large, real-world datasets often have
high rank (e.g., graph Laplacians), rendering common
matrix-manipulation approaches that are based on the low-rank
assumption ineffective. In the first part of this thesis, we build
upon Multiresolution Matrix Factorization (MMF), a method originally
proposed to perform multiresolution analysis on graphs, which can
consequently model hierarchical structure in symmetric matrices as a
matrix factorization. We provide a novel, parallel algorithm for
computing the factorization that can scale up to matrices with a
million rows and columns. We then showcase an application of MMF,
wherein we demonstrate a preconditioner that accelerates iterative
algorithms solving systems of linear equations. Among wavelet-based
preconditioners, the MMF-preconditioner consistently results in faster
convergence and is highly scalable. Finally, we propose a variant of
MMF that can accurately compress matrices by exploiting hierarchical
structure in them.
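As a rough illustration of the multiresolution idea, the following sketch applies one level of a Jacobi-style Givens rotation to a symmetric matrix, the kind of sparse orthogonal transform MMF composes level by level. This is a simplified stand-in, not the thesis's algorithm; the function names and the rotation-selection rule are illustrative assumptions.

```python
import math

def givens_apply(A, i, j, theta):
    """Return Q A Q^T, where Q rotates coordinates i and j by theta.

    A is a symmetric matrix given as a list of lists. MMF-style methods
    compose many such sparse rotations; this sketch applies just one.
    """
    n = len(A)
    c, s = math.cos(theta), math.sin(theta)
    B = [row[:] for row in A]
    # Rotate rows i and j (left-multiply by Q).
    for k in range(n):
        bi, bj = B[i][k], B[j][k]
        B[i][k] = c * bi - s * bj
        B[j][k] = s * bi + c * bj
    # Rotate columns i and j (right-multiply by Q^T).
    for k in range(n):
        bi, bj = B[k][i], B[k][j]
        B[k][i] = c * bi - s * bj
        B[k][j] = s * bi + c * bj
    return B

# Choosing theta to zero out the (i, j) entry (the classical Jacobi angle)
# decouples the two coordinates, loosely analogous to peeling off a
# "wavelet" direction at one resolution level.
A = [[2.0, 1.0], [1.0, 2.0]]
theta = 0.5 * math.atan2(2 * A[0][1], A[1][1] - A[0][0])
B = givens_apply(A, 0, 1, theta)
```

Here `B` is diagonal, and repeating this on progressively smaller active blocks yields a multilevel factorization of `A`.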

In the second part of the thesis, we address the black-box nature of
deep-neural-network models. The goodness of a deep-neural-network
model is typically measured by its test accuracy. We argue that it is
an incomplete measure, and show that state-of-the-art
question-answering models often ignore important question terms. We
perform a case study of a question-answering model and expose various
ways in which the network gets the right answer for the wrong reasons.
We propose a human-in-the-loop workflow based on the notion of
attribution (word-importance) to understand the input-output behavior
of neural network models, extract rules, identify weaknesses and
construct adversarial attacks by leveraging the weaknesses. Our
strongest attacks drop the accuracy of a visual question answering
model from 61.1% to 19%, and that of a tabular question answering
model from 33.5% to 3.3%. We propose a measure for overstability -- the
tendency of a model to rely on trigger logic and ignore semantics. We
use a path-sensitive attribution method to extract contextual synonyms
(rules) learned by a model. We discuss how attributions can augment
standard measures of accuracy and empower investigation of model
performance. We finish by identifying opportunities for research:
abstraction tools that aid the debugging process, concepts and
semantics of path-sensitive dataflow analysis, and formalizing the
process of verifying natural-language-based specifications.
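The attribution workflow can be sketched with a much simpler stand-in for the path-sensitive methods the thesis uses: leave-one-out word importance, where a word's attribution is how much the model's answer score drops when the word is removed. The toy `score_fn` below is a hypothetical "overstable" model that triggers only on "how many", illustrating how attributions expose ignored question terms.

```python
def attributions(score_fn, question_words):
    """Leave-one-out attributions: score_fn maps a word list to a
    confidence score for the predicted answer."""
    base = score_fn(question_words)
    attrib = {}
    for k, word in enumerate(question_words):
        ablated = question_words[:k] + question_words[k + 1:]
        attrib[word] = base - score_fn(ablated)
    return attrib

def toy_score(words):
    # Hypothetical overstable model: fires on the trigger "how many"
    # and ignores every other word in the question.
    return 0.9 if "how" in words and "many" in words else 0.1

scores = attributions(toy_score, ["how", "many", "red", "balls"])
```

Words like "red" and "balls" receive zero attribution, flagging that the model ignores them; an adversary can then perturb exactly those ignored terms without changing the model's answer.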

The third challenge pertains to real-world deployment of
deep-neural-network models. With the proliferation of personal devices
such as phones, smart assistants, etc., much of human-AI interaction
has shifted away from the cloud. While this has
critical advantages such as user privacy and faster response times, as
the space of deep-learning-based applications expands, limited
availability of memory on these devices makes deploying hundreds of
models impractical. We tackle the problem of re-purposing trained
deep-neural-network models to new tasks while keeping most of the
learned weights intact. Our method introduces the concept of a "model
patch" -- a set of small, trainable layers -- that can be applied to
an existing trained model to adapt it to a new task. While keeping
more than 98% of the weights intact, we show significantly higher
transfer-learning performance from an object-detection task to an
image-classification task, compared to traditional last-layer
fine-tuning, among other results. We show how the model-patch idea can
be used in multitask learning, where, despite using significantly
fewer parameters, we incur zero accuracy loss compared to single-task
performance for all the involved tasks.
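The patch idea can be sketched as follows: keep a trained transform frozen and train only a tiny per-feature scale-and-bias patch on top of it (plus, typically, the task's output layer). This is a minimal toy version; the class name, the 1-D setup, and the pure-Python linear layer are illustrative assumptions, not the thesis's implementation.

```python
class PatchedModel:
    """A frozen linear transform with a small trainable 'patch'."""

    def __init__(self, frozen_weights, n_features):
        self.w = frozen_weights          # frozen: never updated during transfer
        self.scale = [1.0] * n_features  # trainable patch parameter
        self.bias = [0.0] * n_features   # trainable patch parameter

    def forward(self, x):
        # Frozen base transform: h = W x.
        h = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        # Patch: cheap per-feature affine correction learned for the new task.
        return [self.scale[i] * h[i] + self.bias[i] for i in range(len(h))]

    def trainable_parameter_count(self):
        return len(self.scale) + len(self.bias)

# With identity patch parameters the base model's behavior is preserved,
# so adaptation starts from the original model and trains only 2n numbers
# instead of the n*n frozen weights.
w = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
model = PatchedModel(w, 4)
```

Because each task needs only its own patch (and output head), many tasks can share one set of frozen weights on a memory-constrained device.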

Pramod Kaushik's advisors are Prof. Risi Kondor and Prof. Kevin Gimpel.

Login to the Computer Science Department website for details,
including a draft copy of the dissertation:

 https://newtraell.cs.uchicago.edu/phd/phd_announcements#pramodkm

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

