[Theory] REMINDER: 9/23 TTIC Colloquium: Michael W. Mahoney, University of California at Berkeley
Mary Marre
mmarre at ttic.edu
Sun Sep 22 19:12:59 CDT 2019
*When:* Monday, September 23rd at 11:00 am
*Where:* TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526
*Who: * Michael W. Mahoney, University of California at Berkeley
*Title: * Why Deep Learning Works: Traditional and Heavy-Tailed
Implicit Self-Regularization in Deep Neural Networks
*Abstract:* Random Matrix Theory (RMT) is applied to analyze the weight
matrices of Deep Neural Networks (DNNs), including both production-quality
pre-trained models and smaller models trained from scratch. Empirical and
theoretical results clearly indicate that the DNN training process itself
implicitly implements a form of self-regularization, sculpting a
more regularized energy or penalty landscape. In particular, the empirical
spectral density (ESD) of DNN layer matrices displays signatures of
traditionally-regularized statistical models, even in the absence of
exogenously specifying traditional forms of explicit regularization.
Building on relatively recent results in RMT, most notably its extension to
Universality classes of Heavy-Tailed matrices, and applying them to these
empirical results, we develop a theory to identify 5+1 Phases of Training,
corresponding to increasing amounts of implicit self-regularization. For
smaller and/or older DNNs, this implicit self-regularization is like
traditional Tikhonov regularization, in that there appears to be a "size
scale" separating signal from noise. For state-of-the-art DNNs, however,
we identify a novel form of heavy-tailed self-regularization, similar to
the self-organization seen in the statistical physics of disordered
systems. This implicit self-regularization can depend strongly on the many
knobs of the training process. In particular, by exploiting the
generalization gap phenomenon, we demonstrate that a small model can be
made to exhibit all 5+1 phases of training simply by changing the batch
size. This shows that, all else being equal, DNN optimization with larger
batch sizes leads to models that are less implicitly regularized, and it
provides an explanation for the generalization gap.
Coupled with work on energy landscapes and heavy-tailed spin glasses, it
also suggests an explanation of why deep learning works. Joint work with
Charles Martin of Calculation Consulting, Inc.
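
For readers who would like to experiment with the kind of analysis described
above, the core measurement can be sketched in a few lines of NumPy: form the
layer correlation matrix X = W^T W / N, whose eigenvalue histogram is the
empirical spectral density (ESD), and fit a power-law exponent to its tail.
The sketch below is illustrative only and is not the speakers' code; the
Clauset-style tail estimator, the cutoff k, and the toy matrices are
assumptions made here for the example.

    import numpy as np

    def layer_esd(W):
        # Eigenvalues of the correlation matrix X = W^T W / N for an N x M
        # weight matrix W; their histogram is the empirical spectral density.
        N, M = W.shape
        X = W.T @ W / N
        return np.linalg.eigvalsh(X)  # real, non-negative, sorted ascending

    def tail_exponent(eigs, k=50):
        # Crude continuous power-law MLE (Clauset-style) for the exponent
        # alpha in rho(lambda) ~ lambda^(-alpha), fit to the k largest
        # eigenvalues; a rough stand-in for a careful RMT-based fit.
        tail = np.sort(eigs)[-k:]
        return 1.0 + k / np.sum(np.log(tail / tail[0]))

    # Toy comparison: Gaussian (Marchenko-Pastur-like) vs. heavy-tailed entries.
    rng = np.random.default_rng(0)
    for name, W in [("gaussian", rng.standard_normal((1000, 500))),
                    ("heavy-tailed", rng.standard_t(df=2.5, size=(1000, 500)))]:
        eigs = layer_esd(W)
        print(f"{name:12s} max eig: {eigs[-1]:8.2f}  tail alpha: {tail_exponent(eigs):5.2f}")

Roughly speaking, an ESD that stays close to the Marchenko-Pastur bulk sits at
the random-like end of the phases mentioned above, while a pronounced
power-law tail (small alpha) is the signature of the heavy-tailed
self-regularization described for state-of-the-art networks.
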
Host: Avrim Blum <avrim at ttic.edu>
Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL 60637*
*p: (773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*