[Colloquium] [TTIC Talks] 2/27 Talks at TTIC: Jason Lee, USC

Alicia McClarin amcclarin at ttic.edu
Tue Feb 26 10:02:41 CST 2019


When:     Wednesday, February 27th at *11:00 am*

Where:    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

Who:       Jason Lee, USC


Title:       On the Foundations of Deep Learning: SGD,
Overparametrization, and Generalization



Abstract: We provide new results on the effectiveness of SGD and
overparametrization in deep learning.



a) SGD: We show that SGD converges to stationary points for general
nonsmooth, nonconvex functions, and that stochastic subgradients can be
efficiently computed via automatic differentiation. For smooth functions,
we show that gradient descent, coordinate descent, ADMM, and many other
algorithms avoid saddle points and converge to local minimizers. For a
large family of problems, including matrix completion and shallow ReLU
networks, this guarantees that gradient descent converges to a global
minimum.
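
To make the first claim concrete, here is a minimal, illustrative
sketch (ours, not code from the talk) of stochastic subgradient descent
on a simple nonsmooth, nonconvex function. We hand-code a valid
subgradient; per the abstract's claim, automatic differentiation applied
to such a function returns a subgradient of this kind:

import numpy as np

rng = np.random.default_rng(0)

# f(x, y) = |x| + (y**2 - 1)**2 is nonsmooth at x = 0 and nonconvex
# in y; y = 0 is a local maximum along the y direction.
def subgrad(x, y):
    gx = np.sign(x)                    # a valid subgradient of |x|
    gy = 4.0 * y * (y * y - 1.0)       # ordinary gradient in y
    return np.array([gx, gy])

w = np.array([2.0, 0.1])               # start near the bad point y = 0
for t in range(1, 5001):
    g = subgrad(*w) + 0.1 * rng.normal(size=2)   # stochastic subgradient
    w -= (0.5 / np.sqrt(t)) * g                  # diminishing step size
print(w)   # approaches a stationary point, roughly x ~ 0 and y ~ +/-1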



b) Overparametrization: We show that gradient descent finds global
minimizers of the training loss of overparametrized deep networks in
polynomial time.
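
As a toy illustration of the overparametrized regime (again a sketch of
ours under standard NTK-style assumptions, not the speaker's code):
plain gradient descent on a wide two-layer ReLU network, training only
the hidden layer with fixed random output weights, drives the training
loss on a small dataset toward zero:

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                 # n samples, d inputs, m >> n units
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W = rng.normal(size=(m, d))           # trained hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m)   # fixed random output weights

def loss_and_grad(W):
    H = X @ W.T                       # pre-activations, shape (n, m)
    act = (H > 0).astype(float)       # ReLU derivative
    pred = (np.maximum(H, 0.0) * a).sum(axis=1) / np.sqrt(m)
    r = pred - y
    # dL/dW[j] = (1/n) sum_i r_i * a_j * 1[H_ij > 0] * x_i / sqrt(m)
    G = ((r[:, None] * act * a).T @ X) / (n * np.sqrt(m))
    return 0.5 * np.mean(r ** 2), G

for t in range(2000):
    loss, G = loss_and_grad(W)
    W -= 0.5 * G
print(loss_and_grad(W)[0])            # small; shrinks further as m grows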



c) Generalization: For general neural networks, we establish a
margin-based theory. The minimizer of the cross-entropy loss with weak
regularization is a max-margin predictor and enjoys stronger
generalization guarantees as the amount of overparametrization increases.
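
One way to state the margin claim concretely (the notation below is
ours, a simplified version of the standard setup, not necessarily the
talk's): for a positively homogeneous network $f(W; x)$ of degree $L$,
consider the weakly regularized cross-entropy objective

    L_\lambda(W) = \sum_{i=1}^{n} \log\left(1 + e^{-y_i f(W;\, x_i)}\right)
                   + \lambda \|W\|^r .

As $\lambda \to 0$, the normalized margin
$\min_i y_i f(W_\lambda; x_i) / \|W_\lambda\|^L$ of the global minimizer
$W_\lambda$ converges to the maximum normalized margin

    \gamma^\ast = \max_{\|W\| \le 1} \; \min_i \, y_i f(W; x_i),

and margin-based generalization bounds then improve as the width
(overparametrization) increases.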



d) Algorithmic and Implicit Regularization: We analyze the implicit
regularization effects of various optimization algorithms on
overparametrized networks. In particular, we prove that for least
squares with mirror descent, the algorithm converges to the closest
solution in terms of the Bregman divergence. For linearly separable
classification problems, we prove that steepest descent with respect to
a norm solves the SVM with respect to the same norm. For
overparametrized nonconvex problems such as matrix sensing or neural
networks with quadratic activations, we prove that gradient descent
converges to the minimum nuclear norm solution, which allows for both
meaningful optimization and generalization guarantees.
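
A quick numerical check of the mirror descent claim (our own sketch):
gradient descent is mirror descent with potential psi(w) = ||w||^2 / 2,
whose Bregman divergence is the squared Euclidean distance, so on an
underdetermined least-squares problem initialized at zero it should
converge to the minimum-norm interpolant:

import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 50                      # underdetermined: many interpolants
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

w = np.zeros(d)                    # start at the potential's minimizer
for _ in range(5000):
    w -= 0.005 * A.T @ (A @ w - b)

w_min = np.linalg.pinv(A) @ b      # minimum Euclidean norm interpolant
print(np.linalg.norm(A @ w - b))   # ~0: training loss driven to zero
print(np.linalg.norm(w - w_min))   # ~0: GD selected the min-norm solution

Swapping in a different mirror map (e.g. negative entropy) would instead
select the interpolant closest in the corresponding Bregman divergence.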





Host:  Nathan Srebro <nati at ttic.edu>

-- 
Alicia McClarin
Toyota Technological Institute at Chicago
6045 S. Kenwood Ave., Office 510
Chicago, IL 60637
773-702-5370
www.ttic.edu <http://www.ttic.edu/>