University of Chicago and Toyota Technological Institute at Chicago
Machine Learning Seminar Series

Friday, November 15, 10:30 - 11:30 am
JCL, RM 390

Greg Ongie
University of Chicago

Title:
A function space view of overparameterized neural networks

Abstract:
Contrary to classical bias/variance tradeoffs, deep learning practitioners have observed that vastly overparameterized neural networks, with the capacity to fit virtually any labels, nevertheless generalize well when trained on real data. One possible explanation of this phenomenon is that complexity control is achieved by implicitly or explicitly controlling the magnitude of the weights of the network. This raises the question: what functions are well approximated by neural networks whose weights are bounded in norm? In this talk, I will give some partial answers to this question. In particular, I will give a precise characterization of the space of functions realizable as a two-layer (i.e., one hidden layer) neural network with ReLU activations having an unbounded number of units, but where the Euclidean norm of the weights in the network remains bounded. Surprisingly, this characterization is naturally posed in terms of the Radon transform as used in computational imaging, and I will show how tools from Radon transform analysis yield novel insights about learning with two- and three-layer ReLU networks.
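For orientation, a minimal sketch of the setup (notation is mine, not necessarily the speaker's): a two-layer ReLU network with m hidden units computes

    f(x) = \sum_{k=1}^{m} a_k \, \max(0, \, w_k^\top x + b_k),

and the complexity measure in question is the Euclidean norm of the weights, e.g.

    C(f) = \tfrac{1}{2} \sum_{k=1}^{m} \left( a_k^2 + \|w_k\|_2^2 \right).

The question above asks which functions f can be realized (or approximated) when m is allowed to grow without bound while C(f) stays bounded.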
Blake Woodworth
TTIC

Title:
The complexity of finding stationary points in convex and non-convex optimization

Abstract:
Non-convex optimization algorithms typically guarantee convergence to approximate stationary points of the objective. However, the fundamental complexity of finding such points is poorly understood, even in the convex setting, and especially in comparison to our very thorough understanding of the complexity of finding points with near-optimal function value in the convex case. In this talk, I will discuss two recent papers in which we tightly bound the stochastic first-order oracle complexity of finding an approximate stationary point, first for the convex case and then for the non-convex case. An important implication of our work is that, in a certain sense, plain SGD is an optimal algorithm for stochastic non-convex optimization.
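For reference, the standard definitions behind these terms (a sketch, not taken from the papers): a point x is an epsilon-approximate stationary point of a differentiable objective f if

    \|\nabla f(x)\| \le \epsilon,

and plain SGD refers to the iteration

    x_{t+1} = x_t - \eta \, g_t, \qquad \mathbb{E}[\, g_t \mid x_t \,] = \nabla f(x_t),

where each stochastic gradient g_t is one query to the stochastic first-order oracle; the complexity in question is the number of such queries needed to reach an epsilon-approximate stationary point.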
Hosts: Nati Srebro and Rebecca Willett