<html>


<head>


<meta http-equiv="Content-Type" content="text/html; charset=utf-8">


</head>


<body>


<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">


<div class="PlainText">This is an announcement of Greg Pauloski's MS Presentation<br>


===============================================<br>


Candidate: Greg Pauloski<br>


<br>


Date: Wednesday, March 23, 2022<br>


<br>


Time: 3 pm CT<br>


<br>


Remote Location: <a href="https://uchicago.zoom.us/j/96526880582?pwd=V2pjQ3pvdFk3cWFNWWJyNW9SUnVDZz09">


https://uchicago.zoom.us/j/96526880582?pwd=V2pjQ3pvdFk3cWFNWWJyNW9SUnVDZz09</a><br>


<br>


M.S. Paper Title: Scalable Deep Neural Network Training with Distributed K-FAC<br>


<br>


Abstract: Scaling deep neural network training to more processors and larger batch sizes is key to reducing end-to-end training time; yet, maintaining comparable convergence and hardware utilization at larger scales is challenging. Increases in training scales


 have made natural gradient optimization methods a reasonable alternative to stochastic gradient descent (SGD) and its variants. Kronecker-factored Approximate Curvature (K-FAC), a natural gradient method, has recently been shown to converge in


 fewer iterations in deep neural network (DNN) training than SGD; however, K-FAC’s larger memory footprint and increased communication necessitate careful distribution of work for efficient use. This thesis investigates scalable K-FAC algorithms to understand


 K-FAC’s applicability in large-scale deep neural network training and presents KAISA, a K-FAC-enabled, Adaptable, Improved, and Scalable second-order optimizer framework. Specifically, layer-wise distribution strategies, inverse-free second-order gradient


 evaluation, dynamic K-FAC update decoupling, and more are explored with the goal of preserving convergence while minimizing training time. KAISA can adapt the memory footprint, communication, and computation given specific models and hardware to improve performance


 and increase scalability, and this adaptable distribution scheme generalizes existing strategies while providing a framework for scaling second-order methods beyond K-FAC. Compared to the original optimizers, KAISA converges 18.1–36.3% faster across applications


 with the same global batch size. Under a fixed memory budget, KAISA converges 32.5% and 41.6% faster in ResNet-50 and BERT-Large, respectively. KAISA can balance memory and communication to achieve scaling efficiency equal to or better than the baseline optimizers.<br>


<br>


Advisors: Kyle Chard and Ian Foster<br>


<br>


Committee Members: Kyle Chard, Ian Foster, and Zhao Zhang<br>


<br>


</div>


</span></font></div>




</body>


</html>