[Colloquium] Sheikhi/MS Presentation/Nov 23, 2016

Margaret Jaffey via Colloquium colloquium at mailman.cs.uchicago.edu
Wed Nov 9 14:25:53 CST 2016


This is an announcement of Samira Sheikhi's MS Presentation.

------------------------------------------------------------------------------
Date:  Wednesday, November 23, 2016

Time:  10:00 AM

Place:  Ryerson 277

M.S. Candidate:  Samira Sheikhi

M.S. Paper Title: Practical Newton-Type Distributed Learning using
Gradient Based Approximations


Abstract:
We study distributed algorithms for expected loss minimization when
the datasets are large and must be stored on different machines. In
this setting we typically minimize an average of convex functions,
where each function is the empirical risk on the corresponding
partition of the data. Since the individual data instances can be
accessed only on their local machines, the algorithm proceeds in
rounds of local computation followed by communication among the
machines. Because communication is usually more expensive than local
computation, it is important to reduce it as much as possible.
However, reducing communication should not make the local computation
so expensive that it becomes a burden in practice. Second-order
methods can make such algorithms converge faster and thereby decrease
the amount of communication needed.
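
For concreteness, the objective described above can be written as
follows (a minimal sketch in standard notation, not taken from the
paper itself; m machines, with machine k holding the index set S_k of
size n_k):

    \min_{w} \; f(w) = \frac{1}{m} \sum_{k=1}^{m} f_k(w),
    \qquad
    f_k(w) = \frac{1}{n_k} \sum_{i \in S_k} \ell(w; x_i, y_i),

where \ell is the convex loss on a single data instance.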

There have been several successful attempts at developing distributed
second-order methods. Although these methods converge quickly, their
local computation is usually expensive and leaves room for improvement
in practical use. In this study we modify an existing approach, DANE
(Distributed Approximate Newton), to improve this aspect while
maintaining accuracy. We tackle this problem by solving the local
subproblems approximately with iterative methods instead of computing
exact solutions in each round of communication. We study how different
iterative methods affect the behavior of the algorithm and aim to
provide an appropriate tradeoff between the amount of local
computation and the amount of communication required. Moreover, we use
a subsample of the data on each machine to compute the most expensive
part of the algorithm, the full gradient. Our experiments show that
subsampling for this step has little effect on the overall behavior.
We also compare this modification to existing distributed
gradient-based methods such as SGD, as well as variance-reduced
methods such as SVRG and SDCA, and demonstrate its practical value.
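
To make the approach concrete, here is a minimal Python sketch of one
DANE-style communication round with an approximately solved local
subproblem and a subsampled global gradient. This is not code from the
paper: the variable names, the ridge-regression loss, and the use of
plain gradient steps as the inexact local solver are illustrative
assumptions.

import numpy as np

def local_grad(X, y, w, lam):
    # Gradient of the local regularized empirical risk (ridge regression).
    n = X.shape[0]
    return X.T @ (X @ w - y) / n + lam * w

def dane_round_inexact(parts, w, lam, mu=1.0, eta=1.0,
                       inner_steps=10, inner_lr=0.1,
                       subsample=None, rng=None):
    # One communication round of a DANE-style update (illustrative sketch).
    #   parts       : list of (X_k, y_k) pairs, one per machine
    #   subsample   : if set, fraction of each machine's data used to
    #                 estimate the global gradient (the expensive step)
    #   inner_steps : number of gradient steps used to solve each local
    #                 subproblem approximately instead of exactly
    rng = rng or np.random.default_rng(0)

    # Communication step: estimate the global gradient at w.
    grads = []
    for X, y in parts:
        if subsample is not None:
            m = max(1, int(subsample * X.shape[0]))
            idx = rng.choice(X.shape[0], size=m, replace=False)
            grads.append(local_grad(X[idx], y[idx], w, lam))
        else:
            grads.append(local_grad(X, y, w, lam))
    g = np.mean(grads, axis=0)

    # Local step: each machine approximately minimizes its subproblem
    #   h_k(v) = f_k(v) - (grad f_k(w) - eta*g)' v + (mu/2) ||v - w||^2
    # using a few plain gradient steps (an iterative, inexact solve).
    new_ws = []
    for X, y in parts:
        gk = local_grad(X, y, w, lam)
        v = w.copy()
        for _ in range(inner_steps):
            grad_h = local_grad(X, y, v, lam) - (gk - eta * g) + mu * (v - w)
            v -= inner_lr * grad_h
        new_ws.append(v)

    # Averaging step: the new global iterate.
    return np.mean(new_ws, axis=0)

Swapping the inner gradient loop for, e.g., a few conjugate-gradient
iterations illustrates the tradeoff between local computation and
communication that the paper studies.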

Samira's advisor is Prof. Risi Kondor

Login to the Computer Science Department website for details:
 https://www.cs.uchicago.edu/phd/ms_announcements#ssheikhi

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey            margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)               (773) 702-6011
The University of Chicago      http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
