[Colloquium] 4/9 Talks at TTIC: Dmitrii Ostrovskii, Inria Paris

Alicia McClarin amcclarin at ttic.edu
Wed Apr 3 10:33:07 CDT 2019


*When:*    Tuesday, April 9th at 11:00 AM

*Where:*   TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

*Who:*     Dmitrii Ostrovskii, Inria Paris

*Title:*   On Algorithmic Efficiency and Statistical Optimality in
Empirical Risk Minimization

*Abstract:* In this talk, we will view standard empirical risk
minimization (ERM) from two different perspectives, focusing first on
computationally efficient procedures for finding approximate minimizers,
and then on the statistical optimality of ERM itself as an idealized
procedure.

In the first part of the talk, I will show how stochastic mirror descent
can be exploited to efficiently train l1-regularized multiclass linear
classifiers when the number of classes $k$, the sample size $n$, and the
number of features $d$ are all large. We consider the class of losses with
an explicit dual representation, focusing on the multiclass hinge and
logistic losses. I will start with the common observation that regularized
ERM with such losses can be recast as a convex-concave saddle-point
problem (CCSPP) with a quasi-bilinear objective and a specific geometry of
the primal and dual feasible sets. Such problems can be solved via
primal-dual algorithms such as mirror descent and mirror prox, and in the
case of quasi-bilinear CCSPPs both of these algorithms can be accelerated
via randomization. However, it has remained unclear how to choose the
proximal geometry in these algorithms so as to control the additional loss
of accuracy due to randomization. We fill this gap, showing that
appropriately randomized mirror descent with a particular choice of
geometry converges essentially at the same speed (in terms of the number
of iterations) as its deterministic counterpart, while having a
drastically reduced iteration cost. In particular, we achieve a
*sublinear* iteration cost of $O(d+n+k)$ for the multiclass hinge loss,
and I will also provide insight into how this could be extended to the
multiclass logistic loss.
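
To fix ideas, here is a sketch, in the same LaTeX notation as the inline
math above, of the quasi-bilinear saddle-point reformulation one obtains
for the l1-regularized multiclass hinge loss via a Crammer-Singer-style
dual representation; the exact formulation used in the talk may differ in
normalization or in how the penalty enters.

  % Sketch only: W = [w_1, ..., w_k] in R^{d x k} is the weight matrix and
  % \Delta_k the probability simplex; constants and normalization are
  % illustrative assumptions, not the speaker's exact formulation.
  \min_{W \in \mathbb{R}^{d \times k}}
    \frac{1}{n} \sum_{i=1}^{n} \max_{c \in [k]}
      \big[ \mathbf{1}\{c \neq y_i\} + (w_c - w_{y_i})^\top x_i \big]
    + \lambda \|W\|_1
  \; = \;
  \min_{W} \max_{V \in (\Delta_k)^n}
    \frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{k}
      v_{i,c} \big[ \mathbf{1}\{c \neq y_i\} + (w_c - w_{y_i})^\top x_i \big]
    + \lambda \|W\|_1.

The right-hand side is bilinear in $(W, V)$ up to the separable l1 term
(hence "quasi-bilinear"), and the dual variable ranges over a product of
$n$ probability simplices; roughly speaking, it is this geometry that a
randomized mirror descent or mirror prox step can exploit by sampling the
bilinear term instead of forming the full gradient at every iteration.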


The second part of the talk will focus on the statistical optimality of
ERM. The inspiration comes from classical statistical theory: the $O(d/n)$
rate for the excess risk of ERM (M-estimation, in statistical terminology)
holds under very general conditions when $d$ is fixed and $n$ tends to
infinity, and is optimal in this regime. The learning problem in this
regime is fully characterized by the local quadratic approximation of the
excess risk at the true minimizer. However, extending this result to the
finite-sample regime is non-trivial, as it requires localizing the
estimator to the right neighborhood of the true optimum, for which one has
to impose some *global* conditions on the excess risk. We will see that
such global assumptions can be avoided when using self-concordant cost
functions (in the sense of Nesterov and Nemirovski (1994) or Bach (2010)),
a class that includes the logistic loss and some robust regression losses.
Combining self-concordance with some covariance estimation results allows
us to localize the M-estimator to the constant-radius Dikin ellipsoid of
the true minimizer, which implies the $O(d/n)$ rate already when
$n \gg d$. We will then briefly consider the l1- and l2-penalized cases.
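
For concreteness, here is a sketch, again in LaTeX notation with symbols
introduced only for illustration, of the two objects behind the
localization argument: the local quadratic approximation of the excess
risk and the constant-radius Dikin ellipsoid of the true minimizer.

  % Sketch only: L denotes the population risk, \theta^* its minimizer, and
  % H = \nabla^2 L(\theta^*) the Hessian at the optimum; the constants are
  % indicative rather than the exact ones from the talk.
  L(\theta) - L(\theta^*) \;\approx\;
    \tfrac{1}{2}\, (\theta - \theta^*)^\top H\, (\theta - \theta^*),
  \qquad
  \Theta_r(\theta^*) \;=\;
    \big\{ \theta : (\theta - \theta^*)^\top H\, (\theta - \theta^*) \le r^2 \big\}.

Self-concordance guarantees that, inside a constant-radius Dikin ellipsoid
$\Theta_r(\theta^*)$, the excess risk is sandwiched between constant
multiples of this quadratic; once the M-estimator is shown to land in that
ellipsoid, the classical $O(d/n)$ bound follows without global assumptions
on the excess risk.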

*Host:* Nati Srebro <nati at ttic.edu>

-- 
*Alicia McClarin*
*Toyota Technological Institute at Chicago*
*6045 S. Kenwood Ave., **Office 510*
*Chicago, IL 60637*
*773-702-5370*
*www.ttic.edu* <http://www.ttic.edu/>