[Theory] REMINDER: 1/11 Talks at TTIC: Sitan Chen, MIT

Mary Marre mmarre at ttic.edu
Sun Jan 10 14:24:31 CST 2021


*When:*      Monday, January 11th at *11:10 am CT*



*Where:*     Zoom Virtual Talk (*register in advance here
<https://uchicagogroup.zoom.us/webinar/register/WN_96FR8EkDRxS7o7c4oO4nxw>*)



*Who:*        Sitan Chen, MIT


*Title:*      Learning When Gradient Descent Fails

*Abstract:* What are the most powerful algorithms for taking labeled
examples and producing classifiers with good out-of-sample performance?
Stochastic gradient descent is the practitioner's tool of choice, and an
explosion of work in recent years has sought theoretical justification
for its striking empirical successes. Yet one can also ask
an orthogonal question: are there natural learning problems where one can
do even better, and provably so?

In this talk, I will first overview my work on supervised learning in the
presence of corruptions, focusing on the classic problem of learning a
halfspace under Massart noise. This is a setting where an adversary can
arbitrarily corrupt a randomly chosen fraction of labels, and for which
minimizing any convex surrogate provably fails. We give a simple and
practical algorithm, which is also the first proper learning algorithm for
this problem to work without any assumptions on the input distribution.
Time permitting, I will mention some of our related results on generalized
linear models and contextual bandits.
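
For concreteness, here is a minimal sketch (in Python/NumPy) of the
Massart data model described above. It is an illustration only, not the
algorithm from the talk; the Gaussian inputs, the noise bound eta = 0.3,
and the way the per-example flip probabilities are drawn are placeholder
choices standing in for the adversary.

    # Massart noise: labels come from a halfspace sign(<w, x>), and each
    # label is flipped with its own probability eta_i <= eta, chosen by
    # an adversary (here: drawn arbitrarily in [0, eta]).
    import numpy as np

    def massart_sample(w, n, eta, rng):
        d = w.shape[0]
        X = rng.standard_normal((n, d))   # input distribution can be arbitrary; Gaussian for concreteness
        y = np.sign(X @ w)                # clean labels from the target halfspace
        flip_prob = eta * rng.random(n)   # adversary picks some eta_i in [0, eta] per example
        flips = rng.random(n) < flip_prob
        y[flips] *= -1                    # corrupted labels the learner actually sees
        return X, y

    rng = np.random.default_rng(0)
    w_star = rng.standard_normal(20)
    X, y = massart_sample(w_star / np.linalg.norm(w_star), n=1000, eta=0.3, rng=rng)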

In the second half, I will present progress on the well-studied problem of
learning a neural network over Gaussian space: given normally distributed
examples labeled by an unknown teacher network, output a network with high
test accuracy. All previous work on this problem pertains to depth-two
networks and makes some assumption about the weights. We give the first
algorithm that
works for arbitrary depths, makes no such assumptions, and runs in time
polynomial in the dimension for any bounded-size network. Notably, a large
class of algorithms, including gradient descent on an overparametrized
student network, provably cannot achieve such a guarantee.
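
A similarly minimal sketch of the data-generating setup in this second
problem: standard Gaussian inputs labeled by an unknown teacher network.
The ReLU activations, widths, and depth below are illustrative
assumptions, not the paper's; the learner sees only the labeled examples
(X, y) and must output a network with low test error.

    import numpy as np

    def teacher_labels(X, weights):
        h = X
        for W in weights[:-1]:
            h = np.maximum(h @ W, 0.0)    # ReLU hidden layers
        return h @ weights[-1]            # linear output layer

    rng = np.random.default_rng(1)
    d, widths = 10, [16, 16, 1]           # a bounded-size teacher network
    weights, prev = [], d
    for k in widths:
        weights.append(rng.standard_normal((prev, k)) / np.sqrt(prev))
        prev = k

    X = rng.standard_normal((5000, d))    # normally distributed examples
    y = teacher_labels(X, weights)        # labels from the unknown teacher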

Based on joint works with Adam Klivans, Frederic Koehler, Raghu Meka, Ankur
Moitra, and Morris Yau.

*Bio:* Sitan Chen is a fifth-year graduate student in theoretical computer
science at MIT, advised by Ankur Moitra. His research focuses on designing
algorithms with provable guarantees for fundamental learning problems,
with an emphasis on robustness, multi-index regression, and mixture
models. He has been
supported by a Paul and Daisy Soros Fellowship and an MIT Presidential
Fellowship.

*Host:* Madhur Tulsiani <madhurt at ttic.edu>



Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL  60637*
*p:(773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*

