[Colloquium] REMINDER: 2/11 Talks at TTIC: Stephen Mussmann, Stanford University

Mary Marre mmarre at ttic.edu
Thu Feb 11 10:08:43 CST 2021


*When:*      Thursday, February 11th at *11:10 am CT*



*Where:*     Zoom Virtual Talk (*register in advance here
<https://uchicagogroup.zoom.us/webinar/register/WN_I0AksS6uQc6AMDukDKcKJg>*)



*Who:*       Stephen Mussmann, Stanford University

*Title:*     Bridging Theory and Practice in Active Learning

*Abstract:* The size of modern datasets has powered many recent successes
in machine learning, but it has also made data collection expensive. Adaptive
data collection, or active learning, can require fewer labels both in
theory and in practice. Unfortunately, in the active learning literature,
theory and practice are quite distinct and disconnected. In this talk, I
work to bridge this gap by presenting three projects: one connecting theory
and practice, one practical, and one theoretical. In the first project, we
show empirically and theoretically that, for logistic regression, the data
efficiency of uncertainty sampling is inversely proportional to the error
of the optimal classifier. In the second project, we use this insight to
apply uncertainty sampling to an extremely imbalanced pairwise
classification task, paraphrase detection, where we achieve a fourteen-fold
reduction in the amount of data required to reach a particular performance
level. Finally, in the third project, for a classic formulation of active
learning, decision trees, we provide a tight analysis of the greedy
algorithm with a uniform prior, resolving a 20-year-old conjecture and
yielding a subexponential-time algorithm.

*Bio:*
Steve Mussmann is a PhD candidate in Computer Science at Stanford
University, advised by Percy Liang, in his sixth and final year. His
research goal is to develop and understand methods for collecting
data adaptively and more efficiently. He received his B.S. from Purdue
University in 2015 and was supported during his PhD by an NSF Graduate
Research Fellowship awarded in 2016.

*Host:* *Nathan Srebro* <nati at ttic.edu>



Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL  60637*
*p: (773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*

