[Colloquium] REMINDER: 4/2 Talks at TTIC: Bradly Stadie, Vector Institute

Mary Marre mmarre at ttic.edu
Thu Apr 2 10:09:19 CDT 2020


*When:*      Thursday, April 2nd at 11:00 am

*Where:*     TTIC (virtual): https://zoom.us/j/644963077

*Who:*       Bradly Stadie, Vector Institute

*Title:*     Weakly Supervised Reinforcement Learning

*Abstract:* Consider a scenario wherein we are given the entire ImageNet
training set without access to labels. What unsupervised features should we
learn from this unlabeled data? In particular, which features should we
learn that might be useful for downstream transfer to classification at
test time? This problem, known as weakly supervised learning, has seen
tremendous advancement in the past 18 months thanks to methods like
Contrastive Predictive Coding.
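
For readers unfamiliar with this family of methods: Contrastive Predictive
Coding trains features with an InfoNCE-style contrastive objective. The sketch
below is a minimal numpy illustration of that loss; the toy embeddings and the
"augmented" positive views are illustrative assumptions, not the speaker's code.

    import numpy as np

    def info_nce_loss(anchors, positives, temperature=0.1):
        # anchors, positives: (N, d) arrays; row i of each is a positive pair,
        # and every other row of `positives` acts as a negative for anchor i.
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
        logits = a @ p.T / temperature                   # (N, N) similarities
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))              # cross-entropy on the diagonal

    rng = np.random.default_rng(0)
    z = rng.normal(size=(32, 64))                        # stand-in feature embeddings
    z_aug = z + 0.05 * rng.normal(size=z.shape)          # noisy "views" as positives
    print(info_nce_loss(z, z_aug))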


In this talk, we will consider the analogous problem in a reinforcement
learning (RL) setting. In the absence of any guidance, which tasks should
an agent pursue during an unsupervised play phase? Can an agent learn
unsupervised behaviors that are useful for downstream transfer to real
tasks at test time? While curiosity-based methods have long been the de facto
approach to this class of problems, we argue that naively applying them leads
to agents that don’t learn semantically meaningful behaviors. In other words,
curiosity doesn’t provide the right inductive bias for downstream transfer.
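
For context, the curiosity methods referred to here typically reward an agent
in proportion to the prediction error of a learned forward dynamics model, so
the agent seeks out transitions the model explains poorly. The sketch below is
a minimal numpy illustration of that bonus; the linear model and random
transitions are placeholders, not the methods analyzed in the talk.

    import numpy as np

    rng = np.random.default_rng(0)
    state_dim, n = 8, 256
    W = rng.normal(scale=0.1, size=(state_dim, state_dim))   # toy forward model

    def curiosity_bonus(states, next_states):
        # Intrinsic reward = squared error of the forward model's prediction.
        predicted = states @ W
        return np.sum((predicted - next_states) ** 2, axis=1)

    states = rng.normal(size=(n, state_dim))
    next_states = states + rng.normal(scale=0.2, size=(n, state_dim))
    r_int = curiosity_bonus(states, next_states)
    print(r_int.mean(), r_int.max())   # largest where the model is most surprised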


We propose two methods to fix curiosity's shortcomings. First, we leverage
the notion of self-imitation learning to derive an analogue to curiosity in
an abstract imitation space. This form of curiosity encourages agents to
learn diverse behaviors with semantic differences. Second, we introduce the
notion of goals to our agents. Even when we never tell agents which goals
are interesting, we see that simply introducing the existence of goals
provides enough inductive bias for agents to learn a meaningful
distribution of interesting unsupervised behaviors. These behaviors can then be
readily leveraged at test time to solve a variety of challenging robotic
navigation, manipulation, and locomotion tasks.
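
As a toy illustration of the second idea, the sketch below has an agent propose
its own goal by sampling from states it has already visited and then scores a
rollout by distance to that goal; the 2-D point-mass "environment" and greedy
"policy" are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(0)

    def rollout(goal, steps=50, step_size=0.2):
        # Greedy point-mass policy that walks toward the sampled goal.
        pos = np.zeros(2)
        for _ in range(steps):
            pos += step_size * (goal - pos) + 0.01 * rng.normal(size=2)
        return pos

    visited = rng.uniform(-1.0, 1.0, size=(100, 2))   # states seen during play
    goal = visited[rng.integers(len(visited))]        # self-proposed goal
    final = rollout(goal)
    reward = -np.linalg.norm(final - goal)            # goal-reaching reward
    print(goal, final, reward)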

*Host:* Matthew Walter <mwalter at ttic.edu>


**********************************************************************************************************
*Matthew Walter* is inviting you to a scheduled Zoom meeting.

Topic: Talks at TTIC: Bradly Stadie, Vector Institute
Time: Apr 2, 2020 11:00 AM Central Time (US and Canada)


*Join Zoom Meeting:* https://zoom.us/j/644963077

Meeting ID: 644 963 077

One tap mobile
+14086380968,,644963077# US (San Jose)
+16468769923,,644963077# US (New York)

Dial by your location
        +1 408 638 0968 US (San Jose)
        +1 646 876 9923 US (New York)
        +1 669 900 6833 US (San Jose)
        +1 253 215 8782 US
        +1 301 715 8592 US
        +1 312 626 6799 US (Chicago)
        +1 346 248 7799 US (Houston)
Meeting ID: 644 963 077

Find your local number: https://zoom.us/u/akrIgOS2N

Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL  60637*
*p: (773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu*

