[Theory] REMINDER: 1/13 TTIC Colloquium: Devavrat Shah, MIT
Mary Marre
mmarre at ttic.edu
Mon Jan 13 10:16:24 CST 2020
*When:* Monday, January 13th at 11:00 am
*Where:* TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526
*Who: * Devavrat Shah, MIT
*Title:* AlphaGo Zero, Monte Carlo Tree Search and Self-Play: Towards
Theoretical Foundations
*Abstract: *AlphaGo Zero (AGZ) by Silver et al (2017) introduced a new
tabula rasa reinforcement learning algorithm that has achieved superhuman
performance in the games of Go, Chess, and Shogi with no prior knowledge
other than the rules of the game. Methodically, there are two key
innovations: (a) use of Monte-Carlo Tree Search (MCTS) with Supervised
Learning (SL) through well-engineered Neural Network for learning the
policy, and (b) use of self-play for generating training examples. The
remarkable empirical success naturally raises the following question: can
we explain this empirical success theoretically, or is it simply ingenious
engineering?
In this talk, we shall argue that MCTS with expressive enough SL learns
optimal policy at nearly minimax optimal rate. In the process, we correct a
fundamental error in the well cited MCTS method and its proof.
Interestingly enough, the AGZ had already utilized this correction in its
implementation. However, our theoretical analysis suggests feasibility of
further improvement over AGZ’s implementation. Our results hold for both
traditional setting of infinite horizon Markov Decision Process (MDP) as
well as the setting of self-play modeled as Robust MDP. Beyond two player
games where AGZ has been effective, we show empirical success for a
representative application from network scheduling.
This is primarily based on joint work with Qiaomin Xie (Cornell) and Zhi Xu
(MIT).
*Bio:*
Devavrat Shah is a Professor with the department of Electrical Engineering
and Computer Science and Director of Statistics and Data Science at the
Massachusetts Institute of Technology. His current research interests are
at the interface of Statistics, Decision Making and Social Data Processing.
His work has been recognized through prize paper awards in Machine
Learning, Electrical Engineering, Operations Research and Computer Science,
as well as career prizes 2008 ACM Sigmetrics Rising Star Award, 2010 Erlang
prize from the INFORMS Applied Probability Society and 2019 ACM Sigmetrics
Test of Time Paper Award. He is a distinguished young alumni of his alma
mater IIT Bombay. He has (co-)authored monographs “Gossip algorithms” and
“Explaining the success of nearest neighbors in prediction’’. He co-founded
machine learning start-up Celect, Inc. which is now part of Nike, Inc.
*Host:* David McAllester <mcallester at ttic.edu>
For more information on the colloquium series or to subscribe to the
mailing list, please see http://www.ttic.edu/colloquium.php
Mary C. Marre
Administrative Assistant
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Room 517*
*Chicago, IL 60637*
*p:(773) 834-1757*
*f: (773) 357-6970*
*mmarre at ttic.edu <mmarre at ttic.edu>*
On Sun, Jan 12, 2020 at 6:14 PM Mary Marre <mmarre at ttic.edu> wrote:
> *When:* Monday, January 13th at 11:00 am
>
>
>
> *Where:* TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526
>
>
>
> *Who: * Devavrat Shah, MIT
>
>
> *Title:* AlphaGo Zero, Monte Carlo Tree Search and Self-Play: Towards
> Theoretical Foundations
>
> *Abstract: *AlphaGo Zero (AGZ) by Silver et al (2017) introduced a new
> tabula rasa reinforcement learning algorithm that has achieved superhuman
> performance in the games of Go, Chess, and Shogi with no prior knowledge
> other than the rules of the game. Methodically, there are two key
> innovations: (a) use of Monte-Carlo Tree Search (MCTS) with Supervised
> Learning (SL) through well-engineered Neural Network for learning the
> policy, and (b) use of self-play for generating training examples. The
> remarkable empirical success naturally raises the following question: can
> we explain this empirical success theoretically, or is it simply ingenious
> engineering?
>
>
> In this talk, we shall argue that MCTS with expressive enough SL learns
> optimal policy at nearly minimax optimal rate. In the process, we correct a
> fundamental error in the well cited MCTS method and its proof.
> Interestingly enough, the AGZ had already utilized this correction in its
> implementation. However, our theoretical analysis suggests feasibility of
> further improvement over AGZ’s implementation. Our results hold for both
> traditional setting of infinite horizon Markov Decision Process (MDP) as
> well as the setting of self-play modeled as Robust MDP. Beyond two player
> games where AGZ has been effective, we show empirical success for a
> representative application from network scheduling.
>
> This is primarily based on joint work with Qiaomin Xie (Cornell) and Zhi
> Xu (MIT).
>
> *Bio:*
> Devavrat Shah is a Professor with the department of Electrical Engineering
> and Computer Science and Director of Statistics and Data Science at the
> Massachusetts Institute of Technology. His current research interests are
> at the interface of Statistics, Decision Making and Social Data Processing.
> His work has been recognized through prize paper awards in Machine
> Learning, Electrical Engineering, Operations Research and Computer Science,
> as well as career prizes 2008 ACM Sigmetrics Rising Star Award, 2010 Erlang
> prize from the INFORMS Applied Probability Society and 2019 ACM Sigmetrics
> Test of Time Paper Award. He is a distinguished young alumni of his alma
> mater IIT Bombay. He has (co-)authored monographs “Gossip algorithms” and
> “Explaining the success of nearest neighbors in prediction’’. He co-founded
> machine learning start-up Celect, Inc. which is now part of Nike, Inc.
>
>
> *Host:* David McAllester <mcallester at ttic.edu>
>
>
> For more information on the colloquium series or to subscribe to the
> mailing list, please see http://www.ttic.edu/colloquium.php
>
>
> Mary C. Marre
> Administrative Assistant
> *Toyota Technological Institute*
> *6045 S. Kenwood Avenue*
> *Room 517*
> *Chicago, IL 60637*
> *p:(773) 834-1757*
> *f: (773) 357-6970*
> *mmarre at ttic.edu <mmarre at ttic.edu>*
>
>
> On Mon, Jan 6, 2020 at 5:01 PM Mary Marre <mmarre at ttic.edu> wrote:
>
>> *When:* Monday, January 13th at 11:00 am
>>
>>
>>
>> *Where:* TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526
>>
>>
>>
>> *Who: * Devavrat Shah, MIT
>>
>>
>> *Title:* AlphaGo Zero, Monte Carlo Tree Search and Self-Play: Towards
>> Theoretical Foundations
>>
>> *Abstract: *AlphaGo Zero (AGZ) by Silver et al (2017) introduced a new
>> tabula rasa reinforcement learning algorithm that has achieved superhuman
>> performance in the games of Go, Chess, and Shogi with no prior knowledge
>> other than the rules of the game. Methodically, there are two key
>> innovations: (a) use of Monte-Carlo Tree Search (MCTS) with Supervised
>> Learning (SL) through well-engineered Neural Network for learning the
>> policy, and (b) use of self-play for generating training examples. The
>> remarkable empirical success naturally raises the following question: can
>> we explain this empirical success theoretically, or is it simply ingenious
>> engineering?
>>
>>
>> In this talk, we shall argue that MCTS with expressive enough SL learns
>> optimal policy at nearly minimax optimal rate. In the process, we correct a
>> fundamental error in the well cited MCTS method and its proof.
>> Interestingly enough, the AGZ had already utilized this correction in its
>> implementation. However, our theoretical analysis suggests feasibility of
>> further improvement over AGZ’s implementation. Our results hold for both
>> traditional setting of infinite horizon Markov Decision Process (MDP) as
>> well as the setting of self-play modeled as Robust MDP. Beyond two player
>> games where AGZ has been effective, we show empirical success for a
>> representative application from network scheduling.
>>
>> This is primarily based on joint work with Qiaomin Xie (Cornell) and Zhi
>> Xu (MIT).
>>
>> *Bio:*
>> Devavrat Shah is a Professor with the department of Electrical
>> Engineering and Computer Science and Director of Statistics and Data
>> Science at the Massachusetts Institute of Technology. His current research
>> interests are at the interface of Statistics, Decision Making and Social
>> Data Processing. His work has been recognized through prize paper awards in
>> Machine Learning, Electrical Engineering, Operations Research and Computer
>> Science, as well as career prizes 2008 ACM Sigmetrics Rising Star Award,
>> 2010 Erlang prize from the INFORMS Applied Probability Society and 2019 ACM
>> Sigmetrics Test of Time Paper Award. He is a distinguished young alumni of
>> his alma mater IIT Bombay. He has (co-)authored monographs “Gossip
>> algorithms” and “Explaining the success of nearest neighbors in
>> prediction’’. He co-founded machine learning start-up Celect, Inc. which is
>> now part of Nike, Inc.
>>
>>
>> *Host:* David McAllester <mcallester at ttic.edu>
>>
>>
>> For more information on the colloquium series or to subscribe to the
>> mailing list, please see http://www.ttic.edu/colloquium.php
>>
>>
>>
>>
>> Mary C. Marre
>> Administrative Assistant
>> *Toyota Technological Institute*
>> *6045 S. Kenwood Avenue*
>> *Room 517*
>> *Chicago, IL 60637*
>> *p:(773) 834-1757*
>> *f: (773) 357-6970*
>> *mmarre at ttic.edu <mmarre at ttic.edu>*
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.uchicago.edu/pipermail/theory/attachments/20200113/42ba7085/attachment-0001.html>
More information about the Theory
mailing list