<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div class="gmail_default"><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"> Monday, January 13th at 11:00 am</font></font><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Where:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit">TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526</font></font></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> Devavrat</font></font></font> Shah, MIT</p></div><div class="gmail_default"><br></div><div class="gmail_default"><br></div><div class="gmail_default"><div><b>Title:</b> AlphaGo Zero, Monte Carlo Tree Search and Self-Play: Towards Theoretical Foundations</div><div><br></div><div><b>Abstract: </b>AlphaGo Zero (AGZ) by Silver et al (2017) introduced a new tabula rasa reinforcement learning algorithm that has achieved superhuman performance in the games of Go, Chess, and Shogi with no prior knowledge other than the rules of the game. Methodically, there are two key innovations: (a) use of Monte-Carlo Tree Search (MCTS) with Supervised Learning (SL) through well-engineered Neural Network for learning the policy, and (b) use of self-play for generating training examples. The remarkable empirical success naturally raises the following question: can we explain this empirical success theoretically, or is it simply ingenious engineering? </div><p class="MsoNormal" style="margin:0in 0in 0.0001pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br>In this talk, we shall argue that MCTS with expressive enough SL learns optimal policy at nearly minimax optimal rate. In the process, we correct a fundamental error in the well cited MCTS method and its proof. Interestingly enough, the AGZ had already utilized this correction in its implementation. However, our theoretical analysis suggests feasibility of further improvement over AGZ’s implementation. Our results hold for both traditional setting of infinite horizon Markov Decision Process (MDP) as well as the setting of self-play modeled as Robust MDP. Beyond two player games where AGZ has been effective, we show empirical success for a representative application from network scheduling.<br><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial">This is primarily based on joint work with Qiaomin Xie (Cornell) and Zhi Xu (MIT).<br></p><div><br></div><div><b>Bio:</b><br></div><div>Devavrat Shah is a Professor with the department of Electrical Engineering and Computer Science and Director of Statistics and Data Science at the Massachusetts Institute of Technology. His current research interests are at the interface of Statistics, Decision Making and Social Data Processing. His work has been recognized through prize paper awards in Machine Learning, Electrical Engineering, Operations Research and Computer Science, as well as career prizes 2008 ACM Sigmetrics Rising Star Award, 2010 Erlang prize from the INFORMS Applied Probability Society and 2019 ACM Sigmetrics Test of Time Paper Award. He is a distinguished young alumni of his alma mater IIT Bombay. He has (co-)authored monographs “Gossip algorithms” and “Explaining the success of nearest neighbors in prediction’’. He co-founded machine learning start-up Celect, Inc. which is now part of Nike, Inc.<span style="font-family:"Avenir Roman",sans-serif;font-size:12pt;color:black;text-align:justify"> </span> <b><br></b></div><div><br></div></div><div class="gmail_default"><div><br></div><div><b>Host:</b> <a href="mailto:mcallester@ttic.edu" target="_blank">David McAllester</a></div><div><br></div><div><br></div><div><span style="font-size:12.8px;font-family:arial,helvetica,sans-serif">For more information on the </span><span style="font-size:12.8px;font-family:arial,helvetica,sans-serif">colloquium</span><span style="font-size:12.8px;font-family:arial,helvetica,sans-serif"> series or to subscribe to the mailing list, please see </span><a href="http://www.ttic.edu/colloquium.php" target="_blank" style="font-size:12.8px;font-family:arial,helvetica,sans-serif">http://www.ttic.edu/colloquium.php</a> <br></div><div><br></div><div><br></div></div></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><font face="arial, helvetica, sans-serif">Mary C. Marre</font><div><font face="arial, helvetica, sans-serif">Administrative Assistant</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">6045 S. Kenwood Avenue</font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Room 517</font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i></div><div><i><font face="arial, helvetica, sans-serif">p:(773) 834-1757</font></i></div><div><i><font face="arial, helvetica, sans-serif">f: (773) 357-6970</font></i></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div></div></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jan 6, 2020 at 5:01 PM Mary Marre <<a href="mailto:mmarre@ttic.edu">mmarre@ttic.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-size:small"><div><p style="font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"> Monday, January 13th at 11:00 am</font></font><br></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Where:</b> </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit">TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 526</font></font></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"> </font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> Devavrat</font></font></font> Shah, MIT</p></div><div><br></div><div><br></div><div><div><b>Title:</b> AlphaGo Zero, Monte Carlo Tree Search and Self-Play: Towards Theoretical Foundations</div><div><br></div><div><b>Abstract: </b>AlphaGo Zero (AGZ) by Silver et al (2017) introduced a new tabula rasa reinforcement learning algorithm that has achieved superhuman performance in the games of Go, Chess, and Shogi with no prior knowledge other than the rules of the game. Methodically, there are two key innovations: (a) use of Monte-Carlo Tree Search (MCTS) with Supervised Learning (SL) through well-engineered Neural Network for learning the policy, and (b) use of self-play for generating training examples. The remarkable empirical success naturally raises the following question: can we explain this empirical success theoretically, or is it simply ingenious engineering? </div><p class="MsoNormal" style="margin:0in 0in 0.0001pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><br>In this talk, we shall argue that MCTS with expressive enough SL learns optimal policy at nearly minimax optimal rate. In the process, we correct a fundamental error in the well cited MCTS method and its proof. Interestingly enough, the AGZ had already utilized this correction in its implementation. However, our theoretical analysis suggests feasibility of further improvement over AGZ’s implementation. Our results hold for both traditional setting of infinite horizon Markov Decision Process (MDP) as well as the setting of self-play modeled as Robust MDP. Beyond two player games where AGZ has been effective, we show empirical success for a representative application from network scheduling.<br><br></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial">This is primarily based on joint work with Qiaomin Xie (Cornell) and Zhi Xu (MIT).<br></p><div><br></div><div><b>Bio:</b><br></div><div><span>Devavrat</span> Shah is a Professor with the department of Electrical Engineering and Computer Science and Director of Statistics and Data Science at the Massachusetts Institute of Technology. His current research interests are at the interface of Statistics, Decision Making and Social Data Processing. His work has been recognized through prize paper awards in Machine Learning, Electrical Engineering, Operations Research and Computer Science, as well as career prizes 2008 ACM Sigmetrics Rising Star Award, 2010 Erlang prize from the INFORMS Applied Probability Society and 2019 ACM Sigmetrics Test of Time Paper Award. He is a distinguished young alumni of his alma mater IIT Bombay. He has (co-)authored monographs “Gossip algorithms” and “Explaining the success of nearest neighbors in prediction’’. He co-founded machine learning start-up Celect, Inc. which is now part of Nike, Inc.<span style="font-family:"Avenir Roman",sans-serif;font-size:12pt;color:black;text-align:justify"> </span> <b><br></b></div><div><br></div></div><div><div><br></div><div><b>Host:</b> <a href="mailto:mcallester@ttic.edu" target="_blank">David McAllester</a></div><div><br></div><div><br></div><div><span style="font-size:12.8px;font-family:arial,helvetica,sans-serif">For more information on the </span><span style="font-size:12.8px;font-family:arial,helvetica,sans-serif">colloquium</span><span style="font-size:12.8px;font-family:arial,helvetica,sans-serif"> series or to subscribe to the mailing list, please see </span><a href="http://www.ttic.edu/colloquium.php" style="font-size:12.8px;font-family:arial,helvetica,sans-serif" target="_blank">http://www.ttic.edu/colloquium.php</a> <br></div><div><br></div><div><br></div><div><br></div><div><br></div></div></div><div><div dir="ltr"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><font face="arial, helvetica, sans-serif">Mary C. Marre</font><div><font face="arial, helvetica, sans-serif">Administrative Assistant</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">6045 S. Kenwood Avenue</font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Room 517</font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL 60637</font></i></div><div><i><font face="arial, helvetica, sans-serif">p:(773) 834-1757</font></i></div><div><i><font face="arial, helvetica, sans-serif">f: (773) 357-6970</font></i></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>
</blockquote></div></div>