[Colloquium] Peshkin talk today (7/15, 3:30 at TTI)

Meridel Trimble mtrimble at tti-c.org
Tue Jul 15 08:26:13 CDT 2003


----------------------------------------------------------------------------------------

TOYOTA TECHNOLOGICAL INSTITUTE 
----------------------------------------------------------------------------------------

Date: Tuesday, July 15th, 2003

Time: 3:30 p.m.

Place: Toyota Technological Institute conference room (The Press Building - 
1427 E. 60th St.)

Speaker: Leonid Peshkin 
Harvard University

Title: “Reinforcement Learning by Policy Search”

Abstract:
Teaching is hard, criticizing is easy. This metaphor underlies the distinction 
between reinforcement learning and supervised learning. Reinforcement 
learning means learning a policy (a mapping from observations to actions) 
based on feedback from the environment. Learning can be viewed as searching a 
set of policies while evaluating each one by trial, through interaction with the 
environment. In this talk I briefly review the framework of reinforcement 
learning and present two highlights from my dissertation. First, I describe an 
algorithm that learns by ascending the gradient of expected cumulative 
reinforcement. I show under what conditions experience can be reused in learning. 
Building on statistical learning theory, I address the question of how much 
experience suffices for uniform convergence of policy evaluation, and I obtain 
sample complexity bounds. Second, I demonstrate an application of the proposed 
algorithm to the complex domain of simulated adaptive packet routing in a 
telecommunication network. I conclude by suggesting how to build an intelligent 
agent and where reinforcement learning could be applied in computer vision and 
natural language processing. 
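For readers new to policy search, the gradient-ascent idea in the abstract can be illustrated with a minimal sketch. This is not the speaker's algorithm: it assumes a hypothetical two-armed bandit (a one-step decision problem, so "cumulative" reinforcement is a single reward), a softmax policy, and the textbook likelihood-ratio (REINFORCE) gradient estimate. The second function shows one common way to reuse experience, importance sampling, where trajectories gathered under one policy are reweighted to evaluate another; that technique is likewise an illustrative assumption, not something the abstract specifies. All names (`PAYOFF_PROB`, `reinforce`, `importance_sampled_value`, etc.) are hypothetical.

```python
import math
import random

# Hypothetical toy problem (assumption): a two-armed bandit.
# Arm 0 pays 1.0 with probability 0.2; arm 1 pays 1.0 with probability 0.8.
PAYOFF_PROB = [0.2, 0.8]


def softmax_policy(theta):
    """Action probabilities of a softmax policy with one parameter per action."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(probs, rng):
    """Draw an action index from a two-action distribution."""
    return 0 if rng.random() < probs[0] else 1


def pull(action, rng):
    """Environment step: a Bernoulli reward for the chosen arm."""
    return 1.0 if rng.random() < PAYOFF_PROB[action] else 0.0


def expected_reward(theta):
    """Exact expected reward of the softmax policy (known only in this toy)."""
    probs = softmax_policy(theta)
    return sum(p * q for p, q in zip(probs, PAYOFF_PROB))


def reinforce(steps=2000, lr=0.1, seed=0):
    """Stochastic gradient ascent on expected reward using the
    likelihood-ratio estimate r * d/dtheta log pi(a | theta)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax_policy(theta)
        a = sample_action(probs, rng)
        r = pull(a, rng)
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k).
        for k in range(len(theta)):
            grad_log = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * r * grad_log
    return theta


def importance_sampled_value(target_theta, behavior_theta, n=20000, seed=1):
    """Estimate the target policy's expected reward from experience
    collected with a different (behavior) policy, reweighting each
    reward by pi_target(a) / pi_behavior(a)."""
    rng = random.Random(seed)
    target = softmax_policy(target_theta)
    behavior = softmax_policy(behavior_theta)
    total = 0.0
    for _ in range(n):
        a = sample_action(behavior, rng)
        r = pull(a, rng)
        total += (target[a] / behavior[a]) * r
    return total / n


if __name__ == "__main__":
    theta = reinforce()
    print("learned action probabilities:", softmax_policy(theta))
    print("exact value:", expected_reward(theta))
    print("IS estimate from uniform-policy data:",
          importance_sampled_value(theta, [0.0, 0.0]))
```

After training, most of the probability mass should sit on arm 1, and the importance-sampled estimate computed from uniformly random experience should land close to the exact value. The reweighting is only well defined when the behavior policy gives nonzero probability to every action the target policy can take, which is one simple example of a condition under which experience reuse is possible.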

Keywords: MDP, POMDP, policy search, gradient methods, reinforcement learning, 
adaptive systems, stochastic control, adaptive behavior.

If you wish to meet with the speaker, please send e-mail to Meridel 
(mtrimble at tti-c.org).


