[Colloquium] Talk #2 by Leonid Peshkin (Tuesday, July 15th) at TTI
Meridel Trimble
mtrimble at tti-c.org
Tue Jul 8 13:37:02 CDT 2003
--------------------------------------------------------------------------------
TOYOTA TECHNOLOGICAL INSTITUTE
--------------------------------------------------------------------------------
Date: Tuesday, July 15th, 2003
Time: 3:30 p.m.
Place: Toyota Technological Institute conference room (The Press Building -
1427 E. 60th St.)
Speaker: Leonid Peshkin
Harvard University
Title: Reinforcement Learning by Policy Search
Abstract:
Teaching is hard, criticizing is easy. This metaphor stands behind the concept
of reinforcement learning as opposed to supervised learning. Reinforcement
learning means learning a policy---a mapping from observations to actions---
based on feedback from the environment. Learning can be viewed as searching a
set of policies while evaluating them by trial, through interaction with the
environment. In this talk I briefly review the framework of reinforcement
learning and present two highlights from my dissertation. First, I describe an
algorithm that learns by ascending the gradient of expected cumulative
reinforcement, and show what conditions enable experience reuse in learning.
Building on statistical learning theory, I address the question of sufficient
experience for uniform convergence of policy evaluation and obtain sample
complexity bounds. Second, I demonstrate an application of the proposed
algorithm to the complex domain of simulated adaptive packet routing in a
telecommunication network. I conclude by suggesting how to build an intelligent
agent and where to apply reinforcement learning to computer vision and natural
language processing.
Keywords: MDP, POMDP, policy search, gradient methods, reinforcement learning,
adaptive systems, stochastic control, adaptive behavior.
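The gradient-ascent idea mentioned in the abstract can be sketched as a minimal
REINFORCE-style policy-gradient loop. Everything below (the toy two-armed bandit
environment, learning rate, and episode count) is illustrative and not taken
from the dissertation:

```python
import math
import random

random.seed(0)

# Toy environment (illustrative): arm 1 pays 1.0, arm 0 pays 0.2.
def pull(arm):
    return 1.0 if arm == 1 else 0.2

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(episodes=2000, lr=0.1):
    theta = [0.0, 0.0]  # policy parameters, one logit per action
    for _ in range(episodes):
        probs = softmax(theta)
        arm = random.choices([0, 1], weights=probs)[0]
        r = pull(arm)
        # Score function: d log pi(arm) / d theta_k = 1[k == arm] - probs[k].
        # Stepping along r * grad ascends the gradient of expected reward.
        for k in range(2):
            grad = (1.0 if k == arm else 0.0) - probs[k]
            theta[k] += lr * r * grad
    return softmax(theta)

probs = reinforce()
print(probs)  # the learned policy should strongly prefer arm 1
```

After training, the policy concentrates its probability mass on the
higher-reward action, which is the bandit-sized version of climbing the
gradient of expected cumulative reinforcement.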
If you wish to meet with the speaker, please send e-mail to Meridel
(mtrimble at tti-c.org)