[Colloquium] [Talks at TTIC] 5/1 Young Researcher Seminar Series: Bradly Stadie, University of Toronto

Alicia McClarin amcclarin at ttic.edu
Wed Apr 24 11:26:28 CDT 2019


*When:*     Wednesday, May 1st at 11:00 am

*Where:*    TTIC, 6045 S Kenwood Avenue, 5th Floor, Room 526

*Who:*      Bradly Stadie, University of Toronto


*Title:* Learning From Sub-Optimal Data

*Abstract:* Learning algorithms typically assume their input data is
good-natured: if one takes this input data and trains an agent with it,
then the agent should, given enough time and compute, eventually learn to
solve the intended task. But this is not always a realistic expectation.
Sometimes the data given to an agent is flawed, or fails to fully convey
the correct problem. In other words, the input data is sub-optimal. In this
talk, we will discuss two recent advances for overcoming sub-optimal data.


First, we consider the problem of imitation learning from sub-optimal
demonstrations. In this setting, a robot receives failed or flawed
demonstrations of a task and must infer, and subsequently complete, the
intended task from only these failed demonstrations. Results are presented
on a variety of robotics problems, such as door opening and pick-and-place.
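
To make the setting concrete, here is a minimal, hypothetical sketch of one
naive way an agent might exploit failed demonstrations; it is not the
algorithm presented in the talk. It extrapolates where each failed 2-D
reaching trajectory was headed, averages those extrapolations into a goal
estimate, and scores states by distance to that estimate. The function
names, the overshoot heuristic, and the synthetic data are all illustrative
assumptions.

# Hypothetical sketch: infer the intended goal from failed 2-D reaching
# demonstrations, then score states against the inferred goal.
# Illustrative heuristic only -- not the method from the talk.
import numpy as np

def infer_goal(failed_demos, overshoot=1.5):
    """Extrapolate each demo past its final state along its recent
    heading, then average the extrapolations into one goal guess."""
    guesses = []
    for demo in failed_demos:                 # demo: (T, 2) array of positions
        heading = demo[-1] - demo[-4]         # net motion over the last steps
        heading /= np.linalg.norm(heading) + 1e-8
        guesses.append(demo[-1] + overshoot * heading)
    return np.mean(guesses, axis=0)

def reward(state, goal):
    """Dense surrogate reward: negative distance to the inferred goal."""
    return -np.linalg.norm(state - goal)

# Three synthetic demos that all stop short of the true goal at (5, 5).
rng = np.random.default_rng(0)
demos = []
for _ in range(3):
    t = np.linspace(0.0, 0.7, 20)[:, None]    # stops at 70% of the way
    demos.append(t * np.array([5.0, 5.0]) + 0.05 * rng.normal(size=(20, 2)))

goal = infer_goal(demos)
print("inferred goal:", goal)                 # points roughly toward (5, 5)
print("reward near goal:", reward(np.array([5.0, 5.0]), goal))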

Second, we consider the problem of learning from sub-optimal reward
functions. Often, the reward functions provided to reinforcement learning
agents are derived by combining low-level primitives such as agent position
and velocity. For example, the reward for a robot learning to walk might be
its forward velocity plus the position of its head. These reward functions
are designed first and foremost for human consumption, not for an RL
algorithm. Consequently, it may be possible to learn a better intrinsic
reward function that is easier for the RL algorithm to optimize against. We
provide a new algorithm for learning such intrinsic reward functions.
Optimizing against these learned intrinsic rewards leads to better overall
agent performance than optimizing against the raw hand-designed reward
function. Crucially, these reward functions can be learned on the fly
without significant extra computational cost. Results are presented on a
variety of MuJoCo tasks and on hard robotics problems such as block
stacking.
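
As a rough illustration of the bi-level structure such methods share (a
sketch under assumptions, not the algorithm from the talk), the snippet
below runs an inner REINFORCE loop on extrinsic-plus-intrinsic reward in a
sparse chain MDP, while a naive random-search outer loop keeps
intrinsic-reward parameters only when they improve extrinsic return. The
talk's method learns its rewards on the fly with little extra compute; this
outer loop retrains from scratch and is only meant to show the shape of the
objective. The environment and all names are illustrative.

# Hypothetical sketch of bi-level intrinsic-reward learning on a sparse
# chain MDP: the inner loop runs REINFORCE on extrinsic + intrinsic reward;
# the outer loop hill-climbs the intrinsic parameters on extrinsic return.
import numpy as np

N, HORIZON = 8, 24          # chain states 0..N (goal at N), episode length
rng = np.random.default_rng(0)

def rollout(theta, eta):
    """One episode. Yields (state, action, extrinsic, shaped) tuples."""
    s, traj = 0, []
    for _ in range(HORIZON):
        logits = theta[s]
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(2, p=p)                      # 0 = left, 1 = right
        s2 = min(max(s + (1 if a == 1 else -1), 0), N)
        r_ext = 1.0 if s2 == N else 0.0             # sparse extrinsic reward
        r_int = eta[s2]                             # learned per-state bonus
        traj.append((s, a, r_ext, r_ext + r_int))
        s = s2
    return traj

def reinforce(theta, eta, episodes=30, lr=0.5):
    """Inner loop: policy gradient on the shaped (extrinsic + intrinsic) reward."""
    for _ in range(episodes):
        traj = rollout(theta, eta)
        G = sum(r for *_, r in traj)                # undiscounted shaped return
        for s, a, _, _ in traj:
            p = np.exp(theta[s] - theta[s].max()); p /= p.sum()
            grad = -p; grad[a] += 1.0               # d log pi(a|s) / d logits
            theta[s] += lr * G * grad
    return theta

def ext_return(theta, eta, episodes=20):
    """Evaluate a policy by EXTRINSIC return only."""
    return np.mean([sum(r for _, _, r, _ in rollout(theta, eta))
                    for _ in range(episodes)])

eta = np.zeros(N + 1)                               # intrinsic reward parameters
best = -np.inf
for it in range(20):                                # outer loop: random search
    cand = eta + 0.1 * rng.normal(size=eta.shape)   # perturb intrinsic reward
    theta = reinforce(np.zeros((N + 1, 2)), cand)   # train a fresh policy on it
    score = ext_return(theta, cand)                 # judge by extrinsic return
    if score > best:
        best, eta = score, cand
    print(f"iter {it:2d}  extrinsic return {best:.2f}")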

*Host:* Greg Shakhnarovich <greg at ttic.edu>

-- 
*Alicia McClarin*
*Toyota Technological Institute at Chicago*
*6045 S. Kenwood Ave., **Office 510*
*Chicago, IL 60637*
*773-702-5370*
*www.ttic.edu* <http://www.ttic.edu/>