[Theory] [TTIC Talks] TODAY 4/22 Research at TTIC: Bradly Stadie, TTIC
Brandie Jones
bjones at ttic.edu
Fri Apr 22 10:00:00 CDT 2022
*When:* Friday, April 22nd at *12:30pm CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* via Zoom: *register in advance here*
<https://uchicagogroup.zoom.us/meeting/register/tJIvduChqjMiHNTEa-QzLrhMT1KJWHxWHqUh>
*Who:* Bradly Stadie, TTIC
*Title:* Frontiers of robotic intelligence
*Abstract:* When I joined TTIC two years ago, I set out to examine two
key problems in robotics: goal-based learning and sim2real transfer. As my
time at TTIC wraps up, I want to look back at the progress we’ve made in
these areas during the past two years.
In goal-based learning, we have introduced L3P, a new method that pairs
non-parametric graph search with low-level reinforcement learning.
Ultimately, this algorithm was bottlenecked by the difficulty in learning
to accomplish individual goals corresponding to different nodes in the
graph. While investigating this problem, I noticed that Hindsight
Experience Replay performs much worse with (0, 1) rewards than with (-1, 0)
rewards, even though this is a simple linear shift in reward scale! Trying
to understand why this happens took the better part of a year, and resulted
in some new theory that explains the optimal reward scale for goal-based RL
algorithms. We chose to name the resulting algorithm Hindsight Divergence
Minimization (HDM).
Meanwhile, in Sim2Real, we introduced Invariance Through Inference (ITI).
This algorithm has a deep connection to generative models, and allows us to
treat generated samples in a domain-independent fashion. As a result, our
RL algorithms can now leverage features from offline ancillary datasets, in
a fashion similar to recent successes in contrastive learning. Visualizing
PCA plots of feature dimensions before and after training, we see that ITI
is much better than prior Sim2Real methods at projecting samples from
different environments onto a shared latent structure. This enables our
algorithm to learn rich transferable features, and consequently we are able
to train a policy on a simulated Fetch robot and transfer it to a
real-world UR5.
***********************************************************************************************
*Presence at TTIC requires being fully vaccinated for COVID-19 or having
a TTIC or UChicago-approved exemption. Masks are optional in all common
areas. Full visitor guidance available at ttic.edu/visitors
<http://ttic.edu/visitors>.*
***********************************************************************************************
*Research at TTIC Seminar Series*
TTIC is hosting a weekly seminar series presenting the research currently
underway at the Institute. Every week a different TTIC faculty member will
present their research. The lectures are intended both for students
seeking research topics and advisors, and for the general TTIC and
University of Chicago communities interested in hearing what their
colleagues are up to.
*Brandie Jones*
*Administrative Assistant*
Toyota Technological Institute
6045 S. Kenwood Avenue
Chicago, IL 60637
www.ttic.edu