[Theory] 10/25 Young Researcher Seminar Series: Noah Golowich, MIT
Mary Marre
mmarre at ttic.edu
Wed Oct 18 20:34:02 CDT 2023
*When:* Wednesday, October 25th, 2023 at* 10:30** a**m CT *
*Where: *Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* *via* Panopto *livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0147fe27-678e-4a39-b17f-b0a000173db7>
*Who: * Noah Golowich, Massachusetts Institute of Technology
------------------------------
*Title:* Exploring and Learning in Sparse Linear MDPs without
Computationally Intractable Oracles
*Abstract:* The key assumption underlying linear Markov Decision Processes
(MDPs) is that the learner has access to a known feature map *ϕ*(*x*,*a*) that
maps state-action pairs to *d*-dimensional vectors, and that the rewards
and transitions are linear functions in this representation. But where do
these features come from? In the absence of expert domain knowledge, a
tempting strategy is to use the ``kitchen sink" approach and hope that the
true features are included in a much larger set of potential features. In
this talk I will discuss our recent work revisiting linear MDPs from the
perspective of feature selection. In a *k*-sparse linear MDP, there is an
unknown subset *S*⊂[*d*] of size *k* containing all the relevant features,
and the goal is to learn a near-optimal policy in only
poly(*k*,log*d*) interactions
with the environment. Our main result is the first polynomial-time
algorithm for this problem. In contrast, earlier works either made
prohibitively strong assumptions that obviated the need for exploration, or
required solving computationally intractable optimization problems. As a
corollary of our main result, we give an algorithm for learning a
near-optimal policy in block MDPs whose decoding function is a low-depth
decision tree; the algorithm runs in quasi-polynomial time and takes a
polynomial number of samples. This can be seen as a reinforcement learning
analogue of classic results in computational learning theory. Furthermore,
it gives a natural model where improving the sample complexity via
representation learning is computationally feasible.
Based on joint work with Ankur Moitra and Dhruv Rohatgi. Paper link:
https://arxiv.org/abs/2309.09457.
*Bio:* Noah Golowich is a PhD Student at Massachusetts Institute of
Technology, advised by Constantinos Daskalakis and Ankur Moitra. His
research interests are in theoretical machine learning, with particular
focus on the connections between multi-agent learning, game theory, and
online learning, and on the theory of reinforcement learning. He is
supported by a Fannie & John Hertz Foundation Fellowship and an NSF
Graduate Fellowship, and was supported by an MIT Akamai Fellowship.
*Host: **Ali Vakilian* <vakilian at ttic.edu>
**********************************************************************************
The *TTIC Young Researcher Seminar Series *(
http://www.ttic.edu/young-researcher.php) features talks by Ph.D. students
and postdocs whose research is of broad interest to the computer science
community. The series provides an opportunity for early-career researchers to
present recent work to and meet with students and faculty at TTIC and
nearby universities.
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL 60637*
*773-834-1757*
*mmarre at ttic.edu <mmarre at ttic.edu>*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.uchicago.edu/pipermail/theory/attachments/20231018/d8a3a0b1/attachment-0001.html>
More information about the Theory
mailing list