[Colloquium] 10/25 Young Researcher Seminar Series: Noah Golowich, MIT

Mary Marre mmarre at ttic.edu
Wed Oct 18 20:34:02 CDT 2023


*When:*        Wednesday, October 25th, 2023 at* 10:30** a**m CT   *


*Where:       *Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, Room 530


*Virtually:*   *via* Panopto *livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0147fe27-678e-4a39-b17f-b0a000173db7>


*Who: *         Noah Golowich, Massachusetts Institute of Technology


------------------------------
*Title:*          Exploring and Learning in Sparse Linear MDPs without
Computationally Intractable Oracles


*Abstract:* The key assumption underlying linear Markov Decision Processes
(MDPs) is that the learner has access to a known feature map *ϕ*(*x*,*a*) that
maps state-action pairs to *d*-dimensional vectors, and that the rewards
and transitions are linear functions in this representation. But where do
these features come from? In the absence of expert domain knowledge, a
tempting strategy is to use the ``kitchen sink" approach and hope that the
true features are included in a much larger set of potential features. In
this talk I will discuss our recent work revisiting linear MDPs from the
perspective of feature selection. In a *k*-sparse linear MDP, there is an
unknown subset *S*⊂[*d*] of size *k* containing all the relevant features,
and the goal is to learn a near-optimal policy in only
poly(*k*,log*d*) interactions
with the environment. Our main result is the first polynomial-time
algorithm for this problem. In contrast, earlier works either made
prohibitively strong assumptions that obviated the need for exploration, or
required solving computationally intractable optimization problems. As a
corollary of our main result, we give an algorithm for learning a
near-optimal policy in block MDPs whose decoding function is a low-depth
decision tree; the algorithm runs in quasi-polynomial time and takes a
polynomial number of samples. This can be seen as a reinforcement learning
analogue of classic results in computational learning theory. Furthermore,
it gives a natural model where improving the sample complexity via
representation learning is computationally feasible.

Based on joint work with Ankur Moitra and Dhruv Rohatgi. Paper link:
https://arxiv.org/abs/2309.09457.

*Bio:* Noah Golowich is a PhD Student at Massachusetts Institute of
Technology, advised by Constantinos Daskalakis and Ankur Moitra. His
research interests are in theoretical machine learning, with particular
focus on the connections between multi-agent learning, game theory, and
online learning, and on the theory of reinforcement learning. He is
supported by a Fannie & John Hertz Foundation Fellowship and an NSF
Graduate Fellowship, and was supported by an MIT Akamai Fellowship.
*Host: **Ali Vakilian* <vakilian at ttic.edu>

**********************************************************************************
 The *TTIC Young Researcher Seminar Series *(
http://www.ttic.edu/young-researcher.php) features talks by Ph.D. students
and postdocs whose research is of broad interest to the computer science
community. The series provides an opportunity for early-career researchers to
present recent work to and meet with students and faculty at TTIC and
nearby universities.




Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL  60637*
*773-834-1757*
*mmarre at ttic.edu <mmarre at ttic.edu>*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20231018/d8a3a0b1/attachment.html>


More information about the Colloquium mailing list