[Theory] NOW: 10/25 Young Researcher Seminar Series: Noah Golowich, MIT
Mary Marre
mmarre at ttic.edu
Wed Oct 25 10:25:10 CDT 2023
*When:* Wednesday, October 25th, 2023 at *10:30 am CT*
*Where:* Talk will be given *live, in-person* at
TTIC, 6045 S. Kenwood Avenue
5th Floor, Room 530
*Virtually:* via Panopto *livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0147fe27-678e-4a39-b17f-b0a000173db7>
*Who:* Noah Golowich, Massachusetts Institute of Technology
------------------------------
*Title:* Exploring and Learning in Sparse Linear MDPs without
Computationally Intractable Oracles
*Abstract:* The key assumption underlying linear Markov Decision Processes
(MDPs) is that the learner has access to a known feature map *ϕ*(*x*, *a*)
that maps state-action pairs to *d*-dimensional vectors, and that the
rewards and transitions are linear functions in this representation. But
where do these features come from? In the absence of expert domain
knowledge, a tempting strategy is to use the "kitchen sink" approach and
hope that the true features are included in a much larger set of potential
features. In this talk I will discuss our recent work revisiting linear
MDPs from the perspective of feature selection. In a *k*-sparse linear MDP,
there is an unknown subset *S* ⊂ [*d*] of size *k* containing all the
relevant features, and the goal is to learn a near-optimal policy in only
poly(*k*, log *d*) interactions with the environment. Our main result is
the first polynomial-time algorithm for this problem. In contrast, earlier
works either made prohibitively strong assumptions that obviated the need
for exploration, or required solving computationally intractable
optimization problems. As a corollary of our main result, we give an
algorithm for learning a near-optimal policy in block MDPs whose decoding
function is a low-depth decision tree; the algorithm runs in
quasi-polynomial time and takes a polynomial number of samples. This can be
seen as a reinforcement learning analogue of classic results in
computational learning theory. Furthermore, it gives a natural model where
improving the sample complexity via representation learning is
computationally feasible.
Based on joint work with Ankur Moitra and Dhruv Rohatgi. Paper link:
https://arxiv.org/abs/2309.09457.
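For readers who want the setup in symbols, below is a minimal sketch of the
linear MDP assumption and its *k*-sparse variant, following the standard
linear MDP formulation; the symbols μ_h and θ_h are conventional notation
rather than anything stated in the abstract, and the support conditions are
one natural formalization of "all relevant features lie in *S*":

    % Linear MDP with known feature map \phi : \mathcal{X} \times
    % \mathcal{A} \to \mathbb{R}^d: transitions and rewards at each
    % step h are linear in \phi(x, a) (standard formulation).
    P_h(x' \mid x, a) = \langle \phi(x, a), \mu_h(x') \rangle,
    \qquad
    r_h(x, a) = \langle \phi(x, a), \theta_h \rangle

    % k-sparse linear MDP: an unknown index set S \subset [d] with
    % |S| = k carries all relevant coordinates, i.e. \theta_h and
    % \mu_h vanish off S (one natural formalization, assumed here):
    \operatorname{supp}(\theta_h) \subseteq S,
    \qquad
    \operatorname{supp}(\mu_h(x')) \subseteq S \quad \text{for all } x'

    % Goal: a near-optimal policy after only \mathrm{poly}(k, \log d)
    % interactions with the environment.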
*Bio:* Noah Golowich is a PhD student at the Massachusetts Institute of
Technology, advised by Constantinos Daskalakis and Ankur Moitra. His
research interests are in theoretical machine learning, with particular
focus on the connections between multi-agent learning, game theory, and
online learning, and on the theory of reinforcement learning. He is
supported by a Fannie & John Hertz Foundation Fellowship and an NSF
Graduate Fellowship, and was supported by an MIT Akamai Fellowship.
*Host:* *Ali Vakilian* <vakilian at ttic.edu>
**********************************************************************************
The *TTIC Young Researcher Seminar Series*
(http://www.ttic.edu/young-researcher.php) features talks by Ph.D. students
and postdocs whose research is of broad interest to the computer science
community. The series provides an opportunity for early-career researchers
to present recent work to, and meet with, students and faculty at TTIC and
nearby universities.
Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL 60637*
*773-834-1757*
*mmarre at ttic.edu*