[Theory] TOMORROW: 3/10 Talks at TTIC: Tal Lancewicki, Tel Aviv University

Mary Marre via Theory theory at mailman.cs.uchicago.edu
Sun Mar 9 17:42:23 CDT 2025


*When:*        Monday, March 10, 2025 at *2 pm CT*


*Where:*       Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, Room 530


*Virtually:*    via Panopto: *livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=814dffaf-5ad4-43c7-a281-b295017bd601>






*Who:*          Tal Lancewicki, Tel Aviv University

*Title:* Near-optimal Regret in Online MDPs with Aggregate Bandit Feedback

*Abstract:* The standard model of reinforcement learning (RL) assumes a
rich feedback loop, where for each step within the episode the agent
observes the loss in that state as feedback. While ideal, this is often not
the case in real-world applications. For example, in multi-turn dialogues
with an LLM, feedback is typically available only at the end of the entire
dialogue, not for each intermediate response. Similarly, in robotic
manipulation, feedback is often only available for the entire trajectory,
indicating whether the robot successfully completed its task, rather than
providing feedback at every step of the robot's movement.

In this talk, we will explore the challenge of learning online Markov
decision processes (MDPs) with aggregate bandit feedback (a.k.a.
full-bandit), where the agent observes only the total loss incurred over
the entire trajectory, rather than the individual losses at each
intermediate step. We will review prior algorithms and techniques for this
problem and introduce a new Policy Optimization algorithm and its analysis.

*Bio:* Tal is a final-year PhD student in the Department of Computer
Science at Tel Aviv University, advised by Prof. Yishay Mansour. During his
PhD, he has worked as a research intern at Amazon and the Bosch Center for
Artificial Intelligence. His main research interests include Reinforcement
Learning, Online Learning, and Multi-armed Bandits.

*Host:* *Avrim Blum* <avrim at ttic.edu>




Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue, Rm 517*
*Chicago, IL  60637*
*773-834-1757*
*mmarre at ttic.edu*

