<div dir="ltr"><div dir="ltr"><div><div class="gmail_default" style="font-size:small"><font style="font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:inherit"><font style="vertical-align:inherit"><b>When:</b>    </font></font><font style="vertical-align:inherit"><font style="vertical-align:inherit"><font style="font-family:arial,sans-serif;color:rgb(0,0,0)">    Mon</font><span class="gmail_default" style="font-family:arial,sans-serif;color:rgb(0,0,0)">day, March 10,<span class="gmail_default"> </span>2025</span><font style="font-family:arial,sans-serif;color:rgb(0,0,0)"> at</font><b><font color="#000000" style="font-family:arial,sans-serif"> <span style="background-color:rgb(255,255,0)"><u>2</u></span></font></b><font color="#000000"><u><b><font face="arial, sans-serif"><span style="background-color:rgb(255,255,0)"> pm</span></font></b><b style="background-color:rgb(255,255,0)"><font face="arial, sans-serif"> CT </font><font style="font-family:verdana,sans-serif"> </font><font style="font-family:verdana,sans-serif"> </font></b></u></font></font></font></div><div><div class="gmail_default"><div class="gmail_default"><div class="gmail_default"><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><b><font color="#500050" face="arial, sans-serif"><br></font></b></p><p style="color:rgb(80,0,80);font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;line-height:normal;margin:0px"><font face="arial, sans-serif"><b><font color="#500050">Where:       </font></b><font color="#000000">The talk will be given </font><font color="#000000" style="font-weight:bold"><u>live, in person</u></font><font style="font-weight:bold">, </font>at<br></font></p><p class="MsoNormal" 
style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font color="#500050">               </font><font color="#000000">    TTIC, 6045 S. Kenwood Avenue</font></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif" color="#000000">                   5th Floor, Room 530<b> </b></font></p><p class="MsoNormal" style="margin:0in;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><b><span style="color:black"><br></span></b></font></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">Virtually:</b><span style="font-family:arial,sans-serif;color:rgb(60,64,67);letter-spacing:0.2px">  </span><span style="letter-spacing:0.2px"><font color="#0000ff" face="tahoma, sans-serif"><b> </b></font></span><span style="color:rgb(60,64,67);font-family:arial,sans-serif;letter-spacing:0.2px"> </span><i style="color:rgb(60,64,67);font-family:arial,sans-serif;letter-spacing:0.2px">via Panopto: </i><a href="https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=814dffaf-5ad4-43c7-a281-b295017bd601" rel="noreferrer" target="_blank" style="font-family:arial,sans-serif;letter-spacing:0.2px"><b>livestream</b></a><span 
style="color:rgb(60,64,67);font-family:arial,sans-serif;letter-spacing:0.2px"> </span></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><b style="color:rgb(34,34,34);letter-spacing:0.2px"><font size="1" face="tahoma, sans-serif">                         </font></b></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;color:rgb(80,0,80);line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><span style="color:rgb(60,64,67);letter-spacing:0.2px"><b><font face="arial, sans-serif">                     </font></b></span></p><p class="MsoNormal" style="margin:0in 0in 0.0001pt;line-height:normal;background-image:initial;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial"><font face="arial, sans-serif"><font style="color:rgb(80,0,80);vertical-align:inherit"><font style="vertical-align:inherit"><b>Who: </b> <font color="#500050">    </font><font color="#000000"><font color="#500050">    </font></font></font></font></font>Tal Lancewicki, Tel Aviv University</p><div style="border-top:none;border-right:none;border-left:none;border-bottom:2.25pt solid rgb(11,118,159);padding:0in 0in 1pt"></div><div><br></div></div></div><div class="gmail_default"><div dir="ltr"><div><b>Title:</b> Near-optimal Regret in Online MDPs with Aggregate Bandit Feedback</div><div><br><b>Abstract: </b>The standard model of reinforcement learning (RL) assumes a rich feedback loop: at every step of an episode, the agent observes the loss incurred in the current state. While ideal, such rich feedback is often unavailable in real-world applications. 
For example, in multi-turn dialogues with an LLM, feedback is typically available only at the end of the entire dialogue, not for each intermediate response. Similarly, in robotic manipulation, feedback is often available only for the entire trajectory (indicating whether the robot successfully completed its task) rather than at every step of the robot's movement.<br>In this talk, we will explore the challenge of learning in online Markov decision processes (MDPs) with aggregate bandit feedback (a.k.a. full-bandit feedback), where the agent observes only the total loss incurred over the entire trajectory rather than the individual losses at each intermediate step. We will review prior algorithms and techniques for this problem and introduce a new policy optimization algorithm together with its analysis.<br></div><div><br></div><div><b>Bio: </b>Tal is a final-year PhD student in the Department of Computer Science at Tel Aviv University, advised by Prof. Yishay Mansour. During his PhD, he has worked as a research intern at Amazon and at the Bosch Center for Artificial Intelligence. His main research interests include reinforcement learning, online learning, and multi-armed bandits.</div><div><br></div></div></div></div></div><div><div class="gmail_default"><b style="font-family:arial,sans-serif">Host: </b><a href="mailto:avrim@ttic.edu" rel="noreferrer" target="_blank" style="font-family:arial,sans-serif"><b>Avrim Blum</b></a></div></div><div class="gmail_default"><br></div><div class="gmail_default"><br></div><div class="gmail_default"><br></div><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. 
Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue, Rm 517</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">773-834-1757</font></i></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div></div>