<html><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">DEPARTMENT OF COMPUTER SCIENCE<div><br class="webkit-block-placeholder"></div><div>UNIVERSITY OF CHICAGO</div><div><br class="webkit-block-placeholder"></div><div>Date: Wednesday, April 2, 2008</div><div>Time: 2:30 p.m.</div><div>Place: Ryerson 251, 1100 E. 58th Street</div><div><br class="webkit-block-placeholder"></div><div>----------------------------------------------------------</div><div><br class="webkit-block-placeholder"></div><div>Speaker:<span class="Apple-tab-span" style="white-space: pre; ">        </span>Alexander Rakhlin</div><div><br class="webkit-block-placeholder"></div><div>From:<span class="Apple-tab-span" style="white-space: pre; ">                </span>University of California - Berkeley</div><div><br class="webkit-block-placeholder"></div><div>Web page:<span class="Apple-tab-span" style="white-space: pre; ">        <a href="http://www.cs.princeton.edu/~frances/">http://www.eecs.berkeley.edu/~rakhlin/</a></span></div><div><br class="webkit-block-placeholder"></div><div>Title: Online Learning with Limited Feedback</div><div><br class="webkit-block-placeholder"></div><div>Abstract:&nbsp;One’s ability to learn and make decisions rests heavily on

the availability of feedback. In sequential decision-making problems such

feedback is often limited. A gambler, for example, can observe entirely the

outcome of a horse race regardless of where he place is bet; however, when the

same gambler chooses his route to travel to the race track, perhaps at a busy

hour, he will likely never learn the outcome of possible alternatives. The

latter limited-feedback problem is the focus of this talk.&nbsp;</div><p class="MsoNormal" style="mso-pagination:none;mso-layout-grid-align:none;
text-autospace:none"><o:p></o:p></p><p class="MsoNormal" style="mso-pagination:none;mso-layout-grid-align:none;
text-autospace:none">The problem can be phrased as an Online Linear

Optimization game with “bandit” feedback. The existence of a low-regret

algorithm has been an open question since the work of Awerbuch and Kleinberg in

2004. A recent reduction to the multi-armed bandit problem allowed Dani, Hayes,

and Kakade to achieve a low-regret guarantee, unfortunately at the expense of

computational efficiency. We present the first known efficient low-regret

algorithm for bandit Online Linear Optimization over arbitrary convex decision

sets. We show how the difficulties encountered by previous approaches are

overcome by employing regularization. Our solution reveals surprising

connections between online learning and Interior Point methods in Optimization.

We also discuss an emerging phenomenon: regret guarantees for stochastic and

adversarial settings turn out to be the same.&nbsp;</p><p class="MsoNormal" style="mso-pagination:none;mso-layout-grid-align:none;
text-autospace:none"><o:p></o:p></p><p class="MsoNormal" style="mso-pagination:none;mso-layout-grid-align:none;
text-autospace:none">In particular, our method solves the Online Shortest Path

problem: at each round, a path from source to sink is chosen and only the total

length (delay) of this path is revealed. A low-regret algorithm for this

problem has applications in network routing, resource allocation, dynamic

treatment of patients, and more. The worst-case guarantees enjoyed by our

method imply robustness with respect to noise and malicious adversary.&nbsp;</p><p class="MsoNormal" style="mso-pagination:none;mso-layout-grid-align:none;
text-autospace:none"><o:p></o:p></p><p class="MsoNormal" style="mso-pagination:none;mso-layout-grid-align:none;
text-autospace:none">Joint work with Jacob Abernethy and Elad Hazan.<span style="font-size:11.0pt;font-family:Helvetica"><o:p></o:p></span></p>

<!--EndFragment-->

<p class="MsoNormal">---------------------------------------------------------</p><p class="MsoNormal">Host:<span class="Apple-tab-span" style="white-space: pre; ">        </span>Partha Niyogi</p></body></html>