[Colloquium] Talk by Alexander Rakhlin, University of California - Berkeley on Wednesday, April 2, 2008
caseyk
caseyk at cs.uchicago.edu
Thu Mar 6 13:36:02 CST 2008
DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF CHICAGO
Date: Wednesday, April 2, 2008
Time: 2:30 p.m.
Place: Ryerson 251, 1100 E. 58th Street
----------------------------------------------------------
Speaker: Alexander Rakhlin
From: University of California - Berkeley
Web page: http://www.eecs.berkeley.edu/~rakhlin/
Title: Online Learning with Limited Feedback
Abstract: One’s ability to learn and make decisions rests heavily on
the availability of feedback. In sequential decision-making problems
such feedback is often limited. A gambler, for example, can observe
entirely the outcome of a horse race regardless of where he place is
bet; however, when the same gambler chooses his route to travel to the
race track, perhaps at a busy hour, he will likely never learn the
outcome of possible alternatives. The latter limited-feedback problem
is the focus of this talk.
The problem can be phrased as an Online Linear Optimization game with
“bandit” feedback. The existence of a low-regret algorithm has been an
open question since the work of Awerbuch and Kleinberg in 2004. A
recent reduction to the multi-armed bandit problem allowed Dani,
Hayes, and Kakade to achieve a low-regret guarantee, unfortunately at
the expense of computational efficiency. We present the first known
efficient low-regret algorithm for bandit Online Linear Optimization
over arbitrary convex decision sets. We show how the difficulties
encountered by previous approaches are overcome by employing
regularization. Our solution reveals surprising connections between
online learning and Interior Point methods in Optimization. We also
discuss an emerging phenomenon: regret guarantees for stochastic and
adversarial settings turn out to be the same.
In particular, our method solves the Online Shortest Path problem: at
each round, a path from source to sink is chosen and only the total
length (delay) of this path is revealed. A low-regret algorithm for
this problem has applications in network routing, resource allocation,
dynamic treatment of patients, and more. The worst-case guarantees
enjoyed by our method imply robustness with respect to noise and
malicious adversary.
Joint work with Jacob Abernethy and Elad Hazan.
---------------------------------------------------------
Host: Partha Niyogi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.cs.uchicago.edu/pipermail/colloquium/attachments/20080306/cf62a8a1/attachment.html
More information about the Colloquium
mailing list