[Colloquium] Dani/Dissertation Defense/11-16-07

Fri Nov 2 14:41:36 CDT 2007

		Department of Computer Science/The University of Chicago

				*** Dissertation Defense ***

Candidate:  Varsha Dani

Date:  Friday, November 16, 2007

Time and Location:  2:30 p.m. in Ryerson 251

Title:  Algorithms for Bandit Online Linear Optimization

Abstract:
The well-known ``multi-armed bandit'' problem is a problem of
iterated decision making in a largely unknown and constantly
changing environment. A decision-maker must repeatedly
select one of $k$ decisions, and incur a corresponding loss,
with the eventual goal of minimizing the ``regret,'' defined as
the difference between the actual total loss of an online
decision-making procedure and that of the best single decision
*in hindsight*. In 1995, Auer, Cesa-Bianchi, Freund and
Schapire presented an algorithm that achieves a regret that
grows as at most the square root of the number of rounds for
which the game is played. Their bound also has a square root
dependence on the number $k$ of possible decisions.

We study the problem of ``bandit online linear optimization,''
which  is a generalization of the multi-armed bandit problem
to a seting in which the decision set is some compact subset
of $\R^n$, and the loss functions to be optimized are linear.
Although the algorithms from the multi-armed badit setting can
be directly applied here, they suffer from the dependence of
of their regret on the the size of the decision set, since this
may be very large or even infinite. The aim, rather, is to find
algorithms whose regret is controlled by the ambient dimension
$n$. ``Bandit'' refers to the setting in which the algorithm is only
told its loss on each round, rather than the entire loss function.
Prior to this work, algorithms known for the general problem,
with arbitrary bounded linear cost functions, achieved
a regret bound that was polynomial in $n$ by sacrificing
their dependence on the number of rounds, the best previous
bounds being of order $\poly(n) T^{2/3}$. We bridge this gap by
presenting an algorithm that achieves a regret bound of
$O^* (n^{3/2}\sqrt{T})$. We also study a special case in which
the costs are drawn from some fixed but unknown distribution.
For this case we present an algorithm that is optimal up to
logarithmic factors.

Candidate's Advisor:  Prof. Lance Fortnow

A draft copy of Ms. Dani's dissertation is available in Ry 156.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Margaret P. Jaffey                             margaret at cs.uchicago.edu
Department of Computer Science
Student Support Rep (Ry 156)        (773) 702-6011
The University of Chicago                  http://www.cs.uchicago.edu
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=