[Theory] 8/17 Talks at TTIC: Aadirupa Saha, TTIC

Sat Aug 13 10:25:19 CDT 2022

*When:*        Wednesday, August 17th at* 11:00 am CT*

*Where:       *Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, Room 530

*Virtually:*  via Panopto (*livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=294c5b73-bf11-476a-8690-aef000154634>
)

*Who: *         Aadirupa Saha, TTIC

------------------------------

*Title: *  Learning to Make Context-Dependent Predictions Through
Preference Elicitation
*Abstract:* Customer statistics collected in several real-world systems
have reflected that users often prefer eliciting their liking for a given
pair of items, say (A,B), in terms of relative queries like: "Do you prefer
Item A over B?", rather than their absolute counterparts: ``How much do you
score items A and B on a scale of [0-10]?".

Drawing inspirations and with the hope of building cost-effective
user-friendly systems, this led to the famous formulation of Dueling
Bandits (DB) [Yue&Joachims'11], which was followed by a huge surge of
interest in the online learning from pairwise preferences in the
Bandits-Online ML community. While the setting could be extremely useful in
diverse fields of real-life applications, starting from recommendation
systems, to crowd-sourcing platforms, to search-engine optimization, online
retail, or even more complex tasks like designing multi-player games or
training chat-bots/ humanoid robots, just to name a few, unfortunately, the
existing DB techniques were predominantly limited only to the simpler
settings of only pairwise-preferences, finite decision spaces, and
stochastic environments, which are clearly unrealistic from a practical
standpoint.

In this work, we studied the practical framework of contextual dueling
bandits where the goal of the learner is to make customized predictions
based on the users' need (or user context). More formally, we study the
K-armed contextual dueling bandit problem, which is a sequential decision
making setting where the learner uses the contextual information to make
two decisions, but only observes *preference-based (relative)
feedback* suggesting
that one decision was better than the other. We focus on the regret
minimization problem under realizability, where the feedback is generated
by a pairwise preference matrix that is well-specified by a given function
class F. We provide a new algorithm that achieves the optimal regret rate
for a new notion of `best response' regret, which is a strictly stronger
performance measure than those considered in prior works. The algorithm is
also computationally efficient, running in polynomial time assuming access
to an online oracle for square loss regression over F. This resolves an
open problem of Dudík et al. [2015] on oracle efficient, regret-optimal
algorithms for contextual dueling bandits. We will conclude the talk with a
brief overview of the potential of bringing preference-based learning into
real-world systems.

(based on the joint work with Akshay Krishnamurthy, "Efficient and Optimal
Algorithms for Contextual Dueling Bandits under Realizability", ALT 2022).

*Speaker Bio:* Aadirupa is visiting faculty at TTI Chicago. Before this,
she was a postdoctoral researcher at Microsoft Research New York City. She
obtained her Ph.D. from the Department of Computer Science, Indian
Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib
Bhattacharyya. Aadirupa was an intern at Microsoft Research, Bangalore,
Inria, Paris, and Google AI, Mountain View. Her research interests include
Bandits, Reinforcement Learning, Optimization, Learning theory, Algorithms.
Off late, she is also very interested in working on problems in the
intersection of ML and Game theory, Algorithmic fairness, and Privacy.

*Host:* *Avrim Blum* <avrim at ttic.edu>

Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Chicago, IL  60637*
*mmarre at ttic.edu <mmarre at ttic.edu>*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.uchicago.edu/pipermail/theory/attachments/20220813/68ff98e1/attachment-0001.html>