[Theory] TODAY: 8/17 Talks at TTIC: Aadirupa Saha, TTIC

Wed Aug 17 10:00:00 CDT 2022

*When:*        Wednesday, August 17th at* 11:00 am CT*

*Where:       *Talk will be given *live, in-person* at

                   TTIC, 6045 S. Kenwood Avenue

                   5th Floor, Room 530

*Virtually:*  via Panopto (*livestream*
<https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=294c5b73-bf11-476a-8690-aef000154634>
)

*Who: *         Aadirupa Saha, TTIC

------------------------------

*Title: *  Learning to Make Context-Dependent Predictions Through
Preference Elicitation
*Abstract:* Customer statistics collected in several real-world systems
have reflected that users often prefer eliciting their liking for a given
pair of items, say (A,B), in terms of relative queries like: "Do you prefer
Item A over B?", rather than their absolute counterparts: ``How much do you
score items A and B on a scale of [0-10]?".

Drawing inspirations and with the hope of building cost-effective
user-friendly systems, this led to the famous formulation of Dueling
Bandits (DB) [Yue&Joachims'11], which was followed by a huge surge of
interest in the online learning from pairwise preferences in the
Bandits-Online ML community. While the setting could be extremely useful in
diverse fields of real-life applications, starting from recommendation
systems, to crowd-sourcing platforms, to search-engine optimization, online
retail, or even more complex tasks like designing multi-player games or
training chat-bots/ humanoid robots, just to name a few, unfortunately, the
existing DB techniques were predominantly limited only to the simpler
settings of only pairwise-preferences, finite decision spaces, and
stochastic environments, which are clearly unrealistic from a practical
standpoint.

In this work, we studied the practical framework of contextual dueling
bandits where the goal of the learner is to make customized predictions
based on the users' need (or user context). More formally, we study the
K-armed contextual dueling bandit problem, which is a sequential decision
making setting where the learner uses the contextual information to make
two decisions, but only observes *preference-based (relative)
feedback* suggesting
that one decision was better than the other. We focus on the regret
minimization problem under realizability, where the feedback is generated
by a pairwise preference matrix that is well-specified by a given function
class F. We provide a new algorithm that achieves the optimal regret rate
for a new notion of `best response' regret, which is a strictly stronger
performance measure than those considered in prior works. The algorithm is
also computationally efficient, running in polynomial time assuming access
to an online oracle for square loss regression over F. This resolves an
open problem of Dudík et al. [2015] on oracle efficient, regret-optimal
algorithms for contextual dueling bandits. We will conclude the talk with a
brief overview of the potential of bringing preference-based learning into
real-world systems.

(based on the joint work with Akshay Krishnamurthy, "Efficient and Optimal
Algorithms for Contextual Dueling Bandits under Realizability", ALT 2022).

*Speaker Bio:* Aadirupa is visiting faculty at TTI Chicago. Before this,
she was a postdoctoral researcher at Microsoft Research New York City. She
obtained her Ph.D. from the Department of Computer Science, Indian
Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib
Bhattacharyya. Aadirupa was an intern at Microsoft Research, Bangalore,
Inria, Paris, and Google AI, Mountain View. Her research interests include
Bandits, Reinforcement Learning, Optimization, Learning theory, Algorithms.
Off late, she is also very interested in working on problems in the
intersection of ML and Game theory, Algorithmic fairness, and Privacy.

*Host:* *Avrim Blum* <avrim at ttic.edu>

Mary C. Marre
Faculty Administrative Support
*Toyota Technological Institute*
*6045 S. Kenwood Avenue*
*Chicago, IL  60637*
*mmarre at ttic.edu <mmarre at ttic.edu>*

On Tue, Aug 16, 2022 at 1:06 PM Mary Marre <mmarre at ttic.edu> wrote:

> *When:*        Wednesday, August 17th at* 11:00 am CT*
>
>
> *Where:       *Talk will be given *live, in-person* at
>
>                    TTIC, 6045 S. Kenwood Avenue
>
>                    5th Floor, Room 530
>
> *Virtually:*  via Panopto (*livestream*
> <https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=294c5b73-bf11-476a-8690-aef000154634>
> )
>
> *Who: *         Aadirupa Saha, TTIC
>
>
> ------------------------------
>
> *Title: *  Learning to Make Context-Dependent Predictions Through
> Preference Elicitation
> *Abstract:* Customer statistics collected in several real-world systems
> have reflected that users often prefer eliciting their liking for a given
> pair of items, say (A,B), in terms of relative queries like: "Do you prefer
> Item A over B?", rather than their absolute counterparts: ``How much do you
> score items A and B on a scale of [0-10]?".
>
> Drawing inspirations and with the hope of building cost-effective
> user-friendly systems, this led to the famous formulation of Dueling
> Bandits (DB) [Yue&Joachims'11], which was followed by a huge surge of
> interest in the online learning from pairwise preferences in the
> Bandits-Online ML community. While the setting could be extremely useful in
> diverse fields of real-life applications, starting from recommendation
> systems, to crowd-sourcing platforms, to search-engine optimization, online
> retail, or even more complex tasks like designing multi-player games or
> training chat-bots/ humanoid robots, just to name a few, unfortunately, the
> existing DB techniques were predominantly limited only to the simpler
> settings of only pairwise-preferences, finite decision spaces, and
> stochastic environments, which are clearly unrealistic from a practical
> standpoint.
>
> In this work, we studied the practical framework of contextual dueling
> bandits where the goal of the learner is to make customized predictions
> based on the users' need (or user context). More formally, we study the
> K-armed contextual dueling bandit problem, which is a sequential decision
> making setting where the learner uses the contextual information to make
> two decisions, but only observes *preference-based (relative) feedback* suggesting
> that one decision was better than the other. We focus on the regret
> minimization problem under realizability, where the feedback is generated
> by a pairwise preference matrix that is well-specified by a given function
> class F. We provide a new algorithm that achieves the optimal regret rate
> for a new notion of `best response' regret, which is a strictly stronger
> performance measure than those considered in prior works. The algorithm is
> also computationally efficient, running in polynomial time assuming access
> to an online oracle for square loss regression over F. This resolves an
> open problem of Dudík et al. [2015] on oracle efficient, regret-optimal
> algorithms for contextual dueling bandits. We will conclude the talk with a
> brief overview of the potential of bringing preference-based learning into
> real-world systems.
>
> (based on the joint work with Akshay Krishnamurthy, "Efficient and Optimal
> Algorithms for Contextual Dueling Bandits under Realizability", ALT 2022).
>
> *Speaker Bio:* Aadirupa is visiting faculty at TTI Chicago. Before this,
> she was a postdoctoral researcher at Microsoft Research New York City. She
> obtained her Ph.D. from the Department of Computer Science, Indian
> Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib
> Bhattacharyya. Aadirupa was an intern at Microsoft Research, Bangalore,
> Inria, Paris, and Google AI, Mountain View. Her research interests
> include Bandits, Reinforcement Learning, Optimization, Learning theory,
> Algorithms. Off late, she is also very interested in working on problems in
> the intersection of ML and Game theory, Algorithmic fairness, and Privacy.
>
> *Host:* *Avrim Blum* <avrim at ttic.edu>
>
>
>
>
> Mary C. Marre
> Faculty Administrative Support
> *Toyota Technological Institute*
> *6045 S. Kenwood Avenue*
> *Chicago, IL  60637*
> *mmarre at ttic.edu <mmarre at ttic.edu>*
>
>
> On Sat, Aug 13, 2022 at 10:25 AM Mary Marre <mmarre at ttic.edu> wrote:
>
>> *When:*        Wednesday, August 17th at* 11:00 am CT*
>>
>>
>> *Where:       *Talk will be given *live, in-person* at
>>
>>                    TTIC, 6045 S. Kenwood Avenue
>>
>>                    5th Floor, Room 530
>>
>> *Virtually:*  via Panopto (*livestream*
>> <https://uchicago.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=294c5b73-bf11-476a-8690-aef000154634>
>> )
>>
>> *Who: *         Aadirupa Saha, TTIC
>>
>>
>> ------------------------------
>>
>> *Title: *  Learning to Make Context-Dependent Predictions Through
>> Preference Elicitation
>> *Abstract:* Customer statistics collected in several real-world systems
>> have reflected that users often prefer eliciting their liking for a given
>> pair of items, say (A,B), in terms of relative queries like: "Do you prefer
>> Item A over B?", rather than their absolute counterparts: ``How much do you
>> score items A and B on a scale of [0-10]?".
>>
>> Drawing inspirations and with the hope of building cost-effective
>> user-friendly systems, this led to the famous formulation of Dueling
>> Bandits (DB) [Yue&Joachims'11], which was followed by a huge surge of
>> interest in the online learning from pairwise preferences in the
>> Bandits-Online ML community. While the setting could be extremely useful in
>> diverse fields of real-life applications, starting from recommendation
>> systems, to crowd-sourcing platforms, to search-engine optimization, online
>> retail, or even more complex tasks like designing multi-player games or
>> training chat-bots/ humanoid robots, just to name a few, unfortunately, the
>> existing DB techniques were predominantly limited only to the simpler
>> settings of only pairwise-preferences, finite decision spaces, and
>> stochastic environments, which are clearly unrealistic from a practical
>> standpoint.
>>
>> In this work, we studied the practical framework of contextual dueling
>> bandits where the goal of the learner is to make customized predictions
>> based on the users' need (or user context). More formally, we study the
>> K-armed contextual dueling bandit problem, which is a sequential decision
>> making setting where the learner uses the contextual information to make
>> two decisions, but only observes *preference-based (relative) feedback* suggesting
>> that one decision was better than the other. We focus on the regret
>> minimization problem under realizability, where the feedback is generated
>> by a pairwise preference matrix that is well-specified by a given function
>> class F. We provide a new algorithm that achieves the optimal regret rate
>> for a new notion of `best response' regret, which is a strictly stronger
>> performance measure than those considered in prior works. The algorithm is
>> also computationally efficient, running in polynomial time assuming access
>> to an online oracle for square loss regression over F. This resolves an
>> open problem of Dudík et al. [2015] on oracle efficient, regret-optimal
>> algorithms for contextual dueling bandits. We will conclude the talk with a
>> brief overview of the potential of bringing preference-based learning into
>> real-world systems.
>>
>> (based on the joint work with Akshay Krishnamurthy, "Efficient and
>> Optimal Algorithms for Contextual Dueling Bandits under Realizability", ALT
>> 2022).
>>
>> *Speaker Bio:* Aadirupa is visiting faculty at TTI Chicago. Before this,
>> she was a postdoctoral researcher at Microsoft Research New York City. She
>> obtained her Ph.D. from the Department of Computer Science, Indian
>> Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib
>> Bhattacharyya. Aadirupa was an intern at Microsoft Research, Bangalore,
>> Inria, Paris, and Google AI, Mountain View. Her research interests
>> include Bandits, Reinforcement Learning, Optimization, Learning theory,
>> Algorithms. Off late, she is also very interested in working on problems in
>> the intersection of ML and Game theory, Algorithmic fairness, and Privacy.
>>
>> *Host:* *Avrim Blum* <avrim at ttic.edu>
>>
>>
>> Mary C. Marre
>> Faculty Administrative Support
>> *Toyota Technological Institute*
>> *6045 S. Kenwood Avenue*
>> *Chicago, IL  60637*
>> *mmarre at ttic.edu <mmarre at ttic.edu>*
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.cs.uchicago.edu/pipermail/theory/attachments/20220817/3bad8a72/attachment-0001.html>