Seminar by Yogeshwer Sharma

Multi-armed bandits in sponsored search auctions

Yogeshwer Sharma
Cornell Univ., Ithaca

Date:    Tuesday, January 5, 2010   
Time:    10:00 AM   
Venue:   CS102.

Abstract:

In this talk, we are going to focus on online decision problems. In an online decision problem, the algorithm is provided with a set of alternatives, and selects one alternative in each of the T sequential trials, deriving a re- ward for each selection. After T trials, the total reward of the algorithm is compared with the best "single-arm" strategy which has the maximum re- ward in hindsight. The difference between the reward of the best single-arm strategy and that of the algorithm is called the regret, and one seeks algo- rithms whose regret is sublinear in T and running time is polynomial in the problem size. We study an important class of online decision problems, called the multi- armed bandit problems, and extend the basic model in two important ways. In one extension, we model sponsored search auctions as a multi-armed ban- dit problem, and allow the alternatives (or advertisers in this case) to be strategic which can report possibly wrong rewards. We seek to provide in- centives to advertisers so as to get socially efficient solutions. We prove that any socially efficient solution that provides right incentives to advertisers must suffer much higher regret than the regret suffered by algorithms for multi-armed problem without incentive issues. In the second extension, also motivated by sponsored search and resource selection in distributed systems, we allow the set of available alternatives to vary over time, provide a natural way to define the regret, and give policies that suffer low regret. We also prove that the regret suffered by our policies is information-theoretically the lowest possible. The talk is going to be self-contained and I will not assume any prior knowledge of online learning or sponsored search auctions. This talk is based on joint works with Moshe Babaioff (Microsoft Research), Robert Kleinberg (Cornell University), Alexandru Niculescu-Mizil (IBM Watson), and Alex Slivkins (Microsoft Research).

Back to Seminars in 2009-10