Bandit Algorithms in Information Retrieval
Foundations and Trends in Information Retrieval (IF 8.3), Pub Date: 2019-05-22, DOI: 10.1561/1500000067
Dorota Glowacka
Bandit algorithms, named after casino slot machines sometimes known as “one-armed bandits”, fall into a broad category of stochastic scheduling problems. In the multi-armed setting, each arm generates a reward with a given probability. The gambler’s aim is to find the arm producing the highest payoff and then keep playing it in order to accumulate the maximum possible reward. However, having only a limited number of plays, the gambler faces a dilemma: should he play the arm currently known to produce the highest reward, or should he keep trying other arms in the hope of finding a better-paying one? This problem formulation applies readily to many real-life scenarios, hence the increased interest in recent years in developing bandit algorithms for a range of applications. In information retrieval and recommender systems, bandit algorithms, which are simple to implement and do not require any training data, have been particularly popular in online personalization, online ranker evaluation, and search engine optimization. This survey provides a brief overview of bandit algorithms designed to tackle specific issues in information retrieval and recommendation and, where applicable, describes how they have been applied in practice.
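To make the exploration–exploitation dilemma described above concrete, the following is a minimal sketch (not taken from the survey) of an ε-greedy strategy playing simulated Bernoulli arms. The function name `epsilon_greedy_bandit`, the arm payoff probabilities, and the parameter values are illustrative assumptions; ε-greedy is only one of many bandit strategies covered in the literature.

```python
import random

def epsilon_greedy_bandit(true_probs, n_rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy gambler on Bernoulli arms.

    true_probs : hidden payoff probability of each arm (unknown to the gambler).
    epsilon    : fraction of rounds spent exploring arms chosen at random.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms      # how often each arm has been played
    rewards = [0.0] * n_arms  # total reward observed per arm
    total = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: try a random arm in the hope of finding a better-paying one.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: play the arm with the best observed average reward so far
            # (unplayed arms get +inf so each arm is tried at least once).
            arm = max(range(n_arms),
                      key=lambda a: rewards[a] / pulls[a] if pulls[a] else float("inf"))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
        total += reward
    return total, pulls


if __name__ == "__main__":
    total_reward, pulls = epsilon_greedy_bandit([0.2, 0.5, 0.7])
    print(f"total reward: {total_reward:.0f}, pulls per arm: {pulls}")
```

With ε = 0.1, most plays go to the arm with the best observed average reward, while the occasional random play keeps refining the estimates of the other arms, which is the trade-off between exploitation and exploration that bandit algorithms formalize.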