Online Evaluation for Information Retrieval
Foundations and Trends in Information Retrieval (IF 8.3), Pub Date: 2016-06-21, DOI: 10.1561/1500000051
Katja Hofmann, Lihong Li, Filip Radlinski

Online evaluation is one of the most common approaches to measuring the effectiveness of an information retrieval system. It involves fielding the system to real users and observing their interactions in situ as they engage with it. This allows actual users with real-world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative of offline evaluation, which may provide more easily interpretable outcomes yet is often less realistic when measuring quality and actual user experience. In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on the different-sized experimental units commonly of interest: documents, lists, and sessions. Additionally, we include an extensive discussion of recent work on data re-use and on experiment estimation based on historical data. A substantial part of this work focuses on practical issues: how to run evaluations in practice, how to select experimental parameters, how to take into account the ethical considerations inherent in online evaluation, and the limitations experimenters should be aware of. While most published work on online experimentation today is at large scale, in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we highlight recent work that makes these techniques easier to use at smaller scales and encourage the study of real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, describe open problems, and postulate future directions.
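The "experiment estimation based on historical data" mentioned above is commonly done with counterfactual estimators such as inverse propensity scoring (IPS), which reweights logged interactions by how likely the new policy would have been to take the logged action. The sketch below is a minimal illustration of this idea, not the survey's own notation; the function names and data layout are assumptions made for the example.

```python
def ips_estimate(logged_data, target_policy_prob):
    """Inverse-propensity-scoring estimate of a new policy's mean reward,
    computed from logs collected under a different (logging) policy.

    logged_data: list of (context, action, reward, logging_prob) tuples,
        where logging_prob is the probability the logging policy assigned
        to the logged action (must be > 0 for every logged action).
    target_policy_prob: function (context, action) -> probability that the
        new policy would choose that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged_data:
        # Reweight each logged reward by the likelihood ratio of the
        # target policy versus the logging policy for the logged action.
        weight = target_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logged_data)
```

For instance, if the logging policy chose uniformly between two result lists (probability 0.5 each) and the candidate policy always picks list 0, the estimate is simply the reweighted average reward of the log entries where list 0 was shown. The estimator is unbiased under full support of the logging policy, but its variance grows as the two policies diverge, which is one of the practical limitations the survey discusses.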



