Foundations and Trends in Information Retrieval (IF 8.3), Pub Date: 2018-1-2, DOI: 10.1561/1500000062
Doris Hoogeveen, Li Wang, Timothy Baldwin, Karin M. Verspoor
This survey presents an overview of information retrieval, natural language processing and machine learning research that makes use of forum data, including both discussion forums and community question-answering (cQA) archives. The focus is on automated analysis, with the goal of gaining a better understanding of the data and its users. We discuss the different strategies used for both retrieval tasks (post retrieval, question retrieval, and answer retrieval) and classification tasks (post type classification, question classification, post quality assessment, subjectivity, and viewpoint classification) at the post level, as well as at the thread level (thread retrieval, solvedness and task orientation, discourse structure recovery and dialogue act tagging, QA-pair extraction, and thread summarisation). We also review work on forum users, including user satisfaction, expert finding, question recommendation and routing, and community analysis. The survey includes a brief history of forums, an overview of the different kinds of forums, a summary of publicly available datasets for forum research, and a short discussion on the evaluation of retrieval tasks using forum data. The aim is to give a broad overview of the different kinds of forum research, a summary of the methods that have been applied, some insights into successful strategies, and potential areas for future research.
Title: Web Forum Retrieval and Text Analysis: A Survey