Expert-Informed Topic Models for Document Set Discovery
Communication Methods and Measures (IF 11.4), Pub Date: 2021-06-30, DOI: 10.1080/19312458.2021.1920008
Eike Mark Rinke, Timo Dobbrick, Charlotte Löb, Cäcilia Zirn, Hartmut Wessler

ABSTRACT

The first step in many text-as-data studies is to find documents that address a specific topic within a larger document set. Researchers often rely on simple keyword searches to do this, even though this may introduce considerable selection bias. Such bias may be even greater when researchers lack the domain knowledge required to make informed search decisions, for example, in cross-national research or research on unfamiliar social contexts. We propose expert-informed topic modeling (EITM) as a hybrid approach to tackle this problem. EITM combines the validity of external domain knowledge captured through expert surveys with probabilistic topic models to help researchers identify subsets of documents that cover initially unknown domain-specific topics, such as specific events and debates, that belong to a researcher-defined master topic. EITM is a flexible and efficient approach to the thematic selection of documents from large text corpora for further study. We benchmark and validate the method by discovering blog posts that address the public role of religion within large corpora of Australian, Swiss, and Turkish blog posts and provide researchers with a complete workflow to guide the application of EITM in their own work.

Updated: 2021-06-30