Extracting Actionable Insights from Text Data: A Stable Topic Model Approach
MIS Quarterly (IF 7.0), Pub Date: 2023-09-01, DOI: 10.25300/misq/2022/16957
Yi Yang , , Ramanath Subramanyam ,

Topic models are becoming a frequently employed tool in the empirical methods repertoire of information systems and management scholars. Given textual corpora, such as consumer reviews and online discussion forums, researchers and business practitioners often use topic modeling to either explore data in an unsupervised fashion or generate variables of interest for subsequent econometric analysis. However, one important concern stems from the fact that topic models can be notorious for their instability, i.e., the generated results could be inconsistent and irreproducible at different times, even on the same dataset. Therefore, researchers might arrive at potentially unreliable results regarding the theoretical relationships that they are testing or developing. In this paper, we attempt to highlight this problem and suggest a potential approach to addressing it. First, we empirically define and evaluate the stability problem of topic models using four textual datasets. Next, to alleviate the problem and with the goal of extracting actionable insights from textual data, we propose a new method, Stable LDA, which incorporates topical word clusters into the topic model to steer the model inference toward consistent results. We show that the proposed Stable LDA approach can significantly improve model stability while maintaining or even improving the topic model quality. Further, employing two case studies related to an online knowledge community and online consumer reviews, we demonstrate that the variables generated from Stable LDA can lead to more consistent estimations in econometric analyses. We believe that our work can further enhance management scholars’ collective toolkit to analyze ever-growing textual data.
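To make the stability concern concrete, the following is a minimal sketch of one way to score how well two runs of a topic model agree on the same dataset. It is not the stability measure defined in the paper, which the abstract does not specify. It assumes each run is summarized as a list of topics, each topic given by its top words, and the names `jaccard` and `topic_stability` are hypothetical. Because topic indices are arbitrary across runs, the score is taken under the best one-to-one alignment of topics, found here by brute force, which is only practical for a small number of topics.

```python
from itertools import permutations


def jaccard(words_a, words_b):
    """Jaccard overlap between two collections of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def topic_stability(run_a, run_b):
    """Mean Jaccard overlap under the best one-to-one topic alignment.

    Illustrative assumption: each run is a list of topics, and each topic
    is a list of its top words. Brute-force matching is O(k!), so this is
    only meant for a handful of topics.
    """
    if len(run_a) != len(run_b):
        raise ValueError("both runs must have the same number of topics")
    k = len(run_a)
    if k == 0:
        return 1.0
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(run_a[i], run_b[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


# Two hypothetical runs on the same corpus: similar topics, different order.
run_1 = [["price", "cheap", "deal"], ["battery", "charge", "power"]]
run_2 = [["battery", "power", "life"], ["price", "deal", "cost"]]
print(topic_stability(run_1, run_2))
```

A score of 1.0 would mean the two runs recovered the same top words for every topic, up to reordering of the topics; lower values indicate the kind of run-to-run inconsistency the abstract describes.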

Updated: 2023-09-01