Facilitating topic modeling in tourism research: Comprehensive comparison of new AI technologies
Tourism Management (IF 10.9), Pub Date: 2024-07-31, DOI: 10.1016/j.tourman.2024.105007, Andrei P. Kirilenko, Svetlana Stepchenkova
In the past few years, a new crop of transformer-based language models such as Google's BERT and OpenAI's ChatGPT has become increasingly popular in text analysis, owing to their ability to capture the context of an entire document. These new methods, however, have yet to percolate into the tourism academic literature. This paper aims to fill this gap by providing a comparative analysis of these instruments against the commonly used Latent Dirichlet Allocation for topic extraction on contrasting tourism-related data: coherent vs. noisy, short vs. long, and small vs. large corpus size. The data are typical of the tourism literature and include comments from followers of a popular blogger, TripAdvisor reviews, and review titles. We provide recommendations on the data domains where the reviewed methods demonstrate the best performance, consider dimensions of success, and discuss each method's strengths and weaknesses. In general, GPT tends to return comprehensive, highly interpretable topics that are relevant to the real world for all datasets, including the noisy ones, and at all scales. At the same time, ChatGPT is the most vulnerable to the issue of trust common to "black box" models, which we explore in detail.
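As a concrete illustration of the two families of methods the paper compares, the sketch below contrasts a classic Latent Dirichlet Allocation run (via gensim) with a transformer-based alternative (BERTopic, which builds topics on BERT-style embeddings). The toy review texts, model choices, and parameters are illustrative assumptions only, not the paper's actual data or setup.

```python
# Minimal sketch: LDA vs. a transformer-based topic model on short review texts.
# All data and parameters below are placeholder assumptions for demonstration.

from gensim import corpora
from gensim.models import LdaModel

# Hypothetical short review texts (stand-in for TripAdvisor reviews / titles).
docs = [
    "great hotel friendly staff clean rooms",
    "beach was crowded but the food was excellent",
    "terrible service long wait at check in",
    "amazing views quiet location perfect for families",
]

# --- Classic approach: Latent Dirichlet Allocation (bag-of-words) ---
tokenized = [d.split() for d in docs]
dictionary = corpora.Dictionary(tokenized)
bow_corpus = [dictionary.doc2bow(t) for t in tokenized]
lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
               passes=10, random_state=42)
for topic_id, words in lda.print_topics(num_words=5):
    print(f"LDA topic {topic_id}: {words}")

# --- Transformer-based approach (assumed setup; requires `pip install bertopic`) ---
# from bertopic import BERTopic
# topic_model = BERTopic(min_topic_size=2)   # embeds full documents, then clusters
# topics, probs = topic_model.fit_transform(docs)
# print(topic_model.get_topic_info())
```

The key design difference the paper leans on is visible here: LDA works from word co-occurrence counts, while the transformer-based route embeds each document as a whole before clustering, which is what lets it handle short, noisy texts with more context.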
Updated: 2024-07-31