Semi-Automated Nonresponse Detection for Open-Text Survey Data, Social Science Computer Review

Social Science Computer Review (IF 3.0), Pub Date: 2024-05-10, DOI: 10.1177/08944393241249720
Kristen Cibelli Hibben, Zachary Smith, Benjamin Rogers, Valerie Ryan, Paul Scanlon, Travis Hoppe

Open-ended survey questions can enable researchers to gain insights beyond more commonly used closed-ended question formats by allowing respondents an opportunity to provide information with few constraints and in their own words. Open-ended web probes are also increasingly used to inform the design and evaluation of survey questions. However, open-ended questions are more susceptible to insufficient or irrelevant responses that can be burdensome and time-consuming to identify and remove manually, often resulting in underuse of open-ended questions and, when used, potential inclusion of poor-quality data. To address these challenges, we developed and publicly released the Semi-Automated Nonresponse Detection for Survey text (SANDS), an item nonresponse detection approach based on a Bidirectional Transformer for Language Understanding model, fine-tuned using Simple Contrastive Sentence Embedding and targeted human coding, to categorize open-ended text data as valid or likely nonresponse. This approach is powerful in that it uses natural language processing as opposed to existing nonresponse detection approaches that have relied exclusively on rules or regular expressions or used bag-of-words approaches that tend to perform less well on short pieces of text, typos, or uncommon words, often prevalent in open-text survey data. This paper presents the development of SANDS and a quantitative evaluation of its performance and potential bias using open-text responses from a series of web probes as case studies. Overall, the SANDS model performed well in identifying a dataset of likely valid results to be used for quantitative or qualitative analysis, particularly on health-related data. Developed for generalizable use and accessible to others, the SANDS model can greatly improve the efficiency of identifying inadequate and irrelevant open-text responses, offering expanded opportunities for the use of open-text data to inform question design and improve survey data quality.
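For contrast, the rule-based style of nonresponse detection that the abstract describes as the existing approach can be sketched in a few lines of Python. This is not SANDS and is not taken from the paper: the pattern list and the function name `is_likely_nonresponse` are hypothetical, chosen only to illustrate why fixed rules tend to miss typos and uncommon phrasings in short answers.

```python
import re

# Hypothetical rule set for illustration only; not from the paper or from SANDS.
NONRESPONSE_PATTERNS = [
    re.compile(r"^$"),                                        # empty after stripping
    re.compile(r"^[\W_]+$"),                                  # punctuation only, e.g. "???"
    re.compile(r"^(n/?a|none|nothing|idk|dk)\.?$", re.IGNORECASE),
    re.compile(r"^(i\s+)?(don'?t|do\s+not)\s+know\.?$", re.IGNORECASE),
    re.compile(r"^(.)\1{3,}$"),                               # one character repeated 4+ times
]


def is_likely_nonresponse(text: str) -> bool:
    """Flag an open-text answer as likely nonresponse if any rule matches."""
    cleaned = text.strip()
    return any(pattern.search(cleaned) for pattern in NONRESPONSE_PATTERNS)


if __name__ == "__main__":
    answers = ["N/A", "  ???  ", "I don't know", "i dunno", "I had a headache for three days"]
    for answer in answers:
        print(repr(answer), is_likely_nonresponse(answer))
```

Under these assumed rules, "N/A", "???", and "I don't know" are flagged, while the variant "i dunno" slips through because no pattern anticipates it. Each new spelling or phrasing needs another hand-written rule, which is the kind of limitation the abstract attributes to approaches based on rules or regular expressions.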


Updated: 2024-05-10
