A Quadrilogy for (Big) Data Reliabilities
Communication Methods and Measures (IF 11.4), Pub Date: 2021-06-27, DOI: 10.1080/19312458.2020.1861592
Klaus Krippendorff

ABSTRACT

This paper responds to the challenge of testing the reliability of really big data and proposes a quadrilogy of reliability measures that are applicable quite generally. These measures grew out of the recognition that crowd-coded data contest big data scientists' conviction that the social contexts and meanings of data become irrelevant in the face of their sheer volume. Bigness has also challenged existing intercoder agreement coefficients and the software that computes them, which either accept too narrow a range of data forms or exceed computational limits when data become very large. In the course of tailoring Krippendorff's alpha to very large data, it became possible to divide the concept of reliability into four separate kinds, each serving a different methodological aim in social research. They respectively assess the replicability of the process of generating data; the accuracy with which data are generated; the surrogacy of proposed theories, coders, formulas, or algorithms, that is, their ability to serve as substitutes for human coders; and the decisiveness among several human judgements. Their mathematical relationships ensure that the four measures are comparable. The paper first develops this quadrilogy of agreement measures for binary data and provides a link to software for computing it, then extends it to nominal data as a first step toward further generalizations. It also proposes a computational path to estimate the confidence limits of each measure and the probability of accepting data as reliable when their reliability may fall below a tolerable level. It ends with a discussion of how to select reliability benchmarks appropriate for the quadrilogy of agreement measures.
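For readers unfamiliar with the starting point, the sketch below computes the standard Krippendorff's alpha for nominal data (binary data being the two-value case) and then attaches a confidence interval and a "probability of falling below a benchmark" to it. It is an illustration under stated assumptions, not the paper's method: the abstract does not define the four quadrilogy measures, so none of them is implemented here, and the interval uses a generic percentile bootstrap over units, swapped in for whatever computational path the paper proposes. The function names `nominal_alpha` and `bootstrap_alpha`, the benchmark `alpha_min=0.8`, and the other defaults are hypothetical choices made for this example. Alpha is accumulated from per-unit value counts rather than from all coder pairs, so the cost grows linearly with the number of values, which matters when data are very large.

```python
import random
from collections import Counter


def nominal_alpha(units):
    """Krippendorff's alpha for nominal (including binary) data.

    `units` is a list of lists: each inner list holds the values assigned to
    one unit by whichever coders coded it (missing judgements are omitted).
    Computes alpha = 1 - D_o / D_e from the coincidence counts.
    """
    disagreeing = 0.0   # off-diagonal mass of the coincidence matrix
    totals = Counter()  # n_c: frequency of each value among pairable values
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit with fewer than two values cannot be paired
        counts = Counter(values)
        # Ordered pairs of differing values within the unit, weighted 1/(m-1).
        disagreeing += (m * m - sum(k * k for k in counts.values())) / (m - 1)
        totals.update(counts)
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least two pairable values")
    # Expected off-diagonal mass if values were paired at random, scaled by 1/(n-1).
    expected = (n * n - sum(k * k for k in totals.values())) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when all values are identical")
    return 1.0 - disagreeing / expected


def bootstrap_alpha(units, alpha_min=0.8, draws=2000, level=0.95, seed=1):
    """Generic percentile bootstrap over units (an assumed stand-in, NOT the
    paper's procedure). Returns (lower, upper, share_below_alpha_min)."""
    rng = random.Random(seed)
    pairable = [u for u in units if len(u) >= 2]
    estimates = []
    for _ in range(draws):
        # Resample whole units so judgements on the same unit stay together.
        resample = [rng.choice(pairable) for _ in pairable]
        try:
            estimates.append(nominal_alpha(resample))
        except ValueError:
            continue  # skip degenerate resamples that contain no variation
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    k = len(estimates)
    lower = estimates[int(k * (1 - level) / 2)]
    upper = estimates[min(k - 1, int(k * (1 + level) / 2))]
    # Share of resampled alphas below the benchmark: a rough estimate of the
    # chance that the data fail to reach a tolerable level of reliability.
    below = sum(a < alpha_min for a in estimates) / k
    return lower, upper, below


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [1, 1], [1, 1]]  # binary codes, two coders per unit
    print(round(nominal_alpha(data), 4))  # → 0.5333
    print(bootstrap_alpha(data * 25))
```

Resampling units rather than individual judgements is a common design choice because judgements about the same unit are not independent; with very small samples, many resamples can be degenerate and the interval becomes unstable.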




Updated: 2021-08-26