The social construction of datasets: On the practices, processes, and challenges of dataset creation for machine learning
New Media & Society (IF 4.5), Pub Date: 2024-08-30, DOI: 10.1177/14614448241251797
Will Orr, Kate Crawford
Despite the critical role that datasets play in how systems make predictions and interpret the world, the dynamics of their construction are not well understood. Drawing on a corpus of interviews with dataset creators, we uncover the messy and contingent realities of dataset preparation. We identify four key challenges in constructing datasets, including balancing the benefits and costs of increasing dataset scale, limited access to resources, a reliance on shortcuts for compiling datasets and evaluating their quality, and ambivalence regarding accountability for a dataset. These themes illustrate the ways in which datasets are not objective or neutral but reflect the personal judgments and trade-offs of their creators within wider institutional dynamics, working within social, technical, and organizational constraints. We underscore the importance of examining the processes of dataset creation to strengthen an understanding of responsible practices for dataset development and care.
Updated: 2024-08-30