DSAP: Analyzing bias through demographic comparison of datasets
Information Fusion (IF 14.7), Pub Date: 2024-10-29, DOI: 10.1016/j.inffus.2024.102760
Iris Dominguez-Catena, Daniel Paternain, Mikel Galar

In the last few years, Artificial Intelligence (AI) systems have become increasingly widespread. Unfortunately, these systems can share many biases with human decision-making, including demographic biases. Often, these biases can be traced back to the data used for training, where large uncurated datasets have become the norm. Despite our awareness of these biases, we still lack general tools to detect, quantify, and compare them across different datasets. In this work, we propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of datasets. First, DSAP uses existing demographic estimation models to extract a dataset’s demographic profile. Second, it applies a similarity metric to compare the demographic profiles of different datasets. While these individual components are well-known, their joint use for demographic dataset comparison is novel and has not been previously addressed in the literature. This approach allows three key applications: the identification of demographic blind spots and bias issues across datasets, the measurement of demographic bias, and the assessment of demographic shifts over time. DSAP can be used on datasets with or without explicit demographic information, provided that demographic information can be derived from the samples using auxiliary models, such as those for image or voice datasets. To show the usefulness of the proposed methodology, we consider the Facial Expression Recognition task, where demographic bias has previously been found. The three applications are studied over a set of twenty datasets with varying properties. The code is available at https://github.com/irisdominguez/DSAP.
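The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that an auxiliary demographic model has already produced one group label per sample, and it uses an overlap similarity (the sum of element-wise minima of two proportion vectors) purely as an illustrative choice, since the abstract does not name the metric. The function names `demographic_profile` and `profile_similarity`, the group names, and the label lists are all hypothetical.

```python
from collections import Counter


def demographic_profile(labels, groups):
    """Step 1 (sketch): proportion of samples falling in each demographic group.

    `labels` stands in for the per-sample output of an auxiliary
    demographic estimation model (an assumption of this example).
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(group, 0) / total for group in groups]


def profile_similarity(p, q):
    """Step 2 (sketch): overlap of two profiles, in [0, 1].

    Sum of element-wise minima; 1 means identical proportions,
    0 means no shared demographic mass. Illustrative metric only.
    """
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical group names and predicted labels for two datasets.
groups = ["group_a", "group_b", "group_c"]
labels_x = ["group_a"] * 5 + ["group_b"] * 3 + ["group_c"] * 2
labels_y = ["group_a"] * 2 + ["group_b"] * 2 + ["group_c"] * 6

profile_x = demographic_profile(labels_x, groups)
profile_y = demographic_profile(labels_y, groups)
similarity = profile_similarity(profile_x, profile_y)

print(profile_x)
print(profile_y)
print(round(similarity, 3))
```

Under the same assumptions, comparing a dataset's profile against a reference profile (for example, a uniform one) would give a rough bias score, and comparing profiles of snapshots taken at different times would give a rough measure of demographic shift.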

Updated: 2024-10-29