FairBalance: How to Achieve Equalized Odds With Data Pre-Processing
IEEE Transactions on Software Engineering (IF 6.5) | Pub Date: 2024-07-22 | DOI: 10.1109/tse.2024.3431445 | Zhe Yu, Joymallya Chakraborty, Tim Menzies
This research seeks to benefit the software engineering community by providing a simple yet effective pre-processing approach for achieving equalized-odds fairness in machine learning software. Fairness issues have attracted increasing attention as machine learning software is increasingly used for high-stakes, high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that it does not perform differently across sensitive demographic groups, i.e., that it satisfies equalized odds. Unlike prior works, which either optimize an equalized-odds-related metric during the learning process as a black box or manipulate the training data following some intuition, this work studies the root cause of equalized-odds violations and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we propose FairBalance, a pre-processing algorithm that balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, FairBalance significantly improves equalized odds without much, if any, damage to utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, our scripts are available at https://github.com/hil-se/FairBalance.
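The weighting idea described in the abstract — making the class distribution within every demographic group both equal and balanced (1:1) — can be sketched in a few lines. The following is a minimal illustration under simplifying assumptions (a single sensitive attribute, and a weight of 1/count for each group-class cell), not the authors' exact implementation; the function name fair_balance_weights is hypothetical:

```python
from collections import Counter

def fair_balance_weights(groups, labels):
    """Assign each sample the weight 1 / count(group, label).

    With this weighting, the total weight of every class within each
    demographic group sums to 1, so the weighted class distribution in
    every group is balanced 1:1 -- the condition the paper identifies
    for zero average odds difference.
    """
    # Count how many samples fall into each (group, label) cell.
    cell_counts = Counter(zip(groups, labels))
    return [1.0 / cell_counts[(g, y)] for g, y in zip(groups, labels)]

# Example: group A has two positives and one negative; group B has one of each.
weights = fair_balance_weights(["A", "A", "A", "B", "B"], [1, 1, 0, 1, 0])
print(weights)  # [0.5, 0.5, 1.0, 1.0, 1.0]
```

The resulting weights can then be passed to any learner that accepts per-sample weights (e.g., the sample_weight argument of scikit-learn estimators), leaving the normal training process otherwise unchanged, as the paper's pre-processing framing requires.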
Updated: 2024-07-22