Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning
Information Systems Frontiers (IF 6.9), Pub Date: 2024-08-29, DOI: 10.1007/s10796-024-10533-7
Riyaz Sikora, Yoon Sang Lee

Imbalanced data sets are a growing problem in data mining and business analytics, because the ability of machine learning algorithms to predict the minority class deteriorates in the presence of class imbalance. Although many approaches to the imbalance problem have been studied in the literature, most have met with limited success. In this study, we propose three wrapper-based methods that combine under-sampling with ensemble learning to improve the performance of standard data mining algorithms. We test our ensemble methods on 10 data sets from the UCI repository, each with an imbalance ratio of at least 70%. We compare their performance with two traditional techniques for dealing with the imbalance problem and show significant improvement in recall, AUROC, and the average of precision and recall.
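
The general idea of pairing under-sampling with an ensemble can be illustrated with a minimal sketch. This is not the authors' three methods, which the abstract does not specify. It is a generic bagging-style variant under stated assumptions: each ensemble member is trained on all minority rows plus an equally sized random draw of majority rows, and members vote by simple majority. The toy `NearestCentroid` base learner, the synthetic 90/10 data, and all function names are hypothetical and chosen only to keep the example dependency-free.

```python
import random
from collections import Counter


def undersample(X, y, minority_label, rng):
    """Keep every minority row plus an equally sized random subset of majority rows."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    keep = minority + rng.sample(majority, min(len(minority), len(majority)))
    return [X[i] for i in keep], [y[i] for i in keep]


class NearestCentroid:
    """Toy base learner (an assumption): predict the class with the closest mean vector."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, row):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def fit_ensemble(X, y, minority_label, n_members=11, seed=0):
    """Train each member on its own balanced under-sample of the data."""
    rng = random.Random(seed)
    return [
        NearestCentroid().fit(*undersample(X, y, minority_label, rng))
        for _ in range(n_members)
    ]


def predict(ensemble, row):
    """Combine member predictions by majority vote."""
    votes = Counter(member.predict_one(row) for member in ensemble)
    return votes.most_common(1)[0][0]


def recall(y_true, y_pred, positive):
    """Fraction of true positive-class rows that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Synthetic data (hypothetical): 90 majority rows near (0, 0), 10 minority rows near (5, 5).
_data_rng = random.Random(42)
X = [[_data_rng.gauss(0, 1), _data_rng.gauss(0, 1)] for _ in range(90)]
X += [[_data_rng.gauss(5, 1), _data_rng.gauss(5, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

ensemble = fit_ensemble(X, y, minority_label=1)
predictions = [predict(ensemble, row) for row in X]
print("minority recall:", recall(y, predictions, positive=1))
```

Because every member sees a balanced sample, no single model is dominated by the majority class, while drawing a different majority subset per member lets the ensemble as a whole still use most of the majority data. In practice the base learner would be a standard data mining algorithm rather than this toy classifier, and evaluation would use held-out data.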




Updated: 2024-08-29