A Multi-objective Feature Selection Method Considering the Interaction Between Features
Information Systems Frontiers (IF 6.9), Pub Date: 2024-03-09, DOI: 10.1007/s10796-024-10481-2
Motahare Namakin, Modjtaba Rouhani, Mostafa Sabzekar

Feature selection (FS) is one of the major tasks in the data-cleansing step of machine learning. Multi-objective FS is more challenging because it optimizes two conflicting objectives at once: minimizing the number of selected features and minimizing the classification error. Evolutionary algorithms are promising for this problem because they aim to obtain more reliable Pareto fronts, but they are time-consuming because they explore a large search space. Another issue in multi-objective FS approaches is the correlation between features: selecting correlated features reduces classification performance. To address these challenges, we introduce a multi-objective FS approach that makes several contributions. First, the proposed method handles the correlation between features through a novel probability structure. Second, it builds on the Pareto Archived Evolution Strategy (PAES), whose advantages include simplicity and the ability to explore the solution space at an acceptable speed. We enhance the PAES structure so that offspring are generated intelligently, using the introduced probability structure to produce more promising offspring. Lastly, the method incorporates a novel strategy that guides the algorithm toward the optimal subset throughout the evolutionary process. Results on real-world datasets show a substantial improvement in the quality of the final solutions.
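
To make the two-objective setting concrete, the following is a minimal sketch of a simplified (1+1)-PAES loop over binary feature masks. It is not the authors' method: it omits the proposed probability structure for correlated features, the guidance strategy, and the grid-based crowding that standard PAES uses to choose between non-dominated candidates. The names `dominates`, `update_archive`, `mutate`, `paes`, and `toy_evaluate` are hypothetical, and `toy_evaluate` is an invented stand-in for training and scoring a classifier.

```python
import random


def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, cand, cand_obj):
    """Add the candidate unless an archived member dominates or equals it.

    Members dominated by the candidate are removed. Returns True if added.
    """
    if any(dominates(obj, cand_obj) or obj == cand_obj for _, obj in archive):
        return False
    archive[:] = [(s, o) for s, o in archive if not dominates(cand_obj, o)]
    archive.append((cand, cand_obj))
    return True


def mutate(mask, rng):
    """Flip each bit independently with probability 1/len(mask)."""
    rate = 1.0 / len(mask)
    return tuple(bit ^ (rng.random() < rate) for bit in mask)


def paes(evaluate, n_features, iterations=500, seed=0):
    """Simplified (1+1)-PAES: one parent, one mutated child per iteration."""
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_features))
    current_obj = evaluate(current)
    archive = [(current, current_obj)]
    for _ in range(iterations):
        child = mutate(current, rng)
        child_obj = evaluate(child)
        if dominates(current_obj, child_obj):
            continue  # child is worse on the Pareto ordering: discard it
        # Simplification: any child accepted into the archive becomes the
        # new parent (standard PAES would consult a crowding grid here).
        if update_archive(archive, child, child_obj):
            current, current_obj = child, child_obj
    return archive


def toy_evaluate(mask):
    """Hypothetical objectives: (number of features, error in percent).

    Assumes only features 0 and 3 are informative; a real evaluation would
    train a classifier on the selected columns and measure its error.
    """
    error_pct = 50 - 20 * (mask[0] + mask[3])
    return (sum(mask), error_pct)


if __name__ == "__main__":
    for mask, (n_selected, error_pct) in sorted(paes(toy_evaluate, 6), key=lambda e: e[1]):
        print(mask, n_selected, error_pct)
```

Each archived entry is a trade-off between subset size and error; no entry is better than another on both objectives. A correlation-aware variant would replace the uniform flip probability in `mutate` with per-feature probabilities, which is where the abstract's probability structure would presumably enter.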



