Ensemble feature selection for multi-label text classification: An intelligent order statistics approach
International Journal of Intelligent Systems ( IF 5.0 ) Pub Date : 2022-09-02 , DOI: 10.1002/int.23044 Mohsen Miri, Mohammad Bagher Dowlatshahi, Amin Hashemi, Marjan Kuchaki Rafsanjani, Brij B. Gupta, W. Alhalabi
With the rapid growth of data, especially text, multi-label text classification has become increasingly valuable. Preprocessing, and intelligent feature selection (FS) in particular, is among the most important steps in classification. Each FS method finds the best features according to its own approach, whereas we use a multi-strategy approach to find more useful features. Because each feature is then measured from several perspectives, evaluating and comparing feature importance and relevance with multiple strategies and methods is more suitable than conventional approaches. Ensemble FS merges the final results of various methods to take advantage of their individual strengths and classify better. In this article, we propose, for the first time, an ensemble FS method for multi-label text data (MLTD) based on order statistics (EMFS). We use four multi-label FS (MLFS) algorithms with different characteristics to achieve a good result. Order statistics, one of the most important statistical methods, is used to aggregate the ranks produced by the different algorithms; this aggregation is robust against noisy, redundant, and inessential features. Finally, the performance of EMFS was evaluated on six MLTDs according to six performance criteria (ranking-based and classification-based). The proposed method was more accurate than the compared methods on all MLTDs used, improving on them by 1.5%, a value computed over the six evaluation criteria and all tested data sets.
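The abstract does not give the exact aggregation formula, so the following is only a minimal sketch of how order statistics can merge several feature rankings, not the authors' implementation. It assumes each of the four MLFS algorithms returns a full ranking of the same features, normalises each rank to (0, 1], and scores a feature by the smallest probability that the k-th smallest of n uniform random ranks would be at least as good as the one observed. The names `order_stat_pvalue`, `aggregate_ranks`, and the toy rankings are hypothetical.

```python
from math import comb


def order_stat_pvalue(x, k, n):
    """P(k-th smallest of n iid Uniform(0, 1) draws <= x)."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def aggregate_ranks(rank_lists):
    """Merge rankings (each a list of feature ids, best first).

    Assumption: every ranking contains the same features exactly once.
    Returns (features ordered best first, score per feature); a lower
    score means the feature is ranked consistently high across methods.
    """
    n_methods = len(rank_lists)
    n_features = len(rank_lists[0])

    # Normalised rank of each feature under each method, in (0, 1].
    norm = {f: [] for f in rank_lists[0]}
    for ranking in rank_lists:
        for pos, f in enumerate(ranking, start=1):
            norm[f].append(pos / n_features)

    scores = {}
    for f, ranks in norm.items():
        ranks.sort()
        # Compare each order statistic with its null (uniform) distribution
        # and keep the most significant one.
        scores[f] = min(
            order_stat_pvalue(x, k, n_methods)
            for k, x in enumerate(ranks, start=1)
        )

    order = sorted(scores, key=lambda f: (scores[f], f))
    return order, scores


# Toy input: four hypothetical MLFS rankings of five features.
rankings = [
    ["f0", "f1", "f2", "f3", "f4"],
    ["f0", "f2", "f1", "f4", "f3"],
    ["f1", "f0", "f2", "f3", "f4"],
    ["f0", "f1", "f3", "f2", "f4"],
]
order, scores = aggregate_ranks(rankings)
print(order)
```

Because the score depends on the whole sorted rank vector of a feature, a single method that ranks a good feature poorly has limited influence, which is one way to read the robustness the abstract claims for this kind of aggregation.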
Updated: 2022-09-02