Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers
Future Generation Computer Systems (IF 6.2), Pub Date: 2024-06-29, DOI: 10.1016/j.future.2024.06.051. Authors: Tommaso Zoppi, Stefano Gazzini, Andrea Ceccarelli
Recent years have seen researchers and practitioners increasingly craft Deep Neural Networks (DNNs) that seem to outperform existing machine learning approaches on classification problems such as anomaly-based error and intrusion detection. Classifiers differ widely from one another, and the choice among them typically depends on the specific task and target system. Designing and training the optimal tabular data classifier requires extensive experimentation, sensitivity analyses, big datasets, and domain-specific knowledge, which may not be readily available or may be considered a non-strategic asset by many companies and stakeholders. Using a total of 23 public datasets, this paper compares: i) traditional (tree-based, statistical) supervised classifiers, ii) DNNs that are specifically designed for classifying tabular data, and iii) DNNs for image classification that are applied to tabular data after converting data points into images, alone and as ensembles. Experimental results and related discussions show clear advantages in adopting tree-based classifiers for anomaly-based error and intrusion detection in tabular data, as they outperform their competitors, including DNNs. Individual classifiers are then compared against ensembles that use different combinations of the classifiers considered in this study as base-learners and produce a unified final response through many meta-learning strategies. Results show that there is no benefit in building ensembles instead of using a single tree-based classifier such as Random Forests, eXtreme Gradient Boosting, or Extra Trees. The paper concludes that anomaly-based error and intrusion detectors for critical systems should use the old (but gold) tree-based classifiers, which are also easier to fine-tune and understand, and which require less time and fewer resources to train.
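The abstract names two mechanisms without detailing them: converting a tabular data point into an image, and merging base-learner outputs into a unified response. The sketch below illustrates one simple form of each using only the Python standard library. It is not the paper's method. The function names, the min-max scaling with zero-padding into a square grid, and the tie-breaking rule that favours the anomaly label are all assumptions made for illustration.

```python
import math
from collections import Counter


def row_to_image(row, lo, hi):
    """Min-max scale a feature vector and zero-pad it into a square grid.

    row, lo and hi are equal-length sequences: the data point and the
    per-feature minimum and maximum (assumed to come from training data).
    This layout is an illustrative assumption, not the paper's encoding.
    """
    side = math.ceil(math.sqrt(len(row)))
    scaled = []
    for x, a, b in zip(row, lo, hi):
        # Constant features (a == b) carry no information; map them to 0.
        v = 0.0 if b == a else (x - a) / (b - a)
        scaled.append(min(1.0, max(0.0, v)))  # clip out-of-range values
    scaled += [0.0] * (side * side - len(scaled))  # pad to a full square
    return [scaled[i * side:(i + 1) * side] for i in range(side)]


def majority_vote(predictions):
    """Combine binary base-learner labels (0 = normal, 1 = anomaly).

    Ties are resolved in favour of 'anomaly'; this policy is an assumption.
    """
    counts = Counter(predictions)
    return 1 if counts[1] >= counts[0] else 0


# Five features become a 3x3 grid whose last four cells are padding.
image = row_to_image([5.0, 0.0, 10.0, 2.5, 7.5], lo=[0.0] * 5, hi=[10.0] * 5)
# Three hypothetical base-learners vote on one data point.
label = majority_vote([1, 0, 1])
```

Majority voting is only the simplest possible meta-learner. The abstract refers to many meta-learning strategies, which this sketch does not attempt to cover.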
Updated: 2024-06-29