Synthetic and privacy-preserving traffic trace generation using generative AI models for training Network Intrusion Detection Systems, Journal of Network and Computer Applications

Journal of Network and Computer Applications (IF 7.7), Pub Date: 2024-06-20, DOI: 10.1016/j.jnca.2024.103926
Giuseppe Aceto, Fabio Giampaolo, Ciro Guida, Stefano Izzo, Antonio Pescapè, Francesco Piccialli, Edoardo Prezioso

Network Intrusion Detection Systems (NIDS) are crucial tools for protecting networked devices from cyberattacks. Recent developments in Artificial Intelligence (AI) have brought significant advantages to implementing NIDSs that monitor network traffic and block cyberattacks in real time. It is widely recognized in the literature that effectively training a NIDS requires a large quantity of labeled traffic that is representative of attacks. Nonetheless, public and abundant datasets remain scarce, owing to the cost of collecting and labeling real traffic traces and to the privacy concerns of sharing them. To tackle these challenges, we present a generative AI model capable of synthesizing anonymized traffic traces from real ones, thus addressing privacy, abundance, and representativeness. The proposal is based on a Conditional Variational Autoencoder (CVAE) and a preprocessing procedure specifically designed for generating new traffic traces. To validate our solution, we conduct an extensive empirical study on three recent, publicly available datasets containing benign and malicious traffic. Validation is carried out from two perspectives, the classification performance of a robust NIDS and the quality of the synthetic data, each compared against the use of real data. We compare our CVAE with two state-of-the-art AI-based traffic data generators and show that a NIDS trained on traces produced by our generative model suffers only a limited F1-score loss relative to training on real data, whereas the competing models struggle or fail to generate traces that are as effective for NIDS training and as statistically similar to the original data. We make the synthetic datasets available in both PCAP and tabular formats to facilitate the reproducibility of our findings and to encourage further exploration of generative AI for networking.
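For readers unfamiliar with the technique, the following is a minimal, plain-Python sketch of the generic CVAE training objective (negative conditional ELBO) and of class-conditional sampling. It is not the authors' implementation: the abstract does not describe their architecture, preprocessing, or loss. Assumptions made here for illustration: a diagonal-Gaussian encoder output (mu, log_var), a standard-normal prior that does not depend on the label, a squared-error reconstruction term, and a one-hot class label (e.g. benign vs. an attack type) concatenated to the decoder input. The `decoder` argument and the `beta` weight are hypothetical stand-ins, not values or components reported by the paper.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def one_hot(label: int, num_classes: int) -> Vector:
    """Encode a class label (e.g. benign or a given attack type) as a one-hot vector."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> Vector:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so z ~ N(mu, diag(exp(log_var)))."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Sum of squared errors; matches a fixed-variance Gaussian decoder
    up to scale and additive constants (an assumption of this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def cvae_loss(x: Sequence[float], x_hat: Sequence[float],
              mu: Sequence[float], log_var: Sequence[float],
              beta: float = 1.0) -> float:
    """Negative conditional ELBO for one sample: reconstruction + beta * KL.
    beta is a hypothetical weighting knob (beta = 1 gives the plain ELBO)."""
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


def generate(decoder: Callable[[Vector], Vector], label: int, num_classes: int,
             latent_dim: int, rng: random.Random) -> Vector:
    """Synthesize one feature vector of a chosen class by decoding
    [z ; one_hot(label)] with z drawn from the N(0, I) prior."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decoder(z + one_hot(label, num_classes))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for a trained decoder network: a fixed linear map.
    toy_decoder = lambda v: [sum(v), v[0] - v[1]]
    print(generate(toy_decoder, label=1, num_classes=3, latent_dim=2, rng=rng))
    print(cvae_loss([1.0, 2.0], [0.9, 2.1], mu=[0.1, -0.2], log_var=[0.0, -0.5]))
```

Conditioning both training and sampling on the label is what allows a conditional generator to emit records of a requested class rather than of the overall traffic mix; in a real system the encoder and decoder would be trained neural networks minimizing this loss over the preprocessed traces.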

Updated: 2024-06-20