Nebula: Self-Attention for Dynamic Malware Analysis
IEEE Transactions on Information Forensics and Security (IF 6.3), Pub Date: 2024-06-06, DOI: 10.1109/tifs.2024.3409083
Dmitrijs Trizna, Luca Demetrio, Battista Biggio, Fabio Roli
Dynamic analysis enables detecting Windows malware by executing programs in a controlled environment and logging their actions. Previous work has proposed training machine learning models, such as convolutional and long short-term memory networks, on homogeneous input features like runtime APIs to either detect or classify malware, neglecting other relevant information coming from heterogeneous data like network and file operations. To overcome these issues, we introduce Nebula, a versatile, self-attention Transformer-based neural architecture that generalizes across different behavioral representations and formats, combining diverse information from dynamic log reports. Nebula is composed of several components that tokenize, filter, normalize, and encode data before feeding it to the Transformer architecture. We first perform a comprehensive ablation study to evaluate their impact on the performance of the whole system, highlighting which components can be used as-is and which must be enriched with specific domain knowledge. We perform extensive experiments on both malware detection and classification tasks, using three datasets acquired from different dynamic analysis platforms, showing that, on average, Nebula outperforms state-of-the-art models at low false positive rates, with a peak improvement of 12%. Moreover, we showcase how self-supervised pre-training matches the performance of fully-supervised models with only 20% of the training data, and we inspect the output of Nebula through explainable AI techniques, pinpointing how attention focuses on specific tokens correlated with the malicious activities of malware families. To foster reproducibility, we open-source our findings and models at https://github.com/dtrizna/nebula .
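The preprocessing stages the abstract describes (tokenize, filter, normalize, encode) can be illustrated with a minimal sketch. This is not the authors' implementation; the report schema, normalization rules, and field names below are hypothetical, chosen only to show how volatile values (memory addresses, IP addresses) in a heterogeneous dynamic-analysis log might be mapped to placeholder tokens and then to integer IDs for a Transformer encoder.

```python
import re

# Hypothetical dynamic-analysis report fragment; the field names are
# illustrative, not the schema of any particular sandbox.
report = {
    "apis": [{"name": "CreateRemoteThread", "args": ["0x3f2a10"]}],
    "network": [{"dst": "192.168.1.5", "port": 443}],
    "files": [{"op": "write", "path": "C:\\Users\\bob\\evil.exe"}],
}

def flatten(obj):
    """Recursively yield the leaf values of a nested JSON-like report."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from flatten(value)
    elif isinstance(obj, list):
        for value in obj:
            yield from flatten(value)
    else:
        yield str(obj)

def normalize(token):
    """Replace volatile values with placeholders so the vocabulary stays small."""
    if re.fullmatch(r"0x[0-9a-fA-F]+", token):
        return "<addr>"  # memory addresses vary per run
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", token):
        return "<ip>"    # concrete IPs are rarely generalizable features
    return token.lower()

def tokenize(report):
    tokens = []
    for value in flatten(report):
        # Split file paths and composite strings into sub-tokens.
        tokens.extend(t for t in re.split(r"[\\/\s]+", value) if t)
    return [normalize(t) for t in tokens]

tokens = tokenize(report)
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]  # integer sequence fed to the encoder
```

The key design choice sketched here, and discussed in the paper's ablation, is that some steps (generic whitespace tokenization) work as-is, while others (recognizing addresses, IPs, or paths) require domain knowledge about which values are noise and which carry behavioral signal.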
