Identifying Compiler and Optimization Level in Binary Code From Multiple Architectures, IEEE Access

IEEE Access (IF 3.4) Pub Date: 2021-12-06, DOI: 10.1109/access.2021.3132950
Davide Pizzolotto, Katsuro Inoue

Identifying Compiler and Optimization Level in Binary Code From Multiple Architectures

When compiling a native application, different compiler flags and optimization levels can be configured, and the choice depends on the requirements. For example, if the binary is intended for final release, the flags and optimization settings should favor execution speed and efficiency. Alternatively, if the application is to be used for debugging, debug flags should be configured accordingly, usually with little or no code optimization. However, this information cannot be easily extracted from a compiled binary. Nonetheless, ensuring that the same compiler and compilation flags were used is particularly important when comparing different binary files, to avoid inaccurate or unreliable analyses. Unfortunately, understanding which flags and optimizations have been used requires deep knowledge of the target architecture and the compiler. In this study, we present two deep learning models that detect both the compiler and the optimization level of a compiled binary. The optimization levels we study are O0, O1, O2, O3, and Os on the x86_64, AArch64, RISC-V, SPARC, PowerPC, MIPS, and ARM architectures. In addition, for the x86_64 and AArch64 architectures, we also determine whether the compiler is GCC or Clang. We created a dataset of more than 76,000 binaries and used it for training. Our experiments showed over 99.95% accuracy in detecting the compiler and between 92% and 98% accuracy, depending on the architecture, in detecting the optimization level. Furthermore, we analyzed how accuracy changes when the amount of data is extremely limited. Our study shows that it is possible to accurately detect both compiler flag settings and optimization levels with function-level granularity.
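To make the classification setup concrete, the following is a minimal sketch of how a function's machine code might be prepared as input for such a classifier. The abstract does not describe the input encoding or the model architecture, so everything here other than the class labels is an assumption for illustration: the fixed-length raw-byte encoding, the `PAD` token, and the helper names `encode_function` and `label_index` are hypothetical and are not taken from the paper.

```python
from typing import List

# Class labels taken from the abstract.
OPT_LEVELS = ["O0", "O1", "O2", "O3", "Os"]
COMPILERS = ["GCC", "Clang"]

# Hypothetical padding token, chosen outside the 0..255 byte range.
PAD = 256


def encode_function(raw: bytes, length: int = 16) -> List[int]:
    """Truncate or right-pad a function's bytes to a fixed-length token list."""
    tokens = list(raw[:length])
    return tokens + [PAD] * (length - len(tokens))


def label_index(level: str) -> int:
    """Map an optimization level such as "O2" to its class index."""
    return OPT_LEVELS.index(level)


# Example input: a trivial x86_64 function body
# (push rbp; mov rbp, rsp; pop rbp; ret).
sample = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3])
encoded = encode_function(sample, length=8)
print(encoded, label_index("O0"))
```

A model trained on pairs of encoded functions and label indices would then predict one of the five optimization levels per function, which matches the function-level granularity mentioned in the abstract. The choice of model is left open here.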

Updated: 2021-12-06
