Fusing In-Storage and Near-Storage Acceleration of Convolutional Neural Networks
ACM Journal on Emerging Technologies in Computing Systems (IF 2.1). Pub Date: 2023-06-17. DOI: https://dl.acm.org/doi/10.1145/3597496
Ikenna Okafor, Akshay Krishna Ramanathan, Nagadastagiri Reddy Challapalle, Zheyu Li, Vijaykrishnan Narayanan

Video analytics has a wide range of applications and has attracted much interest over the years. While it can be both computationally and energy intensive, video analytics can benefit greatly from in-memory and near-memory compute. Moving compute closer to memory has consistently improved performance and energy consumption and is seeing increasing adoption. Recent advancements in solid-state drives (SSDs) have incorporated near-memory Field Programmable Gate Arrays (FPGAs) with shared access to the drive's storage cells. These near-memory FPGAs can run the operations required by video-analytics pipelines, such as object detection and template matching, which are typically implemented with Convolutional Neural Networks (CNNs). A CNN is composed of multiple individually processed layers that perform various image-processing tasks. When resources are limited, a layer may be partitioned into more manageable sub-layers. These sub-layers are then processed sequentially; however, some sub-layers can be processed simultaneously. Moreover, the storage cells within FPGA-equipped SSDs can be augmented with in-storage compute to accelerate CNN workloads and exploit the intra-layer parallelism within a CNN layer. To this end, we present our work, which leverages heterogeneous architectures to create an in/near-storage acceleration solution for video analytics. We designed a NAND flash accelerator and an FPGA accelerator, then mapped and evaluated several CNN benchmarks. We show how to utilize FPGAs, local DRAMs, and in-SSD compute to accelerate CNN workloads. Our work also demonstrates how removing unnecessary memory transfers saves latency and energy.
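The partitioning idea in the abstract, splitting one CNN layer into sub-layers whose results are later recombined, can be illustrated with a minimal sketch. The code below is not the paper's accelerator design; it is a hypothetical NumPy example, assuming the layer is tiled along its output channels, which makes the sub-layers fully independent and therefore candidates for simultaneous execution on separate in/near-storage compute units.

```python
import numpy as np

def conv2d(x, w):
    """Naive valid 2-D convolution.
    x: (H, W) input feature map; w: (C_out, kH, kW) filters.
    Returns (C_out, H - kH + 1, W - kW + 1)."""
    C, kH, kW = w.shape
    H, W = x.shape
    out = np.empty((C, H - kH + 1, W - kW + 1))
    for c in range(C):
        for i in range(H - kH + 1):
            for j in range(W - kW + 1):
                out[c, i, j] = np.sum(x[i:i + kH, j:j + kW] * w[c])
    return out

def conv2d_tiled(x, w, n_tiles):
    """Partition the layer into sub-layers along output channels.
    Each tile is independent, so the sub-layer computations could
    run simultaneously on different accelerator units; here they
    run sequentially and are concatenated back together."""
    tiles = np.array_split(w, n_tiles, axis=0)
    partial = [conv2d(x, t) for t in tiles]
    return np.concatenate(partial, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))        # toy input feature map
w = rng.standard_normal((6, 3, 3))     # 6 output channels
full = conv2d(x, w)
tiled = conv2d_tiled(x, w, n_tiles=3)
assert np.allclose(full, tiled)        # tiling preserves the result
```

Tiling along output channels is only one possible partitioning; layers can also be split along input channels or spatially, at the cost of a reduction or halo exchange when the partial results are merged.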




Updated: 2023-06-19