FLEX-CIM: A Flexible Kernel Size 1-GHz 181.6-TOPS/W 25.63-TOPS/mm² Analog Compute-in-Memory Macro, IEEE Journal of Solid-State Circuits


IEEE Journal of Solid-State Circuits (IF 4.6), Pub Date: 2024-04-15, DOI: 10.1109/jssc.2024.3386192
Yuzhao Fu, Wei-Han Yu, Ka-Fai Un, Chi-Hang Chan, Yan Zhu, Minglei Zhang, Rui P. Martins, Pui-In Mak


Compute-in-memory (CIM) is a promising approach for realizing energy-efficient convolutional neural network (CNN) accelerators. Previous CIM works demonstrated a high peak energy efficiency of over 100 TOPS/W with larger fabrics of 1000+ channels. Yet they typically suffer from low utilization for small CNN layers (e.g., $\sim$9% for ResNet-32), which scales down their average energy efficiency, throughput density, and effective memory size by the utilization rate. In addition, the analog-to-digital converter (ADC) occupies most of their computing time ($\sim$90%), further hindering the CIM's throughput. This work presents FLEX-CIM, fabricated in 28-nm CMOS, featuring: 1) an analog partial sum (APS) circuit to enable a flexible CIM kernel size; 2) an overclocked fast multiply-accumulate array (FMA) to boost the throughput; and 3) an adaptive-resolution ADC to enhance the throughput and energy efficiency. The achieved utilization is 99.2% on ResNet-32. Under 4-bit MAC precision, the peak energy efficiency is 181.6 TOPS/W, and the peak throughput density is 25.63 TOPS/mm².
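The two bottlenecks named in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper. It assumes that an average metric is simply the peak figure multiplied by the utilization rate, and it uses an Amdahl-style bound for the case where only the ADC phase is accelerated. The function names and the 2x ADC speedup are hypothetical, and 100 TOPS/W stands in for the "over 100 TOPS/W" quoted for prior work.

```python
def effective_metric(peak: float, utilization: float) -> float:
    """Scale a peak figure (e.g. TOPS/W or TOPS/mm2) by a utilization rate in [0, 1].

    Assumption: the average metric is peak * utilization, a simplified reading
    of "penalized by the utilization rate".
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


def speedup_from_faster_adc(adc_time_fraction: float, adc_speedup: float) -> float:
    """Amdahl-style overall speedup when only the ADC phase is made faster."""
    if not 0.0 <= adc_time_fraction <= 1.0:
        raise ValueError("adc_time_fraction must be between 0 and 1")
    if adc_speedup <= 0:
        raise ValueError("adc_speedup must be positive")
    return 1.0 / ((1.0 - adc_time_fraction) + adc_time_fraction / adc_speedup)


# Figures quoted in the abstract; 100.0 is an illustrative stand-in for "over 100 TOPS/W".
prior_avg = effective_metric(100.0, 0.09)    # prior CIM at ~9% utilization
flex_avg = effective_metric(181.6, 0.992)    # FLEX-CIM peak at 99.2% utilization

# Hypothetical: ADC takes ~90% of compute time and is made 2x faster.
adc_gain = speedup_from_faster_adc(0.90, 2.0)

print(f"prior average efficiency    ~ {prior_avg:.1f} TOPS/W")
print(f"FLEX-CIM average efficiency ~ {flex_avg:.1f} TOPS/W")
print(f"overall speedup from 2x ADC ~ {adc_gain:.2f}x")
```

Under these assumptions, a design with a high peak figure but about 9% utilization delivers only a small fraction of that peak on ResNet-32. The same reasoning explains why shortening the ADC phase matters when it dominates the compute time.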


Updated: 2024-04-15
