RoboVisio: A Micro-Robot Vision Domain-Specific SoC for Autonomous Navigation Enabling Fully-on-Chip Intelligence via 2-MB eMRAM
IEEE Journal of Solid-State Circuits (IF 4.6), Pub Date: 2024-02-27, DOI: 10.1109/jssc.2024.3368350
Qirui Zhang 1, Zichen Fan 1, Hyochan An 2, Zhehong Wang 3, Ziyun Li 3, Guanru Wang 4, Pierre Abillama 1, Hun-Seok Kim 1, David Blaauw 1, Dennis Sylvester 1

This article presents RoboVisio, an efficient and highly flexible domain-specific system-on-chip (SoC) for vision tasks in fully autonomous micro-robot navigation. A novel hybrid processing element (PE) is proposed, in which classic vision tasks achieve high efficiency by using a 2-D-mapping architecture, while convolutional neural networks (CNNs) are executed in an efficient output-channel-parallel systolic manner. Combining both processing schemes into a single PE array future-proofs the architecture, facilitating next-generation CNN-heavy vision algorithms, while saving 40% area and leakage with no power overhead or throughput loss compared with two separate array implementations. To further improve energy and area efficiency, the design incorporates three key features: 1) 2-MB magnetoresistive random access memory (MRAM) for non-volatile, fully-on-chip weight storage; 2) a unified image-activation memory (IAMEM) with block-swapping-based input/output image buffering that reduces buffer footprint by 50% and eliminates data copying for multi-frame buffering; and 3) a combination of weight buffering and CNN loop ordering that reduces weight memory system power by 75%. Fabricated in 22-nm CMOS, the design achieves 0.22 nJ/pixel for Harris corner feature detection (a classic, non-CNN vision task) and 3.5 TOPS/W (16-bit OP) for CNN, a 40%–170% efficiency improvement over state-of-the-art edge machine learning (ML) SoCs using non-volatile memory (NVM).
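
The interplay of weight buffering and loop ordering mentioned in feature 3 can be illustrated with a toy model. The sketch below is not the RoboVisio dataflow; it is a minimal illustration, assuming a 1x1 convolution and a hypothetical `WeightMemory` class that merely counts how often a filter is read from the large weight store. It compares a pixel-outermost loop order with an output-channel-outermost order that keeps one filter in a small local buffer. All names and sizes are assumptions made for illustration, and the read counts say nothing about the 75% power figure reported in the abstract.

```python
class WeightMemory:
    """Hypothetical stand-in for a large weight memory; counts filter reads."""

    def __init__(self, filters):
        self.filters = filters  # filters[oc] is a list of input-channel weights
        self.reads = 0

    def read_filter(self, oc):
        self.reads += 1
        return self.filters[oc]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def conv1x1_pixel_outer(pixels, mem, n_out):
    """Pixel loop outermost: each output value triggers a weight-memory read."""
    out = []
    for px in pixels:
        row = []
        for oc in range(n_out):
            w = mem.read_filter(oc)  # no reuse across pixels
            row.append(dot(px, w))
        out.append(row)
    return out


def conv1x1_channel_outer(pixels, mem, n_out):
    """Output-channel loop outermost: one read per filter, reused for all pixels."""
    out = [[0] * n_out for _ in pixels]
    for oc in range(n_out):
        buf = mem.read_filter(oc)  # held in a small local buffer
        for i, px in enumerate(pixels):
            out[i][oc] = dot(px, buf)
    return out


def compare_orders(pixels, filters):
    """Run both loop orders; return (outputs_match, reads_pixel_outer, reads_channel_outer)."""
    mem_a = WeightMemory(filters)
    mem_b = WeightMemory(filters)
    out_a = conv1x1_pixel_outer(pixels, mem_a, len(filters))
    out_b = conv1x1_channel_outer(pixels, mem_b, len(filters))
    return out_a == out_b, mem_a.reads, mem_b.reads
```

In this toy model both orders produce identical outputs, but the pixel-outermost order reads the weight memory once per output value, while the channel-outermost order reads it once per filter. A real design would trade this saving against the cost of buffering activations or partial sums, which the sketch ignores.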

