A 28-nm Energy-Efficient Sparse Neural Network Processor for Point Cloud Applications Using Block-Wise Online Neighbor Searching
IEEE Journal of Solid-State Circuits (IF 4.6), Pub Date: 2024-04-26, DOI: 10.1109/jssc.2024.3386878
Xiaoyu Feng 1, Wenyu Sun 1, Chen Tang 1, Xinyuan Lin 1, Jinshan Yue 2, Huazhong Yang 1, Yongpan Liu 1

Voxel-based point cloud networks, which are built from several kinds of sparse convolutions (SCONVs), play an essential role in emerging applications such as autonomous driving and visual navigation. Many sparse processors have been proposed for image applications, but they do not adequately address three problems specific to point clouds: inefficient random memory access; non-parallel neighbor search together with the area overhead of supporting hybrid operators; and unbalanced workloads across multiple cores. This work proposes a unified 2-D/3-D SCONV accelerator with three key features: (1) a block-wise sparse data storage format that supports out-of-order memory allocation and continuous memory access; (2) a high-throughput, reconfigurable SCONV core that provides unified support for multiple kinds of sparse CNNs; and (3) a hybrid asynchronous/synchronous multi-core scheduler with a dynamic on-chip memory router that maximizes data reuse and core utilization. The chip is fabricated in 28-nm CMOS technology and achieves a peak energy efficiency of 4.68 TOPS/W, $2\times$ higher than the previous accelerator. It is also the first accelerator to provide unified 2-D/3-D support and end-to-end inference for voxel-based point cloud networks.
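
The block-wise storage and neighbor search mentioned in the abstract can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the chip's actual format or search hardware, which the abstract does not detail: the block edge length `BLOCK`, the dictionary-based layout, and the function names `build_blocks` and `find_neighbors` are all hypothetical. It groups active voxels by the block that contains them, then looks up the active neighbors of one voxel inside a 3×3×3 kernel window, touching only the blocks that the window overlaps.

```python
from collections import defaultdict
from itertools import product

BLOCK = 4  # assumed block edge length in voxels (hypothetical, not from the paper)


def build_blocks(coords):
    """Group active voxel coordinates by the block that contains them.

    Returns {block_key: {voxel_coord: index_into_coords}}.
    """
    blocks = defaultdict(dict)
    for idx, (x, y, z) in enumerate(coords):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        blocks[key][(x, y, z)] = idx
    return blocks


def find_neighbors(blocks, voxel, kernel=3):
    """Return {kernel_offset: index} for every active voxel in the window."""
    r = kernel // 2
    x, y, z = voxel
    found = {}
    for dx, dy, dz in product(range(-r, r + 1), repeat=3):
        n = (x + dx, y + dy, z + dz)
        # Only the block that owns the candidate coordinate is consulted.
        block = blocks.get((n[0] // BLOCK, n[1] // BLOCK, n[2] // BLOCK))
        if block is not None and n in block:
            found[(dx, dy, dz)] = block[n]
    return found


# Five active voxels; (3, 3, 3) and (4, 3, 3) sit in adjacent blocks.
coords = [(0, 0, 0), (1, 0, 0), (3, 3, 3), (4, 3, 3), (9, 9, 9)]
blocks = build_blocks(coords)
neighbors = find_neighbors(blocks, (3, 3, 3))
print(neighbors)
```

In this toy layout each voxel's neighbor lookup is independent of the others, so the lookups could in principle run in parallel, and the voxels of one block are stored together. How the accelerator realizes these properties in hardware is not described in the abstract.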

Updated: 2024-04-26