Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment, arXiv - CS - Hardware Architecture


arXiv - CS - Hardware Architecture, Pub Date: 2021-04-06, DOI: arxiv-2104.02251
Xiaofan Zhang, Hanchen Ye, Deming Chen


Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment

Customized hardware accelerators have been developed to provide improved performance and efficiency for DNN inference and training. However, existing hardware accelerators may not always be suitable for handling various DNN models, as their architecture paradigms and configuration tradeoffs are highly application-specific. It is important to benchmark accelerator candidates at the earliest stage to gather comprehensive performance metrics and locate potential bottlenecks. Further demands also emerge after benchmarking, which require adequate solutions to address the bottlenecks and improve the current designs for targeted workloads. To achieve these goals, in this paper, we leverage an automation tool called DNNExplorer for benchmarking customized DNN hardware accelerators and exploring novel accelerator designs with improved performance and efficiency. Key features include (1) direct support for popular machine learning frameworks for DNN workload analysis and accurate analytical models for fast accelerator benchmarking; (2) a novel accelerator design paradigm with high-dimensional design space support and fine-grained adjustability to overcome existing design drawbacks; and (3) a design space exploration (DSE) engine to generate optimized accelerators by considering targeted AI workloads and available hardware resources. Results show that accelerators adopting the proposed novel paradigm can deliver up to 4.2X higher throughput (GOP/s) than the state-of-the-art pipeline design in DNNBuilder and up to 2.0X higher efficiency than the recently published generic design in HybridDNN, given the same DNN model and resource budgets. With DNNExplorer's benchmarking and exploration features, we can stay ahead in building and optimizing customized AI accelerators and enable more efficient AI applications.

Updated: 2021-04-08
