INTIACC: A Programmable Floating-Point Accelerator for Partial Differential Equations
IEEE Journal of Solid-State Circuits (IF 4.6), Pub Date: 2024-03-26, DOI: 10.1109/jssc.2024.3379308
Paul Xuanyuanliang Huang, Yannis Tsividis, Mingoo Seok

This article presents a 32-bit floating-point (FP32) programmable accelerator for solving a wide range of partial differential equations (PDEs) based on numerical integration methods. Prior works use fixed-point systems and apply only to specific types of PDEs. In contrast, our proposed integration accelerator for PDEs, named INTIACC, consists of 16 locally interconnected processing elements (PEs), where each PE is a fully programmable reduced instruction set computer (RISC) processor with an FP32 arithmetic logic unit (FP32 ALU) and a custom-designed instruction set architecture (ISA). These features enable INTIACC to generate solutions with high precision and a wide dynamic range, and they allow users to implement different numerical algorithms to perform high-order integration methods and to evaluate nonlinear functions. In addition, we create a novel slow-global-fast-local clocking scheme in which PEs operate asynchronously with each other most of the time. We prototype the INTIACC test chip in 65 nm, with a core area of 0.975 mm$^2$. Running at an average local clock frequency of 570 MHz at 1 V, it offers a single-precision computation throughput of 9.12 GFLOPS. Testing results show that, with a similar energy-delay product, INTIACC is up to 40$\times$ faster than the prior state-of-the-art PDE solver.
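
The throughput figure can be sanity-checked against the other numbers in the abstract. The sketch below is only an arithmetic check, not a description of the chip's microarchitecture: it assumes (the abstract does not state this) that each PE completes one FP32 operation per local clock cycle, and the names `peak_gflops` and `FLOPS_PER_PE_PER_CYCLE` are illustrative.

```python
# Figures quoted in the abstract.
NUM_PES = 16                  # locally interconnected processing elements
AVG_LOCAL_CLOCK_HZ = 570e6    # average local clock frequency at 1 V
REPORTED_GFLOPS = 9.12        # reported single-precision throughput

# ASSUMPTION (not stated in the abstract): one FP32 operation
# per PE per local clock cycle.
FLOPS_PER_PE_PER_CYCLE = 1


def peak_gflops(num_pes, clock_hz, flops_per_cycle):
    """Aggregate throughput in GFLOPS for identical, independent PEs."""
    return num_pes * clock_hz * flops_per_cycle / 1e9


estimate = peak_gflops(NUM_PES, AVG_LOCAL_CLOCK_HZ, FLOPS_PER_PE_PER_CYCLE)

# Operations per PE per cycle implied by the reported figure.
implied_flops_per_cycle = REPORTED_GFLOPS * 1e9 / (NUM_PES * AVG_LOCAL_CLOCK_HZ)

print(f"estimated throughput: {estimate:.2f} GFLOPS")
print(f"implied FLOPs per PE per cycle: {implied_flops_per_cycle:.2f}")
```

Under that assumption, 16 PEs at 570 MHz reproduce the reported 9.12 GFLOPS, which suggests the quoted throughput is a peak figure with every PE issuing one floating-point operation each cycle.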

Updated: 2024-03-26