The Journal of Supercomputing (IF 2.5), Pub Date: 2024-03-01, DOI: 10.1007/s11227-023-05641-1
Min Tian, Qi Liu, Jingshan Pan, Ying Gou, Zanjun Zhang
Abstract
The tridiagonal system solver is a basic computational kernel that is well supported in mainstream numerical libraries. This paper aims to devise an efficient parallel algorithm for solving large-scale tridiagonal systems. Based on a performance analysis of the classic Thomas algorithm and the matrix splitting method, we propose a parallel Thomas split (PTS) algorithm. Compared with the matrix splitting method, the PTS algorithm achieves a speedup of 10.34\(\times\). Furthermore, we propose a Sunway parallel Thomas split (swPTS) algorithm for the sw26010pro manycore processor. In the swPTS algorithm, we propose a specific data partitioning scheme to implement MPI+Athread parallelism. For the reduced system of equations, we propose a new reduction approach tailored to the Sunway architecture. Experiments show that the parallel elimination stage of the swPTS algorithm achieves up to 38.31\(\times\) speedup over the PTS algorithm, and that swPTS overall reaches a 5.74\(\times\) speedup over the Thomas algorithm.
Title: swPTS: an efficient parallel Thomas split algorithm for tridiagonal systems on Sunway manycore processors
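For readers unfamiliar with the serial baseline that the abstract compares against, the following is a minimal sketch of the classic Thomas algorithm (forward elimination followed by back substitution). It is not the paper's PTS or swPTS method, whose details the abstract does not give. The function name `thomas_solve` and its argument layout are assumptions made for illustration, and the sketch performs no pivoting, so it assumes a well-conditioned system such as a diagonally dominant one.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the classic (serial) Thomas algorithm.

    Hypothetical layout chosen for this sketch: all four lists have length n;
    lower[i] multiplies x[i-1] in row i (lower[0] is unused) and
    upper[i] multiplies x[i+1] in row i (upper[n-1] is unused).
    No pivoting is done, so nonzero pivots are assumed
    (e.g. a diagonally dominant matrix).
    """
    n = len(diag)
    if not (len(lower) == len(upper) == len(rhs) == n):
        raise ValueError("all inputs must have the same length")

    c_prime = [0.0] * n  # modified super-diagonal
    d_prime = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution: recover the unknowns from the last row upward.
    x = [0.0] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

Both loops carry a dependency from one row to the next, which is why the serial algorithm does not parallelize directly and why splitting-based variants such as the one described in the abstract are of interest. As a usage example, calling `thomas_solve([0.0, -1.0, -1.0, -1.0], [2.0] * 4, [-1.0, -1.0, -1.0, 0.0], [0.0, 0.0, 0.0, 5.0])` solves a 4-by-4 system whose exact solution is 1, 2, 3, 4.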