Parallelization of particle-mass-transfer algorithms on shared-memory, multi-core CPUs


Advances in Water Resources (IF 4.0), Pub Date: 2024-09-11, DOI: 10.1016/j.advwatres.2024.104818
David A. Benson, Ivan Pribec, Nicholas B. Engdahl, Stephen Pankavich, Lucas Schauer

Simulating the transfer of mass between particles is not straightforwardly parallelized because it involves the calculation of the influence of many particles on each other. Engdahl et al. (2019) intuited that the number of matrix operations used for mass transfer grows quadratically with the number of particles, so that dividing the domain geometrically into sub-domains will give speed and memory advantages, even on a single processing thread. Those authors also showed the speed scalability of several one-dimensional examples on multiple cores. Here, we extend those results for more general cases, both in terms of spatial dimensions and algorithmic implementation. We show that there is an optimal subdivision scheme for naive, full-matrix calculations on a multi-processor, or multi-threading shared-memory machine. A similar sparse-matrix implementation that also uses row-and-column-sum normalization often greatly reduces the memory requirements. We also introduce a completely new mass transfer algorithm that uses a non-geometric domain decomposition and only matrix row-sum normalization. This allows the mass-transfer “matrix” to be constructed and solved one row at a time in parallel, so it is faster and vastly more memory efficient than previous methods, but requires more care for suitable accuracy.
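To illustrate why row-sum normalization alone lets each matrix row be built and applied independently, here is a minimal Python sketch. It is not the authors' algorithm: the Gaussian weight, the parameters `D` and `dt`, and the function names `transfer_row` and `mass_transfer_step` are all assumptions made for this illustration. Each particle's new mass is a normalized weighted average over all particles, so only one row of weights is held in memory at a time.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def transfer_row(i, xs, masses, D, dt):
    """Return the updated mass of particle i from one row-normalized row of weights."""
    # Assumed Gaussian weight in the particle separation; the paper's kernel may differ.
    weights = [math.exp(-((xs[i] - xj) ** 2) / (4.0 * D * dt)) for xj in xs]
    row_sum = sum(weights)  # always > 0 because the self-weight is exp(0) = 1
    # Weighted average of all masses; the row is discarded after this call.
    return sum(w * m for w, m in zip(weights, masses)) / row_sum


def mass_transfer_step(xs, masses, D=1.0, dt=0.1, workers=4):
    """Update every particle; rows are independent, so they are mapped over a pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda i: transfer_row(i, xs, masses, D, dt), range(len(xs)))
        )


# Hypothetical 1-D example: all mass starts on the first particle.
xs = [0.0, 0.5, 1.0, 3.0]
masses = [1.0, 0.0, 0.0, 0.0]
new_masses = mass_transfer_step(xs, masses)
```

In this sketch a uniform mass field is left unchanged, but total mass is generally not conserved exactly, because the implied matrix has unit row sums without unit column sums. The thread pool only demonstrates that rows are independent; in CPython it would not be expected to speed up pure-Python arithmetic.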


Updated: 2024-09-11
