Matrix multiplication is the fundamental operation in many numerical linear algebra *** efficient implementation on parallel high-performance computers, together with the implementation of other basic linear algebra operations, is an issue of primary importance for providing these systems with scientific software libraries. Consequently, considerable effort has been devoted to the development of efficient practical parallel matrix multiplication *** this paper, we describe a performance analysis of a simple parallel algorithm, Cannon's algorithm, the systolic algorithm, and the hyper-systolic *** analysis indicates that the hyper-systolic algorithm outperforms the other algorithms.
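As an illustration of one of the algorithms named above, the following is a minimal serial sketch of Cannon's algorithm. It simulates a q x q processor grid inside a single Python process, so the "shifts" are list rotations and not real messages. The function names (`split_blocks`, `block_multiply_add`, `cannon_matmul`) and the list-of-lists matrix layout are assumptions made for this sketch and are not taken from the paper.

```python
def split_blocks(M, q):
    """Cut an n x n list-of-lists matrix into a q x q grid of b x b blocks."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]


def block_multiply_add(C, A, B):
    """Local update C += A * B on one simulated processor."""
    b = len(A)
    for r in range(b):
        for c in range(b):
            C[r][c] += sum(A[r][k] * B[k][c] for k in range(b))


def cannon_matmul(A, B, q):
    """Multiply square matrices A and B on a simulated q x q grid."""
    n = len(A)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid size q")
    b = n // q
    a_blk, b_blk = split_blocks(A, q), split_blocks(B, q)
    c_blk = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]

    # Initial skew: block row i of A moves left by i positions,
    # block column j of B moves up by j positions.
    a_blk = [[a_blk[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b_blk = [[b_blk[(i + j) % q][j] for j in range(q)] for i in range(q)]

    for _ in range(q):
        # Every simulated processor multiplies the two blocks it holds.
        for i in range(q):
            for j in range(q):
                block_multiply_add(c_blk[i][j], a_blk[i][j], b_blk[i][j])
        # Shift A blocks one step left and B blocks one step up (with wraparound).
        a_blk = [[a_blk[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b_blk = [[b_blk[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    # Reassemble the block grid into one n x n matrix.
    return [[c_blk[i // b][j // b][i % b][j % b] for j in range(n)]
            for i in range(n)]
```

After the skew, the processor at grid position (i, j) holds the A block with column index (i + j) mod q and the B block with the same row index, so each of the q multiply-and-shift rounds pairs blocks with matching inner indices. In a real distributed implementation each rotation would be a nearest-neighbour message on a torus.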
We introduce a new class of parallel algorithms for the exact computation of systems with pairwise mutual interactions of n elements, so-called n²-problems. Hitherto, practical conventional parallelization strategies could achieve a complexity of O(np) with respect to inter-processor communication, p being the number of processors. Our new approach can reduce the inter-processor communication complexity to O(n√p). In the framework of Additive Number Theory, the determination of the optimal communication pattern can be formulated as an h-range minimization problem that can be solved numerically. Based on a complexity model, the scaling behavior of the new algorithm is numerically tested on the Connection Machine CM5. As a real-life example, we have implemented a fast code for globular cluster n-body simulations, a generic n²-problem, on the Cray T3D, with striking success. Our parallel method promises to be useful in various scientific and engineering fields such as polymer chain computations, protein folding, signal processing, and, in particular, parallel level-3 BLAS.
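The number-theoretic idea can be made concrete with a small brute-force sketch. Assume the n elements are split into p blocks on a ring of p processors, and that processor i keeps copies of the blocks originally held by processors (i + a) mod p for every shift a in some set (a "base"). Two blocks at ring distance d meet on some processor exactly when d is a difference of two shifts in the base, so a base of roughly 2√p shifts can replace the p - 1 shifts of a conventional ring. The construction in `regular_base` below is a simple hypothetical choice made for illustration; it is not the optimized base obtained from the h-range minimization described in the abstract, and both function names are assumptions.

```python
import math


def regular_base(p):
    """Hypothetical base of shifts: 0..K-1 plus multiples of K, with K = ceil(sqrt(p))."""
    K = math.isqrt(p - 1) + 1          # ceil(sqrt(p)) for p >= 1
    m = -(-(p - 1) // K)               # ceil((p - 1) / K)
    small = set(range(K))              # unit strides
    large = {(j * K) % p for j in range(1, m + 1)}  # strides of length K
    return sorted(small | large)


def covers_all_distances(base, p):
    """Check that every ring distance 1..p-1 is a difference of two base shifts."""
    differences = {(a - b) % p for a in base for b in base}
    return all(d in differences for d in range(1, p))


# Example: 16 processors need only 7 stored shifts instead of 16 block positions.
print(regular_base(16))                            # [0, 1, 2, 3, 4, 8, 12]
print(covers_all_distances(regular_base(16), 16))  # True
```

The base size grows like 2√p in this sketch, which is consistent with the O(n√p) communication scaling claimed above. Finding the smallest possible base for a given p is the harder optimization problem that the abstract refers to.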