We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. Towards this end we study the performance of some densematrix computatio...
详细信息
We propose architecture independent parallel algorithm design as a framework for writing parallel code that is scalable, portable and reusable. Towards this end we study the performance of some densematrix computations such as matrix multiplication, LU decomposition and matrix inversion. Although optimized algorithms for these problems have been extensively examined before, a systematic study of an architecture independent design and analysis of parallel algorithms and their performance (including matrix computations) has not been undertaken. Even though more refined algorithms and implementations (sequential or parallel) for the stated problems exist, the complexity and performance of the introduced algorithms is sufficient to raise the issues that are important in architecture independent parallel algorithm design. Two established distributions of an input matrix among the processors of a parallel machine are examined and the particular theoretical and practical merits of each one are also discussed. The algorithms we propose have been implemented and tested on a variety of parallel systems that include the SGI Power Challenge, the IBM SP2 and the Cray T3D. Our experimental results support our claims of efficiency, portability and reusability of the presented algorithms. (C) 2002 Elsevier Science B.V. All rights reserved.
暂无评论