Search results - Inner Mongolia University Library

Performance analysis of explicit group parallel algorithms for distributed memory multicomputer

PARALLEL COMPUTING, 2008, vol. 34, no. 6-8, pp. 427-440

Authors: Ng, Kok Fu; Ali, Norhashidah Hj. Mohd (Univ Sains Malaysia, Sch Math Sci, George Town 11800, Malaysia)

Since their introduction, the four-point explicit group (EG) and explicit decoupled group (EDG) methods for solving elliptic PDEs have been implemented on various parallel computing architectures, such as shared memory parallel computers and distributed computer systems. However, none of these implementations included a detailed performance analysis of the algorithms. In this paper we develop performance models for these explicit group methods and present a detailed study of their hypothetical implementation on two distributed memory multicomputers with different computation speeds and communication bandwidths. Detailed performance analysis based on these models predicted different theoretical performance for the methods on the two clusters. This was confirmed by experiments performed on the two distinct clusters. Theoretical analysis and experimental results indicate that both explicit group methods are scalable with respect to the number of processors and the problem size. (C) 2007 Published by Elsevier B.V.

Keywords: performance analysis; explicit group (EG) method; explicit decoupled group (EDG) method; distributed memory multicomputer; Poisson equation; MPI

ANALYSIS OF ASYNCHRONOUS POLYNOMIAL ROOT FINDING METHODS ON A DISTRIBUTED-MEMORY MULTICOMPUTER

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1994, vol. 5, no. 6, pp. 639-648

Authors: COSNARD, M; FRAIGNIAUD, P (Ecole Normale Supérieure de Lyon, Lyon, France)

We have studied various implementations of iterative polynomial root finding methods on a distributed memory multicomputer. These methods are based on the construction of a sequence of approximations that converge to the set of zeros. The synchronous version consists in sharing the computation of the next iterate among the processors and updating their data through a total exchange of their results. In order to decrease the communication cost, we introduce asynchronous versions. The computation of the next iterate is still shared among the processors, but the updating is done by using only nearest neighbor communications. We prove that under weak conditions, these asynchronous versions are still locally convergent, even if their convergence orders are reduced. We analyze the behavior of the asynchronous methods as a function of their delay, the topology of the interconnection network, and the elementary computation and communication times. We have implemented and compared these methods on a hypercube multicomputer.

Keywords: POLYNOMIAL ZEROS; DISTRIBUTED MEMORY MULTICOMPUTER; ASYNCHRONOUS METHODS

Parallel shear-warp factorization volume rendering using efficient 1-D and 2-D partitioning schemes for distributed memory multicomputers

JOURNAL OF SUPERCOMPUTING, 2002, vol. 22, no. 3, pp. 277-302

Authors: Lin, CF; Yang, DL; Chung, YC (Feng Chia Univ, Dept Informat Engn, Taichung 407, Taiwan)

3-D data visualization is very useful for medical imaging and computational fluid dynamics. Volume rendering can be used to exhibit the shape and volumetric properties of 3-D objects. However, volume rendering requires a considerable amount of time to process the large volume of data. To deliver the necessary rendering rates, parallel hardware architectures such as distributed memory multicomputers offer viable solutions. The challenge is to design efficient parallel algorithms that utilize the hardware parallelism effectively. In this paper, we present two efficient parallel volume rendering algorithms, the 1D-partition and 2D-partition methods, based on the shear-warp factorization for distributed memory multicomputers. The 1D-partition method has a performance bound on the size of the volume data. If the number of processors is less than a threshold, the 1D-partition method can deliver a good rendering rate. If the number of processors is over a threshold, the 2D-partition method can be used. To evaluate the performance of these two algorithms, we implemented the proposed methods along with the slice data partitioning, volume data partitioning, and sheared volume data partitioning methods on an IBM SP2 parallel machine. Six volume data sets were used as the test samples. The experimental results show that the proposed methods outperform other compatible algorithms for all test samples. When the number of processors is over a threshold, the experimental results also demonstrate that the 2D-partition method is better than the 1D-partition method.

Keywords: volume rendering; data partitioning; image compositing; shear-warp factorization; distributed memory multicomputer

MESSAGE-PASSING MULTICELL MOLECULAR-DYNAMICS ON THE CONNECTION MACHINE 5

PARALLEL COMPUTING, 1994, vol. 20, no. 2, pp. 173-195

Authors: BEAZLEY, DM; LOMDAHL, PS (LOS ALAMOS NATL LAB, DIV THEORET, LOS ALAMOS, NM 87545; LOS ALAMOS NATL LAB, ADV COMP LAB, LOS ALAMOS, NM 87545)

We present a new scalable algorithm for short-range molecular dynamics simulations on distributed memory MIMD multicomputers based on a message-passing multi-cell approach. We have implemented the algorithm on the Connection Machine 5 (CM-5) and demonstrate that meso-scale molecular dynamics with more than 10^8 particles is now possible on massively parallel MIMD computers. Typical runs show single particle update-times of 0.15 μs in 2 dimensions (2D) and approximately 1 μs in 3 dimensions (3D) on a 1024 node CM-5 without vector units, corresponding to more than 1.8 Gflops overall performance. We also present a scaling equation which agrees well with actually observed timings.

Keywords: MOLECULAR DYNAMICS SIMULATION; DISTRIBUTED MEMORY MULTICOMPUTER; CONNECTION MACHINE-5 (CM-5); TIMING RESULTS; MESSAGE-PASSING MODEL; SCALING MODEL

On supernode transformation with minimized total running time

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1998, vol. 9, no. 5, pp. 417-428

Authors: Hodzic, E; Shang, WJ (AT&T Bell Labs, San Jose, CA 95134, USA; Santa Clara Univ, Dept Comp Engn, Santa Clara, CA 95053, USA)

With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses how to find an optimal supernode size and optimal supernode relative side lengths of a supernode transformation (also known as tiling). We identify three parameters of supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For algorithms with perfectly nested loops and uniform dependencies, for sufficiently large supernodes and number of processors, and for the case where multiple supernodes are mapped to a single processor, we give an order n polynomial whose real positive roots include the optimal supernode size. For two special cases, 1) two-dimensional algorithm problems and 2) n-dimensional algorithm problems, where the communication cost is dominated by the startup penalty and, therefore, can be approximated by a constant, we give a closed form expression for the optimal supernode size, which is independent of the supernode relative side lengths and cutting hyperplanes. For the case where the algorithm iteration index space and the supernodes are hyperrectangular, we give closed form expressions for the optimal supernode relative side lengths. Our experiment shows a good match of the closed form expressions with experimental data.

Keywords: supernode partitioning; tiling; parallelizing compilers; distributed memory multicomputer; minimizing running time

CONCURRENT PROCESSING OF LINEARLY ORDERED DATA-STRUCTURES ON HYPERCUBE MULTICOMPUTERS

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1994, vol. 5, no. 9, pp. 898-911

Authors: GHOSH, J; DAS, SK; JOHN, A (UNIV N TEXAS, DEPT COMP SCI, CTR RS PARALLEL & DISTRIBUTED COMP, DENTON, TX 76203; UNIV TEXAS, DEPT COMP SCI, AUSTIN, TX 78712)

This paper presents a simple and effective method for the concurrent manipulation of linearly ordered data structures on hypercube systems. The method is based on the existence of an augmented binomial search tree, called the pruned binomial tree, rooted at any arbitrary processor node of the hypercube such that 1) every edge of the tree corresponds to a direct link between a pair of hypercube nodes, and 2) the tree spans any arbitrary sequence of n consecutive nodes containing the root, using a fan-out of at most ⌈log2 n⌉ and a depth of at most ⌈log2 n⌉ + 1. Search trees spanning nonoverlapping processor lists are formed using only local information, and can be used concurrently without contention problems. Thus, they can be used for performing operations such as broadcast and merge simultaneously on sets with nonuniform sizes. Extensions of the tree to k-ary n-cubes and faulty hypercubes are presented. Applications of this concurrent data structure to low- and intermediate-level image processing algorithms, and for dictionary operations involving multiple keys, are also outlined.

Keywords: CONCURRENT DATA STRUCTURES; HYPERCUBE MAPPINGS; BINOMIAL TREES; DISTRIBUTED MEMORY MULTICOMPUTER; DISTRIBUTED MACHINES; GRAY CODE EMBEDDING; K-ARY N-CUBE

Parallel asynchronous team algorithms: Convergence and performance analysis

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1996, vol. 7, no. 7, pp. 677-688

Authors: Baran, B; Kaszkurewicz, E; Bhaya, A (FED UNIV RIO DE JANEIRO, COPPE, DEPT ELECT ENGN, BR-21945970 RIO JANEIRO, RJ, BRAZIL)

This paper formalizes a general technique to combine different methods in the solution of large systems of nonlinear equations using parallel asynchronous implementations on distributed-memory multiprocessor systems. Such combinations of methods, referred to as Team Algorithms, are evaluated as a way of obtaining desirable properties of different methods and a sufficient condition for their convergence is derived. The load flow problem of electrical power networks is presented as an example problem that, under certain conditions, has the characteristics to make a Team Algorithm an appealing choice for its solution. Experimental results of an implementation on an Intel iPSC/860 Hypercube are reported, showing that considerable speedup and robustness can be obtained using team algorithms.

Keywords: distributed memory multicomputer; asynchronous methods; nonlinear equations; block-iterative methods; convergence conditions; team algorithms; load flow problem; electrical power networks

On time optimal supernode shape

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2002, vol. 13, no. 12, pp. 1220-1233

Authors: Hodzic, E; Shang, WJ (Santa Clara Univ, Dept Comp Engn, Santa Clara, CA 95053, USA)

With the objective of minimizing the total execution time of a parallel program on a distributed memory parallel computer, this paper discusses the selection of an optimal supernode shape of a supernode transformation (also known as tiling). We identify three parameters of a supernode transformation: supernode size, relative side lengths, and cutting hyperplane directions. For supernode transformations on algorithms with perfectly nested loops and uniform dependencies, we prove the optimality of a constant linear schedule vector and give a necessary and sufficient condition for optimal relative side lengths. We also prove that the total running time is minimized by a cutting hyperplane direction matrix from a particular subset of all valid directions and we discuss the cases where this subset is unique. The results are derived in continuous space and should be considered approximate. Our model does not include cache effects and assumes an unbounded number of available processors, the communication cost approximated by a constant, uniform dependences, and loop bounds known at compile time. A comprehensive example is discussed with an application of the results to the Jacobi algorithm.

Keywords: supernode transformation; tiling; algorithm partitioning; parallelizing compilers; minimizing running time; distributed memory multicomputer
