A tabu search based approach is studied as a method for solving in parallel the two-dimensional irregular cutting problem. We use and compare different variants of the method and various parallel computing systems. The systems used are based on the message-passing or the shared-memory paradigm. Parallel algorithms using both methods of communication are proposed. The efficiency of computer system utilization is discussed in the context of unpredictable time requirements of parallel tasks. We present results for different variants of the method together with efficiency measures for parallel implementations, where IBM SP2 and CRAY T3E systems, respectively, have been used.
ISBN: (print) 0769521320
This paper describes a novel parallel algorithm that implements a dense matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon's algorithm. It is suitable for clusters and scalable shared memory systems. The current approach differs from other parallel matrix multiplication algorithms by the explicit use of shared memory and remote memory access (RMA) communication rather than message passing. The experimental results on clusters (IBM SP, Linux-Myrinet) and shared memory systems (SGI Altix, Cray X1) demonstrate consistent performance advantages over pdgemm from the ScaLAPACK/PBLAS suite, the leading implementation of the parallel matrix multiplication algorithms used today. In the best case on the SGI Altix, the new algorithm performs 20 times better than pdgemm for a matrix size of 1000 on 128 processors. The impact of zero-copy nonblocking RMA communication and shared memory communication on matrix multiplication performance on clusters is investigated.
ISBN: (print) 0769521320
Minimum Spanning Tree (MST) is one of the most studied combinatorial problems, with practical applications in VLSI layout, wireless communication, and distributed networks; in recent problems in biology and medicine such as cancer detection, medical imaging, and proteomics; and in national security and bioterrorism, such as detecting the spread of toxins through populations in the case of biological/chemical warfare. Most previous attempts at improving the speed of MST using parallel computing are too complicated to implement or perform well only on special graphs with regular structure. In this paper we design and implement four parallel MST algorithms (three variations of Borůvka plus our new approach) for arbitrary sparse graphs that for the first time give speedup when compared with the best sequential algorithm. In fact, our algorithms also solve the minimum spanning forest problem. We provide an experimental study of our algorithms on symmetric multiprocessors such as IBM's p690/Regatta and Sun's Enterprise servers. Our new implementation achieves good speedups over a wide range of input graphs with regular and irregular structures, including the graphs used by previous parallel MST studies. For example, on an arbitrary random graph with 1M vertices and 20M edges, our new approach achieves a speedup of 5 using 8 processors. The source code for these algorithms is freely available from our web site ***.
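For readers unfamiliar with Borůvka's method, the following is a minimal sequential sketch of its rounds: every component picks its cheapest outgoing edge, and the picked edges are merged. It is only an illustration; the function name `boruvka_msf` and the edge-list input format are assumptions made here, and it does not reproduce the paper's parallel variants or its new approach. Because it stops when no component has an outgoing edge, it returns a minimum spanning forest on disconnected inputs.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight); hypothetical input format


def boruvka_msf(n: int, edges: List[Edge]) -> List[Edge]:
    """Sequential Boruvka sketch: returns the edges of a minimum spanning forest."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest: List[Edge] = []
    while True:
        cheapest = {}  # component root -> index of its cheapest outgoing edge
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside one component
            for r in (ru, rv):
                # Ties are broken by edge index so equal weights cannot form a cycle.
                if r not in cheapest or (w, i) < (edges[cheapest[r]][2], cheapest[r]):
                    cheapest[r] = i
        if not cheapest:
            break  # no component has an outgoing edge left
        for i in set(cheapest.values()):
            u, v, w = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v, w))
    return forest
```

The per-component search for the cheapest edge is the step that parallel variants typically distribute across processors; how the paper does so is not stated in the abstract.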
This paper presents the authors' research on the implementation of sorting nets in Field Programmable Logic Gates. As a theoretical base, bitonic sorting nets were considered. During their research the authors met several diff...
Particle tracking methods are central to a wide spectrum of scientific computing applications. To support such applications, this paper presents a compact software architecture that can be used to interface parallel particle tracking software to computational mesh management systems. A detailed description is presented of the in-element particle tracking framework supported by this software architecture, a framework that encompasses most particle tracking applications. The use of this parallel software architecture is illustrated through the implementation of two differential equation solvers, the forward Euler method and an implicit trapezoidal method, on a distributed, unstructured computational mesh. A design goal of this software effort has been to interface to software libraries such as Scalable Unstructured Mesh Algorithms and Applications (SUMAA3d) in addition to application codes (e.g. FEMWATER). This goal of portability is achieved through a software architecture that specifies a lightweight functional interface that maintains the full functionality required by particle-mesh methods. The use of this approach in parallel programming environments written in C and Fortran is demonstrated.
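The two solvers named in the abstract can be illustrated on a single particle in one dimension. The sketch below is a simplification under stated assumptions: the velocity field `v` is a plain function of position (not a mesh lookup), the function names are hypothetical, and the implicit trapezoidal equation is solved by fixed-point iteration, which is a choice made here and not necessarily the solver used in the paper.

```python
from typing import Callable


def forward_euler_step(v: Callable[[float], float], x: float, h: float) -> float:
    """Explicit step: x_new = x + h * v(x)."""
    return x + h * v(x)


def trapezoidal_step(
    v: Callable[[float], float],
    x: float,
    h: float,
    iters: int = 50,
    tol: float = 1e-12,
) -> float:
    """Implicit trapezoidal step: solve y = x + (h/2) * (v(x) + v(y)) for y.

    Uses fixed-point iteration started from a forward Euler predictor.
    Convergence assumes h is small relative to how fast v changes.
    """
    vx = v(x)
    y = x + h * vx  # Euler predictor
    for _ in range(iters):
        y_new = x + 0.5 * h * (vx + v(y))
        if abs(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

For the test field v(x) = -x with x = 1 and h = 0.1, the Euler step gives 0.9, while the trapezoidal step approaches 0.95/1.05, which is closer to the exact value exp(-0.1).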
This paper presents an efficient parallel algorithm for the shortest path problem in planar layered digraphs that runs in O(log^3 n) time with n processors. The algorithm uses a divide-and-conquer approach and is based on the idea of a one-way separator, which has the property that any directed path can cross it only once.
ISBN: (print) 3540231633
The maximum subsequence problem finds the contiguous subsequence of n real numbers with the highest sum. This problem appears in the analysis of DNA or protein sequences. It can be solved sequentially in O(n) time. In the 2-D version, given an n x n array A, the maximum subarray of A is the contiguous subarray that has the maximum sum. The sequential algorithm for the maximum subarray problem takes O(n^3) time. We present efficient BSP/CGM parallel algorithms that require a constant number of communication rounds for both problems. In the first algorithm, the sequence stored on each processor is reduced to only five numbers, so that the resulting values can be concentrated on a single processor which runs an adaptation of the sequential algorithm to obtain the result. The parallel algorithm requires O(n/p) computing time. In the second algorithm, the input array is partitioned equally among the processors and we first reduce each subarray to a sequence, and then apply the first algorithm to solve it. The parallel algorithm takes O(n^3/p) computing time. The good performance of the parallel algorithms is confirmed by experimental results run on a 64-node Beowulf parallel computer.
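The idea of reducing each processor's chunk to a few numbers and combining them on one processor can be sketched as follows. The abstract does not list the five values it uses, so this sketch substitutes a common four-value summary (total, best prefix sum, best suffix sum, best interior sum); that choice, the function names, and the simulation of processors as list slices are all assumptions, not the authors' reduction.

```python
from functools import reduce
from typing import List, Tuple

# (total, best_prefix, best_suffix, best_sub) for one chunk; assumed summary format
Summary = Tuple[float, float, float, float]


def kadane(xs: List[float]) -> float:
    """Sequential O(n) maximum sum of a non-empty contiguous subsequence."""
    best = cur = xs[0]
    for x in xs[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def summarize(chunk: List[float]) -> Summary:
    """Local O(n/p) work done independently on each (simulated) processor."""
    total, best_prefix = 0.0, float("-inf")
    for x in chunk:
        total += x
        best_prefix = max(best_prefix, total)
    run, best_suffix = 0.0, float("-inf")
    for x in reversed(chunk):
        run += x
        best_suffix = max(best_suffix, run)
    return (total, best_prefix, best_suffix, kadane(chunk))


def combine(a: Summary, b: Summary) -> Summary:
    """Merge the summaries of two adjacent chunks (a to the left of b)."""
    return (
        a[0] + b[0],
        max(a[1], a[0] + b[1]),
        max(b[2], b[0] + a[2]),
        max(a[3], b[3], a[2] + b[1]),  # best sum may straddle the boundary
    )


def max_subsequence_chunked(xs: List[float], p: int) -> float:
    """Split xs into at most p chunks, summarize each, combine on one processor."""
    size = -(-len(xs) // p)  # ceiling division
    chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
    return reduce(combine, map(summarize, chunks))[3]
```

Only the fixed-size summaries would need to be communicated, which is consistent with the constant number of communication rounds claimed in the abstract.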
We present parallel algorithms based on biorthogonal wavelet transforms (BWTs). We have constructed processing elements (PEs) for the one-dimensional (1-D) and two-dimensional (2-D) reconstruction filter masks to minimize...
Based on the second-order compact upwind scheme, a group explicit method for solving the two-dimensional time-independent convection-dominated diffusion problem is developed. The stability of the group explicit method is rigorously proven. The method has second-order accuracy and good stability. This explicit scheme can be used to solve convection-dominated diffusion problems at all Reynolds numbers. A numerical test on a parallel computer shows high efficiency. The numerical results agree closely with the analytic solution.
In this paper we investigate parallel numerical algorithms for the solution of transient stimulated scattering processes. A new symmetrical splitting scheme is proposed and a parallel version is given. The efficiency ...