The single-source shortest-path problem for arbitrary directed graphs with n nodes, m edges, and nonnegative edge weights can be solved sequentially using O(n log n + m) operations. However, no work-efficient parallel algorithm is known that runs in sublinear time for arbitrary graphs. In this paper we present a rather simple algorithm for the single-source shortest-path problem. Our new algorithm, which we call Delta-stepping, can be implemented very efficiently in sequential and parallel settings for a large class of graphs. For random edge weights and arbitrary graphs with maximum node degree d, sequential Delta-stepping needs O(n + m + d·L) total average-case time, where L denotes the maximum shortest-path weight from the source node s to any node reachable from s. For example, this means linear time on directed graphs with constant maximum degree. Our best parallel version for a PRAM takes O(d·L·log n + log^2 n) time and O(n + m + d·L·log n) work on average. For random graphs, even O(log^2 n) time and O(n + m) work on average can be achieved. We also discuss how the algorithm can be adapted to nonrandom edge weights and how it can be implemented on distributed-memory machines. Experiments indicate that even a simple implementation of the algorithm achieves significant speedup on real machines. (C) 2003 Elsevier Inc. All rights reserved.
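As a rough illustration of the bucket structure behind Delta-stepping, the following is a minimal sequential Python sketch, assuming a graph given as a `dict` mapping each node to a list of `(neighbor, weight)` pairs and a user-chosen bucket width `delta > 0`. Edges with weight ≤ `delta` are treated as light and relaxed repeatedly while the current bucket refills; heavy edges are relaxed once after the bucket empties. The function name and graph format are hypothetical, and the sketch omits the parallel machinery described in the abstract.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Sequential Delta-stepping sketch (nonnegative weights, delta > 0).

    graph: dict mapping node -> list of (neighbor, weight) pairs (hypothetical format).
    Returns a dict of shortest distances for all nodes reachable from source.
    """
    dist = defaultdict(lambda: math.inf)
    buckets = defaultdict(set)  # bucket i holds nodes with dist in [i*delta, (i+1)*delta)

    def relax(v, x):
        # On improvement, move v from its old bucket to the bucket of its new distance.
        if x < dist[v]:
            if dist[v] < math.inf:
                buckets[int(dist[v] // delta)].discard(v)
            buckets[int(x // delta)].add(v)
            dist[v] = x

    relax(source, 0.0)
    while True:
        nonempty = [i for i, b in buckets.items() if b]
        if not nonempty:
            break
        i = min(nonempty)  # smallest non-empty bucket
        settled = set()
        # Light edges may re-insert nodes into bucket i, so loop until it stays empty.
        while buckets[i]:
            frontier, buckets[i] = buckets[i], set()
            settled |= frontier
            requests = [(v, dist[u] + w) for u in frontier
                        for v, w in graph.get(u, []) if w <= delta]
            for v, x in requests:
                relax(v, x)
        # Heavy edges always land in a later bucket, so relax them once per bucket.
        requests = [(v, dist[u] + w) for u in settled
                    for v, w in graph.get(u, []) if w > delta]
        for v, x in requests:
            relax(v, x)
    return dict(dist)
```

With a very large `delta` every edge is light and the loop behaves like a label-correcting (Bellman-Ford-style) search; with a tiny `delta` each bucket holds nodes of nearly equal distance and the behavior approaches Dijkstra's algorithm. The choice of `delta` trades re-relaxations against the number of buckets, and therefore against available parallelism.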
At the Center for Computational Electromagnetics at the University of Illinois, we recently solved a very-large-scale electromagnetic scattering problem. We computed the bistatic radar cross-section of a full-size aircraft at 8 GHz, which involved solving a dense matrix equation with nearly 10.2 million unknowns. We regarded this as the "ultimate test" of ScaleME, a massively parallel implementation of the Multilevel Fast Multipole Algorithm (MLFMA). In this paper, we narrate the technical difficulties we faced and the experience we gained, from a very informal point of view. We describe the methods developed to overcome each of the obstacles.
We discuss the efficient implementation of a collective operation called reduce-scatter, which is defined in the MPI standard. The reduce-scatter is equivalent to a reduction on vectors of length n combined with a scatter of the resulting n-vector to all processors. We describe the implementation issues and the performance characterization of two recently proposed reduce-scatter algorithms that have been proven highly efficient in theory under the assumption of a fully connected parallel system. A performance comparison with existing mainstream implementations of the operation confirms the practical advantage of the new algorithms. Experiments show that the two algorithms have different characteristics, which make them complementary in providing a performance gain over standard algorithms. Our study was carried out on two different platforms: an SP2 and a Myrinet-interconnected cluster of Pentium Pro machines. However, most of the results reported here are not specific to either MPI or the platforms used, and they hold in general for any message-passing programming system. (C) 2003 Elsevier B.V. All rights reserved.
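To make the definition concrete, here is a small single-process Python simulation, assuming `p` processes that each hold a list of length `n`, with `p` dividing `n`. `reduce_scatter_naive` spells out the semantics (element-wise sum, then block `i` goes to process `i`); `reduce_scatter_halving` simulates recursive halving, a standard reduce-scatter algorithm for power-of-two process counts. It is not necessarily either of the two algorithms studied in the paper, and both function names are hypothetical.

```python
def reduce_scatter_naive(vectors):
    """Reference semantics: element-wise sum, then block i goes to process i."""
    p, n = len(vectors), len(vectors[0])
    b = n // p
    total = [sum(col) for col in zip(*vectors)]
    return [total[i * b:(i + 1) * b] for i in range(p)]


def reduce_scatter_halving(vectors):
    """Simulated recursive halving; assumes p is a power of two and p divides n."""
    p, n = len(vectors), len(vectors[0])
    if p & (p - 1) or n % p:
        raise ValueError("sketch assumes p is a power of two and p divides n")
    b = n // p
    bufs = [list(v) for v in vectors]
    lo, hi = [0] * p, [p] * p      # block range each process is still responsible for
    d = p // 2
    while d >= 1:
        new_bufs = [list(buf) for buf in bufs]
        new_lo, new_hi = lo[:], hi[:]
        for r in range(p):
            partner = r ^ d        # partner shares r's current block range
            mid = (lo[r] + hi[r]) // 2
            # Keep the half that contains r's own block; the other half is "sent".
            keep = (mid, hi[r]) if r & d else (lo[r], mid)
            for j in range(keep[0] * b, keep[1] * b):
                new_bufs[r][j] = bufs[r][j] + bufs[partner][j]
            new_lo[r], new_hi[r] = keep
        bufs, lo, hi = new_bufs, new_lo, new_hi
        d //= 2
    return [bufs[r][r * b:(r + 1) * b] for r in range(p)]
```

In recursive halving each process exchanges half of its current range per step, so it needs log2 p communication rounds and sends about n(p − 1)/p elements in total, which makes it attractive for long vectors.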
A parallel propagation algorithm was applied to the decoding of convolutional codes. The performance of the algorithm was demonstrated by a numerical method similar to density evolution.
We describe a parallel model-checking algorithm for the fragment of the μ-calculus that allows one alternation of minimal and maximal fixed-point operators. This fragment is also known as L2μ. Since LTL and CTL* can...
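For readers unfamiliar with alternation, the following minimal sequential Python sketch evaluates one formula of alternation depth two, νZ. μY. ((p ∧ EX Z) ∨ EX Y), which holds exactly in the states that have some path on which p is true infinitely often. It assumes an explicit Kripke structure given as a set of states, a successor map, and the set of states labelled p; the function names are hypothetical, and the naive nested iteration shown here is not the parallel algorithm of the paper.

```python
def ex(states, succ, target):
    """EX target: states with at least one successor in target."""
    return {s for s in states if succ.get(s, set()) & target}


def inf_often(states, succ, p_states):
    """Evaluate nu Z. mu Y. ((p and EX Z) or EX Y) by naive nested iteration."""
    z = set(states)                        # greatest fixed point: start from all states
    while True:
        goal = p_states & ex(states, succ, z)
        y = set()                          # least fixed point: start from the empty set
        while True:
            new_y = goal | ex(states, succ, y)
            if new_y == y:
                break
            y = new_y
        if y == z:                         # outer iterate stopped shrinking
            return z
        z = y
```

The outer greatest fixed point starts from all states and shrinks; each of its iterations recomputes the inner least fixed point from the empty set. This nesting is the extra cost that one alternation of minimal and maximal fixed-point operators introduces.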
The central contribution of this work is SAMBA (Single Application, Multiple Load Balancing), a framework for developing parallel SPMD (single program, multiple data) applications with load balancing. The framework models the structure and characteristics common to different SPMD applications and supports their development. SAMBA also contains a library of load-balancing algorithms. This environment allows the developer to focus on the specific problem at hand. Special emphasis is given to identifying appropriate load-balancing strategies for each application. Three case studies were used to validate the framework's functionality: matrix multiplication, numerical integration, and a genetic algorithm. These applications illustrate its ease of use and the relevance of load balancing; they were chosen for the different load-imbalance factors they present and for their different task-creation mechanisms. The computational experiments reported for these case studies validated SAMBA and allowed different load-balancing strategies to be compared for each application without additional reprogramming costs. The numerical results and elapsed-time measurements show the importance of using an appropriate load-balancing algorithm and the reductions in elapsed time that can be achieved. They also illustrate that the most suitable load-balancing strategy may vary with the type of application and with the number of available processors. Besides supporting the development of SPMD applications, SAMBA's load-balancing facilities also play an important role in developing efficient parallel implementations. (C) 2003 Elsevier Science B.V. All rights reserved.
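The observation that the choice of strategy matters can be illustrated with a toy simulation, assuming independent tasks with known costs and identical processors. `static_block` assigns contiguous blocks of tasks up front; `dynamic_self_scheduling` hands each task to the first processor that becomes idle. Both functions are hypothetical and are not part of SAMBA's library; each returns the makespan (finish time of the busiest processor).

```python
import heapq
import math


def static_block(costs, p):
    """Makespan when the task list is cut into p contiguous blocks up front."""
    chunk = math.ceil(len(costs) / p)
    return max(sum(costs[k * chunk:(k + 1) * chunk]) for k in range(p))


def dynamic_self_scheduling(costs, p):
    """Makespan when each task goes to the processor that becomes idle first."""
    finish = [0.0] * p                     # a list of equal values is already a valid heap
    for c in costs:
        earliest = heapq.heappop(finish)   # processor that frees up first
        heapq.heappush(finish, earliest + c)
    return max(finish)
```

For example, with task costs [8, 8, 8, 8, 1, 1, 1, 1] on two processors, static blocking puts all expensive tasks on one processor (makespan 32), while self-scheduling spreads them out (makespan 18). With uniform costs and a task count divisible by the processor count the two strategies tie, and a real static scheme then avoids the scheduling overhead that dynamic assignment would pay.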
An AB^2 operation is known as an efficient basic operation for public-key cryptosystems over GF(2^m), and various systolic arrays for performing AB^2 operations have already been proposed using a standard basis representation. However, these circuits have certain shortcomings for cryptographic applications due to their high circuit complexity and long latency. Therefore, further research on an efficient AB^2 multiplication circuit is still needed. Accordingly, the authors present a new AB^2 algorithm and its systolic realisations in GF(2^m). First, a new algorithm is proposed based on the MSB-first scheme using a standard basis representation. Thereafter, bit-parallel and bit-serial systolic power multipliers are derived that exhibit lower hardware complexity and smaller latency than conventional approaches. In addition, since the proposed architectures are simple, regular, modular, and pipelinable, they are well suited to VLSI implementation and can easily be applied as a basic architecture for computing inverse/division operations and in crypto-processor chip design.
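A bit-level software model of the MSB-first idea can help when checking such a multiplier. This sketch assumes field elements are stored as Python integers whose bit i is the coefficient of x^i, and f is an irreducible polynomial of degree m (x^4 + x + 1 for GF(2^4) in the example). Because squaring is linear over GF(2), writing B = Σ b_i x^i gives A·B^2 = Σ b_i·A·x^(2i), which Horner's rule in x^2 evaluates starting from the most significant bit of B. The function names are hypothetical, and this is a reference model, not the authors' exact algorithm or cell design.

```python
def mul_x_mod(a, f, m):
    """Multiply a by x and reduce modulo f (f has degree m, a < 2**m)."""
    a <<= 1
    if (a >> m) & 1:
        a ^= f
    return a


def gf_mul(a, b, f, m):
    """Reference MSB-first product A*B mod f in GF(2^m)."""
    r = 0
    for i in range(m - 1, -1, -1):
        r = mul_x_mod(r, f, m)
        if (b >> i) & 1:
            r ^= a
    return r


def ab2_msb_first(a, b, f, m):
    """A*B^2 mod f via Horner's rule in x^2, scanning B from its MSB."""
    p = 0
    for i in range(m - 1, -1, -1):
        p = mul_x_mod(mul_x_mod(p, f, m), f, m)   # p <- p * x^2 mod f
        if (b >> i) & 1:
            p ^= a                                # add A when b_i = 1
    return p
```

For instance, in GF(2^4) with f = 0b10011, A = x and B = x give A·B^2 = x^3 (0b1000). In hardware terms, each iteration is a fixed shift-and-reduce followed by a conditional XOR, the kind of regular step that maps naturally onto systolic cells.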
ISBN (print): 1892512416
This paper solves the NP problem of DNA string matching using heuristics and parallelism. The current methods for approximate matching are merely different versions of dynamic programming. Dynamic programming is O(n^2) and does not exploit parallelism, one of the most important areas in technology. The proposed algorithm uses parallelism to solve approximate matching. It has a best-case time complexity of O(n) and performs better in practice than dynamic programming.
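For reference, the dynamic-programming baseline the abstract compares against can be sketched as the classic approximate-matching recurrence (Sellers-style), which reports every text position where some substring ending there is within k edits of the pattern. This assumes unit costs for substitutions, insertions, and deletions; the function name is hypothetical, and this is the quadratic baseline, not the paper's heuristic parallel algorithm.

```python
def approx_match_ends(pattern, text, k):
    """1-based end positions j where some substring ending at text[j-1]
    is within edit distance k of pattern (O(len(pattern) * len(text)) time)."""
    m = len(pattern)
    col = list(range(m + 1))       # D[i][0] = i: matching i pattern chars to nothing
    hits = []
    for j, c in enumerate(text, 1):
        prev_diag = col[0]         # D[0][j-1]
        col[0] = 0                 # a match may start anywhere in the text
        for i in range(1, m + 1):
            old = col[i]           # D[i][j-1]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(old + 1,              # skip text char
                         col[i - 1] + 1,       # skip pattern char
                         prev_diag + cost)     # match or substitute
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits
```

For example, searching "ACGT" in "TTACGTTT" reports only end position 6 with k = 0, and positions 5, 6, and 7 with k = 1. Each column depends on the previous one, which is why parallel versions usually work along anti-diagonals or partition the text.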
Data-parallel algorithms are presented for polygonizing a collection of line segments represented by a data-parallel bucket PMR quadtree, a data-parallel R-tree, and a data-parallel R+-tree. Such an operation is usefu...
This paper describes a new parallel algorithm for solving multiphysics problems. These kinds of problems are very demanding in terms of CPU time and memory space, which are typically not available on a single processor...