We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph suitable for distributed memory computers. The presented algorithm is based on the PUSH-RELABEL. algorithm which is...
详细信息
We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph suitable for distributed memory computers. The presented algorithm is based on the PUSH-RELABEL. algorithm which is known to be one of the fastest algorithms for the bipartite matching problem. Previous attempts at developing parallel implementations of it have focused on shared memory computers using only a limited number of processors. We first present a straightforward adaptation of these shared memory algorithms to distributed memory computers. However, this is not a viable approach as it requires too much communication. We then develop our new algorithm by modifying the previous approach through a sequence of steps with the main goal being to reduce the amount of communication and to increase load balance. The first goal is achieved by changing the algorithm so that many push and relabel operations can be performed locally between communication rounds and also by selecting augmenting paths that cross processor boundaries infrequently. To achieve good load balance, we limit the speed at which global relabelings traverse the graph. In several experiments on a large number of instances, we study weak and strong scalability of our algorithm using up to 128 processors. The algorithm can also be used to find epsilon-approximate matchings quickly. (C) 2011 Elsevier B.V. All rights reserved.
Two parallel geometric algorithms based on the idea of point domination are presented. The first algorithm solves the d-dimensional isothetic rectangles intersection counting problem of input size N/2(d), where d >...
详细信息
Two parallel geometric algorithms based on the idea of point domination are presented. The first algorithm solves the d-dimensional isothetic rectangles intersection counting problem of input size N/2(d), where d > 1 and N is a multiple of 2(d), in O(log(d-1) N) time and O(N log N) space. The second algorithm solves the direct dominance reporting problem for a set of N points in the plane in O(log N + J) time and O(N log N) space, where J denotes the maximum of the number of direct dominances reported by any single point in the set. Both algorithms make use of the EREW PRAM (Exclusive Read Exclusive Write parallel Random Access Machine) consisting of O(N) processors as the computational model.
A bijective code is a method for associating labeled n-trees to (n - 2)-strings of node labels in such a way that different trees yield different strings and vice versa. For all known bijective codes, optimal sequenti...
详细信息
ISBN:
(纸本)9783642114397
A bijective code is a method for associating labeled n-trees to (n - 2)-strings of node labels in such a way that different trees yield different strings and vice versa. For all known bijective codes, optimal sequential encoding and decoding algorithms are presented in literature, while parallel algorithms are investigated only for sonic of these codes. In this paper we focus our attention on the Blob code: a. code particularly considered in the field of Genetic algorithms. To the best of our knowledge, here we present the first parallel encoding and decoding algorithms for this code. The encoding algorithm implementation is optimal on an EREW PRAM, while the decoding algorithm requires O(log n) time and O(n) processors on CREW PRAM.
Given a trace of a distributed computation and a desired predicate, the predicate detection problem is to find a consistent global state that satisfies the given predicate. The predicate detection problem has many app...
详细信息
ISBN:
(纸本)9781450360944
Given a trace of a distributed computation and a desired predicate, the predicate detection problem is to find a consistent global state that satisfies the given predicate. The predicate detection problem has many applications in the testing and runtime verification of parallel and distributed systems. We show that many problems related to predicate detection are in the parallel complexity class NC, the set of decision problems decidable in polylogarithmic time on a parallel computer with a polynomial number of processors. Given a computation on n processes with at most m local states per process, our parallel algorithm to detect a given conjunctive predicate takes O(log mn) time and O(m(3)n(3) log mn) work. The sequential algorithm takes O(mn(2)) time. For data race detection, we give a parallel algorithm that takes O(logmn log n) time, also placing that problem in NC. This is the first work, to the best of our knowledge, that places the parallel complexity of such predicate detection problems in the class NC.
In this paper, based on the advantages of both optical transmission and electronic computation, we first provide an O(log log N) bus cycles parallel algorithm for weighted distance transforms of an NxN binary image on...
详细信息
ISBN:
(纸本)9783642030949
In this paper, based on the advantages of both optical transmission and electronic computation, we first provide an O(log log N) bus cycles parallel algorithm for weighted distance transforms of an NxN binary image on a linear array with a reconfigurable pipelined bus System using N-2 processors. By increasing the number of processors, the proposed algorithm can be run in O(loglog(q) N) and O(l) bus cycles using qN(2) and N2+1/epsilon processors respectively, where 2 <= q <= root N, epsilon is a constant and epsilon >= 1. These results improve on previously known algorithms developed on various parallel computation models.
In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over am alphabet of four (DNA/RNA) or twenty symbols (proteins). For computational biolog...
详细信息
ISBN:
(纸本)3540425225
In the last few years, molecular biology has produced a large amount of data, mainly in the form of sequences, that is, strings over am alphabet of four (DNA/RNA) or twenty symbols (proteins). For computational biologists the main challenge now is to provide efficient tools for the analysis and the comparison of the sequences. In this paper, we introduce and briefly discuss some open problems, and present a parallel algorithm that finds repeated substrings in a DNA sequence or common substrings in a set of sequences. The occurrences of the substrings can be approximate, that is, can differ up to a maximum number of mismatches that depends on the length of the substring itself. The output of the algorithm is sorted according to different statistical measures of significance. The algorithm has been successfully implemented on a cluster of workstations.
We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in ...
详细信息
ISBN:
(纸本)9781450362955
We develop and evaluate parallel algorithms for a fundamental problem in numerical computing, namely the evaluation of a polynomial of a matrix. The algorithm consists of many building blocks that can be assembled in several ways. We investigate parallelism in individual building blocks, develop parallel implemenations, and assemble them into an overall parallel algorithm. We analyze the effects of both the dimension of the matrix and the degree of the polynomial on both arithmetic complexity and on parallelism, and we consequently propose which variants use in different cases. Our theoretical results indicate that one variant of the algorithm, based on applying the Paterson-Stockmeyer method to the entire matrix, parallelizes very effectively on virtually any matrix dimension and polynomial degree. However, it is not the most efficient from the arithmetic complexity viewpoint. Another algorithm, based on the Davies-Higham block recurrence is much more efficient from the arithmetic complexity viewpoint, but one of its building blocks is serial. Experimental results on a dual-socket 28-core server show that the first algorithm can effectively use all the cores, but that on high-degree polynomials the second algorithm is often faster, in spite of the sequential phase. This indicates that our parallel algorithms for the other phases are indeed effective.
parallel implementation of the implicit LU-SGS solver is considered. It leads to the graph coloring problem. A novel recursive graph coloring algorithm has been proposed that requires only three colors on 2: 1 balance...
详细信息
ISBN:
(纸本)9783319629322;9783319629315
parallel implementation of the implicit LU-SGS solver is considered. It leads to the graph coloring problem. A novel recursive graph coloring algorithm has been proposed that requires only three colors on 2: 1 balanced quadtree-based meshes. The algorithm has been shown to allow simple parallel implementations, including GPU architectures, and is fully coherent with local grid coarsing/refining procedures resulting in highly effective co-execution with local grid adaptation.
We present new parallel algorithms for testing pattern involvement for all length 4 permutations. Our algorithms have the complexity of O(log n) time with n/log n processors on the CREW PRAM model, O(log log log n) ti...
详细信息
ISBN:
(纸本)9781479938445
We present new parallel algorithms for testing pattern involvement for all length 4 permutations. Our algorithms have the complexity of O(log n) time with n/log n processors on the CREW PRAM model, O(log log log n) time with n/log log log n processors or constant time and n log 3 n processors on a CRCW PRAM model. parallel algorithms were not designed before for some of these patterns and for other patters the previous best algorithms require O(log n) time and n processors on the CREW PRAM model.
Multi-valued Decision Diagrams (MDDs) have been extensively studied in the last ten years. Recently, efficient algorithms implementing operators such as reduction, union, intersection, difference, etc., have been desi...
详细信息
ISBN:
(纸本)9781577358008
Multi-valued Decision Diagrams (MDDs) have been extensively studied in the last ten years. Recently, efficient algorithms implementing operators such as reduction, union, intersection, difference, etc., have been designed. They directly deal with the graph structure of the MDD and a time reduction of several orders of magnitude in comparison to other existing algorithms have been observed. These operators have permitted a new look at MDDs, because extremely large MDDs can finally be manipulated as shown by the models used to solve complex application in music generation. However, MDDs become so large (50GB) that minutes are sometimes required to perform some operations. In order to accelerate the manipulation of MDDs, parallel algorithms are required. In this paper, we introduce such algorithms. We carefully design them in order to overcome inherent difficulties of the parallelization of sequential algorithms such as data dependencies, software lock-out, false sharing, or load balancing. As a result, we observe a speed-up, i.e. ratio between parallel and sequential runtimes, growing linearly with the number of cores.
暂无评论