Let P and Q be two convex, n-vertex polygons. We consider the problem of computing, in parallel, some functions of P and Q when P and Q are disjoint. The model of parallel computation we consider is the CREW-PRAM, i.e., it is the synchronous shared-memory model where concurrent reads are allowed but no two processors can simultaneously attempt to write in the same memory location (even if they are trying to write the same thing). We show that a CREW-PRAM having n^{1/k} processors can compute the following functions in O(k^{1+ε}) time: (i) the common tangents between P and Q, and (ii) the distance between P and Q (and hence a straight line separating them). The positive constant ε can be made arbitrarily close to zero. Even with a linear number of processors, it was not previously known how to achieve constant-time performance for computing these functions. The algorithm for problem (ii) is easily modified to detect the case of zero distance as well.
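For concreteness, here is a minimal sequential reference for problem (ii), computing the distance between two disjoint convex polygons by brute force over all edge pairs. It is not the paper's O(k^{1+ε})-time CREW-PRAM algorithm; the function names and the O(n·m) approach are only illustrative.

import math

def point_segment_distance(p, a, b):
    # distance from point p to segment ab, all given as (x, y) tuples
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:                      # degenerate edge: a == b
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def convex_polygon_distance(P, Q):
    # brute-force O(n*m) minimum distance between disjoint convex polygons;
    # for non-crossing segments the minimum is attained at an endpoint of one edge
    best = float("inf")
    for i in range(len(P)):
        a, b = P[i], P[(i + 1) % len(P)]
        for j in range(len(Q)):
            c, d = Q[j], Q[(j + 1) % len(Q)]
            best = min(best,
                       point_segment_distance(a, c, d),
                       point_segment_distance(b, c, d),
                       point_segment_distance(c, a, b),
                       point_segment_distance(d, a, b))
    return best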
In this paper, it is shown that on the CREW model we can test whether a given permutation of 1, ..., n is separable in O(log n) time with n processors. If d is the depth of the optimal (minimum-depth) separating tree of a separable permutation, then a separating tree of depth O(d) can be constructed on the CREW model in O(log n) time with O(n^2) cost, or alternatively in O(d log n) time with O(nd) cost. We can test whether the given separable permutation P of 1, ..., k has a match in a permutation T of 1, ..., n (n ≥ k) in O(d log n) time with O(kn^4) cost (the same as that of the serial algorithm). We can also find the number of matches of P in T in O(d log n) time with O(kn^6) cost (the same as that of the serial algorithm). Both algorithms are for the CREW model. We also discuss how the space complexity of the existing serial algorithms for the decision problem can be reduced from O(kn^3) to O(n^3 log k), and that of the counting version from O(kn) to O(n^4 log k). (C) 2004 Elsevier B.V. All rights reserved.
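As a point of comparison, a minimal sequential sketch of the separability test (not the O(log n)-time CREW algorithm above) follows. It uses the standard recursive characterization: a permutation is separable iff it has length one or splits into a prefix and suffix whose value ranges do not interleave, with both parts separable. Names are illustrative.

from functools import lru_cache

def is_separable(perm):
    perm = tuple(perm)

    @lru_cache(maxsize=None)
    def sep(block):
        if len(block) <= 1:
            return True
        for i in range(1, len(block)):
            left, right = block[:i], block[i:]
            # direct sum: all left values below all right values;
            # skew sum: all left values above all right values
            if max(left) < min(right) or min(left) > max(right):
                if sep(left) and sep(right):
                    return True
        return False

    return sep(perm)

# example: 2 4 1 3 is a smallest non-separable pattern
assert is_separable((1, 3, 2, 4)) and not is_separable((2, 4, 1, 3))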
We study efficient deterministic parallel algorithms on two models: restartable fail-stop CRCW PRAMs and asynchronous PRAMs. In the first model, synchronous processes are subject to arbitrary stop failures and restarts determined by an on-line adversary and involving loss of private but not shared memory; the complexity measures are completed work (where processors are charged for completed fixed-size update cycles) and overhead ratio (completed work amortized over necessary work and failures). In the second model, the result of the computation is a serialization of the actions of the processors determined by an on-line adversary; the complexity measure is total work (number of steps taken by all processors). Despite their differences, the two models share key algorithmic techniques. We present new algorithms for the Write-All problem (in which P processors write ones into an array of size N) for the two models. These algorithms can be used to implement a simulation strategy for any N-processor PRAM on a restartable fail-stop P-processor CRCW PRAM that guarantees a terminating execution of each simulated N-processor step, with O(log^2 N) overhead ratio and O(min{N + P log^2 N + M log N, N · P^{0.59}}) (subquadratic) completed work (where M is the number of failures during this step's simulation). This strategy has a range of optimality. We also show that Write-All requires N + Ω(P log P) completed/total work on these models for P ≤ N. (C) 1996 Academic Press, Inc.
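To make the completed-work and failure measures concrete, here is a toy sequential simulation of Write-All under stop failures. The adversary (random here), the allocation rule, and all names are assumptions for illustration; this is not the paper's algorithm.

import random

def simulate_write_all(N, P, fail_prob=0.2, seed=0):
    # cells must all be set to one; completed_work counts completed update cycles
    random.seed(seed)
    cells = [0] * N
    completed_work, failures = 0, 0
    while not all(cells):
        remaining = [i for i, c in enumerate(cells) if c == 0]
        for p in range(P):
            if random.random() < fail_prob:    # the "adversary" stops processor p this cycle
                failures += 1
                continue
            # naive allocation rule: processor p picks some still-unwritten cell
            cells[remaining[p % len(remaining)]] = 1
            completed_work += 1
    return completed_work, failures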
An edge switch is an operation on a graph (or network) in which two edges are selected at random and one end vertex of each is swapped with that of the other. Edge switch operations have important applications in graph theory and network analysis, such as generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed-memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. Published by Elsevier Inc.
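For illustration, a minimal sequential sketch of a single edge-switch step on a simple graph follows. The paper's distributed-memory algorithms add the synchronization and communication this sequential version does not need; the data structures shown are assumptions.

import random

def edge_switch_once(edges, adjacency, rng=random):
    # edges: list of (u, v) tuples; adjacency: dict u -> set of neighbours
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    # proposed switch: (a, b), (c, d) -> (a, d), (c, b)
    if len({a, b, c, d}) < 4:                   # would create a self-loop
        return False
    if d in adjacency[a] or b in adjacency[c]:  # would create a parallel edge
        return False
    adjacency[a].discard(b); adjacency[b].discard(a)
    adjacency[c].discard(d); adjacency[d].discard(c)
    adjacency[a].add(d); adjacency[d].add(a)
    adjacency[c].add(b); adjacency[b].add(c)
    edges[i], edges[j] = (a, d), (c, b)
    return True                                 # degree sequence is unchanged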
Coupled-cluster (CC) methods are now widely used in quantum chemistry to calculate the electron correlation energy and many other properties of atoms and molecules. In this paper we outline the basics of the theory, discuss some computational aspects, and review work that has been done toward developing and implementing algorithms for CC methods on parallel computers. (C) 2000 Elsevier Science B.V. All rights reserved.
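For context, the central object of coupled-cluster theory is the exponential ansatz; a standard statement (general background, not specific to this paper) is

|\Psi_{\mathrm{CC}}\rangle = e^{\hat T}\,|\Phi_0\rangle, \qquad \hat T = \hat T_1 + \hat T_2 + \cdots, \qquad E_{\mathrm{CC}} = \langle \Phi_0 |\, e^{-\hat T} \hat H e^{\hat T} \,| \Phi_0 \rangle,

where \hat T_k generates k-fold excitations out of the reference determinant \Phi_0.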
The detection of circles in images is an important task in many computer vision applications. When the three parameters (center coordinates and radius) of a circle are quantized into O(n) values each, a sequential algorithm using the Hough transform runs with a time complexity of O(n^4), where n × n is the size of the image. When information about the gradient direction is also used, the complexity of the sequential algorithm reduces to O(n^3). This paper proposes three parallel algorithms for circle detection on an n × n mesh of processing elements operating in the SIMD mode. The first two algorithms use the Hough transform and the third is based on a tracing algorithm. The first algorithm uses only the gradient magnitude and takes O(n^3) time. The second uses both the gradient magnitude and gradient direction and runs in O(n^2) time. The third method uses a midpoint circle scan conversion algorithm and runs with a complexity of O(n^2). This algorithm is the most efficient of the three. It does not use the gradient direction and offers an improvement of O(n^2) over its sequential counterpart that runs in O(n^4) time. When implemented with a table look-up operation, this algorithm has a low proportionality constant and offers a significant improvement in computational speed.
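The scan-conversion primitive the third algorithm relies on can be sketched directly. Below is a minimal sequential version of midpoint circle scan conversion that lists the integer pixels of a circle; it is not the mesh-SIMD detector itself, and the function name is illustrative.

def midpoint_circle_points(cx, cy, r):
    # integer pixels on a circle of radius r centred at (cx, cy),
    # generated for one octant and reflected into the other seven
    points = set()
    x, y, d = 0, r, 1 - r            # d is the midpoint decision variable
    while x <= y:
        for dx, dy in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            points.add((cx + dx, cy + dy))
        if d < 0:
            d += 2 * x + 3
        else:
            d += 2 * (x - y) + 5
            y -= 1
        x += 1
    return points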
In this paper, we introduce and evaluate parallel implementations of two video sequence decorrelation algorithms developed on the basis of the non-alternating three-dimensional wavelet transform (3D-WT) and the temporal-window method. The proposed algorithms have been shown to outperform the classic 3D-WT algorithm in terms of better coding efficiency and lower computational requirements, while enabling lossless coding and top-quality reconstruction: the two features most relevant to medical imaging applications. The parallel implementations of the algorithms are developed and tested on a shared-memory system, an SGI Origin 3800 supercomputer, making use of a message-passing paradigm. We evaluate and analyze the performance of the implementations in terms of response time and speed-up factor by varying the number of processors and various video coding parameters. The key point enabling the development of highly efficient implementations is a workload distribution strategy supplemented by the use of parallel I/O primitives, which better exploits the inherent features of the application and the computing platform. Two sets of I/O primitives are tested and evaluated: those provided by the C compiler and those belonging to the MPI-IO library.
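As a small illustration of the temporal decorrelation step only, a one-level Haar wavelet transform along the time axis of a window of frames is sketched below (numpy assumed). The paper's algorithms combine this kind of step with spatial wavelet stages and an MPI-based workload distribution, none of which appears in this toy.

import numpy as np

def temporal_haar(window):
    # window: array of shape (frames, height, width) with an even number of frames
    even = window[0::2].astype(np.float64)
    odd = window[1::2].astype(np.float64)
    lowpass = (even + odd) / 2.0     # temporal averages (coarse approximation)
    highpass = (even - odd) / 2.0    # temporal details, small wherever frames agree
    return lowpass, highpass

# usage: low, high = temporal_haar(np.stack(frames[:8]))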
We present two parallel algorithms for finding all the roots of an N-degree polynomial equation on an efficient model of Optoelectronic Transpose Interconnection System (OTIS), called the OTIS-2D torus. The parallel algorithms are based on the iterative schemes of the Durand-Kerner and Ehrlich methods. We show that the algorithm for the Durand-Kerner method requires (N^{0.75} + 0.5N^{0.25} - 1) electronic moves + 2(N^{0.5} - 1) OTIS moves using N processors. The parallel algorithm for the Ehrlich method is shown to run in (N^{0.75} + 0.5N^{0.25} - 1) electronic moves + 2(N^{0.5} - 1) OTIS moves with the same number of processors. The algorithms have lower AT cost than the algorithms presented in Jana (Parallel Comput 32:301-312, 2006). The scalability of the algorithms is also discussed.
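For reference, the Durand-Kerner iteration that the first algorithm maps onto the OTIS-2D torus can be stated in a few lines of sequential code. The sketch below runs on one CPU and says nothing about the electronic/OTIS move counts analysed in the paper.

def durand_kerner(coeffs, iterations=200):
    # coeffs: [a_n, ..., a_1, a_0] with a_n != 0; returns approximate complex roots
    a = [c / coeffs[0] for c in coeffs]                  # normalise to a monic polynomial
    n = len(a) - 1
    p = lambda z: sum(c * z ** (n - k) for k, c in enumerate(a))
    roots = [(0.4 + 0.9j) ** (k + 1) for k in range(n)]  # standard initial guesses
    for _ in range(iterations):
        new_roots = []
        for i, zi in enumerate(roots):
            denom = 1.0 + 0.0j
            for j, zj in enumerate(roots):
                if j != i:
                    denom *= (zi - zj)
            new_roots.append(zi - p(zi) / denom)         # simultaneous Newton-like update
        roots = new_roots
    return roots

# usage: durand_kerner([1, -6, 11, -6]) approximates the roots 1, 2, 3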
The need to deal with imprecise and vague information in ontologies is rising in importance, as required by several real-world application domains. As a consequence, there is a growing interest in fuzzy ontologies, which combine ontologies and fuzzy logic theory. In fuzzy ontologies, some reasoning tasks usually become harder to solve, such as the concept subsumption problem and the computation of the Best Degree Bound (BDB) of an axiom. In fact, the existing algorithms for these problems usually require performing a number of simpler tests repeatedly. In this paper, we present a parallelization of these algorithms, implemented in the DeLorean reasoner, and discuss the encouraging results of an empirical evaluation.
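The parallelization pattern described above can be illustrated with a toy: when the BDB is obtained by repeating independent simpler tests (say, one crisp entailment check per candidate degree), those tests can be farmed out to a pool of workers. The reasoner call below is a placeholder; DeLorean's actual interface is not shown.

from concurrent.futures import ThreadPoolExecutor

def best_degree_bound(candidate_degrees, entails_at_degree, workers=4):
    # entails_at_degree(d) -> bool is the placeholder independent test;
    # each call is assumed self-contained, so the calls can run concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(entails_at_degree, candidate_degrees))
    satisfied = [d for d, ok in zip(candidate_degrees, results) if ok]
    return max(satisfied, default=0.0)   # greatest degree at which the axiom holds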
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers. However, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but we also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional fast Fourier transforms (FFTs) and other parallel transforms.
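For orientation, the per-field operation that all the compared algorithms implement, an FFT along longitude followed by a Legendre-type matrix transform along latitude, can be sketched as follows (numpy assumed). The Legendre basis and quadrature weights are placeholders; the parallel question in the paper is how to distribute these two stages, and the data transposes between them, across processors.

import numpy as np

def forward_spectral_transform(field, legendre, weights):
    # field: (nlat, nlon) grid values; legendre: (nspec, nlat) basis samples;
    # weights: (nlat,) quadrature weights; returns (nspec, nfourier) coefficients
    fourier = np.fft.rfft(field, axis=1)          # FFT over longitude, per latitude row
    weighted = fourier * weights[:, None]         # apply quadrature weights per latitude
    return legendre @ weighted                    # Legendre-type transform over latitude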