This paper addresses the parallel execution of chain code generation on a linear array architecture. The proposed algorithm views contours as a set of edges (or contour segments) that can be traced by a top-down contour tracing method to generate the chain codes for the outer and inner object contours. A parallel algorithm containing the required chain code generating rules and operations is also described, and the algorithm is mapped onto a one-dimensional systolic array containing [(N + 1)/2] processing elements (PEs) to devise this architecture. The architecture extracts the contours of objects and generates the corresponding chain codes once the image data in all rows have been input in a linear fashion. The total processing time for generating the chain codes in an N x N image is O(3N). The real-time requirement is thereby fulfilled, and the execution time is independent of the image content. In addition, a partition method is developed to process an image when the parallel architecture has a fixed number of PEs, say two or more. The total execution time for an N x N image with a fixed number of PEs is N(N + 1)/M + 2(M - 1), where M is the number of PEs. (C) 2002 Elsevier Science Inc. All rights reserved.
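As a quick illustration of the timing expressions quoted in this abstract, the following minimal sketch evaluates them for a given image size. The function names are hypothetical, the results are in whatever time unit the abstract's expressions use, and nothing here reproduces the architecture itself.

```python
def full_array_time(n: int) -> int:
    """Time bound 3N quoted for the full systolic array on an N x N image."""
    return 3 * n


def partitioned_time(n: int, m: int) -> float:
    """Execution time N(N + 1)/M + 2(M - 1) quoted for M fixed PEs."""
    if m < 1:
        raise ValueError("the number of PEs must be at least 1")
    return n * (n + 1) / m + 2 * (m - 1)


if __name__ == "__main__":
    # Compare a few PE counts for a 512 x 512 image.
    for pes in (2, 4, 8):
        print(pes, partitioned_time(512, pes))
```

With few PEs the first term dominates, so doubling M roughly halves the time; the 2(M - 1) term only matters once M becomes comparable to N.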
An efficient L0-stable parallel algorithm is developed for the two-dimensional diffusion equation with non-local time-dependent boundary conditions. The algorithm is based on a subdiagonal Padé approximation to the matrix exponentials arising from the method of lines, and it may be implemented on a parallel architecture using two processors running concurrently, with each processor using tridiagonal solvers at every time step. The algorithm is tested on two model problems from the literature for which discontinuities between initial and boundary conditions exist. The CPU times and the associated error estimates are compared.
We describe a new parallel algorithm for solving the two-dimensional longest common substring (2D LCS) problem, taking advantage of the multi-core graphics processing unit architecture offered by Compute Unified Device Architecture (CUDA). In this article we also define the 2D LCS problem as finding the largest common 4-connected component of two input matrices, and we present an algorithm that solves this problem exactly in O(mnst/P) time on a P-core GPU.
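To make the problem definition concrete, here is a serial brute-force sketch, not the CUDA algorithm from the article. It assumes that "common" means the two matrices agree cell by cell under some relative offset, and that the answer is the size of the largest 4-connected region of agreeing cells over all offsets. The function names are hypothetical.

```python
from collections import deque


def largest_component(mask):
    """Size of the largest 4-connected region of True cells in a 2D mask."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited matching cell.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def lcs_2d(a, b):
    """Largest common 4-connected component of matrices a (m x n) and b (s x t)."""
    m, n = len(a), len(a[0])
    s, t = len(b), len(b[0])
    best = 0
    # Try every relative offset at which the two matrices overlap.
    for dy in range(-(m - 1), s):
        for dx in range(-(n - 1), t):
            mask = [
                [0 <= i + dy < s and 0 <= j + dx < t
                 and a[i][j] == b[i + dy][j + dx]
                 for j in range(n)]
                for i in range(m)
            ]
            best = max(best, largest_component(mask))
    return best
```

Each offset can be processed independently of the others, which is the kind of structure a GPU implementation could distribute across cores.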
We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlof direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers.
This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L and U are lower and upper triangular matrices, and P and Q are permutation matrices. The PSelInv method computes selected elements of A^-1. The selection is confined by the sparsity pattern of the matrix A^T. Our algorithm does not assume any symmetry properties of A, and our parallel implementation is memory efficient, in the sense that the computed elements of A^-T overwrite the sparse factors L and U in situ. PSelInv involves a large number of collective data communication activities within different processor groups of various sizes. In order to minimize idle time and improve load balancing, tree-based asynchronous communication is used to coordinate all such collective communication. Numerical results demonstrate that PSelInv can scale efficiently to 6,400 cores for a variety of matrices. (C) 2017 Elsevier B.V. All rights reserved.
In this paper, we consider a recursive estimation problem for linear regression where the signal to be estimated admits a sparse representation and measurement samples are only sequentially available. We propose a convergent parallel estimation scheme that consists of approximately solving a sequence of l1-regularized least-squares problems. The proposed scheme is novel in three aspects: 1) all elements of the unknown vector variable are updated in parallel at each time instant, and the convergence speed is much faster than that of state-of-the-art schemes, which update the elements sequentially; 2) both the update direction and the stepsize of each element have simple closed-form expressions, so the algorithm is suitable for online (real-time) implementation; and 3) the stepsize is designed to accelerate convergence, but it does not suffer from the common intricacy of parameter tuning. Both centralized and distributed implementation schemes are discussed. The attractive features of the proposed algorithm are also illustrated numerically.
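The abstract does not give the update formulas, so the following is only a generic sketch of how a parallel (Jacobi-style) update for an l1-regularized least-squares problem can have a closed form. It assumes the objective 0.5 x^T G x - r^T x + mu ||x||_1 with a positive diagonal in G. The names `soft_threshold`, `jacobi_best_response`, and `damped_step`, as well as the fixed stepsize `gamma`, are hypothetical and stand in for the paper's own direction and stepsize rules.

```python
def soft_threshold(value, threshold):
    """Closed-form minimizer of 0.5 * (z - value)**2 + threshold * abs(z)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def jacobi_best_response(G, r, x, mu):
    """Minimize the objective in each coordinate separately, all from the same x.

    Every coordinate reads only the current x, so the loop iterations are
    independent and could run in parallel.
    """
    n = len(x)
    response = []
    for k in range(n):
        # Correlation with coordinate k after removing the other coordinates.
        rk = r[k] - sum(G[k][j] * x[j] for j in range(n) if j != k)
        response.append(soft_threshold(rk, mu) / G[k][k])
    return response


def damped_step(x, response, gamma):
    """Move from x toward the best response with a stepsize gamma in (0, 1]."""
    return [xi + gamma * (bi - xi) for xi, bi in zip(x, response)]
```

For example, with G the 2 x 2 identity, r = [3, -0.5], mu = 1, and x = [0, 0], the best response is [2.0, 0.0]: the small second entry is thresholded to zero, which is how the l1 penalty induces sparsity.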
While constructing a Voronoi diagram V(P) for a set P of n points on a mesh-connected computer (MCC), it is necessary to find a set B of edges that are intersected by the dividing chain C during the merge of two Voronoi diagrams V(L) and V(R), where L and R contain the leftmost [n/2] points and the rightmost [n/2] points of P, respectively. The computation of B requires two operations: first, decide for each edge e in V(L) and V(R) whether its end vertices are closer to L or R; then, from that information, determine whether e is intersected by C. In the previous parallel algorithm, however, each of these two operations requires planar point location, which takes O(sqrt(n)) time on a sqrt(n) x sqrt(n) MCC, and the former operation additionally needs to compute the convex hulls of L and R. In this paper, we show that the latter operation can be done in O(1) time without executing planar point location, and that the former operation can be executed without computing convex hulls. Therefore, the computation of B is reduced to only one planar point location.
An efficient parallel algorithm is presented for computing selected components of A^-1, where A is a structured symmetric sparse matrix. Calculations of this type are useful for several applications, including electronic structure analysis of materials in which the diagonal elements of the Green's functions are needed. The algorithm proposed here is a direct method based on a block LDL^T factorization. The selected elements of A^-1 we compute lie in the nonzero positions of L + L^T. We use the elimination tree associated with the block LDL^T factorization to organize the parallel algorithm, and reduce the synchronization overhead by passing the data level by level along this tree using the technique of local buffers and relative indices. We demonstrate the efficiency of our parallel implementation by applying it to a discretized two-dimensional Hamiltonian matrix. We analyze the performance of the parallel algorithm by examining its load balance and communication overhead, and show that our parallel implementation exhibits excellent weak scaling on a large-scale high-performance distributed-memory parallel machine.
This paper presents a PRAM algorithm for computing the n x n Euclidean distance map. The algorithm runs in O(log n) time using n^2/log n processors on the EREW PRAM, and in O(log n/log log n) time using n^2 log log n/log n processors on the common CRCW PRAM. The algorithm is also applicable to many other distance maps, for example, the cityblock, chessboard, octagonal, and chamfer distance maps.
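For readers unfamiliar with the problem, the sketch below computes a Euclidean distance map serially with a simple row-then-column decomposition. It is not the PRAM algorithm from the paper. It assumes the image is a non-empty 2D list in which truthy cells are feature pixels and the map stores, for every cell, the distance to the nearest feature pixel. The function name is hypothetical.

```python
import math


def euclidean_distance_map(image):
    """Distance from each cell to the nearest truthy cell (inf if there is none)."""
    n_rows, n_cols = len(image), len(image[0])
    inf = float("inf")

    # Phase 1: per row, horizontal distance to the nearest feature pixel.
    row_dist = [[inf] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        last = None
        for c in range(n_cols):  # left-to-right sweep
            if image[r][c]:
                last = c
            if last is not None:
                row_dist[r][c] = c - last
        last = None
        for c in range(n_cols - 1, -1, -1):  # right-to-left sweep
            if image[r][c]:
                last = c
            if last is not None:
                row_dist[r][c] = min(row_dist[r][c], last - c)

    # Phase 2: per column, combine row distances with vertical offsets.
    result = [[inf] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for r in range(n_rows):
            best = min(row_dist[k][c] ** 2 + (r - k) ** 2 for k in range(n_rows))
            result[r][c] = math.sqrt(best)
    return result
```

All rows in phase 1 and all columns in phase 2 are independent of one another, which hints at why the problem parallelizes well. This serial version takes O(n^3) time for an n x n image.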
A high-speed micro-digital holographic particle tracking velocimetry system is constructed on a PC grid environment that runs Windows XP with AD-POWERs as the parallel tool. Two algorithms for the high-speed system are evaluated under the same PC grid environment. Both methods are based on a computer-generated hologram algorithm. One method is a division algorithm based on time development of the measurements, while the other is a division algorithm based on spatial reconstruction of the measurement. With the former, performance increases by a factor of 3.3 when 4 PCs are used. The present system can compute huge hologram images and output them "on-site" at an experimental facility. (C) 2007 Elsevier B.V. All rights reserved.