This paper addresses the parallel execution of chain code generation on a linear array architecture. In the proposed algorithm, contours are viewed as sets of edges (or contour segments) that can be traced by a top-down contour tracing method to generate the chain codes for the outer and inner object contours. A parallel algorithm containing the required chain code generation rules and operations is also described, and the algorithm is mapped onto a one-dimensional systolic array containing [(N + 1)/2] processing elements (PEs) to derive this architecture. The architecture extracts the contours of objects and generates the corresponding chain codes once the image data in all rows have been input in a linear fashion. The total processing time for generating the chain codes of an N x N image is O(3N). The real-time requirement is thereby fulfilled, and the execution time is independent of the image content. In addition, a partition method is developed to process an image when the parallel architecture has a fixed number of PEs, say two or more. The total execution time for an N x N image using a fixed number of PEs is N(N + 1)/M + 2(M - 1), where M is the fixed number of PEs. (C) 2002 Elsevier Science Inc. All rights reserved.
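As a rough illustration of the partitioned execution time quoted above, the following sketch evaluates N(N + 1)/M + 2(M - 1) for a few PE counts. The function name `partitioned_time` and the sample image size are assumptions made here for illustration; only the formula comes from the abstract.

```python
def partitioned_time(n: int, m: int) -> float:
    """Evaluate N(N + 1)/M + 2(M - 1), the step count quoted in the abstract.

    n is the image side length (N x N image); m is the fixed number of PEs.
    The function name and interface are hypothetical, not from the paper.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    return n * (n + 1) / m + 2 * (m - 1)


if __name__ == "__main__":
    # Sample image size chosen only for illustration.
    for m in (2, 4, 8):
        print(m, partitioned_time(512, m))
```

The first term shrinks as PEs are added while the second grows linearly in M, so for a fixed image size the formula suggests diminishing returns from very large M.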
We present a new algorithm for solving the Sylvester observer equation arising in the context of the Luenberger observer. The algorithm embodies two main computational phases: the solution of several independent equation systems and a series of matrix-matrix multiplications. The algorithm is thus well suited for parallel and high-performance computing. By reducing the coefficient matrix A to lower-Hessenberg form, one can implement the algorithm efficiently, with few floating-point operations and little workspace. The algorithm has been successfully implemented on a Cray C90. A comparison, both theoretical and experimental, has been made with the well-known Hessenberg-Schur algorithm, which solves an arbitrary Sylvester equation. Our theoretical analysis and experimental results confirm the superiority of the proposed algorithm, in both efficiency and speed, over the Hessenberg-Schur algorithm.
A parallel algorithm is presented in this article to efficiently solve the optimal consensus problem of multiagent systems. By utilizing a Jacobi-type proximal alternating direction multiplier framework, the optimization process is divided into two independent subproblems that can be solved in parallel to improve computational efficiency, followed by the Lagrangian multiplier update. The convergence analysis of the proposed algorithm is performed using convex optimization theory, deriving the convergence conditions on the auxiliary parameters. Furthermore, the accelerated algorithm enjoys a convergence rate of O(1/t^2) by adjusting the auxiliary parameters adaptively. To leverage the strengths of collaboration in multiagent systems, a distributed implementation of the proposed parallel algorithm is further developed, in which each agent solves its private subproblems using only its own and its neighbors' information. Numerical simulations demonstrate the effectiveness of the theoretical results.
Compact Local Integrated Radial Basis Function (CLIRBF) methods based on Cartesian grids can be effective numerical methods for solving partial differential equations (PDEs) for fluid flow problems. The combination of the domain decomposition method and function approximation using CLIRBF methods yields an effective coarse-grained parallel processing approach. This approach enables not only each sub-domain of the original analysis domain to be discretised by a separate CLIRBF network but also compact local stencils to be treated independently. The present algorithm, namely parallel CLIRBF, achieves higher throughput in solving large-scale problems by, firstly, parallel processing of the sub-regions which constitute the original domain and, secondly, accelerating the convergence rate within each sub-region using groups of CLIRBF stencils in which function approximations are carried out by parallel processes. The procedure is illustrated with several numerical examples of PDEs and a lid-driven cavity problem using the Message Passing Interface supported by MATLAB.
This paper presents a PRAM algorithm for computing the n x n Euclidean distance map. This algorithm can be performed in O(log n) time using n^2/log n processors on the EREW PRAM and in O(log n/log log n) time using n^2 log log n/log n processors on the common CRCW PRAM. This algorithm is also applicable to many other distance maps, for example, city-block, chessboard, octagonal and chamfer distance maps.
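To make concrete what a distance map is, here is a minimal sequential brute-force sketch. It is not the PRAM algorithm of the abstract; it only defines the output that such an algorithm computes, for three of the metrics mentioned. The function name `distance_map` and the convention that nonzero pixels are feature pixels are assumptions for illustration.

```python
import math


def distance_map(image, metric="euclidean"):
    """Brute-force reference: distance from every pixel to the nearest feature pixel.

    `image` is a list of rows; nonzero entries are treated as feature pixels
    (an assumption made here, not taken from the abstract).
    """
    features = [(r, c) for r, row in enumerate(image)
                for c, value in enumerate(row) if value]
    if not features:
        raise ValueError("image contains no feature pixels")

    def dist(dr, dc):
        if metric == "euclidean":
            return math.hypot(dr, dc)
        if metric == "cityblock":
            return abs(dr) + abs(dc)
        if metric == "chessboard":
            return max(abs(dr), abs(dc))
        raise ValueError("unknown metric: " + metric)

    # Every pixel is compared against every feature pixel.
    return [[min(dist(r - fr, c - fc) for fr, fc in features)
             for c in range(len(row))]
            for r, row in enumerate(image)]


if __name__ == "__main__":
    for row in distance_map([[1, 0, 0], [0, 0, 0], [0, 0, 0]]):
        print(row)
```

This reference costs time proportional to the number of pixels times the number of feature pixels, which is the kind of cost a parallel algorithm with the stated bounds is meant to avoid.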
We describe a new parallel algorithm design for solving the two-dimensional longest common substring (2D LCS) problem, taking advantage of the multi-core graphics processing unit (GPU) architecture offered by Compute Unified Device Architecture (CUDA). In this article we also define the 2D LCS problem as finding the largest common 4-connected component of two input matrices and present an algorithm which solves this problem exactly in O(mnst/P) time with a P-core GPU.
We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers.
This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel machine, where L and U are lower and upper triangular matrices, respectively, and P and Q are permutation matrices. The PSelInv method computes selected elements of A^{-1}. The selection is confined by the sparsity pattern of the matrix A^T. Our algorithm does not assume any symmetry properties of A, and our parallel implementation is memory efficient, in the sense that the computed elements of A^{-T} overwrite the sparse matrix L + U in situ. PSelInv involves a large number of collective data communication activities within different processor groups of various sizes. In order to minimize idle time and improve load balancing, tree-based asynchronous communication is used to coordinate all such collective communication. Numerical results demonstrate that PSelInv can scale efficiently to 6,400 cores for a variety of matrices. (C) 2017 Elsevier B.V. All rights reserved.
In this paper, we consider a recursive estimation problem for linear regression where the signal to be estimated admits a sparse representation and measurement samples are only sequentially available. We propose a convergent parallel estimation scheme that consists of approximately solving a sequence of ℓ1-regularized least-squares problems. The proposed scheme is novel in three aspects: 1) all elements of the unknown vector variable are updated in parallel at each time instant, and the convergence speed is much faster than that of state-of-the-art schemes which update the elements sequentially; 2) both the update direction and stepsize of each element have simple closed-form expressions, so the algorithm is suitable for online (real-time) implementation; and 3) the stepsize is designed to accelerate the convergence, but it does not suffer from the common intricacy of parameter tuning. Both centralized and distributed implementation schemes are discussed. The attractive features of the proposed algorithm are also illustrated numerically.
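The abstract does not state its update formulas, so the following is only a sketch of a standard building block for ℓ1-regularized least squares: the closed-form per-coordinate minimizer of 0.5·||Ax − b||² + λ·||x||₁, computed for all coordinates from the same current iterate (Jacobi style). The names `soft_threshold` and `parallel_best_response` are hypothetical, the stepsize rule of the paper is omitted, and every column of A is assumed to be nonzero.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator: shrink value toward zero by lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def parallel_best_response(A, b, x, lam):
    """Per-coordinate minimizers of 0.5*||Ax - b||^2 + lam*||x||_1.

    Each coordinate is minimized with the others held at the current x, so
    the n scalar problems are independent and could run in parallel.
    Assumes every column of A is nonzero; this is a generic sketch, not the
    scheme proposed in the paper.
    """
    m, n = len(A), len(x)
    residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    result = []
    for k in range(n):
        col_sq = sum(A[i][k] ** 2 for i in range(m))
        # Correlation of column k with the residual that excludes coordinate k.
        corr = sum(A[i][k] * residual[i] for i in range(m)) + col_sq * x[k]
        result.append(soft_threshold(corr, lam) / col_sq)
    return result


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]
    print(parallel_best_response(A, [3.0, 0.5], [0.0, 0.0], 1.0))
```

In a full scheme the new iterate would typically move from x toward this best response by some stepsize; how that stepsize is chosen is exactly what the abstract claims as a contribution and is not reproduced here.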
While constructing a Voronoi diagram V(P) for a set P of n points on a mesh-connected computer (MCC), it is necessary to find a set B of edges which are intersected by the dividing chain C during the merge process of two Voronoi diagrams V(L) and V(R), where L and R contain the leftmost [n/2] points and the rightmost [n/2] points of P, respectively. The computation of B requires two operations: first, decide for each edge e in V(L) and V(R) whether its end vertices are closer to L or R; then, from that information, determine whether e is intersected by C. However, in the previous parallel algorithm, each of these operations requires planar point location, which takes O(√n) time on a √n x √n MCC, and in addition the former operation needs to compute the convex hulls of L and R. In this paper, we show that the latter operation can be done in O(1) time without executing planar point location and that the former operation can be executed without computing convex hulls. Therefore, the computation of B is reduced to only one planar point location.