A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR gates. A layered PMC is a PMC in which all input nodes are in the external face, and the gates can be assigned to layers in such a way that every wire goes between gates in successive layers. Goldschlager, Cook and Dymond, and others have developed $NC^2$ algorithms to evaluate a layered PMC when the output node is in the same face as the input nodes. These algorithms require a large number of processors ($\Omega(n^6)$, where $n$ is the size of the input circuit). In this paper we give an efficient parallel algorithm that evaluates a layered PMC of size $n$ in $O(\log^2 n)$ time using only a linear number of processors on an EREW PRAM. Our parallel algorithm is the best possible to within a polylog factor and is a substantial improvement over the earlier algorithms for the problem.
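To make the layered structure concrete, here is a minimal sequential sketch in Python. It assumes a hypothetical encoding (the inputs form layer 0, and each gate is an `("AND" | "OR", indices)` pair pointing into the previous layer) and does not check planarity. It is not the paper's $O(\log^2 n)$ EREW algorithm: evaluating layer by layer takes time proportional to the circuit depth, which can be linear in $n$, even though the gates within one layer are independent and could be evaluated in parallel.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a gate is (operator, indices of its inputs in the previous layer).
Gate = Tuple[str, Sequence[int]]


def evaluate_layered_circuit(inputs: Sequence[bool], layers: List[List[Gate]]) -> List[bool]:
    """Evaluate a layered monotone circuit and return the values of its last layer.

    Every wire joins successive layers, so each layer depends only on the one
    below it; the gates inside a layer are independent of one another.
    """
    values = list(inputs)  # layer 0: the circuit inputs
    for layer in layers:
        next_values = []
        for op, sources in layer:
            args = [values[i] for i in sources]
            if op == "AND":
                next_values.append(all(args))
            elif op == "OR":
                next_values.append(any(args))
            else:
                raise ValueError(f"monotone circuits allow only AND/OR gates, got {op!r}")
        values = next_values
    return values


if __name__ == "__main__":
    circuit = [
        [("AND", [0, 1]), ("OR", [1, 2])],  # layer 1
        [("OR", [0, 1])],                   # layer 2 (output)
    ]
    print(evaluate_layered_circuit([True, False, True], circuit))  # [True]
```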
A new algorithm for nonlinear eigenvalue problems is proposed. The numerical technique is based on a perturbation of the coefficients of the differential equation combined with the Adomian decomposition method for the nonlinear part. The approach provides an exponential convergence rate with a base that is inversely proportional to the index of the eigenvalue under consideration. The eigenpairs can be computed in parallel. Numerical examples are presented to support the theory. They are in good agreement with the spectral asymptotics obtained by other authors.
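The Adomian decomposition step for the nonlinear part can be illustrated on a much simpler problem. The sketch below applies it to the scalar initial-value problem $u' = -u^2$, $u(0) = 1$ (exact solution $1/(1+t)$), which is our own toy example rather than an eigenvalue problem from the paper; the coefficient perturbation is not shown, and the function names are hypothetical. For $N(u) = u^2$ the Adomian polynomials are $A_n = \sum_{k=0}^{n} u_k u_{n-k}$, and the components satisfy $u_0 = 1$, $u_{n+1} = -\int_0^t A_n\,ds$. Polynomials in $t$ are stored as exact coefficient lists.

```python
from fractions import Fraction
from typing import List

Poly = List[Fraction]  # coefficients of c0 + c1*t + c2*t^2 + ...


def poly_add(a: Poly, b: Poly) -> Poly:
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else Fraction(0)) + (b[i] if i < len(b) else Fraction(0))
            for i in range(n)]


def poly_mul(a: Poly, b: Poly) -> Poly:
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def integrate(a: Poly) -> Poly:
    """Antiderivative that vanishes at t = 0."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(a)]


def adomian_square(components: List[Poly], n: int) -> Poly:
    """Adomian polynomial A_n for the nonlinearity N(u) = u^2."""
    total: Poly = [Fraction(0)]
    for k in range(n + 1):
        total = poly_add(total, poly_mul(components[k], components[n - k]))
    return total


def adm_components(num_terms: int) -> List[Poly]:
    """Components u_0, ..., u_{num_terms-1} for u' = -u^2, u(0) = 1."""
    components: List[Poly] = [[Fraction(1)]]  # u_0 is the initial value
    for n in range(num_terms - 1):
        components.append([-c for c in integrate(adomian_square(components, n))])
    return components


def evaluate(components: List[Poly], t: float) -> float:
    """Partial sum u_0(t) + u_1(t) + ... of the decomposition series."""
    return sum(float(c) * t ** k for poly in components for k, c in enumerate(poly))


if __name__ == "__main__":
    comps = adm_components(20)
    print(comps[3])              # coefficients of u_3 = -t^3
    print(evaluate(comps, 0.5))  # close to 1 / 1.5
```

Each component here is a single monomial $(-1)^n t^n$, so the partial sums reproduce the geometric series of $1/(1+t)$.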
Poisson noise occurs in multiple areas of image processing, such as Computed Tomography, where data acquisition is based on counting particles that hit a detector surface. Using variance-stabilizing transformations, the Poisson noise can be approximated by Gaussian noise, for which classical denoising filters can be used. This paper presents an experimental performance study of a parallel implementation of the Poissonian image restoration algorithm introduced in Harizanov et al. (2013). Hybrid parallelization based on the MPI and OpenMP standards is investigated. The convergence rate of the algorithm depends heavily on both the image size and the choice of the input parameters $(\rho, \sigma)$, so maximizing its parallel efficiency is vital for real-life applications. The implementation is tested on high-resolution radiographic images, on Linux clusters with Intel processors and on an IBM supercomputer. © 2016 Elsevier B.V. All rights reserved.
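A common variance-stabilizing transformation for Poisson data is the Anscombe transform $A(x) = 2\sqrt{x + 3/8}$, which maps Poisson counts to values with approximately unit variance. The abstract does not say which transformation Harizanov et al. (2013) use, so this choice is an assumption. The sketch uses a hypothetical helper `poisson_sample` (Knuth's multiplication method) so that it needs only the Python standard library.

```python
import math
import random
import statistics


def anscombe(x: float) -> float:
    """Anscombe transform: Poisson counts -> approximately Gaussian with unit variance."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def inverse_anscombe(y: float) -> float:
    """Plain algebraic inverse (biased at low counts; unbiased inverses exist)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's method; adequate for moderate lam (exp(-lam) must not underflow)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    for lam in (5, 20, 50):
        counts = [poisson_sample(lam, rng) for _ in range(20000)]
        raw_var = statistics.pvariance(counts)
        vst_var = statistics.pvariance([anscombe(c) for c in counts])
        print(f"lambda={lam}: raw variance {raw_var:.2f}, after Anscombe {vst_var:.3f}")
```

The raw variance grows with the intensity, while the transformed variance stays near 1, which is what lets a Gaussian denoiser be applied to the transformed image.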
The distance transform and the nearest feature transform are useful operations in image processing. These transforms can be based on various distance functions, which differ in efficiency and usefulness. In this paper, we consider these transforms based on the weighted distance, which generalizes many distances, such as the city block, chessboard and chamfer distances. This paper presents a parallel algorithm for these transforms of an $n \times n$ binary image. The algorithm runs in $O(\log n)$ time using $n^2/\log n$ processors on the EREW PRAM and in $O(\log \log n)$ time using $n^2/\log \log n$ processors on the common CRCW PRAM. The algorithm also runs in $O(n^2/p^2 + n)$ time on a $p \times p$ mesh and in $O(n^2/p^2 + (n \log p)/p)$ time on a $p^2$-processor hypercube (for $1 \le p \le n$). These complexities show that the algorithm is cost optimal on all models. We also obtain an $\Omega(\log n)$ lower bound for the transform on the CREW PRAM, which implies that the algorithm is time optimal on the EREW PRAM. © 1999 Elsevier Science B.V. All rights reserved.
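For reference, the weighted distance and nearest feature transforms can be computed sequentially by the classic two-pass raster scan with a 3×3 mask, where `a` weights horizontal and vertical steps and `b` weights diagonal steps: $(a, b) = (1, \infty)$ gives the city block distance, $(1, 1)$ the chessboard distance and $(3, 4)$ a common chamfer distance. This $O(n^2)$ sketch is not the paper's parallel algorithm, and the function name is hypothetical; the nearest-feature map records the feature pixel from which each distance was propagated, with ties broken by scan order.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]


def weighted_distance_transform(
    img: Sequence[Sequence[int]], a: float, b: float
) -> Tuple[List[List[float]], List[List[Optional[Pixel]]]]:
    """Two-pass 3x3 chamfer scan; img[i][j] == 1 marks a feature pixel.

    a: weight of a horizontal/vertical step; b: weight of a diagonal step
    (math.inf forbids diagonal steps). Returns (distance map, nearest-feature map).
    """
    n, m = len(img), len(img[0])
    dist = [[0.0 if img[i][j] else math.inf for j in range(m)] for i in range(n)]
    near = [[(i, j) if img[i][j] else None for j in range(m)] for i in range(n)]
    forward_mask = [(-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)]  # neighbours already visited
    backward_mask = [(1, 1, b), (1, 0, a), (1, -1, b), (0, 1, a)]

    def relax(i: int, j: int, mask) -> None:
        # Take the cheapest path through any masked neighbour.
        for di, dj, w in mask:
            y, x = i + di, j + dj
            if 0 <= y < n and 0 <= x < m and dist[y][x] + w < dist[i][j]:
                dist[i][j] = dist[y][x] + w
                near[i][j] = near[y][x]

    for i in range(n):  # top-left to bottom-right
        for j in range(m):
            relax(i, j, forward_mask)
    for i in reversed(range(n)):  # bottom-right to top-left
        for j in reversed(range(m)):
            relax(i, j, backward_mask)
    return dist, near


if __name__ == "__main__":
    image = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    for name, (a, b) in {"city block": (1, math.inf), "chessboard": (1, 1), "chamfer 3-4": (3, 4)}.items():
        dist, _ = weighted_distance_transform(image, a, b)
        print(name, dist)
```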
In the literature, there are quite a few sequential and parallel algorithms for solving problems on distance-hereditary graphs. Given a distance-hereditary graph $G$ with $n$ vertices and $m$ edges, we show that the efficient domination problem on $G$ can be solved in $O(\log^2 n)$ time using $O(n + m)$ processors on a CREW PRAM. Moreover, if a binary tree representation of $G$ is given, the problem can be optimally solved in $O(\log n)$ time using $O(n/\log n)$ processors on an EREW PRAM.
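An efficient dominating set (also called a perfect code) is a vertex set $D$ such that every vertex lies in exactly one closed neighborhood $N[d]$ with $d \in D$. The brute-force sketch below, with hypothetical function names, only states this definition in executable form; it is exponential, ignores the distance-hereditary structure, and is unrelated to the paper's parallel algorithm.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def closed_neighborhoods(n: int, edges: Iterable[Edge]) -> List[Set[int]]:
    nb = [{v} for v in range(n)]  # N[v] contains v itself
    for u, v in edges:
        nb[u].add(v)
        nb[v].add(u)
    return nb


def is_efficient_dominating_set(n: int, edges: Iterable[Edge], D: Iterable[int]) -> bool:
    """True iff every vertex is dominated by exactly one member of D."""
    nb = closed_neighborhoods(n, edges)
    count = [0] * n
    for d in D:
        for v in nb[d]:
            count[v] += 1
    return all(c == 1 for c in count)


def find_efficient_dominating_set(n: int, edges: List[Edge]) -> Optional[Set[int]]:
    """Exhaustive search; only for checking tiny examples."""
    for k in range(1, n + 1):
        for D in combinations(range(n), k):
            if is_efficient_dominating_set(n, edges, D):
                return set(D)
    return None


if __name__ == "__main__":
    path4 = [(0, 1), (1, 2), (2, 3)]
    cycle4 = path4 + [(3, 0)]
    print(find_efficient_dominating_set(4, path4))   # {0, 3}
    print(find_efficient_dominating_set(4, cycle4))  # None
```

Not every graph has an efficient dominating set: in the 4-cycle every closed neighborhood has three vertices, so no disjoint family of them can cover exactly four.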
This paper generalizes the parallel selected inversion algorithm PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix $A$ has been decomposed as $PAQ = LU$ on a distributed-memory parallel machine, where $L$ and $U$ are lower and upper triangular matrices, respectively, and $P$ and $Q$ are permutation matrices. The PSelInv method computes selected elements of $A^{-1}$. The selection is confined by the sparsity pattern of the matrix $A^T$. Our algorithm does not assume any symmetry properties of $A$, and our parallel implementation is memory efficient, in the sense that the computed elements of $A^{-T}$ overwrite the sparse matrix $LU$ in situ. PSelInv involves a large number of collective data communication activities within different processor groups of various sizes. In order to minimize idle time and improve load balancing, tree-based asynchronous communication is used to coordinate all such collective communication. Numerical results demonstrate that PSelInv can scale efficiently to 6,400 cores for a variety of matrices. © 2017 Elsevier B.V. All rights reserved.
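The recurrence underlying selected inversion can be derived from $A = LU$ with $L$ unit lower triangular. For $X = A^{-1}$ we have $XL = U^{-1}$ and $UX = L^{-1}$, and comparing entries outside the triangular parts gives, for $k$ running backward from the last index, $X_{i,k} = -\sum_{j>k} X_{i,j} L_{j,k}$ for $i > k$, $X_{k,j} = -U_{k,k}^{-1} \sum_{i>k} U_{k,i} X_{i,j}$ for $j > k$, and $X_{k,k} = U_{k,k}^{-1}\bigl(1 - \sum_{i>k} U_{k,i} X_{i,k}\bigr)$. The dense, sequential sketch below assumes $P = Q = I$ (no pivoting, nonzero pivots) and computes every entry of $X$; a selected-inversion code evaluates the same kind of recurrence only at the positions dictated by the sparsity of the factors, distributed across processes. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def lu_nopivot(A: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Doolittle LU without pivoting: A = L U with L unit lower triangular."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def inverse_from_lu(L: Matrix, U: Matrix) -> Matrix:
    """Backward recurrence for X = A^{-1}; uses only U[k][k] and entries above/below the diagonal."""
    n = len(U)
    X = [[0.0] * n for _ in range(n)]
    for k in range(n - 1, -1, -1):
        rest = range(k + 1, n)
        for i in rest:  # column k below the diagonal, from X L = U^{-1}
            X[i][k] = -sum(X[i][j] * L[j][k] for j in rest)
        for j in rest:  # row k right of the diagonal, from U X = L^{-1}
            X[k][j] = -sum(U[k][i] * X[i][j] for i in rest) / U[k][k]
        X[k][k] = (1.0 - sum(U[k][i] * X[i][k] for i in rest)) / U[k][k]
    return X


if __name__ == "__main__":
    A = [[4.0, 7.0], [2.0, 6.0]]
    print(inverse_from_lu(*lu_nopivot(A)))  # approximately [[0.6, -0.7], [-0.2, 0.4]]
```

Each step only reads entries of $X$ with both indices greater than $k$, which is why the computation can proceed from the bottom-right corner upward and, in the sparse case, be restricted to a subset of positions.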
In this paper, we consider a recursive estimation problem for linear regression where the signal to be estimated admits a sparse representation and measurement samples are only sequentially available. We propose a convergent parallel estimation scheme that consists of approximately solving a sequence of $\ell_1$-regularized least-squares problems. The proposed scheme is novel in three aspects: 1) all elements of the unknown vector variable are updated in parallel at each time instant, and the convergence speed is much faster than that of state-of-the-art schemes, which update the elements sequentially; 2) both the update direction and the stepsize of each element have simple closed-form expressions, so the algorithm is suitable for online (real-time) implementation; and 3) the stepsize is designed to accelerate convergence, yet it does not suffer from the common intricacy of parameter tuning. Both centralized and distributed implementation schemes are discussed. The attractive features of the proposed algorithm are also illustrated numerically.
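The flavor of such a scheme can be sketched on a batch (non-recursive) problem $\min_x \tfrac12\|Ax - y\|^2 + \lambda\|x\|_1$. In the sketch, every coordinate computes its exact best response (a soft-thresholding step) from the same iterate, so all coordinates could be updated in parallel, and the stepsize $\gamma \in [0, 1]$ exactly minimizes a convex upper bound of the objective along the update direction, which has a closed form. This is our assumption of what "closed-form direction and stepsize" can look like, not the authors' recursive algorithm; the function names are hypothetical, and every column of $A$ is assumed to be nonzero.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def soft_threshold(z: float, t: float) -> float:
    return math.copysign(max(abs(z) - t, 0.0), z)


def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def objective(A: Matrix, y: Sequence[float], lam: float, x: Sequence[float]) -> float:
    r = [ax - yi for ax, yi in zip(matvec(A, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def parallel_lasso(A: Matrix, y: Sequence[float], lam: float, iters: int = 100) -> List[float]:
    m, n = len(A), len(A[0])
    col_sq = [sum(A[i][k] ** 2 for i in range(m)) for k in range(n)]  # (A^T A)_kk, assumed > 0
    x = [0.0] * n
    for _ in range(iters):
        r = [ax - yi for ax, yi in zip(matvec(A, x), y)]
        grad = [sum(A[i][k] * r[i] for i in range(m)) for k in range(n)]
        # Exact best response of each coordinate given the current x: independent, hence parallel.
        bx = [soft_threshold(x[k] - grad[k] / col_sq[k], lam / col_sq[k]) for k in range(n)]
        d = [b - xk for b, xk in zip(bx, x)]
        Ad = matvec(A, d)
        curvature = sum(v * v for v in Ad)
        if curvature == 0.0:  # d is zero or lies in the null space of A; this sketch just stops
            break
        # Minimizer of f(x + g d) + lam((1 - g)|x|_1 + g|bx|_1) over g, clipped to [0, 1].
        slope = sum(g * dk for g, dk in zip(grad, d)) + lam * (
            sum(abs(v) for v in bx) - sum(abs(v) for v in x))
        gamma = min(max(-slope / curvature, 0.0), 1.0)
        x = [xk + gamma * dk for xk, dk in zip(x, d)]
    return x


if __name__ == "__main__":
    A = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.1]]
    y = [1.0, 2.0, 0.5]
    x = parallel_lasso(A, y, lam=0.1)
    print(x, objective(A, y, 0.1, x))
```

Because the upper bound coincides with the objective at $\gamma = 0$, each step can only decrease the objective, and no stepsize parameter has to be tuned.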
Data flow acyclic directed graphs (digraphs) are widely used to describe the data dependencies of mesh-based scientific computing. The parallel execution of such digraphs approximately depicts the flowchart of parallel computing. During parallel execution, vertex priorities are key performance factors. This paper first takes the distributed digraph and its resource-constrained parallel scheduling as the vertex priorities model, and then presents a new parallel algorithm for computing vertex priorities using the well-known technique of forward-backward iterations. In particular, a more efficient vertex ranking strategy is proposed for each iteration. For simple digraphs, both theoretical analysis and benchmarks show that the vertex priorities produced by this algorithm make the digraph scheduling time converge non-increasingly as the number of iterations grows. For non-simple digraphs, benchmarks also show that the new algorithm is superior to many traditional approaches. When the new algorithm is embedded into the heuristic framework for the parallel sweeping solution of neutron transport applications, the new vertex priorities improve performance by about 20% as the number of processors scales up from 32 to 2048.
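The forward-backward idea can be sketched on a toy model: a DAG whose vertices have durations, scheduled greedily on `p` identical processors in priority order (list scheduling). The first forward pass uses critical-path (bottom-level) priorities; each backward pass schedules the reversed graph using the forward finish times as priorities, and its finish times become the priorities of the next forward pass. A schedule of the reversed graph, mirrored in time, is a valid schedule of the original DAG, so both passes yield candidate makespans. This is the generic technique under our own simplifications (no communication costs, one shared processor pool), not the paper's vertex ranking strategy; all names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Graph = Sequence[Sequence[int]]  # succ[u] = successors of vertex u


def list_schedule(succ: Graph, dur: Sequence[float], prio: Sequence[float], p: int) -> Tuple[List[float], float]:
    """Greedy list scheduling on p processors; among ready vertices, higher prio starts first."""
    n = len(dur)
    indeg = [0] * n
    for u in range(n):
        for v in succ[u]:
            indeg[v] += 1
    ready = [(-prio[v], v) for v in range(n) if indeg[v] == 0]
    heapq.heapify(ready)
    running: List[Tuple[float, int]] = []  # min-heap of (finish time, vertex)
    finish = [0.0] * n
    t, done = 0.0, 0
    while done < n:
        while ready and len(running) < p:  # fill idle processors at time t
            _, v = heapq.heappop(ready)
            finish[v] = t + dur[v]
            heapq.heappush(running, (finish[v], v))
        t, v = heapq.heappop(running)  # advance to the next completion
        done += 1
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                heapq.heappush(ready, (-prio[w], w))
    return finish, max(finish)


def bottom_levels(succ: Graph, dur: Sequence[float]) -> List[float]:
    """Longest path (sum of durations) from each vertex to a sink."""
    n = len(dur)
    indeg = [0] * n
    for u in range(n):
        for v in succ[u]:
            indeg[v] += 1
    order = [v for v in range(n) if indeg[v] == 0]
    for u in order:  # Kahn's algorithm; the loop also visits vertices appended below
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    level = [0.0] * n
    for u in reversed(order):
        level[u] = dur[u] + max((level[v] for v in succ[u]), default=0.0)
    return level


def forward_backward(succ: Graph, dur: Sequence[float], p: int, iterations: int = 5) -> float:
    pred: List[List[int]] = [[] for _ in succ]
    for u, vs in enumerate(succ):
        for v in vs:
            pred[v].append(u)
    prio = bottom_levels(succ, dur)
    best = math.inf
    for _ in range(iterations):
        fin, makespan = list_schedule(succ, dur, prio, p)
        best = min(best, makespan)
        # Backward pass on the reversed DAG: vertices that finished late go first.
        fin_b, makespan_b = list_schedule(pred, dur, fin, p)
        best = min(best, makespan_b)  # mirrored in time, this is a valid forward schedule
        prio = fin_b  # late finishers of the backward pass start early in the next forward pass
    return best


if __name__ == "__main__":
    diamond = [[1, 2], [3], [3], []]
    print(forward_backward(diamond, [1.0, 2.0, 2.0, 1.0], p=2))  # 4.0
```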
The dynamic lot-sizing (DLS) model is one of the most frequently used models in production and inventory systems, because lot-sizing decisions can greatly affect system performance. The practicality of DLS algorithms is hindered by the large amount of computing resources required to solve these models, even for modest problems. This study develops a parallel algorithm to solve the lot-sizing problem efficiently. Given that $n$ is the size of the problem, the complexity of the proposed parallel algorithm is $O(n^2 p)$ with $p$ processors. Numerical experiments are provided to verify the complexity of the proposed algorithm. The empirical results demonstrate that the speedup of this parallel algorithm is close to linear, which means that the proposed algorithm can take full advantage of distributed computing power as the size of the problem increases. © 2001 Elsevier Science Ltd. All rights reserved.
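As a point of reference, the classic uncapacitated dynamic lot-sizing model can be solved sequentially by the Wagner-Whitin recurrence $F(t) = \min_{1 \le j \le t} \bigl\{F(j-1) + K + h \sum_{i=j}^{t} (i-j)\,d_i\bigr\}$ with $F(0) = 0$, where $d_i$ is the demand in period $i$, $K$ the setup cost and $h$ the holding cost per unit per period. The abstract does not state that the paper uses exactly this model, so this is an assumption, and the function name is hypothetical. The $O(n^2)$ sketch accumulates holding costs incrementally; for a fixed $t$ the candidates inside the minimum are independent of one another and could be evaluated in parallel.

```python
import math
from typing import Sequence


def wagner_whitin(demand: Sequence[float], setup: float, hold: float) -> float:
    """Minimum total setup + holding cost for periods 1..n (no backlogging, no capacity limit)."""
    n = len(demand)
    F = [0.0] + [math.inf] * n  # F[t] = optimal cost of covering periods 1..t
    for t in range(1, n + 1):
        holding = 0.0  # holding cost if period j produces the demands of periods j..t
        batch = 0.0    # total demand of periods j..t
        for j in range(t, 0, -1):
            batch += demand[j - 1]
            F[t] = min(F[t], F[j - 1] + setup + holding)
            # Producing one period earlier holds the whole batch j..t one more period.
            holding += hold * batch
    return F[n]


if __name__ == "__main__":
    print(wagner_whitin([10, 10], setup=50, hold=1))     # 60.0: one setup, hold 10 units once
    print(wagner_whitin([10, 0, 10], setup=5, hold=1))   # 10.0: two cheap setups
```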
Seismic interferometry is a technique for extracting deterministic signals (i.e., ambient-noise Green's functions) from recordings of ambient-noise wavefields through cross-correlation and other related signal processing techniques. The extracted ambient-noise Green's functions can be used in ambient-noise tomography to construct seismic structure models of the Earth's interior. The amount of computation involved in the seismic interferometry procedure can be significant, especially for ambient-noise datasets collected by large seismic sensor arrays (i.e., "large-N" data). We present an efficient parallel algorithm, named pSIN (parallel Seismic INterferometry), for solving seismic interferometry problems on conventional distributed-memory computer clusters. The design of the algorithm is based on a two-dimensional partition of the ambient-noise data recorded by a seismic sensor array. We pay special attention to balancing the computational load, inter-process communication overhead and memory usage across all MPI processes, and we minimize the total number of I/O operations. We tested the algorithm on a real ambient-noise dataset and obtained significant savings in processing time. Scaling tests show excellent strong scalability from 80 cores to over 2000 cores. © 2016 Elsevier Ltd. All rights reserved.
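The core computation can be sketched as a time-domain cross-correlation of two station records, plus one possible way to split the station-pair workload: the $N \times N$ pair matrix is tiled into a $q \times q$ block grid and the upper-triangular tiles are handed to MPI ranks round-robin. The abstract does not describe pSIN's partition in detail, so this decomposition, the direct (non-FFT) correlation and all function names are assumptions for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def cross_correlate(a: Sequence[float], b: Sequence[float], max_lag: int) -> Dict[int, float]:
    """c[k] = sum_n a[n] * b[n + k] for k = -max_lag..max_lag (direct, unnormalized)."""
    return {
        k: sum(a[i] * b[i + k] for i in range(max(0, -k), min(len(a), len(b) - k)))
        for k in range(-max_lag, max_lag + 1)
    }


def block_bounds(n: int, q: int) -> List[Tuple[int, int]]:
    """Split range(n) into q contiguous, nearly equal blocks."""
    base, extra = divmod(n, q)
    bounds, start = [], 0
    for blk in range(q):
        end = start + base + (1 if blk < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def assign_station_pairs(n_stations: int, q: int, n_ranks: int) -> Dict[int, List[Tuple[int, int]]]:
    """Give each upper-triangular tile (r <= c) of the pair matrix to a rank, round-robin."""
    bounds = block_bounds(n_stations, q)
    tiles = [(r, c) for r in range(q) for c in range(r, q)]
    work: Dict[int, List[Tuple[int, int]]] = {rank: [] for rank in range(n_ranks)}
    for idx, (r, c) in enumerate(tiles):
        (r0, r1), (c0, c1) = bounds[r], bounds[c]
        work[idx % n_ranks].extend(
            (i, j) for i in range(r0, r1) for j in range(c0, c1) if i < j)
    return work


if __name__ == "__main__":
    a = [0, 0, 1, 2, 1, 0, 0, 0]
    b = [0, 0, 0, 0, 1, 2, 1, 0]  # a delayed by 2 samples
    c = cross_correlate(a, b, 3)
    print(max(c, key=c.get))  # 2
    print({rank: len(pairs) for rank, pairs in assign_station_pairs(10, 3, 4).items()})
```

The correlation peaks at the lag equal to the delay between the two records, and every unordered station pair lands in exactly one tile, so no correlation is computed twice.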