This paper proposes a complete binary tree topology and two efficient migration methods for fine-grained parallel evolutionary algorithms (FGPEAs) to solve constrained numerical optimization problems. Designing an effective evolutionary algorithm (EA) requires a proper balance between exploration and exploitation. This balance can be controlled by the spread rate and the migration of the best individuals. A complete binary tree topology, which slows down the spread rate, is used to favor exploration on heavily constrained problems. Two migration methods are also employed to prevent a superior individual from taking over almost all the subpopulations and to improve the chance of a global search. One restricts migration according to the migration times, and the other modifies the migrating individual with mutation operators. The simulation results indicate that the FGPEA using the proposed migration methods performs better on constrained numerical optimization problems, and that the FGPEA with the tree topology and the proposed migration methods performs well on heavily constrained numerical optimization problems.
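To make the topology idea concrete, here is a minimal sketch of how neighbours in a complete binary tree could be enumerated and how many migration hops an individual would need to reach every subpopulation. The abstract does not say how the nodes are numbered, so the heap-style indexing (root at 0, children at 2i+1 and 2i+2) and the function names `tree_neighbors` and `hops_to_reach_all` are assumptions made for illustration only, not the authors' implementation.

```python
from collections import deque


def tree_neighbors(i, n):
    """Parent and children of node i in a complete binary tree of n nodes.

    Assumes heap-style numbering (hypothetical; not taken from the paper).
    """
    if not 0 <= i < n:
        raise ValueError("node index out of range")
    neighbors = []
    if i > 0:
        neighbors.append((i - 1) // 2)  # parent
    for child in (2 * i + 1, 2 * i + 2):
        if child < n:
            neighbors.append(child)
    return neighbors


def hops_to_reach_all(n, start=0):
    """Breadth-first search: hops needed for an individual that starts at
    `start` to reach the farthest node when it moves one edge per step."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in tree_neighbors(node, n):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())
```

With seven nodes, an individual starting at the root needs two hops to reach every node, while one starting at a leaf needs four, because leaves have only a single neighbour to migrate to.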
ISBN: (print) 0769515738
The B-tree is a fundamental data structure that is used to access and update a large number of keys. In this paper we present a parallel algorithm on the EREW PRAM that deletes keys in a B-tree. Our algorithm runs in O(t(log k + log_t n)) time with k processors, where n is the number of keys in the B-tree, t is the minimum degree of the B-tree, and k is the number of unsorted keys to delete, and it improves upon the previous algorithm by a factor of t.
ISBN: (print) 0769505892
Since parallel computers differ in their ratios of computation to communication performance, the optimal computational block size that maximizes an algorithm's performance differs from one machine to another. A block size that is too small or too large makes good performance on a machine nearly impossible. In such a case, getting better performance may require a complete redistribution of the data matrix. We present PoLAPACK factorization routines, including LU, QR, and Cholesky factorizations, with "algorithmic blocking" on a 2-dimensional block cyclic data distribution. With algorithmic blocking, it is possible to obtain near-optimal performance irrespective of the physical block size. The routines are implemented on the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines.
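For readers unfamiliar with the 2-dimensional block cyclic distribution mentioned above, the following sketch shows the standard index mapping from a global matrix entry to its owning process and local position. It assumes zero-based indices and that the first block lives on process (0, 0); the function names `owner` and `local_index` are hypothetical and are not PoLAPACK or ScaLAPACK routines.

```python
def owner(i, j, mb, nb, p_rows, p_cols):
    """Grid coordinates of the process that owns global entry (i, j).

    mb x nb is the physical block size; p_rows x p_cols is the process grid.
    Blocks are dealt out cyclically along each dimension.
    """
    return (i // mb) % p_rows, (j // nb) % p_cols


def local_index(g, block, nprocs):
    """Local index, on the owning process, of global index g along one
    dimension distributed with the given block size over nprocs processes."""
    full_cycles = g // (block * nprocs)  # complete rounds of dealing blocks
    return full_cycles * block + g % block
```

For example, with 2 x 2 blocks on a 2 x 3 grid, entry (2, 4) belongs to process (1, 2), and global row 5 on a two-process dimension is stored as local row 3 of its owner. The physical block size fixes this layout, which is why changing it normally means redistributing the matrix.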
In this paper, the parallelization aspects of accelerated waveform relaxation algorithms for the transient simulation of semiconductor devices on parallel distributed memory computers are studied. These methods are competitive with standard pointwise methods on serial architectures, but are significantly faster on parallel computers. We make use of an improved parallel version of the conjugate gradient squared method (ICGS), combining elements of numerical stability and parallel algorithm design, to solve the resulting sequence of time-varying sparse linear differential-algebraic initial-value problems arising at each linearization step with waveform Newton. We reorganize the algorithm such that all the inner products, matrix-vector multiplications and vector updates of a single iteration step are independent, and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. Therefore, the performance bottleneck, namely the cost of global communication on parallel distributed memory computers, can be significantly reduced. The resulting ICGS algorithm maintains the favorable properties of the original algorithm while not increasing the computational costs.
ISBN: (print) 0769515126
Biological sequence comparison is an important tool for researchers in molecular biology. There are several algorithms for sequence comparison. The Smith-Waterman algorithm, based on dynamic programming, is one of the most fundamental algorithms in bioinformatics. However, the existing parallel Smith-Waterman algorithm needs a large amount of memory. As biological sequence data expand rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. To resolve this problem, we develop a new parallel Smith-Waterman algorithm based on divide and conquer, named PSW-DC. The memory required by the new parallel algorithm is reduced significantly in comparison with existing ones. A key technique, named the C&E method, is developed for the implementation of the new parallel Smith-Waterman algorithm.
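As background for the memory issue, the sketch below computes the best Smith-Waterman local alignment score while keeping only two rows of the dynamic programming table. This is the textbook serial recurrence with a linear gap penalty, not PSW-DC or the C&E method, whose details the abstract does not give. The function name `sw_score` and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def sw_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b.

    Keeps only the previous and current rows, so memory grows with len(b)
    rather than len(a) * len(b). Scoring parameters are illustrative.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

The two-row form returns only the score. Recovering the alignment itself within a small memory budget is where divide-and-conquer strategies typically come in.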
Authors: J. Saif, H. Krawczyk (Electronics, Telecommunication and Informatics Faculty, Gdansk University of Technology, Gdansk, Poland)
The region of interest (ROI) matching problem is defined and its application to endoscopic diagnosis is shown. Two kinds of matching procedures are considered: random search and simulated annealing. Corresponding sequential and parallel algorithms are proposed and their suitability for ROI identification is discussed.
An efficient parallel algorithm for forward dynamics computation of human figures is proposed. The algorithm is capable of handling any kinematic chain, including structure-varying ones. The asymptotic complexity of the algorithm is O(N) in serial computation and O(log N) in parallel computation on O(N) processors for most practical kinematic chains. The idea is to assemble a kinematic chain by adding the joints one by one and to compute the constraint forces at the new joints using the principle of virtual work. The parallelism of the algorithm can be adapted to parallel processing systems with any number of processors by simply changing the assembly order. Simulation examples on an 8-node cluster demonstrate the effectiveness of the algorithm.
This paper develops a new parallel algorithm for computing the inverse of a banded matrix when extended in its maximum entropy sense. The algorithm developed here computes the inverse in two parallel steps. The first parallel step uses a modified Schur's complement technique to compute the individual inverses in each of the block matrices in parallel. The second parallel step then adds the overlapped sub-blocks inside the band. The parallel time complexity of our algorithm is O(bw^3), requiring n/((bw-1)/2)-1 processors, where the matrix is of size n × n with a bandwidth of bw. The parallel time required is independent of the size of the matrix and depends only on the bandwidth of the matrix if n/((bw-1)/2)-1 processors are employed. We also provide a multithreaded implementation of the algorithm for use on SMP machines so that the algorithm can be used without requiring n/((bw-1)/2)-1 processors. Even in the serial implementation, the algorithm developed here is considerably better than existing serial algorithms for computing the banded inverse in the maximum entropy sense.
ISBN: (print) 9780769515243
One of the outstanding challenges of computational science and engineering is large-scale nonlinear parameter estimation of systems governed by partial differential equations. These are known as inverse problems, in contradistinction to the forward problems that usually characterize large-scale simulation. Inverse problems are significantly more difficult to solve than forward problems, due to ill-posedness, large dense ill-conditioned operators, multiple minima, space-time coupling, and the need to solve the forward problem repeatedly. We present a parallel algorithm for inverse problems governed by time-dependent PDEs, and scalability results for an inverse wave propagation problem of determining the material field of an acoustic medium. The difficulties mentioned above are addressed through a combination of total variation regularization, preconditioned matrix-free Gauss-Newton-Krylov iteration, algorithmic checkpointing, and multiscale continuation. We are able to solve a synthetic inverse wave propagation problem through a pelvic bone geometry involving 2.1 million inversion parameters in 3 hours on 256 processors of the Terascale Computing System at the Pittsburgh Supercomputing Center.