We identify the class of optimization problems expressible as independence systems that can be solved in real time on a parallel machine with polynomially bounded resources: it is exactly the class of matroids for which the size of the optimal solution can be computed in parallel real time. We also extend previous results by showing that the solution obtained by a parallel algorithm is arbitrarily better than the solution reported by a sequential one for problems beyond the real-time minimum-weight spanning tree problem, for which this was previously known. Indeed, we show that, for all practical purposes, this property holds for any optimization problem in the aforementioned class.
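For matroids, the greedy algorithm is known to return an optimal solution, and the minimum-weight spanning tree mentioned above is the classic instance (the graphic matroid). The following is a minimal sequential sketch for orientation only, not the parallel real-time algorithm of the paper; the names `matroid_greedy` and `is_forest` are illustrative.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
Edge = Tuple[int, int, int]  # (u, v, weight)


def matroid_greedy(elements: Iterable[T],
                   weight: Callable[[T], float],
                   independent: Callable[[List[T]], bool]) -> List[T]:
    """Generic matroid greedy: scan elements by increasing weight and keep an
    element whenever the chosen set stays independent."""
    chosen: List[T] = []
    for e in sorted(elements, key=weight):
        if independent(chosen + [e]):
            chosen.append(e)
    return chosen


def is_forest(edges: List[Edge]) -> bool:
    """Independence oracle of the graphic matroid: a set of edges is
    independent iff it contains no cycle (checked with union-find)."""
    parent = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # u and v already connected: this edge closes a cycle
        parent[ru] = rv
    return True


edges = [(0, 1, 4), (1, 2, 1), (0, 2, 3), (2, 3, 2)]
mst = matroid_greedy(edges, weight=lambda e: e[2], independent=is_forest)
print(mst)  # [(1, 2, 1), (2, 3, 2), (0, 2, 3)]
```

With `is_forest` as the oracle this reduces to Kruskal's algorithm; swapping in another matroid's independence test reuses the same greedy loop.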
We deal with the problem of analyzing the fault susceptibility of a parallel algorithm designed for a multiprocessor array (MIMD structure). This algorithm implements a fairly complex communication protocol. We present an original analysis methodology based on a software-implemented fault injector. The algorithm under study is modeled as a multithreaded application. The experimental setup and results are presented and discussed. The experiments demonstrated relatively high natural robustness of the analyzed algorithm and revealed further possibilities for improving it.
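A software-implemented fault injector typically emulates hardware faults by corrupting program state and comparing the outcome with a fault-free run. The sketch below assumes a single bit-flip fault model on input data and a toy `workload` function; `flip_bit`, `workload`, `inject`, and the outcome labels are hypothetical stand-ins, not the injector or MIMD algorithm studied in the paper.

```python
import math
import struct
from typing import List


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of a float's 64-bit IEEE-754 encoding."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    raw ^= 1 << bit
    (faulty,) = struct.unpack("<d", struct.pack("<Q", raw))
    return faulty


def workload(data: List[float]) -> float:
    """Hypothetical stand-in for the application under test: a plain sum."""
    return sum(data)


def inject(data: List[float], index: int, bit: int) -> str:
    """Corrupt one input word with a single bit flip and classify the outcome
    against a fault-free ("golden") run."""
    golden = workload(data)
    corrupted = list(data)
    corrupted[index] = flip_bit(corrupted[index], bit)
    result = workload(corrupted)
    if result == golden:
        return "masked"
    if math.isnan(result) or math.isinf(result):
        return "detected (non-finite)"
    return "silent data corruption"


print(inject([1e20, 1.0], index=1, bit=0))   # masked: the flipped bit is lost in rounding
print(inject([1e20, 1.0], index=1, bit=62))  # detected (non-finite): 1.0 becomes inf
print(inject([2.0, 1.0], index=1, bit=51))   # silent data corruption: 1.0 becomes 1.5
```

A real campaign would repeat `inject` over many randomly chosen locations and bits and report the fraction of each outcome class as a robustness measure.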
We report the development of an SPMD parallel application that computes the macroscopic thermal dispersion in porous media. The performance of SPMD programs is strongly affected by dynamic load-imbalance factors, and a suitable load balancing algorithm is essential to overcome their effects. We developed nine versions of the SPMD application, each adopting a different load balancing strategy. The main contribution of this work is the performance evaluation and comparison of these nine versions. The experimental results show the importance of choosing a load balancing strategy suited to the characteristics of this scientific parallel application.
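The abstract does not list the nine strategies, so the sketch below only illustrates the general effect of load imbalance with two textbook strategies, static block partitioning and dynamic self-scheduling, simulated on hypothetical task costs. The function names are illustrative.

```python
import heapq
from typing import List


def static_block_makespan(costs: List[float], workers: int) -> float:
    """Static partitioning: contiguous blocks with equal task counts."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def self_scheduling_makespan(costs: List[float], workers: int) -> float:
    """Dynamic self-scheduling: an idle worker takes the next task in the queue."""
    finish = [0.0] * workers  # min-heap of the time at which each worker becomes idle
    for c in costs:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + c)
    return max(finish)


costs = [8, 8, 1, 1, 1, 1, 1, 1]  # hypothetical, unevenly sized work units
print(static_block_makespan(costs, 2))     # 18
print(self_scheduling_makespan(costs, 2))  # 11.0
```

Here the static split puts both expensive units on one worker, while self-scheduling reaches the ideal makespan of half the total work; which strategy wins in practice depends on how the imbalance arises in the application.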
We propose an improved version of the conjugate gradient squared (CGS) method for solving large, sparse linear systems with unsymmetric coefficient matrices. The proposed method combines elements of numerical stability and parallel algorithm design without increasing computational cost. The algorithm is derived so that all matrix-vector multiplications, inner products, and vector updates of a single iteration step are independent, and the communication time required for the inner products can be efficiently overlapped with the computation time of the vector updates. As a result, the cost of global communication, which is the performance bottleneck, can be significantly reduced. In this paper, the bulk synchronous parallel (BSP) model is used to design a fully efficient, scalable, and portable parallel implementation of the proposed algorithm and to provide accurate performance predictions for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by Ethernet. This performance model uses only a few system-dependent parameters, based on simple and accurate cost modeling, to provide useful insight into the time complexity of the method. The theoretical performance predictions are compared with preliminary measured timings of a numerical application from ocean flow simulation.
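For reference, the standard unpreconditioned CGS iteration (Sonneveld) that the improved method starts from can be sketched in plain Python. This is not the reorganized variant proposed in the paper: here each of the two inner products per step depends on the vector update just before it, and in a distributed implementation each is a global reduction, which is the communication the proposed method overlaps. The helper names are illustrative.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(A: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in A]


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def cgs(A: Matrix, b: Vector, x0: Optional[Vector] = None,
        tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Unpreconditioned conjugate gradient squared for a general (unsymmetric) A."""
    x = list(x0) if x0 is not None else [0.0] * len(b)
    r = axpy(-1.0, matvec(A, x), b)          # r = b - A x
    r_shadow = list(r)                       # fixed shadow residual r~
    norm_b = math.sqrt(dot(b, b)) or 1.0
    if math.sqrt(dot(r, r)) <= tol * norm_b:
        return x
    rho_prev: Optional[float] = None
    u: Vector = []
    p: Vector = []
    q: Vector = []
    for _ in range(max_iter):
        rho = dot(r_shadow, r)               # inner product 1 (global reduction)
        if rho == 0.0:
            raise ArithmeticError("CGS breakdown: rho == 0")
        if rho_prev is None:
            u, p = list(r), list(r)
        else:
            beta = rho / rho_prev
            u = axpy(beta, q, r)                  # u = r + beta q
            p = axpy(beta, axpy(beta, p, q), u)   # p = u + beta (q + beta p)
        v = matvec(A, p)
        sigma = dot(r_shadow, v)             # inner product 2 (global reduction)
        if sigma == 0.0:
            raise ArithmeticError("CGS breakdown: (r~, A p) == 0")
        alpha = rho / sigma
        q = axpy(-alpha, v, u)               # q = u - alpha A p
        w = [ui + qi for ui, qi in zip(u, q)]  # u + q
        x = axpy(alpha, w, x)                # x += alpha (u + q)
        r = axpy(-alpha, matvec(A, w), r)    # r -= alpha A (u + q)
        rho_prev = rho
        if math.sqrt(dot(r, r)) <= tol * norm_b:
            return x
    raise RuntimeError("CGS did not converge within max_iter iterations")


x = cgs([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
print(x)  # approximately [0.1, 0.6]
```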
This paper proposes a complete binary tree topology and two efficient migration methods for fine-grained parallel evolutionary algorithms (FGPEAs) that solve constrained numerical optimization problems. Designing an effective evolutionary algorithm (EA) requires a proper balance between exploration and exploitation. This balance can be controlled by the spread rate and by the migration of the best individuals. A complete binary tree topology, which slows down the spread rate, is used for exploration when solving heavily constrained problems. Two migration methods are also employed to prevent a superior individual from taking over almost all the subpopulations and to facilitate global search. The first restricts migration according to the number of migrations; the second modifies migrating individuals with mutation operators. The simulation results indicate that the FGPEA using the proposed migration methods performs better on constrained numerical optimization problems, and that the FGPEA combining the tree topology with the proposed migration methods performs well on heavily constrained numerical optimization problems.
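The tree topology and the two migration rules can be pictured with a small sketch. It assumes subpopulations stored in heap order (node i has parent (i-1)//2 and children 2i+1, 2i+2), a per-individual migration counter as the restriction rule, and Gaussian mutation applied to migrants; `Individual`, `tree_neighbors`, `migrate`, and all parameters are hypothetical, not the paper's exact operators.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Individual:
    genes: List[float]
    fitness: float        # lower is better, e.g. a penalised objective
    migrations: int = 0   # how many times this lineage has migrated


def tree_neighbors(i: int, n: int) -> List[int]:
    """Neighbours of node i in a complete binary tree stored in heap order (root 0)."""
    nbrs = [(i - 1) // 2] if i > 0 else []
    nbrs += [c for c in (2 * i + 1, 2 * i + 2) if c < n]
    return nbrs


def migrate(pops: List[List[Individual]],
            fitness_fn: Callable[[List[float]], float],
            max_migrations: int = 3,
            sigma: float = 0.1,
            rng: Optional[random.Random] = None) -> None:
    """One synchronous migration step: each node sends a mutated copy of its best
    individual to its tree neighbours, replacing their worst individual if fitter."""
    if rng is None:
        rng = random.Random(0)
    # emigrants are chosen before any replacement, so the step is synchronous
    emigrants = [min(pop, key=lambda ind: ind.fitness) for pop in pops]
    for i, best in enumerate(emigrants):
        if best.migrations >= max_migrations:
            continue  # restriction rule: this lineage has migrated too often
        for j in tree_neighbors(i, len(pops)):
            genes = [g + rng.gauss(0.0, sigma) for g in best.genes]  # mutate on migration
            migrant = Individual(genes, fitness_fn(genes), best.migrations + 1)
            worst = max(range(len(pops[j])), key=lambda k: pops[j][k].fitness)
            if migrant.fitness < pops[j][worst].fitness:
                pops[j][worst] = migrant


def sphere(g: List[float]) -> float:
    return sum(x * x for x in g)


pops = [[Individual([v], sphere([v])) for v in vals]
        for vals in ([0.0, 5.0], [3.0, 4.0], [2.0, 6.0])]
migrate(pops, sphere, sigma=0.0)  # sigma=0 disables mutation so the result is deterministic
print([sorted(ind.genes[0] for ind in pop) for pop in pops])
# [[0.0, 2.0], [0.0, 3.0], [0.0, 2.0]]
```

Because each node talks only to its parent and children, a good individual needs several generations to reach distant leaves, which is the slower spread rate the abstract relies on for exploration.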
ISBN (print): 0769515738
The B-tree is a fundamental data structure that is used to access and update a large number of keys. In this paper we present a parallel algorithm on the EREW PRAM that deletes keys from a B-tree. Our algorithm runs in $O(t(\log k + \log_t n))$ time with $k$ processors, where $n$ is the number of keys in the B-tree, $t$ is the minimum degree of the B-tree, and $k$ is the number of unsorted keys to delete, and it improves upon the previous algorithm by a factor of $t$.
ISBN (print): 0769505892
Because parallel computers differ in their ratios of computation to communication performance, the computational block size that yields an algorithm's maximum performance differs from one machine to another. With a block size that is too small or too large, good performance on a given machine is nearly impossible, and obtaining better performance may require a complete redistribution of the data matrix. We present PoLAPACK factorization routines, including LU, QR, and Cholesky factorizations, that use "algorithmic blocking" on a 2-dimensional block cyclic data distribution. With algorithmic blocking, near-optimal performance can be obtained irrespective of the physical block size. The routines are implemented on the SGI/Cray T3E and compared with the corresponding ScaLAPACK factorization routines.
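The distinction between the physical block size and the algorithmic block size can be seen from the usual 2-dimensional block cyclic index mapping. The sketch below assumes square nb x nb physical blocks, a P x Q process grid, and distribution starting at process (0, 0); the function names are illustrative and this is not PoLAPACK code.

```python
from typing import List, Tuple


def owner(i: int, j: int, nb: int, P: int, Q: int) -> Tuple[int, int]:
    """Process-grid coordinates (p, q) owning global entry (i, j) when the
    matrix is dealt out in nb x nb blocks cyclically over a P x Q grid."""
    return (i // nb) % P, (j // nb) % Q


def local_index(g: int, nb: int, nprocs: int) -> int:
    """Local row (or column) index of global index g on the process that owns it."""
    return (g // (nb * nprocs)) * nb + g % nb


def panel_process_columns(k: int, width: int, n: int, nb: int, Q: int) -> List[int]:
    """Process columns holding columns k .. k+width-1 of an algorithmic panel.
    The panel width is chosen by the algorithm, independently of nb."""
    return sorted({(j // nb) % Q for j in range(k, min(k + width, n))})


print(owner(3, 4, nb=2, P=2, Q=3))                      # (1, 2)
print(local_index(5, nb=2, nprocs=2))                   # 3
print(panel_process_columns(0, 4, 12, nb=1, Q=3))       # [0, 1, 2]
print(panel_process_columns(0, 4, 12, nb=4, Q=3))       # [0]
```

With nb = 1 a width-4 panel is spread over all three process columns, while with nb = 4 it lives on one; a routine that picks its panel width algorithmically must handle both cases without redistributing the matrix.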
In this paper, we study the parallelization of accelerated waveform relaxation algorithms for the transient simulation of semiconductor devices on distributed-memory parallel computers. These methods are competitive with standard pointwise methods on serial architectures but are significantly faster on parallel computers. To solve the sequence of time-varying sparse linear differential-algebraic initial-value problems that arises at each linearization step of waveform Newton, we use an improved parallel version of the conjugate gradient squared method (ICGS) that combines elements of numerical stability and parallel algorithm design. We reorganize the algorithm so that all inner products, matrix-vector multiplications, and vector updates of a single iteration step are independent, and the communication time required for the inner products can be efficiently overlapped with the computation time of the vector updates. As a result, the performance bottleneck, namely the cost of global communication on distributed-memory parallel computers, can be significantly reduced. The resulting ICGS algorithm retains the favorable properties of the original algorithm without increasing computational cost.
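To make the idea of waveform relaxation concrete, here is a minimal Jacobi waveform relaxation sketch for a small linear system x' = A x, discretized in time with backward Euler. It assumes a linear, constant-coefficient system and omits the acceleration, the waveform Newton linearization, and the ICGS solver used in the paper; `jacobi_waveform_relaxation` is a hypothetical name. Within each sweep every component's whole waveform is computed only from the previous sweep's waveforms, so components could be assigned to different processors.

```python
from typing import List

Matrix = List[List[float]]


def jacobi_waveform_relaxation(A: Matrix, x0: List[float], h: float,
                               steps: int, sweeps: int) -> List[List[float]]:
    """Jacobi waveform relaxation for x' = A x on [0, steps*h] with backward Euler.
    Returns wave[i][n], the approximation of x_i at time n*h."""
    m = len(x0)
    # initial guess: constant waveforms equal to the initial condition
    wave = [[x0[i]] * (steps + 1) for i in range(m)]
    for _ in range(sweeps):
        new = []
        for i in range(m):  # independent per component: parallel across processors
            xi = [x0[i]]
            for n in range(steps):
                # coupling terms use the previous sweep's waveforms of the other components
                coupling = sum(A[i][j] * wave[j][n + 1] for j in range(m) if j != i)
                # backward Euler: (1 - h a_ii) x_i[n+1] = x_i[n] + h * coupling
                xi.append((xi[n] + h * coupling) / (1.0 - h * A[i][i]))
            new.append(xi)
        wave = new
    return wave


A = [[-2.0, 1.0], [1.0, -3.0]]
w = jacobi_waveform_relaxation(A, [1.0, 0.5], h=0.1, steps=20, sweeps=50)
print(w[0][-1], w[1][-1])
```

For this diagonally dominant example the sweeps contract the waveform error by at least a factor of max |a_ij| / |a_ii| = 0.5 each, so the result converges to the backward Euler solution of the coupled system.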
ISBN (print): 0769515126
Biological sequence comparison is an important tool for researchers in molecular biology, and several algorithms exist for it. The Smith-Waterman algorithm, based on dynamic programming, is one of the most fundamental algorithms in bioinformatics. However, the existing parallel Smith-Waterman algorithm requires a large amount of memory. As biological sequence data grow rapidly, this memory requirement has become a critical problem. To resolve it, we develop a new parallel Smith-Waterman algorithm based on divide and conquer, named PSW-DC. The memory required by the new parallel algorithm is significantly reduced compared with existing ones. A key technique, named the C&E method, is developed to implement the new parallel Smith-Waterman algorithm.
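The memory concern above comes from storing the full dynamic-programming matrix. A minimal sketch of the standard Smith-Waterman recurrence with a linear gap penalty, keeping only two rows so memory is O(len(b)), is shown below. It computes the best local score and its end cell but not the alignment itself, and it is not PSW-DC or the C&E method, whose details the abstract does not give; the scoring values are assumptions.

```python
from typing import Tuple


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                         gap: int = -1) -> Tuple[int, Tuple[int, int]]:
    """Best local alignment score and its end cell (i, j), using two DP rows."""
    prev = [0] * (len(b) + 1)
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores never drop below zero
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best:
                best, best_pos = cur[j], (i, j)
        prev = cur
    return best, best_pos


print(smith_waterman_score("TACGT", "ACG"))  # (6, (4, 3))
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time; this wavefront is the usual source of parallelism in parallel Smith-Waterman implementations.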
Authors: J. Saif, H. Krawczyk
Electronics, Telecommunication and Informatics Faculty, Gdansk University of Technology, Gdansk, Poland
The region of interest (ROI) matching problem is defined and its application to endoscopic diagnosis is shown. Two kinds of matching procedures are considered: random search and simulated annealing. Suitable sequential and parallel algorithms are proposed, and their suitability for ROI identification is discussed.
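A minimal sketch of simulated-annealing ROI matching follows, assuming grayscale images stored as nested lists, a sum-of-squared-differences cost, a neighbourhood of offsets within 2 pixels, and a geometric cooling schedule. `ssd`, `anneal_match`, and all parameter values are hypothetical; the abstract does not specify the paper's cost function or parallel decomposition.

```python
import math
import random
from typing import List, Tuple

Image = List[List[float]]


def ssd(image: Image, template: Image, x: int, y: int) -> float:
    """Sum of squared differences between the template and the image patch at (x, y)."""
    return sum((image[y + r][x + c] - template[r][c]) ** 2
               for r in range(len(template)) for c in range(len(template[0])))


def anneal_match(image: Image, template: Image, iters: int = 2000, t0: float = 100.0,
                 cooling: float = 0.995, seed: int = 0) -> Tuple[float, int, int]:
    """Search for the template offset with minimal SSD; returns (cost, x, y)."""
    rng = random.Random(seed)
    max_x = len(image[0]) - len(template[0])
    max_y = len(image) - len(template)
    x, y = rng.randint(0, max_x), rng.randint(0, max_y)
    cost = ssd(image, template, x, y)
    best = (cost, x, y)
    t = t0
    for _ in range(iters):
        # propose a neighbouring offset, clamped to the valid range
        nx = min(max_x, max(0, x + rng.randint(-2, 2)))
        ny = min(max_y, max(0, y + rng.randint(-2, 2)))
        ncost = ssd(image, template, nx, ny)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(-delta/t)
        if ncost <= cost or rng.random() < math.exp((cost - ncost) / t):
            x, y, cost = nx, ny, ncost
            if cost < best[0]:
                best = (cost, x, y)
        t = max(t * cooling, 1e-9)  # geometric cooling, kept strictly positive
    return best


image = [[(x * 7 + y * 13) % 17 for x in range(12)] for y in range(12)]
template = [row[5:8] for row in image[4:7]]  # ROI cut out at offset (5, 4)
print(anneal_match(image, template, iters=500, seed=1))
```

Replacing the neighbourhood proposal with uniformly random offsets and accepting only improvements gives a random-search variant; running independent chains with different seeds is one straightforward way to spread either search over several processors.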