We propose a parallel algorithm for the constrained multiple sequence alignment (CMSA) problem, which seeks an optimal multiple alignment constrained to include a given pattern. We consider the dynamic programming computations in layers indexed by the symbols of the given pattern. In each layer we compute shortest paths for multiple sources and multiple destinations, each a potential part of an optimal alignment for the CMSA problem. These shortest-path problems are independent of one another, which enables parallel execution, and each can be solved with an A* algorithm specialized for the shortest-path problem with multiple sources and multiple destinations. The final step of our algorithm solves a single-source, single-destination shortest-path problem. Our experiments on real sequences show that our algorithm is generally faster than the existing sequential dynamic programming solutions.
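The per-layer step above can be illustrated with a small sketch of multiple-source, multiple-destination A*. This is not the authors' implementation. It makes the following assumptions:

- The graph is generic and is exposed through a hypothetical `neighbors` callback.
- Every source is seeded with cost 0, which is equivalent to adding a virtual super-source.
- The heuristic is the minimum of a consistent per-destination estimate, and that minimum is itself consistent.

The helpers `grid_neighbors` and `grid_heuristic` are hypothetical stand-ins for an alignment lattice with unit-cost moves.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Optional, Tuple

Node = Hashable


def multi_astar(
    sources: Iterable[Node],
    targets: Iterable[Node],
    neighbors: Callable[[Node], Iterable[Tuple[Node, float]]],
    heuristic: Callable[[Node, Node], float],
) -> Tuple[Optional[float], List[Node]]:
    """Cheapest path from any source to any target (A* with a multi-target heuristic)."""
    target_set = set(targets)

    def h(v: Node) -> float:
        # The minimum of consistent per-target estimates is itself consistent.
        return min((heuristic(v, t) for t in target_set), default=float("inf"))

    tie = itertools.count()  # tie-breaker, so nodes never have to be comparable
    g: Dict[Node, float] = {}
    parent: Dict[Node, Optional[Node]] = {}
    heap: List[Tuple[float, int, float, Node]] = []
    for s in sources:  # every source starts at cost 0 (virtual super-source)
        g[s], parent[s] = 0.0, None
        heapq.heappush(heap, (h(s), next(tie), 0.0, s))
    while heap:
        _, _, gv, v = heapq.heappop(heap)
        if gv > g[v]:
            continue  # stale heap entry
        if v in target_set:  # with a consistent h, the first target popped is optimal
            path = [v]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return gv, path[::-1]
        for w, cost in neighbors(v):
            ng = gv + cost
            if ng < g.get(w, float("inf")):
                g[w], parent[w] = ng, v
                heapq.heappush(heap, (ng + h(w), next(tie), ng, w))
    return None, []


def grid_neighbors(size: int) -> Callable[[Tuple[int, int]], Iterator[Tuple[Tuple[int, int], float]]]:
    """Hypothetical alignment-like lattice: unit-cost moves down, right, and diagonal."""
    def nbrs(v: Tuple[int, int]) -> Iterator[Tuple[Tuple[int, int], float]]:
        i, j = v
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            if i + di < size and j + dj < size:
                yield (i + di, j + dj), 1.0
    return nbrs


def grid_heuristic(v: Tuple[int, int], t: Tuple[int, int]) -> float:
    """Chebyshev distance, or infinity when t lies behind v (unreachable)."""
    di, dj = t[0] - v[0], t[1] - v[1]
    return float(max(di, dj)) if di >= 0 and dj >= 0 else float("inf")
```

Example: `multi_astar([(0, 0), (0, 2)], [(4, 4), (3, 4)], grid_neighbors(5), grid_heuristic)` returns cost 3.0, with a path that starts at source (0, 2) and ends at destination (3, 4).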
The GCA (global cellular automata) model is a very interesting and flexible model that can be used to implement all kinds of parallel algorithms. The GCA model consists of a field of cells, similar to the cellular automata model. Each cell has links to a set of remote cells, and these links can be changed dynamically from generation to generation. A cell reads the states of its remote neighbors and then changes its own state according to a local rule. The model is massively parallel because all cells can change their states independently and in parallel. We have investigated how the GCA model can be implemented efficiently in hardware using a field-programmable gate array (FPGA) prototyping platform. We have implemented a fully parallel architecture, in which all cells operate in parallel. We have also implemented other architectures in which the cells are stored in memories so that a large number of cells can be handled. We show that with the fully parallel architecture a speed-up of around 190 over a software implementation on a PC is realistic on a modern FPGA platform. In the partially parallel, memory-based architecture the speed-up is lower, but the number of cells is limited only by the capacity of the memories.
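To make the GCA update rule concrete, here is a small software simulation; it is not the FPGA design described above. As an assumed example it computes inclusive prefix sums by pointer jumping. This standard task exercises exactly the features of the model:

- each cell reads a remote cell's state through its link;
- all cells update synchronously from the previous generation;
- the links change every generation.

```python
from typing import List, Optional


def gca_prefix_sum(values: List[int]) -> List[int]:
    """Inclusive prefix sums computed by a simulated GCA using pointer jumping.

    Each cell i holds a state (val[i], link[i]). In every generation, each cell
    reads the state of the remote cell its link points to, and all cells switch
    to their new states at the same time (synchronous update).
    """
    n = len(values)
    val = list(values)
    # Initial links: each cell points to its left neighbour; cell 0 has no link.
    link: List[Optional[int]] = [i - 1 if i > 0 else None for i in range(n)]
    while any(target is not None for target in link):
        # Local rule, evaluated for all cells from the previous generation:
        # add the remote cell's value and adopt the remote cell's link,
        # so the link distance doubles each generation.
        new_val = [val[i] + (val[link[i]] if link[i] is not None else 0) for i in range(n)]
        new_link = [link[link[i]] if link[i] is not None else None for i in range(n)]
        val, link = new_val, new_link
    return val
```

For example, `gca_prefix_sum([1, 2, 3, 4])` returns `[1, 3, 6, 10]` after two generations. In general the loop runs ceil(log2 n) generations, because every link distance doubles per step.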
This paper investigates a method for optimal power flow (OPF) in large interconnected power grids. A decomposition-coordination model based on partial duality is analyzed. A parallel algorithm based on the DC optimal power flow model is then presented for the multi-region decomposition of interconnected power grids. The OPF computation of a large power grid is decomposed into regional subproblems, each of which is a quadratic programming problem that solves a DC optimal power flow. The convergence condition for the multi-region optimum is discussed. The information exchanged among regions consists of export prices and the phase angles of boundary buses. The IEEE RTS-96 system with two and three interconnected regions is studied to illustrate the effect of the proposed approach and to demonstrate its promise for interconnected power systems.
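The following is a heavily simplified sketch of price-based coordination between regions. It illustrates the idea under stated assumptions and is not the paper's partial-duality model:

- Each region is collapsed to a single bus with one aggregate generator of hypothetical quadratic cost a·P² + b·P. Phase angles and line limits therefore drop out, and only the price is exchanged.
- Each region solves its own small quadratic program for the current price.
- A coordinator then adjusts the price according to the system power mismatch (dual subgradient ascent).

```python
from typing import List, Tuple

# Hypothetical (a, b, p_min, p_max) for one region's aggregate generator; cost = a*P**2 + b*P.
Region = Tuple[float, float, float, float]


def region_dispatch(region: Region, price: float) -> float:
    """Regional subproblem: minimise a*P**2 + b*P - price*P subject to the P limits."""
    a, b, p_min, p_max = region
    unconstrained = (price - b) / (2.0 * a)  # stationary point of the quadratic
    return min(max(unconstrained, p_min), p_max)


def coordinate(regions: List[Region], total_load: float, step: float = 0.01,
               tol: float = 1e-6, max_iter: int = 10_000) -> Tuple[float, List[float]]:
    """Dual (price) coordination: regions solve independently, then the price is updated."""
    price = 0.0
    outputs: List[float] = []
    for _ in range(max_iter):
        # The regional subproblems are independent and could run in parallel.
        outputs = [region_dispatch(r, price) for r in regions]
        mismatch = total_load - sum(outputs)
        if abs(mismatch) < tol:
            break
        # Subgradient step on the dual: raise the price when supply is short.
        price += step * mismatch
    return price, outputs
```

Take two regions with (a, b) = (0.01, 10) and (0.02, 8), limits [0, 1000], and a total load of 500. The price converges to 16, with regional outputs of 300 and 200. The step size matters: here the mismatch falls by 75 units per unit of price, so a step above 2/75 would diverge.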
Computing a 1-D fast Fourier transform (FFT) with the classical 4-step FFT on parallel computers requires intensive all-to-all communication, which significantly reduces FFT performance. In this paper, we present the no-communication algorithm, a parallel algorithm for the 1-D FFT that requires no interprocessor communication. Its advantage is the absence of all-to-all communication between processors; its disadvantage is the extra computation compared with the classical 4-step FFT. The no-communication algorithm has been implemented and tested on 8-node symmetric multiprocessors (SMPs). The results show that the no-communication algorithm outperforms the 4-step FFT for relatively small data sizes, whereas the 4-step FFT performs better for relatively large data sizes.
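For reference, the classical 4-step FFT used as the baseline above can be sketched in pure Python. With N = n1·n2, the steps are:

1. n2 short DFTs of length n1.
2. A twiddle-factor multiplication.
3. A transpose.
4. n1 short DFTs of length n2.

When the data are distributed across processors, step 3 is the all-to-all exchange. To stay short, the sketch uses a naive O(n²) `dft` for the sub-transforms, and it does not implement the paper's no-communication variant.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) DFT, used as the building block and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x: List[complex], n1: int, n2: int) -> List[complex]:
    """DFT of length N = n1*n2 via the classical 4-step decomposition.

    Index mapping: input j = n2*j1 + j2, output k = k1 + n1*k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 independent DFTs of length n1, one for each j2.
    cols = [dft([x[n2 * j1 + j2] for j1 in range(n1)]) for j2 in range(n2)]
    # Step 2: multiply by the twiddle factors w_N^(j2*k1).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Step 3: transpose. With data distributed by j2, this is the all-to-all step.
    rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]
    # Step 4: n1 independent DFTs of length n2, one for each k1.
    out = [0j] * n
    for k1 in range(n1):
        for k2, value in enumerate(dft(rows[k1])):
            out[k1 + n1 * k2] = value
    return out
```

For any length-12 input `x`, `four_step_fft(x, 3, 4)` and `four_step_fft(x, 4, 3)` agree with `dft(x)` to floating-point tolerance.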
ISBN: 0769517315 (print)
In this paper, by means of an abstract model of the SIMD type with vertical data processing (the STAR-machine), we present a simple associative parallel algorithm for finding tree paths in undirected graphs. We study applications of this algorithm to updating minimum spanning trees in undirected graphs, determining maximum flow values in a multiterminal network, and finding a fundamental set of circuits with respect to a given spanning tree. These algorithms are given as the corresponding STAR procedures, whose correctness is proved and whose time complexity is evaluated.
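The tree-path operation and one of its applications, the fundamental set of circuits, can be illustrated with a sequential sketch. It does not model the STAR-machine's associative, vertical processing. It assumes the spanning tree is given as a hypothetical parent map with `None` at the root. Each non-tree edge (u, v) then closes exactly one circuit: the tree path from u to v plus the edge itself.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Vertex = Hashable


def tree_path(parent: Dict[Vertex, Optional[Vertex]], u: Vertex, v: Vertex) -> List[Vertex]:
    """Vertices on the unique path from u to v in a rooted spanning tree."""
    up_from_u: List[Vertex] = []
    x: Optional[Vertex] = u
    while x is not None:  # u, parent(u), ..., root
        up_from_u.append(x)
        x = parent[x]
    position = {node: i for i, node in enumerate(up_from_u)}
    up_from_v: List[Vertex] = []
    y = v
    while y not in position:  # climb from v until we meet u's ancestor chain
        up_from_v.append(y)
        y = parent[y]
    # y is now the lowest common ancestor of u and v.
    return up_from_u[: position[y] + 1] + up_from_v[::-1]


def fundamental_circuits(parent: Dict[Vertex, Optional[Vertex]],
                         edges: Sequence[Tuple[Vertex, Vertex]]) -> List[List[Vertex]]:
    """One circuit per non-tree edge (u, v): the tree path u..v, closed by the edge itself."""
    tree_edges = {frozenset((c, p)) for c, p in parent.items() if p is not None}
    return [tree_path(parent, u, v) for u, v in edges
            if frozenset((u, v)) not in tree_edges]
```

Consider the parent map `{1: None, 2: 1, 3: 1, 4: 2, 5: 2}` and the edges `(1, 2), (1, 3), (2, 4), (2, 5), (4, 5), (4, 3)`. For these, `fundamental_circuits` returns `[[4, 2, 5], [4, 2, 1, 3]]`: one circuit per non-tree edge, that is, m − n + 1 = 2.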
ISBN: 0769517609 (print)
In a cellular network, the base stations are not necessarily uniformly distributed, and their cell sizes are not necessarily the same. For example, a cell in a densely populated city is usually smaller than a cell in a rural area. One way to study a cellular network with non-uniform cell sizes is to use a virtual cellular network with a uniform cell size, such that each virtual cell contains at most one base station. This paper proposes parallel algorithms for meshes with multiple broadcasting that construct virtual mesh and honeycomb cellular networks for non-uniformly distributed base stations. The constructed virtual cellular networks are optimal in the sense that their uniform cell size is the largest possible. For n non-uniformly distributed base stations, the algorithms run in O(log n) time on an n × n mesh with multiple broadcasting to construct optimal virtual mesh and honeycomb cellular networks. Furthermore, these algorithms are time-optimal.
ISBN: 0769517315 (print)
In this paper we present new parallel versions of the sequential Goertzel and Reinsch algorithms for calculating trigonometric sums. The new algorithms use a recently introduced, very efficient BLAS-based algorithm for solving linear recurrence systems with constant coefficients. To make them portable across different shared-memory parallel architectures, the algorithms have been implemented in Fortran 77 and OpenMP. We also present experimental results obtained on a two-processor Pentium III computer running the Linux operating system, with ATLAS as an efficient implementation of the BLAS. The new algorithms are up to 60-90% faster than the equivalent sequential Goertzel and Reinsch algorithms, even on one processor.
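Goertzel's algorithm evaluates both trigonometric sums, C(x) = Σ b_k cos(kx) and S(x) = Σ b_k sin(kx) for k = 0..n, with one second-order linear recurrence with constant coefficients: u_k = b_k + 2cos(x)·u_{k+1} − u_{k+2}. That is the kind of recurrence the BLAS-based solver mentioned above parallelizes. The sketch below is only the plain sequential version. Reinsch's modification, which improves numerical accuracy when x is near 0 or π, is not shown.

```python
import math
from typing import Sequence, Tuple


def goertzel(b: Sequence[float], x: float) -> Tuple[float, float]:
    """Return (sum b_k*cos(k*x), sum b_k*sin(k*x)) for k = 0..len(b)-1."""
    alpha = 2.0 * math.cos(x)  # constant coefficient of the recurrence
    u1 = u2 = 0.0  # hold u_{k+1} and u_{k+2}; u_{n+1} = u_{n+2} = 0
    for bk in reversed(b):
        # u_k = b_k + 2cos(x)*u_{k+1} - u_{k+2}, computed for k = n down to 0
        u1, u2 = bk + alpha * u1 - u2, u1
    # After the loop, u1 = u_0 and u2 = u_1.
    return u1 - math.cos(x) * u2, math.sin(x) * u2
```

For example, `goertzel([0.5, 1.0, -2.0, 3.0], 0.7)` matches the directly evaluated cosine and sine sums to within rounding error.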
ISBN: 0769516807 (print)
In this paper, we develop parallel algorithms for pricing a class of multidimensional financial derivatives using the binomial lattice approach. We describe the algorithms, explain their complexities, and study their performance. Using experimental results obtained with MPI, we explain the limitations that the problem size imposes on the recursive algorithm and how the iterative algorithm overcomes them. We first present analytical results for the number of computations and communications of both algorithms. Since the number of nodes in a recombining lattice grows linearly with the problem size, it is feasible to price long-dated options using a recombining lattice. We have extended our algorithm to handle derivatives with many underlying assets and have shown that multi-asset derivatives are better suited to parallel computation. This is important for the finance industry, since long-dated derivatives with many underlying assets are common in practice.
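The single-asset building block is a recombining binomial lattice. It is sketched below as a Cox-Ross-Rubinstein tree for a European call, evaluated iteratively by backward induction. The parameterization and the option type are assumptions, since the abstract does not specify them.

Because up and down moves recombine, step i holds only i + 1 nodes instead of 2^i. With d underlying assets, a product-style recombining lattice has (i + 1)^d nodes at step i, which gives far more independent work per step to distribute.

```python
import math


def crr_european_call(s0: float, strike: float, rate: float, sigma: float,
                      maturity: float, steps: int) -> float:
    """European call priced on a recombining Cox-Ross-Rubinstein binomial lattice."""
    dt = maturity / steps
    u = math.exp(sigma * math.sqrt(dt))  # up factor
    d = 1.0 / u  # down factor, so an up move followed by a down move recombines
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-rate * dt)
    # Payoffs at maturity: the final step has only steps + 1 distinct nodes.
    values = [max(s0 * u ** j * d ** (steps - j) - strike, 0.0) for j in range(steps + 1)]
    # Backward induction: step i has i + 1 nodes, and each depends on two children.
    for i in range(steps, 0, -1):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j]) for j in range(i)]
    return values[0]
```

With S0 = 100, K = 100, r = 0.05, σ = 0.2, T = 1 and 500 steps, the result is within a few hundredths of the Black-Scholes value of about 10.45. The iterative loop needs only O(steps) memory, whereas a naive recursive evaluation revisits shared nodes.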
A parallel algorithm and its hardware implementation are proposed for inverse halftoning. The algorithm is based on lookup tables from which the inverse halftone value of a pixel is determined directly from a pattern of pixels. We have developed a method that allows more than one value to be accessed from the lookup table at a time. The lookup table is divided into smaller lookup tables, such that each pattern selected at a given time goes to a separate smaller table. The 15-pixel parallel version of the algorithm was tested on sample images. A simple and effective method is used to overcome the quality degradation caused by pixel loss in the proposed algorithm. Compared with a serial lookup table method replicated for the same number of pixels, the approach reduces lookup table size by at least a factor of 4.
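The serial lookup-table baseline that the method above parallelizes can be sketched as follows. The details are assumptions, since the abstract does not give them:

- a 3x3 window (the paper's pattern size may differ);
- a table trained as the mean gray value observed for each window pattern;
- a simple fallback for patterns that never appeared in training.

The paper's contribution, splitting the table so that several pixels can be looked up at once, is not reproduced here.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of pixels; halftone pixels are 0/1, gray levels are 0..255


def pattern_index(ht: Image, r: int, c: int) -> int:
    """Pack the 3x3 binary window around (r, c), edge-clamped, into a 9-bit index."""
    h, w = len(ht), len(ht[0])
    idx = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr = min(max(r + dr, 0), h - 1)
            cc = min(max(c + dc, 0), w - 1)
            idx = (idx << 1) | (1 if ht[rr][cc] else 0)
    return idx


def train_lut(pairs: List[Tuple[Image, Image]]) -> Dict[int, int]:
    """Table entry = mean original gray value observed for each window pattern."""
    sums: Dict[int, List[int]] = {}
    for halftone, gray in pairs:
        for r in range(len(halftone)):
            for c in range(len(halftone[0])):
                acc = sums.setdefault(pattern_index(halftone, r, c), [0, 0])
                acc[0] += gray[r][c]
                acc[1] += 1
    return {key: round(total / count) for key, (total, count) in sums.items()}


def inverse_halftone(ht: Image, lut: Dict[int, int]) -> Image:
    """One table lookup per pixel (the serial baseline)."""
    out: Image = []
    for r in range(len(ht)):
        row = []
        for c in range(len(ht[0])):
            idx = pattern_index(ht, r, c)
            # Unseen pattern: fall back to the fraction of white pixels in the window.
            row.append(lut.get(idx, round(255 * bin(idx).count("1") / 9)))
        out.append(row)
    return out
```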
ISBN: 0769516777 (print)
Expressed sequence tags (ESTs) are DNA molecules experimentally derived from expressed portions of genes. Clustering ESTs is essential for gene recognition and for understanding important genetic variations, such as those that result in diseases. In this paper, we present the design and development of a parallel software system for EST clustering. To our knowledge, this is the first effort to address EST clustering in parallel. The novel features of our approach include 1) space-efficient algorithms that keep the space requirement linear in the size of the input data set, 2) a combination of algorithmic techniques that reduces the total work without sacrificing clustering quality, and 3) parallel processing to reduce run-time and enable the clustering of large data sets. Using a combination of these techniques, we report the clustering of 81,414 Arabidopsis ESTs in under 2.5 minutes on a 64-processor IBM SP. The same problem is estimated to take 9 hours with state-of-the-art software, provided the memory required to run that software can be made available.
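A common way to form EST clusters is transitive closure over pairs of ESTs judged to overlap, maintained with a union-find structure. The sketch below follows that scheme under loudly simplified assumptions:

- A hypothetical shared-k-mer count (parameters `k` and `min_shared`) stands in for an alignment-based overlap test.
- Candidate pairs come from a plain k-mer index rather than the paper's space-efficient techniques.
- Reverse complements are ignored.
- Everything runs sequentially.

```python
from typing import Dict, List, Set, Tuple


class DisjointSet:
    """Union-find with path halving; clusters are its connected components."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: List[str], k: int = 8, min_shared: int = 3) -> List[List[int]]:
    # Index: k-mer -> ESTs containing it (each EST listed at most once, in increasing order).
    index: Dict[str, List[int]] = {}
    for i, seq in enumerate(ests):
        for km in kmers(seq, k):
            index.setdefault(km, []).append(i)
    # Count shared k-mers for every candidate pair (i < j).
    shared: Dict[Tuple[int, int], int] = {}
    for ids in index.values():
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                pair = (ids[a], ids[b])
                shared[pair] = shared.get(pair, 0) + 1
    # Merge every pair that passes the stand-in overlap test (transitive closure).
    ds = DisjointSet(len(ests))
    for (i, j), count in shared.items():
        if count >= min_shared:
            ds.union(i, j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(ests)):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values())
```

For example, take `cluster_ests(["ACGTACGTTGCA", "CGTACGTTGCAA", "TTTTGGGGCCCC", "GTTGCAATTACG"], k=5, min_shared=2)`. It returns `[[0, 1, 3], [2]]`, because EST 3 overlaps EST 1 and the closure places it in the same cluster as EST 0.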