Processing of biological sequences is a compute-intensive problem. The amount of data available in biology is enormous that sequential techniques will take a very long time to process them. In this paper we present ma...
详细信息
ISBN:
(纸本)0780382927
Processing of biological sequences is a compute-intensive problem. The amount of data available in biology is enormous that sequential techniques will take a very long time to process them. In this paper we present many parallel algorithms for biological data processing. We consider a scenario where the operations performed on the sequences are arbitrary. In particular we assume that a sequential algorithm (or program) is given as an input (with the option of making several copies). The input consists of a file of sequences to be processed. The goal is to process all these sequences as efficiently as possible.
Higher order spectra (HOS) are a powerful tool in nonlinear time series analysis and they have been extensively used as feature representations in data mining, communications and cosmology domains. However, HOS estima...
详细信息
ISBN:
(纸本)9781450369763
Higher order spectra (HOS) are a powerful tool in nonlinear time series analysis and they have been extensively used as feature representations in data mining, communications and cosmology domains. However, HOS estimation suffers from high computational cost and memory consumption. Any algorithm for computing the kth order spectra on a dataset of size n needs Omega(n(k-1)) time since the output size will be Omega(n(k-1)) as well, which makes the direct HOS analysis difficult for long time series, and further prohibits its direct deployment to resource-limited and time-sensitive applications. Existing algorithms for computing HOS are either inefficient or have been implemented on obsolete architectures. Thus it is essential to develop efficient generic algorithms for HOS estimations. In this paper, we present a package of generic sequential and parallel algorithms for computationally and memory efficient HOS estimations which can be employed on any parallel machine or platform. Our proposed algorithms largely reduce the HOS' computational cost and memory usage in spectrum multiplication and smoothing steps through carefully designed prefix sum operations. Moreover, we employ a matrix partitioning technique and design algorithms with optimal memory usage and present the parallel approaches on the PRAM and the mesh models. Furthermore, we implement our algorithms for both bispectrum and trispectrum estimations. We conduct extensive experiments and cross-compare the proposed algorithms' performance. Results show that our algorithms achieve state-of-the-art computational and memory efficiency, and our parallel algorithms achieve close to linear speedups.
parallel algorithms for the one-dimensional and the twodimensional size-constrained maximum-sum segment problems are proposed. The problem, which is a variant of the classic maximum-sum segment problem, is to locate t...
详细信息
ISBN:
(纸本)9783642298226
parallel algorithms for the one-dimensional and the twodimensional size-constrained maximum-sum segment problems are proposed. The problem, which is a variant of the classic maximum-sum segment problem, is to locate the segment of the maximum total sum among those whose sizes are in a certain range, say, between L and U. It has several applications including pattern recognition, data mining, and DNA analyses, and the size requirement enables us to exclude trivial or useless results. Our parallel algorithms solve it in time O(N/P + log P) for one-dimensional arrays of length n and in time O(N-2 (U - L)/P + log P) for N x N two-dimensional arrays on EREW PRAM that consists of p processors. It is worth noting that they achieve asymptotically optimal parallel speedups compared with the best known sequential algorithms that take O(N) and O(N-3) times for the one- and the two-dimensional cases, respectively. Our algorithms are correct by their construction: they are systematically derived from their specifications based on the Bird-Meertens formalism.
Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scienti...
详细信息
ISBN:
(纸本)9781728192192
Existing work-efficient parallel algorithms for floating-point prefix sums exhibit either good performance or good numerical accuracy, but not both. Consequently, prefix-sum algorithms cannot easily be used in scientific-computing applications that require both high performance and accuracy. We have designed and implemented two new algorithms, called CAST_BLK and PAIR_BLK, whose accuracy is significantly higher than that of the high-performing prefix-sum algorithm from the Problem Based Benchmark Suite, while running with comparable performance on modern multicore machines. Specifically, the root mean squared error of the PBBS code on a large array of uniformly distributed 64-bit floating-point numbers is 8 times higher than that of CAST_BLK and 5.8 times higher than that of PAIR_BLK. These two codes employ the PBBS three-stage strategy for performance, but they are designed to achieve high accuracy, both theoretically and in practice. A vectorization enhancement to these two scalar codes trades off a small amount of accuracy to match or outperform the PBBS code while still maintaining lower error.
An optimalO(log logn)-time CRCW-PRAM algorithm for computing all period lengths of a string is presented. Previous parallel algorithms compute the period only if it is shorter than half of the length of the string. Th...
详细信息
Overview and experimental comparative study of parallel algorithms of asynchronous cellular at simulation is presented. The algorithms are tested for the model of physicochemical process of surface CO + O-2 reaction o...
详细信息
ISBN:
(纸本)9783642159787
Overview and experimental comparative study of parallel algorithms of asynchronous cellular at simulation is presented. The algorithms are tested for the model of physicochemical process of surface CO + O-2 reaction over the supported Pd nanoparticles on different parallel computers. For testing we use shared memory computers, distributed memory computers (i.e. clusters), and graphical processing unit. Characterization of these algorithms in respect of methods of parallelism maintenance is given.
In this paper parallel algorithms for solving the continuous-time algebraic Riccati equation both on shared memory multiprocessors and distributed memory multiprocessors are presented. The algorithms are based on the ...
详细信息
In this paper parallel algorithms for solving the continuous-time algebraic Riccati equation both on shared memory multiprocessors and distributed memory multiprocessors are presented. The algorithms are based on the matrix Sign function of the Hamiltonian matrix associated with the equation. In the case of shared memory multiprocessors, the use of LAPACK and BLAS-3 subroutines allows the generation of reliable, portable and efficient subroutines for solving the algebraic Riccati equation. In the case of distributed memory multiprocessors, the distribution of the data wrapped by blocks of columns among the processors and the use of LAPACK and BLAS-3 subroutines improves the efficiency of the algorithms as reported in the experimental results.
Starting from a permutation of {0,...,n-1} we compute in parallel with a workload of O(n log n) a compact data structure of size O(n log n). This data structure allows to obtain the associated permutation graph and th...
详细信息
ISBN:
(纸本)3540606181
Starting from a permutation of {0,...,n-1} we compute in parallel with a workload of O(n log n) a compact data structure of size O(n log n). This data structure allows to obtain the associated permutation graph and the transitive closure and reduction of the associated order of dimension 2 efficiently. The parallel algorithms obtained have a workload of O(m + n log n) where m is the number of edges of the permutation graph. They run in time O(log(2) n) on a CREW PRAM.
In this paper, we examine the advantages of using a distributed version of the Block Preconditioned Conjugate Gradient as solving algorithm for FEM models arising from electrothermal problems. The implementation is do...
详细信息
ISBN:
(纸本)0780359577
In this paper, we examine the advantages of using a distributed version of the Block Preconditioned Conjugate Gradient as solving algorithm for FEM models arising from electrothermal problems. The implementation is done using WMPI as programming environment in a cluster configuration. The targeted application being to model heat transfer in a dry type transformer, we intend to include fluid flow analysis in addition to the conduction analysis.
Coloured Petri nets have proved to be a useful formalism for modeling distributed algorithms, i.e., algorithms where nodes communicate via message passing. Here we describe an approach for automatic extraction of mode...
详细信息
ISBN:
(纸本)9783642351792
Coloured Petri nets have proved to be a useful formalism for modeling distributed algorithms, i.e., algorithms where nodes communicate via message passing. Here we describe an approach for automatic extraction of models of parallel algorithms and programs, i.e., algorithms and programs where processes communicate via shared memory. The models can be verified for correctness, here to prove absence of mutual exclusion violations and to find dead-and live-locks. This makes it possible to verify software using a model-extraction approach using coloured Petri nets, where a formal model is extracted from runnable code. We extract models in a manner so we can also support a model-driven development approach, where code is generated from a model, enabling a combined approach, supporting extracting a model from an abstract description and generation of correct implementation code. We illustrate our idea by applying the technique to a parallel implementation of explicit state-space exploration. Our approach builds on having a coloured Petri net model corresponding to the program and using the model to verify properties. We have already treated generation of code from coloured Petri nets, so in this paper we focus on the translation the other way around. We have an implementation of the translation from code to coloured Petri nets.
暂无评论