ISBN: 9783642131189 (print)
MrBayes, a popular program for Bayesian inference of phylogeny, has not been fast enough for biologists dealing with large real-world data sets. This paper presents a new parallel algorithm that combines the chain-partitioned parallel algorithm with the chain-parallel algorithm to obtain higher concurrency. We compare the proposed hybrid algorithm with the two older algorithms on a heterogeneous cluster. The results show that the hybrid algorithm converts more CPU cores into higher speedup than the two control algorithms on all four real-world DNA data sets, and is therefore more practical.
ISBN: 9781450301787 (print)
Medical imaging provides physicians with the ability to generate 3D images of the human body in order to detect and diagnose a wide variety of ailments. Making medical imaging portable and more accessible poses a unique set of challenges. To increase portability, the power consumed in image acquisition, currently the most power-consuming activity in an imaging device, must be dramatically reduced. This can only be done, however, by using complex image reconstruction algorithms to correct artifacts introduced by low-power acquisition, which makes image processing the dominant power-consuming task. Current solutions use combinations of digital signal processors, general-purpose processors and, more recently, general-purpose graphics processing units for medical image processing. These solutions fall short for various reasons, including high power consumption and an inability to execute the next generation of image reconstruction algorithms. This paper presents the MEDICS architecture, a domain-specific multicore architecture designed specifically for medical imaging applications, but with sufficient generality to make it programmable. The goal is to achieve 100 GFLOPs of performance while consuming orders of magnitude less power than the existing solutions. MEDICS has a throughput of 128 GFLOPs while consuming as little as 1.6 W of power on advanced CT reconstruction applications. This represents up to a 20X increase in computation efficiency over current designs.
ISBN: 9783642116193 (print)
We explore three commodity parallel architectures: multi-core CPUs, the Cell BE processor, and graphics processing units. We have implemented four algorithms on these three architectures: solving the heat equation, inpainting using the heat equation, computing the Mandelbrot set, and MJPEG movie compression. We use these four algorithms to exemplify the benefits and drawbacks of each parallel architecture.
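To illustrate why a heat-equation solver maps well onto such architectures, the following is a minimal sketch of one explicit finite-difference time step in one dimension. It is not the implementation from the paper: the function name `heat_step`, the fixed-boundary treatment, and the 1D setting are assumptions made for illustration. Each interior point is updated only from the previous time level, so all points could be computed in parallel.

```python
def heat_step(u, alpha, dx, dt):
    """Advance the 1D heat equation u_t = alpha * u_xx by one explicit step.

    Hypothetical helper for illustration; boundary values are held fixed.
    """
    r = alpha * dt / dx ** 2
    # The explicit scheme is only stable for r <= 1/2.
    if r > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    new = list(u)  # copy so every update reads the old time level
    for i in range(1, len(u) - 1):
        # Three-point stencil: each update is independent of the others.
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


if __name__ == "__main__":
    print(heat_step([0, 0, 1, 0, 0], alpha=1.0, dx=1.0, dt=0.25))
```

Because the loop body has no dependence between iterations, the same stencil could be distributed over CPU threads or GPU threads without changing the arithmetic.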
ISBN: 9783642131189 (print)
In this paper, MPI is used on a PC cluster to compute all the eigenvalues of Hermitian Toeplitz matrices. The parallel algorithms presented were implemented in C++ with MPI functions inserted and run on a cluster of Lenovo ThinkCentre machines running RedHat Linux. The two methods, MAHT-P, which is embarrassingly parallel, and MPEAHT, which uses a master/slave scheme, are compared for performance, and results are presented. For both parallel schemes, computation time decreases and the speedup factor increases with the number of computers used. Load balancing becomes an issue as the number of computers in the cluster increases, and a solution is provided for this case.
In this work we present a design space exploration of the memory subsystem of our configurable CoreVA VLIW architecture. The development of resource-efficient processor architectures is based on a two-stage tool flow ...
ISBN: 9783642131356 (print)
Multishift QR algorithms are efficient for solving the symmetric tridiagonal eigenvalue problem on a parallel computer. In this paper, we focus on three variants of the multishift QR algorithm, namely, the conventional multishift QR algorithm, the deferred shift QR algorithm, and the fully pipelined multishift QR algorithm, and construct performance models for them. Our models are designed for shared-memory parallel machines and, given the basic performance characteristics of the target machine and the problem size, predict the execution time of these algorithms. Experimental results show that our models can predict the relative performance of these algorithms to an accuracy of 10% in many cases. Thus our models are useful for choosing the best algorithm to solve a given problem in a specified computational environment, as well as for finding the best values of the performance parameters.
ISBN: 9783642131356 (print)
Consider two sorted arrays $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$ of records such that (1) the n records are sorted according to one field, called the key, and (2) the values of the keys are serial numbers. Merging data records has many applications in computer science, especially in databases. We develop an algorithm that runs in O(log n) time on an EREW PRAM to merge two sorted arrays of records using n/log n processors, even when the keys of the data records are repeated. The algorithm is cost-optimal, deterministic, and stable, and uses a linear amount of space.
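The stability requirement with repeated keys can be made concrete with a small sketch of merging by ranking. This is a generic textbook-style technique written sequentially, not the O(log n) EREW PRAM algorithm of the paper; the function name `rank_merge` and the `(key, payload)` record layout are assumptions for illustration. Each record's output position is its own index plus its rank in the other array, so every position can be computed independently of the others.

```python
from bisect import bisect_left, bisect_right


def rank_merge(a, b, key=lambda rec: rec[0]):
    """Stably merge two key-sorted lists of records by ranking.

    Hypothetical helper: on equal keys, records from `a` come before
    records from `b`, and the order within each input is preserved.
    """
    a_keys = [key(rec) for rec in a]
    b_keys = [key(rec) for rec in b]
    out = [None] * (len(a) + len(b))
    for i, rec in enumerate(a):
        # Records of b with a strictly smaller key precede a[i].
        out[i + bisect_left(b_keys, a_keys[i])] = rec
    for j, rec in enumerate(b):
        # Records of a with a smaller or equal key precede b[j].
        out[j + bisect_right(a_keys, b_keys[j])] = rec
    return out


if __name__ == "__main__":
    a = [(1, "a1"), (2, "a2"), (2, "a3")]
    b = [(2, "b1"), (3, "b2")]
    print(rank_merge(a, b))
```

Using `bisect_left` for one array and `bisect_right` for the other is what breaks ties consistently: no two records are assigned the same output slot, even when keys repeat.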
Robust and efficient parallel numerical algorithms and their implementation in easy-to-use portable software components are crucial for computational science and engineering applications. They are strongly influenced ...
ISBN: 9783642131356 (print)
A parallel scheme for a distributed memory hierarchy system is presented to solve the large-scale three-dimensional heat equation. Since managing interprocess communication and coordination is the main difficulty with such a system, the local physics/global algebraic object paradigm is introduced. A domain decomposition method is used to partition the modeling area, and with it the intensive computational effort and large memory requirement. Efficient storage and assembly of the sparse matrix and parallel iterative solution of the linear system are considered and developed. The efficiency and scalability of the parallel program are demonstrated in two experiments on a Linux cluster, in which different preconditioning methods are tested and analyzed. The results demonstrate that this method can achieve desirable parallel performance.
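As a rough illustration of the partitioning idea, the sketch below splits the grid indices along one axis into contiguous blocks, one per process, so that block sizes differ by at most one. It is only an assumed, simplified form of domain decomposition; the abstract does not specify the partitioning used, and the function name `block_ranges` is hypothetical.

```python
def block_ranges(n, p):
    """Split indices 0..n-1 into p contiguous half-open ranges.

    Hypothetical helper: the first (n mod p) ranks get one extra index,
    so the work per rank differs by at most one grid plane.
    """
    if p <= 0:
        raise ValueError("number of processes must be positive")
    base, extra = divmod(n, p)
    ranges = []
    start = 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # For example, 10 grid planes shared among 3 processes.
    print(block_ranges(10, 3))
```

In a distributed solver each rank would then store only its own block, plus neighbouring boundary values exchanged with adjacent ranks.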
ISBN: 9783642131189 (print)
In this paper, we propose an efficient algorithm for parallel prefix computation in the recursive dual-net, a newly proposed network. The recursive dual-net $RDN^k(B)$ for $k > 0$ has $(2n_0)^{2^k}/2$ nodes and $d_0 + k$ links per node, where $n_0$ and $d_0$ are the number of nodes and the node degree of the base network $B$, respectively. Assuming that each node holds one data item, the communication and computation time complexities of the algorithm for parallel prefix computation in $RDN^k(B)$, $k > 0$, are $2^{k+1} - 2 + 2^k \cdot T_{comm}(0)$ and $2^{k+1} - 2 + 2^k \cdot T_{comp}(0)$, respectively, where $T_{comm}(0)$ and $T_{comp}(0)$ are the communication and computation time complexities of the algorithm for parallel prefix computation in the base network $B$, respectively.
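To get a feel for these expressions, the following sketch simply evaluates them, reading the node count as (2·n0)^(2^k)/2, the links per node as d0 + k, and each time complexity as 2^(k+1) − 2 + 2^k·T(0). The function names and the sample base-network values are hypothetical and chosen only for illustration.

```python
def rdn_nodes(n0, k):
    """Number of nodes: (2 * n0) ** (2 ** k) / 2."""
    return (2 * n0) ** (2 ** k) // 2


def rdn_degree(d0, k):
    """Links per node: d0 + k."""
    return d0 + k


def prefix_time(t0, k):
    """Time complexity 2**(k+1) - 2 + 2**k * t0, where t0 is the
    corresponding (communication or computation) cost in the base network."""
    return 2 ** (k + 1) - 2 + 2 ** k * t0


if __name__ == "__main__":
    # Assumed sample values: a base network with 4 nodes, degree 2, and cost 3.
    for k in (1, 2):
        print(k, rdn_nodes(4, k), rdn_degree(2, k), prefix_time(3, k))
```

The node count grows doubly exponentially in k while the stated time bound grows only as 2^k, so the bound is roughly proportional to the square root of the number of nodes for a fixed base network.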