ISBN (print): 9783642144028
The availability of real parallelism in multi-core based architectures has resurrected interest in concurrent computing in general, and parallel computing in particular. New languages and libraries have recently been proposed to increase productivity in the context of these architectures. In this paper we present a novel approach that resorts to the service abstraction for annotating parallelism.
ISBN (print): 9783642143892
GPUs have become extremely promising multi/many-core architectures for a wide range of demanding applications. Basic features of these architectures include the use of a large number of relatively simple processing units that operate in SIMD fashion, as well as hardware-supported advanced multithreading. However, the use of GPUs in everyday practice is still limited, mainly because implemented algorithms must be deeply adapted to the target architecture. In this work, we propose how to perform such an adaptation to achieve an efficient parallel implementation of the conjugate gradient (CG) algorithm, which is widely used for solving large sparse linear systems of equations arising, e.g., in FEM problems. Aiming at an efficient implementation of the main operation of the CG algorithm, sparse matrix-vector multiplication (SpMV), we propose and study different techniques for optimizing access to the hierarchical memory of GPUs. The proposed CUDA-based implementation of the CG algorithm is evaluated experimentally on two GPU architectures: GeForce 8800 and Tesla C1060. The results show that optimizing access to GPU memory considerably reduces the execution time of the SpMV operation and consequently yields a significant speedup over CPUs for the whole CG algorithm.
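The sketch below shows where SpMV sits in each CG iteration. It is a minimal sequential Python illustration, not the paper's CUDA implementation. It assumes a symmetric positive-definite matrix stored in the common CSR layout (`values`, `col_idx`, `row_ptr`). The names `spmv_csr` and `conjugate_gradient` are hypothetical, and none of the GPU memory-access optimizations studied in the paper are modelled.

```python
from math import sqrt


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A x for A in CSR format: row i's nonzeros are values[row_ptr[i]:row_ptr[i+1]]."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # gather from x: the irregular access on a GPU
        y[i] = acc
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A stored in CSR format."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if sqrt(rs_old) < tol:
            break
        Ap = spmv_csr(values, col_idx, row_ptr, p)   # one SpMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, the 4×4 matrix tridiag(-1, 2, -1) with right-hand side [1, 0, 0, 1] has the all-ones vector as its solution. On a GPU, rows of the `spmv_csr` loop would typically be assigned to threads or warps. That assignment is where the access patterns to `values`, `col_idx` and `x` start to matter.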
ISBN (print): 9783642131356
Many recent studies have revealed that Optical Transpose Interconnection Systems (OTIS) are promising candidates for future high-performance parallel computers. In this paper, we present and evaluate a general method for algorithm development on the OTIS-Arrangement network (OTIS-AN) as an example of an OTIS network. The proposed method could be used and customized for any other OTIS network. Furthermore, it allows efficient mapping of a wide class of algorithms onto the OTIS-AN. The method is based on grids, a popular structure that supports a vast body of parallel applications including linear algebra, divide-and-conquer algorithms, sorting, and FFT computation. This study confirms the viability of the OTIS-AN as an attractive alternative for large-scale parallel architectures.
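The OTIS construction itself is compact enough to write down. The sketch below assumes the standard OTIS definition: N groups, each a copy of an N-node factor graph, with processor p of group g optically linked to processor g of group p. It uses the usual arrangement graph A(n, k) as the factor graph. The function names are hypothetical, and the paper's grid-based mapping method is not reproduced.

```python
from itertools import permutations


def arrangement_graph(n, k):
    """Arrangement graph A(n, k): vertices are k-permutations of {1..n};
    two vertices are adjacent iff they differ in exactly one position."""
    adj = {}
    for v in permutations(range(1, n + 1), k):
        nbrs = []
        for pos in range(k):
            for s in range(1, n + 1):
                if s not in v:  # replacing v[pos] by an unused symbol keeps a valid k-permutation
                    nbrs.append(v[:pos] + (s,) + v[pos + 1:])
        adj[v] = nbrs
    return adj


def otis_neighbors(adj, node):
    """Neighbours of node (g, p) in OTIS over the factor graph `adj`:
    electronic links inside group g, plus the optical transpose link (g, p) -> (p, g)."""
    g, p = node
    nbrs = [(g, q) for q in adj[p]]
    if g != p:
        nbrs.append((p, g))
    return nbrs
```

For A(4, 2), each group has 12 nodes of degree k(n - k) = 4, so the resulting OTIS-AN has 144 nodes. A node has a fifth (optical) link unless its group index and processor index coincide.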
ISBN (print): 9783642143892
We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem, on general-purpose multicore processors. In response to advances in hardware accelerators, we also modify the code in SBR to accelerate the computation by off-loading a significant part of the operations to a graphics processor (GPU). Performance results illustrate the parallelism and scalability of these algorithms on current high-performance multi-core architectures.
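As background for the routines being compared, the classical one-stage Householder reduction can be sketched as below. This is an unblocked, pure-Python illustration for a small real symmetric matrix. It is neither LAPACK's blocked routine nor SBR's two-stage dense-to-band-to-tridiagonal scheme, and `householder_tridiagonalize` is a hypothetical name.

```python
import math


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def householder_tridiagonalize(A):
    """Return T = Q^T A Q (tridiagonal) for real symmetric A, one reflector per column."""
    n = len(A)
    T = [row[:] for row in A]
    for k in range(n - 2):
        x = [T[i][k] for i in range(k + 1, n)]         # entries below the subdiagonal target
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue                                   # column already reduced
        alpha = -norm_x if x[0] >= 0 else norm_x       # sign choice avoids cancellation in v[0]
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm_v for vi in v]
        u = [0.0] * (k + 1) + v                        # reflector acts on rows/cols k+1..n-1
        H = [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
             for i in range(n)]
        T = matmul(matmul(H, T), H)                    # similarity transform: eigenvalues unchanged
    return T
```

Forming each reflector explicitly costs O(n^3) per column. Practical codes instead apply rank-2 updates and, in LAPACK's case, block them to make better use of the memory hierarchy.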
ISBN (print): 9783642131356
This paper describes a new parallel Branch-and-Bound algorithm for solving the classical permutation flow shop scheduling problem, as well as its implementation on a cluster of six computers. The experimental study of our distributed parallel algorithm gives promising results and clearly shows the benefit of the parallel paradigm for solving large-scale instances in moderate CPU time.
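The core of a branch-and-bound solver for the permutation flow shop can be sketched as follows. This is a sequential, depth-first illustration under stated assumptions:

- `p[j][k]` is the processing time of job `j` on machine `k`.
- The objective is makespan.
- The lower bound is a simple machine-based bound chosen for brevity, not the bound used in the paper.

A distributed version would hand independent subtrees (for example, the different choices of first job) to different computers while sharing the best makespan found so far. All names are hypothetical.

```python
def append_job(finish, j, p):
    """Per-machine completion times after appending job j:
    C[j][k] = max(C[prev][k], C[j][k-1]) + p[j][k]."""
    new = list(finish)
    for k in range(len(new)):
        ready = new[k - 1] if k > 0 else 0   # job j has just left machine k-1
        new[k] = max(new[k], ready) + p[j][k]
    return new


def makespan(seq, p):
    finish = [0] * len(p[0])
    for j in seq:
        finish = append_job(finish, j, p)
    return finish[-1]


def lower_bound(finish, remaining, p):
    """Machine k must still process every remaining job, and the last of them
    still needs at least the smallest remaining tail on machines k+1.."""
    lb = finish[-1]
    for k in range(len(finish)):
        work = sum(p[j][k] for j in remaining)
        tail = min(sum(p[j][k + 1:]) for j in remaining)
        lb = max(lb, finish[k] + work + tail)
    return lb


def branch_and_bound(p):
    """Depth-first B&B minimizing makespan; returns (best makespan, job order)."""
    n, m = len(p), len(p[0])
    best = [float("inf"), None]   # incumbent value and sequence

    def dfs(seq, finish, remaining):
        if not remaining:
            if finish[-1] < best[0]:
                best[0], best[1] = finish[-1], list(seq)
            return
        if lower_bound(finish, remaining, p) >= best[0]:
            return                 # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):
            seq.append(j)
            remaining.remove(j)
            dfs(seq, append_job(finish, j, p), remaining)
            remaining.add(j)
            seq.pop()

    dfs([], [0] * m, set(range(n)))
    return best[0], best[1]
```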
ISBN (print): 9783642131189
MrBayes, a popular program for Bayesian inference of phylogeny, has not been fast enough for biologists dealing with large real-world data sets. This paper presents a new parallel algorithm that combines the chain-partitioned parallel algorithm with the chain-parallel algorithm to obtain higher concurrency. We test the proposed hybrid algorithm against the two original algorithms on a heterogeneous cluster. The results show that the hybrid algorithm converts more CPU cores into higher speedup than the two control algorithms for all four real-world DNA data sets, and is therefore more practical.
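One way to picture the two levels of concurrency being combined is as a core-to-work layout. The sketch assumes, without confirmation from the abstract, that the two levels work as follows:

- One level distributes the Metropolis-coupled chains over groups of cores.
- The other splits a chain's per-site likelihood work among the cores of its group.

`hybrid_layout` is a hypothetical helper, and the even split of sites is a simplifying assumption.

```python
def hybrid_layout(num_cores, num_chains, num_sites):
    """Map each core to (chain, first_site, end_site_exclusive)."""
    if num_cores < num_chains or num_cores % num_chains:
        raise ValueError("this sketch assumes num_cores is a multiple of num_chains")
    per_chain = num_cores // num_chains          # cores cooperating on one chain
    layout = []
    for core in range(num_cores):
        chain, slot = divmod(core, per_chain)    # chain-level distribution
        lo = slot * num_sites // per_chain       # site-level split within the chain
        hi = (slot + 1) * num_sites // per_chain
        layout.append((chain, lo, hi))
    return layout
```

Consider 4 chains on 16 cores. A layout that gives each chain a single core leaves 12 cores idle. By contrast, `hybrid_layout(16, 4, num_sites)` gives every chain 4 cores, each handling about a quarter of the sites.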
ISBN (print): 9781450301787
Medical imaging provides physicians with the ability to generate 3D images of the human body in order to detect and diagnose a wide variety of ailments. Making medical imaging portable and more accessible poses a unique set of challenges. To increase portability, the power consumed in image acquisition, currently the most power-consuming activity in an imaging device, must be dramatically reduced. This can only be done, however, by using complex image reconstruction algorithms to correct artifacts introduced by low-power acquisition, which makes image processing the dominant power-consuming task. Current solutions use combinations of digital signal processors, general-purpose processors and, more recently, general-purpose graphics processing units for medical image processing. These solutions fall short for various reasons, including high power consumption and an inability to execute the next generation of image reconstruction algorithms. This paper presents the MEDICS architecture, a domain-specific multicore architecture designed specifically for medical imaging applications but with sufficient generality to make it programmable. The goal is to achieve 100 GFLOPs of performance while consuming orders of magnitude less power than existing solutions. MEDICS has a throughput of 128 GFLOPs while consuming as little as 1.6 W of power on advanced CT reconstruction applications. This represents up to a 20X increase in computation efficiency over current designs.
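The reported figures imply the energy efficiency below. This assumes that the 128 GFLOPs throughput and the 1.6 W power draw refer to the same operating point, which the abstract does not state explicitly. The second ratio reads the "up to 20X" figure literally, as a best case.

```latex
\frac{128\ \text{GFLOPS}}{1.6\ \text{W}} = 80\ \text{GFLOPS/W},
\qquad
\frac{80\ \text{GFLOPS/W}}{20} = 4\ \text{GFLOPS/W (implied best-case baseline)}.
```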
ISBN (print): 9783642116193
We explore three commodity parallel architectures: multi-core CPUs, the Cell BE processor, and graphics processing units. We have implemented four algorithms on these three architectures: solving the heat equation, inpainting using the heat equation, computing the Mandelbrot set, and MJPEG movie compression. We use these four algorithms to exemplify the benefits and drawbacks of each parallel architecture.
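Three of the four workloads are simple per-pixel kernels, which is why they map naturally onto all three architectures. The sketch gives plain sequential Python versions under assumed conventions:

- The heat-equation step is explicit (FTCS), keeps boundary values fixed, and uses a diffusion number `kappa` of at most 0.25 for stability in 2D.
- Heat-equation inpainting updates only masked (missing) pixels.
- The Mandelbrot test is the standard escape-time iteration.

Function names and parameters are hypothetical, and MJPEG compression is not covered.

```python
def heat_step(u, kappa=0.25):
    """One explicit (FTCS) step of the 2D heat equation; boundary values stay fixed.
    kappa = alpha*dt/h**2 must be at most 0.25 for stability."""
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            lap = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1] - 4.0 * u[i][j]
            new[i][j] = u[i][j] + kappa * lap
    return new


def inpaint_step(img, mask, kappa=0.25):
    """Heat-equation inpainting: diffuse only into interior pixels where mask is True."""
    rows, cols = len(img), len(img[0])
    new = [row[:] for row in img]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            if mask[i][j]:
                lap = (img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]
                       - 4.0 * img[i][j])
                new[i][j] = img[i][j] + kappa * lap
    return new


def mandelbrot_escape(c, max_iter=100):
    """Escape-time iteration z -> z*z + c; returns max_iter if c appears to be in the set."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter
```

Within one step, every output pixel depends only on values from the previous step. For the Mandelbrot set, each pixel depends only on its own coordinate. The loops can therefore be split across cores, SPEs, or GPU threads without synchronization inside a step.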
ISBN (print): 9783642143892
The aim of this paper is to show that a kind of boundary value problem for second-order ordinary differential equations, which reduces to solving a tridiagonal system of linear equations with almost Toeplitz structure, can be efficiently solved on modern multicore architectures. The approach is a parallel tiled algorithm based on the divide-and-conquer approach for solving linear recurrence systems with constant coefficients, combined with novel data formats for dense matrices.
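A concrete instance of such a system is easy to build. Discretizing -u''(x) = f(x) on [0, 1] with Dirichlet values u(0) = alpha and u(1) = beta, using central differences on n interior points, gives the Toeplitz tridiagonal matrix tridiag(-1, 2, -1). The boundary values are folded into the first and last right-hand-side entries. The sketch below, with hypothetical names, solves this system with the sequential Thomas algorithm as a baseline. It is not the paper's parallel tiled divide-and-conquer algorithm. Both of its sweeps are, however, first-order linear recurrences, which is the kind of computation a divide-and-conquer recurrence solver restructures.

```python
def thomas(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0] and c[-1] are ignored).
    No pivoting: assumes a well-behaved matrix, e.g. diagonally dominant or SPD."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination (a linear recurrence)
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution (another linear recurrence)
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_bvp(f, alpha, beta, n):
    """-u'' = f on [0, 1] with u(0) = alpha, u(1) = beta, using n interior grid points."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    d = [h * h * f(x) for x in xs]
    d[0] += alpha                              # boundary values move to the right-hand side
    d[-1] += beta
    u = thomas([-1.0] * n, [2.0] * n, [-1.0] * n, d)
    return xs, u
```

With f = 2 and zero boundary values, the exact solution is u(x) = x(1 - x). Central differences are exact for quadratics, so the discrete solution matches it at every grid point up to rounding.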
ISBN (print): 9783642131189
In this paper, MPI is used on a PC cluster to compute all the eigenvalues of Hermitian Toeplitz matrices. The parallel algorithms presented were implemented in C++ with MPI function calls and run on a cluster of Lenovo ThinkCentre machines running Red Hat Linux. Two methods are compared for performance: MAHT-P, which is embarrassingly parallel, and MPEAHT, which uses a master/slave scheme. For both parallel schemes, computation time is reduced and the speedup factor increases with the number of computers used. Load balancing becomes an issue as the number of computers in the cluster increases, and a solution is provided to overcome this.
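One common embarrassingly parallel strategy for finding all eigenvalues of a Hermitian matrix is spectrum slicing. It is sketched below as an assumption, not as necessarily what MAHT-P does. By Sylvester's law of inertia, the number of negative pivots in Gaussian elimination without pivoting of T - sigma*I equals the number of eigenvalues below sigma. Bisection on that count therefore locates the eigenvalues in any interval independently of every other interval. The names are hypothetical, and the sequential loop over slices stands in for one slice per MPI process.

```python
def hermitian_toeplitz(col):
    """Hermitian Toeplitz matrix from its first column (col[0] must be real)."""
    n = len(col)
    return [[complex(col[i - j]) if i >= j else complex(col[j - i]).conjugate()
             for j in range(n)] for i in range(n)]


def count_below(T, sigma):
    """Number of eigenvalues of Hermitian T strictly below sigma, from the inertia
    of T - sigma*I (Gaussian elimination without pivoting)."""
    n = len(T)
    M = [[T[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    negatives = 0
    for k in range(n):
        pivot = M[k][k].real      # pivots of a Hermitian matrix are real (up to rounding)
        if pivot == 0.0:
            pivot = 1e-300        # assumed tie-break for an exactly singular leading minor
        if pivot < 0.0:
            negatives += 1
        for i in range(k + 1, n):
            factor = M[i][k] / pivot
            for j in range(k + 1, n):
                M[i][j] -= factor * M[k][j]
    return negatives


def eigenvalues_in(T, lo, hi, tol=1e-10):
    """Eigenvalues in [lo, hi), located by bisection on count_below."""
    found = count_below(T, hi) - count_below(T, lo)
    if found == 0:
        return []
    if hi - lo < tol:
        return [0.5 * (lo + hi)] * found
    mid = 0.5 * (lo + hi)
    return eigenvalues_in(T, lo, mid, tol) + eigenvalues_in(T, mid, hi, tol)


def all_eigenvalues(T, slices=4):
    """Split a Gershgorin-based interval into independent slices (one per worker)."""
    r = max(sum(abs(x) for x in row) for row in T) + 1.0   # all eigenvalues lie in (-r, r)
    edges = [-r + 2.0 * r * s / slices for s in range(slices + 1)]
    result = []
    for s in range(slices):       # independent iterations: one slice per MPI rank
        result.extend(eigenvalues_in(T, edges[s], edges[s + 1]))
    return sorted(result)
```

Equal-width slices can contain very different numbers of eigenvalues, so the work per process varies. This is one concrete way the load-balancing issue mentioned above can arise.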