ISBN: 9781424426256 (print)
Erbium- and ytterbium-doped fiber lasers are becoming important sources for applications ranging from telecom to industrial processing. This work focuses on laser architectures for non-conventional telecommunication bands and on high-power pulsed sources for micromachining and material processing.
ISBN: 9783642246685 (print)
Proteins are among the most vital macromolecules at the cellular level. To understand the function of a protein, its structure needs to be determined. For this purpose, different computational approaches have been introduced. Genetic algorithms can be used to search the vast space of all possible conformations of a protein in order to find its native structure. A framework for designing such algorithms that is generic, easy to use, and fast on distributed systems may help further development of genetic-algorithm-based approaches. We propose such a framework, based on a parallel master-slave model and implemented in C++ with the Message Passing Interface (MPI). We evaluated its performance on distributed systems with different numbers of processors and achieved a speedup that grows linearly with the number of processing units.
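As a rough illustration of the master-slave model described above, the following sketch has a master hand fitness evaluations to workers and then perform selection itself. It is only a sketch under stated assumptions: the framework in the abstract is written in C++ with MPI, whereas this uses a Python thread pool as a stand-in for the slave processes, and `toy_fitness`, `master_evaluate`, and `select_best` are hypothetical names that do not come from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_fitness(conformation):
    """Hypothetical stand-in for an energy function: counts the 1-bits."""
    return sum(conformation)


def master_evaluate(population, fitness, n_workers=4):
    """Master hands each individual to a worker and gathers scores in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, population))


def select_best(population, scores, n_keep):
    """Master-side selection: keep the n_keep highest-scoring individuals."""
    ranked = sorted(zip(scores, population), key=lambda pair: pair[0], reverse=True)
    return [individual for _, individual in ranked[:n_keep]]
```

Only the fitness evaluation is distributed here; selection stays on the master, which is the usual division of labour in a master-slave genetic algorithm.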
ISBN: 9780889868649 (print)
This paper introduces invasive computing, a new paradigm for programming parallel architectures. The goals are to enable the development and execution of resource-aware programs that can dynamically allocate new resources in phases with more parallelism and free them again afterwards. To allocate more resources, applications use the invade operation; to free them, they use the retreat operation. The research is conducted within the framework of the Transregional Collaborative Research Centre 89, funded by the German Science Foundation.
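The invade and retreat operations can be pictured with a toy resource pool. This is a minimal sketch, assuming cores are the only resource and that a request may be granted only partially; the `ResourcePool` class and its fields are hypothetical and are not the paper's API.

```python
class ResourcePool:
    """Toy model of a machine whose cores can be claimed and released."""

    def __init__(self, n_cores):
        self.free = set(range(n_cores))

    def invade(self, wanted):
        """Claim up to `wanted` free cores; the claim may be smaller."""
        claim = set(sorted(self.free)[:wanted])
        self.free -= claim
        return claim

    def retreat(self, claim):
        """Return previously claimed cores to the pool."""
        self.free |= claim
```

A resource-aware program would call `invade` before a highly parallel phase, adapt its degree of parallelism to the size of the claim it actually received, and call `retreat` once the phase ends.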
Biological sequence comparison is an important tool for researchers in molecular biology. There are several algorithms for sequence comparison. The Smith-Waterman algorithm, based on dynamic programming, is one of the...
ISBN: 9781467395243 (print)
Graph algorithms on distributed-memory systems typically perform heavy communication, often limiting their scalability and performance. This work presents an approach that transparently (without programmer intervention) allows fine-grained graph algorithms to utilize algorithmic communication-reduction optimizations. In many graph algorithms, a vertex communicates the same information to all of its neighbors, a property we term algorithmic redundancy. Our approach exploits algorithmic redundancy to reduce communication between vertices located on different processing elements. We employ algorithm-aware coarsening of messages sent during vertex visitation, reducing both the number of messages and the absolute amount of communication in the system. To achieve this, the system structure is represented by a hierarchical graph, facilitating communication optimizations that can take the machine's memory hierarchy into consideration. We also present an optimization for small-world scale-free graphs wherein hub vertices (i.e., vertices of very large degree) are represented in a similar hierarchical manner, which is exploited to increase parallelism and reduce communication. Finally, we present a framework that transparently allows fine-grained graph algorithms to utilize our hierarchical approach without programmer intervention, while improving scalability and performance. Experimental results on 131,000+ cores show improvements of up to a factor of 8 over the non-hierarchical version for various graph mining and graph analytics algorithms.
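The effect of coarsening on redundant messages can be sketched in a few lines. This is an illustrative sketch only, assuming a flat (single-level) mapping from vertices to processing elements rather than the hierarchical graph the abstract describes; `owner`, `fine_grained_messages`, and `coarsened_messages` are hypothetical names.

```python
from collections import defaultdict


def fine_grained_messages(vertex, value, neighbors, owner):
    """Baseline: one message per neighbor on a remote processing element."""
    return [(owner[n], n, value) for n in neighbors if owner[n] != owner[vertex]]


def coarsened_messages(vertex, value, neighbors, owner):
    """One message per remote processing element: the value is sent once,
    together with the list of target vertices that live there."""
    targets = defaultdict(list)
    for n in neighbors:
        if owner[n] != owner[vertex]:
            targets[owner[n]].append(n)
    return [(pe, tuple(ns), value) for pe, ns in sorted(targets.items())]
```

With the value identical for every neighbor, the coarsened form sends it once per remote processing element instead of once per remote neighbor, which is where the reduction in message count and volume comes from.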
We present a generalized self-scattering method for generating carrier free-flight times in Monte Carlo simulation. Compared to traditional approaches, the added flexibility of this approach results in fewer fictitious scatterings, which is especially appealing for load balance and efficiency when a SIMD parallel computer is used. Speedups from 19% to 69% over an optimized variable-Gamma approach are shown for an implementation on the Connection Machine CM-2. The performance sensitivities to applied fields and grid spacings are also presented. Converting existing variable-Gamma software to this new approach requires only a few changes.
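For context, the basic self-scattering idea that the paper generalizes can be sketched as follows. This shows the conventional constant-Gamma scheme, not the generalized method of the abstract. It assumes, for simplicity, that the real scattering rate is given as a function of elapsed flight time and never exceeds `gamma`; the function name and parameters are hypothetical.

```python
import math
import random


def sample_free_flight(rate, gamma, rng):
    """Draw a free-flight time using a constant total rate `gamma`.

    Candidate events arrive at rate `gamma`; each is a real scattering with
    probability rate(t) / gamma and a fictitious (self-) scattering otherwise.
    Returns the flight time and the number of fictitious scatterings.
    """
    t = 0.0
    fictitious = 0
    while True:
        # Exponential waiting time; 1 - random() lies in (0, 1], so log is safe.
        t += -math.log(1.0 - rng.random()) / gamma
        if rng.random() * gamma < rate(t):
            return t, fictitious
        fictitious += 1
```

The closer `gamma` sits to the real rate, the fewer fictitious scatterings are generated, which is the quantity the abstract reports reducing.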
Digital FIR filters can be efficiently implemented using distributed arithmetic (DA). Original DA provides low throughput. Parallel DA is proven to be a promising technique for efficient DA implementation. Block-based...
ISBN: 0769521525 (print)
In many applications, matrix multiplication involves matrices of different shapes. The shape of the matrices can significantly impact the performance of a matrix multiplication algorithm. This paper describes extensions of the SRUMMA parallel matrix multiplication algorithm [1] that improve performance for transposed and rectangular matrices. Our approach relies on a set of hybrid algorithms, chosen based on the shape of the matrices and the transpose operator involved. The algorithm exploits the performance characteristics of clusters and shared-memory systems: it differs from other parallel matrix multiplication algorithms in its explicit use of shared memory and remote memory access (RMA) communication rather than message passing. Experimental results on clusters and shared-memory systems demonstrate consistent performance advantages over pdgemm from the ScaLAPACK parallel linear algebra package.
More and more computers use hybrid architectures combining multi-core processors and hardware accelerators such as graphics processing units (GPUs). In this paper, we present a new method for efficiently scheduling parallel applications on m CPUs and k GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard. Copyright (c) 2014 John Wiley & Sons, Ltd.
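To make the problem statement concrete, the sketch below applies a simple greedy heuristic: each task goes to whichever processor would finish it earliest. This is only a baseline for illustration, not the method proposed in the paper. It assumes independent tasks given as hypothetical `(cpu_time, gpu_time)` pairs and at least one CPU and one GPU.

```python
def greedy_schedule(tasks, m, k):
    """Assign each (cpu_time, gpu_time) task to the processor that finishes it first.

    Returns the makespan and the list of (kind, index) assignments.
    """
    cpu_free = [0.0] * m  # time at which each CPU becomes idle
    gpu_free = [0.0] * k  # time at which each GPU becomes idle
    assignment = []
    for cpu_time, gpu_time in tasks:
        ci = min(range(m), key=cpu_free.__getitem__)
        gi = min(range(k), key=gpu_free.__getitem__)
        if cpu_free[ci] + cpu_time <= gpu_free[gi] + gpu_time:
            cpu_free[ci] += cpu_time
            assignment.append(("cpu", ci))
        else:
            gpu_free[gi] += gpu_time
            assignment.append(("gpu", gi))
    return max(cpu_free + gpu_free), assignment
```

Because the problem is NP-hard, a greedy rule like this carries no optimality guarantee; it only shows what a schedule and its makespan look like.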
ISBN: 0769515126 (print)
We propose an improved version of the CGS method for solving large, sparse linear systems of equations with unsymmetric coefficient matrices. The proposed method combines elements of numerical stability and parallel algorithm design without increasing computational costs. The algorithm is derived such that all matrix-vector multiplications, inner products, and vector updates of a single iteration step are independent, and the communication time required for inner products can be overlapped efficiently with the computation time of vector updates. Therefore, the cost of global communication, which is the performance bottleneck, can be significantly reduced. In this paper, the Bulk Synchronous Parallel (BSP) model is used to design an efficient, scalable, and portable parallel version of the proposed algorithm and to provide accurate performance prediction for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by Ethernet. The performance model uses only a few system-dependent parameters, based on simple and accurate cost modelling, to provide useful insight into the time complexity of the method. The theoretical performance predictions are compared with some preliminary measured timing results of a numerical application from ocean flow simulation.
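As a point of reference, the standard (unimproved, sequential) CGS iteration can be sketched as below. This is the textbook conjugate gradient squared method without preconditioning, not the improved variant of the abstract; it uses plain Python lists so that it is self-contained, and all helper names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]


def cgs(A, b, tol=1e-10, max_iter=50):
    """Textbook CGS for A x = b, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for x = 0
    r_shadow = list(r)   # fixed shadow residual
    rho_prev, p, q = 1.0, None, None
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        rho = dot(r_shadow, r)
        if rho == 0.0:
            raise ArithmeticError("CGS breakdown: rho is zero")
        if it == 0:
            u = list(r)
            p = list(u)
        else:
            beta = rho / rho_prev
            u = axpy(beta, q, r)
            p = axpy(beta, axpy(beta, p, q), u)
        v = matvec(A, p)
        sigma = dot(r_shadow, v)
        if sigma == 0.0:
            raise ArithmeticError("CGS breakdown: sigma is zero")
        alpha = rho / sigma
        q = axpy(-alpha, v, u)
        uq = axpy(1.0, u, q)
        x = axpy(alpha, uq, x)
        r = axpy(-alpha, matvec(A, uq), r)
        rho_prev = rho
    return x
```

In this baseline form, each step runs two matrix-vector products and two inner products one after another. A variant with independent operations, as in the abstract, would let the communication for the inner products overlap with the vector updates.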