ISBN (print): 0818626720
The Symposium materials contain 118 papers on new developments in parallel processing. Algorithms, architectures, mapping/scheduling, applications, special-purpose architectures, interconnection networks, software, and distributed systems are among the main topics covered.
ISBN (print): 9780735412125
The computational algorithms for device synthesis and nondestructive evaluation (NDE) are often the same. In both, the goal is a particular field configuration: one that yields the design performance in synthesis, or one that matches exterior measurements in NDE. The geometry of the design, or of the postulated interior defect, is then computed. Several optimization methods are available for this. The most efficient, such as conjugate gradients, are very complex to program because of the derivative information they require; the least efficient, zeroth-order algorithms such as the genetic algorithm, take much computational time but little programming effort. This paper reports launching a genetic algorithm kernel on thousands of compute unified device architecture (CUDA) threads, exploiting the NVIDIA graphics processing unit (GPU) architecture. The efficiency of parallelization, although below that on shared-memory supercomputer architectures, is quite effective in cutting solution time down into the realm of the practicable. We carry this further into multi-physics electro-heat problems, where the parameters of description are in the electrical problem and the object function is in the thermal problem. Indeed, this is where the derivative of the object function in the heat problem with respect to the parameters in the electrical problem is most difficult to compute for gradient methods, and where the genetic algorithm is most easily implemented.
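The following sequential Python sketch illustrates why a zeroth-order method needs no derivative information: the genetic algorithm only ever evaluates the objective. Several choices here are assumptions for illustration and do not come from the paper's CUDA kernel. These include the operators (binary tournament selection, blend crossover, Gaussian mutation, single-individual elitism), the parameter defaults, and the name `genetic_minimize`. The sphere function is a hypothetical stand-in for an electro-heat object function. The fitness evaluations within one generation are independent of each other, and that is the step that maps naturally onto one GPU thread per individual.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 40,
    generations: int = 150,
    mutation_sigma: float = 0.3,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Zeroth-order minimization: only values of f are used, never derivatives."""
    rng = random.Random(seed)

    def clip(x: List[float]) -> List[float]:
        # Keep each parameter inside its (lo, hi) box constraint.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history: List[float] = []  # best objective value in each generation

    for _ in range(generations):
        # Independent fitness evaluations: the step a GPU version could run
        # with one thread per individual.
        scored = sorted(((f(x), x) for x in pop), key=lambda t: t[0])
        history.append(scored[0][0])

        def tournament() -> List[float]:
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] <= b[0] else b[1]

        children = [scored[0][1]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            w = rng.random()  # blend crossover weight
            child = [w * u + (1 - w) * v + rng.gauss(0.0, mutation_sigma)
                     for u, v in zip(p1, p2)]
            children.append(clip(child))
        pop = children

    best_f, best_x = min(((f(x), x) for x in pop), key=lambda t: t[0])
    history.append(best_f)
    return best_x, best_f, history


# Usage with a hypothetical stand-in objective (minimum 0 at the origin).
sphere = lambda x: sum(v * v for v in x)
best_x, best_f, history = genetic_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

The best individual is carried over unchanged, so the best objective value per generation never increases. A GPU version could replace the fitness loop with a batched evaluation and leave the rest of the loop as it is.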
ISBN (print): 9780769533520
A parallel design and implementation of the FM-index is presented in this paper. The FM-index is a self-contained, highly compressed indexing algorithm whose performance is crucial in applications. With the popularity of multi-core processors, parallel computing allows the FM-index to run faster by performing multiple computations simultaneously when possible. Our approach works by splitting input data into overlapping blocks of equal size and running them through the FM-index algorithm simultaneously on multiple processors. After analyzing and refactoring the sequential version, we organize the data flows of all operations according to a unified parallel framework. The experimental results show that, in general, our approach achieves a significant, sub-linear speedup on widespread symmetric multiprocessing architectures. This will greatly reduce the running time of operations on large data sets.
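Overlapping blocks can give an exact count if consecutive blocks overlap by `len(pattern) - 1` characters. With that overlap, every occurrence lies wholly inside exactly one block and starts in that block's non-overlapping part, so the per-block counts simply add up. The Python sketch below rests on several assumptions: a single non-empty pattern, a naive BWT construction with a `"\x00"` sentinel absent from the text, and a thread pool. `FMIndex`, `split_overlapping`, and `parallel_count` are hypothetical names. The abstract does not give the paper's actual block sizes, overlap, or set of operations. In CPython the thread pool only shows the structure; real speedup for this CPU-bound work would need a process pool or a native implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


class FMIndex:
    """Minimal FM-index: BWT plus C table, with naive Occ counting."""

    def __init__(self, text: str) -> None:
        s = text + "\x00"  # sentinel: assumed absent from text, sorts first
        sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
        self.bwt = "".join(s[i - 1] for i in sa)  # s[-1] is the sentinel
        self.C = {}  # C[ch] = number of characters in s smaller than ch
        total = 0
        for ch in sorted(set(s)):
            self.C[ch] = total
            total += s.count(ch)

    def _occ(self, ch: str, i: int) -> int:
        # Occurrences of ch in bwt[:i]; O(n) here, O(1) with sampled tables.
        return self.bwt.count(ch, 0, i)

    def count(self, pattern: str) -> int:
        # Backward search: shrink the suffix-array interval [lo, hi).
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self._occ(ch, lo)
            hi = self.C[ch] + self._occ(ch, hi)
            if lo >= hi:
                return 0
        return hi - lo


def split_overlapping(text: str, block_size: int, overlap: int) -> List[str]:
    # Blocks of block_size + overlap characters starting every block_size.
    return [text[start:start + block_size + overlap]
            for start in range(0, len(text), block_size)]


def parallel_count(text: str, pattern: str, block_size: int,
                   workers: int = 4) -> int:
    if not pattern:
        raise ValueError("pattern must be non-empty")
    blocks = split_overlapping(text, block_size, len(pattern) - 1)

    def work(block: str) -> int:
        # Each block gets its own independent FM-index.
        return FMIndex(block).count(pattern)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, blocks))
```

For example, `parallel_count("abracadabra", "abra", 3)` gives the same result as counting over the whole text.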
This work deals with the evaluation of hardware implementations of image processing algorithms for real-time applications, using SRAM-based Field Programmable Gate Arrays. We discuss a generic architectural model adapted ...
ISBN (print): 9783642400476
The solution of large-scale problems in Computational Science and Engineering relies on the availability of accurate, robust and efficient numerical algorithms and software that are able to exploit the power offered by modern computer architectures. Such algorithms and software provide building blocks for prototyping and developing novel applications, and for improving existing ones, by relieving developers of details concerning numerical methods as well as their implementation in new computing environments.
ISBN (print): 0769522262
In several digital signal processing algorithms, computational nodes are organized in consecutive stages, and data is reordered between these stages. Parallel computation of such algorithms with a reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, a systematic design methodology for stride permutation networks is derived. These permutations are represented with Boolean matrices, which are decomposed and mapped directly onto register-based networks. The resulting networks are regular and scalable, and they support any power-of-two stride. In addition, the networks reach the lower bound on the number of registers, indicating area efficiency. Since the proposed methodology is systematic, it can be exploited in automated design generation.
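Take N = 2^n elements and stride S = 2^s. One common convention for the stride permutation gathers x[0], x[S], x[2S] and so on, then x[1], x[1+S] and so on. Under that convention, element i moves to the position obtained by rotating the n bits of i right by s places. That rotation is a linear map over GF(2) on the index bits, so it can be written as an n×n Boolean matrix, which is the kind of representation the abstract refers to. The Python sketch below builds that matrix and checks it against the direct definition. The function names are hypothetical, and stride conventions vary between authors. The sketch does not attempt the paper's decomposition into register-based networks.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def stride_permutation(x: Sequence[T], S: int) -> List[T]:
    # Direct definition: read x with stride S, starting at offsets 0..S-1.
    return [v for k in range(S) for v in x[k::S]]


def rotation_matrix(n: int, s: int) -> List[List[int]]:
    # M[b][c] = 1 iff output index bit b is taken from input index bit c,
    # with c = (b + s) mod n, i.e. a right rotation of the bits by s.
    return [[1 if c == (b + s) % n else 0 for c in range(n)] for b in range(n)]


def apply_bit_matrix(M: List[List[int]], i: int) -> int:
    # Multiply the bit vector of i by M over GF(2) (AND for *, XOR for +).
    n = len(M)
    bits = [(i >> c) & 1 for c in range(n)]
    j = 0
    for b in range(n):
        bit = 0
        for c in range(n):
            bit ^= M[b][c] & bits[c]
        j |= bit << b
    return j


def permute_via_matrix(x: Sequence[T], S: int) -> List[T]:
    N = len(x)
    if N == 0 or N & (N - 1) or S <= 0 or S & (S - 1) or S > N:
        raise ValueError("N and S must be powers of two with S <= N")
    n, s = N.bit_length() - 1, S.bit_length() - 1
    M = rotation_matrix(n, s)
    y: List[T] = list(x)
    for i, v in enumerate(x):
        y[apply_bit_matrix(M, i)] = v  # element i moves to position M*i
    return y
```

For instance, `stride_permutation(list(range(8)), 2)` returns `[0, 2, 4, 6, 1, 3, 5, 7]`, and `permute_via_matrix` produces the same list from the 3×3 Boolean matrix alone.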
ISBN (print): 9781450301787
Active messages have proven to be an effective approach for certain communication problems in high performance computing. Many MPI implementations, as well as runtimes for Partitioned Global Address Space languages, use active messages in their low-level transport layers. However, most active message frameworks have low-level programming interfaces that require significant programming effort to use directly in applications and that also prevent optimization opportunities. In this paper we present AM++, a new user-level library for active messages based on generic programming techniques. Our library allows message handlers to be run in an explicit loop that can be optimized and vectorized by the compiler and that can also be executed in parallel on multicore architectures. Runtime optimizations, such as message combining and filtering, are also provided by the library, removing the need to implement that functionality at the application level. Evaluation of AM++ with distributed-memory graph algorithms shows the usability benefits provided by these library features, as well as their performance advantages.
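Message combining and filtering can be pictured with a small Python sketch. The sender buffers messages per destination and ships them as one batch once a threshold is reached. It can also drop a message identical to one already queued for the same destination, which is only safe when handlers are idempotent (for example, "visit vertex v" in a graph traversal). On the receiving side, the handler runs in a plain loop over the whole batch. `CoalescingSender`, `handle_batch`, and the `transport` callback are hypothetical names chosen for illustration; this is not AM++'s actual interface.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set


class CoalescingSender:
    """Hypothetical sketch: buffer messages per destination, send in batches."""

    def __init__(self, transport: Callable[[int, List[Hashable]], None],
                 batch_size: int, drop_duplicates: bool = False) -> None:
        self.transport = transport  # assumed: delivers one batch to one rank
        self.batch_size = batch_size
        self.drop_duplicates = drop_duplicates
        self.buffers: Dict[int, List[Hashable]] = defaultdict(list)
        self.pending: Dict[int, Set[Hashable]] = defaultdict(set)

    def send(self, dest: int, msg: Hashable) -> None:
        if self.drop_duplicates:
            if msg in self.pending[dest]:
                return  # filtering: identical message already queued
            self.pending[dest].add(msg)
        buf = self.buffers[dest]
        buf.append(msg)
        if len(buf) >= self.batch_size:
            self._flush_dest(dest)  # combining: one transfer per full batch

    def _flush_dest(self, dest: int) -> None:
        batch = self.buffers.pop(dest, [])
        self.pending.pop(dest, None)
        if batch:
            self.transport(dest, batch)

    def flush(self) -> None:
        # Send every partially filled buffer, e.g. at the end of a phase.
        for dest in list(self.buffers):
            self._flush_dest(dest)


def handle_batch(handler: Callable[[Hashable], None],
                 batch: List[Hashable]) -> None:
    # Receiver side: one explicit loop over the batch, a shape a compiler
    # can optimize more easily than one dispatch per message.
    for msg in batch:
        handler(msg)
```

For example, with `batch_size=2`, sending five messages to one destination produces two full batches during `send` and a final one-element batch on `flush()`.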
ISBN (print): 9780769543284
Within the fusion community, a large number of codes are in use to simulate various aspects of plasma behaviour. Many codes have been written by physicists with a strong emphasis on the physics, without using the latest technologies available in computer science. To improve this situation, a project with the acronym EUFORIA [1] was created. The main target of this project is to increase the performance of key existing codes, either by parallelizing sequential codes or by improving their parallelization. However, in general these codes were not designed for new supercomputers based on multicore architectures and take advantage only of task-level parallelism, basically using the Message Passing Interface (MPI). In this paper we discuss several possible ways to apply hybrid parallelization techniques in fusion codes to exploit the new multiprocessor supercomputers under development.
ISBN (print): 0769507166
In this paper we present an approach to determine scheduling functions suitable for the design of processor arrays. The considered scheduling functions support a subsequent LSGP partitioning of the processor array by allowing the tasks of the full-size array's processors that are mapped onto one processor of the partitioned array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem that generates the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms onto parallel architectures. In this case, the scheduling function produces identical, independent small threads that can be combined to utilize the target architecture as fully as possible.
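The sketch below makes causality and conflict constraints concrete in the textbook setting for uniform recurrences. It uses a linear schedule t(i) = λ·i, where causality requires λ·d ≥ 1 for every dependence vector d. Processor allocation is a projection along a vector u: two iterations that share a processor differ by a multiple of u, so they never coincide in time when λ·u ≠ 0. A brute-force search over small integer λ stands in for the integer linear program. The model ignores LSGP partitioning and the arbitrary intra-partition ordering that the paper addresses. All names and the rectangular index domain are assumptions for illustration.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Vec = Tuple[int, ...]


def dot(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x * y for x, y in zip(a, b))


def is_causal(lam: Vec, deps: Sequence[Vec]) -> bool:
    # Every consumer must run at least one step after its producer.
    return all(dot(lam, d) >= 1 for d in deps)


def conflict_free(lam: Vec, proj: Vec) -> bool:
    # Iterations projected onto the same processor must get distinct times.
    return dot(lam, proj) != 0


def latency(lam: Vec, bounds: Sequence[int]) -> int:
    # Schedule length over the box 0 <= i_k < bounds[k]: max - min of lam.i, plus 1.
    return sum(abs(l) * (n - 1) for l, n in zip(lam, bounds)) + 1


def find_schedule(deps: Sequence[Vec], proj: Vec, bounds: Sequence[int],
                  max_coeff: int = 3) -> Optional[Vec]:
    # Exhaustive search in place of an ILP: minimal-latency feasible lam.
    best: Optional[Vec] = None
    for lam in product(range(-max_coeff, max_coeff + 1), repeat=len(bounds)):
        if is_causal(lam, deps) and conflict_free(lam, proj):
            if best is None or latency(lam, bounds) < latency(best, bounds):
                best = lam
    return best
```

Take dependences (1, 0) and (0, 1) on a 4×5 domain, with projection along (1, 0). The search then returns λ = (1, 1), a wavefront schedule of 8 steps. Adding the dependence (1, -1) forces λ = (2, 1).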