ISBN (digital): 9781728165820
ISBN (print): 9781728165837
This work aims at distilling a systematic methodology to modernize existing sequential scientific codes with a limited re-designing effort, turning an old codebase into modern code, i.e., parallel and robust code. We propose an automatable methodology to parallelize scientific applications designed with a purely sequential programming mindset, thus possibly using global variables, aliasing, random number generators, and stateful functions. We demonstrate the methodology by way of an astrophysical application, where we model at the same time the kinematic profiles of 30 disk galaxies with a Monte Carlo Markov Chain (MCMC), which is sequential by definition. The parallel code exhibits a 12 times speedup on a 48-core platform.
ISBN (print): 9781424430116
This paper introduces a Grid software architecture offering fault tolerance, dynamic and aggressive load balancing, and two complementary parallel programming paradigms. Experiments with financial applications on a real multi-site Grid assess this solution. The architecture has been designed to run industrial and financial applications, which are frequently time-constrained and CPU-intensive, feature both tightly and loosely coupled parallelism requiring a generic programming paradigm, and adopt a client-server business architecture.
The overwhelming wealth of parallelism exposed by extreme-scale computing is rekindling interest in fine-grain multithreading, particularly at the intra-node level. Indeed, popular parallel programming models, such as OpenMP, are integrating fine-grain tasking into their newest standards. Yet classical coarse-grain constructs are still largely preferred, as they are considered a simpler way to express parallelism. In this paper, we present a multigrain parallel programming environment that allows programmers to use these well-known coarse-grain constructs to generate a fine-grain multithreaded application to be run on top of a fine-grain event-driven program execution model. Experimental results with four scientific benchmarks (Graph500, NAS Data Cube, NWChem-SCF, and ExMatEx's CoMD) show that fine-grain applications generated by and run on our environment are competitive with and can even outperform their OpenMP counterparts, especially for data-intensive workloads with irregular and dynamic parallelism, reaching speedups as high as 2.6x for Graph500 and 50x for NAS Data Cube.
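The coarse-to-fine decomposition the abstract describes can be illustrated in miniature: one logical worksharing loop is split into many small chunk tasks handed to a shared scheduler. This sketch uses Python threads as a stand-in for the paper's event-driven execution model; the function and parameter names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum_squares(n, grain=1000):
    """One coarse-grain worksharing loop, decomposed into many
    fine-grain chunk tasks scheduled by a shared pool."""
    chunks = [(lo, min(lo + grain, n)) for lo in range(0, n, grain)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Each chunk is an independent fine-grain task.
        partials = pool.map(lambda c: sum(i * i for i in range(c[0], c[1])),
                            chunks)
        return sum(partials)

total = parallel_sum_squares(10_000)
print(total == sum(i * i for i in range(10_000)))  # True
```

The `grain` parameter is the knob the multigrain idea turns: a large grain recovers the familiar coarse-grain chunking, while a small grain exposes many more schedulable tasks.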
Run-time systems are critical to the implementation of concurrent object-oriented programming languages. The paper describes a concurrent object-oriented programming language, Balinda C++, running on a distributed memory system, and its run-time implementation. The run-time system is built on top of the Nexus communication library. The tuplespace is the key abstraction in Balinda C++. A distributed tuplespace model is presented to improve data locality, and experiments have been conducted to verify it. The results indicate that our model is effective at improving system performance.
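For readers unfamiliar with the tuplespace model, the following is a minimal Linda-style sketch in Python, not Balinda C++'s actual API: `out` deposits a tuple, and `take` removes a tuple matching a pattern (with `None` as a wildcard), blocking until one is available.

```python
import threading

class TupleSpace:
    """Minimal Linda-style tuplespace (an illustrative sketch)."""
    def __init__(self):
        self._tuples = []
        self._cv = threading.Condition()

    def out(self, tup):
        """Deposit a tuple and wake any blocked consumers."""
        with self._cv:
            self._tuples.append(tup)
            self._cv.notify_all()

    def _match(self, pattern, tup):
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def take(self, pattern):
        """Remove and return a matching tuple, blocking until one exists."""
        with self._cv:
            while True:
                for tup in self._tuples:
                    if self._match(pattern, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cv.wait()

space = TupleSpace()
space.out(("result", 1, 42))
space.out(("result", 2, 99))
found = space.take(("result", 2, None))   # wildcard on the payload field
print(found)  # ('result', 2, 99)
```

A distributed implementation, as in the paper, partitions the tuple store across nodes so that matching can happen close to where tuples are produced.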
The use of continuations in the BBN Butterfly Lisp multiprocessor is examined. In Butterfly Lisp, continuations are first-class objects, much like vectors or numbers, and are used to implement its parallel computing extensions. Continuations can also be used as a way of modeling parallelism in general. While they may be encoded differently, certain information remains invariant even as the model of parallelism being used changes. Following the behaviour of continuations can yield valuable insight into the parallel structure of a program.
To gain the performance benefits of threaded MPI while still supporting the MPI standard well, this paper proposes a thread-based MPI program accelerator (MPIActor). MPIActor is a transparent middleware that assists general MPI libraries: it can be freely adopted or dropped at compile time for any MPI program (currently only C code is supported). With MPIActor, the MPI processes on each node are mapped to threads of a single process, and intra-node point-to-point and collective communication are accelerated by thread-based mechanisms. We have implemented the point-to-point communication module of our design and evaluated it on a real platform. Compared with MVAPICH2, experimental results on the OSU PINGPONG benchmark show a significant performance improvement, from 114% to 321%, for messages between 4 KB and 2 MB in size.
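The core idea, mapping ranks to threads so that an intra-node "send" becomes a hand-off inside one address space rather than a copy through the OS, can be sketched as follows. This is an illustration of the concept only, not MPIActor's implementation; the `send`/`recv` names and mailbox layout are invented for the example.

```python
import queue
import threading

# One mailbox per "rank"; all ranks share the process address space.
mailboxes = [queue.Queue(), queue.Queue()]

def send(dst, payload):
    # In a shared address space this enqueues a reference: no
    # serialization and no kernel-mediated copy of the buffer.
    mailboxes[dst].put(payload)

def recv(rank):
    return mailboxes[rank].get()       # blocks until a message arrives

def rank0():
    big = bytearray(4 * 1024 * 1024)   # a 4 MB message
    send(1, big)

received = {}
def rank1():
    msg = recv(1)
    received["nbytes"] = len(msg)

t0 = threading.Thread(target=rank0)
t1 = threading.Thread(target=rank1)
t0.start(); t1.start()
t0.join(); t1.join()
print(received["nbytes"])  # 4194304
```

A process-based MPI library would instead move those 4 MB through shared-memory segments or the loopback stack, which is where the reported intra-node gains come from.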
Distributed shared objects are a well known approach to achieve independence of the memory model for parallel programming. The illusion of shared (global) objects is a convenient abstraction which leads to ease of programming on both kinds of parallel architectures, shared memory and distributed memory machines. We present several different implementation variants for distributed shared objects on distributed platforms. We have considered these variants while implementing a high level parallel programming model known as coordinators (J. Knopp, 1996). These are global objects coordinating accesses to the encapsulated data according to statically defined access patterns. Coordinators have been implemented on both shared memory multiprocessors and networks of workstations (NOWs). We describe their implementation as distributed shared objects and give basic performance results on a NOW.
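The coordinator concept, a global object that encapsulates data and serializes all accesses according to a defined pattern, can be sketched in a few lines. This is a shared-memory toy illustrating the access discipline, not the paper's distributed implementation; the class and method names are illustrative.

```python
import threading

class Coordinator:
    """A shared object that encapsulates its data and funnels every
    access through one serialized read-modify-write pattern."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def update(self, fn):
        with self._lock:               # all accesses are coordinated here
            self._value = fn(self._value)

    def read(self):
        with self._lock:
            return self._value

shared = Coordinator()

def worker():
    for _ in range(1000):
        shared.update(lambda v: v + 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared.read())  # 4000: no updates lost despite concurrent access
```

On a network of workstations, the same interface would be backed by messages to the node holding the object rather than by a local lock, which is exactly the implementation choice the paper's variants explore.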
ISBN (print): 9781479953424
Fast Fourier Transform (FFT) is an important part of many applications, such as wireless communication based on OFDM (Orthogonal Frequency Division Multiplexing). With Cloud Radio Access Networks, implementing FFTs on multiprocessor clusters is a challenging task. For instance, supporting the Long Term Evolution (LTE) protocol requires processing 100 independent FFTs (with sizes ranging from 128 to 2048 points) in 66.7 μs. In this work, seven native FFT candidate implementations are compared. The considered implementation environments are: OpenMP (Open Multi-Processing) on 1 core; MPI (Message Passing Interface) on 1, 2, and 3 cores; hybrid OpenMP+MPI on 1 and 3 cores; and MPI on a heterogeneous platform composed of a Xeon Phi and 3 cores. The reported experimental results show that the latter method meets the latency requirements of LTE. It is also shown that the OpenMP and MPI paradigms running only on MICs (Many Integrated Cores) cannot fully exploit the computing capability of many-core architectures; the heterogeneous combination of Xeon+MIC provides better performance.
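To make the workload concrete, here is a plain radix-2 Cooley-Tukey FFT applied to 100 independent frames, mirroring the "100 independent FFTs" structure of the LTE requirement. This is a didactic sketch: a real LTE pipeline would use a tuned native library (e.g. a vendor FFT), and the frame contents here are arbitrary toy data.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

# 100 independent transforms, as in one LTE processing deadline
# (toy 128-point frames; LTE sizes range from 128 to 2048 points).
frames = [[complex(i % 7, 0) for i in range(128)] for _ in range(100)]
spectra = [fft(f) for f in frames]
print(len(spectra))  # 100; bin 0 of each spectrum is the sum of its inputs
```

Because the 100 transforms are independent, they map naturally onto OpenMP threads, MPI ranks, or Xeon Phi cores, which is precisely the design space the paper's seven candidates cover.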
Two-terminal reliability is an important parameter in the design of reliable networks. The two-terminal reliability analysis problem for general stochastic networks is in the #P-complete class of computationally difficult problems, so approximate methods are useful for quick reliability analysis. Two general algorithms for approximating the two-terminal reliability of hypercubes, based on shortest paths between a source and a terminal, are presented. Emphasis is placed on exploiting the regular structure of hypercubes to reduce the complexity of the computation.
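The flavor of a shortest-path approximation can be shown with a standard bound (this is a generic illustration, not the paper's two algorithms): between nodes at Hamming distance d in the n-cube there exist d disjoint shortest paths of length d, so with independent link reliability p the two-terminal reliability is at least 1 - (1 - p^d)^d.

```python
def hypercube_two_terminal_lb(n, s, t, p):
    """Disjoint-shortest-path lower bound on two-terminal reliability
    in the n-cube with i.i.d. link reliability p (nodes are bit labels).
    Exploits the cube's regularity: d = Hamming distance, and d disjoint
    shortest paths of length d connect s and t."""
    assert 0 <= s < 2 ** n and 0 <= t < 2 ** n
    d = bin(s ^ t).count("1")          # Hamming distance in the n-cube
    if d == 0:
        return 1.0                     # source and terminal coincide
    return 1.0 - (1.0 - p ** d) ** d

# Antipodal corners of the 4-cube with 90%-reliable links
print(round(hypercube_two_terminal_lb(4, 0b0000, 0b1111, 0.9), 4))
```

The regular structure does all the work here: no path enumeration over an explicit graph is needed, only the Hamming distance, which is what makes hypercube-specific approximations cheap.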
ISBN (print): 9781665416351
This research presents some of the critical information required to understand the concept of parallel programming and the implementation of OpenMP in parallel programming. Parallelism is the preferred tool for expediting an algorithm, as demonstrated by the evolution of computing architectures (multi-core and many-core) towards greater numbers of processing cores. The report focuses on OpenMP parallel programming models and further examines their implementation and features. The OpenMP parallel programming model is increasingly preferred for its ability to deliver real-time processing, thereby meeting performance requirements. Furthermore, we study the use of OpenMP to enhance the efficiency of three-dimensional discontinuous deformation analysis (3D-DDA) for large simulations using parallel block Jacobi (BJ) and preconditioned conjugate gradient (PCG) algorithms. The absence of data synchronization in parallel programming makes a system more prone to programming errors, since the parallel environment is more complicated than it appears; the studies performed highlight how synchronization is managed with the OpenMP model. In the field of biometrics, the most important issue faced in DNA sequencing and pattern discovery is locating the longest common subsequence (LCS) among sequences. To identify the LCS of DNA sequences, we examine CPU-based solutions using OpenMP tools, which offer major improvements in processing speed, cost, and ubiquity, and discuss the results of the analysis.
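The LCS computation mentioned above is a classic dynamic program. A sequential sketch of the recurrence is shown below (not the paper's OpenMP code); the parallelization opportunity an OpenMP version exploits is that all cells on the same anti-diagonal of the table are independent.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b.
    dp[i][j] = LCS length of a[:i] and b[:j]. Cells on one anti-diagonal
    (constant i + j) depend only on earlier diagonals, so an OpenMP
    variant can compute each diagonal's cells in parallel."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 (a common subsequence is "GTAB")
```

For DNA-scale inputs the O(mn) table dominates memory and time, which is why the diagonal-wise parallel formulation matters in practice.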