MILLIPEDE is a project aimed at developing a distributed shared memory environment for parallel programming. A major goal of this project is to support easy-to-grasp parallel programming languages that will also make it straightforward to parallelize existing code. Other targets are forward compatibility and availability of both the user programs (hence the shared memory support and the C-like parallel language PARC) and the system itself (which is thus implemented at user level, using services exported by the operating system). Locality of memory references, which implies efficiency and speedups, is maintained by MILLIPEDE using page and thread migration, through which dynamic load balancing and weak memory are implemented. (C) 1997 by John Wiley & Sons, Ltd.
We present methods that can dramatically improve numerical consistency for parallel calculations across varying numbers of processors. By calculating global sums with enhanced precision techniques based on Kahan or Knuth summations, the consistency of the numerical results can be greatly improved with minimal memory and computational cost. This study assesses the value of the enhanced numerical consistency in the context of general finite difference or finite volume calculations. (C) 2011 Elsevier B.V. All rights reserved.
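As a rough illustration of the compensated-summation technique the authors build on, the sketch below shows the textbook Kahan algorithm in C. It is not the paper's parallel global-sum implementation, and the function name `kahan_sum` is only illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Textbook Kahan (compensated) summation: a running compensation term
 * captures the low-order bits lost at each addition, so the result is far
 * less sensitive to the order in which the terms are accumulated. */
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;                  /* running compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;         /* fold the correction into the incoming term   */
        double t = sum + y;          /* low-order bits of y may be lost here         */
        c = (t - sum) - y;           /* recover what was lost, to apply next time    */
        sum = t;
    }
    return sum;
}

int main(void)
{
    size_t n = 10000000;
    double *data = malloc(n * sizeof *data);
    for (size_t i = 0; i < n; i++) data[i] = 0.1;

    double naive = 0.0;
    for (size_t i = 0; i < n; i++) naive += data[i];

    /* The compensated result typically matches the exact sum (1e6, up to the
     * representation error of 0.1) to machine precision, while the naive loop
     * drifts in the low-order digits. */
    printf("naive: %.10f\nkahan: %.10f\n", naive, kahan_sum(data, n));
    free(data);
    return 0;
}
```

Because the compensation term absorbs the bits lost at each addition, the result becomes far less dependent on the order in which partial sums are combined, which is exactly what changes when the processor count varies.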
Concurrent programming is very often used to program massively parallel algorithms. Usually, an imperative programming language is used with a message-passing communication library such as the Message Passing Interface (MPI) or Parallel Virtual Machine (PVM). This approach is very general, since it allows any parallel algorithm to be defined, including the details of its communication protocols. Nevertheless, this freedom does not come for free, and the development of such programs is difficult because they may contain indeterminism and deadlocks. This is confirmed by the high complexity of the related validation problems. Since the semantics of a concurrent program is in general very complex, the time required to run it (related to its operational semantics) is also difficult to determine, which hinders the portability of performance.
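As a minimal sketch of the message-passing style referred to above, the fragment below uses two standard MPI point-to-point calls; it is a generic example, not code from any of the cited work.

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI point-to-point exchange: rank 0 sends an integer to rank 1.
 * Run with two processes, e.g.  mpirun -np 2 ./a.out  */
int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* dest = 1, tag = 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                           /* src = 0, tag = 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Even in this tiny program, a mismatched tag or a missing receive leaves rank 1 blocked forever, which illustrates why deadlock and indeterminism make the validation of such programs hard.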
This paper describes the definition and implementation of an OpenMP-like set of directives and library routines for shared memory parallel programming in Java. A specification of the directives and routines is proposed and discussed. A prototype implementation, consisting of a compiler and a runtime library, both written entirely in Java, is presented, which implements most of the proposed specification. Some preliminary performance results are reported. Copyright (C) 2001 John Wiley & Sons, Ltd.
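The abstract does not show the proposed Java syntax, so as a point of reference the sketch below shows the corresponding construct in standard C OpenMP, the model the Java directives emulate; the Java proposal expresses the same parallel-loop-plus-reduction idea through directives and runtime routines.

```c
#include <stdio.h>
#include <omp.h>

/* The OpenMP model the Java directives mimic, shown in standard C: a single
 * directive turns a sequential loop into a parallel one, with the runtime
 * dividing iterations among threads and combining the partial sums. */
int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)   /* fork threads, split the loop, reduce sum */
    for (int i = 0; i < n; i++) {
        sum += 1.0 / (i + 1);
    }

    printf("harmonic(%d) = %f (max threads: %d)\n", n, sum, omp_get_max_threads());
    return 0;
}
```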
The graphics processing unit (GPU) is an ideal solution to problems involving parallel data computations. A serial CPU-based program for the dynamic analysis of multi-body systems is rebuilt as a parallel program that exploits the GPU's advantages. We developed an analysis code named GMAP to investigate how the dynamic analysis algorithm for multi-body systems can be implemented with GPU parallel programming. The numerical accuracy of GMAP is compared with the commercial program MSC/ADAMS, and its numerical efficiency is compared with the sequential CPU-based program. Multiple pendulums with bodies and joints and a net-shaped system with bodies and spring-dampers are employed for the computer simulations. The simulation results indicate that the accuracy of GMAP's solution matches that of ADAMS. For the net-type system with 2370 spring-dampers, GMAP reduces the computation time by about 566.7 seconds (a 24.7% improvement). The larger the system, the better the time efficiency.
Application of parallel programming methods for simulating the impact of polymer dispersed systems on oil reservoirs on a hybrid computer system that uses the central processor cores along with the graphics processing unit is discussed. The efficiency of the proposed approach for solving practical problems of simulating waterflooding of oil reservoirs using polymer dispersed systems on computers with hybrid architecture is demonstrated.
Experimental results show that parallel programs can be evolved more easily than sequential programs in genetic parallel programming (GPP). GPP is a novel genetic programming paradigm which evolves parallel program solutions. With the rapid development of lookup-table-based (LUT-based) field programmable gate arrays (FPGAs), traditional circuit design and optimization techniques cannot fully exploit the LUTs in LUT-based FPGAs. Based on the GPP paradigm, we have developed a combinational logic circuit learning system, called the GPP logic circuit synthesizer (GPPLCS), in which a multi-logic-unit processor is used to evaluate LUT circuits. To show the effectiveness of the GPPLCS, we have performed a series of experiments to evolve combinational logic circuits with two- and four-input LUTs. In this paper, we present eleven multi-output Boolean problems and their evolved circuits. The results show that the GPPLCS can evolve more compact four-input LUT circuits than the well-known LUT-based FPGA synthesis algorithms.
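As a quick illustration of the primitive these circuits are built from (not of the GPPLCS genome encoding, which the abstract does not describe), a four-input LUT is simply a 16-entry truth table indexed by the input bits:

```c
#include <stdio.h>
#include <stdint.h>

/* A 4-input LUT is a 16-bit truth table: the four input bits form an index
 * and the addressed bit of the table is the output. */
static int lut4(uint16_t table, int a, int b, int c, int d)
{
    int index = (a << 3) | (b << 2) | (c << 1) | d;   /* 0..15 */
    return (table >> index) & 1;
}

int main(void)
{
    /* Example truth table: 4-input parity (output 1 when an odd number of
     * inputs are 1); bit i of the table is the parity of i. */
    uint16_t parity = 0x6996;

    printf("parity(1,0,1,1) = %d\n", lut4(parity, 1, 0, 1, 1));  /* 3 ones -> 1 */
    printf("parity(1,1,0,0) = %d\n", lut4(parity, 1, 1, 0, 0));  /* 2 ones -> 0 */
    return 0;
}
```

Synthesis for LUT-based FPGAs amounts to choosing the truth tables and the wiring between such primitives, which is the search space the evolutionary system explores.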
The impact of the parallel programming model on scientific computing is examined. A comparison is made between SISAL, a functional language with implicit parallelism, and SR, an imperative language with explicit parallelism. Both languages are modern, high-level, concurrent programming languages. Five different scientific applications were programmed in each language, and evaluated for programmability and performance. The performance of these two concurrent languages on a shared-memory multiprocessor is compared to each other and to programs written in C with parallelism provided by library calls. (C) 1996 Academic Press, Inc.
Contribution This study reveals that the programming paradigm is relevant to obtaining advanced programming skills. Background Parallel computing has become mandatory for computer science students. The increasing amount of computational resources required by emerging applications demands experienced programmers who fully exploit hardware resources. However, the hardware platforms and the programming languages that leverage them evolve at a dizzying pace, making it very challenging for students to successfully learn the continuously changing high-performance computing concepts. Research Questions (a) Is the learning curve of the programming language too steep for beginning to learn parallel programming fundamentals? (b) Are emergent learning methodologies making it even more difficult to learn parallel programming in general? Methodology We analyze the main challenges for succeeding in parallel programming courses at the undergraduate level in two different learning modalities, namely on-campus and online, and the main tools available within a learning management system, showing their impact on online studies. Findings Our results reveal that the steep learning curve of parallel programming is one of the main barriers to student success, leading to early dropout from the subject. On-campus studies mitigate this problem through a close relationship between students and educators; online studies, by definition, lack this tight relationship.
Transactional memory in multicore processors has been a major research area over the past several years. Many transactional memory systems have been proposed to solve the synchronization problem of multicore processors. Hardware transactional memory is one of the critical methods for speeding up communication in multicore environments. In this paper, we give a review of current hardware transactional memory systems for multicore processors. We take a top-down approach to characterizing and classifying the various hardware transactional design issues and present a taxonomy of hardware transactional memory systems built around five fundamental design issues: version management, conflict detection, contention management, virtualization, and nesting. Finally, we discuss an active research challenge: the relationship between transactional memory and input/output operations and system calls. Crown Copyright (C) 2010 Published by Elsevier B.V. All rights reserved.
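As a sketch of the programming model that transactional memory exposes, the fragment below uses GCC's experimental software TM extension (compile with `-fgnu-tm`); it is not taken from any of the surveyed hardware systems, which provide the same atomic-block abstraction in the memory hierarchy with the versioning and conflict-detection issues listed above handled in hardware.

```c
#include <stdio.h>

/* A transactional critical section: the block executes atomically with
 * respect to other transactions, without naming a lock.  Conflicting
 * transactions are aborted and re-executed by the TM runtime. */
static long balance_a = 100, balance_b = 0;

void transfer(long amount)
{
    __transaction_atomic {          /* reads/writes inside appear atomic to other threads */
        balance_a -= amount;
        balance_b += amount;
    }
}

int main(void)
{
    transfer(25);
    printf("a=%ld b=%ld\n", balance_a, balance_b);
    return 0;
}
```

The I/O challenge mentioned above is visible even in this sketch: a `printf` placed inside the atomic block could not simply be rolled back if the transaction aborted.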