ISBN: (Print) 9781467376853
Despite the fact that we are firmly in the multicore era, the use of parallel programming is not as widespread as it could be, either in the software industry or in education. There have been many calls to incorporate more parallel programming content into undergraduate computer science education. One obstacle to doing so is that the languages most commonly used for parallel programming are detailed, low-level languages such as C, C++, and Fortran (with OpenMP or MPI), OpenCL, and CUDA. These languages allow programmers to write very efficient code, but that is less important for those whose goal is to learn the concepts of parallel computing. This paper introduces a parallel programming language called Tetra, which provides parallel constructs as first-class language features, offers garbage collection, and is designed to be as simple as possible. Tetra also includes an integrated development environment specifically geared toward debugging parallel programs and visualizing program execution across multiple threads.
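
To make the abstract's contrast concrete, below is a minimal sketch of the kind of detailed, low-level code it refers to: a parallel array sum written in C with OpenMP. This is an illustration of the conventional approach only; the abstract does not show Tetra's own syntax.

    /* Parallel array sum in C with OpenMP (compile with: gcc -fopenmp sum.c).
       Illustrates the low-level, pragma-annotated style the abstract
       contrasts with Tetra's first-class parallel constructs. */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++) a[i] = 1.0;

        #pragma omp parallel for reduction(+:sum)  /* each thread sums a chunk */
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }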
Parallel task-based programming models like OpenMP support the declaration of task data dependences. This information is used to delay a task's execution until its data is available. The dependences between tasks are calculated at runtime using shared graphs that are updated concurrently by all threads. However, only one thread can modify the task graph at a time to ensure correctness; the others must wait before making their modifications. This waiting limits the application's parallelism and becomes critical in many-core systems. This paper characterizes this behavior, analyzing how it hinders performance, and presents an alternative organization suitable for the runtimes of task-based programming models. This organization allows the runtime structures to be managed asynchronously or synchronously, adapting the runtime to reduce wasted computation resources and increase performance. Results show that the new runtime structure outperforms the peak speedup of the original runtime model when contention is high, and achieves similar or better performance on real applications.
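
A minimal C/OpenMP sketch of the task data dependences the abstract describes. Each task construct below forces the runtime to record the declared in/out accesses in its shared dependence graph; that per-creation graph update is the serialization point the paper analyzes.

    /* OpenMP task dependences: task3 may not start until task1 and
       task2 have completed, because it reads what they write. */
    #include <stdio.h>

    int main(void) {
        int x = 0, y = 0;
        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: x)   /* task1: produces x */
            x = 42;
            #pragma omp task depend(out: y)   /* task2: produces y */
            y = 8;
            #pragma omp task depend(in: x, y) /* task3: consumes both */
            printf("x + y = %d\n", x + y);
        }
        return 0;
    }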
This paper proposes a virtualized self-adaptive parallel programming framework for heterogeneous high-productivity computers (VAPPF), composed of a virtualization-based runtime system (VRTS) and a virtualized adaptive parallel programming model (VAPPM). The virtualization-based runtime system consists of a node-level virtual machine monitor (NVMM) and a system-level virtual infrastructure (SVI). The VAPPM programming model is not only compatible with conventional data parallelism but also supports task parallelism. Moreover, using the concepts of domains and virtualized process locales, the runtime system can map computation onto processors according to a system-level resource view and a performance model. By concealing hardware details through virtualization at both the runtime-system and programming-model levels, the framework gives programmers a middle-level view independent of hardware details. Programmers write and debug against this middle-level view, and the runtime system then maps it onto the specific hardware environment. In this way, programming is largely decoupled from specific hardware architectures; the model realizes an efficient division of work between programmers and the system, and helps improve the system's programmability, scalability, portability, robustness, performance, and productivity.
This paper presents CHAOS-MCAPI (Communication Header and Operating Support-Multicore Communication API), an IPC mechanism targeting parallel programming based on message passing on multicore platforms. The proposed mechanism is built on top of the D-Bus protocol for message transmission, which allows a higher level of abstraction and control compared to lower-level mechanisms such as UNIX pipes. Optimizations adopted in the implementation of CHAOS-MCAPI resulted in significant performance gains over the original D-Bus implementation, which should be further improved by the adoption of KDBus, a 'zero-copy' mechanism recently made available natively in the Linux kernel. That should make CHAOS-MCAPI a viable alternative for the design and implementation of parallel programs targeting multicore platforms, both in terms of scalability and programmer productivity.
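
The abstract does not show CHAOS-MCAPI's own API, so the sketch below instead illustrates the lower-level baseline it mentions: message passing between two processes over a UNIX pipe in C.

    /* Parent-to-child message over a UNIX pipe: the kind of low-level
       IPC mechanism the abstract compares CHAOS-MCAPI against. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        if (pipe(fd) == -1) { perror("pipe"); return 1; }

        if (fork() == 0) {            /* child: the receiver */
            char buf[64];
            close(fd[1]);
            ssize_t n = read(fd[0], buf, sizeof buf - 1);
            if (n > 0) { buf[n] = '\0'; printf("child received: %s\n", buf); }
            close(fd[0]);
            return 0;
        }
        close(fd[0]);                 /* parent: the sender */
        const char *msg = "hello";
        write(fd[1], msg, strlen(msg) + 1);
        close(fd[1]);
        wait(NULL);
        return 0;
    }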
The Pact (parallel actions) parallel programming environment provides an easy-to-use parallel execution and synchronization model based on task parallelization. To give the programmer an abstraction for global data (even on distributed-memory machines), the Pact runtime system uses virtual shared memory. Execution efficiency is improved by data-dependent dynamic load balancing and by latency masking through multithreaded servers. Fault tolerance in Pact is based on atomic actions and is guaranteed by the runtime system in a fully user-transparent way. This article describes the design of the Pact runtime system together with its logging and recovery algorithms for an implementation on a massively parallel distributed-memory computer.
The SB-PRAM is a lock-step-synchronous, massively parallel multiprocessor currently being built at Saarbrücken University, with up to 4096 RISC-style processing elements and with a (from the programmer's view) physically shared memory of up to SGByte with uniform memory access time. Fork95 is a redesign of the PRAM language FORK, based on ANSI C, with additional constructs for creating parallel processes, hierarchically dividing processor groups into subgroups, and managing shared and private address subspaces. Fork95 makes the assembly-level synchronicity of the underlying hardware available to the programmer at the language level. Nevertheless, it provides comfortable facilities for locally asynchronous computation where the programmer desires it. We show that Fork95 offers full expressibility for the implementation of practically relevant parallel algorithms. We do this by examining all known parallel programming paradigms used for the parallel solution of real-world problems, such as strictly synchronous execution, asynchronous processes, pipelining and systolic algorithms, parallel divide and conquer, parallel prefix computation, data parallelism, etc., and show how these paradigms are supported by the Fork95 language and runtime system.
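
As one concrete instance of the paradigms listed above, here is a parallel prefix (inclusive scan) computation. It is sketched in C with OpenMP rather than in Fork95, whose syntax the abstract does not show; the two-pass, one-chunk-per-thread scheme is a standard formulation.

    /* Two-pass parallel prefix sum: each thread scans its own chunk,
       then adds the total of all earlier chunks to its elements. */
    #include <stdio.h>
    #include <omp.h>

    void prefix_sum(double *a, int n) {
        int p = omp_get_max_threads();
        if (p > 256) p = 256;              /* bound for the partial[] buffer */
        double partial[256] = {0};         /* per-chunk totals, shared */

        #pragma omp parallel num_threads(p)
        {
            int t = omp_get_thread_num();
            int lo = (int)((long long)n * t / p);
            int hi = (int)((long long)n * (t + 1) / p);

            for (int i = lo + 1; i < hi; i++)  /* pass 1: local scan */
                a[i] += a[i - 1];
            if (hi > lo) partial[t] = a[hi - 1];
            #pragma omp barrier
            double offset = 0.0;               /* total of earlier chunks */
            for (int k = 0; k < t; k++) offset += partial[k];
            for (int i = lo; i < hi; i++)      /* pass 2: apply offset */
                a[i] += offset;
        }
    }

    int main(void) {
        double a[8] = {1, 1, 1, 1, 1, 1, 1, 1};
        prefix_sum(a, 8);
        for (int i = 0; i < 8; i++) printf("%.0f ", a[i]);  /* 1 2 ... 8 */
        printf("\n");
        return 0;
    }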
Parallel Java is a parallel programming API whose goals are (1) to support both shared-memory (thread-based) parallel programming and cluster (message-based) parallel programming in a single unified API, allowing one to write parallel programs that combine both paradigms; (2) to provide the same capabilities as OpenMP and MPI in an object-oriented, 100% Java API; and (3) to be easily deployed and run in a heterogeneous computing environment of single-core CPUs, multicore CPUs, and clusters thereof. This paper describes Parallel Java's features and architecture, compares and contrasts Parallel Java with other Java-based parallel middleware libraries, and reports performance measurements of Parallel Java programs.
The CRAY APP is a highly parallel network compute server designed to accelerate Fortran and C programs in a UNIX environment. It can run complete programs in a simple shared-memory environment, including support for UNIX system calls. A very efficient HiPPI interface makes the CRAY APP cluster-capable and well suited to interacting with other programs running on a network in a client/server mode. Several novel features of the CRAY APP system design and programming environment allow simple porting and incremental tuning of existing applications. These include a highly efficient microkernel operating system, low-overhead library-based parallel support software, and a simple refinement of existing vectorization techniques called data vectorization.
ISBN: (Print) 9781424400546
The development of efficient parallel out-of-core applications is often tedious because of the need to explicitly manage the movement of data between files and the data structures of the parallel program. Several large-scale applications require multiple passes of processing over data too large to fit in memory, where significant concurrency exists within each pass. This paper describes a global-address-space framework for the convenient specification and efficient execution of parallel out-of-core applications operating on block-sparse data. The programming model provides a global view of block-sparse matrices and a mechanism for expressing parallel tasks that operate on block-sparse data. The tasks are automatically partitioned into phases that operate on memory-resident data and mapped onto processors to optimize load balance and data locality. Experimental results are presented that demonstrate the utility of the approach.
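
A hedged C sketch of the out-of-core pattern the abstract describes: the data set lives on disk, and each phase processes only a memory-resident block. The file name matrix.blk, the dense block layout, and the block size are illustrative assumptions, not the paper's actual format or framework API.

    /* One out-of-core pass: stream B x B blocks from disk, keeping a
       single block memory-resident per phase (here: summing entries). */
    #include <stdio.h>
    #include <stdlib.h>

    enum { B = 256 };                        /* block edge length (assumed) */

    int main(void) {
        FILE *f = fopen("matrix.blk", "rb"); /* hypothetical block file */
        if (!f) { perror("fopen"); return 1; }

        double *blk = malloc(sizeof *blk * B * B);
        if (!blk) { fclose(f); return 1; }

        double total = 0.0;
        while (fread(blk, sizeof *blk, (size_t)B * B, f) == (size_t)B * B) {
            for (int i = 0; i < B * B; i++)  /* memory-resident phase */
                total += blk[i];
        }

        printf("total = %f\n", total);
        free(blk);
        fclose(f);
        return 0;
    }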
The paper discusses the relationships between hierarchically composite MPP architectures and the software technology derived from the structured parallel programming methodology, in particular the architectural support for successive modular refinements of parallel applications and for the parallel programming paradigms and their combinations. The structured parallel programming methodology referred to here is an application of the Skeletons model. The hierarchically composite architectures considered are MPP machine models for PetaFlops computing, composed of suitable combinations of current architectural models of different granularities, with the Processors-In-Memory model adopted at the finest granularity level. The methodologies are discussed with reference to the current PQE2000 project on MPP general-purpose systems.