ISBN:
(Print) 9781450362955
This paper presents a new approach to fault-tolerant language systems without a single point of failure for irregular parallel applications. Work-stealing frameworks provide good load balancing for many parallel applications, including irregular ones written in a divide-and-conquer style. However, work-stealing frameworks with fault-tolerant features such as checkpointing do not always work well. This paper proposes a completely opposite "work omission" paradigm and refines it into a "hierarchical omission"-based parallel execution model called HOPE. The HOPE programmer's task is to specify which regions of imperative code can be executed in a sequential but arbitrary order and how their partial results can be accessed. HOPE workers spawn no tasks or threads at all; rather, every worker holds the entire work of the program with its own planned execution order, and the workers and the underlying message mediation systems automatically exchange partial results to omit hierarchical subcomputations. Even with fault tolerance, the HOPE framework provides parallel speedups for many parallel applications, including irregular ones.
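The "work omission" idea can be made concrete with a toy sketch (this is an illustration of the paradigm as described in the abstract, not the HOPE API; the names `results`, `publish`-style store, and `fib_worker` are invented for this sketch). Every worker owns the whole divide-and-conquer computation but visits subproblems in its own order; once one worker has published a partial result, any other worker reaching that subproblem omits the entire subtree:

```python
# Toy sketch of "work omission": workers never spawn tasks; each one owns the
# full computation, and published partial results let others omit subtrees.

results = {}  # shared store of published partial results, keyed by subproblem

def fib_worker(n, order):
    """Compute fib(n); 'order' decides which child subtree is visited first."""
    if n in results:          # another worker already published this subtree:
        return results[n]     # omit the whole subcomputation
    if n < 2:
        val = n
    else:
        first, second = (n - 1, n - 2) if order == "left" else (n - 2, n - 1)
        val = fib_worker(first, order) + fib_worker(second, order)
    results[n] = val          # publish the partial result for other workers
    return val

# Worker 1 computes part of the tree; worker 2 then omits everything published.
fib_worker(10, "left")
assert fib_worker(20, "right") == 6765
```

In the real system the workers run concurrently and the store is mediated by a messaging layer; here the two calls run sequentially only to keep the example deterministic.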
A new high-level Petri net (HLPN) model is introduced as a graphical syntax for Horn Clause Logic (HCL) programs. We call these nets Horn Clause Logic Goal-Directed Nets (HCLGNs). It is shown that there is a bijection between the queried definite programs and the class of HCLGNs. In addition, a visualization of SLD-resolution is realized through the enabling and firing rules and net markings. The correctness of these rules with respect to SLD-resolution is also proven. Using these notions, we model SLD-refutations and failing computations. Through minor modification of the definition of HCLGNs for pure HCL programs and of the enabling and firing rules, it is shown how HCLGNs can be used to model built-in atoms and provide a new AND/OR-parallel execution model. HCLGNs have also been used to: model a subset of Prolog; provide a framework for modeling variations on SLD-resolution, such as SLD-ALG; specify an operational semantics for committed-choice (flat-guarded) concurrent logic languages using FGHC as an example. Recently, several software packages have become available for editing and executing HLPNs. These graphical editors can now play the same role that string editors have played for many years. The simulation capabilities of the HLPN software offer opportunities to perform automated, interactive code walk-throughs and also have potential for providing a framework for visual debugging environments. We note, however, that HCLGNs differ from the major classes of HLPNs for which software tools have been developed in two primary ways: 1) the tokens in the markings can have variables; and 2) the firing of a transition may not only update the marking of the adjacent places, but may instantiate variables in tokens in the markings of places that are non-adjacent to the fired transition. Thus, the existing packages can only provide graphical syntax editing and are not appropriate for graphical simulation of HCLGNs. In the paper, we provide an algebraic characterization of
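To make the SLD-resolution terminology concrete, here is a minimal textbook SLD interpreter for definite (Horn clause) programs. This is standard SLD-resolution, not the HCLGN net semantics; the term encoding (capitalized strings as variables, tuples as compound terms) and all function names are choices made for this sketch:

```python
# Minimal SLD-resolution for definite programs: unification, standardizing
# apart, and depth-first resolution of the goal list against program clauses.

def unify(x, y, s):
    """Return an extended substitution unifying x and y under s, or None."""
    def walk(t):
        while isinstance(t, str) and t[0].isupper() and t in s:
            t = s[t]
        return t
    x, y = walk(x), walk(y)
    if x == y:
        return s
    if isinstance(x, str) and x[0].isupper():
        return {**s, x: y}
    if isinstance(y, str) and y[0].isupper():
        return {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) \
            and len(x) == len(y) and x[0] == y[0]:
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    """Standardize apart: rename variables for the n-th clause use."""
    if isinstance(t, str) and t[0].isupper():
        return t + "_" + str(n)
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, n) for a in t[1:])
    return t

def solve(goals, program, s, depth=0):
    """Yield answer substitutions (SLD-refutations) for the goal list."""
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    for head, body in program:
        head = rename(head, depth)
        body = [rename(b, depth) for b in body]
        s2 = unify(goal, head, s)
        if s2 is not None:          # resolution step against this clause
            yield from solve(body + rest, program, s2, depth + 1)

# parent facts plus ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
program = [
    (("parent", "tom", "bob"), []),
    (("parent", "bob", "ann"), []),
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
answers = list(solve([("ancestor", "tom", "Who")], program, {}))
who = []
for a in answers:                   # chase each binding chain for "Who"
    t = "Who"
    while isinstance(t, str) and t in a:
        t = a[t]
    who.append(t)
```

In the HCLGN reading, each successful `unify` against a clause head corresponds to a transition becoming enabled and firing, with the answer substitution playing the role of instantiated variables in tokens.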
ISBN:
(Print) 9781450357852
The two most common parallel execution models for many-core CPUs today are multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each process to own a private address space, although processes can explicitly allocate shared-memory regions. The multithread model shares all address space by default, although threads can explicitly move data to thread-private storage. In this paper, we present a third model called process-in-process (PiP), where multiple processes are mapped into a single virtual address space. Thus, each process still owns its process-private storage (like the multiprocess model) but can directly access the private storage of other processes in the same virtual address space (like the multithread model). The idea of address-space sharing between multiple processes itself is not new. What makes PiP unique, however, is that its design is completely in user space, making it a portable and practical approach for large supercomputing systems where porting existing OS-based techniques might be hard. The PiP library is compact and is designed for integrating with other runtime systems such as MPI and OpenMP as a portable low-level support for boosting communication performance in HPC applications. We showcase the uniqueness of the PiP environment through both a variety of parallel runtime optimizations and direct use in a data analysis application. We evaluate PiP on several platforms including two high-ranking supercomputers, and we measure and analyze the performance of PiP by using a variety of micro- and macro-kernels, a proxy application, as well as a data analysis application.
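The two baseline models that PiP combines can be demonstrated directly (this sketch shows only the baselines, on a Unix-like system; PiP itself is a C library and is not modeled here). A forked child's write to an ordinary variable is invisible to the parent (private address space), while a thread's write is visible (shared address space); PiP's contribution is giving processes both properties at once:

```python
# Multiprocess vs. multithread memory visibility, per the abstract's contrast.
import os
import threading

box = {"value": 0}

# Multiprocess model: the child's store stays in its own address space.
pid = os.fork()
if pid == 0:                      # child process
    box["value"] = 42             # touches only the child's copy-on-write page
    os._exit(0)
os.waitpid(pid, 0)
assert box["value"] == 0          # parent never sees the child's write

# Multithread model: all threads share one address space by default.
t = threading.Thread(target=lambda: box.update(value=42))
t.start()
t.join()
assert box["value"] == 42         # the thread's write is visible
```

Under PiP, separate processes would behave like the threads here for visibility while still keeping their own process-private variables and file descriptors.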
ISBN:
(Print) 9780769548791
The conventional unified parallel computation model has become increasingly complicated, offering weak pertinence and little guidance for the individual parallel computing phases. This paper therefore proposes a general layered, heterogeneous approach to parallel computation model research. The general layered heterogeneous parallel computation model comprises a parallel algorithm design model, a parallel programming model, and a parallel execution model, each corresponding to one of the three computing phases. The properties of each model are described and research directions are given. In the parallel algorithm design model, a high-level language is designed for algorithm designers, and a corresponding interpretation system based on text scanning is proposed to map the high-level language to machine language running on heterogeneous software and hardware architectures. A parallel method library and a parameter library are also provided to make comprehensive use of the different computing resources and to assign parallel tasks reasonably. Theoretical analysis shows that the general layered heterogeneous parallel computation model is clear and single-goaled for each parallel computing phase.
ISBN:
(Digital) 9783031104190
ISBN:
(Print) 9783031104190; 9783031104183
Shared memory mechanisms, e.g., POSIX shmem or XPMEM, are widely used to implement efficient intra-node communication among processes running on the same node. While POSIX shmem allows other processes to access only newly allocated memory, XPMEM allows accessing any existing data and thus enables more efficient communication because the send buffer content can directly be copied to the receive buffer. Recently, the shared address space model has been proposed, where processes on the same node are mapped into the same address space at the time of process creation, allowing processes to access any data in the shared address space. Process-in-Process (PiP) is an implementation of such a mechanism. The functionalities of shared memory mechanisms and the shared address space model look very similar, as both allow accessing the data of other processes; however, the shared address space model subsumes the shared memory model, and their internal mechanisms are notably different. This paper clarifies the differences between the shared memory and the shared address space models, both qualitatively and quantitatively. This paper is not meant to showcase applications of the shared address space model; rather, through minimal modifications to an existing MPI implementation, it highlights the basic differences between the two models. The following four MPI configurations are evaluated and compared: 1) POSIX Shmem, 2) XPMEM, 3) PiP-Shmem, where intra-node communication is implemented to utilize POSIX shmem but MPI processes share the same address space, and 4) PiP-XPMEM, where XPMEM functions are implemented by the PiP library (without the need for linking to the XPMEM library). Evaluation is done using the Intel MPI benchmark suite and six HPC benchmarks (HPCCG, miniGhost, LULESH2.0, miniMD, miniAMR, and mpiGraph). Most notably, mpiGraph performance of PiP-XPMEM outperforms the XPMEM implementation by almost 1.5x. The performance numbers of HPCCG, miniGhost, miniMD, LULESH2.0 running with PiP-Shmem
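The core quantitative difference the abstract points at can be modeled in a few lines (a back-of-the-envelope sketch in plain Python, with no real shmem or XPMEM involved): a POSIX-shmem style channel stages the payload through a dedicated shared segment, costing two copies, while XPMEM/PiP-style direct access lets the receiver copy straight from the send buffer, costing one:

```python
# Copy-count model: staged shared segment (POSIX shmem) vs. direct access
# to the sender's buffer (XPMEM / shared address space).

copies = 0

def copy(dst, src):
    global copies
    dst[:] = src
    copies += 1

send_buf = bytearray(b"payload!")
recv_buf = bytearray(len(send_buf))

# POSIX shmem model: only the dedicated shared segment is visible to both.
shared_segment = bytearray(len(send_buf))
copy(shared_segment, send_buf)   # sender: copy-in to the shared segment
copy(recv_buf, shared_segment)   # receiver: copy-out of the shared segment
shmem_copies = copies

# XPMEM / shared-address-space model: receiver reads the send buffer directly.
copies = 0
copy(recv_buf, send_buf)
direct_copies = copies

assert (shmem_copies, direct_copies) == (2, 1)
```

Halving the number of memory copies on the intra-node path is one plausible source of the mpiGraph speedup reported for PiP-XPMEM.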
ISBN:
(Print) 0818620846
A parallel logic programming system which includes a precompiler, a compiler, and an execution system is presented. An annotated parallel language which is a parallel extension of Prolog is introduced. The techniques used in the precompile phase, such as abstract interpretation and the CAAP (compiling approach for exploiting AND-parallelism) scheme, are described. An optimized compiler, the RAP/LOP (restricted AND-parallelism and limited OR-parallelism) parallel execution model, and the execution system are presented.
ISBN:
(Print) 9781728116518
Thread-Level Speculation (TLS) is an approach to enhance the opportunity for parallelization by executing tasks in parallel on the assumption that a task has no dependencies on any earlier task in program order. If a dependency is detected during execution, however, the task must be aborted and re-executed, so the frequency of aborts is one of the factors that degrade the performance of speculative execution. In this paper we propose a "code shelving" scheme to avoid aborts or eliminate their penalty. We have implemented it in our TLS system, named Speculative Memory (SM), and investigated its performance characteristics. Our evaluation results reveal that code shelving can significantly improve performance over pure speculation that does not use it.
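A minimal model of the TLS baseline that code shelving improves on can illustrate the abort/re-execute cycle (this sketch models only the plain speculation described in the abstract; the code shelving scheme itself and the SM system are not modeled, and all names here are invented). Each task runs against a snapshot of shared state; at commit time its reads are validated against the current state, and a detected dependency forces an abort and re-execution:

```python
# Plain TLS: speculate on a snapshot, validate reads at commit, abort on
# conflict and re-execute. A task returns (read keys, write dict).

state = {"x": 1, "y": 0}
aborts = 0

def run_speculative(task):
    global aborts
    while True:
        snapshot = dict(state)            # begin speculation on a snapshot
        reads, writes = task(snapshot)
        if all(state[k] == snapshot[k] for k in reads):
            state.update(writes)          # validation passed: commit
            return
        aborts += 1                       # dependency detected: abort, retry

def earlier(snap):                        # earlier task in program order
    return ["x"], {"x": snap["x"] + 1}

def later(snap):                          # later task: y = x * 10
    return ["x"], {"y": snap["x"] * 10}

# 'later' speculates on a stale snapshot while 'earlier' commits first.
snap = dict(state)
reads, writes = later(snap)               # speculative run: reads x == 1
run_speculative(earlier)                  # earlier task commits: x becomes 2
if all(state[k] == snap[k] for k in reads):
    state.update(writes)
else:
    aborts += 1                           # x changed under us: abort
    run_speculative(later)                # re-execute with fresh state

assert state == {"x": 2, "y": 20} and aborts == 1
```

The cost that code shelving targets is exactly the wasted speculative run plus the re-execution in the abort branch above.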