ISBN:
(Print) 9781450362955
This paper presents a new approach to fault-tolerant language systems without a single point of failure for irregular parallel applications. Work-stealing frameworks provide good load balancing for many parallel applications, including irregular ones written in a divide-and-conquer style. However, work-stealing frameworks with fault-tolerant features such as checkpointing do not always work well. This paper proposes a completely opposite "work omission" paradigm and refines it into a "hierarchical omission"-based parallel execution model called HOPE. The HOPE programmer's task is to specify which regions of imperative code can be executed in a sequential but arbitrary order and how their partial results can be accessed. HOPE workers spawn no tasks or threads at all; rather, every worker holds the entire work of the program with its own planned execution order, and the workers and the underlying message mediation systems automatically exchange partial results to omit hierarchical subcomputations. Even with fault tolerance, the HOPE framework provides parallel speedups for many parallel applications, including irregular ones.
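The "work omission" idea can be made concrete with a toy sketch (this is an illustration of the paradigm as described in the abstract, not the HOPE API; the names `results`, `publish`-style store, and `fib_worker` are invented for this sketch). Every worker owns the whole divide-and-conquer computation but visits subproblems in its own order; once one worker has published a partial result, any other worker reaching that subproblem omits the entire subtree:

```python
# Toy sketch of "work omission": workers never spawn tasks; each one owns the
# full computation, and published partial results let others omit subtrees.

results = {}  # shared store of published partial results, keyed by subproblem

def fib_worker(n, order):
    """Compute fib(n); 'order' decides which child subtree is visited first."""
    if n in results:          # another worker already published this subtree:
        return results[n]     # omit the whole subcomputation
    if n < 2:
        val = n
    else:
        first, second = (n - 1, n - 2) if order == "left" else (n - 2, n - 1)
        val = fib_worker(first, order) + fib_worker(second, order)
    results[n] = val          # publish the partial result for other workers
    return val

# Worker 1 computes part of the tree; worker 2 then omits everything published.
fib_worker(10, "left")
assert fib_worker(20, "right") == 6765
```

In the real system the workers run concurrently and the store is mediated by a messaging layer; here the two calls run sequentially only to keep the example deterministic.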
A new high-level Petri net (HLPN) model is introduced as a graphical syntax for Horn Clause Logic (HCL) programs. We call these nets Horn Clause Logic Goal-Directed Nets (HCLGNs). It is shown that there is a bijection between the queried definite programs and the class of HCLGNs. In addition, a visualization of SLD-resolution is realized through the enabling and firing rules and net markings. The correctness of these rules with respect to SLD-resolution is also proven. Using these notions, we model SLD-refutations and failing computations. Through minor modification of the definition of HCLGNs for pure HCL programs and of the enabling and firing rules, it is shown how HCLGNs can be used to model built-in atoms and provide a new AND/OR-parallel execution model. HCLGNs have also been used to: model a subset of Prolog; provide a framework for modeling variations on SLD-resolution, such as SLD-ALG; specify an operational semantics for committed-choice (flat-guarded) concurrent logic languages using FGHC as an example. Recently, several software packages have become available for editing and executing HLPNs. These graphical editors can now play the same role that string editors have played for many years. The simulation capabilities of the HLPN software offer opportunities to perform automated, interactive code walk-throughs and also have potential for providing a framework for visual debugging environments. We note, however, that HCLGNs differ from the major classes of HLPNs for which software tools have been developed in two primary ways: 1) the tokens in the markings can have variables; and 2) the firing of a transition may not only update the marking of the adjacent places, but may instantiate variables in tokens in the markings of places that are non-adjacent to the fired transition. Thus, the existing packages can only provide graphical syntax editing and are not appropriate for graphical simulation of HCLGNs. In the paper, we provide an algebraic characterization of
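To make the SLD-resolution terminology concrete, here is a minimal textbook SLD interpreter for definite (Horn clause) programs. This is standard SLD-resolution, not the HCLGN net semantics; the term encoding (capitalized strings as variables, tuples as compound terms) and all function names are choices made for this sketch:

```python
# Minimal SLD-resolution for definite programs: unification, standardizing
# apart, and depth-first resolution of the goal list against program clauses.

def unify(x, y, s):
    """Return an extended substitution unifying x and y under s, or None."""
    def walk(t):
        while isinstance(t, str) and t[0].isupper() and t in s:
            t = s[t]
        return t
    x, y = walk(x), walk(y)
    if x == y:
        return s
    if isinstance(x, str) and x[0].isupper():
        return {**s, x: y}
    if isinstance(y, str) and y[0].isupper():
        return {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) \
            and len(x) == len(y) and x[0] == y[0]:
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    """Standardize apart: rename variables for the n-th clause use."""
    if isinstance(t, str) and t[0].isupper():
        return t + "_" + str(n)
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, n) for a in t[1:])
    return t

def solve(goals, program, s, depth=0):
    """Yield answer substitutions (SLD-refutations) for the goal list."""
    if not goals:
        yield s
        return
    goal, rest = goals[0], goals[1:]
    for head, body in program:
        head = rename(head, depth)
        body = [rename(b, depth) for b in body]
        s2 = unify(goal, head, s)
        if s2 is not None:          # resolution step against this clause
            yield from solve(body + rest, program, s2, depth + 1)

# parent facts plus ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
program = [
    (("parent", "tom", "bob"), []),
    (("parent", "bob", "ann"), []),
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]
answers = list(solve([("ancestor", "tom", "Who")], program, {}))
who = []
for a in answers:                   # chase each binding chain for "Who"
    t = "Who"
    while isinstance(t, str) and t in a:
        t = a[t]
    who.append(t)
```

In the HCLGN reading, each successful `unify` against a clause head corresponds to a transition becoming enabled and firing, with the answer substitution playing the role of instantiated variables in tokens.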
ISBN:
(Print) 9781450357852
The two most common parallel execution models for many-core CPUs today are multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each process to own a private address space, although processes can explicitly allocate shared-memory regions. The multithread model shares all address space by default, although threads can explicitly move data to thread-private storage. In this paper, we present a third model called process-in-process (PiP), where multiple processes are mapped into a single virtual address space. Thus, each process still owns its process-private storage (like the multiprocess model) but can directly access the private storage of other processes in the same virtual address space (like the multithread model). The idea of address-space sharing between multiple processes itself is not new. What makes PiP unique, however, is that its design is completely in user space, making it a portable and practical approach for large supercomputing systems where porting existing OS-based techniques might be hard. The PiP library is compact and is designed for integrating with other runtime systems such as MPI and OpenMP as a portable low-level support for boosting communication performance in HPC applications. We showcase the uniqueness of the PiP environment through both a variety of parallel runtime optimizations and direct use in a data analysis application. We evaluate PiP on several platforms including two high-ranking supercomputers, and we measure and analyze the performance of PiP by using a variety of micro- and macro-kernels, a proxy application, as well as a data analysis application.
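The two baseline models that PiP combines can be demonstrated directly (this sketch shows only the baselines, on a Unix-like system; PiP itself is a C library and is not modeled here). A forked child's write to an ordinary variable is invisible to the parent (private address space), while a thread's write is visible (shared address space); PiP's contribution is giving processes both properties at once:

```python
# Multiprocess vs. multithread memory visibility, per the abstract's contrast.
import os
import threading

box = {"value": 0}

# Multiprocess model: the child's store stays in its own address space.
pid = os.fork()
if pid == 0:                      # child process
    box["value"] = 42             # touches only the child's copy-on-write page
    os._exit(0)
os.waitpid(pid, 0)
assert box["value"] == 0          # parent never sees the child's write

# Multithread model: all threads share one address space by default.
t = threading.Thread(target=lambda: box.update(value=42))
t.start()
t.join()
assert box["value"] == 42         # the thread's write is visible
```

Under PiP, separate processes would behave like the threads here for visibility while still keeping their own process-private variables and file descriptors.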
ISBN:
(Print) 9780769548791
The conventional unified parallel computation model has become increasingly complicated, offering weak pertinence and little guidance for the individual parallel computing phases. This paper therefore proposes a general layered, heterogeneous approach to parallel computation model research. The general layered heterogeneous parallel computation model comprises a parallel algorithm design model, a parallel programming model, and a parallel execution model, each corresponding to one of the three computing phases. The properties of each model are described and research directions are given. In the parallel algorithm design model, a high-level language is designed for algorithm designers, and a corresponding interpretation system based on text scanning is proposed to map the high-level language to machine language running on heterogeneous software and hardware architectures. A parallel method library and a parameter library are also provided to make comprehensive use of the different computing resources and to assign parallel tasks reasonably. Theoretical analysis shows that the general layered heterogeneous parallel computation model is clear and single-goaled for each parallel computing phase.
ISBN:
(Digital) 9783031104190
ISBN:
(Print) 9783031104190; 9783031104183
Shared memory mechanisms, e.g., POSIX shmem or XPMEM, are widely used to implement efficient intra-node communication among processes running on the same node. While POSIX shmem allows other processes to access only newly allocated memory, XPMEM allows accessing any existing data and thus enables more efficient communication because the send buffer content can directly be copied to the receive buffer. Recently, the shared address space model has been proposed, where processes on the same node are mapped into the same address space at the time of process creation, allowing processes to access any data in the shared address space. Process-in-Process (PiP) is an implementation of such a mechanism. The functionalities of shared memory mechanisms and the shared address space model look very similar, as both allow accessing the data of other processes; however, the shared address space model subsumes the shared memory model, and their internal mechanisms are notably different. This paper clarifies the differences between the shared memory and the shared address space models, both qualitatively and quantitatively. This paper is not meant to showcase applications of the shared address space model; rather, through minimal modifications to an existing MPI implementation, it highlights the basic differences between the two models. The following four MPI configurations are evaluated and compared: 1) POSIX Shmem, 2) XPMEM, 3) PiP-Shmem, where intra-node communication is implemented to utilize POSIX shmem but MPI processes share the same address space, and 4) PiP-XPMEM, where XPMEM functions are implemented by the PiP library (without the need for linking to the XPMEM library). Evaluation is done using the Intel MPI benchmark suite and six HPC benchmarks (HPCCG, miniGhost, LULESH2.0, miniMD, miniAMR, and mpiGraph). Most notably, mpiGraph performance of PiP-XPMEM outperforms the XPMEM implementation by almost 1.5x. The performance numbers of HPCCG, miniGhost, miniMD, LULESH2.0 running with PiP-Shmem
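The core quantitative difference the abstract points at can be modeled in a few lines (a back-of-the-envelope sketch in plain Python, with no real shmem or XPMEM involved): a POSIX-shmem style channel stages the payload through a dedicated shared segment, costing two copies, while XPMEM/PiP-style direct access lets the receiver copy straight from the send buffer, costing one:

```python
# Copy-count model: staged shared segment (POSIX shmem) vs. direct access
# to the sender's buffer (XPMEM / shared address space).

copies = 0

def copy(dst, src):
    global copies
    dst[:] = src
    copies += 1

send_buf = bytearray(b"payload!")
recv_buf = bytearray(len(send_buf))

# POSIX shmem model: only the dedicated shared segment is visible to both.
shared_segment = bytearray(len(send_buf))
copy(shared_segment, send_buf)   # sender: copy-in to the shared segment
copy(recv_buf, shared_segment)   # receiver: copy-out of the shared segment
shmem_copies = copies

# XPMEM / shared-address-space model: receiver reads the send buffer directly.
copies = 0
copy(recv_buf, send_buf)
direct_copies = copies

assert (shmem_copies, direct_copies) == (2, 1)
```

Halving the number of memory copies on the intra-node path is one plausible source of the mpiGraph speedup reported for PiP-XPMEM.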
ISBN:
(Print) 0818620846
A parallel logic programming system which includes a precompiler, a compiler, and an execution system is presented. An annotated parallel language which is a parallel extension of Prolog is introduced. The techniques used in the precompile phase, such as abstract interpretation and the CAAP (compiling approach for exploiting AND-parallelism) scheme, are described. An optimized compiler, the RAP/LOP (restricted AND-parallelism and limited OR-parallelism) parallel execution model, and the execution system are presented.
ISBN:
(Print) 9781728116518
Thread-Level Speculation (TLS) is an approach to enhance the opportunity for parallelization by executing tasks in parallel on the assumption that a task has no dependencies on any earlier task in program order. If a dependency is detected during execution, however, the task must be aborted and re-executed, so the frequency of aborts is one of the factors that degrade the performance of speculative execution. In this paper we propose a "code shelving" scheme to avoid aborts or eliminate their penalty. We have implemented it in our TLS system, named Speculative Memory (SM), and investigated its performance characteristics. Our evaluation results reveal that code shelving can significantly improve performance over pure speculation that does not use it.
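A minimal model of the TLS baseline that code shelving improves on can illustrate the abort/re-execute cycle (this sketch models only the plain speculation described in the abstract; the code shelving scheme itself and the SM system are not modeled, and all names here are invented). Each task runs against a snapshot of shared state; at commit time its reads are validated against the current state, and a detected dependency forces an abort and re-execution:

```python
# Plain TLS: speculate on a snapshot, validate reads at commit, abort on
# conflict and re-execute. A task returns (read keys, write dict).

state = {"x": 1, "y": 0}
aborts = 0

def run_speculative(task):
    global aborts
    while True:
        snapshot = dict(state)            # begin speculation on a snapshot
        reads, writes = task(snapshot)
        if all(state[k] == snapshot[k] for k in reads):
            state.update(writes)          # validation passed: commit
            return
        aborts += 1                       # dependency detected: abort, retry

def earlier(snap):                        # earlier task in program order
    return ["x"], {"x": snap["x"] + 1}

def later(snap):                          # later task: y = x * 10
    return ["x"], {"y": snap["x"] * 10}

# 'later' speculates on a stale snapshot while 'earlier' commits first.
snap = dict(state)
reads, writes = later(snap)               # speculative run: reads x == 1
run_speculative(earlier)                  # earlier task commits: x becomes 2
if all(state[k] == snap[k] for k in reads):
    state.update(writes)
else:
    aborts += 1                           # x changed under us: abort
    run_speculative(later)                # re-execute with fresh state

assert state == {"x": 2, "y": 20} and aborts == 1
```

The cost that code shelving targets is exactly the wasted speculative run plus the re-execution in the abort branch above.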