ISBN (print): 9780897919067
This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with the Blizzard fine-grain distributed shared memory system. The approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric manner. As a demonstration, Paradyn helped us improve the performance of a new shared-memory application program by a factor of four.
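As an illustration of the kind of data-sharing pattern such a coherence-protocol-based, data-centric tool can surface, the sketch below (not Paradyn or Blizzard code; struct and function names are ours) shows two threads updating adjacent counters that land in the same coherence block (false sharing), and the data-centric remedy of padding each counter onto its own block.

```cpp
// Minimal sketch (not Paradyn/Blizzard code) of a classic sharing bottleneck a
// coherence-protocol-based profiler can reveal: two threads increment adjacent
// counters that share a coherence block, so each increment invalidates the
// other thread's copy; padding gives each counter its own block.
#include <chrono>
#include <cstdio>
#include <thread>

constexpr int kIters = 50000000;

struct Unpadded { long a; long b; };                          // a and b share a block
struct Padded   { alignas(64) long a; alignas(64) long b; };  // one block each

template <typename Counters>
double run() {
    Counters c{};
    auto t0 = std::chrono::steady_clock::now();
    std::thread t1([&c] { for (int i = 0; i < kIters; ++i) ++c.a; });
    std::thread t2([&c] { for (int i = 0; i < kIters; ++i) ++c.b; });
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    std::printf("unpadded (false sharing):  %.2f s\n", run<Unpadded>());
    std::printf("padded   (private blocks): %.2f s\n", run<Padded>());
}
```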
ISBN (print): 9781450368186
The use of futures can generate arbitrary dependences in the computation, making it difficult to detect races efficiently. Algorithms proposed by prior work to detect races on programs with futures all have to execute the program sequentially. We propose F-Order, the first known parallel race detection algorithm that detects races on programs that use futures. Given a computation with work $T_1$ and span $T_\infty$, our algorithm detects races in time $O((T_1 \lg \hat{k} + k^2)/P + T_\infty(k + \lg r \lg \hat{k}))$ on $P$ processors, where $k$ is the number of future operations, $r$ is the maximum number of readers per memory location, and $\hat{k}$ is the maximum number of future operations done by a single future task, which is typically small. We have also implemented a prototype system based on the proposed algorithm and empirically demonstrate its practical efficiency and scalability.
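For illustration only, the C++ snippet below (using std::async rather than the structured futures of the task-parallel runtimes the paper targets) shows the shape of race a future can introduce: the parent and the spawned task both write the same location, and only the later fut.get() orders the task with respect to the parent.

```cpp
// Illustrative only: a race created through a future. The spawned task and the
// parent both write `shared`, and nothing orders the parent's write with the
// task's write because the parent touches `shared` before calling fut.get().
#include <future>
#include <iostream>

int main() {
    int shared = 0;

    std::future<void> fut = std::async(std::launch::async, [&shared] {
        shared = 1;   // racing write in the future task
    });

    shared = 2;       // racing write in the parent
    fut.get();        // the join happens only here, after both writes

    std::cout << "shared = " << shared << "\n";  // outcome depends on the race
    return 0;
}
```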
ISBN (print): 9781605587080
To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical workloads, the current load balancing support in operating systems may not be able to achieve efficient hardware utilization for parallel workloads. Balancing run queue length globally ignores the needs of parallel applications, where threads are required to make equal progress. In this paper we present a load balancing technique designed specifically for parallel applications running on multicore systems. Instead of balancing run queue length, our algorithm balances the time a thread has executed on "faster" and "slower" cores. We provide a user-level implementation of speed balancing on UMA and NUMA multi-socket architectures running Linux and discuss its behavior across a variety of workloads, usage scenarios and programming models. Our results indicate that speed balancing, when compared to the native Linux load balancing, improves performance and provides good performance isolation in all cases considered. Speed balancing is also able to provide comparable or better performance than DWRR, a fair multi-processor scheduling implementation inside the Linux kernel. Furthermore, parallel application performance is often determined by the implementation of synchronization operations, and speed balancing alleviates the need for tuning the implementations of such primitives.
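The toy sketch below is not the paper's user-level implementation; the ThreadStat bookkeeping and the 1.1x imbalance threshold are our assumptions. It only shows the core decision speed balancing makes: compare the CPU time each thread accumulated over an interval and swap placements so slow and fast threads trade cores, rather than equalizing run-queue lengths.

```cpp
// Toy sketch of the speed-balancing idea (not the paper's implementation):
// balance the CPU time threads accumulate, not run-queue lengths.
// All names (ThreadStat, speed_balance) are hypothetical.
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

struct ThreadStat {
    int tid;
    double cpu_time;      // CPU seconds accumulated in the last interval
    int core;             // core the thread currently runs on
};

// One balancing round: pair the slowest thread with the fastest one and swap
// their cores, so over time each thread alternates between fast and slow cores.
void speed_balance(std::vector<ThreadStat>& ts) {
    auto by_time = [](const ThreadStat& a, const ThreadStat& b) { return a.cpu_time < b.cpu_time; };
    auto slowest = std::min_element(ts.begin(), ts.end(), by_time);
    auto fastest = std::max_element(ts.begin(), ts.end(), by_time);
    if (slowest != fastest && fastest->cpu_time > 1.1 * slowest->cpu_time)
        std::swap(slowest->core, fastest->core);   // migrate: swap placements
}

int main() {
    std::vector<ThreadStat> ts = {{0, 0.9, 0}, {1, 0.5, 1}, {2, 0.95, 2}, {3, 0.55, 3}};
    speed_balance(ts);
    for (const auto& t : ts) std::printf("tid %d -> core %d\n", t.tid, t.core);
}
```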
Solving problems of large sizes is an important goal for parallel machines with multiple CPU and memory resources. In this paper, issues of efficient execution of overhead-sensitive parallel irregular computation under memory constraints are addressed. The irregular parallelism is modeled by task dependence graphs with mixed granularities. The trade-off in achieving both time and space efficiency is investigated. The main difficulty of designing efficient run-time system support is caused by the use of fast communication primitives available on modern parallel architectures. A run-time active memory management scheme and new scheduling techniques are proposed to improve memory utilization while retaining good time efficiency, and a theoretical analysis of correctness and performance is provided. This work is implemented in the context of the RAPID system [5], which provides run-time support for parallelizing irregular code on distributed memory machines, and the effectiveness of the proposed techniques is verified on sparse Cholesky and LU factorization with partial pivoting. The experimental results on the Cray T3D show that solvable problem sizes can be increased substantially under limited memory capacities and that the loss of execution efficiency caused by the extra memory management overhead is reasonable.
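The sketch below is not the RAPID runtime; the task sizes and the budget are made up. It shows only one ingredient the abstract describes: admitting ready tasks from the dependence graph when their memory footprint fits within a budget and deferring them otherwise, so memory utilization is controlled at scheduling time.

```cpp
// Simplified sketch (not RAPID): schedule ready tasks under a memory budget.
// A ready task is dispatched only if its footprint fits; otherwise it is
// deferred until completed tasks release memory.
#include <cstdio>
#include <queue>
#include <vector>

struct Task { int id; std::size_t mem; };   // memory footprint in bytes

int main() {
    std::queue<Task> ready;                 // tasks whose dependences are met
    for (int i = 0; i < 6; ++i) ready.push({i, (i % 3 + 1) * 100u});

    const std::size_t budget = 400;         // total memory available
    std::size_t in_use = 0;
    std::vector<Task> running, deferred;

    while (!ready.empty()) {
        Task t = ready.front(); ready.pop();
        if (in_use + t.mem <= budget) {     // fits: dispatch
            in_use += t.mem;
            running.push_back(t);
            std::printf("run   task %d (%zu B, in use %zu B)\n", t.id, t.mem, in_use);
        } else {                            // would exceed the budget: defer
            deferred.push_back(t);
            std::printf("defer task %d (%zu B)\n", t.id, t.mem);
        }
    }
    // In a real runtime, finished tasks would return memory and re-enqueue
    // deferred tasks; here we only show the admission decision.
}
```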
Many low-level optimizations for NVIDIA GPUs can only be implemented in native hardware assembly (SASS). However, programming in SASS is unproductive and not portable. To simplify low-level GPU programming, we present ...
For a long time, efficient use of parallel computers has been hindered by dependencies introduced in software through low-level implementation practice. In this paper we present a programming environment and language called Object-Math (Object-oriented Mathematical language for scientific computing), which aims at eliminating this problem by allowing the user to represent mathematical equation-based models directly in the system. The system performs analysis of mathematical models to extract parallelism and automatically generates parallel code for numerical solution. In the context of industrial applications in mechanical analysis, we have so far primarily explored generation of parallel code for solving systems of ordinary differential equations (ODEs), in addition to preliminary work on generating code for solving partial differential equations. Two approaches to extracting parallelism have been implemented and evaluated: extracting parallelism at the equation system level and at the single equation level, respectively. We found that for several applications the corresponding systems of equations do not partition well into subsystems. This means that the equation-system-level approach is of restricted general applicability. Thus, we focused on the equation-level approach, which yielded significant parallelism for the solution of ODE systems. For the bearing simulation applications we present here, the achieved speedup is, however, critically dependent on low communication latency of the parallel computer.
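A minimal sketch of equation-level parallelism for an ODE system (not Object-Math-generated code; the toy system and step count are ours): within one explicit Euler step, each equation's right-hand side is evaluated by its own worker, with the joins acting as a barrier before the state update.

```cpp
// Sketch of equation-level parallelism (not Object-Math output): evaluate each
// equation's right-hand side concurrently inside one explicit Euler step.
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    // dy0/dt = -y0 + y1,  dy1/dt = -y1 + y2,  dy2/dt = -y2  (a toy linear system)
    std::vector<double> y = {1.0, 0.5, 0.25};
    std::vector<double> dydt(y.size());
    const double h = 0.01;

    for (int step = 0; step < 100; ++step) {
        std::vector<std::thread> workers;
        // Equation-level parallelism: one worker per equation's RHS.
        workers.emplace_back([&] { dydt[0] = -y[0] + y[1]; });
        workers.emplace_back([&] { dydt[1] = -y[1] + y[2]; });
        workers.emplace_back([&] { dydt[2] = -y[2]; });
        for (auto& w : workers) w.join();           // barrier before the update

        for (std::size_t i = 0; i < y.size(); ++i)  // sequential state update
            y[i] += h * dydt[i];
    }
    std::printf("y = (%f, %f, %f)\n", y[0], y[1], y[2]);
}
```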
ISBN (print): 9781450332057
This paper proposes a novel SPMD programming model for OpenACC. Our model integrates the different granularities of parallelism, from vector-level parallelism to node-level parallelism, into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model, whether the accelerators are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance on a GPU-based supercomputer using three benchmark applications.
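The paper's unified SPMD model is not reproduced here; the sketch below only shows the baseline OpenACC directive style it builds on, a parallel loop offloaded to a single accelerator, and assumes an OpenACC-capable compiler such as nvc++ with -acc.

```cpp
// Baseline OpenACC example (single accelerator), not the paper's extended SPMD
// model. Build with an OpenACC-capable compiler, e.g. `nvc++ -acc`; without
// OpenACC support the pragma is ignored and the loop runs on the host.
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    float* pa = a.data();
    float* pb = b.data();
    float* pc = c.data();

    // Offload the loop to the accelerator; data clauses copy arrays in and out.
    #pragma acc parallel loop copyin(pa[0:n], pb[0:n]) copyout(pc[0:n])
    for (int i = 0; i < n; ++i)
        pc[i] = pa[i] + pb[i];

    std::printf("c[0] = %f\n", c[0]);
}
```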
ISBN (print): 9781450392044
In this paper, we describe a parallel Branch-and-Bound (B&B) algorithm with a history-based domination technique, and we apply it to the Sequential Ordering Problem (SOP). To the best of our knowledge, the proposed algorithm is the first parallel B&B algorithm that includes a history-based domination technique and the first parallel B&B algorithm for solving the SOP using a pure B&B approach. The proposed algorithm takes a pool-based approach and employs a collection of novel techniques that we have developed to achieve effective parallel exploration of the solution space, including parallel history domination, history table memory management, and a thread restart technique. The proposed algorithm was experimentally evaluated using the SOPLIB and TSPLIB benchmarks. The results show that, using ten threads with a time limit of one hour on the medium-difficulty instances, the proposed algorithm gives a geometric-mean speedup of 19.9 on SOPLIB and 10.23 on TSPLIB, with super-linear speedups of up to 65x seen on 17 instances.
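The sketch below is a simplified, sequential illustration of history-based domination, not the paper's parallel algorithm or its memory-managed history table; the state encoding and names are ours. A search state (visited-set bitmask plus current endpoint) maps to the best cost recorded so far, and a newly generated node is pruned when a recorded entry dominates it.

```cpp
// Simplified, sequential sketch of history-based domination (not the paper's
// parallel algorithm). A state is (visited set, current endpoint); a node is
// pruned if the table already holds an equal-or-better cost for that state.
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct StateKey {
    std::uint32_t visited;   // bitmask of visited nodes
    int last;                // current endpoint of the partial path
    bool operator==(const StateKey& o) const { return visited == o.visited && last == o.last; }
};
struct StateHash {
    std::size_t operator()(const StateKey& k) const {
        return std::hash<std::uint64_t>{}((std::uint64_t(k.visited) << 8) ^ std::uint64_t(k.last));
    }
};

std::unordered_map<StateKey, int, StateHash> history;  // state -> best known cost

// Returns true if the node should be pruned (dominated by an earlier node).
bool dominated(StateKey key, int cost) {
    auto it = history.find(key);
    if (it != history.end() && it->second <= cost) return true;  // dominated: prune
    history[key] = cost;                                         // record or improve
    return false;
}

int main() {
    std::printf("%d\n", dominated({0b1011, 3}, 42));  // 0: first time this state is seen
    std::printf("%d\n", dominated({0b1011, 3}, 50));  // 1: cost 42 dominates 50
    std::printf("%d\n", dominated({0b1011, 3}, 40));  // 0: improves the entry to 40
}
```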
ISBN (print): 9781450319225
JavaScript, the most popular language on the Web, is rapidly moving to the server side, becoming even more pervasive. Still, JavaScript lacks support for shared memory parallelism, making it challenging for developers to exploit the multicores present in both servers and clients. In this paper we present TigerQuoll, a novel API and runtime for parallel programming in JavaScript. TigerQuoll features an event-based API and a parallel runtime allowing applications to exploit a mutable shared memory space. The programming model of TigerQuoll features automatic consistency and concurrency management, such that developers do not have to deal with shared-data synchronization. TigerQuoll supports an innovative transaction model that allows for eventual consistency to speed up high-contention workloads. Experiments show that TigerQuoll applications scale well, allowing one to implement common parallelism patterns in JavaScript.
Chase and Lev's concurrent deque is a key data structure in shared-memory parallel programming and plays an essential role in work-stealing schedulers. We provide the first correctness proof of an optimized implem...
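The proof and the optimized, weakly-ordered implementation the paper verifies are not reproduced here; the sketch below is a fixed-capacity Chase-Lev deque using sequentially consistent atomics (the defaults), ignoring growth and overflow, just to show the three operations: the owner's push and take at the bottom and a thief's steal at the top.

```cpp
// Fixed-capacity Chase-Lev deque sketch with seq_cst atomics (the defaults);
// an optimized implementation relaxes these orderings. Buffer growth, overflow,
// and non-trivially-copyable element types are not handled.
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <optional>

template <typename T, std::size_t Cap = 1024>  // Cap: a power of two (not enforced here)
class WorkStealingDeque {
    std::atomic<std::int64_t> top_{0}, bottom_{0};
    T buf_[Cap];
public:
    void push(T x) {                           // owner only: push at the bottom
        std::int64_t b = bottom_.load();
        buf_[b % Cap] = x;
        bottom_.store(b + 1);
    }
    std::optional<T> take() {                  // owner only: pop from the bottom
        std::int64_t b = bottom_.load() - 1;
        bottom_.store(b);
        std::int64_t t = top_.load();
        if (t > b) {                           // deque was empty: restore bottom
            bottom_.store(b + 1);
            return std::nullopt;
        }
        T x = buf_[b % Cap];
        if (t < b) return x;                   // more than one element remained
        // t == b: last element, may race with a concurrent steal.
        bool won = top_.compare_exchange_strong(t, t + 1);
        bottom_.store(b + 1);
        return won ? std::optional<T>(x) : std::nullopt;
    }
    std::optional<T> steal() {                 // any thief: steal from the top
        std::int64_t t = top_.load();
        std::int64_t b = bottom_.load();
        if (t >= b) return std::nullopt;       // empty
        T x = buf_[t % Cap];
        if (!top_.compare_exchange_strong(t, t + 1)) return std::nullopt;  // lost the race
        return x;
    }
};

int main() {
    WorkStealingDeque<int> dq;
    for (int i = 0; i < 4; ++i) dq.push(i);
    std::printf("steal=%d take=%d\n", dq.steal().value_or(-1), dq.take().value_or(-1));
}
```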