This paper considers the problem of formal verification of MPI programs operating under a fixed test harness for safety properties without building verification models. In our approach, we directly model-check the MPI/C source code, executing its interleavings with the help of a verification scheduler. Unfortunately, the total feasible number of interleavings is exponential, and impractical to examine even for our modest goals. Our earlier publications formalized and implemented a partial order reduction approach that avoided exploring equivalent interleavings, and presented a verification tool called ISP. This paper presents algorithmic and engineering innovations to ISP, including the use of OpenMP parallelization, that now enables it to handle practical MPI programs, including: (i) ParMETIS - a widely used hypergraph partitioner, and (ii) MADRE - a Memory Aware Data Re-distribution Engine, both developed outside our group. Over these benchmarks, ISP has automatically verified up to 14K lines of MPI/C code, producing error traces of deadlocks and assertion violations within seconds.
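A back-of-the-envelope sketch of why reduction is essential (a toy illustration, not ISP's actual algorithm): two processes issuing n and m independent operations admit C(n+m, n) interleavings, yet when every cross-process pair of operations commutes, all of those interleavings are equivalent and a partial-order-reduction scheduler needs to run only one representative.

```c
/* Number of interleavings of two independent operation sequences of
 * lengths n and m: the binomial coefficient C(n+m, n). When all
 * cross-process operation pairs commute, a POR scheduler explores
 * exactly 1 of these schedules instead of all of them. */
unsigned long long interleavings(int n, int m) {
    unsigned long long r = 1;
    for (int i = 1; i <= n; i++)
        r = r * (unsigned long long)(m + i) / i;  /* division is exact at each step */
    return r;
}
```

For two 10-operation processes this is already 184,756 schedules, versus the single representative explored under the reduction.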
ISBN (Print): 9781605583976
Assuming that the multicore revolution plays out the way the microprocessor industry expects, it seems that within a decade most programming will involve parallelism at some level. One needs to ask how this affects the way we teach computer science, or even how we have people think about computation. With regard to teaching there seem to be three basic choices: (1) we only train a small number of experts in parallel computation who develop a collection of libraries, and everyone else just uses them; (2) we leave our core curriculum pretty much as is, but add some advanced courses on parallelism or perhaps tack on a few lectures at the end of existing courses; or (3) we start teaching parallelism from the start and embed it throughout the curriculum, with the idea of getting students to think about parallelism as the most natural form of computation and sequential computation as a special case. This talk will examine some of the implications of the third option. It will argue that thinking about parallelism, when treated in an appropriate way, might be as easy as or easier than thinking sequentially. A key prerequisite, however, is to identify what the core ideas in parallelism are and how they might be layered and integrated with existing concepts. Another more difficult issue is how to cleanly integrate these ideas among courses. After all, much of the success of sequential computation follows from the concept of a random access machine and its ability to serve as a simple, albeit imperfect, interface between programming languages, algorithm analysis, and hardware design. The talk will go through an initial list of some core ideas in parallelism, and an approach to integrating these ideas between parallel algorithms, programming languages, and, to some extent, hardware. This requires, however, moving away from the concept of a machine model as an interface for thinking about computation.
ISBN (Print): 9781595939609
This paper proposes an optimization method of data saving for application-level checkpointing of MPI programs, based on live-variable analysis. We present the implementation of a source-to-source precompiler (CAC) that automates application-level checkpointing using this optimization method. Experiments show that CAC is capable of automating application-level checkpointing correctly and reducing checkpoint data effectively.
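The payoff of the analysis can be sketched in plain C. The variable names below are hypothetical, and the live-variable analysis itself (which a precompiler like CAC performs at compile time) is assumed rather than shown: suppose the analysis proves that only `iter` and `b` are read after the checkpoint location, so the dead `scratch` array need not be saved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical program state at a checkpoint location. Live-variable
 * analysis (assumed, not shown) says only `iter` and `b` are live. */
typedef struct {
    long   iter;
    double b[4];
    double scratch[1024];   /* dead after the checkpoint: never read again */
} state;

/* Naive checkpoint: save the entire state. */
size_t ckpt_full(const state *s, char *buf) {
    memcpy(buf, s, sizeof *s);
    return sizeof *s;
}

/* Optimized checkpoint: save only the live variables. */
size_t ckpt_live(const state *s, char *buf) {
    size_t off = 0;
    memcpy(buf + off, &s->iter, sizeof s->iter); off += sizeof s->iter;
    memcpy(buf + off, s->b, sizeof s->b);        off += sizeof s->b;
    return off;
}

/* Restore must read fields back in the same order they were saved. */
void restore_live(state *s, const char *buf) {
    size_t off = 0;
    memcpy(&s->iter, buf + off, sizeof s->iter); off += sizeof s->iter;
    memcpy(s->b, buf + off, sizeof s->b);
}
```

Here the optimized checkpoint is tens of bytes instead of kilobytes; on real MPI applications the dead data is typically large intermediate arrays.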
ISBN (Print): 9781595939609
As high-performance computing enters the mainstream, parallel programming mechanisms (including the Message Passing Interface, or MPI) must be supported in new environments such as C# and the Common Language Infrastructure (CLI). Making effective use of MPI with the CLI requires an interface that reflects the high-level object-oriented nature of C# and that also supports its programming idioms. However, for performance reasons, this high-level functionality must ultimately be mapped to low-level native MPI libraries. In addition to abstraction penalty concerns, avoiding unwanted overhead in this mapping process is significantly complicated by the safety and portability features of the CLI virtual machine, such as garbage collection and just-in-time compilation. In this paper, we describe our approach to using features of C# and the CLI (such as reflection, unsafe code regions, and run-time code generation) to realize an elegant, yet highly efficient, C# interface to MPI. Experimental results demonstrate that there is no appreciable overhead introduced by our approach when compared to the native MS-MPI library.
ISBN (Print): 9781595939609
Low-Density Parity-Check (LDPC) codes are powerful error correcting codes (ECC). They have recently been adopted by several data communication standards such as DVB-S2 and WiMax. LDPCs are represented by bipartite graphs, also called Tanner graphs, and their decoding demands very intensive computation. For that reason, VLSI dedicated architectures have been investigated and developed over the last few years. This paper proposes a new approach for LDPC decoding on graphics processing units (GPUs). Efficient data structures and a new algorithm are proposed to represent the Tanner graph and to perform LDPC decoding according to the stream-based computing model. GPUs were programmed to efficiently implement the proposed algorithms by applying data-parallel intensive computing. Experimental results show that GPUs perform LDPC decoding nearly three orders of magnitude faster than modern CPUs. Moreover, they lead to the conclusion that GPUs with their tremendous processing power can be considered as a consistent alternative to state-of-the-art hardware LDPC decoders.
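The Tanner-graph computation pattern can be conveyed with a hard-decision bit-flipping decoder over a toy parity-check matrix. This is a simplification on two counts: the paper's GPU kernels implement soft-decision message passing, and the small (7,4) matrix below is illustrative, not a standards-grade LDPC matrix.

```c
#define NCHK 3
#define NVAR 7

/* Toy (7,4) parity-check matrix: rows = check nodes, columns = variable
 * nodes; H[c][v] = 1 means an edge in the Tanner graph. */
static const int H[NCHK][NVAR] = {
    {1, 1, 1, 0, 1, 0, 0},
    {1, 1, 0, 1, 0, 1, 0},
    {1, 0, 1, 1, 0, 0, 1},
};

/* Gallager-style hard-decision bit flipping: each iteration computes the
 * syndrome, then flips every bit that participates in the maximum number
 * of unsatisfied checks. Returns 1 once all checks are satisfied. */
int bitflip_decode(int bits[NVAR], int max_iters) {
    for (int it = 0; it < max_iters; it++) {
        int syn[NCHK], unsatisfied = 0;
        for (int c = 0; c < NCHK; c++) {
            syn[c] = 0;
            for (int v = 0; v < NVAR; v++) syn[c] ^= H[c][v] & bits[v];
            unsatisfied += syn[c];
        }
        if (!unsatisfied) return 1;   /* valid codeword reached */

        /* Count unsatisfied checks per variable node (messages along edges). */
        int cnt[NVAR], bestcnt = 0;
        for (int v = 0; v < NVAR; v++) {
            cnt[v] = 0;
            for (int c = 0; c < NCHK; c++) cnt[v] += H[c][v] & syn[c];
            if (cnt[v] > bestcnt) bestcnt = cnt[v];
        }
        for (int v = 0; v < NVAR; v++)
            if (cnt[v] == bestcnt) bits[v] ^= 1;
    }
    return 0;   /* did not converge */
}
```

The check-node and variable-node loops are independent across nodes, which is exactly the structure that maps onto data-parallel GPU threads in the stream model.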
ISBN (Print): 9781595939609
Low-overhead core-to-core communication is critical for efficient pipeline-parallel software applications. This paper presents FastForward, a cache-optimized single-producer/single-consumer concurrent lock-free queue for pipeline parallelism on multicore architectures with weakly to strongly ordered consistency models. Enqueue and dequeue times on a 2.66 GHz Opteron 2218-based system are as low as 28.5 ns, up to 5x faster than the next best solution. FastForward's effectiveness is demonstrated for real applications by applying it to line-rate soft network processing on Gigabit Ethernet with general purpose commodity hardware.
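A minimal single-threaded sketch of FastForward's central idea: the slot contents themselves signal occupancy (NULL means empty), so the producer and consumer never read each other's index variable and the head/tail cache lines never ping-pong between cores. This is only the core invariant; the real queue additionally needs memory fences on weakly ordered machines and a "temporal slip" between producer and consumer, both omitted here.

```c
#include <stddef.h>

#define QSIZE 8   /* capacity; a power of two in practice */

typedef struct {
    void  *slot[QSIZE];  /* NULL = empty slot; non-NULL = pending item */
    size_t head;         /* consumer-private: next slot to read */
    size_t tail;         /* producer-private: next slot to write */
} ff_queue;

void ff_init(ff_queue *q) {
    for (size_t i = 0; i < QSIZE; i++) q->slot[i] = NULL;
    q->head = q->tail = 0;
}

/* Producer side. Returns 0 on success, -1 if the queue is full.
 * Fullness is detected from the slot itself, not from `head`. */
int ff_enqueue(ff_queue *q, void *item) {
    if (q->slot[q->tail] != NULL) return -1;  /* consumer hasn't drained it */
    q->slot[q->tail] = item;                  /* single store publishes item */
    q->tail = (q->tail + 1) % QSIZE;
    return 0;
}

/* Consumer side. Returns NULL if the queue is empty. */
void *ff_dequeue(ff_queue *q) {
    void *item = q->slot[q->head];
    if (item == NULL) return NULL;            /* emptiness from the slot too */
    q->slot[q->head] = NULL;                  /* single store frees the slot */
    q->head = (q->head + 1) % QSIZE;
    return item;
}
```

Because each side touches only its own index plus the slot array, the only shared cache traffic is the data itself, which is the property behind the reported ~28.5 ns operation times.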
ISBN (Print): 9781595939609
This paper investigates adding transactions with nested parallelism and nested transactions to a dynamically multithreaded parallel programming language that generates only series-parallel programs. We describe XConflict, a data structure that facilitates conflict detection for a software transactional memory system which supports transactions with nested parallelism and unbounded nesting depth. For languages that use a Cilk-like work-stealing scheduler, XConflict answers concurrent conflict queries in O(1) time and can be maintained efficiently. In particular, for a program with T1 work and a span (or critical-path length) of T∞, the running time on p processors of the program augmented with XConflict is only O(T1/p + pT∞). Using XConflict, we describe CWSTM, a runtime-system design for software transactional memory which supports transactions with nested parallelism and unbounded nesting depth of transactions. The CWSTM design provides transactional memory with eager updates, eager conflict detection, strong atomicity, and lazy cleanup on aborts. In the restricted case when no transactions abort and there are no concurrent readers, CWSTM executes a transactional computation on p processors also in time O(T1/p + pT∞). Although this bound holds only under rather optimistic assumptions, to our knowledge this result is the first theoretical performance bound on a TM system that supports transactions with nested parallelism which is independent of the maximum nesting depth of transactions.
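The query at the heart of conflict detection in a series-parallel computation can be sketched as follows: two memory accesses can be logically concurrent only if the least common ancestor of their strands in the SP parse tree is a P (parallel) node. The naive LCA walk below is O(depth) and purely illustrative; XConflict's contribution is auxiliary structures that answer the equivalent query in O(1) under a work-stealing execution.

```c
#include <stddef.h>

/* A node of the series-parallel (SP) parse tree: internal nodes are either
 * S (children execute one after another) or P (children may execute
 * concurrently); leaves are strands that perform memory accesses. */
typedef struct sp_node {
    const struct sp_node *parent;
    int is_parallel;              /* 1 = P node; 0 = S node or strand */
} sp_node;

static int sp_depth(const sp_node *n) {
    int d = 0;
    while (n->parent) { n = n->parent; d++; }
    return d;
}

static const sp_node *sp_lca(const sp_node *a, const sp_node *b) {
    int da = sp_depth(a), db = sp_depth(b);
    while (da > db) { a = a->parent; da--; }   /* level the deeper node */
    while (db > da) { b = b->parent; db--; }
    while (a != b)  { a = a->parent; b = b->parent; }
    return a;
}

/* Accesses from two strands may race only if the strands can run in
 * parallel, i.e. their LCA is a P node. */
int sp_may_run_in_parallel(const sp_node *a, const sp_node *b) {
    return sp_lca(a, b)->is_parallel;
}

/* Tiny demo: an S root with a P subtree over strands {a, b} and a serial
 * strand c. Returns 1 iff a||b is detected and a||c is correctly rejected. */
int sp_demo(void) {
    sp_node root = { NULL, 0 };            /* S node */
    sp_node p    = { &root, 1 };           /* P node */
    sp_node a = { &p, 0 }, b = { &p, 0 }, c = { &root, 0 };
    return sp_may_run_in_parallel(&a, &b) && !sp_may_run_in_parallel(&a, &c);
}
```

Layering transactions on top amounts to checking, in addition, whether the enclosing transactions of two conflicting accesses have committed, which is where the unbounded-nesting machinery of CWSTM comes in.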