ISBN (print): 9781538683859
Two concurrent accesses to a shared variable that are unordered by synchronization are said to be a data race if at least one access is a write. Data races cause shared memory parallel programs to behave unpredictably. This paper describes ROMP - a tool for detecting data races in executions of scalable parallel applications that employ OpenMP for node-level parallelism. The complexity of OpenMP, which includes primitives for managing data environments, SPMD and SIMD parallelism, work sharing, tasking, mutual exclusion, and ordering, presents a formidable challenge for data race detection. ROMP is a hybrid data race detector that tracks accesses, access orderings and mutual exclusion. Unlike other OpenMP race detectors, ROMP detects races with respect to concurrency rather than implementation threads. Experiments show that ROMP yields precise race reports for a broader set of OpenMP constructs than prior state-of-the-art race detectors.
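A minimal sketch (not taken from the paper) of the kind of OpenMP race ROMP is designed to flag: two logically concurrent accesses to the same variable, at least one of them a write, with no synchronization ordering them.

```c
/* Racy OpenMP loop: every iteration writes sum without synchronization. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    int sum = 0;
    #pragma omp parallel for       /* data race: concurrent writes to sum */
    for (int i = 0; i < 1000; i++)
        sum += i;                  /* fix: add reduction(+:sum) or atomic */
    printf("%d\n", sum);
    return 0;
}
```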
ISBN (print): 9781538683859
Collective operations are used in MPI programs to express common communication patterns, collective computations, or synchronization. In many collectives, such as MPI_Allreduce, the intra-node component of the collective lies on the critical path, as the inter-node communication cannot start until the intra-node component has completed. With increasing core counts per node, intra-node optimizations that leverage shared memory become more important. In this paper, we focus on the performance benefit of optimizing intra-node collectives using POSIX shared memory for synchronization and data sharing. We implement several collectives using basic primitives or steps as building blocks. Key components of our implementation include a dedicated intra-node collectives layer, careful layout of the data structures, and optimizations that exploit the memory hierarchy to balance parallelism against the latencies of data movement. A comparison of our implementation on top of MPICH shows significant speedups with respect to the original MPICH implementation, MVAPICH, and OpenMPI.
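A rough sketch (not the paper's implementation) of how the intra-node component of an allreduce can be separated out: split MPI_COMM_WORLD into per-node shared-memory communicators, reduce within each node first, and let the node leaders perform the inter-node step.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int val = 1, node_sum = 0, total = 0;

    /* Per-node communicator: ranks that can share memory. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    /* Intra-node step: this is the part that lies on the critical path. */
    MPI_Reduce(&val, &node_sum, 1, MPI_INT, MPI_SUM, 0, node_comm);

    /* Inter-node step among node leaders (rank 0 of each node_comm). */
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm leaders;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   0, &leaders);
    if (leaders != MPI_COMM_NULL) {
        MPI_Allreduce(&node_sum, &total, 1, MPI_INT, MPI_SUM, leaders);
        MPI_Comm_free(&leaders);
    }

    /* Distribute the global result back within each node. */
    MPI_Bcast(&total, 1, MPI_INT, 0, node_comm);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```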
Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sa...
Due to the end of Moore's law clock-frequency scaling and of Dennard scaling, we are reaching crippling limits with our current von Neumann processor paradigms. Help is sought from both technology and architecture to innovate and engender new processing paradigms that can overcome those limitations and define the future of computing. New ideas and directions range from neuromorphic processors to analog computing, memristors, quantum computing, and nanophotonics. This talk will examine a number of these emerging directions, including work by the community and by our group, and evaluate some of the associated implications for the future of computing.
ISBN (print): 9781538683859
A performance vs. practicality trade-off exists between user-level threading techniques. The community has settled mostly on a black-and-white perspective: fully fledged threads assume that suspension is imminent and incur overheads when suspension does not take place, while run-to-completion threads are more lightweight but less practical since they cannot suspend. Gray areas exist, however, whereby threads can start with minimal capabilities and then be dynamically promoted to acquire additional capabilities when needed. This paper investigates the full spectrum of threading techniques from a performance vs. practicality trade-off perspective on modern multicore and many-core systems. Our results indicate that achieving the best trade-off depends strongly on the suspension likelihood: dynamic promotion is more appropriate when suspension is unlikely and represents a solid replacement for run-to-completion, thanks to its fewer programming constraints, while fully fledged threads remain the technique of choice when suspension likelihood is high.
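As a rough illustration (not the paper's runtime, and showing only the two endpoints rather than on-demand promotion), a task can run as a plain function call when it will never suspend, and be given its own stack and context (here via POSIX ucontext) when it must be able to yield; needs_suspend and task_fn are placeholders for the runtime's decision and workload.

```c
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t sched_ctx, task_ctx;
static int needs_suspend = 1;          /* assumption: set by the runtime */

static void task_fn(void) {
    puts("task: doing work");
    if (needs_suspend) {
        puts("task: suspending (fully fledged path)");
        swapcontext(&task_ctx, &sched_ctx);   /* yield back to scheduler */
    }
    puts("task: done");
}

int main(void) {
    if (!needs_suspend) {              /* run-to-completion: a plain call */
        task_fn();
        return 0;
    }
    /* Promotion: give the task its own stack so it can suspend. */
    char *stack = malloc(64 * 1024);
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = 64 * 1024;
    task_ctx.uc_link = &sched_ctx;
    makecontext(&task_ctx, task_fn, 0);
    swapcontext(&sched_ctx, &task_ctx); /* run until the task yields */
    puts("scheduler: task suspended, resuming it");
    swapcontext(&sched_ctx, &task_ctx); /* resume until completion */
    free(stack);
    return 0;
}
```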
Determining the genome of a new species remains one of the most crucial tasks in molecular biology. To that end, de novo sequence assembly draws on the vast amount of data provided by Next-Generation Sequencing technology. Genome assemblers therefore demand substantial computational resources, and parallel implementations of these assemblers are readily available. This paper presents a comparison of three well-known de novo genome assemblers: Velvet, ABySS, and SOAPdenovo, all of which use de Bruijn graphs and have a parallel implementation. We base our analysis on parallel execution time, scalability, quality of assembly, and sensitivity to the choice of a critical parameter (the k-mer size). We found that one of the tools clearly stands out, providing faster execution and better output quality. All assemblers are mildly sensitive to the choice of k-mer size, and all show limited scalability. We expect the findings of this paper to guide the development of new algorithms and tools for scalable parallel genome sequence assembly.
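For context, a minimal sketch (not from the paper) of the first step all three assemblers share: cutting reads into k-mers, which become the nodes of a de Bruijn graph with edges linking consecutive, overlapping k-mers. The read string and k value below are illustrative only.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    const char *read = "ACGTACGGT";    /* assumption: one short read */
    int k = 4;                         /* assumption: k-mer size */
    size_t n = strlen(read);
    for (size_t i = 0; i + k <= n; i++) {
        printf("%.*s", k, read + i);                 /* node: k-mer      */
        if (i + k < n)
            printf(" -> %.*s", k, read + i + 1);     /* edge to next one */
        printf("\n");
    }
    return 0;
}
```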
ISBN (print): 9781538683859
Non-volatile memory (NVM) provides a scalable solution to replace DRAM as main memory. Because of the relatively high latency and low bandwidth of NVM compared with DRAM, NVM is often paired with DRAM to build a heterogeneous main memory system (HMS). Deciding data placement on an NVM-based HMS is critical to enabling future NVM-based HPC. In this paper, we study task-parallel programs and introduce a runtime system to address the data placement problem on NVM-based HMS. Leveraging the semantics and execution mode of task-parallel programs, we efficiently characterize the memory access patterns of tasks and reduce data movement overhead. We also introduce a performance model to predict the performance of tasks under various data placements on the HMS. Evaluating with a set of HPC benchmarks, we show that our runtime system achieves higher performance than a conventional HMS-oblivious runtime (24% improvement on average) and two state-of-the-art HMS-aware solutions (16% and 11% improvement on average, respectively).
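A minimal sketch of placement-aware allocation on a DRAM+NVM node using the memkind library; this is not the runtime described in the paper, and is_bandwidth_bound() stands in for the paper's task characterization. It assumes the NVM is exposed to the OS as a KMEM DAX NUMA node.

```c
#include <memkind.h>
#include <stddef.h>

/* Place bandwidth-sensitive task data in DRAM, the rest in NVM. */
void *place_task_data(size_t bytes, int is_bandwidth_bound) {
    memkind_t kind = is_bandwidth_bound ? MEMKIND_DEFAULT    /* DRAM */
                                        : MEMKIND_DAX_KMEM;  /* NVM  */
    return memkind_malloc(kind, bytes);
    /* Release later with memkind_free(NULL, ptr); the kind is detected. */
}
```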
ISBN (print): 9781538655566; 9781538655559
Provides an abstract of the keynote presentation and may include a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.
ISBN (print): 9781538655566; 9781538655559
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
ISBN (print): 9781450344937
Task-based programming offers an elegant way to express units of computation and the dependencies among them, making it easier to distribute the computational load evenly across multiple cores. However, this separation of problem decomposition and parallelism requires a sufficiently large input problem to achieve satisfactory efficiency on a given number of cores. Unfortunately, finding a good match between input size and core count usually requires significant experimentation, which is expensive and sometimes even impractical. In this paper, we propose an automated empirical method for finding the isoefficiency function of a task-based program, binding efficiency, core count, and input size in one analytical expression. This allows the latter two to be adjusted according to given (realistic) efficiency objectives. Moreover, we find not only (i) the actual isoefficiency function but also (ii) the function one would obtain if the program execution were free of resource contention and (iii) an upper bound that could only be reached if the program were able to maintain its average parallelism throughout its execution. The differences between the three help to explain low efficiency and, in particular, to differentiate between resource contention and structural conflicts related to task dependencies or scheduling. The insights gained can be used to co-design programs and shared system resources.
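For reference, the textbook relationships behind isoefficiency analysis are sketched below; the paper fits an empirical variant of this, and the notation here is standard rather than the authors'.

```latex
% Parallel efficiency on p cores for input size n, and the isoefficiency
% function f(p): the smallest input that sustains a target efficiency E_0.
\[
  E(p, n) \;=\; \frac{T(1, n)}{p \, T(p, n)},
  \qquad
  f(p) \;=\; \min \{\, n \;:\; E(p, n) \ge E_0 \,\}.
\]
```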