ISBN (Print): 9783030122744; 9783030122737
In recent years, the HPC landscape has shifted away from traditional multi-core CPU systems toward energy-efficient architectures, such as many-core CPUs and accelerators like GPUs, to achieve high performance. The goal of performance portability is to enable developers to rapidly produce applications that run efficiently on a variety of these architectures, with little to no architecture-specific code adaptation required. We implement a key kernel from a material science application using OpenMP 3.0, OpenMP 4.5, OpenACC, and CUDA on Intel architectures, Xeon and Xeon Phi, and on NVIDIA GPUs, P100 and V100. We compare the performance of the OpenMP 4.5 implementation with that of the more architecture-specific implementations, examine the performance of the OpenMP 4.5 implementation on CPUs after back-porting, share our experience optimizing large reduction loops, and discuss the latest compiler status for OpenMP 4.5 and OpenACC.
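To make the reduction-loop discussion concrete, the following is a minimal sketch of an OpenMP 4.5 target offload with a large reduction, the pattern the abstract refers to; the array size and the kernel body are illustrative placeholders, not the paper's material science kernel.

    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal sketch of an OpenMP 4.5 offloaded reduction loop. The same
     * directive back-ports to CPUs: without a device the target region
     * falls back to the host. The kernel body is a placeholder. */
    int main(void)
    {
        const long n = 1 << 24;
        double *a = malloc(n * sizeof *a);
        for (long i = 0; i < n; ++i)
            a[i] = 1.0 / (double)(i + 1);

        double sum = 0.0;
        #pragma omp target teams distribute parallel for \
                map(to: a[0:n]) reduction(+: sum)
        for (long i = 0; i < n; ++i)
            sum += a[i] * a[i];

        printf("sum = %f\n", sum);
        free(a);
        return 0;
    }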
ISBN (Print): 9781467373111
The AXIOM project (Agile, eXtensible, fast I/O Module) aims at researching new software/hardware architectures for future Cyber-Physical Systems (CPSs). These systems are expected to react in real time, provide enough computational power for the assigned tasks, consume the least possible energy for those tasks (energy efficiency), scale up through modularity, allow for easy programmability across performance scaling, and make the best use of existing standards at minimal cost. Current solutions for providing enough computational power are mainly based on multi- or many-core architectures. For example, some current research projects (such as ADEPT or P-SOCRATES) are already investigating how to join efforts from the High-Performance Computing (HPC) and Embedded Computing domains, which are both focused on high power efficiency, while GPUs and new dataflow platforms such as Maxeler, or FPGAs in general, are claimed to be the most energy-efficient. We present the project's initial approach, ideas, and key concepts, and describe the preliminary AXIOM architecture. Our starting point uses power-efficient multi-core nodes with ARM cores and FPGA accelerators on the same die, as in the Xilinx Zynq. We will work to provide an integrated environment that supports programmability of the parallel, interconnected nodes that form a CPS system, and evaluate our ideas using demanding test application scenarios.
Today, data-parallel programming models are the most successful programming models for parallel computers, both in terms of efficiency of execution and ease of use for the programmer. However, there is no parallel programming model that is conceptually simple and abstract, and that can be ported efficiently to the variety of parallel architectures available. The nested data-parallel programming model has some of the desired properties of a parallel programming model. In contrast to flat data-parallel models, with this model it is possible to express irregular data structures and irregular parallel computations directly. In this paper, a collection-oriented approach to nested data parallelism is introduced. The state of the art of related research is presented and open questions are identified.
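As an illustration of the irregular, nested parallelism that the nested data-parallel model targets (this is a plain C/OpenMP rendering, not the paper's collection-oriented notation), the sketch below computes a sparse matrix-vector product over a CSR structure: the outer collection is processed in parallel while each element owns an inner segment of a different length.

    #include <stdio.h>

    /* Illustrative only: irregular nested parallelism as a CSR sparse
     * matrix-vector product. Each row owns a variable-length inner
     * segment, the kind of nesting a flat data-parallel model cannot
     * express directly. Compiles and runs serially without -fopenmp. */
    static void spmv_csr(int nrows, const int *rowptr, const int *col,
                         const double *val, const double *x, double *y)
    {
        #pragma omp parallel for schedule(dynamic)
        for (int r = 0; r < nrows; ++r) {
            double acc = 0.0;
            for (int k = rowptr[r]; k < rowptr[r + 1]; ++k)  /* irregular inner extent */
                acc += val[k] * x[col[k]];
            y[r] = acc;
        }
    }

    int main(void)
    {
        /* 3x3 matrix [[1,0,2],[0,3,0],[4,5,6]] in CSR form */
        int rowptr[] = {0, 2, 3, 6};
        int col[]    = {0, 2, 1, 0, 1, 2};
        double val[] = {1, 2, 3, 4, 5, 6};
        double x[]   = {1, 1, 1}, y[3];
        spmv_csr(3, rowptr, col, val, x, y);
        printf("%g %g %g\n", y[0], y[1], y[2]);
        return 0;
    }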
In state-of-the-art FPGAs, especially in chiplet-based devices, place and route has become an important challenge due to an increase in device size and complexity. In the same way, off-chip memory resources have grown ...
ISBN (Print): 9781479961238
Task-based parallel programming models with explicit data dependencies, such as OmpSs, are gaining popularity, due to the ease of describing parallel algorithms with complex and irregular dependency patterns. These advantages, however, come at a steep cost of runtime overhead incurred by dynamic dependency resolution. Hardware support for task management has been proposed in previous work as a possible solution. We present VSs, a runtime library for the OmpSs programming model that integrates the Nexus++ hardware task manager, and evaluate the performance of the VSs-Nexus++ system. Experimental results show that applications with fine-grain tasks can achieve speedups of up to 3.4x, while applications optimized for current runtimes attain 1.3x. Providing support for hardware task managers in runtime libraries is therefore a viable approach to improve the performance of OmpSs applications.
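For readers unfamiliar with the model, the following is a minimal OmpSs-style sketch of tasks with explicit data dependencies; it only illustrates how fine-grain dependency patterns are declared, not the Nexus++ hardware path, and the clause spelling follows common OmpSs/Mercurium usage, which may vary between versions. With a plain C compiler the pragmas are ignored and the program runs serially.

    #include <stdio.h>

    /* Two producer tasks and one consumer; with an OmpSs runtime their
     * execution order is derived from the declared in/out dependencies,
     * not from program order. */
    #pragma omp task out(*x)
    void produce(int *x, int v) { *x = v; }

    #pragma omp task in(*a, *b) out(*c)
    void consume(const int *a, const int *b, int *c) { *c = *a + *b; }

    int main(void)
    {
        int a, b, c;
        produce(&a, 1);        /* task: writes a */
        produce(&b, 2);        /* task: writes b, independent of the first */
        consume(&a, &b, &c);   /* task: runs once both producers finish */
        #pragma omp taskwait   /* wait for the task graph before reading c */
        printf("c = %d\n", c);
        return 0;
    }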
ISBN (Print): 9780769549712
As new heterogeneous systems and hardware accelerators appear, high-performance computers can reach a higher level of computational power. Nevertheless, this does not come for free: the more heterogeneity the system presents, the more complex the programming task becomes in terms of resource management. OmpSs is a task-based programming model and framework focused on the runtime exploitation of parallelism from annotated sequential applications. This paper presents a set of extensions to this framework: we show how the application programmer can expose different specialized versions of tasks (i.e., pieces of specific code targeted and optimized for a particular architecture) and how the system can choose among these versions at runtime to obtain the best achievable performance for the given application. From the results obtained on a multi-GPU system, we show that our proposal adds flexibility to the application's source code and can potentially increase the application's performance.
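The sketch below shows how such task versioning is commonly expressed in OmpSs, with one generic implementation and one alternative registered for the same task; the clause spelling (target/device/implements) follows public OmpSs documentation and may not match the paper's extension exactly, and the second body is a stand-in for architecture-specific (e.g., CUDA) code. With a plain C compiler the pragmas are ignored.

    #include <stdio.h>

    /* Generic task version. */
    #pragma omp target device(smp)
    #pragma omp task in([n] x) out([n] y)
    void scale(const float *x, float *y, int n, float s)
    {
        for (int i = 0; i < n; ++i) y[i] = s * x[i];
    }

    /* Alternative version of the same task; with an OmpSs toolchain the
     * runtime may choose either implementation per invocation. A real
     * specialization would use device(cuda) and a GPU kernel body. */
    #pragma omp target device(smp) implements(scale)
    #pragma omp task in([n] x) out([n] y)
    void scale_alt(const float *x, float *y, int n, float s)
    {
        for (int i = n - 1; i >= 0; --i) y[i] = s * x[i];
    }

    int main(void)
    {
        float x[4] = {1, 2, 3, 4}, y[4];
        scale(x, y, 4, 2.0f);
        #pragma omp taskwait
        printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);
        return 0;
    }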
ISBN (Print): 9798400704352
Pure is a new programming model and runtime system explicitly designed to take advantage of shared memory within nodes, in the context of a mostly message-passing interface enhanced with the ability to use tasks to make use of idle cores. Pure leverages shared memory in two ways: (a) by allowing cores to steal work from each other while waiting on messages to arrive, and (b) by leveraging lock-free data structures in shared memory to achieve high-performance messaging and collective operations between the ranks within nodes. We use microbenchmarks to evaluate Pure's key messaging and collective features and also show application speedups of up to 2.1x on the CoMD molecular dynamics and miniAMR adaptive mesh refinement applications, scaling up to 4,096 cores.
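To illustrate the kind of shared-memory structure the abstract alludes to for intra-node messaging (this is a generic single-producer/single-consumer lock-free ring in C11 atomics, not Pure's actual implementation or API), a short sketch follows.

    #include <stdatomic.h>
    #include <stdio.h>

    #define CAP 1024   /* ring capacity, power of two */

    /* Single-producer/single-consumer lock-free ring buffer: the producer
     * only writes tail, the consumer only writes head, so no locks are
     * needed for rank-to-rank messages within a node. */
    typedef struct {
        _Atomic unsigned head;   /* advanced by the consumer */
        _Atomic unsigned tail;   /* advanced by the producer */
        int buf[CAP];
    } spsc_ring;

    static int ring_push(spsc_ring *r, int v)
    {
        unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
        unsigned h = atomic_load_explicit(&r->head, memory_order_acquire);
        if (t - h == CAP) return 0;                       /* full */
        r->buf[t & (CAP - 1)] = v;
        atomic_store_explicit(&r->tail, t + 1, memory_order_release);
        return 1;
    }

    static int ring_pop(spsc_ring *r, int *v)
    {
        unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
        unsigned t = atomic_load_explicit(&r->tail, memory_order_acquire);
        if (t == h) return 0;                             /* empty */
        *v = r->buf[h & (CAP - 1)];
        atomic_store_explicit(&r->head, h + 1, memory_order_release);
        return 1;
    }

    int main(void)
    {
        static spsc_ring r;   /* zero-initialized */
        int v;
        ring_push(&r, 42);
        if (ring_pop(&r, &v)) printf("received %d\n", v);
        return 0;
    }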
ISBN (Print): 9781479968602
This article describes the development of a software and hardware environment for integrating individual cluster systems into a single, integrated parallel HPC system. The elaborated applications can be ported to the resources of the integrated HPC system. To help students acquire the necessary theoretical and practical skills in using regional HPC clusters, an interactive educational course on parallel programming, HPC clusters, and the use of parallel software is currently being prepared.
ISBN (Print): 9781450388160
Cutting-edge functionalities in embedded systems require the use of parallel architectures to meet their performance requirements. This imposes the introduction of a new layer in the software stacks of embedded systems: the parallel programming model. Unfortunately, the tools used to analyze embedded systems fall short in characterizing the performance of parallel applications at the parallel programming model level and correlating it with information about non-functional requirements such as real-time behavior, energy, memory usage, etc. HPC tools, like Extrae, are designed with that level of abstraction in mind, but their main focus is on performance evaluation. Overall, providing insightful information about the performance of parallel embedded applications at the parallel programming model level, and relating it to the non-functional requirements, is of paramount importance to fully exploit the performance capabilities of parallel embedded architectures. This paper contributes to the state of the art of analysis tools for embedded systems by: (1) analyzing the particular constraints of embedded systems compared to HPC systems (e.g., static setting, restricted memory, limited drivers) in order to support HPC analysis tools; (2) porting Extrae, a powerful tracing tool from the HPC domain, to the GR740 platform, a SoC used in the space domain; and (3) augmenting Extrae with new features needed to correlate the parallel execution with the following non-functional requirements: energy, temperature, and memory usage. Finally, the paper demonstrates the usefulness of Extrae for characterizing OpenMP applications and their non-functional requirements, evaluating different aspects of the applications running on the GR740.
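As a rough illustration of how an application can be correlated with such metrics through Extrae's user-event API (the event type 1000 and the read_soc_temperature() helper are hypothetical placeholders, not part of Extrae or the GR740 software stack), consider the sketch below.

    #include "extrae_user_events.h"   /* Extrae user-event API */

    /* Hypothetical metric hook; a real port would read a platform sensor. */
    static long long read_soc_temperature(void) { return 42; }

    int main(void)
    {
        Extrae_init();

        Extrae_event(1000, read_soc_temperature());   /* sample before the region */
        #pragma omp parallel
        {
            /* parallel work to be characterized */
        }
        Extrae_event(1000, read_soc_temperature());   /* sample after the region */

        Extrae_fini();
        return 0;
    }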
ISBN (Print): 9781450399166
Shared memory platforms provide a memory consistency specification (MCS) so that developers can reason about the behaviors of their parallel programs. Unfortunately, ensuring that a platform conforms to its MCS is difficult, as is exemplified by numerous bugs in well-used platforms. While existing MCS testing approaches find bugs, their efficacy depends on the testing environment (e.g., whether synthetic memory pressure is applied). MCS testing environments are difficult to evaluate, since legitimate MCS violations are too rare to use as an efficacy metric. As a result, prior approaches have missed critical MCS bugs. This work proposes a mutation testing approach for evaluating MCS testing environments: MC Mutants. This approach mutates MCS tests such that the mutants simulate bugs that might occur. A testing environment can then be evaluated using a mutation score. We utilize MC Mutants in two novel contributions: (1) a parallel testing environment, and (2) an MCS testing confidence strategy that is parameterized by a time budget and a confidence threshold. We implement our contributions in WebGPU, a new web-based GPU programming specification, and evaluate our techniques across four GPUs. We improve testing speed by three orders of magnitude over prior work, empowering us to create a conformance test suite that reproduces many mutated tests with high confidence and requires only 64 seconds per test. We identified two bugs in WebGPU implementations, one of which led to a specification change. Moreover, the official WebGPU conformance test suite has adopted our approach due to its efficiency, effectiveness, and broad applicability.
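For context on what an MCS test and its mutant look like (this is a generic message-passing litmus test in C11 atomics and pthreads, not WGSL and not the MC Mutants harness), a small sketch follows: under relaxed ordering the outcome r1 == 1 && r2 == 0 is permitted, and a mutant of the test would instead flag that outcome as a violation, simulating a bug the testing environment should be able to reproduce.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    /* Message-passing litmus test: the writer publishes data then a flag;
     * the reader observes them in the opposite order. With relaxed atomics
     * the weak outcome r1==1 && r2==0 is allowed by the memory model. */
    static _Atomic int flag, data;
    static int r1, r2;

    static void *writer(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&data, 1, memory_order_relaxed);
        atomic_store_explicit(&flag, 1, memory_order_relaxed);
        return NULL;
    }

    static void *reader(void *arg)
    {
        (void)arg;
        r1 = atomic_load_explicit(&flag, memory_order_relaxed);
        r2 = atomic_load_explicit(&data, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, writer, NULL);
        pthread_create(&b, NULL, reader, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("r1=%d r2=%d%s\n", r1, r2,
               (r1 == 1 && r2 == 0) ? "  (weak behavior observed)" : "");
        return 0;
    }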