Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.
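As a concrete illustration of the kind of discretization the abstract refers to, the sketch below applies a standard fourth-order central finite-difference stencil in plain NumPy. This is a minimal example of higher-order finite differences in general, not code generated by Chemora; the function name and test problem are ours.

    import numpy as np

    def second_derivative_4th_order(u, h):
        # standard 4th-order central stencil: (-1, 16, -30, 16, -1) / (12 h^2)
        return (-u[:-4] + 16*u[1:-3] - 30*u[2:-2] + 16*u[3:-1] - u[4:]) / (12*h**2)

    x = np.linspace(0.0, np.pi, 201)
    h = x[1] - x[0]
    u = np.sin(x)
    d2u = second_derivative_4th_order(u, h)
    # exact u'' = -sin(x); the interior-point error shrinks as O(h^4)
    print(np.max(np.abs(d2u + np.sin(x[2:-2]))))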
We present a new method for parallelization of adaptive mesh refinement called Concurrent Structured Adaptive Mesh Refinement (CSAMR). This new method offers the lower computational cost (i.e. wall time × processor count) of subcycling in time, but with the runtime performance (i.e. smaller wall time) of evolving all levels at once using the time step of the finest level (which does more work than subcycling but exposes more parallelism). We demonstrate our algorithm's effectiveness using an adaptive mesh refinement code, AMSS-NCKU, and show performance on Blue Waters and other high-performance clusters. For the class of problems considered in this paper, our algorithm achieves a speedup of 1.7-1.9 when the processor count for a given AMR run is doubled, consistent with our theoretical predictions.
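To make the trade-off concrete, the back-of-the-envelope model below (our construction, not the paper's analysis) counts time steps per coarse step for a hierarchy with refinement factor 2 and equal per-step cost on every level: subcycling advances level l only 2**l times, while evolving every level with the finest time step multiplies the total work but lets all levels advance concurrently.

    def steps_subcycling(levels):
        # level l takes 2**l steps per coarse time step
        return sum(2**l for l in range(levels))

    def steps_uniform(levels):
        # every level advances with the finest time step: 2**(levels-1) steps each
        return levels * 2**(levels - 1)

    for levels in (2, 3, 4):
        sub, uni = steps_subcycling(levels), steps_uniform(levels)
        print(f"{levels} levels: subcycling {sub} steps, uniform {uni} steps, "
              f"extra work {uni/sub:.2f}x")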
ISBN (print): 9781728162515
To cope with the rapid growth in available data, the efficiency of data analysis and machine learning libraries has recently received increased attention. Although great advancements have been made in traditional array-based computations, most are limited by the resources available on a single computation node. Consequently, novel approaches must be employed to exploit distributed resources, e.g. distributed memory architectures. To this end, we introduce HeAT, an array-based numerical programming framework for large-scale parallel processing with an easy-to-use NumPy-like API. HeAT utilizes PyTorch as a node-local eager execution engine and distributes the workload on arbitrarily large high-performance computing systems via MPI. It provides both low-level array computations as well as assorted higher-level algorithms. With HeAT, it is possible for a NumPy user to take full advantage of their available resources, significantly lowering the barrier to distributed data analysis. When compared to similar frameworks, HeAT achieves speedups of up to two orders of magnitude.
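A minimal usage sketch of the NumPy-like API the abstract describes is shown below. It follows HeAT's documented style of marking a split axis for MPI distribution; the exact signatures should be checked against the HeAT documentation.

    # Run under MPI, e.g.: mpirun -n 4 python heat_example.py
    import heat as ht

    # split=0 distributes the first axis across the MPI processes
    x = ht.arange(1_000_000, split=0, dtype=ht.float32)
    y = ht.ones((1_000_000,), split=0, dtype=ht.float32)

    z = x * y + 2.0    # element-wise ops execute node-locally via PyTorch
    print(ht.mean(z))  # reductions communicate across processes as needed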
ISBN (print): 9781665454445
Python's ease of use and rich collection of numeric libraries make it an excellent choice for rapidly developing scientific applications. However, composing these libraries to take advantage of complex heterogeneous nodes is still difficult. To simplify writing multi-device code, we created Parla, a heterogeneous task-based programming framework that fully supports Python's scientific programming stack. Parla's API is based on Python decorators and allows users to wrap code in Parla tasks for parallel execution. Parla arrays enable automatic movement of data between devices. The Parla runtime handles resource-aware mapping, scheduling, and execution of tasks. Compared to other Python tasking systems, Parla is unique in its parallelization of tasks within a single process, its GPU context and resource-aware runtime, and its design around gradual adoption to provide easy migration of and integration into existing Python applications. We show that Parla can achieve performance competitive with hand-optimized code while improving ease of development.
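The snippet below illustrates the decorator-based tasking pattern the abstract describes, using only the Python standard library. It mimics the style of wrapping code in tasks for parallel execution; it is not Parla's actual API, which additionally provides data movement, device placement, and resource-aware scheduling.

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor()

    def spawn(fn):
        # Decorator in the spirit of task annotations: submit the wrapped
        # function for parallel execution and bind its name to the Future.
        return _pool.submit(fn)

    @spawn
    def task_a():
        return sum(range(10_000))

    @spawn
    def task_b():
        return sum(range(20_000))

    print(task_a.result() + task_b.result())  # block until both tasks finish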