Search Results - Inner Mongolia University Library

Data layout optimization for multi-valued containers in OpenCL

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2012, vol. 72, no. 9, pp. 1073-1082

Author: Strzodka, Robert (Max Planck Inst Informat, D-66123 Saarbrucken, Germany)

Scientific data is mostly multi-valued, e.g., coordinates, velocities, moments or feature components, and it comes in large quantities. The data layout of such containers has an enormous impact on the achieved performance, however, layout optimization is very time-consuming and error-prone because container access syntax in standard programming languages is not sufficiently abstract. This means that changing the data layout of a container necessitates syntax changes in all parts of the code where the container is used. Object oriented languages allow to solve this problem by hiding the data layout behind a class interface. However, the additional coding effort is enormous in comparison to a simple structure. A clever coding pattern, previously presented by the author, significantly reduces the code overhead, however, it relies heavily on advanced C++ features, a language that is not supported on most accelerators. This paper develops a concise macro based solution that requires only support for structures and unions and can therefore be utilized in OpenCL, a widely supported programming language for parallel processors. This enables the development of high performance code without an a-priori commitment to a certain layout and includes the possibility to optimize it subsequently. This feature is used to identify the best data layouts for different processing patterns of multi-valued containers on a multi-GPU system. (C) 2011 Elsevier Inc. All rights reserved.

Keywords: Multi-valued; Multi-component; Data layout; Array of structures; AoS; Structure of arrays; SoA; Array of structures of arrays; ASA; OpenCL; Multi-GPU
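
The abstract above describes hiding a container's data layout behind macros so that kernel code need not change when the layout does. The following is a minimal sketch of that general idea in plain C, not the macros from the paper: the names `Particles`, `AT`, `LAYOUT_SOA`, `fill` and `sum_x` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical container of N particles with components x, y, z.
   Defining LAYOUT_SOA selects structure of arrays (SoA);
   otherwise the container is an array of structures (AoS). */
#define N 8

#ifdef LAYOUT_SOA
typedef struct { float x[N]; float y[N]; float z[N]; } Particles;
#define AT(c, i, field) ((c).field[(i)])
#else
typedef struct { float x, y, z; } Particle;
typedef struct { Particle p[N]; } Particles;
#define AT(c, i, field) ((c).p[(i)].field)
#endif

/* All accesses go through AT(), so this code compiles unchanged
   under either layout. */
static void fill(Particles *c) {
    for (size_t i = 0; i < N; ++i) {
        AT(*c, i, x) = (float)i;
        AT(*c, i, y) = 2.0f * (float)i;
        AT(*c, i, z) = 0.0f;
    }
}

static float sum_x(const Particles *c) {
    float s = 0.0f;
    for (size_t i = 0; i < N; ++i) {
        s += AT(*c, i, x);
    }
    return s;
}
```

Compiling once with and once without `-DLAYOUT_SOA` switches the memory layout while `fill` and `sum_x` stay untouched, which is the property the abstract attributes to a macro-based approach.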

Optimization Strategies for WRF Single-Moment 6-Class Microphysics Scheme (WSM6) on Intel Microarchitectures

5th International Symposium on Computing and Networking (CANDAR)

Authors: Ouermi, T. A. J.; Knoll, Aaron; Kirby, Robert M.; Berzins, Martin (Univ Utah, SCI Inst, Salt Lake City, UT 84112, USA)

ISBN: (print) 9781538620878

Optimizations in the petascale era require modifications of existing codes to take advantage of new architectures with large core counts and SIMD vector units. This paper examines high-level and low-level optimization strategies for numerical weather prediction (NWP) codes. These strategies employ thread-local structures of arrays (SOA) and an OpenMP directive such as OMP SIMD. These optimization approaches are applied to the Weather Research and Forecasting single-moment 6-class microphysics schemes (WSM6) in the US Navy NEPTUNE system. The results of this study indicate that the high-level approach with SOA and low-level OMP SIMD improves thread and vector parallelism by increasing data and temporal locality. The modified version of WSM6 runs 70x faster than the original serial code. This improvement is about 23.3x faster than the performance achieved by Ouermi et al. [1], and 14.9x faster than the performance achieved by Michalakes et al. [2].

Keywords: structure of arrays; OpenMP; Knights Landing; Numerical Weather Prediction; Thread parallelism; Vector parallelism
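
The combination named in the abstract above, a structure of arrays with an OMP SIMD directive, can be sketched in C as follows. This is an illustrative sketch only, not code from WSM6 or NEPTUNE: the type `LocalBlock`, its fields `t` and `q`, the block size and the update formula are all assumptions made for the example.

```c
/* Hypothetical SoA block: one contiguous array per field, so the
   loop below touches each field with unit stride. */
#define BLOCK 16

typedef struct {
    float t[BLOCK];  /* illustrative field, e.g. a temperature */
    float q[BLOCK];  /* illustrative field, e.g. a moisture quantity */
} LocalBlock;

/* The pragma asks the compiler to vectorize the loop. If OpenMP
   support is not enabled at compile time, the pragma is ignored
   and the loop simply runs as ordinary serial code. */
static void update(LocalBlock *b, float dt) {
    #pragma omp simd
    for (int i = 0; i < BLOCK; ++i) {
        b->t[i] += dt * b->q[i];
    }
}
```

In a threaded code, each thread would typically own its own `LocalBlock`, which is one way to read the "thread-local" wording of the abstract.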

Zero-overhead Interfaces for High-performance Computing Libraries and Kernels

25th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)

Authors: Schaefer, Andreas; Fey, Dietmar (Friedrich Alexander Univ Erlangen Nurnberg (FAU), Dept Comp Sci, Erlangen, Germany)

ISBN: (print) 9780769549569; 9781467362184

In this paper we propose a domain-specific language-based approach to reduce the overhead associated with accessing external data from computational kernels. Libraries which aid application developers in parallelizing and optimizing their codes need a way to expose their internal data stores to user code. An efficient interface as well as an optimized data layout are imperative for high application performance. We focus on codes which operate on regular grids and require only local interactions. These stencil-based programs form a class of algorithms found at the heart of many computer simulations and PDE solvers. Many stencil codes are memory bound, meaning that their performance depends heavily on an efficient usage of the computers' memory subsystem. This work's contribution is to give an extensive review of the available implementation alternatives and to put them in context with the state of the art. From this we derive our domain-specific language (DSL) which alleviates many of the shortcomings of previous designs, especially related to the utilization of SIMD units and simplifying the address generation. Simultaneously it provides a natural, object-oriented way of expressing data structures and accesses. We validate our DSL with benchmark results obtained from two kernels: one reverse time migration and one Lattice Boltzmann method.

Keywords: stencil codes; vectorization; structure of arrays; domain-specific language
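
To make the terms in the abstract above concrete, here is a small hand-written stencil over a regular grid stored as a structure of arrays. It is a sketch under stated assumptions and does not reproduce the authors' DSL: the grid size, the `Grid` type with its `density` and `flux` members, the helper `idx` and the averaging formula are all hypothetical.

```c
#include <stddef.h>

#define NX 6
#define NY 4

/* Hypothetical SoA grid: each cell member lives in its own array. */
typedef struct {
    float density[NX * NY];
    float flux[NX * NY];   /* second member, unused by step() below */
} Grid;

/* Row-major address generation, kept in one place. */
static size_t idx(size_t x, size_t y) {
    return y * NX + x;
}

/* 5-point stencil on interior cells: each output cell reads only
   itself and its four nearest neighbours (local interactions). */
static void step(const Grid *in, Grid *out) {
    for (size_t y = 1; y + 1 < NY; ++y) {
        for (size_t x = 1; x + 1 < NX; ++x) {
            out->density[idx(x, y)] = 0.2f * (in->density[idx(x, y)]
                + in->density[idx(x - 1, y)] + in->density[idx(x + 1, y)]
                + in->density[idx(x, y - 1)] + in->density[idx(x, y + 1)]);
        }
    }
}
```

Every cell update here performs five loads and one store for a handful of arithmetic operations, which illustrates why such kernels tend to be limited by memory traffic rather than by computation.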

Zero-overhead Interfaces for High-performance Computing Libraries and Kernels

Parallel Data Storage Workshop

Authors: Andreas Schafer; Dietmar Fey (Department of Computer Science, Friedrich-Alexander-Universitat Erlangen-Nurnberg (FAU))

ISBN: (print) 9781467362184

Keywords: Stencil codes; Vectorization; structure of arrays; Domain-specific language

Children's conceptions on the structure of an array: Using quick images as a gateway to multiplicative ideas

JOURNAL OF MATHEMATICAL BEHAVIOR, 2023, vol. 69

Authors: Bajwa, Neet Priya; Tobias, Jennifer M.; Lawton, Carrie (Illinois State Univ, Campus Box 4520, Normal, IL 61790, USA)

In this paper we report the conceptions about arrays that came to the fore as one class of second-grade students participated in whole classroom discussions and activities focused on the structure of arrays presented as a Quick Images routine. Before the intervention, students were not introduced to formal multiplication but had completed a unit on arrays. A constant comparative method was used to identify numeric and spatial structuring strategies that allowed for students' conceptions about the structure of the array to emerge. Results indicated that not all students automatically use arrays as a composite of rows. We found that the use of Quick Images with larger arrays and non-arrays within the whole classroom discussion was successful at eliciting and directing students' attention towards the spatial features of an array, including seeing an array as made of a composite of rows (or columns).

Keywords: arrays; Elementary education; Multiplicative ideas; Quick Images; structure of arrays
