Scientific data is mostly multi-valued, e.g., coordinates, velocities, moments, or feature components, and it comes in large quantities. The data layout of such containers has an enormous impact on the achieved performance. However, layout optimization is very time-consuming and error-prone because container access syntax in standard programming languages is not sufficiently abstract: changing the data layout of a container necessitates syntax changes in every part of the code where the container is used. Object-oriented languages solve this problem by hiding the data layout behind a class interface, but the additional coding effort is enormous in comparison to a simple structure. A clever coding pattern, previously presented by the author, significantly reduces the code overhead, but it relies heavily on advanced features of C++, a language that is not supported on most accelerators. This paper develops a concise macro-based solution that requires only support for structures and unions and can therefore be used in OpenCL, a widely supported programming language for parallel processors. This enables the development of high-performance code without an a priori commitment to a certain layout and leaves the layout open to later optimization. This feature is used to identify the best data layouts for different processing patterns of multi-valued containers on a multi-CPU system. (C) 2011 Elsevier Inc. All rights reserved.
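The idea of hiding the layout behind an access macro can be illustrated with a minimal C sketch. This is not the macro set developed in the paper: the names `Particles`, `AT`, `LAYOUT_SOA`, `translate_x`, and `sum_x` are hypothetical, and the sketch uses only structures, whereas the abstract also mentions unions. It shows only that a kernel written against an access macro need not change when the container switches between array-of-structures and structure-of-arrays.

```c
#include <assert.h>
#include <stddef.h>

#define N_PARTICLES 4

/* Define LAYOUT_SOA before this point to switch layouts;
   the default is array-of-structures (AoS). */
#ifdef LAYOUT_SOA
typedef struct {
    float x[N_PARTICLES];
    float y[N_PARTICLES];
    float z[N_PARTICLES];
} Particles;
/* SoA: pick the component array first, then the element. */
#define AT(c, i, field) ((c).field[(i)])
#else
typedef struct { float x, y, z; } Particle;
typedef struct { Particle p[N_PARTICLES]; } Particles;
/* AoS: pick the element first, then the component. */
#define AT(c, i, field) ((c).p[(i)].field)
#endif

/* Layout-agnostic kernel: shifts every x coordinate by dx. */
static void translate_x(Particles *ps, float dx)
{
    assert(ps != NULL);
    for (size_t i = 0; i < N_PARTICLES; ++i)
        AT(*ps, i, x) += dx;
}

/* Layout-agnostic reduction over the x component. */
static float sum_x(const Particles *ps)
{
    assert(ps != NULL);
    float s = 0.0f;
    for (size_t i = 0; i < N_PARTICLES; ++i)
        s += AT(*ps, i, x);
    return s;
}
```

Compiling the same source with `-DLAYOUT_SOA` changes the memory layout while `translate_x` and `sum_x` stay untouched, which is the property the abstract relies on when comparing layouts.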
ISBN: 9781538620878 (print)
Optimizations in the petascale era require modifications of existing codes to take advantage of new architectures with large core counts and SIMD vector units. This paper examines high-level and low-level optimization strategies for numerical weather prediction (NWP) codes. These strategies employ thread-local structures of arrays (SOA) and OpenMP directives such as OMP SIMD. The optimization approaches are applied to the Weather Research and Forecasting single-moment 6-class microphysics scheme (WSM6) in the US Navy NEPTUNE system. The results of this study indicate that the high-level approach with SOA and the low-level approach with OMP SIMD improve thread and vector parallelism by increasing data and temporal locality. The modified version of WSM6 runs 70x faster than the original serial code. This is about 23.3x faster than the performance achieved by Ouermi et al. [1] and 14.9x faster than the performance achieved by Michalakes et al. [2].
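A minimal C sketch of the two ingredients named in the abstract, thread-local structures of arrays and an `omp simd` loop, is given below. It is not the WSM6 code: the chunk width `CHUNK`, the fields `t` and `q`, the update `t += a * q`, and the function names are all assumptions made for illustration. Without OpenMP enabled at compile time the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 8  /* hypothetical chunk width, chosen for illustration */

/* Structure of arrays for one chunk; each field is contiguous. */
typedef struct {
    float t[CHUNK];
    float q[CHUNK];
} ChunkSoA;

/* Low-level part: unit-stride loop over contiguous fields,
   marked for vectorization. */
static void update_chunk(ChunkSoA *c, int len, float a)
{
    assert(c != NULL && len >= 0 && len <= CHUNK);
    #pragma omp simd
    for (int i = 0; i < len; ++i)
        c->t[i] += a * c->q[i];
}

/* High-level part: each loop iteration works on its own local chunk,
   so threads do not share the working set. */
static void update_all(float *t, const float *q, int n, float a)
{
    assert(t != NULL && q != NULL && n >= 0);
    #pragma omp parallel for
    for (int start = 0; start < n; start += CHUNK) {
        ChunkSoA local;  /* private to the thread running this iteration */
        int len = (n - start < CHUNK) ? (n - start) : CHUNK;
        for (int i = 0; i < len; ++i) {   /* gather into the local chunk */
            local.t[i] = t[start + i];
            local.q[i] = q[start + i];
        }
        update_chunk(&local, len, a);
        for (int i = 0; i < len; ++i)     /* write the result back */
            t[start + i] = local.t[i];
    }
}
```

The copy into `local` costs extra memory traffic, so in this toy kernel it would not pay off; the pattern is meant for cases where many operations reuse the chunk while it stays in cache.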
ISBN: 9780769549569; 9781467362184 (print)
In this paper we propose a domain-specific language-based approach to reduce the overhead associated with accessing external data from computational kernels. Libraries which aid application developers in parallelizing and optimizing their codes need a way to expose their internal data stores to user code. An efficient interface and an optimized data layout are both imperative for high application performance. We focus on codes which operate on regular grids and require only local interactions. These stencil-based programs form a class of algorithms found at the heart of many computer simulations and PDE solvers. Many stencil codes are memory bound, meaning that their performance depends heavily on efficient use of the computer's memory subsystem. This work's contribution is to give an extensive review of the available implementation alternatives and to put them in context with the state of the art. From this we derive our domain-specific language (DSL), which alleviates many of the shortcomings of previous designs, especially those related to the utilization of SIMD units and to simplifying address generation. At the same time it provides a natural, object-oriented way of expressing data structures and accesses. We validate our DSL with benchmark results obtained from two kernels: a reverse time migration kernel and a lattice Boltzmann method kernel.
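To make the notion of a neighbor-access interface for stencil codes concrete, here is a minimal C sketch. It is a plain accessor function, not the DSL the abstract describes; `Grid2D`, `at`, and `jacobi_step` are hypothetical names, and the four-point averaging stencil is chosen only as a familiar example. The kernel names neighbors by relative offset, and all address arithmetic lives in one place.

```c
#include <assert.h>
#include <stddef.h>

/* Read-only view of a row-major 2D grid. */
typedef struct {
    const float *data;
    int nx, ny;
} Grid2D;

/* Hypothetical accessor: value at cell (x, y) shifted by (dx, dy).
   The caller must keep the shifted coordinates inside the grid. */
static inline float at(const Grid2D *g, int x, int y, int dx, int dy)
{
    return g->data[(size_t)(y + dy) * (size_t)g->nx + (size_t)(x + dx)];
}

/* One averaging sweep over the interior cells; boundary cells of
   out are left untouched. */
static void jacobi_step(const Grid2D *in, float *out)
{
    assert(in != NULL && out != NULL && in->nx >= 3 && in->ny >= 3);
    for (int y = 1; y < in->ny - 1; ++y)
        for (int x = 1; x < in->nx - 1; ++x)
            out[(size_t)y * (size_t)in->nx + (size_t)x] = 0.25f *
                (at(in, x, y, -1, 0) + at(in, x, y, 1, 0) +
                 at(in, x, y, 0, -1) + at(in, x, y, 0, 1));
}
```

Because the kernel never computes an index itself, the storage order behind `at` could be changed without editing `jacobi_step`.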
In this paper we report the conceptions about arrays that came to the fore as one class of second-grade students participated in whole-classroom discussions and activities focused on the structure of arrays presented as a Quick Images routine. Before the intervention, students had not been introduced to formal multiplication but had completed a unit on arrays. A constant comparative method was used to identify numeric and spatial structuring strategies that allowed students' conceptions about the structure of the array to emerge. Results indicated that not all students automatically use arrays as a composite of rows. We found that the use of Quick Images with larger arrays and non-arrays within the whole-classroom discussion was successful at eliciting and directing students' attention towards the spatial features of an array, including seeing an array as a composite of rows (or columns).