ISBN: (print) 9781450326551
Manycore accelerators can significantly improve the performance of scientific applications when compute-intensive portions of programs are offloaded to them. Directive-based programming models such as OpenACC and OpenMP are high-level programming models that let users create applications for accelerators by annotating the code regions to be offloaded with directives. In these programming models, most offloaded kernels are data-parallel loops that process one or more multi-dimensional arrays, and scalar variables are often used in the parallel loop body for reduction operations. Because a reduction carries a loop-carried dependency that prevents the loop from being parallelized, it can have a significant impact on performance if not handled properly. In this paper, we present the design and parallelization of reduction operations in parallel loops for GPGPU accelerators. Using OpenACC as the high-level directive-based programming model, we discuss how reduction operations are parallelized when they appear at each level of the loop nest and thread hierarchy. We present how we map the loops and the parallelized reductions to single- or multiple-level parallelism of GPGPU architectures. These algorithms have been implemented in OpenUH, an open-source OpenACC compiler. We compare our implementation with two commercial OpenACC compilers using test cases and applications, and demonstrate better robustness and competitive performance. Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features: Reduction. General Terms: Design, Algorithms, Measurement.
ISBN: (print) 9781479942770
The proceedings contain 29 papers. The topics discussed include: distributed cooperative Q-learning for mobility-sensitive handover optimization in LTE SON; topology selection criteria for a virtual topology controller based on neural memories; GP-m: mobile middleware infrastructure for ambient assisted living; formal modeling and checking of an enhanced variant of the IEEE 802.11 CSMA/CA protocol; measuring the Internet's threat level: a global-local approach; decomposition of memory consumption footprints to identify problematic threads; monitoring applications and services to improve the Cloud Foundry PaaS; an efficient MAC-signature scheme for authentication in XOR network coding; near-clouds: bringing public clouds to users' doorsteps; programmable mobile core network; a slot assignment for wireless body area networks; modeling, optimization and performance prediction of parallel algorithms; and automating the Hadoop configuration for easy setup in resilient cloud systems.
Astrophysical databases have used proprietary formats (especially the FITS format) to represent measured data and related metadata. The design of the FITS format was influenced by punch cards, thus it is extremely ina...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Triangle counting in a graph is a building block for computing clustering coefficients, a widely used social network analytic for finding key players in a network based on their local connectivity. In this paper we show...
ISBN: (print) 9781450329507
Page-based memory management (paging) is used by most current operating systems (OSs) because of its rich features, such as prevention of memory fragmentation and fine-grained access control. Page-based virtual memory, however, stores virtual-to-physical mappings in page tables that also reside in main memory. Because translating a virtual address to a physical one requires walking the page tables, which implies additional memory accesses, modern CPUs employ translation lookaside buffers (TLBs) to cache the mappings. Nevertheless, TLBs are limited in size, and applications that consume a large amount of memory and exhibit little or no locality in their memory access patterns, such as graph algorithms, suffer from the high overhead of TLB misses. This paper proposes a new hybrid kernel design targeting many-core CPUs, which manages the application's memory space by segmentation and offloads kernel services to dedicated CPU cores where paging is used. The method enables applications to run on top of low-cost segmented memory management while allowing the kernel to use the rich features of paging. We present the design and implementation of our kernel and demonstrate that segmentation can provide superior performance compared to both regular and large-page-based virtual memory. For example, running Graph500 on top of our segmentation design on Intel's Xeon Phi chip can yield up to 81% and 9% improvement compared to using 4 kB and 2 MB pages in MPSS Linux, respectively.
The proceedings contain 8 papers. The topics discussed include: 2nd MDHPCL: model-driven engineering for high performance and cloud computing; towards a solution avoiding vendor lock-in to enable migration between cloud platforms; modeling cloud architectures as interactive systems; vehicleFORGE: a cloud-based infrastructure for collaborative model-based design; a model-driven approach for price/performance tradeoffs in cloud-based MapReduce application deployment; towards domain-specific testing languages for software-as-a-service; architecture framework for mapping parallel algorithms to parallel computing platforms; and model-driven transformations for mapping parallel algorithms on parallel computing platforms.
ISBN: (print) 9781450325035
The proceedings contain 12 papers. The topics discussed include: a novel finite element method assembler for co-processors and accelerators; the energy case for graph processing on hybrid CPU and GPU systems; a synthetic task model for HPC-grade optical network performance evaluation; maximizing the performance of irregular applications on multithreaded, NUMA; analysis of computing and energy performance of multicore, NUMA, and manycore platforms for an irregular application; in-memory data compression for sparse matrices; on the GPU performance of cell-centered finite volume method over unstructured tetrahedral meshes; nonzero pattern analysis and memory access optimization in GPU-based sparse LU factorization for circuit simulation; register level sort algorithm on multi-core SIMD processors; parallel sparse FFT; an AMR computation and communication dependency and analysis methodology; and parallel implementations of ensemble data assimilation for atmospheric prediction.
Aggregate Risk Analysis is a computationally intensive and data-intensive problem, which makes the application of high-performance computing techniques interesting. In this paper, the design and implementation of...