ISBN: 9781538655559 (print)
Improving the performance of stencil computations is a long-standing optimization challenge due to their inherently heavy memory-access patterns. This problem has been explored in many wave-propagation simulation engines. Moving towards implementations with elastic waves instead of acoustic ones (e.g., as used in medical imaging) results in computationally more expensive processes along with increased memory usage. Despite the computational demand, the elevated cost of exploration combined with the need for higher success rates is driving the oil & gas industry to adopt elastic anisotropic wave-propagation models as the core of many geophysical imaging mechanisms to extract subsurface features more accurately, increasing return on investment. To reduce time-to-solution, these more complex stencil codes must run efficiently on modern CPU architectures. Intel Xeon Phi processors emerge as an energy-efficient solution that provides a good trade-off between market price and computing capability. In this paper, we study the effect of several optimization techniques using the YASK stencil-generation framework to implement and evaluate a 25-point stencil of an elastic-wave propagation engine for Intel Xeon Phi processors. The results showed improvements of up to 7x in computation and 8x in memory bandwidth with respect to the non-tuned version, reaching up to 75% of the attainable floating-point performance at the given operational intensity. We collected performance metrics for a set of the most representative optimizations and revealed the relation between each strategy and fundamental characteristics of both code and hardware.
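A minimal sketch of the kind of loop nest such a stencil engine generates may help fix ideas: the code below implements a generic 25-point (radius-4, 3-D star) stencil sweep with simple cache blocking and a vectorization hint, two staple optimizations that frameworks like YASK apply. The coefficients, grid sizes, and block sizes are placeholders chosen for illustration, not the paper's elastic kernel or its tuned parameters.

```cpp
// Sketch (assumed parameters): 25-point, radius-4 star stencil in 3-D with
// cache blocking, roughly the loop structure a stencil generator emits
// before further tuning. Not the paper's elastic kernel.
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int R = 4;                                   // radius 4 -> 1 + 6*4 = 25 points
constexpr std::size_t NX = 128, NY = 128, NZ = 128;    // placeholder grid size
constexpr std::size_t BX = 32, BY = 32, BZ = 64;       // placeholder cache-block sizes

inline std::size_t idx(std::size_t i, std::size_t j, std::size_t k) {
    return (i * NY + j) * NZ + k;                      // k is the unit-stride dimension
}

void stencil_25pt(const std::vector<float>& in, std::vector<float>& out,
                  const float c[R + 1]) {
    // Outer loops walk over cache blocks so each block's working set stays in cache.
    for (std::size_t ib = R; ib < NX - R; ib += BX)
    for (std::size_t jb = R; jb < NY - R; jb += BY)
    for (std::size_t kb = R; kb < NZ - R; kb += BZ) {
        const std::size_t ie = std::min(ib + BX, NX - R);
        const std::size_t je = std::min(jb + BY, NY - R);
        const std::size_t ke = std::min(kb + BZ, NZ - R);
        for (std::size_t i = ib; i < ie; ++i)
        for (std::size_t j = jb; j < je; ++j) {
            #pragma omp simd                           // vectorize the unit-stride loop
            for (std::size_t k = kb; k < ke; ++k) {
                float acc = c[0] * in[idx(i, j, k)];
                for (int r = 1; r <= R; ++r)           // +/- r points along each axis
                    acc += c[r] * (in[idx(i + r, j, k)] + in[idx(i - r, j, k)] +
                                   in[idx(i, j + r, k)] + in[idx(i, j - r, k)] +
                                   in[idx(i, j, k + r)] + in[idx(i, j, k - r)]);
                out[idx(i, j, k)] = acc;
            }
        }
    }
}

int main() {
    std::vector<float> in(NX * NY * NZ, 1.0f), out(NX * NY * NZ, 0.0f);
    const float c[R + 1] = {-2.5f, 1.6f, -0.2f, 0.025f, -0.0018f};  // placeholder taps
    stencil_25pt(in, out, c);
    return 0;
}
```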
ISBN: 9781509042982 (print)
Large-graph analytics has been an important aspect of many big data applications, such as web search, social networks, and recommendation systems. Much research over the past few years has focused on processing large-scale graphs using distributed systems, and a number of studies have turned to building graph processing systems on a single server-class machine for reasons of cost, usability, and maintainability. HPGraph is a highly parallel graph processing system that adopts the edge-centric model. Our contributions are as follows: (1) designing an efficient data allocation and access strategy for NUMA machines, together with task scheduling to keep the load balanced; (2) proposing a fine-grained edge-block filtering mechanism to avoid accessing unnecessary edge data; (3) constructing a high-speed flash array as secondary storage. We made a detailed evaluation on a 16-core machine using a set of popular real-world and synthetic data sets, and the results show that HPGraph consistently outperforms the state-of-the-art single-machine graph processing system GridGraph; for specific applications, HPGraph is 1.27X faster than GridGraph. Our source code is available at https://***/xinghuan1990/HPGraph.
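The edge-block filtering idea can be illustrated with a small sketch: edges are grouped into blocks annotated with the range of source vertices they cover, and a block is streamed only when that range may contain an active vertex. The data layout, the value-propagation update, and all names below are illustrative assumptions, not HPGraph's actual implementation (a real system would typically keep a compact per-block summary rather than scanning the vertex range).

```cpp
// Sketch (assumed layout and names): edge-centric scatter with coarse
// edge-block filtering -- a block of edges is streamed only if its source
// range may contain an active vertex.
#include <cstdint>
#include <vector>

struct Edge { std::uint32_t src, dst; };

struct EdgeBlock {
    std::vector<Edge> edges;             // edges stored contiguously per block
    std::uint32_t min_src, max_src;      // source-vertex range covered by the block
};

void scatter_pass(const std::vector<EdgeBlock>& blocks,
                  const std::vector<std::uint8_t>& active,   // 1 if vertex is active
                  const std::vector<float>& value,
                  std::vector<float>& next_value) {
    for (const EdgeBlock& b : blocks) {
        bool touched = false;            // coarse, block-level filter
        for (std::uint32_t v = b.min_src; v <= b.max_src && !touched; ++v)
            touched = (active[v] != 0);
        if (!touched) continue;          // skip the block's edge data entirely
        for (const Edge& e : b.edges)    // fine-grained per-edge work
            if (active[e.src])
                next_value[e.dst] += value[e.src];
    }
}

int main() {
    std::vector<EdgeBlock> blocks = {
        {{{0, 1}, {1, 2}}, 0, 1},        // block covering source vertices 0..1
        {{{2, 3}, {3, 0}}, 2, 3},        // block covering source vertices 2..3
    };
    std::vector<std::uint8_t> active = {1, 0, 0, 0};          // only vertex 0 is active
    std::vector<float> value(4, 1.0f), next_value(4, 0.0f);
    scatter_pass(blocks, active, value, next_value);          // second block is skipped
    return 0;
}
```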
The main aim of this work is to show how GPGPUs can facilitate certain types of image-processing methods. The software used in this paper detects a particular tissue component, the nuclei, on HE (hematoxylin-eosin) stained colon tissue sample images. Since pathologists work with a large number of high-resolution images, which require significant storage space, one feasible way to achieve reasonable processing times is the use of GPGPUs. The CUDA software development kit was used to develop processing algorithms for NVIDIA GPUs. Our work focuses on how to achieve better performance with coalesced global memory access when working with three-channel RGB tissue images, and how to use the on-die shared memory efficiently.
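The coalescing issue with three-channel images comes down to memory layout: with interleaved RGB storage, neighbouring threads reading one channel touch addresses three bytes apart, whereas a planar (channel-separated) layout makes those per-channel accesses contiguous and therefore coalescable. The following host-side C++ sketch only illustrates that layout transformation under assumed names; it is not the paper's CUDA kernel.

```cpp
// Sketch (assumed names): interleaved RGB (R,G,B,R,G,B,...) converted to a
// planar layout so that per-channel accesses by neighbouring pixels are
// contiguous -- the pattern a GPU can coalesce. Host-side illustration only.
#include <cstddef>
#include <cstdint>
#include <vector>

struct PlanarImage {
    std::size_t width = 0, height = 0;
    std::vector<std::uint8_t> r, g, b;   // one contiguous plane per channel
};

PlanarImage to_planar(const std::vector<std::uint8_t>& interleaved,
                      std::size_t width, std::size_t height) {
    PlanarImage out;
    out.width = width;
    out.height = height;
    const std::size_t n = width * height;
    out.r.resize(n);
    out.g.resize(n);
    out.b.resize(n);
    for (std::size_t p = 0; p < n; ++p) {        // p = pixel index
        out.r[p] = interleaved[3 * p + 0];
        out.g[p] = interleaved[3 * p + 1];
        out.b[p] = interleaved[3 * p + 2];
    }
    return out;
}

int main() {
    const std::size_t w = 4, h = 2;
    std::vector<std::uint8_t> rgb(3 * w * h, 127);   // dummy interleaved image
    PlanarImage planar = to_planar(rgb, w, h);
    return planar.r.size() == w * h ? 0 : 1;
}
```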
ISBN: 9783540771289 (print)
This paper presents a dynamic heart model based on a parallelized space-time adaptive mesh refinement algorithm (AMRA). The spatial and temporal simulation method for the anisotropic excitable media has to achieve high performance in a distributed-processing environment. The accuracy and efficiency of the algorithm were tested for anisotropic and inhomogeneous 3D domains using ten Tusscher's and Nygren's cardiac cell models. During propagation of the depolarization wave, kinetic, compositional, and rotational anisotropy is included in the tissue, organ, and torso models. The inverse ECG generated with the conventional and the parallelized algorithm has the same quality, but a speedup of a factor of 200 can be reached using AMRA modeling and single-instruction-multiple-data (SIMD) programming of video cards. These results suggest that a powerful personal computer will be able to perform a one-second-long simulation of the spatial electrical dynamics of the heart in approximately five minutes.
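The space-adaptive part of such a scheme can be sketched in a few lines: resolution is kept fine only where the transmembrane potential changes steeply (the depolarization wavefront) and coarsened in quiescent tissue. The 1-D grid, thresholds, and cell structure below are illustrative assumptions, not the paper's 3-D anisotropic heart model.

```cpp
// Sketch (assumed 1-D grid and thresholds): gradient-driven refinement marks
// cells near the steep wavefront for finer resolution and coarsens cells in
// quiescent tissue.
#include <cmath>
#include <cstddef>
#include <vector>

struct Cell {
    double v;       // transmembrane potential in this cell
    int level;      // refinement level (0 = coarsest)
};

void adapt(std::vector<Cell>& cells, double refine_thresh,
           double coarsen_thresh, int max_level) {
    for (std::size_t i = 1; i + 1 < cells.size(); ++i) {
        const double grad = std::abs(cells[i + 1].v - cells[i - 1].v) * 0.5;
        if (grad > refine_thresh && cells[i].level < max_level)
            ++cells[i].level;                    // near the wavefront: refine
        else if (grad < coarsen_thresh && cells[i].level > 0)
            --cells[i].level;                    // quiescent region: coarsen
    }
}

int main() {
    std::vector<Cell> cells(100, Cell{0.0, 0});
    for (std::size_t i = 50; i < cells.size(); ++i)
        cells[i].v = 1.0;                        // a sharp front at i = 50
    adapt(cells, 0.1, 0.01, 5);
    return 0;
}
```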