GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerance mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based, compiler-directed partial recomputing method that achieves efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method, which recovers from errors detected in a code region by partially recomputing the region, describe a checkpoint-based fault-tolerance framework built on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC, on average by 73.5% when errors occur early in execution and by 74.6% when they occur later. In addition, PartialRC reduces the error detection overheads incurred by FullRC during fault recovery, while incurring negligible performance overhead when no fault occurs.
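The difference between partial and full recomputation can be illustrated with a small sketch. It is not the PartialRC implementation: it simulates a data-parallel code region in Python, assumes the region's threads are grouped into independent blocks (so a faulty block can be recomputed from the checkpoint on its own), and uses duplicate execution as a hypothetical error detector. `BLOCK_SIZE`, `region`, `faulty_blocks` and `run_with_recovery` are illustrative names, and the compiler analysis PartialRC uses to decide what to recompute is not modeled.

```python
BLOCK_SIZE = 4  # hypothetical number of threads per block


def region(data, threads, faulty=frozenset()):
    """Simulated data-parallel code region: thread i updates data[i] in place,
    independently of all other threads. Threads listed in `faulty` suffer an
    injected transient bit flip. Returns the number of thread executions."""
    for i in threads:
        value = data[i] * data[i] + 1
        if i in faulty:
            value ^= 1 << 4  # illustrative single-bit upset
        data[i] = value
    return len(threads)


def blocks_of(n):
    """Split thread indices 0..n-1 into consecutive blocks of BLOCK_SIZE."""
    return [range(b, min(b + BLOCK_SIZE, n)) for b in range(0, n, BLOCK_SIZE)]


def faulty_blocks(primary, shadow):
    """Hypothetical duplicate-execution check: a block is flagged as faulty if
    any of its outputs disagree with the redundant shadow copy."""
    return [blk for blk in blocks_of(len(primary))
            if any(primary[i] != shadow[i] for i in blk)]


def run_with_recovery(initial, faulty_threads, partial):
    """Run the region once with detection and recovery.
    Returns (final data, number of thread executions spent on recovery)."""
    n = len(initial)
    checkpoint = list(initial)  # checkpoint taken on entry to the region
    primary, shadow = list(initial), list(initial)
    region(primary, range(n), faulty_threads)
    region(shadow, range(n))  # fault-free redundant run, used only for detection
    bad = faulty_blocks(primary, shadow)
    if not bad:
        return primary, 0
    if partial:
        targets = [i for blk in bad for i in blk]  # partial: only the faulty blocks
    else:
        targets = list(range(n))  # full: the whole region
    for i in targets:
        primary[i] = checkpoint[i]  # roll back from the checkpoint
    return primary, region(primary, targets)


data = list(range(32))
expected = [x * x + 1 for x in data]
full_out, full_cost = run_with_recovery(data, {5}, partial=False)
part_out, part_cost = run_with_recovery(data, {5}, partial=True)
print(full_cost, part_cost)  # 32 4
```

With a single injected fault, both strategies restore the fault-free result, but the full rollback re-executes all 32 threads while the partial one re-executes only the 4 threads of the affected block.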
Machine learning is widely used in many intelligent cybernetic systems. As the AI community grows, the number of machine learning-based models is rapidly increasing, but picking a suitable and opti...
The trimming power of on-chip optical networks consisting of 105 microrings is simulated. The total trimming power does not exceed 14 W from 20°C to 100°C if the distribution of these rings is optimized. ...
Innovations in powerful high-performance computing (HPC) architectures are enabling high-fidelity whole-core neutron transport simulations in a reasonable time. In particular, the currently popular heterogeneous archite...
Monte Carlo (MC) simulation plays a key role in radiotherapy. Because the simulation time of the MC program does not fully meet clinical requirements, we use the ARM-based FT-2000+ multi-core processor for paralleliza...
This paper reports our experience optimizing the performance of a high-order, high-accuracy Computational Fluid Dynamics (CFD) application (HOSTA) on a state-of-the-art multicore processor and the emerging Intel Many...
Deep neural networks (DNNs) have recently shown great potential in solving partial differential equations (PDEs). The success of neural network-based surrogate models is attributed to their ability to learn a rich set of solution-related features. However, learning DNNs usually involves tedious training iterations to converge and requires a very large amount of training data, which hinders the application of these models to complex physical systems. To address this problem, we propose applying the transfer learning approach to DNN-based PDE solving tasks. In our work, we create pairs of transfer experiments on the Helmholtz and Navier-Stokes equations by constructing subtasks with different source terms and Reynolds numbers. We also conduct a series of experiments to investigate the degree of generality of the features between different tasks. The results demonstrate that, despite differences in the underlying PDE systems, the transfer methodology can significantly improve the accuracy of the predicted solutions, achieving a maximum performance boost of 97.3% on widely used surrogate models.
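The parameter-reuse idea behind this kind of transfer can be sketched with a deliberately small stand-in. Instead of a DNN, the hypothetical surrogate below is a linear combination of sine modes for the 1D Helmholtz problem -u'' - K^2 u = f on [0, 1] with zero boundary values. It is trained by gradient descent on one source term, and its weights then initialize training on a different source term. The wavenumber `K`, grid size, mode count, learning rate, step counts and source-term coefficients are all illustrative assumptions, not settings from the paper, and this warm start only stands in for the paper's transfer methodology rather than reproducing it.

```python
import math

# Illustrative assumptions (not from the paper): grid, modes, wavenumber, optimizer settings.
N = 16      # interior grid points x_j = j/N, j = 1..N-1
MODES = 4   # sine modes in the hypothetical linear surrogate
K = 3.0     # Helmholtz wavenumber
LR = 0.05   # gradient-descent step size
XS = [j / N for j in range(1, N)]


def features(x):
    """Sine basis; every mode satisfies u(0) = u(1) = 0."""
    return [math.sin((p + 1) * math.pi * x) for p in range(MODES)]


def exact_solution(f_coeffs, x):
    """Solves -u'' - K^2 u = f on [0, 1] with u(0) = u(1) = 0,
    where f(x) = sum_p f_coeffs[p] * sin((p + 1) * pi * x)."""
    return sum(c * math.sin((p + 1) * math.pi * x) / (((p + 1) * math.pi) ** 2 - K ** 2)
               for p, c in enumerate(f_coeffs))


def make_task(f_coeffs):
    """Training pairs (x, u(x)) for one source term."""
    return [(x, exact_solution(f_coeffs, x)) for x in XS]


def predict(w, x):
    return sum(wi * phi for wi, phi in zip(w, features(x)))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(w, data, steps):
    """Full-batch gradient descent on the mean squared error."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * MODES
        for x, y in data:
            err = predict(w, x) - y
            for p, phi in enumerate(features(x)):
                grad[p] += 2.0 * err * phi / len(data)
        w = [wi - LR * g for wi, g in zip(w, grad)]
    return w


source = make_task([1.0, 0.5, 0.25, 0.0])  # subtask with one source term
target = make_task([1.0, 0.6, 0.25, 0.1])  # related subtask with a different source term

w_source = train([0.0] * MODES, source, steps=200)  # pretraining on the source task
cold = train([0.0] * MODES, target, steps=20)       # target trained from scratch
warm = train(w_source, target, steps=20)            # target initialized from source weights

print(f"target MSE after 20 steps: cold={mse(cold, target):.2e}, warm={mse(warm, target):.2e}")
```

Because the two source terms differ in only a few coefficients, the warm-started weights begin close to the target solution, so after the same 20 steps the warm-started surrogate's error is far lower than the cold-started one. For unrelated source terms, this advantage would shrink or disappear.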
The arrival of the big data era brings new challenges to massive data processing. Combining the GPU and CPU on a single chip is a growing trend for relieving the pressure of large-scale computing. We found that there are diffe...
Unlike the Emotion Cause Extraction (ECE) task, which provides pre-annotated emotions along with the passage, emotion-cause pair extraction (ECPE) aims to extract potential emotions and their corresponding causes from the document witho...
Cloud providers offering Infrastructure as a Service (IaaS) now require datacenter networks to support virtualization and multi-tenancy at large scale, which poses a major challenge for datacenters. Traditional ...