The emergence of general-purpose GPUs (GPGPUs) has made heterogeneous computing practical and has reshaped general-purpose and parallel computing on GPUs. Heterogeneous systems have been adopted by large-scale high-performance computers. Fault tolerance is necessary for scientific computing at this scale, but in the few years since GPGPUs and heterogeneous systems appeared, no effective fault tolerance method has emerged for them. This paper therefore applies a traditional fault tolerance technique, application-level checkpointing, to heterogeneous systems. Because the main way to reduce the overhead of application-level checkpointing is to reduce the checkpoint data size, we analyze heterogeneous systems and GPGPU programs, propose a method to optimize the data storage of application-level checkpointing, and validate the optimization experimentally.
With the spread of multi-core processors, transactional memory, a concurrency control mechanism that is easy to program and highly scalable, has attracted increasing attention. As a result, the reliability of transactional memory has become a concern. This paper presents a transactional implementation of the LU benchmark from SPLASH-2 and proposes a fault-tolerant LU algorithm based on this transactionalized version. The fault-tolerant LU uses the data-versioning mechanism of the transactional memory system, detects errors at transaction granularity, and recovers by rolling back the faulty transaction. Experiments show that the fault-tolerant LU achieves better fault tolerance at a smaller cost.
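The detect-and-roll-back idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `VersionedStore` class, the checksum-style check, and the injected fault are all hypothetical, and a deep-copied snapshot stands in for the data versioning that a real transactional memory system provides.

```python
import copy


class VersionedStore:
    """Toy store that keeps the old version of its data until a transaction commits."""

    def __init__(self, data):
        self.data = dict(data)

    def run_transaction(self, body, check, max_retries=3):
        """Run body(data, attempt); if check(data) fails, restore the old version and retry."""
        for attempt in range(1, max_retries + 1):
            snapshot = copy.deepcopy(self.data)  # old version, kept until commit
            body(self.data, attempt)
            if check(self.data):
                return attempt                   # commit: the snapshot is dropped
            self.data = snapshot                 # rollback: discard the faulty update
        raise RuntimeError("transaction failed after retries")


def scale_row(data, attempt):
    """Hypothetical transaction body: double every element of one row."""
    data["row"] = [2 * x for x in data["row"]]
    if attempt == 1:
        data["row"][0] += 1  # simulated transient fault on the first attempt only


def checksum_ok(data):
    """Hypothetical error detector: the doubled row must sum to twice the original sum."""
    return sum(data["row"]) == 2 * data["expected_sum"]
```

Calling `VersionedStore({"row": [1, 2, 3], "expected_sum": 6}).run_transaction(scale_row, checksum_ok)` detects the corrupted first attempt, rolls it back, and commits on the second attempt.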
In this paper, we consider a novel anycast-based integrated routing protocol (AIRP) to reduce the delay cost of communication in multihop WSNs. Without tight time synchronization or known geographic info...
ISBN: 9781479909735 (print)
The continued increase in fault rates in integrated circuits makes processors, especially many-core processors, more susceptible to errors. Meanwhile, most systems and applications do not need full fault coverage, which incurs excessive overhead, so on-demand fault tolerance is desirable. In this paper, we propose an adaptive low-overhead fault tolerance mechanism for many-core systems, called Device View Redundancy (DVR). It treats fault tolerance as a device that an application can configure and use when high reliability is needed. In addition, DVR exploits idle resources for low-overhead fault tolerance, based on the observation that the utilization of many-core systems is low because applications lack parallelism. Experiments show that DVR reduces performance overhead by 16% to 98% compared with full Dual Modular Redundancy (DMR).
ISBN: 9781467356596 (print)
Data skew is an important cause of stragglers in MapReduce-like cloud systems. In this paper, we propose a Skew-Aware Task Scheduling (SATS) mechanism for iterative applications in MapReduce-like systems. The mechanism exploits the similarity of data distributions in adjacent iterations to mitigate the straggler problem caused by data skew. It collects data distribution information while the tasks of the current iteration execute, and uses that information to guide data partitioning for the tasks of the next iteration. We implement the mechanism in the HaLoop system and deploy it on a cluster. Experiments show that the proposed mechanism handles data skew and improves load balancing effectively.
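The collect-then-repartition loop can be sketched in a few lines of Python. This is an illustrative sketch, not the SATS implementation: the function names are hypothetical, and the greedy "heaviest key to the least-loaded partition" rule is an assumed stand-in for whatever partitioning policy the paper uses.

```python
from collections import Counter
import heapq


def collect_distribution(records):
    """Count records per key while the current iteration runs."""
    return Counter(key for key, _ in records)


def plan_partitions(key_counts, num_partitions):
    """Plan the next iteration: place the heaviest keys first, each on the least-loaded partition."""
    heap = [(0, p) for p in range(num_partitions)]  # (current load, partition id)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending count; ties are broken by key, so keys must be comparable.
    for key, count in sorted(key_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        load, partition = heapq.heappop(heap)
        assignment[key] = partition
        heapq.heappush(heap, (load + count, partition))
    return assignment


def partition_loads(key_counts, assignment, num_partitions):
    """Total number of records each partition would receive under the plan."""
    loads = [0] * num_partitions
    for key, count in key_counts.items():
        loads[assignment[key]] += count
    return loads
```

The plan is only as good as the assumption that the next iteration's key distribution resembles the current one, which is the similarity the abstract relies on.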
Parallel query processing over data streams in cloud computing environments has recently attracted considerable attention in various fields, owing to the potential value of analyzing massive data in a large number of streaming applications. However, existing studies focus primarily on algorithms for specific query types and lack a general framework for processing various queries. Moreover, existing parallel frameworks in the cloud, such as MapReduce and its variants, are not suitable for many complex queries over complex data streams. In this paper, we discuss the problem of designing a general framework for parallel queries over data streams in the cloud. In particular, we propose and implement a framework called GPS, which adapts well to various queries over complex data streams such as uncertain data streams. We further propose a hierarchical, general parallel model for queries over data streams based on this framework, which is more flexible than the MapReduce model. As an example, skyline queries over uncertain data streams are run on a real deployment of the framework to verify the performance of our proposals.
A general hardware structure is proposed to accelerate variable data set management. It is designed to accept instructions flexibly and to perform both the commonly used functions and some more complicated functions of the linked-list data structure. The structure can access data through both pointer and address mechanisms. To fully utilize the limited memory resources, we propose a memory recycling scheme that reuses the memory space of deleted data. Experimental results on an FPGA show that our proposal accelerates variable data set management while using few hardware resources and consuming little power. Compared with a software linked-list structure on a PC, our FPGA implementation achieves high speedups.
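Reusing the space of deleted entries is commonly done with a free list. The Python sketch below models that idea in software, under the assumption of a fixed-size pool of slots. It is not the paper's hardware design, and the `PooledLinkedList` class and its methods are hypothetical.

```python
class PooledLinkedList:
    """Singly linked list stored in fixed-size arrays, with a free list for slot reuse."""

    NIL = -1

    def __init__(self, capacity):
        self.value = [None] * capacity
        # Initially every slot is free: chain them together 0 -> 1 -> ... -> NIL.
        self.next = list(range(1, capacity)) + [self.NIL]
        self.free_head = 0 if capacity > 0 else self.NIL
        self.head = self.NIL

    def _alloc(self):
        """Take one slot from the free list."""
        if self.free_head == self.NIL:
            raise MemoryError("pool exhausted")
        slot = self.free_head
        self.free_head = self.next[slot]
        return slot

    def push_front(self, item):
        """Insert item at the head and return the slot index it occupies."""
        slot = self._alloc()
        self.value[slot] = item
        self.next[slot] = self.head
        self.head = slot
        return slot

    def delete(self, item):
        """Unlink the first node holding item and return its slot to the free list."""
        prev, cur = self.NIL, self.head
        while cur != self.NIL:
            if self.value[cur] == item:
                if prev == self.NIL:
                    self.head = self.next[cur]
                else:
                    self.next[prev] = self.next[cur]
                self.value[cur] = None
                self.next[cur] = self.free_head  # recycle the slot
                self.free_head = cur
                return True
            prev, cur = cur, self.next[cur]
        return False

    def to_list(self):
        """Return the stored items in list order."""
        out, cur = [], self.head
        while cur != self.NIL:
            out.append(self.value[cur])
            cur = self.next[cur]
        return out
```

With a pool of two slots, inserting two items fills the pool. Deleting one of them frees its slot, so a third insert succeeds and lands in the recycled slot.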
As the fourth passive circuit component, a memristor is a nonlinear resistor that can "remember" the amount of charge that has passed through it. This ability to "remember" charge, together with non-volatility, makes memristors promising candidates in many fields. Currently, only a few groups are able to fabricate memristors, and most researchers study them through theoretical analysis and simulation. In this paper, we first analyse the theoretical basis and characteristics of memristors, then use SPICE (Simulation Program with Integrated Circuit Emphasis) to simulate the theoretical model and vary its parameters to observe the influence of each parameter on the characteristics. Our work offers researchers working on memristor-based circuits advice on how to choose suitable parameters.
Fault resilience has become a major issue for HPC systems, particularly in view of future exascale systems, which will consist of millions of CPU cores and other components. Fault-tolerant MPI was proposed to support software-level fault tolerance approaches. However, widely used MPI implementations, such as MPICH and MVAPICH2, provide limited support for fault tolerance. This paper proposes NR-MPI, a Non-stop and Fault Resilient MPI. NR-MPI implements the semantics of FT-MPI on top of MPICH. Specifically, this paper focuses on failure detection in the MPI library, online recovery of communicators after multiple failures, and a programmer-friendly interface extension for NR-MPI. Furthermore, to support failure recovery of applications, NR-MPI implements data backup and restore interfaces based on double in-memory checkpoint/restart. We conduct experiments with the NPB benchmarks on the TH-1A supercomputer. Experimental results show that NR-MPI based fault-tolerant programs can recover from failures online without restarting, and that the overhead is small even for applications running on tens of thousands of cores.
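A rough Python simulation can illustrate in-memory backup and restore. It assumes that "double" means each process keeps one checkpoint copy in its own memory and a second copy in a buddy process's memory, which the abstract does not spell out. The `Rank` class and the `backup` and `restore` functions are hypothetical and are not NR-MPI interfaces.

```python
import copy


class Rank:
    """Simulated process holding application state and in-memory checkpoint copies."""

    def __init__(self, rank_id, state):
        self.rank_id = rank_id
        self.state = state
        self.local_ckpt = None  # copy of this rank's own state
        self.buddy_ckpt = None  # copy of the previous rank's state, held on its behalf


def backup(ranks):
    """Each rank stores its state locally and on its buddy (the next rank, wrapping around)."""
    n = len(ranks)
    for r in ranks:
        r.local_ckpt = copy.deepcopy(r.state)
        ranks[(r.rank_id + 1) % n].buddy_ckpt = copy.deepcopy(r.state)


def restore(ranks, failed_id):
    """Replace one failed rank online and roll every rank back to the last checkpoint."""
    n = len(ranks)
    holder = ranks[(failed_id + 1) % n]       # buddy that holds the lost state
    replacement = Rank(failed_id, copy.deepcopy(holder.buddy_ckpt))
    replacement.local_ckpt = copy.deepcopy(replacement.state)
    predecessor = ranks[(failed_id - 1) % n]  # the replacement is its buddy again
    replacement.buddy_ckpt = copy.deepcopy(predecessor.local_ckpt)
    ranks[failed_id] = replacement
    for r in ranks:                           # consistent global rollback
        r.state = copy.deepcopy(r.local_ckpt)
    return replacement
```

This sketch tolerates one failure at a time and needs at least two ranks. If a rank and its buddy fail together, both copies of that rank's checkpoint are lost.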
Low power is a first-class design requirement for HPC systems. Dynamic voltage and frequency scaling (DVFS) has become a commonly used and efficient technique for trading off power consumption against system performance. However, most prior work on DVFS does not take into account the latency of voltage/frequency scaling, which on real hardware is a critical factor in the power efficiency of a power management algorithm. In this paper, we first investigate the latency characteristics of DVFS on a real many-core hardware platform. Second, we propose a latency-aware DVFS algorithm for profile-based power management that avoids aggressive power state transitions. Finally, we evaluate our algorithm on the Intel SCC platform using a data-intensive benchmark, Graph 500. The experimental results show considerable potential for energy saving in data-intensive applications (up to 31% energy saving and 60% EDP reduction). They also demonstrate the efficiency of our latency-aware DVFS algorithm, which achieves an extra 12.0% energy saving and an extra 5.0% EDP reduction while increasing execution performance by 22.4%.
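One simple way to avoid aggressive transitions is to skip any frequency change whose target phase is too short to amortize the switching latency. The Python sketch below shows that rule only. It is not the paper's algorithm, and the function name, the `min_ratio` threshold, and the example numbers are all assumptions made for illustration.

```python
def plan_frequencies(phases, start_freq, switch_latency, min_ratio=2.0):
    """Choose a frequency for each profiled phase while skipping unprofitable switches.

    phases: list of (duration_seconds, preferred_freq_mhz) taken from an offline profile.
    A switch happens only if the phase lasts at least min_ratio * switch_latency.
    """
    plan, current = [], start_freq
    for duration, preferred in phases:
        if preferred != current and duration >= min_ratio * switch_latency:
            current = preferred
        plan.append(current)
    return plan


def count_transitions(plan, start_freq):
    """Number of frequency changes the plan performs, counting from start_freq."""
    changes, current = 0, start_freq
    for freq in plan:
        if freq != current:
            changes += 1
            current = freq
    return changes
```

For a hypothetical profile `[(0.5, 400), (0.01, 800), (0.3, 800), (0.02, 400)]` with a 0.04 s switching latency, the two short phases do not trigger a switch on their own, so the plan makes two transitions where a latency-unaware policy would make three.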