As electronic technology develops, the integration levels of CPUs and memories keep growing, and the speeds of communication devices improve. High-performance computing (HPC) systems consist of processing...
Successive interference cancellation (SIC) is an effective multipacket reception technique for combating interference. As not all collisions are resolvable, careful transmission coordination is required. We study link s...
Non-volatile random-access memory (NVRAM) technology is maturing rapidly, and its byte-persistence feature allows the design of new and efficient fault tolerance mechanisms. In this paper, we propose the versionized process (Ver P), a new process model based on NVRAM that is natively non-volatile and fault tolerant. We introduce an intermediate software layer that allows us to run a process directly on NVRAM and to put all the process state into NVRAM, and then propose a mechanism to versionize all the process data. Each piece of process data is given a version number, which increases each time that piece of data is modified. The version number lets us trace the modification of any data and recover it to a consistent state after a system *** with traditional checkpoint methods, our work can achieve fine-grained fault tolerance at very little cost.
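The version-number idea in the abstract above can be illustrated with a minimal sketch. It is not the Ver P implementation: the `VersionedStore` class, the `mark_consistent` call, and the in-memory dictionaries standing in for NVRAM are all assumptions made for illustration. Each key carries its own version number, which grows by one per write, and recovery discards every version newer than the last recorded consistent point.

```python
class VersionedStore:
    """Toy per-item versioning; plain dicts stand in for NVRAM (assumption)."""

    def __init__(self):
        self._log = {}     # key -> list of values; index i holds version i + 1
        self._stable = {}  # key -> version number at the last consistent point

    def write(self, key, value):
        # Every modification appends a new version instead of overwriting.
        versions = self._log.setdefault(key, [])
        versions.append(value)
        return len(versions)  # the new version number of this item

    def read(self, key):
        return self._log[key][-1]

    def version(self, key):
        # 0 means the item does not exist.
        return len(self._log.get(key, []))

    def mark_consistent(self):
        # Hypothetical hook: remember the current version of every item.
        self._stable = {k: len(v) for k, v in self._log.items()}

    def recover(self):
        # Roll each item back to the version recorded at the consistent point.
        for key in list(self._log):
            keep = self._stable.get(key, 0)
            if keep:
                del self._log[key][keep:]
            else:
                del self._log[key]  # item was created after that point
```

Because only the items written after the consistent point are touched during recovery, the rollback cost in this sketch scales with the amount of modified data rather than with the full process image, which is the intuition behind fine-grained recovery.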
Predicting network latencies between Internet hosts can efficiently support large-scale Internet applications, e.g., file sharing services and overlay construction. Several studies use the Hyperbolic space to model t...
With the explosive growth in data creation and in the demand for information, cloud storage systems urgently need efficient storage management. As an important component of management, storage all...
ISBN: (print) 9783642133732
Multicore systems offer the potential to improve application performance. However, substantial programming effort is required to exploit the available parallelism. This paper presents a single-source compiler that maps data-parallel programs onto the Cell Broadband Engine. Based on the distributed memory model, the compiler performs automatic data distribution and generates SPMD programs with message-passing primitives for Cell. We evaluate our compiler using a range of computation-intensive benchmarks and achieve high performance on the Cell platform. In contrast to OpenMP, our method can fully exploit data locality by managing shared data through inter-processor communication instead of main memory accesses, which significantly reduces the off-chip memory access overhead.
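As a rough illustration of the data distribution step described above, the following sketch splits an array into contiguous blocks, one per worker, and runs the same code on every block in SPMD fashion. It is an assumption-laden toy, not the compiler's output: `block_range` and `spmd_sum` are hypothetical names, block distribution is only one possible scheme, and a plain loop with a list of partial results stands in for the SPEs and their message passing.

```python
def block_range(n, workers, rank):
    """Return the [start, stop) slice of n elements owned by `rank`.

    The first n % workers ranks receive one extra element, so block sizes
    differ by at most one.
    """
    base, extra = divmod(n, workers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def spmd_sum(data, workers):
    """Every rank runs the same code on its own block, then results are combined."""
    partials = []
    for rank in range(workers):  # stands in for `workers` parallel processes
        start, stop = block_range(len(data), workers, rank)
        local = data[start:stop]      # data the rank would hold in local memory
        partials.append(sum(local))   # would be sent to a root rank as a message
    return sum(partials)              # reduction performed by the root
```

Each rank touches only its own block, so in a real distributed-memory setting the only traffic is the single partial result per rank.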
As system scale increases, the inherent reliability of supercomputers becomes lower and lower. The cost of fault handling and task recovery increases so rapidly that the reliability issue will soon harm the usability of supercomputers. This issue is referred to as the "reliability wall", which is regarded as a critical problem for current and future supercomputers. To address this problem, we propose an autonomous fault-tolerant system, named Iaso, for the MilkyWay-2 system. Iaso introduces the concept of autonomous management in supercomputers: the computer itself, rather than human operators, takes charge of fault management. Iaso automatically manages the whole lifecycle of faults, including fault detection, fault diagnosis, fault isolation, and task recovery. Iaso endows the MilkyWay-2 system with autonomous features such as self-awareness, self-diagnosis, self-healing, and self-protection. With the help of Iaso, the cost of fault handling in supercomputers drops from several hours to a few seconds. Iaso greatly improves the usability and reliability of the MilkyWay-2 system.
ISBN: (print) 1595934081
A program mode is a regular trajectory of the execution of a program that is determined by the values of its input variables. By exploiting program modes, we can make Worst Case Execution Time (WCET) analysis more precise. This paper presents a novel method to automatically find program modes and calculate the WCET of programs. It consists of two phases. In phase one, we first automatically find the modes of a program by mode-relevant program slicing; we then compute the precondition for each mode using a path-wise test data generation method, after which we either conclude that the mode corresponds to an infeasible path or obtain its precondition. In phase two, we calculate the WCET estimate of each given mode for modern RISC processors with caches and pipelines. Experiments demonstrate the effectiveness of the method. Copyright 2006 ACM.
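How per-mode results tighten a WCET bound can be sketched as follows. This is only an illustration under stated assumptions: the `Mode` record, the use of `None` to mark an infeasible path, and the cycle counts in the usage note are hypothetical, and the sketch takes the preconditions and per-mode estimates as given rather than deriving them by slicing or timing analysis.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Mode:
    name: str
    # Predicate over the input variables; None marks an infeasible path.
    precondition: Optional[Callable[[dict], bool]]
    wcet_cycles: int  # per-mode WCET estimate, assumed to be computed elsewhere


def feasible(modes):
    """Drop modes whose path was found to be infeasible."""
    return [m for m in modes if m.precondition is not None]


def overall_wcet(modes):
    """Input-independent bound: the worst estimate over all feasible modes."""
    return max(m.wcet_cycles for m in feasible(modes))


def wcet_for_input(modes, inputs):
    """Tighter bound when the inputs are known to satisfy some precondition."""
    matching = [m.wcet_cycles for m in feasible(modes) if m.precondition(inputs)]
    # Fall back to the input-independent bound if no precondition matches.
    return max(matching) if matching else overall_wcet(modes)
```

For example, with a hypothetical fast mode (`n < 10`, 120 cycles), a slow mode (`n >= 10`, 900 cycles), and an infeasible path estimated at 5000 cycles, the overall bound is 900 rather than 5000, and an input with `n = 3` is bounded by 120.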
ISBN: (print) 9780769532622
This paper presents ongoing work on an FMEA method for embedded safety-critical software based on formal analysis of the dependence relations among software elements, which can improve the automation and precision of both system-level and detailed-level FMEA. These dependence relations are captured by formal models abstracted from the software design and implementation, and FMEA processes are proposed for both structural and object-oriented software. The initial results of a case study show the effectiveness of the approach. © 2008 IEEE.
This paper presents a novel methodology, called COPP, to estimate available bandwidth over a given network path. COPP deploys a particular probe scheme, namely chirp of packet pairs, which is composed of several packe...