ISBN:
(print) 0818668075
The proceedings contain 34 papers. The topics discussed include: evaluating HACMP/6000: a clustering solution for high availability distributed systems; hardware and software fault tolerance using fail-silent virtual duplex systems; dynamic autonomous redundancy management strategy for balanced graceful degradation; optimal message log reclamation for uncoordinated checkpointing; efficient checkpointing over local area networks; an efficient coordinated checkpointing scheme for multicomputers; redundant linked list based cache coherence protocol; and roll-forward recovery: the bidirectional cache approach.
In the era of big data, efficiently processing and retrieving insights from unstructured data presents a critical challenge. This paper introduces a scalable leader-worker distributed data pipeline designed to handle ...
In this paper we present the technical issues of a temporal model for fault-tolerant parallel programs. We present, in turn, the formal aspects of this temporal model and an algorithm that we have developed to detect errors in parallel programs running on a parallel architecture with shared memory. A simple example is given to illustrate the model and the algorithm.
Energy-efficient task allocation and scheduling schemes with deterministic fault-tolerant capabilities are proposed for symmetric multiprocessor systems executing tasks with hard real-time constraints. The proposed heuristic achieves energy savings by optimally balancing the application workload among processors in a system. Based on the observation that fault-free operation is expected to remain dominant in the near future and the probability of worst-case faults is low, an optimistic fault-tolerant heuristic is then proposed to achieve optimum energy savings in the absence of faults and to meet application timing requirements under worst-case faults at the cost of energy inefficiency. Extensive simulation results show that, compared to state-of-the-art schemes, the proposed optimistic heuristic achieves average energy savings of up to 70 percent and exhibits higher tolerance to variations in application utilizations and more resilience to fault occurrences beyond system specification.
Author:
Maier, J.
Stuttgart Univ., Breitwiesenstrasse 20-22, D-70565 Stuttgart, Germany
The Pact (parallel actions) parallel programming environment provides an easy-to-use parallel execution and synchronization model based on task parallelization. To give the programmer an abstraction for global data (e...
ISBN:
(print) 0769515762
A real-time fault-tolerant multicast protocol is necessary for obtaining high performance in operating distributed real-time computing systems. The purpose of this paper is to show the efficiency of the RFRM (Release-time based Fault-tolerant Real-time Multicast) protocol, which is based on the idea of attaching an official release time to each multicast message. As part of this, a real-time simulation based on the TMO structuring scheme is conducted to evaluate the proposed approach. We experiment with a real-time multicast model that does not receive ack messages, reducing message traffic on the network by employing a fault detection mechanism. Simulation results indicate that the proposed real-time multicast protocol is efficient.
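The release-time idea in the abstract above can be illustrated with a minimal sketch. It is not the RFRM protocol itself: the `Message` and `Receiver` classes, the `seq` tie-breaker, and the sample payloads are hypothetical names introduced here, and the sketch assumes receivers share a synchronized clock. Each receiver buffers incoming messages and delivers a message only once the clock reaches its release time, so receivers that get the same messages in different network orders still deliver them in the same order.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Hypothetical message format: ordering uses (release_time, seq) only.
    release_time: float
    seq: int
    payload: str = field(compare=False)


class Receiver:
    """Buffers messages and delivers them no earlier than their release time."""

    def __init__(self):
        self._pending = []   # min-heap keyed by (release_time, seq)
        self.delivered = []  # payloads in delivery order

    def on_receive(self, msg):
        # Arrival order does not matter; the heap restores release order.
        heapq.heappush(self._pending, msg)

    def advance_clock(self, now):
        # Deliver every buffered message whose release time has been reached.
        while self._pending and self._pending[0].release_time <= now:
            self.delivered.append(heapq.heappop(self._pending).payload)


def demo():
    a, b = Receiver(), Receiver()
    m1 = Message(10.0, 1, "open valve")
    m2 = Message(12.0, 2, "close valve")
    # The two receivers see the messages in opposite network orders.
    a.on_receive(m1)
    a.on_receive(m2)
    b.on_receive(m2)
    b.on_receive(m1)
    for r in (a, b):
        r.advance_clock(11.0)
    first = (list(a.delivered), list(b.delivered))
    for r in (a, b):
        r.advance_clock(12.0)
    return first, (a.delivered, b.delivered)
```

Calling `demo()` returns the delivery logs of both receivers at clock times 11.0 and 12.0. Fault detection and the ack-free operation mentioned in the abstract are outside the scope of this sketch.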
This paper concerns the quantification of the fault tolerance of computer networks, where at least n-1 of the network's n nodes need to be intact and connected for mission success. The fault tree construction process i...
Fault tolerance protocols can be checked for design faults by injecting operation fault cases. We present a scheme for generating these operation fault cases on the basis of a reachability analysis. Thereby, we aim at...
ISBN:
(print) 0769523471
Replication of information among multiple servers is necessary to service requests for Web applications such as Internet banking. A dispatcher in a distributed Web system distributes client requests among Web application servers, and multiple dispatchers are also needed for fault-tolerant Web services. In this paper, we describe issues related to building fault-tolerant distributed Web systems. We evaluate the performance of fault-tolerant distributed Web systems based on replication. Our evaluation is conducted on LVS (Linux Virtual Server) and the Apache Web server using the request generator LoadCube. We show some performance measurements for the systems.
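As a rough illustration of the dispatcher role described in the abstract above, the sketch below routes requests across replicated servers and skips replicas marked as failed. It is only a sketch under stated assumptions: the `Dispatcher` class, the round-robin policy, and the server names are hypothetical and are not taken from the paper or from the LVS or Apache APIs.

```python
class Dispatcher:
    """Round-robin dispatcher that skips replicas marked as down (hypothetical)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next candidate server

    def mark_down(self, server):
        # Record a detected failure; the server stops receiving requests.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Re-admit a recovered server, if it belongs to the pool.
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        # Try each server at most once per call, starting at the cursor.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy server available")
```

For example, with servers `web1`, `web2`, and `web3`, marking `web2` down causes later calls to `route()` to alternate between `web1` and `web3`. The dispatcher here is itself a single point of failure, which is why the abstract notes that multiple dispatchers are needed for a fault-tolerant service.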