ISBN: 0818606908 (print)
The following topics are dealt with: fault-tolerant local area networks; distributed database reliability; fault-tolerant distributed systems; software fault tolerance; and computer system reliability. 26 papers were presented, all of which are published in full in the present proceedings.
Author: York, Gary (Carnegie-Mellon Univ, Dep of Electrical Engineering, Pittsburgh, PA, USA)
ISBN: 0818605014 (print)
Many modern reliable systems use N-modular redundancy and voting to achieve the required reliability. Most of these systems assume that the redundant modules are synchronized. Experiments have been performed on the multiprocessor with redundant software modules that are allowed to execute with various degrees of asynchrony. The performance of such systems has been experimentally determined along two lines. The first experiment determined how much overhead is added to the system execution time as the voting frequency changes. The second experiment shows how much asynchrony can be tolerated in process execution for three different experimental paradigms. Mathematical models for the experiments have been developed that closely model the experimental results.
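As an illustration of the voting step in N-modular redundancy, here is a minimal sketch of a strict-majority voter. It is not taken from the paper: the function name `majority_vote` and the choice to return `None` when no majority exists are assumptions made for this example, and the sketch ignores the asynchrony between modules that the experiments measure.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of modules, or None.

    Hypothetical helper: `outputs` holds one result per redundant module.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the modules to agree.
    return value if count > len(outputs) // 2 else None


# Three redundant modules; one has produced a faulty result.
print(majority_vote([42, 42, 17]))  # the single fault is masked
print(majority_vote([1, 2, 3]))     # no majority, so no value is accepted
```

With three modules a single faulty result is outvoted; when no value reaches a strict majority the voter reports failure rather than guessing.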
ISBN: 0818605642 (print)
An algorithm for the dynamic planning of recovery lines is specified. A computational model is defined for a distributed system of communicating processes using asynchronous message passing, and the recovery algorithms are described by means of axioms.
ISBN: 0818605014 (print)
A database management system was designed and implemented for the 5ESS switching system. The database is distributed and subject to stringent real-time constraints and reliability requirements. A relational model was chosen for this specialized database management system. This paper focuses on the reliability aspects of the design and on how the balance between real-time constraints and reliability requirements is maintained. The concurrency control mechanism plays a significant role in achieving that balance by providing real-time access to call processing while maintaining a consistent view of the data.
In this paper the authors briefly present the design of a distributed relational data base system. Then, they discuss experimental observations of the performance of that system executing both short and long commands. Conclusions are also drawn concerning metrics that distributed query processing heuristics should attempt to minimize. Lastly, they comment on architectures which appear viable for distributed data base applications.
Two-Phase Commit and other distributed commit protocols provide a method to commit changes while preserving consistency in a distributed database. These protocols can cope with various failures occurring in the system. In case of failure, however, they do not guarantee termination (of protocol processing) within a given time: sometimes the protocol requires waiting for a failed processor to be returned to operation. A straightforward use of timeouts in a distributed system is fraught with unexpected peril and does not provide an easy solution to the problem. Byzantine Agreement is combined with Two-Phase Commit, using observations of Lamport, to provide a method to cope with failure within a given time bound. An extra benefit of this combination of ideas is that it handles undetected and transient faults as well as the more usual system-down or processor-down faults handled by other distributed commit protocols.
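The blocking behavior described above can be seen in a minimal sketch of plain Two-Phase Commit. This is not the Byzantine Agreement combination proposed in the paper; the names `Vote`, `Decision`, `coordinator_decide`, and `participant_state` are hypothetical, and a missing vote is assumed to count as NO.

```python
from enum import Enum
from typing import Dict, Optional


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Decision(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def coordinator_decide(votes: Dict[str, Optional[Vote]]) -> Decision:
    """Phase 2 decision: commit only if every participant voted YES.

    A vote of None stands for a participant that did not answer; it is
    treated as NO here (an assumption of this sketch).
    """
    if all(v is Vote.YES for v in votes.values()):
        return Decision.COMMIT
    return Decision.ABORT


def participant_state(my_vote: Vote, decision: Optional[Decision]) -> str:
    """What a participant may do once the decision has, or has not, arrived."""
    if decision is not None:
        return decision.value
    if my_vote is Vote.NO:
        return "abort"    # a NO voter can safely abort on its own
    return "blocked"      # a YES voter must wait for the coordinator


print(coordinator_decide({"site_a": Vote.YES, "site_b": Vote.YES}))
print(participant_state(Vote.YES, None))
```

The `"blocked"` case is the one the abstract refers to: a participant that has voted YES cannot decide alone and has no time bound on its wait if the coordinator has failed.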
ISBN: 0818605014 (print)
The use of logs to provide recovery from failures in transaction systems is well known. Checkpointing is also a familiar technique for speeding restart from failures. However, most work on logs and checkpointing has considered only centralized systems. In this paper a logging, checkpointing, and restart mechanism is described for distributed systems. Moreover, nested transactions are used to enhance the performance and flexibility of the design. The result is that actions occurring at different sites can be significantly decoupled while avoiding any domino effect. Further, unreliability of one site has a limited impact on performance elsewhere.
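To show why a checkpoint speeds restart, here is a minimal single-site sketch: state is rebuilt from the checkpoint, and only log records written after it are replayed. It does not model the distributed, nested-transaction mechanism of the paper; the record layout `(sequence number, key, new value)` and the function name `restart` are assumptions of this example.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[int, str, int]  # (sequence number, key, new value)


def restart(checkpoint: Dict[str, int], checkpoint_seq: int,
            log: List[LogRecord]) -> Dict[str, int]:
    """Rebuild state from a checkpoint plus the log records written after it."""
    state = dict(checkpoint)
    for seq, key, value in log:
        if seq > checkpoint_seq:  # older records are already in the checkpoint
            state[key] = value
    return state


log = [(1, "x", 10), (2, "y", 5), (3, "x", 11)]
recovered = restart({"x": 10, "y": 5}, checkpoint_seq=2, log=log)
print(recovered)  # {'x': 11, 'y': 5}
```

Only the record with sequence number 3 is replayed; without the checkpoint the whole log would have to be processed.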
ISBN: 0818605014 (print)
Two design rules which aid the construction of distributed computing systems and the provision of fault tolerance are described, namely that: (i) a distributed computing system should be functionally equivalent to the individual computing systems of which it is composed, and (ii) fault tolerant systems should be constructed from generalized fault tolerant components. The reasoning behind these two 'recursive structuring principles', and the consequences of attempting to adhere to them, are discussed. Where appropriate this discussion is illustrated by reference to a distributed system based on UNIX that is now operational at Newcastle and several other locations. This system has been implemented by adding a software subsystem, known as the Newcastle Connection, to each of a set of UNIX systems. By this means the authors have constructed a distributed system which is functionally equivalent at both the user and the program level to a conventional uniprocessor UNIX system.
ISBN: 0818606908 (print)
A replicated database system is a distributed database system in which some data objects are stored redundantly at multiple sites to improve the reliability of the system. Without proper control mechanisms, the consistency of a replicated database system might be violated. A scheme to increase the reliability as well as the degree of concurrency is described. It allows transactions to operate on a data object if more than one token copy is available. The scheme also exploits the fact that, for recovery reasons, there are two values for one data object. Proof that the proposed scheme guarantees consistency is provided. Some variations of the scheme are discussed.
ISBN: 0818606908 (print)
Before the heralded potential of distributed computer systems can be realized, the system must be made robust in the face of processor failures. Reassigning the work of a failed processor so that system performance degrades gracefully is one of the most important problems in designing reliable distributed systems. The authors present an algorithm for reassigning the work of a failed processor that attempts to minimize the increased cost caused by the redistribution. This algorithm is based on a technique known as clustering. The authors also present a comprehensive cost function, and discuss its applicability to 'real' systems.
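The reassignment problem can be made concrete with a minimal sketch. It uses a simple greedy rule (largest task first, onto the least-loaded survivor) rather than the clustering technique of the paper, and it takes processor load as the only cost; the function name `reassign` and this cost model are assumptions of the example, not the authors' cost function.

```python
from typing import Dict, List


def reassign(load: Dict[str, float], failed: str,
             tasks: List[float]) -> Dict[str, float]:
    """Greedily move each task of the failed processor to the least-loaded survivor.

    `load` maps processor name to current load; `tasks` lists the costs of
    the tasks that were running on `failed` (hypothetical cost model).
    """
    survivors = {p: l for p, l in load.items() if p != failed}
    if not survivors:
        raise ValueError("no surviving processor")
    for cost in sorted(tasks, reverse=True):  # place the largest tasks first
        target = min(survivors, key=survivors.get)
        survivors[target] += cost
    return survivors


new_load = reassign({"a": 1.0, "b": 2.0, "c": 3.0}, failed="c", tasks=[2.0, 1.0])
print(new_load)  # {'a': 3.0, 'b': 3.0}
```

Here the work of processor `c` is spread so that both survivors end up equally loaded, which is one simple reading of graceful degradation.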