Authors:
Banatre, J.P.; Banatre, M.; Ployette, F.
INSA, Inst de Recherche en Informatique et Systemes Aleatoires, Rennes, Fr
ISBN: 0818605014 (print)
The first part of this paper describes a distributed auction bidding system and its major requirements. The hardware and software structures of the system are briefly sketched. The second part of the paper concentrates on crash recovery in the system. Three major aspects are developed: (i) stable storage, (ii) implementation of the commit protocol, and (iii) description of the recovery algorithms.
ISBN: 0818607378 (print)
It is suggested that it is helpful to study reliable distributed systems from the point of view of nondeterministic sequential systems. The faulty and distributed nature of systems can be captured by nondeterminism, so there is a unity to the study of faulty and fault-free systems: faulty systems are at one end of the spectrum and fault-free systems are at the other. Similarly, there is a unity to the study of distributed and sequential systems. It is suggested that all systems, whether faulty or fault-free, whether distributed or sequential, can be handled in a unified way by treating the system as a nondeterministic sequential program.
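The idea of capturing faults by nondeterminism can be illustrated with a toy model. The sketch below is an assumption made for illustration and is not taken from the paper: a sequential program "sends" two messages, and each send is a nondeterministic choice between delivery and loss. The names `run` and `outcomes` are hypothetical.

```python
from itertools import product

MESSAGES = ["a", "b"]

def run(deliveries):
    """Run the sequential program once for a fixed choice of delivery outcomes."""
    received = []
    for msg, delivered in zip(MESSAGES, deliveries):
        if delivered:
            received.append(msg)
    return tuple(received)

def outcomes(may_fail):
    """Return the set of final states over all nondeterministic choices.

    A fault-free system allows only delivery (True) for each message;
    a faulty one allows delivery or loss (True or False).
    """
    choices = [(True, False) if may_fail else (True,)] * len(MESSAGES)
    return {run(d) for d in product(*choices)}
```

In this model the fault-free system is the special case with a single possible outcome, while the faulty system has four. Both are analyzed by the same enumeration, which is the sense in which the two ends of the spectrum are treated uniformly.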
Authors:
York, Gary
Carnegie-Mellon Univ, Dep of Electrical Engineering, Pittsburgh, PA, USA
ISBN: 0818605014 (print)
Many modern reliable systems use N-modular redundancy and voting to achieve the required reliability. Most of these systems assume that the redundant modules are synchronized. Experiments have been performed on the multiprocessor with redundant software modules that are allowed to execute with various degrees of asynchrony. The performance of such systems has been experimentally determined along two lines. The first experiment determined how much overhead is added to the system execution time as the voting frequency changes. The second experiment shows how much asynchrony can be tolerated in process execution for three different experimental paradigms. Mathematical models for the experiments have been developed that closely match the experimental results.
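For readers unfamiliar with voting in N-modular redundancy, a minimal sketch of a strict-majority voter follows. It is a generic illustration, not the voter used in the experiments, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of modules, else None.

    `outputs` holds one hashable result per redundant module.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the modules to agree.
    return value if count > len(outputs) // 2 else None
```

With three modules, one faulty output is masked: `majority_vote([7, 7, 9])` returns 7. When no value reaches a strict majority, the sketch returns `None` so the caller can decide how to recover.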
ISBN: 0818605642 (print)
Efforts at achieving reliability in distributed computing include the incorporation of fault-tolerance into system software such as routing tables, and the introduction of extra software to mask processor failures, such as Byzantine agreement. A set of n processors running software that can tolerate k faults is called a (k,n)-resilient system. Most (k,n)-resilient systems have a higher probability of failure in the long run, and a smaller mean time-to-failure, than one processor. Further, large systems fail in a very well-defined period, and they fail very quickly if n is more than linear in k.
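The mean time-to-failure claim can be checked numerically under a simple model. The sketch below assumes independent processors with exponentially distributed lifetimes (rate λ each) and no repair, so the system fails at the (k+1)-th processor failure; this model is an assumption for illustration and may differ from the one used in the paper. The function name `mttf_ratio` is hypothetical.

```python
from fractions import Fraction

def mttf_ratio(k, n):
    """MTTF of a (k,n)-resilient system in units of one processor's MTTF.

    Assumes independent exponential lifetimes and no repair. While j
    processors are alive, the next failure arrives after 1/(j*lambda) on
    average, so the wait for the (k+1)-th failure is the sum over
    j = n, n-1, ..., n-k.
    """
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    return sum(Fraction(1, n - i) for i in range(k + 1))
```

Under these assumptions a triplicated system tolerating one fault, `mttf_ratio(1, 3)`, has 5/6 of the mean time-to-failure of a single processor, and `mttf_ratio(2, 5)` gives 47/60, consistent with the abstract's statement for these configurations.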
Recovery provisions in a distributed system are considered and issues of the reliability of software design are examined. As there is no generally valid system for recovery provision design, the provisions are reviewed. The costs of testing, documentation, operator training, and interface administration that are required for proper operation of the provisions are considered.
The design of a distributed processing system must include methods to handle distributed data retrieval. A considerable amount of research has been devoted to the development of algorithms that provide this function. A survey of this research is presented and a taxonomy is introduced that highlights the significant differences among the algorithms.
Many of the special problems in distributed computing relate to the handling of exceptional conditions. In a distributed program, exceptions occur as a result of transmission errors and partial failures. Any exceptional condition that arises must be handled if distributed programs are to be robust. Various approaches to providing exception handling mechanisms for distributed applications are examined, as incorporated into several experimental distributed operating systems. These operating systems all support the notion that the primary software structuring tool for applications will be a collection of cooperating programs (processes) mapped onto a set of loosely coupled processors.
ISBN: 0818605014 (print)
When a data item of a database system is updated, a typical reliable storage update mechanism performs three or four secondary storage write operations in order to update the data redundantly stored on secondary storage. Thus, updating the data on secondary storage may become the bottleneck for transaction processing. We show that internal processing of transactions can be separated from secondary storage update operations, and that the database state defined by the secondary storage data can be maintained consistent if updates of transactions are moved to secondary storage according to the 'U-precedence' defined for the transaction that created them. Therefore, secondary storage update operations can be buffered as long as this requirement is satisfied. Secondary storage update buffering can enhance the performance of database systems since it allows more concurrency and flexible disk scheduling.
Authors:
Kim, K.H.
Univ of South Florida, Dep of Computer Science & Engineering, Tampa, FL, USA
One of the frequently advocated advantages of distributed computing systems over centralized computing systems is the improved system reliability potential. Although the application of distributed computing is currently expanding at a rapid rate, the realization of its full reliability potential still requires fresh solutions and further understanding of many design problems. The nature of some of those design issues is briefly discussed. To help prevent misinterpretation while keeping the presentation of research issues abstract, a model of recoverable distributed computing system structure is presented. Discussed are: error detection, hardware and software reconfiguration, the degree of coordination of distributed processes for error detection and recovery, real-time recovery, and software engineering tools.