ISBN: 0818608757 (print)
A description is given of the results of a study of methods of achieving fault tolerance in the Clouds system and, in particular, of achieving increased availability of objects. The problems explored in this work, the model of distributed computation in which the problems posed by the research were examined (the Clouds system), the tools that were used to address these problems (the Aeolus programming language), and some related research are briefly described. The authors present a methodology for achieving available services by converting resilient single-site implementations into replicated implementations. They present a mechanism to support this methodology, called distributed locking (DL). A description is also given of a linguistic feature for specifying the availability properties of an object replicated via DL. The language runtime support features (primitives) required for DL and the operating system support needed for these features are presented.
ISBN: 0818608757 (print)
A checkpoint algorithm is presented that benefits from research in concurrency control, commit, and site recovery algorithms in transaction processing. In the authors' approach, a number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tolerating the failure of an arbitrary number of processes. Each process takes checkpoints independently. During recovery after a failure, a process invokes a two-phase rollback algorithm. In the first phase, it collects information about relevant message exchanges in the system. In the second phase, it uses this information to determine both the set of processes that must roll back and the set of checkpoints up to which rollback must occur. Concurrent rollbacks are completed in the order of the priorities of the recovering processes. The proposed solution is optimistic: it minimizes overhead during normal processing and therefore performs well when failures are infrequent.
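The abstract describes the second rollback phase only in outline. The Python sketch below illustrates the general idea of deriving a rollback set from logged message exchanges; it is not the authors' algorithm. It makes several hypothetical assumptions:

- Phase one has already gathered every relevant message into one list.
- Each process numbers its local events from 1.
- Index 0 in each checkpoint list stands for the initial state.
- Concurrent rollbacks and priorities are ignored.

A message becomes an orphan when rollback undoes its send but keeps its receive. The receiver must then restore a checkpoint taken before the receive, and the rule is reapplied until nothing changes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Message:
    sender: str
    send_at: int   # sender's local event index for the send
    receiver: str
    recv_at: int   # receiver's local event index for the receive


def compute_rollback(checkpoints: Dict[str, List[int]],
                     messages: List[Message],
                     failed: str) -> Dict[str, int]:
    """Return {process: checkpoint index to restore} (hypothetical helper).

    checkpoints[p] is a sorted list of event indices at which p took a
    checkpoint; index 0 is the initial state. Restoring checkpoint c
    discards every event of p with index > c. Processes missing from the
    result keep their current state.
    """
    # The failed process loses everything after its latest checkpoint.
    restore = {failed: checkpoints[failed][-1]}
    changed = True
    while changed:  # propagate until a fixed point is reached
        changed = False
        for m in messages:
            if m.sender not in restore or m.send_at <= restore[m.sender]:
                continue  # the send survives, so m is not an orphan
            if m.receiver in restore and m.recv_at > restore[m.receiver]:
                continue  # the receive is already undone
            # Orphan message: roll the receiver back to before the receive.
            restore[m.receiver] = max(
                c for c in checkpoints[m.receiver] if c < m.recv_at)
            changed = True
    return restore


cps = {"P": [0, 5], "Q": [0, 3], "R": [0, 4]}
msgs = [Message("P", 6, "Q", 4),  # sent after P's last checkpoint
        Message("Q", 5, "R", 2),
        Message("P", 2, "Q", 1)]  # survives: sent before P's checkpoint
print(compute_rollback(cps, msgs, "P"))  # {'P': 5, 'Q': 3, 'R': 0}
```

Each update moves a process to a strictly earlier checkpoint, so the loop terminates. The cascade from P to Q to R is the domino effect that independent checkpointing has to contend with.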
In real-time database systems, a transaction may not have enough time to complete. In such cases, partial, or imprecise, results can still be produced. The authors propose an imprecise result mechanism for producing partial results, which is used to implement timing error recovery in real-time database systems. They also present a model of real-time systems that distinguishes external data consistency from the internal data consistency maintained by non-real-time systems. Providing a timely response may require sacrificing internal consistency. The authors discuss three examples that have different data consistency requirements and present algorithms for implementing them.
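The following Python sketch is a toy illustration of producing an imprecise result instead of none. It computes an average incrementally and stops when a work budget is exhausted. The budget is a hypothetical stand-in for the time left before the deadline. The names `Result` and `average_with_deadline` are invented for this example; the abstract does not describe the authors' mechanism at this level of detail.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Result:
    value: float
    processed: int  # records actually read before the deadline
    total: int      # records a complete answer would need

    @property
    def precise(self) -> bool:
        return self.processed == self.total


def average_with_deadline(values: Sequence[float], budget: int) -> Result:
    """Average `values`, reading at most `budget` records (hypothetical).

    `budget` models the time remaining before the deadline. If it runs
    out, the running average over the records read so far is returned,
    flagged as imprecise, rather than aborting with no answer.
    """
    running_sum = 0.0
    n = 0
    for v in values[:max(budget, 0)]:
        running_sum += v
        n += 1
    value = running_sum / n if n else math.nan
    return Result(value, n, len(values))


partial = average_with_deadline([2.0, 4.0, 6.0, 8.0], budget=2)
print(partial.value, partial.precise)  # 3.0 False
```

A caller can inspect `processed` and `total` to decide whether the imprecise answer is acceptable. This is one simple way to expose how much of the computation was given up to meet the deadline.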
ISBN: 0818607378 (print)
The following topics are dealt with: reliability issues in distributed operating systems; communications and control; distributed systems; replicated data reliability; object-based systems; concurrency and synchronization; representing faulty distributed systems as nondeterministic sequential systems; algorithms for maintaining data availability and agreement. Twenty papers were presented, all of which are published in full in these proceedings.
ISBN: 0818607378 (print)
It is suggested that it is helpful to study reliable distributed systems from the point of view of nondeterministic sequential systems. The faulty and distributed nature of systems can be captured by nondeterminism, so there is a unity to the study of faulty and fault-free systems: faulty systems are at one end of the spectrum and fault-free systems are at the other. Similarly, there is a unity to the study of distributed and sequential systems. It is suggested that all systems, whether faulty or fault-free, distributed or sequential, can be handled in a unified way by treating the system as a nondeterministic sequential program.
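The following Python sketch is an illustration, not taken from the paper. It models a sender that retries over a lossy channel as a sequential program whose only nondeterminism is whether each transmission is lost. Enumerating every resolution of those choices yields the set of possible outcomes. Restricting the choices to "never lost" gives the fault-free system as a special case of the same program. The retry scenario and function names are assumptions made for the example.

```python
from itertools import product
from typing import Optional, Sequence, Set


def run_sender(losses: Sequence[bool]) -> Optional[int]:
    """Sequential model of a sender retrying over a lossy channel.

    losses[i] resolves the nondeterministic choice for attempt i + 1:
    True means that transmission was lost. Returns the attempt on which
    the message was delivered, or None if every attempt was lost.
    """
    for attempt, lost in enumerate(losses, start=1):
        if not lost:
            return attempt
    return None


def possible_outcomes(max_attempts: int, faulty: bool) -> Set[Optional[int]]:
    # A faulty channel may lose any transmission; a fault-free one never does.
    choices = [False, True] if faulty else [False]
    return {run_sender(p) for p in product(choices, repeat=max_attempts)}


print(possible_outcomes(3, faulty=True))   # faulty end of the spectrum
print(possible_outcomes(3, faulty=False))  # {1}
```

The same program text covers both ends of the spectrum. Only the set of permitted nondeterministic choices changes.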
ISBN: 0818607378 (print)
A method for testing, debugging, and measuring distributed systems is described. The test method accompanies both the implementation and the operation of distributed systems. During implementation, the test tools allow users to monitor and control the tested system at different problem-oriented levels, and the large volume of collected information is displayed graphically in easy-to-read charts and graphs. During operation, the test system continuously monitors system behavior and measures system performance. The author views performance and analysis measurements during operation as an integral part of the system. The test method and tools promote a better understanding of run-time behavior, and possibly of the functional requirements, of distributed systems. They also provide performance measurements from which qualitative and even quantitative assessments of distributed systems can be derived.
ISBN: 0818607378 (print)
The authors examine the various kinds of distributed systems and discuss some of the reliability issues involved. They first concentrate on the causes of unreliability, illustrating these with some general solutions and examples. Among the issues treated are interprocess communication, machine crashes, server redundancy, and data integrity. They then examine one distributed operating system, Amoeba, to show how reliability issues have been handled in at least one real system and how the pieces fit together.
ISBN: 0818607378 (print)
Some of the techniques used to replicate objects for resilience in distributed operating systems are reviewed. The problems associated with replicating objects are discussed, and a scheme of replicated actions and replicated objects using a paradigm called PETs (parallel execution threads) is presented. The PET scheme not only exploits the high availability of replicated objects but also tolerates site failures that occur while an action is executing. It is shown how this scheme can be implemented in a distributed object-based system, with the Clouds operating system used as an example testbed.
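The PET idea of running one action as parallel threads over replicas can be approximated, very loosely, with Python threads. In this sketch, `SiteFailure`, `run_pet`, `transfer`, and the replica names are hypothetical. The first thread to finish successfully supplies the result, and site failures in the other threads are tolerated. The sketch omits much of what makes PETs useful in Clouds, such as atomic commit of exactly one thread and abort of the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, TypeVar

T = TypeVar("T")


class SiteFailure(Exception):
    """Raised when the site running an execution thread crashes."""


def run_pet(action: Callable[[str], T], replicas: List[str]) -> T:
    """Run action(replica) in one thread per replica (hypothetical helper).

    Returns the result of the first thread that completes without a site
    failure, and raises RuntimeError only if every thread fails. Threads
    still running when a result is returned are waited for when the
    executor shuts down, and their results are discarded; a real PET
    implementation would abort them instead.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(action, r) for r in replicas]
        for fut in as_completed(futures):
            try:
                return fut.result()
            except SiteFailure:
                continue  # tolerate this site's failure; wait for the others
    raise RuntimeError(f"all {len(replicas)} execution threads failed")


def transfer(replica: str) -> str:
    if replica in ("site-a", "site-b"):
        raise SiteFailure(replica)  # simulate two crashed sites
    return f"committed at {replica}"


print(run_pet(transfer, ["site-a", "site-b", "site-c"]))  # committed at site-c
```

The action survives as long as at least one thread's site stays up. This loosely mirrors the abstract's claim that PETs tolerate site failures during execution.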
ISBN: 0818607378 (print)
An approach is presented for maintaining high availability in a replicated database system with a failure-prone communications network. The status of the network is assumed to change dynamically, making the detection of partitions infeasible. The approach is based on restricting the data items transactions can access and on special requirements placed on update propagation.
ISBN: 0818607378 (print)
A discussion is presented of access control in a heterogeneous distributed database management system (DDBMS) built by integrating existing DBMSs at the nodes of a network. The implementation of access control is often complicated by the requirement for site autonomy in a heterogeneous DDBMS: the local DBMS at each site retains control of the data stored at that site and decides for itself whether a user may access that data. The problems raised by this requirement are examined, and a solution is proposed.
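One way to read the site-autonomy requirement is that the global layer may ask but never decide. The Python sketch below makes two assumptions. First, each local DBMS exposes only a yes/no check over (user, table) and keeps its own policy private. Second, a global query is rejected if any participating site refuses. Both assumptions, and all names in the code, are illustrative; the abstract does not say which solution the authors propose.

```python
from typing import Callable, Dict, List, Set, Tuple

# A local DBMS answers "may this user read this table?" using its own rules.
LocalCheck = Callable[[str, str], bool]


def make_site(grants: Dict[str, Set[str]]) -> LocalCheck:
    """Build a hypothetical local DBMS whose ACL stays private to the site."""
    def check(user: str, table: str) -> bool:
        return table in grants.get(user, set())
    return check


def authorize(user: str, plan: Dict[str, List[str]],
              sites: Dict[str, LocalCheck]) -> Tuple[bool, List[str]]:
    """plan maps each site to the tables the global query touches there.

    Every site decides for itself; the global layer only collects the
    answers. The query runs only if no site refuses.
    """
    refused = [site for site, tables in plan.items()
               if not all(sites[site](user, t) for t in tables)]
    return (not refused, refused)


sites = {
    "paris": make_site({"alice": {"orders"}}),
    "tokyo": make_site({"alice": {"stock"}, "bob": {"stock"}}),
}
print(authorize("alice", {"paris": ["orders"], "tokyo": ["stock"]}, sites))
# (True, [])
print(authorize("bob", {"paris": ["orders"], "tokyo": ["stock"]}, sites))
# (False, ['paris'])
```

Rejecting the whole query on any refusal is only one possible design. A system could instead return partial results from the sites that agreed, at the cost of silently incomplete answers.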