ISBN: 0818605014 (print)
The use of logs to provide recovery from failures in transaction systems is well known. Checkpointing is also a familiar technique for speeding restart after failures. However, most work on logs and checkpointing has considered only centralized systems. This paper describes a logging, checkpointing, and restart mechanism for distributed systems. Moreover, nested transactions are used to enhance the performance and flexibility of the design. As a result, actions occurring at different sites can be significantly decoupled while avoiding any domino effect (cascading rollbacks). Further, the unreliability of one site has only a limited impact on performance elsewhere.
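As a rough illustration of the centralized baseline this abstract starts from (not the paper's distributed, nested-transaction mechanism), the Python sketch below keeps a redo log of committed transactions plus a checkpoint, so that restart reloads the checkpoint and replays only the log suffix written after it. The class `LoggedStore` and its methods are hypothetical, and "stable storage" is simulated in memory.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedStore:
    """Hypothetical single-site key-value store with a redo log and checkpoints."""

    state: dict = field(default_factory=dict)   # volatile, lost on crash
    log: list = field(default_factory=list)     # simulated stable redo log
    # Simulated stable checkpoint: (snapshot of state, log length at that time).
    checkpoint: tuple = field(default_factory=lambda: ({}, 0))

    def commit(self, txn_id, writes):
        # Write-ahead: append the redo record before applying the updates.
        self.log.append((txn_id, dict(writes)))
        self.state.update(writes)

    def take_checkpoint(self):
        self.checkpoint = (dict(self.state), len(self.log))

    def crash(self):
        # Only volatile state is lost; the log and checkpoint survive.
        self.state = {}

    def restart(self):
        """Reload the checkpoint, redo the log suffix, return records replayed."""
        snapshot, position = self.checkpoint
        self.state = dict(snapshot)
        suffix = self.log[position:]
        for _txn_id, writes in suffix:
            self.state.update(writes)
        return len(suffix)
```

For example, after committing T1, taking a checkpoint, committing T2 and crashing, `restart()` replays only T2's record. The checkpoint bounds restart work; a distributed design must additionally keep sites' checkpoints mutually consistent, which this sketch does not attempt.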
Author:
Segall, Zary (Carnegie-Mellon Univ)
Computer Science Dep, Pittsburgh, PA, USA
The ultimate test of the efficiency of mechanisms and policies employed to increase performance and/or reliability in a distributed system is the evaluation of measurements taken from the real system. Experimentation with multiprocessors is considered. The concept of an Integrated Instrumentation Environment (IIE) is introduced as a structured approach to facilitating experimentation. The design emphasizes the integration of instrumentation tools, such as stimulus generation and monitoring, into a unified experiment-management environment. An experiment schema is introduced as the structuring concept for experiment management, and schema instances capture the results of an experiment for later analysis.
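The abstract does not define the fields of an experiment schema, so the following is only a minimal Python sketch under assumed structure: a schema names the parameters that are varied (e.g. stimulus settings) and the metrics that are monitored, and each schema instance records one run's settings and captured measurements for later analysis. `ExperimentSchema`, `SchemaInstance`, and the example parameter and metric names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExperimentSchema:
    """Hypothetical schema: what an experiment varies and what it measures."""

    name: str
    parameters: tuple   # names of stimulus/configuration parameters
    metrics: tuple      # names of monitored quantities

    def instantiate(self, **settings):
        # Every parameter must be set, and nothing outside the schema.
        missing = set(self.parameters) - set(settings)
        extra = set(settings) - set(self.parameters)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)}, extra={sorted(extra)}")
        return SchemaInstance(self, dict(settings))


@dataclass
class SchemaInstance:
    """One run of a schema: its parameter settings plus captured results."""

    schema: ExperimentSchema
    settings: dict
    results: dict = field(default_factory=dict)   # metric -> list of samples

    def record(self, metric, value):
        if metric not in self.schema.metrics:
            raise KeyError(metric)
        self.results.setdefault(metric, []).append(value)
```

Keeping the schema immutable and separate from its instances means many runs can be compared against one fixed description of the experiment.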
The initial design of three modules of DDTS (Distributed Database Testbed System) is presented. DDTS emphasizes modularity and module independence so that it can be used to study experimentally the effects of different algorithms in each module. The DDTS architecture and transactions are considered, with special attention to the information architecture (IA) and the system architecture (SA).
ISBN: 0818605014 (print)
Two design rules that aid the construction of distributed computing systems and the provision of fault tolerance are described, namely that: (i) a distributed computing system should be functionally equivalent to the individual computing systems of which it is composed, and (ii) fault-tolerant systems should be constructed from generalized fault-tolerant components. The reasoning behind these two 'recursive structuring principles', and the consequences of attempting to adhere to them, are discussed. Where appropriate, this discussion is illustrated by reference to a UNIX-based distributed system that is now operational at Newcastle and several other locations. This system was implemented by adding a software subsystem, known as the Newcastle Connection, to each of a set of UNIX systems. In this way the authors have constructed a distributed system that is functionally equivalent, at both the user and the program level, to a conventional uniprocessor UNIX system.
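The first recursive structuring principle can be pictured as a composite whose interface is identical to that of its components. The Python sketch below is hypothetical and much simpler than the Newcastle Connection: `LocalSystem` and `ConnectedSystem` both offer only `read(path)`, and the `/../<host>/<rest>` path convention used to reach another system is an assumption made for illustration. Because a `ConnectedSystem` looks exactly like a `LocalSystem`, it can itself be a component of a larger one.

```python
class LocalSystem:
    """Hypothetical stand-in for one UNIX system's file store."""

    def __init__(self, files):
        self.files = dict(files)

    def read(self, path):
        return self.files[path]


class ConnectedSystem:
    """Composite offering the same read(path) interface as LocalSystem.

    Paths of the form '/../<host>/<rest>' (an assumed convention) are
    forwarded to the named component as '/<rest>'; all other paths are
    served by the local system.
    """

    PREFIX = "/../"

    def __init__(self, local, remotes):
        self.local = local
        self.remotes = dict(remotes)   # host name -> LocalSystem or ConnectedSystem

    def read(self, path):
        if path.startswith(self.PREFIX):
            host, _, rest = path[len(self.PREFIX):].partition("/")
            return self.remotes[host].read("/" + rest)
        return self.local.read(path)
```

A path such as `/../b/../c/x` read at system `a` is forwarded to `b`, which in turn forwards `/../c/x` to `c`; no component needs to know whether it is talking to a single machine or to a whole connected group.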
ISBN: 0818606908 (print)
A replicated database system is a distributed database system in which some data objects are stored redundantly at multiple sites to improve the reliability of the system. Without proper control mechanisms, the consistency of a replicated database system may be violated. A scheme that increases both reliability and the degree of concurrency is described. It allows transactions to operate on a data object if more than one token copy is available. The scheme also exploits the fact that, for recovery reasons, two values are kept for each data object. A proof that the proposed scheme guarantees consistency is provided, and some variations of the scheme are discussed.
ISBN: 0818606908 (print)
Before the heralded potential of distributed computer systems can be realized, such systems must be made robust in the face of processor failures. Reassigning the work of a failed processor so that system performance degrades gracefully is one of the most important problems in designing reliable distributed systems. The authors present an algorithm, based on a technique known as clustering, for reassigning the work of a failed processor that attempts to minimize the increase in cost caused by the redistribution. They also present a comprehensive cost function and discuss its applicability to 'real' systems.
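The abstract gives neither the clustering algorithm nor the cost function, so the Python sketch below substitutes a plain greedy heuristic with an assumed cost model; it is not the authors' method. Each task orphaned by the failure is moved to the surviving processor that minimizes the assumed incremental cost: the communication it would have with already-placed partner tasks on other processors, plus `load_weight` times the resulting load. The function `reassign` and its parameters are hypothetical.

```python
def reassign(assignment, exec_cost, comm, failed, load_weight=1.0):
    """Greedily move the failed processor's tasks onto surviving processors.

    assignment:  dict task -> processor
    exec_cost:   dict task -> load the task adds to a processor
    comm:        dict frozenset({t1, t2}) -> cost paid if t1 and t2 end up
                 on different processors (assumed cost model)
    Returns a new assignment; processors with no tasks are not considered.
    """
    new = dict(assignment)
    survivors = sorted({p for p in assignment.values() if p != failed})
    if not survivors:
        raise ValueError("no surviving processor")

    load = {p: 0.0 for p in survivors}
    for task, proc in assignment.items():
        if proc != failed:
            load[proc] += exec_cost[task]

    # Place the heaviest orphaned tasks first.
    orphans = sorted((t for t, p in assignment.items() if p == failed),
                     key=lambda t: -exec_cost[t])
    for task in orphans:
        def incremental_cost(proc):
            cut = 0.0
            for pair, cost in comm.items():
                if task in pair:
                    (partner,) = pair - {task}
                    # Ignore partners that are still orphaned (not yet placed).
                    if new[partner] != failed and new[partner] != proc:
                        cut += cost
            return cut + load_weight * (load[proc] + exec_cost[task])

        best = min(survivors, key=incremental_cost)
        new[task] = best
        load[best] += exec_cost[task]
    return new
```

The `load_weight` knob trades communication locality against load balance; a clustering approach, as in the paper, would instead group strongly communicating tasks before placing them.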
ISBN: 0818607378 (print)
Some of the techniques used to replicate objects for resilience in distributed operating systems are reviewed. The problems associated with the replication of objects are discussed, and a scheme of replicated actions and replicated objects, using a paradigm called PETs (parallel execution threads), is presented. The PET scheme not only exploits the high availability of replicated objects but also tolerates site failures that occur while an action is executing. It is shown how this scheme can be implemented in a distributed object-based system, with the Clouds operating system used as an example testbed.
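A minimal Python sketch of the idea of parallel execution threads, under stated assumptions: the same action is started concurrently at several replica sites, a site crash is simulated by raising the hypothetical `SiteFailure` exception, and the first thread to finish successfully is chosen to commit while the others are discarded. The real PET scheme in Clouds also has to ensure that exactly one thread commits across sites; that coordination is not modeled here, and `run_replicated_action` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class SiteFailure(Exception):
    """Raised by a (simulated) site that crashes while running an action."""


def run_replicated_action(action, sites):
    """Run action(site) in one thread per replica site.

    Returns (site, result) for the first thread that completes without a
    site failure; the action as a whole fails only if every thread fails.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = {pool.submit(action, site): site for site in sites}
        for future in as_completed(futures):
            try:
                result = future.result()
            except SiteFailure as exc:
                failures.append((futures[future], exc))
                continue
            # First successful thread wins; the rest are ignored.
            return futures[future], result
    raise RuntimeError(f"all {len(sites)} threads failed: {failures}")
```

Which successful site wins depends on thread scheduling, so callers should not rely on a particular replica being chosen.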
Task and file allocation are examined in two classes of fault-tolerant distributed systems. The task allocation problem arises in software-implemented fault tolerance (SIFT)-like systems, while the file allocation problem arises in Ethernet-like systems. Both problems may be formulated as a constrained sum-of-squares minimization problem. The computational complexity of these problems prompts us to consider an efficient approximation algorithm that does not always yield optimal answers. It is shown that the ratio of the approximate to the optimal solution is bounded by 9m/[8(m - r + 1)], where m is the number of processors (file servers) to be allocated and r is the number of times each task (file) is to be replicated. Experience with the algorithm suggests that even better performance ratios can be expected.
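The abstract does not state which approximation algorithm achieves the bound, so the Python sketch below is an assumed greedy stand-in rather than the paper's algorithm: tasks are taken in decreasing weight order, and the r replicas of each task go to the r currently least-loaded distinct processors out of m; the objective is the sum of squared processor loads. `ratio_bound` simply evaluates the quoted bound 9m/[8(m - r + 1)]. All function names are hypothetical.

```python
import heapq


def greedy_allocate(weights, m, r):
    """Place r replicas of each task on r distinct processors out of m.

    weights: dict task -> load that each replica adds to its processor.
    Returns (placement, loads), where placement maps task -> processor list.
    """
    if not 1 <= r <= m:
        raise ValueError("need 1 <= r <= m")
    loads = [0.0] * m
    placement = {}
    for task in sorted(weights, key=weights.get, reverse=True):
        # r least-loaded processors, ties broken by processor index.
        chosen = heapq.nsmallest(r, range(m), key=lambda p: (loads[p], p))
        for p in chosen:
            loads[p] += weights[task]
        placement[task] = sorted(chosen)
    return placement, loads


def sum_of_squares(loads):
    """Objective to minimize: sum of squared processor loads."""
    return sum(x * x for x in loads)


def ratio_bound(m, r):
    """Bound 9m / (8(m - r + 1)) on approximate/optimal quoted in the abstract."""
    return 9 * m / (8 * (m - r + 1))
```

For weights {a: 3, b: 2, c: 1} with m = 3 and r = 2, the greedy loads are [5, 4, 3] and the objective is 50. Evaluating the bound, r = 1 gives 9/8 for any m, and the bound grows as r approaches m.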
ISBN: 0818605014 (print)
This paper describes an object-oriented design model for structuring reliable distributed systems. A system is viewed as a collection of objects that are accessed and modified by transactions. Recovery techniques are incorporated to make transactions atomic in the presence of component crashes and concurrent operations. Transaction atomicity is based on constructing recoverable objects using multiple versions and commit protocols, and these concepts are extended to nested transactions. Operations on distributed objects are performed as remote procedure calls, which therefore must be implemented reliably. The facilities of reliable nested transactions and remote procedure calls are used to synthesize highly reliable distributed objects.
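To make "recoverable objects using multiple versions" concrete, here is a minimal Python sketch under assumed semantics, not the paper's design: the object keeps one committed version and a private tentative version per transaction; committing a subtransaction hands its version to its parent, only a top-level commit installs it, and aborting discards it. Locking and the distributed commit protocol are omitted, and `RecoverableObject` and its methods are hypothetical.

```python
class RecoverableObject:
    """Object with a committed version plus per-transaction tentative versions."""

    def __init__(self, value):
        self.committed = value
        self.tentative = {}   # transaction id -> tentative version

    def read(self, txn, ancestors=()):
        # Visible version: txn's own, else the nearest ancestor's, else committed.
        for t in (txn, *ancestors):
            if t in self.tentative:
                return self.tentative[t]
        return self.committed

    def write(self, txn, value):
        self.tentative[txn] = value

    def commit(self, txn, parent=None):
        if txn not in self.tentative:
            return                          # txn never wrote this object
        value = self.tentative.pop(txn)
        if parent is None:
            self.committed = value          # top-level commit installs the version
        else:
            self.tentative[parent] = value  # subtransaction: parent inherits it

    def abort(self, txn):
        # Discarding the tentative version leaves the committed one untouched.
        self.tentative.pop(txn, None)
```

Because the committed version is never overwritten until a top-level commit, a crash or abort at any earlier point leaves the object in its last committed state.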
Authors:
Lin, James J.; Liu, Ming T. (Ohio State Univ)
Dep of Computer & Information Science, Columbus, OH, USA
This paper presents the system design and performance evaluation of a local data network for very large distributed databases. The growing database problem creates a need for hardware support for data management in distributed systems. A novel hardware configuration, the Distributed Double-Loop Data Network (DDLDN), is presented. The concurrency control mechanisms and query processing techniques used in the DDLDN are described, and an optimal strategy for disk allocation is selected. A performance comparison of different types of systems under various conditions shows the superior performance of the DDLDN. Finally, a way to cope with potential growth of the system is demonstrated.