ISBN: 0818607378 (print)
A method for testing, debugging, and measuring distributed systems is described. The test method accompanies the implementation as well as the operation of distributed systems. During implementation, the test tools allow users to monitor and control the tested system at different problem-oriented levels. The immense amount of information is graphically displayed in easy-to-read charts and graphs. During operation, the test system permanently monitors system behavior and measures system performance. The author views performance and analysis measurements during operation as an integral part of the system. The test method and tools promote an improved understanding of run-time behavior and possibly of functional requirements of distributed systems. They provide performance measurements to derive qualitative and even quantitative assessments about distributed systems.
The authors examine the various kinds of distributed systems and discuss some of the reliability issues involved. They first concentrate on the causes of unreliability, illustrating these with some general solutions and examples. Among the issues treated are interprocess communication, machine crashes, server redundancy, and data integrity. Then they examine one distributed operating system, Amoeba, to see how reliability issues have been handled in at least one real system, and how the pieces fit together.
Some of the techniques used to replicate objects for resilience in distributed operating systems are reviewed. The problems associated with the replication of objects are discussed, and a scheme of replicated actions and replicated objects, using a paradigm called PETs (parallel execution threads), is presented. The PET scheme not only utilizes the high availability of replicated objects, but also tolerates site failures that happen while an action is executing. It is shown how this scheme can be implemented in a distributed object-based system, and the Clouds operating system is used as an example testbed.
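The general idea of running one action as several parallel threads against replicas, so that a site failure during execution does not abort the action, can be illustrated with a minimal Python sketch. This is not the PET protocol from the paper or anything from Clouds; the names `SiteFailure`, `run_replicated`, and the dictionary-based replicas are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class SiteFailure(Exception):
    """Hypothetical error raised when a replica's site fails mid-action."""


def run_replicated(action, replicas):
    """Run `action` on every replica in its own thread.

    Returns the result of the first replica (in list order) whose thread
    completes without a SiteFailure. Raises SiteFailure if all of them fail.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(action, replica) for replica in replicas]
        for future in futures:
            try:
                return future.result()
            except SiteFailure:
                continue  # this thread's site failed; try the next thread
    raise SiteFailure("every replica failed while the action was executing")


def read_balance(replica):
    """Illustrative action: fails if the replica's site is down."""
    if not replica["up"]:
        raise SiteFailure(replica["site"])
    return replica["balance"]


replicas = [
    {"site": "A", "up": False, "balance": 100},
    {"site": "B", "up": True, "balance": 100},
    {"site": "C", "up": True, "balance": 100},
]
result = run_replicated(read_balance, replicas)
```

Here the action still yields a result although site A fails. A real scheme would also have to choose one thread to commit and bring the other replicas up to date, which this sketch leaves out.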
An approach is presented for maintaining high availability in a replicated database system with a failure-prone communications network. The status of the network is assumed to change dynamically, making the detection of partitions infeasible. The approach is based on restricting the data items transactions can access and on special requirements placed on update propagation.
A discussion is presented of access control in a heterogeneous distributed database management system (DDBMS) built by integrating existing DBMSs at the nodes of a network. Often complicating the implementation of access control is the requirement in a heterogeneous DDBMS for site autonomy: the local DBMS at each site maintains control of the data stored at that site. Each local DBMS decides for itself if a user may access the data it manages. The problems raised by this issue are examined, and a solution is proposed.
In a large, parallel, real-time system, high, continuous levels of performance and reliability can be achieved only if the system's dynamics are taken into account. One solution is offered by construction of an adaptive system that can change its structure, both offline and during operation, to maintain reliable performance in response to arriving data on failures, request latencies, utilization, etc. Construction of such a system requires: (1) an explicit representation of its components, their interactions, and the allowable adaptations of both; and (2) algorithms and mechanisms that plan and carry out adaptations. A representation consisting of entities and relationships is presented. This representation describes the requirements of functionality, performance, and reliability imposed on the system, and the state information relevant for adaptations of a sample real-time application executing on a shared-memory multiprocessor. A distributed, dynamic adaptation algorithm for the assignment/scheduling of software components on processors is presented in order to demonstrate the feasibility of dynamic software adaptations.
One of the key concepts available in many object-based programming languages is that of type inheritance, which permits new object types to be refined out of existing object types. It is shown how this concept can be utilized to introduce recoverability into a system. A multilevel object-based recovery model is used that allows recoverable objects to be constructed out of recoverable and unrecoverable objects. Simple examples are used to illustrate the ideas and to demonstrate the suitability of the approach. These results are relevant to the development of distributed systems supporting atomic actions and recoverable objects.
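As a rough illustration of introducing recoverability through inheritance, the Python sketch below refines an ordinary type into a recoverable one by mixing in a base type that supplies checkpoint and rollback. It is not the paper's recovery model; the names `Recoverable`, `Counter`, `RecoverableCounter`, and `atomic` are hypothetical, and state saving by deep copy is an assumption made for brevity.

```python
import copy
from contextlib import contextmanager


class Recoverable:
    """Hypothetical base type: subtypes inherit checkpoint and rollback."""

    def checkpoint(self):
        # Save a deep copy of the current state, excluding the saved copy itself.
        state = {k: v for k, v in self.__dict__.items() if k != "_saved"}
        self._saved = copy.deepcopy(state)

    def rollback(self):
        # Restore the state captured by the most recent checkpoint.
        saved = self._saved
        self.__dict__.clear()
        self.__dict__.update(copy.deepcopy(saved))
        self._saved = saved


class Counter:
    """An ordinary, unrecoverable type."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class RecoverableCounter(Counter, Recoverable):
    """A recoverable type refined out of Counter by inheritance."""


@contextmanager
def atomic(*objects):
    """Run a block as an all-or-nothing action over recoverable objects."""
    for obj in objects:
        obj.checkpoint()
    try:
        yield
    except Exception:
        for obj in objects:
            obj.rollback()
        raise


counter = RecoverableCounter()
counter.increment()
try:
    with atomic(counter):
        counter.increment()
        raise RuntimeError("simulated failure inside the action")
except RuntimeError:
    pass
```

After the failed action, `counter.value` is back at the value it had when the action began, while `Counter` itself needed no changes.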
A system is presented for 'programming in the many' that adapts language-based editors for individual programmers to support the automatic checking of semantic interdependencies among modules as they are developed in parallel by multiple programmers on a collection of workstations distributed across a local area network. The focus is on the reliability of these distributed programming environments as some modules become inaccessible and later return to availability. The primary contributions are the decentralized control of the programming environment, firewalls, a mechanism that encapsulates individual modules to protect them from external failures, and a special network layer that enables the system to be highly available and reliable in the face of an unreliable network. The firewalls and network layer together support reestablishment of consistency among fully replicated data in the context of distributed programming environments.
A model and performance measures for a replicated data system that makes use of a quorum-consensus algorithm to maintain consistency are presented. Two measures are considered: the proportion of successfully completed transactions in systems where a transaction aborts if data is not available, and the mean response time in systems where a transaction waits until data becomes available. It is shown that for some quorum assignments, the performance may degrade if the degree of replication is increased beyond a certain limit. Optimal read and write quorums that maximize the proportion of successful transactions are derived.
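The first measure can be illustrated with a small Python sketch. It is not the paper's model: it assumes each of `n` replicas is available independently with probability `p`, one vote per replica, and the classic quorum conditions `r + w > n` and `2w > n`. The function names and the `read_fraction` parameter are hypothetical.

```python
from math import comb


def at_least(n: int, q: int, p: float) -> float:
    """Probability that at least q of n replicas are up (independent, each up with probability p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(q, n + 1))


def success_ratio(n: int, r: int, w: int, p: float, read_fraction: float) -> float:
    """Expected proportion of transactions that find a quorum and do not abort."""
    if r + w <= n or 2 * w <= n:
        raise ValueError("quorums must satisfy r + w > n and 2w > n")
    reads = read_fraction * at_least(n, r, p)
    writes = (1 - read_fraction) * at_least(n, w, p)
    return reads + writes


def best_quorums(n: int, p: float, read_fraction: float) -> tuple[int, int]:
    """Exhaustively search valid (r, w) pairs for the highest success ratio."""
    candidates = [
        (r, w)
        for w in range(n // 2 + 1, n + 1)
        for r in range(n - w + 1, n + 1)
    ]
    return max(candidates, key=lambda rw: success_ratio(n, rw[0], rw[1], p, read_fraction))


read_only = best_quorums(5, 0.9, read_fraction=1.0)
write_only = best_quorums(5, 0.9, read_fraction=0.0)
```

Under these assumptions a read-only workload favors a read quorum of one replica with a write quorum of all five, while a write-only workload favors the smallest majority write quorum. Mixed workloads fall in between, which is why the choice of quorums matters.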
A description is given of Bugnet, a portable Unix system designed to debug distributed programs. It gives the user information about interprocess communication, I/O events, and execution traces of each component process. Bugnet allows the programmer to detect an error situation, roll back to a time in the event sequence before the error, and replay events exactly as they occurred prior to the error. It reproduces real-time execution error sequences with an accuracy of 0.2 s. A graphics interface allows the user to manage process groups and monitor process interactions very conveniently.
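The roll-back-and-replay idea can be sketched in Python as a timestamped event log. This is only an illustration of the concept, not Bugnet's design or interface; `Event`, `EventLog`, and the `deliver` callback are hypothetical names, and process checkpointing is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since the start of the run
    process: str      # name of the component process
    kind: str         # e.g. "ipc" or "io"
    payload: str


class EventLog:
    """Records events during a run and replays a window of them in order."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def record(self, event: Event) -> None:
        self._events.append(event)

    def replay(self, start: float, error_time: float,
               deliver: Callable[[Event], None]) -> list[Event]:
        """Deliver, in timestamp order, the events from `start` up to but
        not including `error_time`, and return them."""
        ordered = sorted(self._events, key=lambda e: e.timestamp)
        window = [e for e in ordered if start <= e.timestamp < error_time]
        for event in window:
            deliver(event)
        return window


log = EventLog()
log.record(Event(0.0, "client", "ipc", "connect"))
log.record(Event(2.0, "server", "io", "write block"))
log.record(Event(1.0, "server", "ipc", "accept"))
log.record(Event(3.0, "client", "ipc", "malformed reply"))  # the error

seen: list[str] = []
replayed = log.replay(start=1.0, error_time=3.0,
                      deliver=lambda e: seen.append(e.payload))
```

Replaying from time 1.0 delivers the `accept` and `write block` events in their original order and stops just before the erroneous reply.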