ISBN: 0818607378 (print)
In a large, parallel, real-time system, high, continuous levels of performance and reliability can be achieved only if the system's dynamics are taken into account. One solution is to construct an adaptive system that can change its structure, both offline and during operation, to maintain reliable performance in response to arriving data on failures, request latencies, utilization, etc. Construction of such a system requires: (1) an explicit representation of its components, their interactions, and the allowable adaptations of both; and (2) algorithms and mechanisms that plan and carry out adaptations. A representation consisting of entities and relationships is presented. This representation describes the requirements of functionality, performance, and reliability imposed on the system, and the state information relevant for adaptations of a sample real-time application executing on a shared-memory multiprocessor. A distributed, dynamic adaptation algorithm for the assignment/scheduling of software components on processors is presented in order to demonstrate the feasibility of dynamic software adaptations.
ISBN: 0818605642 (print)
An approach to the development of fault-tolerant distributed software using a high-level concurrent language is discussed. The language constructs support mutual control and consensus about the decisions on the distributed system state. Emphasis is placed on process structuring, parallel activation, and termination control. It is shown that both forward and backward recovery can be expressed in the proposed language. Backward recovery is based on nested atomic actions.
ISBN: 0818681780 (print)
We present a technique that uses coverage measures in reliability estimation for fault-tolerant programs, particularly N-version software. This technique exploits both coverage and time measures, collected during the testing phases of the individual program versions and of the N-version software system, for reliability prediction. The application of this technique to single-version software was presented in our previous research. In this paper we extend the technique and apply it to N-version programs. The results obtained from an experiment conducted on an industrial project demonstrate that our technique significantly reduces the hazard of reliability overestimation for both single-version and multi-version fault-tolerant software systems.
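For readers unfamiliar with N-version software, the following is a minimal sketch of the structure the abstract refers to: several independently written versions run on the same input and a voter adjudicates their outputs. It does not implement the coverage-based reliability estimation itself. The names `majority_vote`, `run_n_version`, and the sample versions are hypothetical and chosen only for illustration.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of versions, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def run_n_version(versions, x):
    """Run every version on the same input and adjudicate the results."""
    return majority_vote([version(x) for version in versions])

# Three hypothetical versions of "square a number"; the third is faulty.
versions = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
```

With these sample versions, the faulty third version is outvoted whenever its output differs from the other two.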
ISBN: 0818605642 (print)
The authors introduce a new concept in naming for distributed systems, motivated by the need for easily migratable objects, which is in turn motivated by the need for reliable operations. This naming scheme attempts to solve certain problems associated with correcting remote references to objects that are subject to migration. A name is divided into two parts: the first is used to locate the possible current sites of a target object, and the second is used to select the specific site, and hence the object itself. The set of possible sites constitutes a cluster. The authors discuss how clusters and cluster-based naming can be applied to enhance the migratability, and thus reliability, of a system in their context. The same scheme may be applied for other purposes such as load-balancing, protection, and support of transparent services.
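The two-part name described above can be sketched roughly as follows. This is an illustrative model only, assuming a name is a `(cluster_id, object_id)` pair and that resolution simply searches the sites of the cluster; the `Site` class and the `resolve` and `migrate` functions are hypothetical and are not taken from the paper.

```python
class Site:
    """A hypothetical site holding objects keyed by the second name part."""
    def __init__(self, label):
        self.label = label
        self.objects = {}

def resolve(name, clusters):
    """First name part picks the cluster; second part picks the site within it."""
    cluster_id, object_id = name
    for site in clusters[cluster_id]:
        if object_id in site.objects:
            return site
    return None

def migrate(name, clusters, target):
    """Move an object inside its cluster; the name itself never changes."""
    cluster_id, object_id = name
    if target not in clusters[cluster_id]:
        raise ValueError("target site is outside the object's cluster")
    source = resolve(name, clusters)
    if source is None:
        raise KeyError(name)
    target.objects[object_id] = source.objects.pop(object_id)

a, b = Site("A"), Site("B")
clusters = {"c1": [a, b]}
a.objects["obj7"] = "payload"
name = ("c1", "obj7")
```

Because remote references hold only the name, a migration within the cluster leaves them valid: resolving the same name afterwards simply finds the new site.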
ISBN: 0818607378 (print)
One of the key concepts available in many object-based programming languages is that of type inheritance, which permits new object types to be refined out of existing object types. It is shown how this concept can be utilized to introduce recoverability into a system. A multilevel object-based recovery model is used that allows recoverable objects to be constructed out of recoverable and unrecoverable objects. Simple examples are used to illustrate the ideas and to demonstrate the suitability of the approach. These results are relevant to the development of distributed systems supporting atomic actions and recoverable objects.
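As a rough illustration of introducing recoverability through inheritance, the sketch below refines an ordinary (unrecoverable) type into a recoverable one by mixing in a base class that saves and restores state. The class names `Recoverable`, `PlainCounter`, and `RecoverableCounter`, and the checkpoint/restore interface, are assumptions made for this example and are not the paper's model.

```python
import copy

class Recoverable:
    """Hypothetical base type: subclasses inherit checkpoint and restore."""
    def checkpoint(self):
        # Save a deep copy of the object's state, excluding older checkpoints.
        state = {k: v for k, v in self.__dict__.items() if k != "_saved"}
        self._saved = copy.deepcopy(state)

    def restore(self):
        # Roll the object back to the most recent checkpoint.
        saved = self._saved
        self.__dict__.clear()
        self.__dict__.update(copy.deepcopy(saved))
        self._saved = saved

class PlainCounter:
    """An ordinary, unrecoverable object type."""
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

class RecoverableCounter(Recoverable, PlainCounter):
    """Recoverable type refined out of the unrecoverable one by inheritance."""
```

A caller would take a checkpoint before a tentative sequence of operations and call `restore()` if that sequence has to be undone.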
Some ideas on the construction of user applications as atomic actions are developed. Atomic actions that last a long time pose several problems if conventional ideas on concurrency control and recovery are applied. What is required is some means of delaying commitment without sacrificing performance. A model is proposed in which an action can release objects that are not yet committable, so that other actions can process them. The impact of this on recovery is also discussed.
ISBN: 0818605642 (print)
JASMIN is a functionally distributed database machine consisting of a three-level hierarchical architecture in which each level can be implemented by any number of real processors. Improving performance by the replication of data is discussed. The benefit is reduced access time resulting from increased availability of the data. The cost is the overhead of keeping copies mutually consistent. JASMIN is different in that it uses a distributed optimistic concurrency control method combined with a versioning scheme. The authors describe how they integrate replicated data into JASMIN's concurrency control, commit, and recovery subsystems.
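The general idea of optimistic concurrency control with versions can be sketched as below. This is a generic, single-node illustration and not JASMIN's actual protocol: a transaction records the versions it read, and at commit time it is validated against the current versions before its writes are installed. The `VersionedStore` class and its methods are hypothetical.

```python
class VersionedStore:
    """Hypothetical store mapping each key to a (value, version) pair."""
    def __init__(self):
        self.data = {}

    def read(self, key):
        # Unknown keys read as (None, 0).
        return self.data.get(key, (None, 0))

    def commit(self, read_versions, writes):
        """Validate the versions a transaction read, then install its writes."""
        for key, version in read_versions.items():
            if self.read(key)[1] != version:
                return False  # a conflicting commit happened; caller must retry
        for key, value in writes.items():
            self.data[key] = (value, self.read(key)[1] + 1)
        return True
```

No locks are held while a transaction runs; the cost of a conflict is paid only at validation, when the losing transaction is rejected and has to be rerun.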
Author: Minoura, Toshimi (Univ of Southern California, Dep of Electrical Engineering-Systems, Los Angeles, CA, USA)
Two schemes for realizing mutual exclusion in a distributed system are discussed. The ranking scheme is a common framework for the three mutual exclusion algorithms found in the literature. In the ranking scheme, the site with the smallest rank number can enter the mutual exclusion state. Another method of realizing mutual exclusion is to let a single 'control token' float in the system and to allow only its holder to enter the mutual exclusion state. One result in this paper shows how the control token is effectively transferred by the ranking scheme.
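The rule that the site with the smallest rank number may enter can be illustrated with the toy model below. It is a centralized simulation for clarity, not one of the distributed algorithms the paper covers: there is no message passing, and the `RankingMutex` class, its methods, and the way rank numbers are supplied are all assumptions.

```python
import heapq

class RankingMutex:
    """Toy model: among pending requests, the smallest rank holds the lock."""
    def __init__(self):
        self._pending = []  # min-heap of (rank, site) pairs

    def request(self, rank, site):
        heapq.heappush(self._pending, (rank, site))

    def holder(self):
        """Site currently allowed in the mutual exclusion state, if any."""
        return self._pending[0][1] if self._pending else None

    def release(self, site):
        if self.holder() != site:
            raise RuntimeError("only the current holder may release")
        heapq.heappop(self._pending)
```

When the holder releases, the right to enter passes to the site with the next smallest rank, which is loosely analogous to handing over a control token.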
Two-Phase Commit and other distributed commit protocols provide a method to commit changes while preserving consistency in a distributed database. These protocols can cope with various failures occurring in the system. In case of failure, however, they do not guarantee termination (of protocol processing) within a given time: sometimes the protocol requires waiting for a failed processor to be returned to operation. A straightforward use of timeouts in a distributed system turns out to be fraught with unexpected peril and does not provide an easy solution to the problem. Byzantine Agreement is combined with Two-Phase Commit, using observations of Lamport, to provide a method to cope with failure within a given time bound. An extra benefit of this combination of ideas is that it handles undetected and transient faults as well as the more usual system-down or processor-down faults handled by other distributed commit protocols.
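As background, plain Two-Phase Commit can be sketched as follows. This is a failure-free, in-process illustration only: it does not include timeouts, recovery, or the Byzantine Agreement extension the abstract describes, and the `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys in phase 2."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Phase 1: collect every vote. Phase 2: commit only if all voted yes."""
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

The blocking problem mentioned in the abstract arises in the gap this sketch omits: a prepared participant that loses contact with the coordinator cannot decide on its own.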
Intuition tells us that in a distributed DBMS using two-phase locking, the ratio (denoted by R/W) of read-only to update transactions affects system performance - the higher the ratio, the better the performance. Read-only transactions request only share locks, and thus should cause fewer conflicts and deadlocks among all transactions. Therefore both read-only and update transactions are expected to perform better if R/W is higher. The results of a study contradicting this intuition are reported, and the relationship between the R/W ratio and system performance is examined in detail.
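The intuition in the abstract rests on lock compatibility: share locks coexist, while an exclusive lock conflicts with everything held by another transaction. The sketch below shows only that compatibility rule, assuming modes `"S"` (share) and `"X"` (exclusive); the `LockTable` class is hypothetical, and the sketch models neither waiting, deadlock detection, nor the distributed setting studied in the paper.

```python
class LockTable:
    """Hypothetical lock table mapping each item to (mode, set of holders)."""
    def __init__(self):
        self.locks = {}

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False on a conflict."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)  # share locks are mutually compatible
            return True
        if holders == {txn}:
            if mode == "X":
                self.locks[item] = ("X", holders)  # sole holder may upgrade
            return True
        return False

    def release_all(self, txn):
        """Shrinking phase: drop every lock the transaction holds."""
        for item in list(self.locks):
            _, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]
```

Two readers of the same item are both granted, whereas a writer is refused until every reader has released.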