A recursive algorithm for computing a lower bound on the all-terminal reliability of an n-dimensional hypercube is presented. The recursive step decomposes an n-dimensional hypercube into lower dimension hypercubes th...
详细信息
A recursive algorithm for computing a lower bound on the all-terminal reliability of an n-dimensional hypercube is presented. The recursive step decomposes an n-dimensional hypercube into lower dimension hypercubes that are linked together. As an illustration of the effectiveness and power of this method, a lower bound is computed on the all-terminal reliability of the 16-dimensional hypercube (Connection Machine architecture) whose links number 219. The notation and assumptions are defined, and background information on bounding the reliability polynomial is provided. Methods for tightening these bounds for the analysis of the hypercube architecture are discussed.
The following topics are dealt with: distributed operating systems.local area networks;network fault tolerance;hypercubes;distributeddatabases;real-time systems.replicated programs;computer architectures;and voting. ...
详细信息
The following topics are dealt with: distributed operating systems.local area networks;network fault tolerance;hypercubes;distributeddatabases;real-time systems.replicated programs;computer architectures;and voting. Abstracts of individual papers can be found under the relevant classification codes in this or other issues.
Replicated execution of distributed programs, which provides a means of masking hardware (processor) failures in a distributed system, is discussed. Application-level entities (processes, objects) are replicated to ex...
详细信息
Replicated execution of distributed programs, which provides a means of masking hardware (processor) failures in a distributed system, is discussed. Application-level entities (processes, objects) are replicated to execute on distinct processors. Such replica entities communicate by message passing. Nondeterminism within the replicas could cause messages to be processed in nonidentical order, producing a divergence of state. Possible sources of nondeterminism are identified, and a generic mechanism for ensuring that nonfaulty replicas process messages in identical order, thereby preventing state divergence among such replicate entities, is presented.
A knowledge-based approach for query processing during network partitioning is proposed. The approach uses available domain and summary knowledge to infer inaccessible data to answer a given query. A rule induction te...
详细信息
A knowledge-based approach for query processing during network partitioning is proposed. The approach uses available domain and summary knowledge to infer inaccessible data to answer a given query. A rule induction technique is used to extract correlated knowledge between attributes from the database contents. This knowledge is represented as rules for data inference. On the basis of a set of queries, simulation is used to evaluate the effectiveness of the proposed data inference technique for improving data availability under network partitioning. Object allocation has a significant impact on data availability. Allocating objects that increase remote redundancy and reduce local redundancy increases data availability during network partitioning. A prototype distributeddatabase system that uses the proposed inference technique with correlated knowledge from a ship database has been implemented. Experience indicates that the proposed inference technique can significantly improve the availability of a distributeddatabase during network partitioning.
The problem of voting is studied for both the exact and inexact cases. Optimal solutions based on explicit computation of condition probabilities are given. The most commonly used strategies, i.e., majority, median, a...
详细信息
The problem of voting is studied for both the exact and inexact cases. Optimal solutions based on explicit computation of condition probabilities are given. The most commonly used strategies, i.e., majority, median, and plurality are compared quantitatively. The results show that plurality voting is the most powerful of these techniques and is, in fact, optimal for a certain class of probability distributions. An efficient method of implementing a generalized plurality voter when nonfaulty processes can produce differing answers is also given.
A fault-tolerant mutual exclusion algorithm for distributedsystems.is presented. The algorithm uses a distributed queue strategy and maintains alternative paths at each site to provide a high degree of fault toleranc...
详细信息
A fault-tolerant mutual exclusion algorithm for distributedsystems.is presented. The algorithm uses a distributed queue strategy and maintains alternative paths at each site to provide a high degree of fault tolerance. However, owing to these alternative paths, the algorithm must use reverse messages to avoid the occurrence of directed cycles, which may form when the direction of edges is reversed after the token passes through. If there is no alternative path, the total number of the messages exchanged is O(2 × log N) in light traffic and two messages in heavy traffic;however, in this case the system cannot tolerate even a single communication link or site failure. If there are alternative paths between sites, the system can achieve a higher degree of fault tolerance at the expense of increased message traffic (owing to reverse messages). Thus, there is a tradeoff between efficiency and reliability, and a system can be designed to balance these two criteria properly. A recovery procedure for restoring a recovering site consistently into the system is also presented.
ROSE, a modular distributed operating system that provides support for building reliable applications, is designed and implemented. Failure detection capabilities are provided by a failure detection server. Configurat...
详细信息
ROSE, a modular distributed operating system that provides support for building reliable applications, is designed and implemented. Failure detection capabilities are provided by a failure detection server. Configuration objects can be used to capture the relationship among multiple processes that cooperate to replicate certain resources. Replicated address space (RAS) objects, whose content is accessible with a high probability despite hardware failures, can be used to increase data availability. Finally, a resistant process (RP) abstraction allows user processes to survive hardware failures with minimal interruption. Two different implementations of RP are provided: one checkpoints the information about its state in an RAS object periodically;the other uses replicated execution by executing the same code in different nodes at the same time.
The authors present an enhancement to distributed file systems.that allows the users of the system to keep local copies of important files, decreasing the dependency over file servers. Using the notions of stashing an...
详细信息
The authors present an enhancement to distributed file systems.that allows the users of the system to keep local copies of important files, decreasing the dependency over file servers. Using the notions of stashing and quasi-copies, the system allows users to tune up the quality of the service they want to receive when the file server is not reachable. One of the key points of this work is the focus on the tradeoff between availability and degradation of service. The other main contribution is the design of a distributed file system which is ideally suited to very large distributedsystems. in that it provides users with greater tolerance of network partitions and server failures. It is emphasized that the use of stashing does not preclude the use of other performance-enhancing or fault-tolerant techniques. The file system architecture has been implemented and FACE, a prototype of a file system service based on Sun's NFS, is described. Performance figures are reported. These figures show that the overhead of providing the service is negligible. Current plans also call for porting the FACE design to a number of other processors.
A validated fault-tolerant distributed computer system architecture for real-time aerospace applications, the advanced information processing system (AIPS), is discussed. AIPS consists of fault-tolerant data processor...
详细信息
ISBN:
(纸本)0818620080
A validated fault-tolerant distributed computer system architecture for real-time aerospace applications, the advanced information processing system (AIPS), is discussed. AIPS consists of fault-tolerant data processors, fault-tolerant networks which interconnect processors to each other and to sensors, actuators, displays and subsystems. and a set of system services embodied in the system software. A performability database is one part of the total knowledgebase that will aid in the validation of the building blocks. A performability knowledgebase will also enable the system designer to configure the building blocks to meet specific mission requirements with a high degree of confidence. The performability knowledgebase consists of analytical and empirical relationships between major domains: performance, reliability, and architectural parameters. A set of performance and reliability metrics that characterize real-time distributed fault-tolerant architectures is described. The performability data obtained empirically from an engineering model of AIPS are also presented.
Traditional approaches to reliability and performance analysis become intractable when dealing with complex parallel and distributed processing systems. computer networks, and software for such systems. New approaches...
详细信息
Traditional approaches to reliability and performance analysis become intractable when dealing with complex parallel and distributed processing systems. computer networks, and software for such systems. New approaches based on Petri nets, dataflow graphs, simulations and approximations are now used in such cases. In order to extend the utility of Petri nets and dataflow graphs, the authors present a decomposition technique that can be used to partition a large system into smaller subsystems. where performance indexes of the total system can be obtained (at least approximately) from the subsystem analyses. The decomposition reduces the computational complexity of analysis significantly. The approach (using marked graph components) is similar to the concept of 'near-completely decomposable' stochastic processes.< >
暂无评论