The problem of allocation and release of subcubes from a hypercube with node failures is addressed. Two algorithms are presented, both based on the Buddy allocation scheme for memory management which is also used by t...
ISBN: 0897912780 (print)
This paper describes an embedding of Triple Modular Redundancy (TMR) into a binary hypercube. The goal is to improve fault tolerance by masking any single-point fault. Each module of an application task is triplicated and executed in parallel on three nodes of a 2-dimensional subcube (Q2) of the hypercube. Each of these nodes also executes a voter process. The remaining node is used for message passing only. All outputs from the triplicated modules are voted on, and the voting results are transmitted to the appropriate destination. Thus, all interunit messages are also triplicated. We propose an embedding of TMR into a hypercube which can be implemented in a manner transparent to the application program. Subcubes are allocated so that the address space for the TMR units is also a hypercube. Hence, the subcube allocation and intermodule communication schemes are defined to be analogous to the schemes used in the nonredundant system. The embedded system is proven to mask all single-point faults.
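The voting step described above can be illustrated with a minimal sketch. It assumes each of the three replicas reports a hashable output value; the function name `majority_vote` and the convention of returning `None` when all three disagree are illustrative choices, not details taken from the paper.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value reported by at least two of three replicas.

    Returns None when all three replicas disagree (hypothetical convention;
    the paper only considers single-point faults, where this cannot happen).
    """
    if len(outputs) != 3:
        raise ValueError("TMR voting expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None

# One faulty replica is masked by the two agreeing ones.
result = majority_vote([42, 42, 17])
```

With a single faulty replica, two outputs still agree, so the voted result equals the fault-free value; this is the masking property the abstract refers to.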
ISBN: 0897912780 (print)
A connected hypercube containing faulty components (nodes or links) is called an injured hypercube. To enable non-faulty nodes to communicate with each other in an injured hypercube, the information of component failures must be made available to those non-faulty nodes for them to route messages around the faulty components. We develop a fault-tolerant routing scheme which requires each node to know only the information on the failure of its own links. Performance of this scheme is rigorously analyzed. This scheme is not only shown to be capable of routing messages successfully in injured hypercubes when the number of component failures is less than n, but also proved to be able to choose a shortest path with a very high probability.
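To make the idea of routing with purely local failure knowledge concrete, here is a simplified sketch. It is not the scheme analyzed in the paper: it is a plain depth-first search in which each hop inspects only the status of the current node's own links, prefers dimensions that reduce the distance to the destination, and backtracks when stuck. The function name `route`, the `faulty_links` representation, and the assumption that the message carries its list of visited nodes are all hypothetical.

```python
def route(src, dst, n, faulty_links):
    """Route from src to dst in an n-cube using only local link status.

    faulty_links is a set of frozenset({u, v}) pairs (assumed representation).
    Returns the list of nodes on the path found, or None if dst is unreachable.
    """
    path = [src]          # current path, carried with the message
    visited = {src}       # nodes already tried, carried with the message
    while path:
        u = path[-1]
        if u == dst:
            return path
        diff = u ^ dst
        # Dimensions where u and dst differ come first (they shorten the path);
        # the remaining "spare" dimensions are used only as detours.
        dims = sorted(range(n), key=lambda d: not (diff >> d) & 1)
        for d in dims:
            v = u ^ (1 << d)
            # Only the status of u's own link (u, v) is consulted here.
            if frozenset((u, v)) not in faulty_links and v not in visited:
                visited.add(v)
                path.append(v)
                break
        else:
            path.pop()    # every usable link is exhausted: backtrack
    return None

# 3-cube with one faulty link between nodes 000 and 001.
faults = {frozenset((0, 1))}
found = route(0, 7, 3, faults)
```

In this example the number of faults (1) is below n = 3, and the search still returns a path of the minimum length, three hops, by leaving node 0 along a different preferred dimension.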
It is shown how to determine closed-form expressions for task scheduling delay and active task time distributions for any real-time system application, given a scheduling policy and task execution time distributions. The active task time denotes the total time a task is executing or waiting to be executed, including scheduling delays and resource contention delays. The distributions are used to determine the probability of dynamic failure and processor utilization, where the probability of dynamic failure is the probability that any task will not complete before its deadline. The opposing effects of decreasing the probability of dynamic failure and increasing utilization are also addressed. The analysis first addresses workloads where all tasks are periodic, i.e., they are repetitively triggered at constant frequencies. It is then extended to include the arrival of asynchronously triggered tasks. The effects of asynchronous tasks on the probability of dynamic failure and utilization are addressed.
An approach to checkpointing and rollback recovery in a distributed computing system using a common time base is proposed. First, a common time base is established in the system using a hardware clock synchronization algorithm. This common time base is coupled with a pseudorecovery block approach to develop a checkpointing algorithm that has the following advantages: (i) maximum process autonomy, (ii) no wait for commitment for establishing recovery lines, (iii) fewer messages to be exchanged, and (iv) less memory requirement.
The reliability of a real-time digital control system depends not only on the reliability of the hardware and software used, but also on the speed in executing control algorithms. The latter is due to the negative effects of computing time delay on control system performance. For a given sampling interval, the effects of computing time delay are classified into the delay problem and the loss problem. Analysis of these two problems is presented as a means of evaluating real-time control systems. As an example, both the self-tuning predicted (STP) control and Proportional-Integral-Derivative (PID) control are applied to the problem of tracking robot trajectories, and their respective effects of computing time delay on control performance are comparatively evaluated. For this example, the STP (PID) controller is shown to outperform the PID (STP) controller in coping with the delay (loss) problem.
We propose a hardware unification array consisting of k × n four-connected unification units to be used to speed up the process of finding suitable bindings for common variables among the predicates in a logic pro...
The use of multiple buses can improve both the fault tolerance and performance of local area computer networks. Existing schemes either depend on active components for full connectivity or can experience decreased performance as many hosts attempt to access one bus. An architecture class based on balanced incomplete block designs (BIBDs) is proposed to address these problems. A BIBD architecture uses redundant communication channels and exhibits degradable performance as faults occur. The performability of such networks is evaluated, where evaluation is based on stochastic activity network models. The results obtained are provided for comparison of BIBD network performability with that of conventional multibus networks.
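As a small illustration of the combinatorial structure involved, the sketch below builds the (7, 3, 1) design (the Fano plane) from the difference set {0, 1, 3} modulo 7 and checks its defining property: every pair of points lies in exactly one block. Reading points as hosts and blocks as buses is an assumption made here for illustration; the abstract does not say how the proposed architecture maps the design onto hardware. The helper names `cyclic_design` and `pair_counts` are hypothetical.

```python
from itertools import combinations

def cyclic_design(v, base):
    """Develop a base block modulo v into v blocks (cyclic construction)."""
    return [frozenset((x + shift) % v for x in base) for shift in range(v)]

def pair_counts(blocks, v):
    """Count, for every pair of points, how many blocks contain both."""
    return {
        pair: sum(1 for block in blocks if set(pair) <= block)
        for pair in combinations(range(v), 2)
    }

# (7, 3, 1) design: 7 points, blocks of size 3, each pair in exactly 1 block.
blocks = cyclic_design(7, (0, 1, 3))
counts = pair_counts(blocks, 7)
balanced = set(counts.values()) == {1}
```

Under the hosts-and-buses reading, the balance property means any two hosts share exactly one bus, and each host sits on three buses, so no single bus carries all traffic.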
The authors consider a general, analytic approach to the study of workload effects on computer system dependability, where the faults considered are transient and the dependability measure in question is the time to failure, T_f. Under these conditions, workload plays two roles with opposing effects: it can help detect/correct a correctable fault, or it can cause the system to fail by activating an uncorrectable fault. As a consequence, the overall influence of workload on T_f is difficult to evaluate intuitively. To examine this in more formal terms, the authors establish a Markov renewal process model that represents the interaction among workload and fault accumulation in systems for which fault tolerance can be characterized by fault margins. Using this model, they consider some specific examples and show how the probabilistic nature of T_f can be formulated directly in terms of parameters regarding workload, fault arrivals, and fault margins.
Three factors determine the optimum configuration of a multiprocessor at any epoch: the workload, the reward structure, and the state of the computer system. An algorithm is presented for the optimal (more realistically, quasi-optimal) configuration of such systems used in real-time applications with periodic reward rates and workloads. The algorithm is based on Markov decision theory. It is suggested that a change in the workload or the reward structure should be as powerful a motivation for reconfiguration as component failure. Such changes occur naturally over the course of operation: an example of an online transaction processing system with a workload and reward structure that has a period of a day is given.