Allocation and deallocation of subcubes usually result in a fragmented hypercube in which, even if a sufficient number of hypercube nodes are available, they do not form a subcube large enough to execute an incoming task. Just as fragmentation in conventional memory allocation can be handled by memory compaction, the fragmentation problem in a hypercube can be solved by task migration, i.e., relocating tasks within the hypercube to remove the fragmentation. The procedure for task migration depends closely on the subcube allocation strategy used, since active tasks must be relocated in such a way that the availability of subcubes can be detected by that allocation strategy. In this paper, we develop a task migration strategy for the subcube allocation policy based on the binary reflected Gray code. A goal configuration (of destination subcubes) without fragmentation is determined first. Then, the node-mapping between the source and destination subcubes is derived. Finally, a routing procedure to achieve shortest deadlock-free paths for relocating tasks is developed.
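As background for the Gray-code-based allocation policy mentioned above, the following is a minimal Python sketch of the binary reflected Gray code and of a subcube check. It is an illustration only, not the paper's migration procedure. The function names (`gray`, `gray_sequence`, `is_subcube`, `gc_window`) are hypothetical, and the windowing rule in `gc_window` (a run of 2^k consecutive codewords starting at a multiple of 2^(k-1), with wraparound) is an assumption about how such a policy enumerates candidate subcubes.

```python
def gray(i: int) -> int:
    """Return the i-th codeword of the binary reflected Gray code."""
    return i ^ (i >> 1)


def gray_sequence(n: int) -> list:
    """Return all 2**n codewords of the n-bit Gray code, in order."""
    return [gray(i) for i in range(2 ** n)]


def is_subcube(nodes: list, k: int) -> bool:
    """True if `nodes` are exactly the 2**k addresses of a k-dimensional subcube.

    The nodes must be distinct, and the bit positions in which they differ
    from one another must number exactly k.
    """
    if len(set(nodes)) != 2 ** k:
        return False
    varying = 0
    for address in nodes:
        varying |= address ^ nodes[0]
    return bin(varying).count("1") == k


def gc_window(n: int, k: int, m: int) -> list:
    """Assumed candidate region: 2**k consecutive Gray codewords starting at
    index m * 2**(k-1), wrapping around the end of the sequence."""
    seq = gray_sequence(n)
    start = m * 2 ** (k - 1)
    return [seq[(start + j) % len(seq)] for j in range(2 ** k)]


if __name__ == "__main__":
    print(gray_sequence(3))
    print(gc_window(3, 2, 1), is_subcube(gc_window(3, 2, 1), 2))
```

Under this assumed windowing rule, a free subcube can be found by scanning a one-dimensional allocation list, which is why relocated tasks must end up in positions that such a scan can recognize.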
A methodology is presented for evaluating fault-tolerant systems when workloads and fault arrivals are not time-homogeneous. Of particular interest are systems whose environments vary considerably between different utilization phases of random duration. In such cases, evaluations of overall system performability must account for the corresponding differences in workload effects, especially with regard to fault recovery. The proposed methodology uses analytic techniques based on Markov processes and stochastic activity networks. Examples of evaluation studies using this approach are presented. These include evaluation of a system wherein self-exercising is varied between phases of passive and active use.
This paper documents an experiment performed by The Johns Hopkins University Applied Physics Laboratory to measure the effect of inserting a data bus into a combat system. The experiment was conducted at the Aegis Computer Center located at the Naval Surface Weapons Center in Dahlgren, Virginia (NSWC/DL). The purpose of the experiment was to determine whether the Aegis Weapon System (the core of the Aegis Combat System) could be operated with a portion of its point-to-point interelement cables replaced by a data bus. The data bus chosen for the experiment employs message broadcasting with receiver selection. A primary goal of the experiment was to minimize the Aegis computer program changes required to accommodate the data bus. The results presented in this paper show that the experiment was a success. Key certification tests were passed with no computer program changes to the tactical elements and minimal changes in the Aegis tactical executive (ATES) program (fewer than 110 words changed).
The problem of allocation and release of subcubes from a hypercube with node failures is addressed. Two algorithms are presented, both based on the Buddy allocation scheme for memory management which is also used by t...
ISBN: (print) 0897912780
This paper describes an embedding of Triple Modular Redundancy (TMR) into a binary hypercube. The goal is to improve fault tolerance by masking any single-point fault. Each module of an application task is triplicated and executed in parallel on three nodes of a 2-dimensional subcube (Q2) of the hypercube. Each of these nodes also executes a voter process. The remaining node is used for message passing only. All outputs from the triplicated modules are voted on, and the voting results are transmitted to the appropriate destination. Thus, all interunit messages are also triplicated. We propose an embedding of TMR into a hypercube which can be implemented in a manner transparent to the application program. Subcubes are allocated so that the address space for the TMR units is also a hypercube. Hence, the subcube allocation and intermodule communication schemes are defined to be analogous to the schemes used in the nonredundant system. The embedded system is proven to mask all single-point faults.
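The voting step described above can be illustrated with a minimal Python sketch of a two-out-of-three majority voter. This is a generic TMR voter, not the paper's implementation. The name `majority_vote` and the convention of returning `None` when no two outputs agree are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by at least two of the three replicas.

    Returns None when all three outputs disagree, i.e. when more than one
    replica is faulty and the fault cannot be masked.
    """
    if len(outputs) != 3:
        raise ValueError("a TMR voter expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


if __name__ == "__main__":
    # One replica returns a corrupted result; the voter masks it.
    print(majority_vote([42, 42, 17]))
```

A single faulty replica is outvoted by the two correct ones, which is the sense in which a single-point fault is masked.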
ISBN: (print) 0897912780
A connected hypercube containing faulty components (nodes or links) is called an injured hypercube. To enable non-faulty nodes to communicate with each other in an injured hypercube, information about component failures must be made available to those nodes so that they can route messages around the faulty components. We develop a fault-tolerant routing scheme which requires each node to know only about the failure of its own links. The performance of this scheme is rigorously analyzed. The scheme is not only shown to be capable of routing messages successfully in injured hypercubes when the number of component failures is less than n, but is also proved to choose a shortest path with very high probability.
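To make the idea of routing with purely local failure knowledge concrete, here is a minimal Python sketch. It is not the scheme analyzed in the paper: it substitutes a simple depth-first search in which the message carries the set of visited nodes, and each hop consults only the status of the current node's own links. The function name `route`, the representation of faulty links as a set of `frozenset({u, v})` pairs, and the restriction to link faults (node faults are not modeled) are all assumptions made for illustration.

```python
from typing import Optional


def route(src: int, dst: int, n: int, faulty_links: set) -> Optional[list]:
    """Route a message from src to dst in an n-dimensional hypercube.

    Returns the list of nodes the message passes through (including any
    backtracking hops), or None if dst cannot be reached.
    """

    def link_ok(u: int, d: int) -> bool:
        # Local knowledge only: is u's own link along dimension d healthy?
        return frozenset((u, u ^ (1 << d))) not in faulty_links

    visited = {src}   # carried with the message to avoid revisiting nodes
    stack = [src]     # current depth-first path
    hops = [src]      # every node the message actually traverses
    while stack:
        cur = stack[-1]
        if cur == dst:
            return hops
        diff = cur ^ dst
        # Try dimensions that bring the message closer to dst first,
        # then the remaining (detour) dimensions.
        dims = sorted(range(n), key=lambda d: ((diff >> d) & 1) == 0)
        for d in dims:
            nxt = cur ^ (1 << d)
            if nxt not in visited and link_ok(cur, d):
                visited.add(nxt)
                stack.append(nxt)
                hops.append(nxt)
                break
        else:
            # Dead end: step back to the previous node on the path.
            stack.pop()
            if stack:
                hops.append(stack[-1])
    return None


if __name__ == "__main__":
    faults = {frozenset((0, 1))}
    print(route(0, 7, 3, faults))
```

When no faults are encountered, the message follows a shortest path because only distance-reducing dimensions are used. Detours and backtracking occur only when every such local link is faulty or leads to an already visited node.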
It is shown how to determine closed-form expressions for task scheduling delay and active task time distributions for any real-time system application, given a scheduling policy and task execution time distributions. The active task time denotes the total time a task is executing or waiting to be executed, including scheduling delays and resource contention delays. The distributions are used to determine the probability of dynamic failure and processor utilization, where the probability of dynamic failure is the probability that any task will not complete before its deadline. The opposing effects of decreasing the probability of dynamic failure and increasing utilization are also addressed. The analysis first addresses workloads where all tasks are periodic, i.e., they are repetitively triggered at constant frequencies. It is then extended to include the arrival of asynchronously triggered tasks. The effects of asynchronous tasks on the probability of dynamic failure and utilization are addressed.
An approach to checkpointing and rollback recovery in a distributed computing system using a common time base is proposed. First, a common time base is established in the system using a hardware clock synchronization algorithm. This common time base is coupled with a pseudorecovery block approach to develop a checkpointing algorithm that has the following advantages: (i) maximum process autonomy, (ii) no wait for commitment for establishing recovery lines, (iii) fewer messages to be exchanged, and (iv) less memory requirement.
The reliability of a real-time digital control system depends not only on the reliability of the hardware and software used, but also on the speed of executing control algorithms. The latter is due to the negative effects of computing-time delay on control system performance. For a given sampling interval, the effects of computing-time delay are classified into the delay problem and the loss problem. Analysis of these two problems is presented as a means of evaluating real-time control systems. As an example, both self-tuning predicted (STP) control and Proportional-Integral-Derivative (PID) control are applied to the problem of tracking robot trajectories, and the effects of computing-time delay on their control performance are comparatively evaluated. For this example, the STP controller is shown to outperform the PID controller in coping with the delay problem, while the PID controller outperforms the STP controller in coping with the loss problem.
We propose a hardware unification array consisting of k × n four-connected unification units to be used to speed up the process of finding suitable bindings for common variables among the predicates in a logic pro...