In this paper, we apply the load sharing (LS) mechanism proposed in [1, 2] to HARTS, an experimental distributed real-time system [3] currently being built at the Real-Time Computing Laboratory of the University of Mic...
The authors address the problem of designing and incorporating a timeout mechanism into load sharing (LS) with state-region change broadcasts in the presence of node failures in a distributed real-time system. They formulate the problem of determining the best timeout period $T_{\mathrm{out}}^{i}$ for node $i$ as a hypothesis testing problem, and maximize the probability of detecting node failures subject to a prespecified probability of falsely diagnosing a healthy node as faulty. They outline the LS algorithm and the proposed timeout mechanism and establish a theoretical basis for calculating the optimal $T_{\mathrm{out}}^{i}$. Simulation results show that the LS algorithm, which combines online parameter estimation, the timeout mechanism, and a few extra, timely broadcasts, can significantly reduce the probability of missing task deadlines.
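To make the hypothesis-testing formulation concrete, the sketch below picks a timeout so that a healthy node is falsely declared faulty with probability at most alpha. It assumes, purely for illustration, that gaps between broadcasts from a healthy node are exponentially distributed with a rate estimated online from observed gaps; the paper's actual model and estimator may differ. Under that assumption, a silence longer than T occurs for a healthy node with probability e^{-λT}; a shorter timeout detects failures sooner but raises false alarms, so the best choice is the smallest T with e^{-λT} ≤ alpha. The names `estimate_rate` and `timeout_for` are hypothetical.

```python
import math
from typing import Sequence


def estimate_rate(gaps: Sequence[float]) -> float:
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    if not gaps:
        raise ValueError("need at least one observed gap")
    return len(gaps) / sum(gaps)


def timeout_for(rate: float, alpha: float) -> float:
    """Smallest timeout T with P(gap > T | node healthy) <= alpha.

    Assumes exponential gaps, so P(gap > T) = exp(-rate * T).
    Solving exp(-rate * T) = alpha gives T = ln(1 / alpha) / rate.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return math.log(1.0 / alpha) / rate


# Hypothetical gaps (seconds) between broadcasts observed from node i.
observed = [0.8, 1.1, 0.9, 1.2, 1.0]
t_out = timeout_for(estimate_rate(observed), alpha=0.01)
print(f"T_out for node i: {t_out:.3f} s")
```

Re-running `estimate_rate` as new gaps arrive gives a crude form of the online parameter estimation the abstract mentions.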
The optimal amount of time used for retrying an instruction on detection of an error in a computing system is usually determined under the assumption that the system is composed of a single module, within which all fault activities are confined until some module-replacement action is taken. The authors consider fault activities in multiple-module systems. They first relax the single-module assumption and model the fault activities in a multiple-module system as a Markov process. The randomization approach is applied to decompose the Markov process into a discrete-time Markov chain subordinated to a Poisson process. Using this decomposition, several useful measures can be derived, such as the conditional probability of successful retry given a retry period and given that a non-permanent fault has occurred, and the mean time until all modules in the system enter a fault-free state. These measures, together with the parameters characterizing fault activities and the costs of recovery techniques, are used to determine whether retry should be used as a first-step recovery means on detection of an error and, if so, the best retry period subject to a specified probability of successful retry.
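The randomization (uniformization) step can be sketched as follows: a continuous-time generator Q is split into a discrete-time transition matrix P = I + Q/Λ and a Poisson process of rate Λ, so that the state distribution at time t is a Poisson-weighted sum of π₀Pᵏ. The two-state chain in the usage lines (fault active vs. fault-free) and its rates are hypothetical, not the paper's multiple-module model, and reading "P(fault-free at t)" as a retry-success probability is a simplification of the conditional measure described above.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def uniformize(q: Matrix) -> Tuple[Matrix, float]:
    """Split generator Q into a DTMC P = I + Q / rate and a Poisson rate."""
    n = len(q)
    rate = max(-q[i][i] for i in range(n))
    if rate == 0.0:
        rate = 1.0  # no transitions at all; any positive rate works
    p = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(n)] for i in range(n)]
    return p, rate


def transient(q: Matrix, pi0: List[float], t: float,
              tol: float = 1e-12, max_terms: int = 10_000) -> List[float]:
    """State probabilities at time t: sum over k of Pois(k; rate*t) * pi0 P^k."""
    p, rate = uniformize(q)
    n = len(pi0)
    vec = list(pi0)               # pi0 P^k, starting with k = 0
    weight = math.exp(-rate * t)  # Poisson probability of zero jumps
    result = [weight * x for x in vec]
    total, k = weight, 0
    # Add terms until the truncated Poisson mass is within tol of 1.
    while 1.0 - total > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * p[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        result = [r + weight * x for r, x in zip(result, vec)]
        total += weight
    return result


# Toy model: state 0 = non-permanent fault active, state 1 = fault-free.
mu, lam = 2.0, 0.5  # hypothetical fault-disappearance and reappearance rates
q = [[-mu, mu], [lam, -lam]]
for t in (0.1, 0.5, 1.0):
    p_ok = transient(q, [1.0, 0.0], t)[1]
    print(f"retry period {t}: P(fault-free) = {p_ok:.4f}")
```

For this two-state chain the closed form is μ/(λ+μ)·(1 − e^{−(λ+μ)t}), which gives a quick check on the series; `math.exp` underflows for very large Λt, so long horizons would need a more careful Poisson evaluation.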
The problems of intelligent response to disruptions in a decentralized manufacturing system are considered. In the absence of intelligent coordination, the response to a disruption in one part of the system may cause a disruption in another part. A model for the problem of recovering from this kind of disruption is given, and a solution approach based on negotiation, an idea from distributed artificial intelligence, is presented. The model and solution approach are evaluated in the domain of job shop rescheduling.
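The abstract does not specify the negotiation protocol. As an assumption, the sketch below uses a contract-net-style exchange: when a machine fails, each job in its queue is announced, peer machines that can perform the operation bid their earliest completion time, and the job is awarded to the lowest bid. `Machine`, `bid`, and `renegotiate` are hypothetical names, and this greedy version ignores the knock-on disruptions that the paper's model is concerned with.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Machine:
    """A hypothetical machine agent holding a simple queue of (job, duration)."""
    name: str
    capabilities: set
    queue: List[tuple] = field(default_factory=list)
    up: bool = True

    def busy_until(self) -> float:
        return sum(d for _, d in self.queue)

    def bid(self, operation: str, duration: float) -> Optional[float]:
        """Offer a completion time for the announced job, or None to decline."""
        if not self.up or operation not in self.capabilities:
            return None
        return self.busy_until() + duration


def renegotiate(failed: Machine, ops: dict, peers: List[Machine]) -> dict:
    """Announce each job queued on a failed machine and award it to the lowest bidder."""
    failed.up = False
    awards = {}
    for job, duration in failed.queue:
        bids = [(m.bid(ops[job], duration), m) for m in peers]
        bids = [(t, m) for t, m in bids if t is not None]
        if not bids:
            awards[job] = None  # no peer can take the job; escalate elsewhere
            continue
        finish, winner = min(bids, key=lambda b: b[0])
        winner.queue.append((job, duration))
        awards[job] = (winner.name, finish)
    failed.queue.clear()
    return awards


m1 = Machine("M1", {"drill", "mill"}, [("J1", 3.0), ("J2", 2.0)])
m2 = Machine("M2", {"drill"}, [("J3", 4.0)])
m3 = Machine("M3", {"drill", "mill"}, [("J4", 1.0)])
print(renegotiate(m1, {"J1": "drill", "J2": "mill"}, [m2, m3]))
```

Because each award changes the winner's queue, later bids see the updated load, which is the simplest way one agent's response can affect another's.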
We present three protocols defining the relationship between messages and the channel resources they request: request-then-hold, request-then-wait, and request-then-relinquish. Based on these three protocols, we develop an adaptive deadlock-free routing algorithm called SP routing. SP routing uses shortest paths and is fully adaptive, so a message can be routed via any of the shortest paths from the source to the destination. Because it is minimal (shortest-path) routing, SP routing guarantees freedom from livelock. SP routing is not limited to a specific network topology: the main requirement is that the topology admit a deterministic, minimal, deadlock-free routing algorithm, and most existing network topologies have such an algorithm. In this paper, we use SP routing to derive an adaptive deadlock-free routing algorithm for n-dimensional meshes. Compared with deterministic routing, the hardware required by SP routing needs only one extra virtual channel.
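The sketch below shows only the two ingredients the abstract names for an n-dimensional mesh: at each node, any hop that reduces the distance to the destination is a candidate (so every path is a shortest path), and a deterministic, minimal, deadlock-free algorithm serves as the fallback. Dimension-order routing is assumed as that deterministic algorithm, and the fallback is modeled as always available, standing in for the extra virtual channel. The request-then-hold/wait/relinquish protocols, flow control, and blocking are not modeled; `busy` is a hypothetical set of unavailable adaptive channels.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, ...]
Hop = Tuple[int, int]  # (dimension, direction +1 or -1)


def minimal_hops(cur: Coord, dst: Coord) -> List[Hop]:
    """All profitable hops: each moves one step closer to dst along some dimension."""
    return [(d, 1 if b > a else -1) for d, (a, b) in enumerate(zip(cur, dst)) if a != b]


def dimension_order_hop(cur: Coord, dst: Coord) -> Hop:
    """Deterministic, minimal, deadlock-free choice: correct the lowest dimension first."""
    hops = minimal_hops(cur, dst)
    if not hops:
        raise ValueError("already at destination")
    return hops[0]


def step(cur: Coord, hop: Hop) -> Coord:
    d, s = hop
    return cur[:d] + (cur[d] + s,) + cur[d + 1:]


def route(src: Coord, dst: Coord, busy: Set[Tuple[Coord, Hop]]) -> List[Coord]:
    """Take any free adaptive shortest-path hop; otherwise use the dimension-order hop."""
    path, cur = [src], src
    while cur != dst:
        free = [h for h in minimal_hops(cur, dst) if (cur, h) not in busy]
        hop = free[0] if free else dimension_order_hop(cur, dst)
        cur = step(cur, hop)
        path.append(cur)
    return path


print(route((0, 0), (2, 1), busy={((0, 0), (0, 1))}))
```

Every hop, adaptive or fallback, reduces the Manhattan distance by one, so path length always equals the distance between source and destination; this is why minimal routing cannot livelock.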
The authors describe a software fault injector (SFI) developed to facilitate the validation of dependability mechanisms on an experimental distributed real-time system called HARTS. SFI extends previous work on fault injection tools in several ways. In particular, it allows combinations of fault types to be injected into the nodes of a distributed system, and it allows control of all timing parameters of the injection at each node. The features and implementation of SFI are described, and results of some sample experiments are presented to demonstrate its utility.
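SFI's actual interface is not described here, so the following is only a hypothetical sketch of what "combinations of fault types" and "control of timing parameters at each node" could look like as data: each injection names a node, a fault type, a start time, a duration, and an optional repetition period, and several injections may be active on the same node at once. The fault type names and the `Injection` and `active_faults` names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Injection:
    """One hypothetical injection request: what to inject, where, and when."""
    node: int
    fault_type: str      # e.g. "memory_bitflip", "message_drop" (illustrative names)
    start: float         # injection time, seconds from experiment start
    duration: float      # how long the fault stays active (0 = one-shot)
    period: float = 0.0  # re-inject every `period` seconds; 0 = no repetition

    def active_at(self, t: float) -> bool:
        if t < self.start:
            return False
        offset = t - self.start
        if self.period > 0:
            offset %= self.period  # position within the current repetition
        return offset <= self.duration


def active_faults(plan: List[Injection], node: int, t: float) -> List[str]:
    """Fault types active on `node` at time t; overlapping entries model combined faults."""
    return sorted({inj.fault_type for inj in plan if inj.node == node and inj.active_at(t)})


plan = [
    Injection(node=1, fault_type="message_drop", start=2.0, duration=1.0),
    Injection(node=1, fault_type="memory_bitflip", start=2.5, duration=0.0, period=0.5),
    Injection(node=2, fault_type="clock_skew", start=0.0, duration=10.0),
]
print(active_faults(plan, node=1, t=2.5))
```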
The authors present an enhanced version of the real-time channel protocol for transmitting compressed digital motion video over computer networks. The protocol guarantees the timely delivery of video frames without wasting network bandwidth. Extensive simulation results show its superiority over ordinary circuit- and packet-switching protocols.
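The abstract does not give the channel admission test. As a simplified assumption, the sketch treats each video channel on a link as a periodic stream that sends at most one maximum-size compressed frame per frame period, with each frame's deadline equal to its period, and admits a new channel under preemptive EDF only if total link utilization stays at or below 1. Real packet transmission is non-preemptive and real-time channels use richer traffic and delay parameters, so this is only a first-order check; `Channel` and `admit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    """Hypothetical per-link description of one real-time video channel."""
    max_frame_bits: int    # largest compressed frame the channel may send
    frame_period_s: float  # e.g. 1/30 s for 30 frames per second

    def utilization(self, link_bps: float) -> float:
        # Fraction of the link needed: transmission time per frame / frame period.
        return (self.max_frame_bits / link_bps) / self.frame_period_s


def admit(existing: List[Channel], new: Channel, link_bps: float) -> bool:
    """EDF utilization test (deadline = period): admit iff total load stays <= 1."""
    load = sum(c.utilization(link_bps) for c in existing) + new.utilization(link_bps)
    return load <= 1.0


video = Channel(max_frame_bits=100_000, frame_period_s=1 / 30)  # about 3 Mb/s peak
link = 10e6  # hypothetical 10 Mb/s link
print(admit([video, video], video, link), admit([video] * 3, video, link))
```

Sizing reservations by the largest frame is what wastes bandwidth for variable-rate compressed video, which is the problem the enhanced protocol is said to avoid.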
The authors describe a new portable algorithm for parallel circuit extraction. The algorithm is built as part of the ongoing ProperCAD project, a portable object-oriented parallel environment for CAD applications built on top of the CHARM system. Unlike prior approaches such as PACE, the algorithm is asynchronous and is based on a coarse-grained dataflow execution model. Circuit extraction performance is presented on four parallel machines: an Encore Multimax, a Sequent Symmetry, an NCUBE 2 hypercube, and a network of Sun Sparc workstations. The extractor runs unchanged on all these machines.
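A minimal sketch of coarse-grained parallel extraction, under heavy simplification: connectivity is reduced to touching rectangles on a single layer, the layout is cut into vertical strips, each strip is extracted independently, and a union-find structure merges results in whatever order strips finish, loosely echoing an asynchronous dataflow style. This is not the ProperCAD/CHARM algorithm; the strip partitioning and all names are assumptions. A rectangle is assigned to every strip it touches, so two overlapping rectangles always share at least one strip.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) on one conducting layer


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def extract_strip(rects: Dict[int, Rect], ids: List[int]) -> List[Tuple[int, int]]:
    """Local extraction: report touching pairs among the rectangles in one strip."""
    return [(i, j) for k, i in enumerate(ids) for j in ids[k + 1:] if overlaps(rects[i], rects[j])]


def find(parent: Dict[int, int], x: int) -> int:
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def extract_nets(rects: Dict[int, Rect], strip_width: float, workers: int = 4) -> List[List[int]]:
    # Assign each rectangle to every vertical strip its x-extent touches.
    strips: Dict[int, List[int]] = {}
    for rid, (x0, _, x1, _) in rects.items():
        for s in range(int(x0 // strip_width), int(x1 // strip_width) + 1):
            strips.setdefault(s, []).append(rid)

    parent = {rid: rid for rid in rects}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(extract_strip, rects, ids) for ids in strips.values()]
        for fut in as_completed(futures):  # merge each strip's result as soon as it is ready
            for i, j in fut.result():
                parent[find(parent, i)] = find(parent, j)

    nets: Dict[int, List[int]] = {}
    for rid in rects:
        nets.setdefault(find(parent, rid), []).append(rid)
    return sorted(sorted(n) for n in nets.values())


layout = {0: (0, 0, 2, 1), 1: (1.5, 0, 4, 1), 2: (5, 5, 6, 6), 3: (3.9, 0.5, 7, 0.8)}
print(extract_nets(layout, strip_width=2.0))
```

Threads keep the sketch portable and simple; in CPython a `ProcessPoolExecutor` would be needed for real CPU speedup, since only the main thread performs the merges either way.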
For a system of concurrent processes that can fail by stopping, we study a generalization of the traditional binary agreement problem having more than two possible input values. We provide bounds on the number of poss...