ISBN (print): 0780375149
This paper presents a fault-tolerant routing algorithm that employs a modified distributed recovery block (DRB) approach. The section of a parallel or distributed system spanning the source and destination nodes is partitioned into a series of overlapping DRB groups. Each DRB group consists of three nodes: a current node and two successor nodes. The primary successor executes the primary try, while the alternate successor executes the alternate try. The primary successor node delivers the message, whereas the alternate is ready to take over if the primary fails. The successful successor in an active DRB group becomes the current node of the next DRB group on the routing path. A prototype version of the routing method is implemented for a hypercube topology, and its performance is compared with that of adaptive routing techniques based on backtracking.
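The DRB-group routing step described above can be illustrated with a minimal Python sketch. It assumes an n-dimensional hypercube with nodes numbered 0 to 2^n - 1, a known set of faulty nodes, and a hypothetical successor-selection rule: the primary successor corrects the lowest address bit that still differs from the destination, and the alternate successor corrects the next one. The function name drb_route and this selection rule are illustrative assumptions, not the paper's prototype, and the sketch simply gives up where backtracking-based techniques would retreat.

```python
from typing import List, Optional, Set


def drb_route(src: int, dst: int, n: int, faulty: Set[int]) -> Optional[List[int]]:
    """Route src -> dst in an n-cube through a chain of overlapping DRB groups.

    Each group is (current node, primary successor, alternate successor).
    Hypothetical rule: both successors lie on minimal paths toward dst.
    """
    path = [src]
    current = src
    while current != dst:
        diff = current ^ dst
        # Dimensions in which current and dst still differ, lowest first.
        dims = [d for d in range(n) if (diff >> d) & 1]
        primary = current ^ (1 << dims[0])
        alternate = current ^ (1 << dims[1]) if len(dims) > 1 else None
        if primary not in faulty:
            nxt = primary      # primary try delivers the message
        elif alternate is not None and alternate not in faulty:
            nxt = alternate    # alternate takes over after the primary fails
        else:
            return None        # both tries failed; this sketch does not backtrack
        path.append(nxt)
        current = nxt          # successful successor becomes the next group's current node
    return path
```

For example, drb_route(0, 7, 3, {1}) returns [0, 2, 3, 7]: node 1 fails as the primary successor of the first group, so the alternate successor 2 takes over and becomes the current node of the next group.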
An approach to fault-tolerant execution of real-time application tasks in hypercubes is proposed. The approach is based on the distributed recovery block (DRB) scheme, which was developed and experimentally demonstrated earlier, and it requires no special hardware mechanisms in support of fault tolerance. Each task is assigned to a pair of processors forming a DRB computing station, which executes the task in a dual-redundant, self-checking mode. Assigning all tasks of an application in this form is called full DRB mapping. The DRB scheme was developed to treat hardware and software faults uniformly while achieving fast forward recovery. However, if the system developer is concerned only with hardware faults, and not with possible inadequacies of the application task software, forming DRB stations becomes a mechanical process that places no burden on the application software designer. A procedure for converting an efficient nonredundant task-to-processor mapping into an efficient full DRB mapping is presented, and its optimality is proven. As an illustration of this procedure, called symmetric duplexing, optimal strategies for full DRB mapping of complete binary tree task structures onto hypercubes are obtained.
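The abstract does not spell out the symmetric duplexing procedure step by step. The Python sketch below shows one simple construction consistent with its name, offered as an assumption rather than as the paper's proven-optimal procedure: a nonredundant mapping onto an (n-1)-cube is mirrored into the other half of an n-cube, so each task's DRB station consists of processor p and its neighbor across the highest dimension. The helper names symmetric_duplex, adjacent and preserves_adjacency are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def adjacent(a: int, b: int) -> bool:
    """True if hypercube nodes a and b differ in exactly one address bit."""
    x = a ^ b
    return x != 0 and (x & (x - 1)) == 0


def symmetric_duplex(mapping: Dict[str, int], n: int) -> Dict[str, Tuple[int, int]]:
    """Mirror a nonredundant mapping on the (n-1)-cube into a full DRB mapping on the n-cube.

    Assumed construction: the shadow of processor p is p with bit (n - 1) set,
    i.e. its neighbor across the highest dimension.
    """
    top = 1 << (n - 1)
    stations = {}
    for task, proc in mapping.items():
        if not 0 <= proc < top:
            raise ValueError(f"processor {proc} is outside the {n - 1}-cube")
        stations[task] = (proc, proc | top)
    return stations


def preserves_adjacency(edges: Iterable[Tuple[str, str]], mapping: Dict[str, int],
                        stations: Dict[str, Tuple[int, int]]) -> bool:
    """Check that every station is an adjacent pair, and that each task-graph edge
    mapped to neighboring processors stays one hop long among primaries and among shadows."""
    if not all(adjacent(p, s) for p, s in stations.values()):
        return False
    for u, v in edges:
        if adjacent(mapping[u], mapping[v]):
            (pu, su), (pv, sv) = stations[u], stations[v]
            if not (adjacent(pu, pv) and adjacent(su, sv)):
                return False
    return True
```

With a three-task tree mapped as r→0, a→1, b→2 on a 2-cube, symmetric_duplex(..., 3) yields the stations (0, 4), (1, 5) and (2, 6). Each pair is adjacent, and the tree edges stay one hop long among both primaries and shadows, which is the kind of property a duplexing procedure needs in order to keep an efficient nonredundant mapping efficient.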
A testbed-based approach to the evaluation of fault-tolerant distributed computing schemes is discussed. The approach is based on experimentally incorporating system structuring and design techniques into real-time distributed computing testbeds centered around tightly coupled microcomputer networks. The effectiveness of this approach has been confirmed through experiments conducted in the author's laboratory. The primary advantages of the testbed-based approach include the relatively high accuracy of the data obtained on timing and logical complexity, as well as the relatively high degree of assurance that can be obtained about the practical effectiveness of the scheme evaluated. This paper discusses various design issues encountered in establishing the basic microcomputer network testbed facilities and augmenting them to support the experiments conducted. The recognized shortcomings of the testbeds are also discussed, together with the desired extensions of the testbeds. Some of these extensions are beyond the state of the art in microcomputer network implementation. [ABSTRACT FROM AUTHOR]
ISBN (print): 0818680474
Of several schemes proposed to handle the propagation of erroneous information among interacting processes in distributed and parallel computer systems, the distributed real-time conversation (DRC) scheme stands out for its fast forward recovery capability, which is essential in safety-critical hard-real-time applications. However, previous formulations of the scheme remained at relatively abstract levels, and practical models for implementing it in complex safety-critical real-time applications had not been established. The core approach of the DRC scheme is to make a group of computing stations cooperate in recovering from hardware and software faults that may occur during their interaction. In this paper we present a practical implementation model for the DRC scheme. A simple model of an anti-missile defense system is used to illustrate the main structuring principles of the DRC scheme and the major components of the practical implementation model.
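The cooperation idea above can be sketched in Python as a toy conversation among DRB stations. This is an illustrative assumption, not the paper's implementation model: each hypothetical Station pairs a primary and an alternate try with one acceptance test, and the hypothetical converse function passes only accepted results between stations and exports results only when every station succeeds, so an erroneous value never propagates out of the group. The anti-missile names (range_km, launch_delay_s) and numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Values = Dict[str, float]


@dataclass
class Station:
    """Hypothetical DRB computing station: a primary/shadow pair sharing one acceptance test."""
    name: str
    primary_try: Callable[[Values], float]
    alternate_try: Callable[[Values], float]
    acceptance_test: Callable[[float], bool]

    def run(self, inputs: Values) -> Optional[float]:
        # A real DRB station runs both tries concurrently on two nodes;
        # this sketch evaluates them in order for simplicity.
        for attempt in (self.primary_try, self.alternate_try):
            try:
                result = attempt(inputs)
            except Exception:
                continue  # a crashed try counts as a failed try
            if self.acceptance_test(result):
                return result  # forward recovery: the accepted result is used without rollback
        return None


def converse(stations: List[Station], inputs: Values) -> Optional[Values]:
    """One hypothetical DRC-style conversation round.

    Each station sees only accepted results of the stations before it, and the
    group exports results only if every station produced an accepted value.
    """
    shared = dict(inputs)
    results: Values = {}
    for station in stations:
        value = station.run(shared)
        if value is None:
            return None  # the conversation fails as a unit; nothing is exported
        shared[station.name] = value
        results[station.name] = value
    return results


# Toy anti-missile example: the tracker's primary try is faulty (always returns -1).
tracker = Station(
    "range_km",
    primary_try=lambda v: -1.0,
    alternate_try=lambda v: v["raw_range"],
    acceptance_test=lambda r: 0.0 < r < 1000.0,
)
interceptor = Station(
    "launch_delay_s",
    primary_try=lambda v: v["range_km"] / 2.0,
    alternate_try=lambda v: v["range_km"] / 2.0 + 1.0,
    acceptance_test=lambda d: d >= 0.0,
)
```

Calling converse([tracker, interceptor], {"raw_range": 300.0}) returns {"range_km": 300.0, "launch_delay_s": 150.0}: the tracker's faulty primary is masked by its alternate, and the interceptor only ever sees the accepted range. With a raw range of 5000.0, both tracker tries fail acceptance and the whole conversation returns None.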
The basic distributed recovery block (DRB) scheme was formulated earlier as a scheme for achieving real-time fault tolerance at the task-execution level. The implementation techniques for the DRB scheme, especially those for LAN-based systems, are expected to undergo continuous refinement and extension. This is partly because a real-time fault tolerance scheme with higher fault coverage can be obtained by combining the DRB scheme with complementary techniques for network diagnosis and reconfiguration and for reliable communication, and new complementary techniques with such capabilities will continue to emerge for at least the next several years. The purpose of this paper is to present an implementation model with a modular, easily expandable structure, so that new complementary mechanisms can be incorporated with ease. The promise of the model has been confirmed in an experimental implementation that used both the model and a simple PC-network-based real-time distributed computing testbed.
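The modular structure described above can be sketched in Python as a small DRB core with a plug-in hook. Everything here is a hypothetical illustration under stated assumptions, not the paper's model: DRBPair runs the primary try on the node in the primary role and the alternate try on the shadow, swaps roles when the shadow's result is used, and notifies any attached Extension objects when a try fails; DiagnosisLog is a toy stand-in for a network-diagnosis mechanism, and hardware faults are simulated through a failed_nodes set.

```python
from typing import AbstractSet, Callable, Iterable, List, Optional, Protocol, Tuple


class Extension(Protocol):
    """Hook interface for a pluggable complementary mechanism (hypothetical)."""

    def on_try_failed(self, node: str, cycle: int) -> None: ...


class DiagnosisLog:
    """Toy stand-in for a network-diagnosis module: records which node failed in which cycle."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, int]] = []

    def on_try_failed(self, node: str, cycle: int) -> None:
        self.events.append((node, cycle))


class DRBPair:
    """Minimal DRB station core; complementary mechanisms attach as extensions."""

    def __init__(
        self,
        nodes: Tuple[str, str],
        primary_try: Callable[[int], int],
        alternate_try: Callable[[int], int],
        acceptance_test: Callable[[int], bool],
        extensions: Iterable[Extension] = (),
    ) -> None:
        self.primary, self.shadow = nodes
        self.primary_try = primary_try
        self.alternate_try = alternate_try
        self.acceptance_test = acceptance_test
        self.extensions = list(extensions)

    def run_cycle(self, cycle: int, x: int,
                  failed_nodes: AbstractSet[str] = frozenset()) -> Optional[int]:
        # The node in the primary role runs the primary try; the shadow runs the alternate.
        for node, attempt in ((self.primary, self.primary_try),
                              (self.shadow, self.alternate_try)):
            ok = node not in failed_nodes  # simulated hardware fault
            result = attempt(x) if ok else None
            if ok and self.acceptance_test(result):
                if node == self.shadow:
                    # The shadow's result is used, so the two nodes swap roles.
                    self.primary, self.shadow = self.shadow, self.primary
                return result
            for ext in self.extensions:
                ext.on_try_failed(node, cycle)  # notify every plugged-in mechanism
        return None
```

In this design, adding a new complementary mechanism (for example, a reliable-communication monitor) means writing another class with an on_try_failed method and passing it in extensions, without editing DRBPair itself. For instance, after a cycle in which node "n0" is marked failed, the pair continues with "n1" as primary and the DiagnosisLog holds ("n0", cycle).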