Recovery provisions in a distributed system are considered, and issues of software design reliability are examined. As there is no generally valid system for recovery provision design, the provisions are reviewed. The costs of the testing, documentation, operator training, and interface administration required for proper operation of the provisions are considered.
Many of the special problems in distributed computing relate to the handling of exceptional conditions. In a distributed program, exceptions occur as a result of transmission errors and partial failures. Any exceptional condition that arises must be handled if distributed programs are to be robust. Various approaches to providing exception handling mechanisms for distributed applications are examined, as incorporated into several experimental distributed operating systems. These operating systems all support the notion that the primary software structuring tool for applications will be a collection of cooperating programs (processes) mapped onto a set of loosely coupled processors.
The design of a distributed processing system must include methods to handle distributed data retrieval. A considerable amount of research has been devoted to the development of algorithms that provide this function. A survey of this research is presented and a taxonomy is introduced that highlights the significant differences among the algorithms.
In fault-tolerant distributed processing systems, some failures may still need to be considered because of late failure detection and/or transmission delays. The failures may be caused by hardware or software. A method is introduced that buffers information before it is used and determines where the information may be used. The operation of a telephone system illustrates the use of this method in duplicate data recovery.
Author: Kim, K.H. (Univ of South Florida, Dep of Computer Science & Engineering, Tampa, FL, USA)
One of the frequently advocated advantages of distributed computing systems over centralized computing systems is their potential for improved system reliability. Although the application of distributed computing is currently expanding at a rapid rate, realizing its full reliability potential still requires fresh solutions and a better understanding of many design problems. The nature of some of those design issues is briefly discussed. To help prevent misinterpretation while keeping the presentation of research issues abstract, a model of a recoverable distributed computing system structure is presented. Discussed are: error detection; hardware and software reconfiguration; the degree of coordination among distributed processes for error detection and recovery; real-time recovery; and software engineering tools.
The initial design of three modules of DDTS (Distributed Database Testbed System) is presented. The DDTS emphasizes modularity and independence of modules so that it may be used to experimentally study the effects of different algorithms at each module. DDTS architecture and transactions are considered, with special attention to information architecture (IA) and system architecture (SA).
Author: Seifert, Manfred H. (IBM Germany, Heidelberg Scientific Cent, Heidelberg, West Ger)
A characteristic feature of the dynamic structure of distributed software is that management functions as well as application functions are carried out by parallel, interacting processes. A set of such interacting processes that belong together is called a distributed process system. The application and system programs are defined, and the structurally redundant process is explained, including functional redundancy. User and manager process systems are also considered in the architecture of fault-tolerant software.
Author: Segall, Zary (Carnegie-Mellon Univ, Computer Science Dep, Pittsburgh, PA, USA)
The ultimate test of the efficiency of mechanisms and policies employed to achieve increased performance and/or reliability in a distributed system is provided by the evaluation of measurements taken from the real system. Experimentation with multiprocessors is considered. The concept of an Integrated Instrumentation Environment (IIE) is introduced as a structured approach to facilitate the process of experimentation. The design presented emphasizes the integration of instrumentation tools, such as stimulus generation and monitoring, into a unified experiment management environment. An experiment schema is introduced as an appropriate structuring concept for experiment management purposes. Schema instances are introduced to capture the results of an experiment for later analysis.
The concept of atomic transactions has been used to provide reliable processing in both centralized and distributed systems. An extension of traditional atomic transactions is presented: nested transactions. Nested transactions are seen to permit safe concurrency within as well as among transactions, and to enable transactions to fail partially in a graceful and controlled manner. These properties of nested transactions suit them to a number of distributed applications. Examples of such applications are described.
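The partial-failure property described in this abstract can be illustrated with a minimal, single-threaded sketch. It is not taken from the paper: the `Transaction` class, its method names, and the in-memory dictionary store are all hypothetical, and locking and concurrency control are deliberately left out. The only point shown is that a nested transaction can abort without undoing its parent.

```python
class Transaction:
    """Hypothetical nested transaction over an in-memory dict (illustration only)."""

    def __init__(self, store, parent=None):
        self.store = store      # shared base store; changed only by a top-level commit
        self.parent = parent    # enclosing transaction, or None at top level
        self.writes = {}        # tentative writes private to this transaction

    def begin_nested(self):
        # Start a child whose effects stay tentative until it commits.
        return Transaction(self.store, parent=self)

    def read(self, key):
        # The innermost tentative write wins; fall back to ancestors, then the store.
        tx = self
        while tx is not None:
            if key in tx.writes:
                return tx.writes[key]
            tx = tx.parent
        return self.store.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # A child commit only hands its writes to the parent;
        # a top-level commit applies them to the store.
        target = self.parent.writes if self.parent is not None else self.store
        target.update(self.writes)
        self.writes = {}

    def abort(self):
        # Discard this transaction's writes; the parent is unaffected.
        self.writes = {}


store = {"balance": 100}
top = Transaction(store)
top.write("balance", 90)

failed = top.begin_nested()
failed.write("audit", "pending")
failed.abort()                  # partial failure: only the child's work is lost

ok = top.begin_nested()
ok.write("log", "ok")
ok.commit()                     # merged into the parent, not yet in the store

top.commit()
print(store)
```

In this sketch a child's commit is relative to its parent: nothing becomes visible in the store until the top-level transaction commits, so aborting the top-level transaction would also discard work from children that had already committed.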
Task and file allocation are examined in two classes of fault-tolerant distributed systems. The task allocation problem arises in software-implemented fault tolerance (SIFT)-like systems, while the file allocation problem arises in Ethernet-like systems. Both problems may be formulated as a constrained sum-of-squares minimization problem. The computational complexity of these problems prompts us to consider an efficient approximation algorithm that does not always yield optimal answers. It is shown that the ratio of the approximate to the optimal solution is bounded by 9m / (8(m - r + 1)), where m is the number of processors (file servers) to be allocated and r is the number of times each task (file) is to be replicated. Experience with the algorithm suggests that even better performance ratios can be expected.
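To make the formulation concrete, the sketch below places r copies of each task on distinct processors and reports the sum of squared processor loads, along with the bound quoted in the abstract. The abstract does not specify the approximation algorithm, so the greedy rule used here (largest task first, copies go to the r least-loaded processors) is an assumed stand-in, and the function names are hypothetical.

```python
def allocate(loads, m, r):
    """Greedy heuristic (an assumption, not the paper's algorithm):
    place r copies of each task on the r currently least-loaded processors."""
    if not 1 <= r <= m:
        raise ValueError("need 1 <= r <= m")
    totals = [0.0] * m          # accumulated load per processor
    placement = []              # (task index, processors holding its copies)
    # Handle the largest tasks first so small ones can even out the totals.
    for task, load in sorted(enumerate(loads), key=lambda t: -t[1]):
        # r distinct processors, least loaded first (ties broken by index).
        chosen = sorted(range(m), key=lambda p: (totals[p], p))[:r]
        for p in chosen:
            totals[p] += load
        placement.append((task, chosen))
    return totals, placement


def sum_of_squares(totals):
    """Objective being minimized: sum of squared processor loads."""
    return sum(x * x for x in totals)


def ratio_bound(m, r):
    """Bound quoted in the abstract: 9m / (8(m - r + 1))."""
    return 9 * m / (8 * (m - r + 1))


totals, placement = allocate([4, 3, 2, 1], m=3, r=2)
print(totals, sum_of_squares(totals), ratio_bound(3, 2))
```

With m = 3 and r = 2 the quoted bound evaluates to 27/16 = 1.6875. That bound is proved for the paper's algorithm and is not claimed to hold for this stand-in heuristic.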