Management policies can be used to specify requirements about the desired behaviour of distributed systems. Violations of policies (faults) can then be detected, isolated, located, and corrected using a policy-driven fault management system. Other work in this area to date has focused on network-level faults. We believe that in a distributed system it is more appropriate to focus on faults at the application level. Furthermore, this prior work has been largely domain specific; a generic, structured approach to the problem is needed. Our work has focused on policy-driven fault management in distributed systems at the application level. In this paper, we define a generic architecture for policy-driven fault management and present a prototype system based on this architecture. We also discuss experience to date using and experimenting with our prototype system.
ISBN: 0818622601 (print)
The authors report a study of the dependability of the various communication topologies that can be used to construct a Delta-4 system. Single and dual bus and ring configurations are possible (based on 802.4, 802.5, and FDDI standards); the authors give closed-form expressions for the reliability and availability of each topology when repair is taken into account. It is shown that the dimensioning parameter in the dependability of the communication system is the coverage of the self-checking mechanisms built into the network attachment controllers.
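To illustrate why coverage can dominate dependability, the following minimal sketch uses the textbook reliability model for a duplicated component with imperfect coverage and no repair. It is not the authors' closed-form expression (which accounts for repair and is not reproduced in this abstract); the names `duplex_reliability`, `single_reliability`, `failure_rate`, and `coverage` are assumptions made for the example.

```python
import math


def single_reliability(failure_rate: float, t: float) -> float:
    """Reliability at time t of one unit with a constant failure rate."""
    return math.exp(-failure_rate * t)


def duplex_reliability(failure_rate: float, t: float, coverage: float) -> float:
    """Reliability at time t of two identical active units, no repair.

    Assumed textbook model, not the paper's: each unit fails at a constant
    rate, and the first failure is tolerated only with probability
    `coverage`. An uncovered first failure takes the whole pair down.
    """
    x = failure_rate * t
    return 2 * coverage * math.exp(-x) + (1 - 2 * coverage) * math.exp(-2 * x)
```

With `coverage` equal to 1 the expression reduces to the ideal parallel pair; with `coverage` equal to 0 the pair is less reliable than a single unit, because a failure of either unit is fatal.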
ISBN: 0818608153 (print)
The author presents an algorithm for maintaining consistency and improving the performance of databases with replicated data in distributed real-time systems. The semantic information of read-only transactions is used for improved efficiency, and a multiversion technique is used to increase the degree of concurrency. Related issues, including the consistency of the states seen by transactions, version management, and recovery of replicated data in distributed systems, are discussed.
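The general idea of letting read-only transactions read older versions while writers proceed can be sketched as follows. This is a generic single-node multiversion store, not the author's algorithm; the class `MultiVersionStore` and its methods are hypothetical names chosen for illustration, and replication and real-time constraints are left out.

```python
import bisect


class MultiVersionStore:
    """Keeps every committed version so readers never block writers."""

    def __init__(self):
        self._versions = {}  # key -> (commit timestamps, values), both ascending
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Commit a new version of key and return its commit timestamp."""
        self._clock += 1
        stamps, values = self._versions.setdefault(key, ([], []))
        stamps.append(self._clock)
        values.append(value)
        return self._clock

    def begin_read_only(self):
        """Return a snapshot timestamp for a read-only transaction."""
        return self._clock

    def read(self, key, snapshot):
        """Return the newest value committed at or before the snapshot."""
        stamps, values = self._versions.get(key, ([], []))
        i = bisect.bisect_right(stamps, snapshot)
        return values[i - 1] if i else None
```

A read-only transaction that took its snapshot before a later write keeps seeing the older version, so the state it observes stays consistent without locking out the writer.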
ISBN: 0769520693 (print)
As software distributed shared memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of these systems. In this paper, we propose a lightweight logging scheme, called remote logging, and a recovery protocol for home-based DSM. Remote logging stores coherence-related data in the volatile memory of a remote node. The logging overhead can be moderated with high-speed system area networks and user-level DMA operations supported by modern communication protocols. Remote logging tolerates multiple failures if the backup nodes of the failed nodes are alive, which substantially improves the reliability of DSM. Experimental results show that our fault-tolerant DSM has low overhead compared to conventional stable logging and that it can recover effectively from some concurrent failures.
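The stated tolerance condition (multiple failures are survivable as long as each failed node's backup is alive) can be expressed as a small check. This is a sketch only: the ring-style placement in `ring_backups` is an assumption made for the example, since the abstract does not say how backup nodes are chosen, and both function names are hypothetical.

```python
def ring_backups(nodes):
    """Assumed placement: each node logs to the next node in list order."""
    return {node: nodes[(i + 1) % len(nodes)] for i, node in enumerate(nodes)}


def recoverable(failed, backup_of):
    """True if every failed node's backup node is still alive."""
    failed = set(failed)
    return all(backup_of[node] not in failed for node in failed)
```

Under this assumed placement on four nodes, losing two non-adjacent nodes is recoverable, while losing a node together with its backup is not.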
ISBN: 0769521320 (print)
Distributed systems verification is one of the main issues in software engineering. It is considered the major application field of formal specification techniques. However, many difficulties remain. In fact, the principal problem is in producing a coherent specification and providing a fully integrated semantics. Although formal methods are mathematical description models that aim to answer questions about the reliability of a system, they remain hard for designers to use. Thus, we present in this paper an open environment for the integration of formal methods in the description and verification of distributed and concurrent systems. The system currently uses UML notation and provides rewriting logic, model checking, theorem proving, and simulation techniques.
HGPSS, a simulation language and environment aimed specifically at distributed systems, is described. HGPSS is upwardly compatible with GPSS, adding a number of features for the modeling of distributed database systems. The incorporation of these primitives reduces the complexity of the task of the simulation programmer. In addition, HGPSS is a portable system, thus permitting the use of more-powerful processors. HGPSS presents a novel approach to simulation, namely, that of incorporating application-specific functionality into the basic tools. By enriching the simulation language with constructs designed explicitly for an application environment, the task of the modeler can be simplified substantially. Furthermore, for situations in which general algorithmic facilities are necessary, a direct C interface is provided. A software modeling environment for determining the performance of various distributed database systems is described, which provides the user with the tools needed to model and analyze such a system.
ISBN: 0818619465 (print)
This paper presents a software modeling environment for estimating the performance of distributed database systems. This tool supports a simulation language, HGPSS, which comprises various simulation primitives, contains a collection of network modules, and allows for the collection of statistics. The paper provides an overview of the HGPSS environment, emphasizing its applicability to the modeling of distributed databases.
ISBN: 0818608757 (print)
The authors present an election protocol that does not assume an underlying ring structure and that tolerates failures, including lost messages and network partitioning, during the execution of the protocol itself. The major problem to be solved is that when nodes cannot communicate with one another or messages are lost, a conflict in resolving the election will often arise. In the authors' approach, the conflict is detected by the cohorts (noncandidate participants in the election). Related election protocols are discussed, and the system model is described together with assumptions about the communication subsystem. The protocol and the lost-message situations are then examined.
ISBN: 0818622601 (print)
The symposium proceedings contain 21 papers. The following topics are dealt with: checkpointing and logging algorithms; backward recovery schemes; replication and parallelism; dependability modeling and assessment; agreement; and garbage collection.
Multicast communication in a distributed system connected by a local area network can increase parallelism, and it can also provide a greater functionality than one-to-one communication. In the authors' multicast protocol, the sender directs a message to a named group of receivers, which can be specified by function without requiring the sender to know the specific members of the group. Each host's kernel in the network can respond to every group message sent, providing various levels of reliability. It was found that the overhead of providing dependable multicast over a single local area network was very small, mainly because the protocol operates at the kernel level rather than the user level. Several forms of this multicast communication, expressed as simple message-passing communication primitives, are described, and the effectiveness of the protocol is evaluated using an example of a distributed algorithm. Performance analyses and actual performance data for the protocol are presented.
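The idea of addressing a group by name, so that the sender never enumerates its members, can be sketched with an in-process simulation. This is not the authors' kernel-level protocol and models no network or reliability levels; the class `GroupBus`, its methods, and the example group name are hypothetical.

```python
class GroupBus:
    """Toy in-process stand-in for name-based group multicast."""

    def __init__(self):
        self._members = {}  # group name -> {host name: message handler}

    def join(self, group, host, handler):
        """Register host as a member of the named group."""
        self._members.setdefault(group, {})[host] = handler

    def leave(self, group, host):
        """Remove host from the named group if it is a member."""
        self._members.get(group, {}).pop(host, None)

    def multicast(self, group, message):
        """Deliver message to every current member; return replies by host."""
        members = self._members.get(group, {})
        return {host: handler(message) for host, handler in members.items()}
```

A sender calls `multicast("printers", job)` without knowing which hosts currently belong to the group; membership can change between calls without any change on the sender's side.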