ISBN: (print) 0780366158
This paper presents a systematic problem-solving approach to system software reliability based on Failure Modes and Effects Analysis (FMEA). In practice, the approach will: (a) ensure that all conceivable failure modes and their effects on the operational success of the software system have been considered; (b) list potential failures and identify the magnitude of their effects; (c) develop criteria for test planning, test design, and checkout systems (e.g., logging mechanisms); (d) provide a basis for quantitative reliability and availability analysis; and (e) provide a basis for establishing corrective action priorities. The approach was created for software reliability analysis and testing in the Multimedia Digital Distribution System (MDDS) at Thomson-CSF Sextant In-Flight Systems. It was first used to improve software reliability for the Communication Control Unit (CCU) subsystem of the MDDS and then applied globally to software reliability analysis and improvement for the whole MDDS. It has proven to be an effective and efficient approach to system software reliability.
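As a rough illustration of items (b) and (e), the sketch below ranks failure modes by a risk priority number (RPN = severity × occurrence × detection). The RPN scheme is common FMEA practice, but the abstract does not say which scoring the authors used, so the 1 to 10 scales, the `FailureMode` class, and the three example entries are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical software FMEA worksheet."""
    component: str
    description: str
    severity: int    # assumed scale: 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) .. 10 (frequent)
    detection: int   # assumed scale: 1 (always detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        # Risk priority number: the product of the three scores.
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Order failure modes for corrective action, highest RPN first.

    Ties are broken by severity so that the more damaging mode comes first.
    """
    return sorted(modes, key=lambda m: (m.rpn, m.severity), reverse=True)


# Invented example entries; they are not taken from the MDDS analysis.
modes = [
    FailureMode("CCU", "message queue overflow", 7, 4, 3),
    FailureMode("CCU", "log write fails silently", 4, 3, 8),
    FailureMode("MDDS", "stream server crash", 9, 2, 2),
]
ranked = prioritize(modes)
for m in ranked:
    print(m.rpn, m.component, m.description)
```

A poorly detectable failure can outrank a more severe one under this scheme, which is one reason a logging mechanism (item c) matters for the ranking.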
Author: Leitner, Gerald (Columbia Univ., Dept. of Computer Science, New York, NY, USA)
ISBN: (print) 0818605642
Many distributed applications are structured as a set of client processes generating requests for service, and a set of server processes servicing these requests. The common form of interprocess communication (IPC) between two single processes does not provide adequate support for robustness and throughput optimization. The author presents a stylized form of IPC between a group of client processes and a group of server processes. The choice of a particular server process to service a request is made dynamically at run-time. This dynamic task distribution provides the kernel support needed to make individual failures of server processes transparent and to optimize system performance through load leveling. The semantics of stylized IPC and the high-level protocol to implement it are described. Particular emphasis is placed on minimizing the performance overhead incurred by this mechanism.
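The dynamic task distribution described above can be pictured with a small user-level sketch. It is not the kernel mechanism or the protocol from the paper: the `ServerGroup` class, its method names, and the least-loaded selection rule are assumptions chosen only to show load leveling and the reassignment of work when a server fails.

```python
class ServerGroup:
    """Tracks outstanding requests per live server (hypothetical model)."""

    def __init__(self, names):
        self.load = {name: 0 for name in names}

    def assign(self):
        """Pick the least-loaded live server at run time (load leveling)."""
        if not self.load:
            raise RuntimeError("no live server in the group")
        # Ties are broken by name so the choice is deterministic.
        name = min(self.load, key=lambda n: (self.load[n], n))
        self.load[name] += 1
        return name

    def complete(self, name):
        """Record that a server finished one request."""
        if self.load.get(name, 0) > 0:
            self.load[name] -= 1

    def fail(self, name):
        """Drop a failed server and return how many requests it still held."""
        return self.load.pop(name, 0)


def reassign(group, failed):
    """Move a failed server's outstanding requests to the remaining servers.

    The client never sees the failure; its request is simply served elsewhere.
    """
    orphaned = group.fail(failed)
    return [group.assign() for _ in range(orphaned)]


group = ServerGroup(["s1", "s2", "s3"])
first = [group.assign() for _ in range(4)]
moved = reassign(group, "s1")
print(first, moved, group.load)
```

Clients address the group rather than a single process, so which server handles a request is decided only at dispatch time.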
ISBN: (print) 0780377176
Software FMEA is a means to determine whether any single failure in computer software can cause catastrophic system effects; it also identifies other possible consequences of unexpected software behavior. The procedure described here was developed and used to analyze mission- and safety-critical software systems. The procedure includes using a structured approach to understanding the subject software, developing rules and tools for doing the analysis as a group effort with minimal data entry and human error, and generating a final report. Software FMEA is a kind of implementation analysis that is intrinsically tedious, but database tools make the process reasonably painless, highly accurate, and very thorough. The main focus here is on the development and use of these database tools.
ISBN: (print) 0818608757
The proceedings contain 21 papers. The following topics are dealt with: recovery in distributed systems; managing replication and network partition; fault-tolerance techniques; fault-tolerant protocols; voting and fault diagnosis; experimental systems; and consistency maintenance.
ISBN: (print) 0818606908
Fault-tolerant distributed algorithms that are designed to reach agreement have been the subject of a great deal of recent study, primarily focussed on the Byzantine agreement paradigm. The author explores new paradigms and problems that arise in the context of maintaining agreement, rather than reaching agreement in an isolated instance. The emphasis is on open problem areas rather than on specific solutions.
In this paper we describe an infrastructure that transparently provides increased reliability for three-tier applications, using commercial off-the-shelf application servers and database systems. In this infrastructure the application servers are actively replicated to protect the business logic processing. Replicating the transaction coordinator renders the two-phase commit protocol non-blocking and thus avoids potentially long service disruptions caused by coordinator failure. A thin interpositioning library provides client-side automatic failover, so that clients know the outcome of their requests. The interaction between the application servers and the database servers is handled through replicated gateways that prevent duplicate requests from reaching the database servers. Aborted transactions, caused by process or communication faults, are automatically retried on the client's behalf.
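The duplicate-suppression role of the gateways can be sketched as follows. This shows only the idea of filtering repeated requests by identifier in a single, unreplicated gateway. The `Gateway` class, the request identifiers, and the `fake_db` stand-in are hypothetical, and the paper's actual gateways are replicated, which this sketch does not attempt to model.

```python
class Gateway:
    """Forwards each distinct request to the database at most once."""

    def __init__(self, execute):
        self._execute = execute  # callable that runs a statement on the database
        self._results = {}       # request id -> result already obtained

    def submit(self, request_id, statement):
        # A retry, or a copy sent by another application-server replica,
        # carries the same request id and is answered from the table.
        if request_id in self._results:
            return self._results[request_id]
        result = self._execute(statement)
        self._results[request_id] = result
        return result


executed = []


def fake_db(statement):
    """Stand-in for a database server; records what actually reached it."""
    executed.append(statement)
    return len(executed)


gateway = Gateway(fake_db)
r1 = gateway.submit("req-1", "UPDATE account SET balance = balance - 10")
r2 = gateway.submit("req-1", "UPDATE account SET balance = balance - 10")
r3 = gateway.submit("req-2", "UPDATE account SET balance = balance + 10")
print(r1, r2, r3, len(executed))
```

Because actively replicated application servers each issue the same database call, some filter of this kind is needed to keep an update from being applied once per replica.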
The Byzantine Generals problem involves a system of N processes, t of which may be unreliable. The problem is for the reliable processes to agree on a binary value sent by a 'general', which may itself be one of the N processes. If the general sends the same value to each process, then all reliable processes must agree on that value; in any case, they must agree on the same value. An explicit solution is given for a binary value among N = 3t + 1 processes, using 2t + 4 rounds and O(t^3 log t) message bits, where t bounds the number of faulty processes. This solution is easily extended to the general case of N ≥ 3t + 1 to give a solution using 2t + 5 rounds and O(tN + t^3 log t) message bits.
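The bounds quoted above are easy to tabulate. The helper below only restates the arithmetic from the abstract (the constraint N ≥ 3t + 1 and the round counts 2t + 4 and 2t + 5); it does not implement the agreement protocol, and the function names are illustrative.

```python
def max_faults(n: int) -> int:
    """Largest t satisfying n >= 3t + 1, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("need at least one process")
    return (n - 1) // 3


def rounds(n: int, t: int) -> int:
    """Round count stated in the abstract for n processes and t faults."""
    if t < 0 or n < 3 * t + 1:
        raise ValueError("the solution requires n >= 3t + 1")
    # 2t + 4 rounds in the tight case n = 3t + 1, 2t + 5 in the general case.
    return 2 * t + 4 if n == 3 * t + 1 else 2 * t + 5


for n, t in [(4, 1), (7, 1), (10, 3)]:
    print(n, t, max_faults(n), rounds(n, t))
```

The message-bit bounds are asymptotic, so the sketch leaves them out rather than attaching invented constants.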
ISBN: (print) 0780366158
This paper presents four models to demonstrate our techniques for optimizing software and hardware reliability for fault-tolerant distributed systems. The models help us find the optimal system structure while considering basic information on the reliability and cost of the available software and hardware components. Each model is suitable for a distinct set of conditions or situations. All four models maximize reliability while meeting cost constraints. The Simulated Annealing optimization algorithm is selected to demonstrate system reliability optimization techniques for distributed systems because of its flexibility in applying to various problem types with various constraints, as well as its efficiency in computation time. It provides satisfactory reliability results while meeting the constraints.
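To make the optimization setting concrete, here is a minimal simulated-annealing sketch for a generic redundancy-allocation problem. It is not one of the paper's four models. It assumes subsystems connected in series, each built from identical parallel replicas, and the component reliabilities, costs, budget, and cooling schedule are all invented for illustration.

```python
import math
import random


def system_reliability(replicas, rel):
    """Series system of parallel groups: product of 1 - (1 - r)^k."""
    out = 1.0
    for k, r in zip(replicas, rel):
        out *= 1.0 - (1.0 - r) ** k
    return out


def total_cost(replicas, cost):
    return sum(k * c for k, c in zip(replicas, cost))


def anneal(rel, cost, budget, steps=5000, t0=0.1, alpha=0.999, seed=0):
    """Maximize reliability subject to total_cost <= budget."""
    rng = random.Random(seed)
    current = [1] * len(rel)  # start with one replica per subsystem
    if total_cost(current, cost) > budget:
        raise ValueError("budget too small for one replica per subsystem")
    cur_r = system_reliability(current, rel)
    best, best_r = current[:], cur_r
    temp = t0
    for _ in range(steps):
        # Neighbor move: add or remove one replica in a random subsystem.
        cand = current[:]
        i = rng.randrange(len(cand))
        cand[i] = max(1, cand[i] + rng.choice((-1, 1)))
        if total_cost(cand, cost) <= budget:  # reject infeasible moves
            cand_r = system_reliability(cand, rel)
            delta = cand_r - cur_r
            # Always accept improvements; accept worse moves with
            # probability exp(delta / temp), which shrinks as temp cools.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, cur_r = cand, cand_r
                if cur_r > best_r:
                    best, best_r = current[:], cur_r
        temp *= alpha
    return best, best_r


rel = [0.9, 0.8, 0.95]  # assumed per-component reliabilities
cost = [2, 3, 1]        # assumed per-component costs
best, best_r = anneal(rel, cost, budget=15)
print(best, round(best_r, 4), total_cost(best, cost))
```

Rejecting infeasible candidates outright is the simplest way to honor the cost constraint; a penalty term in the objective is a common alternative.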
Existing IEEE software reliability standards do not address the characteristics of distributed systems, including client-server systems. Furthermore, these standards were issued before the widespread application of COTS and safety-critical systems. In addition, these standards do not take into account the influence on reliability of such process improvement measures as inspections, reuse, and object-oriented design paradigms. Lastly, these standards do not consider both hardware and software reliability, nor do they include availability and maintainability. To be of value, the next generation of dependability standards must address these deficiencies. With the active participation of the audience, the panel will identify and debate the future direction of dependability standards.
ISBN: (print) 0769526772
Middleware-based database replication approaches have emerged in the last few years as an alternative to traditional database replication implemented within the database kernel. A middleware approach enables third-party vendors to provide high-availability solutions, a growing practice in the software industry. However, middleware solutions often lack scalability and exhibit a number of consistency and performance issues. The reason is that in most cases the middleware has to handle the database as a black box and hence cannot take advantage of the many optimizations implemented in the database kernel. Thus, middleware solutions often reimplement key functionality but cannot achieve the same efficiency as a kernel implementation. Reflection has been proposed during the last decade as a fruitful paradigm to separate non-functional aspects from functional ones, simplifying software development and maintenance whilst fostering reuse. However, fully reflective databases are not feasible due to the high cost of reflection. Our claim is that by exposing some minimal database functionality through a lightweight reflective interface, efficient and scalable middleware database replication can be attained. In this paper we explore a wide variety of such lightweight reflective interfaces and discuss what kind of replication algorithms they enable. We also discuss implementation alternatives for some of these interfaces and evaluate their performance.