An efficient reliability prediction method for an n-version fault tolerant software system with S stages and an M-of-n voting mechanism is developed. Our model takes into account the dependence of failure behavior among successive stages, as well as correlated failure behavior of modules at the same stage. It is shown that the reliability of such a system can be evaluated stage by stage, and that if failure correlations among program modules are modeled by reliability intensity parameters with Beta distributions, then the time complexity of the proposed procedure is O(Sn²).
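The abstract does not reproduce the paper's exact recursion (which also conditions each stage on the previous one), but the per-stage ingredient it names is concrete enough to sketch: with a Beta(a, b)-distributed failure intensity, the number of failed versions at a stage is beta-binomial, and M-of-n stage reliability follows directly. The parameters below are illustrative assumptions, and chaining stages by simple multiplication ignores the inter-stage dependence the paper models.

```python
# Minimal sketch of M-of-n stage reliability under beta-binomial
# correlated failures; NOT the paper's full stage-dependent recursion.
from math import comb, lgamma, exp

def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def stage_reliability(n, m, a, b):
    """P(at least m of n versions succeed) when the per-version failure
    probability is Beta(a, b)-distributed (beta-binomial failure count)."""
    total = 0.0
    for k in range(0, n - m + 1):          # k = number of failed versions
        total += comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))
    return total

def system_reliability(stages, n, m):
    """Chain S stages; treated as independent here, which the paper's
    full model refines by conditioning on the preceding stage."""
    r = 1.0
    for a, b in stages:                    # one (a, b) Beta pair per stage
        r *= stage_reliability(n, m, a, b)
    return r

# Example: 3 stages, 2-of-3 voting, Beta parameters chosen arbitrarily.
print(system_reliability([(1.0, 9.0), (1.0, 12.0), (2.0, 18.0)], n=3, m=2))
```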
We present a comparison of correlated failures for multiversion software using community error recovery (CER) and software breeding (SB). In CER, errors are detected and recovered at checkpoints which are inserted in all the versions of the software. SB is analogous to the breeding of plants and animals. In SB, versions consist of loadable modules, and a driver exchanges the modules between versions to detect and eliminate faulty modules. We formulate reliability models to estimate the probability of failure for software using either CER or SB. Our reliability models assume failures in the checkpoints in CER and the driver in SB. We use the beta-binomial distribution for modeling correlated failures of versions, because much of the evidence suggests that the assumption that failures in versions occur independently is not always true. Our comparison indicates that multiversion software using SB is more reliable than that using CER when the probability of failure in the checkpoints in CER or the driver in SB is 10⁻⁷.
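A hedged sketch of the comparison's shape rather than the authors' exact CER/SB models: system failure is taken as correlated majority failure of the versions (beta-binomial, as the abstract states) or failure of the shared infrastructure, i.e. the checkpoints in CER or the driver in SB, at the quoted probability of 10⁻⁷. All numeric parameters are assumptions for illustration.

```python
# Correlated-failure comparison skeleton; under the beta-binomial model
# the pairwise failure correlation between versions is 1/(a + b + 1).
from math import comb, lgamma, exp

def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def p_majority_fail(n, a, b):
    """P(more than n/2 of n versions fail) under a beta-binomial model."""
    return sum(comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))
               for k in range(n // 2 + 1, n + 1))

def p_system_fail(n, a, b, p_infra=1e-7):
    """Versions fail as a correlated majority, OR the checkpoint/driver
    infrastructure itself fails (assumed independent of the versions)."""
    pv = p_majority_fail(n, a, b)
    return 1.0 - (1.0 - pv) * (1.0 - p_infra)

# Example: 3 versions with mean version-failure probability a/(a+b) = 0.01.
print(p_system_fail(3, a=1.0, b=99.0))
```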
As computing capabilities continue to advance, there will be a concurrent rise in the number of both hardware and software faults. These will be caused by the greater volume of more complex software, by the increased number of untested software states, and by more incidents of hardware/software interaction faults as a result of increased hardware speed and density. The traditional software-implemented fault tolerance approaches have been successfully utilized in life-critical systems, such as digital flight controls, where their additional costs can be easily justified. Examples include n-version programming and Recovery Block approaches. However, there is still a need for dependable computing for mission-critical applications as well. Often, these traditional techniques are avoided for mission-critical systems due to the difficulty in justifying their extra up-front development cost. We provide an alternative to the high "sunk cost" of traditional software fault tolerance techniques. The methodology, called Data Fusion Integrity Processes (DFIPs), is a simple yet effective technique for mission-critical systems. In addition, the approach establishes a framework to which other costlier, more extensive traditional techniques can be added. We present details of the DFIP methodology and a DFIP framework for Ada programs. We also briefly discuss development of a Java-based DFIP code-generation system that will enable users to quickly build a DFIP framework in Ada and select reusable DFIP component methods.
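The abstract names DFIP but gives no internals, so the following is only a generic illustration of the kind of lightweight, low-cost integrity check that such mission-critical approaches rely on; the function and its tolerance parameter are invented for the example and are not the DFIP method itself.

```python
# Generic data-fusion integrity check (illustrative, not DFIP):
# fuse redundant readings, reject outliers, degrade gracefully.
def fuse_with_integrity_check(readings, tolerance=0.5):
    """Fuse redundant sensor readings, rejecting outliers by a simple
    median-distance test; fall back to the median if too few agree."""
    readings = sorted(readings)
    median = readings[len(readings) // 2]
    accepted = [r for r in readings if abs(r - median) <= tolerance]
    if len(accepted) < (len(readings) + 1) // 2:   # too few agree: degrade
        return median, False
    return sum(accepted) / len(accepted), True

print(fuse_with_integrity_check([10.1, 10.2, 10.0, 42.0]))  # outlier dropped
```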
Fault tolerant software uses redundancy to improve reliability; but such redundancy requires additional resources and tends to be costly, therefore the redundancy level needs to be optimized. Our optimization models determine the optimal level of redundancy within a software system under the assumption that functionally equivalent software components fail independently. A framework illustrates the tradeoff between the cost of using n-version programming and the improved reliability for a software system. The 2 models deal with single-task software and multitask software. These software systems consist of several modules where each module performs a subtask and, by sequential execution of modules, a major task is performed. Major assumptions are: several versions of each module, each with an estimated cost and reliability, are available; these module versions fail independently. Optimization models are used to select the optimal set of versions for each module such that the system reliability is maximized and total cost remains within budget.
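A small exhaustive-search sketch of the single-task selection problem under the abstract's stated assumptions (independent failures, series execution of modules, majority voting within each module's version set). The version data, budget, and odd-subset restriction are illustrative choices, not the paper's exact formulation.

```python
# Pick a version subset per module to maximize series reliability
# subject to a cost budget; brute force, for small instances only.
from itertools import combinations, product

def majority_reliability(rels):
    """P(majority of independently failing versions succeed)."""
    n = len(rels)
    p = 0.0
    for outcome in product([0, 1], repeat=n):          # 1 = version succeeds
        if sum(outcome) > n / 2:
            q = 1.0
            for r, ok in zip(rels, outcome):
                q *= r if ok else (1.0 - r)
            p += q
    return p

def optimize(modules, budget):
    """modules: one list of (cost, reliability) candidates per module."""
    choices = [[s for k in (1, 3) for s in combinations(v, k)]
               for v in modules]                       # odd-sized subsets
    best = (0.0, None)
    for pick in product(*choices):
        cost = sum(c for sub in pick for c, _ in sub)
        if cost > budget:
            continue
        rel = 1.0
        for sub in pick:                               # series system
            rel *= majority_reliability([r for _, r in sub])
        best = max(best, (rel, pick), key=lambda t: t[0])
    return best

modules = [[(4, 0.95), (6, 0.97), (5, 0.96)],          # module 1 candidates
           [(3, 0.90), (7, 0.99)]]                     # module 2 candidates
print(optimize(modules, budget=18))
```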
A cost model determines system costs for fault-tolerant software systems. The model finds the optimal number of program versions to achieve minimum system cost for the fault-tolerant software techniques: n-version programming, Recovery Block, and Consensus Recovery Block. In this case, all versions, the voter, and the acceptance test have the same reliability. When the parameters for the versions, acceptance test, and voter in the cost function are all equal, the cost of a 3-version system is always optimal, and Cost(CRB) ≪ Cost(RB) ≪ Cost(nVP) for each target reliability, differing by as much as two orders of magnitude in some cases. The cost functions were increasing functions of n. When the parameters are not equal, optimality occurred for other values of n. This was especially the case when the cost exponent for version-1 was larger than the exponents for the other versions and the acceptance test or voter. As the values of the cost exponents for the version reliabilities become larger, a smaller difference was required between the version-1 exponent and the others to produce alternate optima.
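The abstract does not state the cost function's exact form, so the sketch below assumes a commonly used power-law cost of reliability, c(r) = k/(1 − r)^e, and searches odd n for the cheapest n-version configuration meeting a target; the voter reliability and all numbers are placeholders.

```python
# Cheapest NVP size under an ASSUMED cost model (not the paper's).
from math import comb

def version_cost(r, k=1.0, e=1.0):
    return k / (1.0 - r) ** e               # assumed power-law cost

def nvp_reliability(n, r, voter=0.9999):
    """Majority voting over n independent versions of reliability r."""
    maj = sum(comb(n, j) * r**j * (1 - r)**(n - j)
              for j in range(n // 2 + 1, n + 1))
    return voter * maj

def cheapest_nvp(r, target, max_n=15):
    for n in range(1, max_n + 1, 2):        # odd n only, to avoid ties
        if nvp_reliability(n, r) >= target:
            return n, n * version_cost(r)
    return None

print(cheapest_nvp(r=0.95, target=0.999))   # -> (7, 140.0) under these assumptions
```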
To encourage a practical application of the n-version programming (nVP) technique, a design paradigm was proposed and applied in a six-language project. The design paradigm improved the development effort of the n-version software (nVS); however, there were some deficiencies of the design paradigm which led to the leak of a pair of coincident faults. This paper reports on a similar project that used a revised nVP design paradigm. This project reused the revised specification of a real, automatic airplane-landing problem, and involved 40 students at the University of Iowa and Rockwell International. Guided by the refined nVS development paradigm, the students formed 15 independent programming teams to design, program, test, and evaluate the application. The paper identifies and presents: the impact of the paradigm on the software development process; the improvement of the resulting nVS product; the insight, experience, and learning in conducting this project; various testing procedures applied to the program versions; several quantitative measures of the resulting nVS product; and some comparisons with previous projects. The effectiveness of our revised nVP design paradigm in improving software reliability by the provision of fault tolerance is demonstrated. We found that no single software engineering experiment or product can make revolutionary changes to software development practices overnight. Instead, modern software engineering techniques evolve through the refinement of software development processes. This is true for fault-tolerant software techniques. Without a paradigm to guide the development and evaluation of nVS, software projects by nature can easily get out of control. The n-version programming design paradigm offers a documented process model which is subject to readjustment, tailoring, refinement, and improvement. Compared to previous nVS projects, this project (based on this evolving paradigm) confirmed that nVS product improvement could come largely...
This paper describes the software testing and analysis tool ATAC (Automatic Test Analysis for C), developed as a research instrument at Bellcore to measure the effectiveness of testing data. It is also a tool to facilitate the design and evaluation of test cases during software development. To demonstrate the capability and applicability of ATAC, we obtained 12 program versions of a critical industrial application developed in a recent university/industry n-version software project, and used ATAC to analyze and compare coverage of the testing on the program versions. Preliminary results from this investigation show that ATAC is a powerful testing tool to provide testing metrics and quality control guidance for the certification of high-quality software components or systems. In using ATAC to derive high-quality test data, we assume that a good test has a high data-flow coverage score. This hypothesis requires that we show that good data-flow testing implies good software, viz., software with higher reliability. One would hope, for example, that code tested to 85% c-uses coverage would have a lower field-failure rate than similar code tested to 20% c-uses coverage. The establishment of a correlation between good data-flow testing and a low (or zero) rate of field failures is the ultimate and critical test of the usefulness of data-flow coverage testing. We demonstrated by ATAC that the 12 program versions obtained from the U. of Iowa and Rockwell nVS project (a project that has been subjected to a stringent design, implementation, and testing procedure) had very high testing coverage scores for blocks, decisions, c-uses, and p-uses. Results from the field testing (in which only one fault was found) confirmed this belief. The ultimate question that we hope ATAC can help us answer is a typical question for all software reliability engineers: "When is a program considered acceptable?" Software reliability analysts have proposed several models to answer this question...
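ATAC itself instruments C programs; the toy below only illustrates how a data-flow coverage score of the kind ATAC reports (covered c-uses over required c-uses) is computed once the def-use pairs have been extracted. The pair sets here are invented for the example.

```python
# Toy c-use coverage score: a c-use pair is (variable, definition line,
# computational-use line); score = covered pairs / required pairs.
required_cuses = {("x", 3, 7), ("x", 3, 9), ("y", 4, 7), ("y", 8, 9)}

def coverage(executed_pairs):
    covered = required_cuses & executed_pairs
    return 100.0 * len(covered) / len(required_cuses)

test_suite = [
    {("x", 3, 7), ("y", 4, 7)},              # pairs exercised by test 1
    {("x", 3, 9)},                           # pairs exercised by test 2
]
union = set().union(*test_suite)              # suite-level coverage
print(f"c-use coverage: {coverage(union):.0f}%")   # 75% here
```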
Various fault-tolerant software techniques have been proposed in order to meet the reliability requirements of critical systems. This paper evaluates 4 implementations of fault-tolerant software techniques with respect to hardware and design faults. Project participants were divided into 4 groups, each of which developed fault-tolerant software based on a common specification. Each group applied one of the following techniques: n-version programming, recovery block, concurrent error-detection, and algorithm-based fault tolerance. Independent testing and modeling groups within the project then thoroughly analyzed the fault-tolerant software. Using fault-injection tools, the testing group subjected the fault-tolerant software to simulated design and hardware faults. Simulated design-faults included control flow, array boundary, computational, and post/pre increment/decrement software mutations. Simulated hardware-faults included code and data corruption. Data collected from the fault-injection experiment were then mapped into a discrete-time Markov model developed by the modeling group. Based on this model, the effectiveness of each implementation of the fault-tolerant software technique with respect to availability, correctness, and time to failure given an error, is contrasted with measured data. Finally, the model is analyzed with respect to additional figures of merit identified during the modeling process, and the techniques are ranked using an application taxonomy.
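The discrete-time Markov model below is a toy of the kind the modeling group used: fault-injection outcomes become transition probabilities between error-handling states, and figures of merit fall out of the chain. The states and numbers here are invented for illustration, not the paper's measured data.

```python
# Toy discrete-time Markov chain for fault-injection outcomes.
P = {
    "error":     {"detected": 0.80, "latent": 0.15, "failed": 0.05},
    "detected":  {"recovered": 0.95, "failed": 0.05},
    "latent":    {"error": 0.50, "failed": 0.50},
    "recovered": {"recovered": 1.0},            # absorbing
    "failed":    {"failed": 1.0},               # absorbing
}

def p_fail(start, steps=10_000):
    """P(absorbed in 'failed' | start), by forward probability iteration."""
    dist = {s: 0.0 for s in P}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {s: 0.0 for s in P}
        for s, mass in dist.items():
            for t, p in P[s].items():
                nxt[t] += mass * p
        dist = nxt
    return dist["failed"]

print(p_fail("error"))   # probability an injected error leads to failure
```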
A database system must provide timely and accurate outputs. However, it is a well-known fact that application programs typically contain errors, so that it is not always possible to meet these criteria. An important technique that helps combat design and programming errors is n-version programming, where independently-developed versions of a program are executed and a voting algorithm is used to determine the output. This paper addresses concurrency control issues that come into play when n-version programming is employed for building reliable database systems. We show that existing correctness criteria and algorithms for concurrency control are insufficient because a system with multiple versions violates some of the basic assumptions of traditional concurrency control theory. To handle multiple versions that may involve versions with bugs, we develop two notions of correctness. By extending the well-known concurrency control algorithm, 2PL, we also develop two algorithms to meet these criteria. While the first correctness criterion makes stronger assumptions than the second one on the correctness of the multiple versions, the concurrency control algorithm to meet that criterion is also more efficient, thereby permitting higher throughput.
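For orientation, here is the baseline the paper extends: standard two-phase locking (2PL), in which a transaction acquires locks in a growing phase and, once it releases any lock, may only release (shrinking phase). The multi-version extensions themselves are not reproduced; this class and its wait/abort policy are a simplified sketch.

```python
# Standard 2PL skeleton (the baseline algorithm, not the paper's extension).
class TwoPhaseLockingTxn:
    def __init__(self, name, lock_table):
        self.name, self.locks, self.table = name, set(), lock_table
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        holder = self.table.get(item)
        if holder not in (None, self.name):
            raise RuntimeError(f"{item} held by {holder}; must wait/abort")
        self.table[item] = self.name
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True                 # enters shrinking phase
        self.locks.discard(item)
        del self.table[item]

table = {}
t1 = TwoPhaseLockingTxn("T1", table)
t1.lock("x"); t1.lock("y")                    # growing phase
t1.unlock("x")                                # shrinking begins
# t1.lock("z") would now raise: forbidden under 2PL
t1.unlock("y")
```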
Reliability is an important concern in the development of software for modern systems. The authors have performed a study that compares two major approaches to the improvement of software, namely software fault elimination and software fault tolerance, by examining the fault detection (and tolerance, where applicable) of five techniques: run-time assertions, multiversion voting, functional testing augmented by structural testing, code reading by stepwise abstraction, and static data-flow analysis. The study focused on characterizing the sets of faults detected by the techniques and on characterizing the relationships between these sets of faults. Two categories of questions were investigated: 1) comparisons between fault-elimination and fault-tolerance techniques, and 2) comparisons among various testing techniques. The results provide information useful for making decisions about the allocation of project resources, point out strengths and weaknesses of the techniques studied, and suggest directions for future research.
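Two of the five compared techniques are mechanical enough to show in miniature: a run-time assertion and multiversion voting. The three "versions" below are stand-ins with one seeded fault, not artifacts from the study.

```python
# Multiversion voting plus run-time assertions, in miniature.
from collections import Counter
from math import sqrt

def v1(x): return sqrt(x)
def v2(x): return x ** 0.5
def v3(x): return x / 3 if x == 4 else sqrt(x)   # seeded fault at x == 4

def vote(x, versions=(v1, v2, v3)):
    outputs = [round(v(x), 9) for v in versions]  # round before comparing
    value, count = Counter(outputs).most_common(1)[0]
    assert count >= 2, "no majority among versions"     # run-time assertion
    assert value >= 0, "sqrt result must be non-negative"
    return value

print(vote(9.0))   # versions agree: 3.0
print(vote(4.0))   # v3's fault is outvoted: 2.0
```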