ISBN: 9781479971978 (print)
The creation, execution, and maintenance of tests are some of the most expensive tasks in software development. To help reduce the cost, automated test generation tools can be used to assist and guide developers in creating test cases. Yet, the tests that automated tools produce range from simple skeletons to fully executable test suites, hence their complexity and quality vary. This paper compares the complexity and quality of test suites created by sophisticated automated test generation tools to that of developer-written test suites. The empirical study in this paper examines ten real-world programs with existing test suites and applies two state-of-the-art automated test generation tools. The study measures the resulting test suite quality in terms of code coverage and fault-finding capability. On average, manual tests covered 31.5% of the branches while the automated tools covered 31.8% of the branches. In terms of mutation score, the tests generated by automated tools had an average mutation score of 39.8% compared to the average mutation score of 42.1% for manually written tests. Even though automatically created tests often contain more lines of source code than those written by developers, this paper's empirical results reveal that test generation tools can provide value by creating high quality test suites while reducing the cost and effort needed for testing.
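The mutation score reported above is the fraction of seeded faults (mutants) that a test suite detects. The following is a minimal sketch of that calculation only; the function `original`, the hand-written mutants, and the two test cases are hypothetical illustrations and are not taken from the study or from any mutation tool.

```python
from typing import Callable, List, Tuple

Program = Callable[[int, int], int]
TestCase = Tuple[Tuple[int, int], int]  # ((arguments), expected result)


def original(a: int, b: int) -> int:
    """Hypothetical program under test: returns the larger of two integers."""
    return a if a > b else b


# Hypothetical hand-written mutants; each one alters the original slightly.
mutants: List[Program] = [
    lambda a, b: a if a < b else b,   # relational operator replaced
    lambda a, b: a if a >= b else b,  # equivalent mutant: same output for every input
    lambda a, b: b,                   # always returns the second argument
]

tests: List[TestCase] = [((3, 1), 3), ((1, 3), 3)]


def passes(program: Program, suite: List[TestCase]) -> bool:
    """True if the program produces the expected result for every test case."""
    return all(program(*args) == expected for args, expected in suite)


def mutation_score(candidates: List[Program], suite: List[TestCase]) -> float:
    """Fraction of mutants killed, i.e. failing at least one test case."""
    killed = sum(1 for mutant in candidates if not passes(mutant, suite))
    return killed / len(candidates)


if __name__ == "__main__":
    print(f"mutation score: {mutation_score(mutants, tests):.3f}")
```

In this sketch the second mutant behaves identically to the original, so no test can kill it; equivalent mutants of this kind are one reason mutation scores rarely reach 100%.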
ISBN: 9781538638682 (print)
Many studies suggest using coverage concepts, such as branch coverage, as the starting point of testing, while others treat them as the most prominent test quality indicator. Yet the relationship between coverage and fault revelation remains unknown, yielding uncertainty and controversy. Most previous studies rely on the Clean Program Assumption, that a test suite will obtain similar coverage for both faulty and fixed ('clean') program versions. This assumption may appear intuitive, especially for bugs that denote small semantic deviations. However, we present evidence that the Clean Program Assumption does not always hold, thereby raising a critical threat to the validity of previous results. We then conducted a study using a robust experimental methodology that avoids this threat to validity, from which our primary finding is that strong mutation testing has the highest fault revelation of four widely used criteria. Our findings also revealed that fault revelation starts to increase significantly only once relatively high levels of coverage are attained.
ISBN: 9781595936745 (print)
We describe the design, implementation and use of HPC, a toolkit to record and display Haskell Program Coverage. HPC includes tools that instrument Haskell programs to record program coverage, run instrumented programs, and display information derived from coverage data in various ways.
ISBN: 9783319390833; 9783319390826 (print)
The regression test selection problem (selecting a subset of a test suite given a change) has been studied widely over the past two decades. However, the problem has seen little attention when constrained to high-criticality developments, where a "safe" selection of tests needs to be chosen. Further, no practical approaches have been presented for the programming language Ada. In this paper, we introduce an approach to solving the selection problem given a combination of both static and dynamic data for a program and a change-set. We present a change impact analysis for Ada that selects the safe set of tests that need to be re-executed to ensure no regressions. We have implemented the approach in the commercial unit-testing tool VectorCAST, and validated it on a number of open-source examples. On an example of a fully functioning Ada implementation of a DNS server (IRONSIDES), the experimental results show a 97% reduction in test-case execution.
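The core idea of selecting tests from a change-set can be sketched as a set intersection between what each test covers and what was changed. This is an illustrative simplification, not the paper's analysis and not VectorCAST's implementation: it assumes only a dynamic coverage map is available, whereas the abstract describes combining static and dynamic data to guarantee a safe selection. The test names and unit names are hypothetical.

```python
from typing import Dict, List, Set


def select_tests(coverage: Dict[str, Set[str]], changed: Set[str]) -> List[str]:
    """Return every test whose recorded coverage touches a changed unit.

    Assumption: `coverage` maps a test name to the program units it executed
    in a previous run. Relying on dynamic data alone is not safe in general,
    for example when a change makes a test reach code it did not reach before.
    """
    return sorted(test for test, units in coverage.items() if units & changed)


if __name__ == "__main__":
    # Hypothetical coverage data from an earlier test run.
    coverage = {
        "test_parse": {"parser.parse", "lexer.next"},
        "test_send": {"net.send"},
        "test_roundtrip": {"parser.parse", "net.send"},
    }
    changed = {"lexer.next"}

    selected = select_tests(coverage, changed)
    reduction = 1 - len(selected) / len(coverage)
    print(selected, f"reduction: {reduction:.0%}")
```

The reduction figure computed here is the same kind of measure as the one reported in the abstract: the share of test cases that do not need to be re-executed.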
ISBN: 9781467368582 (print)
Code coverage is a method used to gauge the effectiveness of a test setup for a particular design. Even with the advent of modern techniques like functional coverage, code coverage remains an important cog in the wheel of the SoC verification cycle. Through code coverage, design and verification engineers learn which parts of the code are exercised by the test stimulus and, more importantly, which parts have not been exercised. Code coverage for an SoC is usually a long pole in the verification cycle, typically because of long simulator compile and run times. The paper discusses the traditional SoC coverage closure methodology and proposes an automated methodology to efficiently manage coverage-enabled simulation runs.
ISBN: 9781509061617 (print)
Software testing is an important method for assuring software quality. In large-scale, complex software, some errors are easily overlooked if programs are checked only manually. A fully automatic system is therefore needed to rapidly cover all program logic by computing inputs and outputs; in addition, such a system can generate a large number of test cases before manual intervention and can find some software defects, assisting manual work in completing the full set of test cases. This paper proposes combining the static structure of the program with an improved genetic algorithm to implement fully automatic test case generation, improving generation efficiency and code coverage while saving considerable time in manual testing.
ISBN: 9798400700507 (print)
Fuzzing is one of the most popular and practical techniques for security analysis. In this work, we aim to address the critical problem of high-quality input generation with a novel input-aware fuzzing approach called NestFuzz. NestFuzz can universally and automatically model input format specifications and generate valid input. The key observation behind NestFuzz is that the code semantics of the target program always highly imply the required input formats. Hence, NestFuzz applies fine-grained program analysis to understand the input processing logic, especially the dependencies across different input fields and substructures. To this end, we design a novel data structure, namely the Input Processing Tree, and a new cascading dependency-aware mutation strategy to drive the fuzzing. Our evaluation of 20 intensively tested popular programs shows that NestFuzz is effective and practical. NestFuzz outperforms the state-of-the-art fuzzers (AFL, AFLFast, AFL++, MOpt, AFLSmart, WEIZZ, ProFuzzer, and TIFF) in terms of both code coverage and security vulnerability detection. NestFuzz finds 46 vulnerabilities that are both unique and serious. At the time of writing, 39 have been confirmed and 37 have been assigned CVE IDs.
ISBN: 9781728117645 (print)
We present a new method for automated test case generation based on symbolic execution and a custom process of interpolation. The method first identifies program execution paths in order to define a corresponding set of test inputs. It then annotates the program with assertions so as to identify feasible and infeasible cases, the former of which are processed to produce the desired test inputs. The main contribution is that performing symbolic execution using a custom form of interpolation significantly prunes the search space. Our main result is that the set of Modified Condition/Decision Coverage (MC/DC) test cases we produce is optimal.
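For readers unfamiliar with MC/DC, the criterion requires each condition in a decision to be shown to independently affect the decision's outcome. The sketch below checks the unique-cause form of that requirement for a given set of test vectors. It is only a checker written for illustration, with a hypothetical decision and test set; it does not reproduce the symbolic execution or interpolation described in the abstract.

```python
from itertools import combinations
from typing import Callable, Sequence, Set, Tuple

Vector = Tuple[bool, ...]


def mcdc_covered_conditions(
    decision: Callable[..., bool], tests: Sequence[Vector]
) -> Set[int]:
    """Return the indices of conditions shown to independently affect the outcome.

    A condition counts as covered when two test vectors differ in that
    condition only and produce different decision outcomes (unique-cause MC/DC).
    """
    covered: Set[int] = set()
    for first, second in combinations(tests, 2):
        differing = [i for i, (x, y) in enumerate(zip(first, second)) if x != y]
        if len(differing) == 1 and decision(*first) != decision(*second):
            covered.add(differing[0])
    return covered


def decision(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical decision with three conditions."""
    return a and (b or c)


# Hypothetical test set with n + 1 = 4 vectors for n = 3 conditions.
tests = [
    (True, True, False),
    (False, True, False),
    (True, False, False),
    (True, False, True),
]

if __name__ == "__main__":
    covered = mcdc_covered_conditions(decision, tests)
    print(f"conditions covered: {sorted(covered)} of 3")
```

Four vectors suffice here because a decision with n conditions needs at least n + 1 tests for MC/DC, which is why the size of the generated test set is a natural optimality measure.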
ISBN: 9781450367684 (print)
The research community has long recognized a complex interrelationship between fault detection, test adequacy criteria, and test set size. However, there is substantial confusion about whether and how to experimentally control for test set size when assessing how well an adequacy criterion is correlated with fault detection and when comparing test adequacy criteria. Resolving the confusion, this paper makes the following contributions: (1) A review of contradictory analyses of the relationships between fault detection, test adequacy criteria, and test set size. Specifically, this paper addresses the supposed contradiction of prior work and explains why test set size is neither a confounding variable, as previously suggested, nor an independent variable that should be experimentally manipulated. (2) An explication and discussion of the experimental designs of prior work, together with a discussion of conceptual and statistical problems, as well as specific guidelines for future work. (3) A methodology for comparing test adequacy criteria on an equal basis, which accounts for test set size without directly manipulating it through unrealistic stratification. (4) An empirical evaluation that compares the effectiveness of coverage-based testing, mutation-based testing, and random testing. Additionally, this paper proposes probabilistic coupling, a methodology for assessing the representativeness of a set of test goals for a given fault and for approximating the fault-detection probability of adequate test sets.
ISBN: 9781450312042 (print)
Recently, many automatic test generation techniques have been proposed, such as Randoop, Pex and jCUTE. However, the test coverage of these techniques has usually been only around 50-60%, due to several challenges, such as 1) the object mutation problem, where test generators cannot create and/or modify test inputs to desired object states; and 2) the constraint solving problem, where test generators fail to solve path conditions to cover certain branches. By analyzing branches not covered by state-of-the-art techniques, we noticed that these challenges might not be so difficult for humans. To verify this hypothesis, we propose a Puzzle-based Automatic Testing environment (PAT) which decomposes object mutation and complex constraint solving problems into small puzzles for humans to solve. We generated PAT puzzles for two open source projects and asked different groups of people to solve these puzzles. The puzzles could be solved effectively by humans: 231 out of 400 puzzles were solved at an average speed of one minute per puzzle. The 231 puzzle solutions helped cover 534 and 308 additional branches (7.0% and 5.8% coverage improvement) in the two open source projects, on top of the saturated branch coverage achieved by the two state-of-the-art test generation techniques.