Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violations are common and important. How to test t...
详细信息
Multicore hardware is making concurrent programs pervasive. Unfortunately, concurrent programs are prone to bugs. Among different types of concurrency bugs, atomicity violations are common and important. How to test the interleaving space and expose atomicity-violation bugs is an open problem. This paper makes three contributions. First, it designs and evaluates a hierarchy of four interleaving coverage criteria using 105 real-world concurrency bugs. This study finds a coverage criterion (Unserializable Interleaving coverage) that balances the complexity and the capability of exposing atomicity-violation bugs well. Second, it studies stress testing to understand why this common practice cannot effectively expose atomicity-violation bugs from the perspective of unserializable interleaving coverage. Third, it designs CTrigger following the unserializable interleaving coverage criterion. CTrigger uses trace analysis to identify feasible unserializable interleavings, and then exercises low-probability interleavings to expose atomicity-violation bugs. We evaluate CTrigger with real-world atomicity-violation bugs from seven applications. CTrigger efficiently exposes these bugs within 1-235 seconds, two to four orders of magnitude faster than stress testing. Without CTrigger, some of these bugs do not manifest even after seven days of stress testing. Furthermore, once a bug is exposed, CTrigger can reliably reproduce it, usually within 5 seconds, for diagnosis.
Graphical user interfaces (GUIs) are used as front ends to most of today's software applications. The event-driven nature of GUIs presents new challenges for testing. One important challenge is test suite reductio...
详细信息
Graphical user interfaces (GUIs) are used as front ends to most of today's software applications. The event-driven nature of GUIs presents new challenges for testing. One important challenge is test suite reduction. Conventional reduction techniques/tools based on static analysis are not easily applicable due to the increased use of multilanguage GUI implementations, callbacks for event handlers, virtual function calls, reflection, and multithreading. Moreover, many existing techniques ignore code in libraries and fail to consider the context in which event handlers execute. Consequently, they yield GUI test suites with seriously impaired fault-detection abilities. This paper presents a reduction technique based on the call-stack coverage criterion. Call stacks may be collected for any executing program with very little overhead. Empirical studies in this paper compare reduction based on call-stack coverage to reduction based on line, method, and event coverage, including variations that control for the size and optional consideration of library methods. These studies show that call-stack-based reduction provides unique trade-offs between the reduction in test suite size and the loss of fault detection effectiveness, which may be valuable in practice. Additionally, an analysis of the relationship between coverage requirements and fault-revealing test cases is presented.
The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all pos...
详细信息
The empirical assessment of test techniques plays an important role in software testing research. One common practice is to seed faults in subject software, either manually or by using a program that generates all possible mutants based on a set of mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults, thus facilitating the statistical analysis of fault detection effectiveness of test suites;however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. Focusing on four common control and data flow criteria ( Block, Decision, C-Use, and P-Use), this paper investigates this important issue based on a middle size industrial program with a comprehensive pool of test cases and known faults. Based on the data available thus far, the results are very consistent across the investigated criteria as they show that the use of mutation operators is yielding trustworthy results: Generated mutants can be used to predict the detection effectiveness of real faults. Applying such a mutation analysis, we then investigate the relative cost and effectiveness of the above-mentioned criteria by revisiting fundamental questions regarding the relationships between fault detection, test suite size, and control/data flow coverage. Although such questions have been partially investigated in previous studies, we can use a large number of mutants, which helps decrease the impact of random variation in our analysis and allows us to use a different analysis approach. Our results are then compared with published studies, plausible reasons for the differences are provided, and the research leads us to suggest a way to tune the mutation analysis process to possible differences in fault detection probabilities in a specific environment.
Functional testing of applications that process the information stored in databases often requires a careful design of the test database. The larger the test database, the more difficult it is to develop and maintain ...
详细信息
Functional testing of applications that process the information stored in databases often requires a careful design of the test database. The larger the test database, the more difficult it is to develop and maintain tests as well as to load and reset the test data. This paper presents an approach to reduce a database with respect to a set of SQL queries and a coverage criterion. The reduction procedures search the rows in the initial database that contribute to the coverage in order to find a representative subset that satisfies the same coverage as the initial database. The approach is automated and efficiently executed against large databases and complex queries. The evaluation is carried out over two real life applications and a well-known database benchmark. The results show a very large degree of reduction as well as scalability in relation to the size of the initial database and the time needed to perform the reduction.
Statistical testing has been shown to be more efficient at detecting faults in software than other methods of dynamic testing such as random and structural testing. test data are generated by sampling from a probabili...
详细信息
Statistical testing has been shown to be more efficient at detecting faults in software than other methods of dynamic testing such as random and structural testing. test data are generated by sampling from a probability distribution chosen so that each element of the software's structure is exercised with a high probability. However, deriving a suitable distribution is difficult for all but the simplest of programs. This paper demonstrates that automated search is a practical method of finding near-optimal probability distributions for real-world programs, and that test sets generated from these distributions continue to show superior efficiency in detecting faults in the software.
暂无评论