testing and debugging the implementation of static analysis is a challenging task, often involving significant manual effort from domain experts in a tedious and unprincipled process. In this work, we propose an appro...
详细信息
ISBN:
(纸本)9781665457019
testing and debugging the implementation of static analysis is a challenging task, often involving significant manual effort from domain experts in a tedious and unprincipled process. In this work, we propose an approach that greatly improves the automation of this process for static analyzers with configuration options. At the core of our approach is the novel adaptation of the theoretical partial order relations that exist between these options to reason about the correctness of actual results from running the static analyzer with different configurations. This allows for automated testing of static analyzers with clearly defined oracles, followed by automated delta debugging, even in cases where ground truths are not defined over the input programs. To apply this approach to many static analysis tools, we design and implement ECSTATIC, an easy-to-extend, open-source framework. We have integrated four popular static analysis tools, SOOT, WALA, DOOP, and FlowDroid, into ECSTATIC. Our evaluation shows running ECSTATIC detects 74 partial order bugs in the four tools and produces reduced bug-inducing programs to assist debugging. We reported 42 bugs;in all cases where we received responses, the tool developers confirmed the reported tool behavior was unintended. So far, three bugs have been fixed and there are ongoing discussions to fix more.
The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the surv...
详细信息
ISBN:
(纸本)9781450370431
The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the abundance of testing and debugging tools for Java. Thus, there is a need to push research on tools that can help Python developers. One factor that contributed to the rapid growth of Java testing and debugging tools is the availability of benchmarks. A popular benchmark is the Defects4J benchmark;its initial version contained 357 real bugs from 5 real-world Java programs. Each bug comes with a test suite that can expose the bug. Defects4J has been used by hundreds of testing and debugging studies and has helped to push the frontier of research in these directions. In this project, inspired by Defects4J, we create another benchmark database and tool that contain 493 real bugs from 17 real-world Python programs. We hope our benchmark can help catalyze future work on testing and debugging tools that work on Python programs.
Aspect Oriented Programming (AOP) advocates the notion of aspects to encapsulate crosscutting concerns. A concern is a behavior in a computer program and is said to be crosscutting if the module(s) that address the be...
详细信息
ISBN:
(纸本)9781467361613
Aspect Oriented Programming (AOP) advocates the notion of aspects to encapsulate crosscutting concerns. A concern is a behavior in a computer program and is said to be crosscutting if the module(s) that address the behavior are scattered and tangled with other modules of the system. In this paper, we investigate the possibility of using AOP for software testing and non-invasive debugging. We have identified some crosscutting concerns like access control, logging, performance, tracing, etc., and developed a framework based on java bytecode instrumentation technique to inject these crosscutting concerns into the compiled code. The framework is available as a service, i.e., the service takes the required code as input and produces the changed code as output that contains the appropriate aspects. Thus, it is argued that the framework can be the basis for implementing the notion of "Enabling testing as a Service".
We present an integrated method for program proving, testing, and debugging. Using the concept of metamorphic relations, we select necessary properties for target programs. For programs where global symbolic evaluatio...
详细信息
We present an integrated method for program proving, testing, and debugging. Using the concept of metamorphic relations, we select necessary properties for target programs. For programs where global symbolic evaluation can be conducted and the constraint expressions involved can be solved, we can either prove that these necessary conditions for program correctness are satisfied or identify all inputs that violate the conditions. For other programs, our method can be converted into a symbolic-testing approach. Our method extrapolates from the correctness of a program for tested inputs to the correctness of the program for related untested inputs. The method supports automatic debugging through the identification of constraint expressions that reveal failures.
GenProg implemented a novel method for automatically evolving patches to repair test suite failures in legacy C programs. It combined insights from genetic programming and software engineering. Many of the original de...
详细信息
GenProg implemented a novel method for automatically evolving patches to repair test suite failures in legacy C programs. It combined insights from genetic programming and software engineering. Many of the original design decisions in GenProg were ultimately less important than its impact as an existence proof. In particular, it demonstrated that useful patches for non-trivial bugs and programs could be generated automatically. Since the original publication, research in automated program repair has expanded to consider and evaluate many new methods, contexts and defects. As code synthesis and debugging techniques based on machine learning have become popular, it is informative to consider how views on perennial issues in program repair have changed, or remained static, over time. This retrospective discusses the issues of repair quality (including the role of tests), use cases for automated repairs (including the role of humans), and why these approaches work at all.
Ensuring the long-term reproducibility of data analyses requires results stability tests to verify that analysis results remain within acceptable variation bounds despite inevitable software updates and hardware evolu...
详细信息
Ensuring the long-term reproducibility of data analyses requires results stability tests to verify that analysis results remain within acceptable variation bounds despite inevitable software updates and hardware evolutions. This paper introduces a numerical variability approach for results stability tests, which determines acceptable variation bounds using random rounding of floating-point calculations. By applying the resulting stability test to fMRIPrep, a widely-used neuroimaging tool, we show that the test is sensitive enough to detect subtle updates in image processing methods while remaining specific enough to accept numerical variations within a reference version of the application. This result contributes to enhancing the reliability and reproducibility of data analyses by providing a robust and flexible method for stability testing.
In recent years, to maximize the value of software testing and analysis, we have proposed the methodology of cooperative software testing and analysis (in short as cooperative testing and analysis) to enable testing...
详细信息
In recent years, to maximize the value of software testing and analysis, we have proposed the methodology of cooperative software testing and analysis (in short as cooperative testing and analysis) to enable testing and analysis tools to cooperate with their users (in the form of tool-human cooperation), and enable one tool to cooperate with another tool (in the form of tool-tool cooperation). Such cooperations are motivated by the observation that a tool is typically not powerful enough to address complications in testing or analysis of complex real-world software, and the tool user or another tool may be able to help out some problems faced by the tool. To enable tool-human or tool-tool cooperation, effective mechanisms need to be developed 1) for a tool to communicate problems faced by the tool to the tool user or another tool, and 2) for the tool user or another tool to assist the tool to address the problems. Such methodology of cooperative testing and analysis forms a new research frontier on synergistic cooperations between humans and tools along with cooperations between tools and tools. This article presents recent example advances and challenges on cooperative testing and analysis.
Many software systems are developed in a number of consecutive releases. In each release not only new code is added but also existing code is often modified. In this study we show that the modified code can be an impo...
详细信息
Many software systems are developed in a number of consecutive releases. In each release not only new code is added but also existing code is often modified. In this study we show that the modified code can be an important source of faults. Faults are widely recognized as one of the major cost drivers in software projects. Therefore, we look for methods that improve the fault detection in the modified code. We propose and evaluate a number of prediction models that increase the efficiency of fault detection. To build and evaluate our models we use data collected from two large telecommunication systems produced by Ericsson. We evaluate the performance of our models by applying them both to a different release of the system than the one they are built on and to a different system. The performance of our models is compared to the performance of the theoretical best model, a simple model based on size, as well as to analyzing the code in a random order (not using any model). We find that the use of our models provides a significant improvement over not using any model at all and over using a simple model based on the class size. The gain offered by our models corresponds to 38-57% of the theoretical maximum gain.
The increasing reliance put on networked computer systems demands higher levels of dependability. This is even more relevant as new threats and forms of attack are constantly being revealed, compromising the security ...
详细信息
The increasing reliance put on networked computer systems demands higher levels of dependability. This is even more relevant as new threats and forms of attack are constantly being revealed, compromising the security of systems. This paper addresses this problem by presenting an attack injection methodology for the automatic discovery of vulnerabilities in software components. The proposed methodology, implemented in AJECT, follows an approach similar to hackers and security analysts to discover vulnerabilities in network-connected servers. AJECT uses a specification of the server's communication protocol and predefined test case generation algorithms to automatically create a large number of attacks. Then, while it injects these attacks through the network, it monitors the execution of the server in the target system and the responses returned to the clients. The observation of an unexpected behavior suggests the presence of a vulnerability that was triggered by some particular attack (or group of attacks). This attack can then be used to reproduce the anomaly and to assist the removal of the error. To assess the usefulness of this approach, several attack injection campaigns were performed with 16 publicly available POP and IMAP servers. The results show that AJECT could effectively be used to locate vulnerabilities, even on well-known servers tested throughout the years.
Given a distributed computation and a global predicate, predicate detection involves determining whether there exists at least one consistent cut ( or global state) of the computation that satisfies the predicate. On ...
详细信息
Given a distributed computation and a global predicate, predicate detection involves determining whether there exists at least one consistent cut ( or global state) of the computation that satisfies the predicate. On the other hand, computation slicing is concerned with computing the smallest subcomputation ( with the least number of consistent cuts) that contains all consistent cuts of the computation satisfying the predicate. In this paper, we investigate the relationship between predicate detection and computation slicing and show that the two problems are actually equivalent. Specifically, given an algorithm to detect a predicate b in a computation C, we derive an algorithm to compute the slice of C with respect to b. The time complexity of the ( derived) slicing algorithm is O(n\E\T), where n is the number of processes, E is the set of events, and O(T) is the time complexity of the detection algorithm. We discuss how the "equivalence" result of this paper can be utilized to derive a faster algorithm for solving the general predicate detection problem in many cases. Slicing algorithms described in our earlier papers are all offline in nature. In this paper, we also present two online algorithms for computing the slice. The first algorithm can be used to compute the slice for a general predicate. Its amortized time complexity is O(n(c + n)T) per event, where c is the average concurrency in the computation and O(T) is the time complexity of the detection algorithm. The second algorithm can be used to compute the slice for a regular predicate. Its amortized time complexity is only O(n(2)) per event.
暂无评论