Competitive games between agents arise in many critical applications, such as military unmanned aerial vehicles. Testing these agents is urgent in order to reduce the significant losses caused by their failures. Existing studies mainly construct a testing agent that competes with the target agent to induce its failures. These approaches usually focus on a single task and therefore require much more time for multi-task testing. However, if the previously tested tasks (source tasks) and the task to be tested (target task) share similar agents or task objectives, the transferable knowledge in the source tasks can potentially increase the effectiveness of testing in the target task. We propose Demo2Test for conducting transfer testing of agents in competitive environments, i.e., leveraging demonstrations of failure scenarios from the source task to boost testing effectiveness in the target task. It trains a testing agent with demonstrations and incorporates action perturbation at key states to balance the number of revealed failures and their diversity. We conduct experiments in the simulated robotics competitive environments of MuJoCo. The results indicate that Demo2Test outperforms the best-performing baseline, with improvements ranging from 22.38% to 87.98% in the number of discovered failure scenarios and from 12.69% to 60.98% in their diversity.
software developers can only obtain a very small amount of information from the individual failure-causing inputs, which makes debugging difficult. Therefore, it is necessary to explore additional failure-causing inputs (failure regions) using the known failure-causing inputs. To identify the failure region accurately and efficiently, we propose a novel two-stage search algorithm, TS-FRI. In the initial exploration stage, a round-robin search identifies several boundary failure-causing points, and the failure region's centroid is estimated. During the main search stage, the boundary failure-causing points are identified through iterative division of the input domain with an equally sized partitioning strategy. As a result, the boundary points are as dispersed as possible around the failure-region boundary, and the polytope formed by the points approximates the failure region (e.g., a polygon in two dimensions). The proposed algorithm is validated through simulation and empirical analysis. The experimental results show that the accuracy of TS-FRI is at least comparable to the best accuracy of the three compared algorithms, and can be ten times better. In addition, TS-FRI takes only a quarter of the computation time and half the failure-validation cost of the other algorithms.
We describe and evaluate spectrum-based methods aimed at finding faults in context-free grammars. In their basic form, they take as input a test suite and a parser for the grammar that is modified to collect grammar spectra (i.e., the sets of grammar elements used in attempts to parse the individual test cases), and return as output a ranked list of suspicious elements. We define grammar spectra suitable for localizing faults on the level of the grammar rules (i.e., rule spectra) and the rules' individual symbols (i.e., item spectra), respectively. We show how both types of grammar spectra can be collected by both LL and LR parsers, and how the JavaCC, ANTLR, and CUP parser generators can be modified and used to automate the collection of the grammar spectra. We also show how grammar spectra can be synthesized directly from test cases derived from a grammar, and how such synthetic spectra can be used to localize differences between a grammar and a black-box system under test. We first evaluate our approach over a large number of medium-sized single-fault grammars, which we constructed by fault seeding from a common origin grammar. At the rule level, it ranks the rules containing the seeded faults within the top five rules in about 40%-70% of the cases, depending on the applied parsing technique, test suite, and ranking metric, and pinpoints them (i.e., correctly identifies them as the unique most suspicious rule) in about 10%-30% of the cases, with significantly better results for the synthetic spectra. At the item level, our approach remains remarkably effective despite the larger number of possible locations, provided it is coupled with a simple tie-breaking strategy that prefers items with the right-most designated position over other items from the same rules in a tie. It typically ranks the seeded faults within the top five positions in about 30%-60% of the cases, and pinpoints them in about 15%-40% of the cases. This specialized item-level localization also
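The rule-level ranking described in this abstract can be illustrated with a minimal sketch. The abstract does not name the ranking metrics it uses, so the Ochiai formula below is an assumption chosen because it is a widely used spectrum-based metric; the function name `rank_rules` and the toy rule names are hypothetical.

```python
import math


def rank_rules(spectra, outcomes):
    """Rank grammar rules by Ochiai suspiciousness (assumed metric).

    spectra:  list of sets, one per test case, holding the rules
              used while attempting to parse that test case.
    outcomes: list of bools, True if the test case failed.
    """
    total_failed = sum(outcomes)
    rules = set().union(*spectra) if spectra else set()
    scores = {}
    for rule in rules:
        # ef / ep: failing / passing test cases whose spectrum contains the rule
        ef = sum(1 for s, failed in zip(spectra, outcomes) if failed and rule in s)
        ep = sum(1 for s, failed in zip(spectra, outcomes) if not failed and rule in s)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[rule] = ef / denom if denom else 0.0
    # Most suspicious first; ties broken alphabetically for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy rule spectra for three test cases (hypothetical rule names)
spectra = [{"expr", "term"}, {"expr", "factor"}, {"expr", "term", "factor"}]
outcomes = [False, True, True]
ranked = rank_rules(spectra, outcomes)
```

In this toy input, `factor` appears in every failing test case and in no passing one, so it is ranked first. The item-level tie-breaking strategy mentioned in the abstract is not modeled here.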
In grammar-based testing, the test suites that drive the system under test are typically constructed from a given context-free grammar through a set of derivations that jointly satisfy some coverage criterion. In this paper, we describe and evaluate a new algorithm that instead constructs test suites from a set of valid paths that cover all edges in a labeled directed graph corresponding to an LR-automaton that accepts the language of the grammar. Vertices in this graph correspond to states in the LR-automaton; two vertices are connected by an edge iff the top of the LR-automaton's stack can change from one state to the other, either by shifting a terminal or non-terminal symbol (push edges), or by reducing with a grammar rule (pop edges). The algorithm constructs a unique reduction path for each pop edge in the graph. These reduction paths are recursively embedded into each other, and any unresolved non-terminal push edges are substituted by shortest derivations for the non-terminal symbol. The algorithm can work with different types of LR-automata, including LR(0)- and LR(1)-automata, and can successfully generate a test suite from an LR-graph even if the underlying LR-automaton construction leads to shift/reduce or reduce/reduce conflicts. The algorithm only constructs valid paths over the LR-graphs that correspond to sentences in the language and thus generates only positive tests. We therefore also describe mutations to the positive paths that are guaranteed to generate negative tests without needing any further verification by an oracle. Our algorithm is substantially more efficient than an earlier algorithm that explores LR-graphs with two consecutive breadth-first graph traversals, and our experimental evaluation shows that it scales to large production-quality grammars. It is robust against random choices made to resolve ambiguity in the construction of the tests, while the code coverage of the different test suite variants is relatively uniform. Finall
Quantum Computing (QC) promises computational speedup over classic computing. However, noise exists in near-term quantum computers. Quantum software testing (for gaining confidence in quantum software's correctness) is inevitably impacted by noise, i.e., it is impossible to know if a test case failed due to noise or real faults. Existing testing techniques test quantum programs without considering noise, i.e., by executing tests on ideal quantum computer simulators. Consequently, they are not directly applicable to test quantum software on real quantum computers or noisy simulators. Thus, we propose a noise-aware approach (named QOIN) to alleviate the noise effect on test results of quantum programs. QOIN employs machine learning techniques (e.g., transfer learning) to learn the noise effect of a quantum computer and filter it from a program's outputs. Such filtered outputs are then used as the input to perform test case assessments (determining the passing or failing of a test case execution against a test oracle). We evaluated QOIN on IBM's 23 noise models, Google's two available noise models, and Rigetti's Quantum Virtual Machine, with six real-world and 800 artificial programs. We also generated faulty versions of these programs to check if a failing test case execution can be determined under noise. Results show that QOIN can reduce the noise effect by more than 80% on most noise models. We used an existing test oracle to evaluate QOIN's effectiveness in quantum software testing. The results showed that QOIN attained scores of 99%, 75%, and 86% for precision, recall, and F1-score, respectively, for the test oracle across six real-world programs. For artificial programs, QOIN achieved scores of 93%, 79%, and 86% for precision, recall, and F1-score, respectively. This highlights QOIN's effectiveness in le
Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This article introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, such as video metadata; it depends solely on video content. Leveraging the zero-shot transfer capabilities of the contrastive language-image pretraining model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit quality assurance teams in finding and reproducing bugs.
Bugs in popular distributed protocol implementations have been the source of many downtimes in popular internet services. We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning. Since the natural reward structure is very sparse, the key to successful exploration in reinforcement learning is reward augmentation. We show two different techniques that build on one another. First, we provide a decaying exploration bonus based on the discovery of new states: the reward decays as the same state is visited multiple times. The exploration bonus captures the intuition from coverage-guided fuzzing of prioritizing new coverage points; in contrast to other schemes, we show that taking the maximum of the bonus and the Q-value leads to more effective exploration. Second, we provide waypoints to the algorithm as a sequence of predicates that capture interesting semantic scenarios. Waypoints exploit designer insight about the protocol and guide the exploration to "interesting" parts of the state space. Our reward structure ensures that new episodes can reliably get to deep interesting states even without execution caching. We have implemented our algorithm in Go. Our evaluation on three large benchmarks (RedisRaft, Etcd, and RSL) shows that our algorithm can significantly outperform baseline approaches in terms of coverage and bug finding.
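The first technique in this abstract, a decaying exploration bonus combined with the Q-value by taking a maximum, can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the 1/(n+1) decay schedule, the update target `max(bonus, gamma * best_next)`, and the class name `BonusMaxQ` are hypothetical, and the sketch is in Python rather than the authors' Go implementation. Waypoints are not modeled.

```python
from collections import defaultdict


class BonusMaxQ:
    """Tabular Q-learning sketch with a visit-count exploration bonus."""

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.q = defaultdict(float)     # (state, action) -> Q-value
        self.visits = defaultdict(int)  # state -> number of visits so far

    def bonus(self, state):
        # Assumed decay schedule: 1.0 for an unseen state, then 1/2, 1/3, ...
        return 1.0 / (self.visits[state] + 1)

    def update(self, state, action, next_state, next_actions):
        b = self.bonus(next_state)
        self.visits[next_state] += 1
        best_next = max(
            (self.q[(next_state, a)] for a in next_actions), default=0.0
        )
        # Assumed combination: take the maximum of the bonus and the
        # discounted Q-value instead of adding them together.
        target = max(b, self.gamma * best_next)
        old = self.q[(state, action)]
        self.q[(state, action)] = old + self.alpha * (target - old)
        return self.q[(state, action)]


agent = BonusMaxQ()
first = agent.update("s0", "send", "s1", ["send", "drop"])
second = agent.update("s0", "send", "s1", ["send", "drop"])
```

In this sketch the first transition into `s1` earns the full bonus, and repeated visits pull the target down as the bonus decays, so the learner is nudged toward states it has seen less often.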
ISBN: (print) 9789819735587; 9789819735594
Software testing is the step of software engineering that requires the most time and resources. Techniques for test case reduction are used to condense the test suite, saving time and resources in the process. The basic goal of test case reduction is to get rid of useless test cases while still ensuring that the code being tested is sufficiently covered by the test suite. In this study, a model is proposed to cut down on complexity and the number of test cases. We employ a moderated nature-inspired meta-heuristic algorithm, the Firefly Algorithm, to identify essential strings of test cases and eliminate irrelevant test instances. In our research, we used some synthetic models and succeeded in reducing the number of test cases by 36.17%.
ISBN: (print) 9798400710797
Kernel fuzzers rely heavily on program mutation to automatically generate new test programs based on existing ones. In particular, program mutation can alter the test's control and data flow inside the kernel by inserting new system calls, changing the values of call arguments, or performing other program mutations. However, due to the complexity of the kernel code and its user-space interface, finding an effective mutation that leads to a desired outcome, such as increasing the coverage or reaching a target code location, is extremely difficult, even with the widespread use of manually crafted heuristics. This work proposes SNOWPLOW, a kernel fuzzer that uses a learned white-box test mutator to enhance test mutation. The core of SNOWPLOW is an efficient machine learning model that can learn to predict promising mutations given the test program to mutate, its kernel code coverage, and the desired coverage. SNOWPLOW is demonstrated on argument mutations of the kernel tests, and evaluated on recent Linux kernel releases. When fuzzing the kernels for 24 hours, SNOWPLOW shows a significant speedup in discovering new coverage (4.8x to 5.2x) and achieves higher overall coverage (7.0% to 8.6%). In a 7-day fuzzing campaign, SNOWPLOW discovers 86 previously unknown crashes. Furthermore, the learned mutator is shown to accelerate directed kernel fuzzing by reaching 19 target code locations 8.5x faster and two additional locations that are missed by the state-of-the-art directed kernel fuzzer.