Competitive games between agents arise in many critical applications, such as military unmanned aerial vehicles. Testing these agents is urgent to reduce the significant losses caused by their failures. Existing studies mainly construct a testing agent that competes with the target agent to induce its failures. These approaches usually focus on a single task and therefore require much more time for multi-task testing. However, if the previously tested tasks (source tasks) and the task to be tested (target task) share similar agents or task objectives, transferable knowledge from the source tasks can potentially increase testing effectiveness on the target task. We propose Demo2Test for transfer testing of agents in competitive environments, i.e., leveraging demonstrations of failure scenarios from the source task to boost testing effectiveness in the target task. Demo2Test trains a testing agent from demonstrations and incorporates action perturbation at key states to balance the number of revealed failures and their diversity. We conduct experiments in the simulated robotic competitive environments of MuJoCo. The results indicate that Demo2Test outperforms the best-performing baseline, with improvements ranging from 22.38% to 87.98% in the number of discovered failure scenarios and from 12.69% to 60.98% in their diversity.
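As a rough illustration of the action-perturbation idea mentioned in this abstract, the sketch below perturbs the testing agent's action only at states flagged as "key" during a rollout. It assumes a classic Gym-style environment interface, and every function and parameter name here is an assumption for illustration, not Demo2Test's actual API.

```python
# Hypothetical sketch: perturb the testing agent's action at key states to trade
# off the number of revealed failures against their diversity.
import numpy as np

def rollout_with_perturbation(env, policy, key_state_fn, noise_scale=0.2, seed=0):
    """Run one episode; at states flagged as key, add Gaussian noise to the action."""
    rng = np.random.default_rng(seed)
    obs, done, trajectory = env.reset(), False, []
    while not done:
        action = policy(obs)                      # testing agent trained from demonstrations
        if key_state_fn(obs):                     # e.g., states near critical decision points
            action = action + rng.normal(0.0, noise_scale, size=np.shape(action))
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, info.get("failure", False)))  # "failure" flag assumed
    return trajectory
```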
Deep Learning models have become an integrated component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model architecture and hyperparameters for a given task. Like other software systems, existing AutoML systems have shortcomings in their design. We identify two common and severe shortcomings in AutoML: the performance issue (i.e., searching for the desired model takes an unreasonably long time) and the ineffective search issue (i.e., AutoML systems are not able to find a sufficiently accurate model). After analyzing the AutoML workflow, we observe that existing AutoML systems overlook potential opportunities in the search space, search method, and search feedback, which leads to these performance and ineffective search issues. Based on our analysis, we design and implement DREAM, an automatic and general-purpose tool that alleviates and repairs the shortcomings of AutoML pipelines and conducts effective model searches for diverse tasks. It monitors the AutoML process to collect detailed feedback and automatically repairs shortcomings by expanding the search space and leveraging a feedback-driven search strategy. Our evaluation results show that DREAM can be applied to two state-of-the-art AutoML pipelines and effectively and efficiently repairs their shortcomings.
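A minimal sketch of a feedback-driven search loop with search-space expansion, in the spirit of what this abstract describes; it is not DREAM's implementation, and all names, feedback keys, and heuristics are illustrative assumptions.

```python
# Toy feedback-driven model search: propose trials, track feedback, and expand the
# search space when several trials pass without improvement.
import random

def feedback_driven_search(search_space, evaluate, budget=20, patience=5):
    """search_space: dict of hyperparameter -> list of candidate values.
    evaluate: callable(config) -> (score, feedback dict)."""
    best, since_improve = (None, float("-inf")), 0
    for _ in range(budget):
        config = {k: random.choice(v) for k, v in search_space.items()}  # propose a trial
        score, feedback = evaluate(config)
        if score > best[1]:
            best, since_improve = (config, score), 0
        else:
            since_improve += 1
        if since_improve >= patience:            # search looks stuck: widen the space
            for key, extra in feedback.get("expand", {}).items():  # hypothetical feedback key
                search_space.setdefault(key, []).extend(extra)
            since_improve = 0
    return best
```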
ISBN (print): 9798400710797
Kernel fuzzers rely heavily on program mutation to automatically generate new test programs based on existing ones. In particular, program mutation can alter the test's control and data flow inside the kernel by inserting new system calls, changing the values of call arguments, or performing other program mutations. However, due to the complexity of the kernel code and its user-space interface, finding effective mutations that lead to a desired outcome, such as increasing coverage or reaching a target code location, is extremely difficult, even with the widespread use of manually crafted heuristics. This work proposes SNOWPLOW, a kernel fuzzer that uses a learned white-box test mutator to enhance test mutation. The core of SNOWPLOW is an efficient machine learning model that learns to predict promising mutations given the test program to mutate, its kernel code coverage, and the desired coverage. SNOWPLOW is demonstrated on argument mutations of kernel tests and evaluated on recent Linux kernel releases. When fuzzing the kernels for 24 hours, SNOWPLOW shows a significant speedup in discovering new coverage (4.8x to 5.2x) and achieves higher overall coverage (7.0% to 8.6%). In a 7-day fuzzing campaign, SNOWPLOW discovers 86 previously unknown crashes. Furthermore, the learned mutator accelerates directed kernel fuzzing, reaching 19 target code locations 8.5x faster as well as two additional locations missed by the state-of-the-art directed kernel fuzzer.
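An illustrative sketch of the learned white-box mutator idea described above: score candidate argument mutations with a trained classifier given coverage features, then try the highest-ranked ones first. The featurization, mutation encoding, and model interface are assumptions (a scikit-learn-style classifier), not SNOWPLOW's code.

```python
# Rank candidate argument mutations by a learned estimate of how likely they are
# to reach the desired coverage.
import numpy as np

def rank_argument_mutations(model, cur_cov, target_cov, candidates):
    """candidates: list of (call_index, arg_index, new_value) tuples.
    model: binary classifier exposing predict_proba (e.g., scikit-learn style)."""
    feats = np.array([featurize(cur_cov, target_cov, m) for m in candidates])
    scores = model.predict_proba(feats)[:, 1]          # P(mutation is promising)
    return [m for _, m in sorted(zip(scores, candidates), key=lambda p: -p[0])]

def featurize(cur_cov, target_cov, mutation):
    call_idx, arg_idx, new_value = mutation
    return [call_idx, arg_idx, float(int(new_value) % 997),
            len(cur_cov), len(target_cov - cur_cov)]   # crude coverage-gap features
```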
Service choreographies present numerous engineering challenges, particularly with respect to testing activities, which traditional design-time approaches cannot properly address. A proposed online testing solution offers a powerful, extensible framework for effectively assessing service compositions, leading to a more trustworthy and reliable service ecosystem.
Quantum Computing (QC) promises computational speedup over classic computing. However, noise exists in near-term quantum computers. Quantum software testing (for gaining confidence in quantum software's correctness) is inevitably impacted by noise, i.e., it is impossible to know whether a test case failed due to noise or real faults. Existing testing techniques test quantum programs without considering noise, i.e., by executing tests on ideal quantum computer simulators. Consequently, they are not directly applicable to testing quantum software on real quantum computers or noisy simulators. Thus, we propose a noise-aware approach (named QOIN) to alleviate the noise effect on test results of quantum programs. QOIN employs machine learning techniques (e.g., transfer learning) to learn the noise effect of a quantum computer and filter it from a program's outputs. Such filtered outputs are then used as the input to perform test case assessments (determining the passing or failing of a test case execution against a test oracle). We evaluated QOIN on IBM's 23 noise models, Google's two available noise models, and Rigetti's Quantum Virtual Machine, with six real-world and 800 artificial programs. We also generated faulty versions of these programs to check whether a failing test case execution can be determined under noise. Results show that QOIN can reduce the noise effect by more than 80% on most noise models. We used an existing test oracle to evaluate QOIN's effectiveness in quantum software testing. The results showed that QOIN attained scores of 99%, 75%, and 86% for precision, recall, and F1-score, respectively, for the test oracle across the six real-world programs. For artificial programs, QOIN achieved scores of 93%, 79%, and 86% for precision, recall, and F1-score, respectively. This highlights QOIN's effectiveness in learning and filtering the noise effect from quantum program outputs.
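A toy stand-in for the "learn the noise effect and filter it from program outputs" idea above: QOIN itself uses transfer learning, whereas this simplified sketch fits a linear noise map from calibration runs and approximately inverts it, so treat every detail as an assumption rather than the paper's method.

```python
# Fit a noise map from (ideal, noisy) output distributions of calibration circuits,
# then approximately invert it to filter noise from a measured distribution.
import numpy as np

def learn_noise_map(ideal_dists, noisy_dists):
    """Least-squares fit of M such that noisy ≈ ideal @ M over calibration runs."""
    M, *_ = np.linalg.lstsq(np.array(ideal_dists), np.array(noisy_dists), rcond=None)
    return M

def filter_noise(noisy_dist, M):
    """Approximately invert the learned noise map and renormalize to a distribution."""
    est = np.array(noisy_dist) @ np.linalg.pinv(M)
    est = np.clip(est, 0.0, None)
    return est / est.sum()
```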
Software testing and debugging have become the most critical aspects of the development of modern embedded systems, mainly driven by growing software and hardware complexity, increasing integration, and tightening time-to-market deadlines. Software developers increasingly rely on on-chip trace and debug infrastructure to locate software bugs faster. However, the existing infrastructure offers limited visibility or relies on hefty on-chip buffers and wide trace ports that significantly increase system cost. This paper introduces a new technique called mcfTRaptor for capturing and compressing functional and time-stamped control-flow traces on-the-fly in modern multicore systems. It relies on private on-chip predictor structures and corresponding software modules in the debugger to significantly reduce the number of events that need to be streamed out of the target platform. Our experimental evaluation explores the effectiveness of mcfTRaptor as a function of the number of cores, encoding mechanisms, and predictor configurations. Compared to Nexus-like control-flow tracing, mcfTRaptor reduces the trace port bandwidth by 14 to 23.8 times for functional traces and by 10.8 to 18.6 times for time-stamped traces. (C) 2015 Elsevier B.V. All rights reserved.
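A conceptual sketch of predictor-based control-flow trace compression as outlined above: the target streams only the branch outcomes its predictor got wrong, and the debugger replays an identical predictor to reconstruct the full trace. mcfTRaptor's actual predictor structures are hardware; this software model is only an illustration of the general idea.

```python
# Stream only mispredicted branch events; a matching predictor on the debugger side
# can regenerate all correctly predicted outcomes.
class BimodalPredictor:
    def __init__(self, size=1024):
        self.table = [1] * size                 # 2-bit saturating counters, weakly not-taken

    def predict(self, pc):
        return self.table[pc % len(self.table)] >= 2

    def update(self, pc, taken):
        i = pc % len(self.table)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)

def compress(branch_stream):
    """branch_stream: iterable of (pc, taken). Returns only the mispredicted events."""
    pred, out = BimodalPredictor(), []
    for pc, taken in branch_stream:
        if pred.predict(pc) != taken:
            out.append((pc, taken))             # only surprises leave the chip
        pred.update(pc, taken)
    return out
```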
Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This article introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it depends solely on video content. Leveraging the zero-shot transfer capabilities of the contrastive language-image pretraining (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games collected from the GamePhysics section of the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as a video search tool tailored to identifying video game bugs, which could greatly benefit quality assurance teams in finding and reproducing bugs.
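A minimal zero-shot sketch of scoring gameplay frames against an English query with CLIP, in the spirit of the approach above. The model variant, frame sampling, and score aggregation used by the paper are not shown here; this just demonstrates the text-to-frame similarity step using the Hugging Face transformers CLIP interface.

```python
# Score video frames against a text query with a pretrained CLIP model (zero-shot).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_frames(frames: list[Image.Image], query: str) -> torch.Tensor:
    """Return one similarity score per frame for the given English query."""
    inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.logits_per_image.squeeze(-1)     # higher = frame matches query better

# Example: rank frames of one video for a physics-bug query.
# scores = score_frames(frames, "a car flying in the air")
# best_frame = int(scores.argmax())
```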
Software reliability is one of the most important internal attributes of software systems. Over the past three decades, many software reliability growth models have been proposed and discussed. Some research has also shown that the fault detection and removal processes of software can be described and modeled using an infinite-server queueing system. In practice, however, no company can afford unlimited resources to test and correct detected faults; consequently, the number of debuggers is limited, not infinite. In this paper, we propose an extended finite-server-queueing (EFSQ) model to analyze the fault removal process of a software system. Numerical examples based on real project data are illustrated. Evaluation results show that our proposed EFSQ model has fairly accurate software reliability prediction capability and depicts the real-life situation of software development activities more faithfully. Finally, applications of the proposed model to project management are also presented. Our model can provide a theoretically effective technique for managing the necessary testing and debugging activities in software project management.
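To make the finite-server idea concrete, the toy discrete-event simulation below lets detected faults queue for a limited pool of debuggers rather than an infinite one. The rates, event structure, and outputs are illustrative assumptions, not the paper's EFSQ formulation.

```python
# Toy simulation: faults arrive (are detected) at rate detect_rate and are fixed by
# at most num_debuggers servers, each fixing at rate fix_rate.
import heapq, random

def simulate(num_debuggers=3, detect_rate=2.0, fix_rate=0.8, horizon=100.0, seed=1):
    random.seed(seed)
    queue, busy, removed = 0, 0, 0
    events = [(random.expovariate(detect_rate), "detect")]
    while events:
        t, kind = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "detect":
            queue += 1
            heapq.heappush(events, (t + random.expovariate(detect_rate), "detect"))
        else:                                   # a debugger finished a fix
            busy -= 1
            removed += 1
        while queue and busy < num_debuggers:   # assign waiting faults to free debuggers
            queue -= 1
            busy += 1
            heapq.heappush(events, (t + random.expovariate(fix_rate), "fix_done"))
    return removed, queue + busy                # faults removed vs. still pending

print(simulate())
```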
In the software reliability engineering literature, few attempts have been made to study the fault debugging environment using discrete-time modelling. Most endeavours assume that a detected fault has either been immediately removed or been perfectly debugged. Such discrete-time models may be used for any debugging environment and may be termed black-box, since they are applied without prior knowledge about the nature of the fault being debugged. To develop a white-box model, however, one needs to be cognizant of the debugging environment. During debugging, numerous factors affect the debugging process. These factors may be internal, for example, fault density and fault debugging complexity, or external, originating in the debugging environment itself, such as the skills of the debugging team and the debugging effort expenditures. Hence, in such an environment, fault removal may take a longer time after a fault has been detected. It is therefore imperative to clearly understand the testing and debugging environment, and hence there is an urgent need for a model that takes into account the fault debugging complexity and incorporates the learning phenomenon of the debugger under an imperfect debugging environment. This objective dictates developing a framework through an integrated modelling approach based on a nonhomogeneous Poisson process that incorporates these realistic factors during the fault debugging process. Actual software reliability data have been used to demonstrate the applicability of the proposed integrated framework. Copyright (C) 2016 John Wiley & Sons, Ltd.
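For orientation, a generic nonhomogeneous Poisson process (NHPP) form that combines the two ingredients named above, debugger learning and imperfect debugging, is sketched here. It is a standard textbook-style illustration, not the paper's actual integrated model.

```latex
% m(t): expected faults removed by time t; a: initial fault content; b: removal rate;
% \beta: learning (inflection) factor; \alpha: fraction of removals that introduce new faults.
\[
  m(t) \;=\; \frac{a\left(1 - e^{-bt}\right)}{1 + \beta e^{-bt}},
  \qquad
  a(t) \;=\; a + \alpha\, m(t), \quad 0 \le \alpha < 1 .
\]
```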