Search Results - Inner Mongolia University Library

Demo2Test: Transfer testing of Agent in Competitive Environment with Failure Demonstrations

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2025, Vol. 34, No. 2, pp. 1-28

Authors: Chen, Jianming; Wang, Yawen; Wang, Junjie; Xie, Xiaofei; Wang, Dandan; Wang, Qing; Xu, Fanjiang. Affiliations: Chinese Acad Sci, Inst Software, Beijing, Peoples R China; Singapore Management Univ, Singapore, Singapore

The competitive game between agents exists in many critical applications, such as military unmanned aerial vehicles. It is urgent to test these agents to reduce the significant losses caused by their failures. Existing studies mainly are to construct a testing agent that competes with the target agent to induce its failures. These approaches usually focus on a single task, requiring much more time for multi-task testing. However, if the previously tested tasks (source tasks) and the task to be tested (target task) share similar agents or task objectives, the transferable knowledge in source tasks can potentially increase the effectiveness of testing in the target task. We propose Demo2Test for conducting transfer testing of agents in the competitive environment, i.e., leveraging the demonstrations of failure scenarios from the source task to boost the testing effectiveness in the target task. It trains a testing agent with demonstrations and incorporates the action perturbation at key states to balance the number of revealed failures and their diversity. We conduct experiments in the simulated robotics competitive environments of MuJoCo. The results indicate that Demo2Test outperforms the best-performing baseline with improvements ranging from 22.38% to 87.98%, and 12.69% to 60.98%, in terms of the number and diversity of discovered failure scenarios, respectively.

Keywords: Computer systems organization; Reliability; Software and its engineering; Software testing and debugging; Computing methodologies; Adversarial learning

A Two-Stage Algorithm for Identifying Software Failure Regions

IEEE TRANSACTIONS ON RELIABILITY, 2025, Vol. 74, No. 2, pp. 2693-2707

Authors: Mao, Chengying; Zhu, Zheng; Chen, Tsong Yueh; Towey, Dave; Wen, Linlin; Chen, Jifu. Affiliations: Jiangxi Univ Finance & Econ, Sch Software & IoT Engn, Nanchang 330013, Peoples R China; Zhejiang Univ, Sch Software Technol, Ningbo 315048, Peoples R China; Swinburne Univ Technol, Dept Comp Sci & Software Engn, Hawthorn, Vic 3122, Australia; Univ Nottingham Ningbo China, Sch Comp Sci, Ningbo 315100, Peoples R China

Software developers can only obtain a very small amount of information from the individual failure-causing inputs, which makes debugging difficult. Therefore, it is necessary to explore additional failure-causing inputs (failure regions) using the known failure-causing inputs. In order to accurately and efficiently identify the failure region, we propose a novel two-stage search algorithm, TS-FRI. In the initial exploration stage, a round-robin search identifies several boundary failure-causing points, and the failure region's centroid is estimated. During the main search stage, the boundary failure-causing points are identified through iterative division of the input domain with an equally sized partitioning strategy. This results in the boundary points being as dispersed as possible around the failure-region boundary, with the polytope formed by the points approximating the failure region (e.g., a polygon in two dimensions). The proposed algorithm is validated through simulation and empirical analysis: The experimental results show that the TS-FRI accuracy is at least comparable to the best accuracy of the compared three algorithms, and can be ten times better. In addition, TS-FRI only takes a quarter of the computation time and half the failure-validation cost of the other algorithms.

Keywords: Software; Software algorithms; Codes; Costs; Accuracy; Debugging; Software testing; Accuracy and efficiency; failure region identification; failure-based testing; software failure; software testing and debugging

Spectrum-based rule- and item-level localization of faults in context-free grammars

JOURNAL OF SYSTEMS AND SOFTWARE, 2024, Vol. 215

Authors: Raselimo, Moeketsi; Fischer, Bernd. Affiliation: Stellenbosch Univ, Stellenbosch, South Africa

We describe and evaluate spectrum-based methods aimed at finding faults in context-free grammars. In their basic form, they take as input a test suite and a parser for the grammar that is modified to collect grammar spectra (i.e., the sets of grammar elements used in attempts to parse the individual test cases), and return as output a ranked list of suspicious elements. We define grammar spectra suitable for localizing faults on the level of the grammar rules (i.e., rule spectra) and the rules' individual symbols (i.e., item spectra), respectively. We show how both types of grammar spectra can be collected by both LL and LR parsers, and how the JavaCC, ANTLR, and CUP parser generators can be modified and used to automate the collection of the grammar spectra. We also show how grammar spectra can be synthesized directly from test cases derived from a grammar, and how such synthetic spectra can be used to localize differences between a grammar and a black-box system under test. We first evaluate our approach over a large number of medium-sized single fault grammars, which we constructed by fault seeding from a common origin grammar. At the rule level, it ranks the rules containing the seeded faults within the top five rules in about 40%-70% of the cases, depending on the applied parsing technique, test suite, and ranking metric, and pinpoints them (i.e., correctly identifies them as unique most suspicious rule) in about 10%-30% of the cases, with significantly better results for the synthetic spectra. At the item level, our approach remains remarkably effective despite the larger number of possible locations, provided it is coupled with a simple tie-breaking strategy that prefers items with the right-most designated position over other items from the same rules in a tie. It typically ranks the seeded faults within the top five positions in about 30%-60% of the cases, and pinpoints them in about 15%-40% of the cases. This specialized item-level localization also

Keywords: Software testing and debugging; Grammars and context-free languages

Grammar-based test suite construction using coverage-directed algorithms over LR-graphs

JOURNAL OF SYSTEMS AND SOFTWARE, 2024, Vol. 214

Authors: Rossouw, Christoff; Fischer, Bernd. Affiliation: Stellenbosch Univ, Div Comp Sci, Stellenbosch, South Africa

In grammar-based testing, the test suites that drive the system under test are typically constructed from a given context-free grammar through a set of derivations that jointly satisfy some coverage criterion. In this paper, we describe and evaluate a new algorithm that instead constructs test suites from a set of valid paths that cover all edges in a labeled directed graph corresponding to an LR-automaton that accepts the language of the grammar. Vertices in this graph correspond to states in the LR-automaton; two vertices are connected by an edge iff the top of the LR-automaton's stack can change from one state to the other, either by shifting a terminal or non-terminal symbol (push edges), or by reducing with a grammar rule (pop edges). The algorithm constructs a unique reduction path for each pop edge in the graph. These reduction paths are recursively embedded into each other, and any unresolved non-terminal push edges are substituted by shortest derivations for the non-terminal symbol. The algorithm can work with different types of LR-automata, including LR(0)- and LR(1)-automata, and can successfully generate a test suite from an LR-graph even if the underlying LR-automaton construction leads to shift/reduce or reduce/reduce conflicts. The algorithm only constructs valid paths over the LR-graphs that correspond to sentences in the language and thus generates only positive tests. We therefore also describe mutations to the positive paths that are guaranteed to generate negative tests without needing any further verification by an oracle. Our algorithm is substantially more efficient than an earlier algorithm that explores LR-graphs with two consecutive breadth-first graph traversals and our experimental evaluation shows that it scales to large production-quality grammars. It is robust against random choices made to resolve ambiguity in the construction of the tests, while the code coverage of the different test suite variants is relatively uniform. Finall

Keywords: Software testing and debugging; Grammars and context-free languages; Test case generation; Push-down automata

Mitigating Noise in Quantum Software Testing Using Machine Learning

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, Vol. 50, No. 11, pp. 2947-2961

Authors: Muqeet, Asmar; Yue, Tao; Ali, Shaukat; Arcaini, Paolo. Affiliations: Simula Res Lab, N-0164 Oslo, Norway; Univ Oslo, N-0313 Oslo, Norway; Oslo Metropolitan Univ, N-0130 Oslo, Norway; Natl Inst Informat, Tokyo 1018430, Japan

Quantum Computing (QC) promises computational speedup over classic computing. However, noise exists in near-term quantum computers. Quantum software testing (for gaining confidence in quantum software's correctness) is inevitably impacted by noise, i.e., it is impossible to know if a test case failed due to noise or real faults. Existing testing techniques test quantum programs without considering noise, i.e., by executing tests on ideal quantum computer simulators. Consequently, they are not directly applicable to test quantum software on real quantum computers or noisy simulators. Thus, we propose a noise-aware approach (named QOIN) to alleviate the noise effect on test results of quantum programs. QOIN employs machine learning techniques (e.g., transfer learning) to learn the noise effect of a quantum computer and filter it from a program's outputs. Such filtered outputs are then used as the input to perform test case assessments (determining the passing or failing of a test case execution against a test oracle). We evaluated QOIN on IBM's 23 noise models, Google's two available noise models, and Rigetti's Quantum Virtual Machine, with six real-world and 800 artificial programs. We also generated faulty versions of these programs to check if a failing test case execution can be determined under noise. Results show that QOIN can reduce the noise effect by more than 80% on most noise models. We used an existing test oracle to evaluate QOIN's effectiveness in quantum software testing. The results showed that QOIN attained scores of 99%, 75%, and 86% for precision, recall, and F1-score, respectively, for the test oracle across six real-world programs. For artificial programs, QOIN achieved scores of 93%, 79%, and 86% for precision, recall, and F1-score respectively. This highlights QOIN's effectiveness in le

Keywords: Noise; Quantum computing; Qubit; Computers; Software testing; Logic gates; Computational modeling; software testing and debugging; computing methodologies; quantum computing and machine learning

Searching Bug Instances in Gameplay Video Repositories

IEEE TRANSACTIONS ON GAMES, 2024, Vol. 16, No. 3, pp. 697-710

Authors: Taesiri, Mohammad Reza; Macklon, Finlay; Habchi, Sarra; Bezemer, Cor-Paul. Affiliations: Univ Alberta, Analyt Software GAmes & Repository Data ASGAARD La, Edmonton, AB T6G 2R3, Canada; Ubisoft Montreal, Montreal, PQ H2T 1S6, Canada

Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This article introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the contrastive language-image pretraining model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26 954 videos from 1873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications, such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit quality assurance teams in finding and reproducing bugs.

Keywords: Training; Software testing; Video games; Visualization; Quality assurance; Social networking (online); Computer bugs; software testing and debugging; video mining; bug reports; video games; video retrieval

SURE: A Visualized Failure Indexing Approach Using Program Memory Spectrum

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, Vol. 33, No. 8, pp. 1-43

Authors: Song, Yi; Zhang, Xihao; Xie, Xiaoyuan; Chen, Songqiang; Liu, Quanming; Gao, Ruizhi. Affiliations: Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China; Hong Kong Univ Sci & Technol, Hong Kong, Hong Kong, Peoples R China; Sonos Inc, Boston, MA 02111, USA

This work was partially supported by the National Natural Science Foundation of China under the grant number 62250610224.

Keywords: Software and its engineering; Software testing and debugging

Reward Augmentation in Reinforcement Learning for Testing Distributed Systems

PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2024, Vol. 8, No. OOPSLA2, pp. 1928-1954

Authors: Borgarelli, Andrea; Enea, Constantin; Majumdar, Rupak; Nagendra, Srinidhi. Affiliations: MPI SWS, Kaiserslautern, Germany; Inst Polytech Paris, LIX, Ecole Polytech, CNRS, Palaiseau, France; Univ Paris Cite, IRIF, CNRS, Paris, France; Chennai Math Inst, Chennai, India

Bugs in popular distributed protocol implementations have been the source of many downtimes in popular internet services. We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning. Since the natural reward structure is very sparse, the key to successful exploration in reinforcement learning is reward augmentation. We show two different techniques that build on one another. First, we provide a decaying exploration bonus based on the discovery of new states: the reward decays as the same state is visited multiple times. The exploration bonus captures the intuition from coverage-guided fuzzing of prioritizing new coverage points; in contrast to other schemes, we show that taking the maximum of the bonus and the Q-value leads to more effective exploration. Second, we provide waypoints to the algorithm as a sequence of predicates that capture interesting semantic scenarios. Waypoints exploit designer insight about the protocol and guide the exploration to "interesting" parts of the state space. Our reward structure ensures that new episodes can reliably get to deep interesting states even without execution caching. We have implemented our algorithm in Go. Our evaluation on three large benchmarks (RedisRaft, Etcd, and RSL) shows that our algorithm can significantly outperform baseline approaches in terms of coverage and bug finding.

Keywords: CCS Concepts: Software and its engineering; Software testing and debugging; Theory of computation; Reinforcement learning; Computing methodologies; Distributed algorithms

Software Test Case Minimization Using Modified Firefly Technique

9th International Congress on Information and Communication Technology (ICICT)

Authors: Sobuj, Md Shazzad Ali Rizwan, Syed Akhond, Mostafijur Rahman. Affiliation: Jashore Univ Sci & Technol, Jashore 7400, Bangladesh

ISBN: (print) 9789819735587; 9789819735594

The step of software engineering that requires the utmost time and resources is software testing. Techniques for test case reduction are used to condense the test suite, saving time and resources in the process. The basic goal of test case reduction is to get rid of useless test cases while still ensuring that the code being tested is sufficiently covered by the test suite. In this study, a model has been proposed to cut down on complexity and test cases. We have employed a moderated nature-inspired meta-heuristic algorithm called the Firefly Algorithm, to identify essential strings of test cases and eliminate irrelevant test instances. In our research, we have used some synthetic models and succeeded to reduce 36.17% of test cases.

Keywords: Software testing and debugging; Software verification and validation; Software defect analysis; Software and its engineering

SNOWPLOW: Effective Kernel Fuzzing with a Learned White-box Test Mutator

30th International Conference on Architectural Support for Programming Languages and Operating Systems-ASPLOS

Authors: Gong, Sishuai; Wang, Rui; Altinbuken, Deniz; Fonseca, Pedro; Maniatis, Petros. Affiliations: Purdue Univ, W Lafayette, IN 47907, USA; Google DeepMind, Mountain View, CA, USA

ISBN: (print) 9798400710797

Kernel fuzzers rely heavily on program mutation to automatically generate new test programs based on existing ones. In particular, program mutation can alter the test's control and data flow inside the kernel by inserting new system calls, changing the values of call arguments, or performing other program mutations. However, due to the complexity of the kernel code and its user-space interface, finding the effective mutation that can lead to the desired outcome such as increasing the coverage and reaching a target code location is extremely difficult, even with the widespread use of manually-crafted heuristics. This work proposes SNOWPLOW, a kernel fuzzer that uses a learned white-box test mutator to enhance test mutation. The core of SNOWPLOW is an efficient machine learning model that can learn to predict promising mutations given the test program to mutate, its kernel code coverage, and the desired coverage. SNOWPLOW is demonstrated on argument mutations of the kernel tests, and evaluated on recent Linux kernel releases. When fuzzing the kernels for 24 hours, SNOWPLOW shows a significant speedup of discovering new coverage (4.8x to 5.2x) and achieves higher overall coverage (7.0% to 8.6%). In a 7-day fuzzing campaign, SNOWPLOW discovers 86 previously-unknown crashes. Furthermore, the learned mutator is shown to accelerate directed kernel fuzzing by reaching 19 target code locations 8.5x faster and two additional locations that are missed by the state-of-the-art directed kernel fuzzer.

Keywords: Kernel fuzzing; Operating systems reliability and security; Software testing and debugging
