This paper considers issues of conducting research to assess the software interface quality level. The list of criteria for assessing the interface quality is determined in accordance with the generally accepted inter...
With the rise of the library ecosystem (such as NPM for JavaScript and PyPI for Python), a developer has access to a multitude of library packages that they can adopt as dependencies into their application. Prior work...
Defect detection and repair are critical to the performance and dependability of modern software development. To evaluate large software repositories and find errors early in the development process, this study propos...
ISBN (print): 9798400712487
Selecting the best code solution from multiple generated ones is an essential task in code generation, which can be achieved by using some reliable validators (e.g., developer-written test cases) for assistance. Since reliable test cases are not always available and can be expensive to build in practice, researchers propose to automatically generate test cases to assess code solutions. However, when both code solutions and test cases are plausible and not reliable, selecting the best solution becomes challenging. Although some heuristic strategies have been proposed to tackle this problem, they lack a strong theoretical guarantee and it is still an open question whether an optimal selection strategy exists. Our work contributes in two ways. First, we show that within a Bayesian framework, the optimal selection strategy can be defined based on the posterior probability of the observed passing states between solutions and tests. The problem of identifying the best solution is then framed as an integer programming problem. Second, we propose an efficient approach for approximating this optimal (yet uncomputable) strategy, where the approximation error is bounded by the correctness of prior knowledge. We then incorporate effective prior knowledge to tailor code generation tasks. Both theoretical and empirical studies confirm that existing heuristics are limited in selecting the best solutions with plausible test cases. Our proposed approximated optimal strategy B-4 significantly surpasses existing heuristics in selecting code solutions generated by large language models (LLMs) with LLM-generated tests, achieving a relative performance improvement by up to 50% over the strongest heuristic and 246% over the random selection in the most challenging scenarios. Our code is publicly available at https://***/ZJU-CTAG/B4.
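To make the selection problem concrete, the sketch below builds the kind of solution-by-test passing matrix the abstract refers to and applies a simple agreement-based heuristic. This is not the B-4 strategy from the paper; it only illustrates the kind of heuristic the abstract argues is limited. The function name `select_by_agreement`, the scoring rule (group size times number of tests passed), and the toy matrix are all assumptions made for illustration.

```python
from collections import defaultdict


def select_by_agreement(passing):
    """Pick a solution using a simple agreement heuristic (hypothetical).

    passing[i][j] is True when solution i passes test j.
    Solutions that pass exactly the same set of tests are grouped together,
    and each group is scored as (number of solutions) * (number of tests passed).
    """
    groups = defaultdict(list)
    for idx, row in enumerate(passing):
        passed = frozenset(j for j, ok in enumerate(row) if ok)
        groups[passed].append(idx)

    # Highest score wins; ties go to the group containing the lowest index.
    best_tests, best_members = max(
        groups.items(),
        key=lambda kv: (len(kv[1]) * len(kv[0]), -min(kv[1])),
    )
    return best_members[0], sorted(best_tests)


# Toy passing matrix: 4 candidate solutions, 3 generated tests.
passing = [
    [True, True, False],
    [True, True, False],
    [True, False, True],
    [False, False, False],
]
chosen, tests = select_by_agreement(passing)
print(chosen, tests)
```

Because neither the solutions nor the tests are trusted, a heuristic like this can be misled when many wrong solutions agree on the same wrong tests, which is the situation the Bayesian formulation in the abstract is meant to address.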
In this work, a multiphysics field simulation method based on the finite element method is developed for reliability analysis of multi-band antennas. The multiphysics field simulation describes the multidisciplinary ...
ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Coincidental correctness (CC) can be misleading for developers because it gives the impression that the code is functioning correctly when there are hidden faults. To mitigate the negative impacts of CC test cases, extensive research has been conducted on their detection, employing either coverage-based or expert-based features. These studies have yielded promising results. Coverage and expert features each provide unique insights into program execution, yet the literature has not fully explored the combined potential of these two feature sets to enhance the detection of CC. Additionally, the rich semantics of the test code and focal method have not been fully utilized. Therefore, we propose to build a unified model, CORE, that integrates coverage and expert features with semantic representations of test and focal methods to improve the detection of CC test cases. We make a comprehensive evaluation with six state-of-the-art baselines on the widely-used Defects4J benchmark. The experimental results show that CORE outperforms the baselines in terms of CC detection accuracy, with a substantial improvement (i.e., 40% improvement on average in terms of F1 score). Then, we conduct the ablation experiment to show that the coverage, expert, and semantics contribute to CORE. CORE can also improve the effectiveness of spectrum-based and mutation-based fault localization performance (e.g., 50% improvements for spectrum-based formula Dstar and 44% improvements for mutation-based method MUSE under relabeling strategy).
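The effect of CC test cases on spectrum-based fault localization can be sketched with the commonly used D* formula, ef^* / (ep + nf), where ef and ep count failing and passing tests that execute a statement and nf counts failing tests that do not. The example below is a minimal illustration, not the CORE model: the coverage data, the set of CC tests, and the reading of "relabeling" as flipping detected CC tests from passing to failing are all assumptions.

```python
def dstar(ef, ep, nf, star=2):
    """D* suspiciousness score: ef**star / (ep + nf)."""
    denom = ep + nf
    if denom == 0:
        return float("inf") if ef > 0 else 0.0
    return ef ** star / denom


def spectrum(coverage, failed, stmt):
    """Count (ef, ep, nf) for one statement from per-test coverage sets."""
    ef = sum(1 for cov, f in zip(coverage, failed) if f and stmt in cov)
    ep = sum(1 for cov, f in zip(coverage, failed) if not f and stmt in cov)
    nf = sum(1 for cov, f in zip(coverage, failed) if f and stmt not in cov)
    return ef, ep, nf


# Hypothetical data: statement "s3" is faulty; tests 1 and 2 execute it
# but still pass, so they are coincidentally correct.
coverage = [{"s1", "s3"}, {"s1", "s3"}, {"s1", "s3"}, {"s1", "s2"}, {"s1", "s3"}]
failed = [True, False, False, False, False]
cc_tests = {1, 2}

before = dstar(*spectrum(coverage, failed, "s3"))

# Relabeling (assumed meaning): treat detected CC tests as failing.
relabeled = [f or (i in cc_tests) for i, f in enumerate(failed)]
after = dstar(*spectrum(coverage, relabeled, "s3"))

print(before, after)
```

In this toy case the score of the faulty statement rises after relabeling because the CC tests no longer count as passing executions of it.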
In the digital era, security has become more challenging. There is a plethora of ways to monitor a system and provide the required security. A keylogger is one of the cyber attacks which rec...
Software vulnerabilities are a major cyber threat and it is important to detect them. One important approach to detecting vulnerabilities is to use deep learning while treating a program function as a whole, known as ...
Software Defined Networking (SDN) has emerged as a revolutionary network architecture aimed at surpassing the constraints inherent in traditional network infrastructures. As SDN adoption increases, it brings additiona...
In this paper, a near vertical incident skywave (NVIS) shortwave magnetic loop antenna suitable for engineering applications is proposed and designed, and the effects of its structural form, size, and impedance matchi...