ISBN (digital): 9798331535100
ISBN (print): 9798331535117
In recent years, AI-based software engineering has progressed from pre-trained models to advanced agentic workflows, with Software Development Agents representing the next major leap. These agents, capable of reasoning, planning, and interacting with external environments, offer promising solutions to complex software engineering tasks. However, while much research has evaluated code generated by large language models (LLMs), comprehensive studies on agent-generated patches, particularly in real-world settings, are lacking. This study addresses that gap by evaluating 4,892 patches from 10 top-ranked agents on 500 real-world GitHub issues from SWE-Bench Verified, focusing on their impact on code quality. Our analysis shows that no single agent dominated, with 170 issues left unresolved, indicating room for improvement. Even for patches that passed unit tests and resolved issues, agents made different file and function modifications compared to the gold patches from repository developers, revealing limitations in the benchmark's test case coverage. Most agents maintained code reliability and security, avoiding new bugs or vulnerabilities; while some agents increased code complexity, many reduced code duplication and minimized code smells. Finally, agents performed better on simpler codebases, suggesting that breaking complex tasks into smaller sub-tasks could improve effectiveness. This study provides the first comprehensive evaluation of agent-generated patches on real-world GitHub issues, offering insights to advance AI-driven software development.
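The file- and function-level comparison against gold patches described above can be illustrated with a minimal sketch (not the authors' tooling); the patch strings and file paths below are hypothetical placeholders, and only the Python standard library is used.

```python
# Minimal sketch: compare which files an agent-generated patch touches
# against the repository's gold patch. Patch contents are placeholders,
# not SWE-Bench Verified data.
import re

def modified_files(patch_text: str) -> set[str]:
    """Extract post-image file paths ("+++ b/<path>") from a unified diff."""
    return {
        m.group(1)
        for m in re.finditer(r"^\+\+\+ b/(\S+)", patch_text, flags=re.MULTILINE)
    }

agent_patch = "+++ b/src/parser.py\n@@ -1 +1 @@\n-old\n+new\n"
gold_patch = "+++ b/src/lexer.py\n@@ -1 +1 @@\n-old\n+new\n"

agent_files = modified_files(agent_patch)
gold_files = modified_files(gold_patch)

print("agent-only files:", agent_files - gold_files)  # touched only by the agent
print("gold-only files:", gold_files - agent_files)   # touched only by developers
print("overlap:", agent_files & gold_files)           # files both modified
```

A non-empty "agent-only" or "gold-only" set for a resolved issue is exactly the kind of divergence that hints at gaps in the benchmark's test coverage.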
In the rapidly evolving field of machine learning, training models with datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings, which are capable of leveraging valuable knowledge from distributed and isolated datasets, is increasingly important. This study investigates key factors that impact the effectiveness of collaborative training methods in code next-token prediction, as well as the correctness and utility of the generated code, showing the promise of such methods. Additionally, we evaluate the memorization of different participants' training data across various collaborative training settings, including centralized, federated, and incremental training, showing their potential risks in leaking data. Our findings indicate that the size and diversity of code datasets are pivotal factors influencing the success of collaboratively trained code models. We demonstrate that federated learning achieves competitive performance compared to centralized training while offering better data protection, as evidenced by lower memorization ratios in the generated code. However, federated learning can still produce verbatim code snippets from hidden training data, potentially violating data privacy or copyright. Our study further explores the patterns of effectiveness and memorization in incremental learning, emphasizing the importance of the sequence in which individual participant datasets are introduced. We also identify the memorization of cross-organizational clones as a prevalent challenge in both centralized and federated learning scenarios. Our findings highlight the persistent risk of data leakage during inference, even when training data remains unseen. We conclude with strategic recommendations for practitioners and researchers to optimize the use of multi-source datasets, thereby propelling cross-organizational collaboration forward.
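The memorization ratios mentioned above can be approximated with a simple verbatim n-gram overlap check; the sketch below is an illustrative assumption about how such a metric might be computed (the corpus, sample, and 6-token threshold are placeholders, not the paper's setup).

```python
# Hypothetical memorization check: fraction of token n-grams in generated
# code that appear verbatim in some participant's training data.
def token_ngrams(code: str, n: int) -> set[tuple[str, ...]]:
    tokens = code.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap_ratio(generated: str, training_corpus: list[str], n: int = 6) -> float:
    """Share of n-grams in `generated` found verbatim in `training_corpus`."""
    gen_grams = token_ngrams(generated, n)
    if not gen_grams:
        return 0.0
    train_grams: set[tuple[str, ...]] = set()
    for snippet in training_corpus:
        train_grams |= token_ngrams(snippet, n)
    return len(gen_grams & train_grams) / len(gen_grams)

corpus = ["def add ( a , b ) : return a + b", "def sub ( a , b ) : return a - b"]
sample = "def add ( a , b ) : return a + b"  # generated code reproducing training data
print(f"memorization ratio: {verbatim_overlap_ratio(sample, corpus):.2f}")  # 1.00
```

Comparing such ratios across centralized, federated, and incremental training is one way to quantify the leakage risk the study reports.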
Autonomous Vehicle (AV) usage has become predominant in the rapidly evolving landscape of urban transportation. Integrating AVs and non-AVs in the existing traffic infrastructure has significantly increased the comple...
Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integratio...
ISBN (digital): 9798350382655
ISBN (print): 9798350382662
Algorithmic verification of realistic systems to satisfy safety and other temporal requirements has suffered from poor scalability of the employed formal approaches. To design systems with rigorous guarantees, many approaches still rely on exact models of the underlying systems. Since this assumption can rarely be met in practice, models have to be inferred from measurement data or are bypassed completely. While the former usually requires the model structure to be known a priori and immense amounts of data to be available, the latter gives rise to a plethora of restrictive mathematical assumptions about the unknown dynamics. In pursuit of developing scalable formal verification algorithms without shifting the problem to unrealistic assumptions, we employ the concept of barrier certificates, which can guarantee safety of the system, and learn the certificate directly from a compact set of system trajectories. We use conditional mean embeddings to embed data from the system into a reproducing kernel Hilbert space (RKHS) and construct an RKHS ambiguity set that can be inflated to robustify the result with respect to a set of plausible transition kernels. We show how to solve the resulting program efficiently using sum-of-squares optimization and a Gaussian process envelope. Our approach lifts the need for restrictive assumptions on the system dynamics and uncertainty, and suggests an improvement in the sample complexity of verifying the safety of a system on a tested case study compared to a state-of-the-art approach.
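For context, a common discrete-time stochastic barrier certificate formulation is sketched below; the exact conditions used in the paper may differ, so this should be read as a standard reference form rather than the authors' precise construction.

```latex
% A standard c-martingale barrier certificate for a discrete-time stochastic
% system with initial set X_0 and unsafe set X_u (illustrative, not the
% paper's exact conditions): find B : X -> R_{\ge 0} such that
\begin{align*}
  & B(x) \ge 0      && \forall x \in X,   \\
  & B(x) \le \gamma && \forall x \in X_0, \\
  & B(x) \ge 1      && \forall x \in X_u, \\
  & \mathbb{E}\bigl[\, B(x_{k+1}) \mid x_k = x \,\bigr] \le B(x) + c && \forall x \in X.
\end{align*}
% These conditions bound the finite-horizon probability of reaching X_u:
\[
  \Pr\bigl( \exists\, k \le N :\ x_k \in X_u \ \big|\ x_0 \in X_0 \bigr) \;\le\; \gamma + cN .
\]
```

In the data-driven setting described above, the expectation is taken with respect to a transition kernel that is only known through trajectories, which is where the RKHS ambiguity set and the sum-of-squares relaxation enter.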
Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retrain...
The notion of a metaverse seems hard to define but encourages the impression that it can be considered as a new virtual metaphysical landscape that somehow goes beyond our geographical locations and understanding (i.e...
Context: Machine learning (ML) is a field that involves analysing raw data and extracting useful information from it through specific phases. As continuous practices become more prevalent in software projects, there is a need to explore how ML methods can be trained to enhance the Continuous Integration (CI) pipeline. Moreover, the growing utilization of ML algorithms in CI, combined with the surge of literature on the subject, highlights the significance of establishing a comprehensive body of knowledge to support future researchers in conducting high-quality research and bridging any existing gaps. Objective: The objective of this research is to conduct a systematic review and analysis of the existing literature on ML-based methods employed in the CI domain. This study aims to identify and describe the various techniques employed in the literature and present the key characteristics of the training phases of ML-based solutions in the CI context. Method: To achieve this objective, we conducted a Systematic Literature Review (SLR) of 48 primary studies selected after searching relevant literature published over the past 22 years (2000–November 2022). We used statistical and thematic analysis to examine the composition phase of CI, data engineering techniques and data source types, feature engineering methods and extracted features, the employed hyper-parameter tuning methods and types of ML models, and the evaluation methods and metrics used in the selected studies. Additionally, this article aims to present the relationships between these concepts. Results: In this paper, we have depicted the phases of CI testing, the connections between them, and the techniques employed in training the ML methods for each phase. We identified nine types of data sources and four data-preparation steps taken in the selected studies. We also identified four feature types and nine subsets of data features through thematic analysis of the selected studies. Besides, five method