ISBN: 9798350333350 (print)
This study examines the extent to which the testing of traditional software components and machine learning (ML) models fundamentally differs. While some researchers argue that ML software requires new concepts and perspectives for testing, our analysis highlights that, at a fundamental level, the specification and testing of a software component do not depend on the development process used or on implementation details. Although the software engineering/computer science (SE/CS) and data science/ML (DS/ML) communities have developed different expectations, unique perspectives, and varying testing methods, they share clear commonalities that can be leveraged. We argue that both areas can learn from each other, and a non-dual perspective could provide novel insights not only for testing ML but also for testing traditional software. Therefore, we call upon researchers from both communities to collaborate more closely and develop testing methods and tools that can address both traditional and ML software components. While acknowledging their differences has merits, we believe there is great potential in working on unified methods and tools that can address both types of software.
ISBN: 9789897586477 (print)
Leaked secrets in source code lead to information security problems. It is important to find sensitive information in a repository as early as possible and neutralize it. There are now many approaches to detecting leaked secrets without human intervention. Often, these are heuristic algorithms using regular expressions. Recently, more and more approaches based on machine learning have appeared. Nevertheless, the problem of detecting secrets in code remains relevant, since the available approaches often produce a large number of false positives. In this paper, we propose an approach to leaked secret detection in source code based on machine learning using bigrams. This approach significantly reduces the number of false positives. The model showed a false positive rate of 2.4% and a false negative rate of 1.9% on the test dataset.
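To make the idea of bigram-based secret detection concrete, the following is a minimal sketch, not the model from the paper. It assumes character bigrams as features and a two-class naive Bayes classifier with add-one smoothing; the class name `BigramNaiveBayes`, the helper `bigrams`, and the tiny training strings are all hypothetical and made up for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the overlapping character bigrams of text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramNaiveBayes:
    """Toy two-class naive Bayes over character bigrams (assumed model)."""

    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}  # bigram counts per class
        self.totals = {0: 0, 1: 0}                  # total bigrams per class
        self.docs = {0: 0, 1: 0}                    # training strings per class
        self.vocab = set()

    def fit(self, samples, labels):
        for text, label in zip(samples, labels):
            grams = bigrams(text)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def _log_score(self, text, label):
        # Log prior plus the sum of add-one smoothed bigram log likelihoods.
        score = math.log(self.docs[label] / (self.docs[0] + self.docs[1]))
        v = len(self.vocab) + 1  # one extra bucket for unseen bigrams
        for gram in bigrams(text):
            score += math.log(
                (self.counts[label][gram] + 1) / (self.totals[label] + v)
            )
        return score

    def predict(self, text):
        """Return 1 if text looks like a secret, else 0."""
        return int(self._log_score(text, 1) > self._log_score(text, 0))


# Made-up training data: label 0 = ordinary identifier, 1 = secret-like token.
ordinary = ["username", "password_field", "get_user_name",
            "config_path", "database_host", "return_value"]
secrets = ["x9Kq2LmZ7pWv", "A8f3Qz1Lp0Xc", "Zt5Rk9Bw2Hq7", "q7W2e9R4t1Y6"]

model = BigramNaiveBayes().fit(
    ordinary + secrets, [0] * len(ordinary) + [1] * len(secrets)
)
print(model.predict("user_config"), model.predict("K9x2Qp7Lm3Zw"))
```

A real detector would be trained on a large labeled corpus and evaluated for false positive and false negative rates on held-out data, as the abstract reports; the toy data here only shows the mechanics.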
ISBN: 9798350333350 (print)
Machine learning models are increasingly used in practice. However, many machine learning methods are sensitive to test or operational data that is dissimilar to training data. Out-of-distribution (OOD) data is known to increase the probability of error, and research into metrics that identify which dissimilarities in data affect model performance is ongoing. Recently, combinatorial coverage metrics have been explored in the literature as an alternative to distribution-based metrics. Results show that coverage metrics can correlate with classification error. However, other results show that the utility of coverage metrics is highly dataset-dependent. In this paper, we show that this dataset dependence can be alleviated with metric learning, a machine learning technique for learning latent spaces where data from different classes is farther apart. In a study of six open-source datasets, we find that metric learning increased the difference between set-difference combinatorial coverage metrics (SDCCMs) calculated on correctly and incorrectly classified data, thereby demonstrating that metric learning improves the ability of SDCCMs to anticipate classification error. Paired t-tests validate the statistical significance of our findings. Overall, we conclude that metric learning improves the ability of coverage metrics to anticipate classifier error and identify when OOD data is likely to degrade model performance.
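The abstract does not define the set-difference coverage metric, so the sketch below uses one formulation found in the combinatorial testing literature as an assumption: the fraction of t-way feature-value interactions present in a target dataset that never occur in a source dataset. Features are assumed to be already discretized, and the function names and the small example rows are hypothetical.

```python
from itertools import combinations


def interactions(rows, t):
    """Set of t-way interactions: t (column, value) pairs co-occurring in a row."""
    found = set()
    for row in rows:
        for cols in combinations(range(len(row)), t):
            found.add(tuple((c, row[c]) for c in cols))
    return found


def set_difference_coverage(target, source, t=2):
    """Fraction of t-way interactions in target that are absent from source."""
    target_set = interactions(target, t)
    if not target_set:
        return 0.0
    return len(target_set - interactions(source, t)) / len(target_set)


# Made-up discretized data: three binary features per row.
train_rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
test_rows = [(0, 0, 1), (1, 1, 1)]

# The first test row repeats a training row; the second is entirely novel
# at the 2-way level, so half of the test interactions are uncovered.
print(set_difference_coverage(test_rows, train_rows, t=2))
```

Under this reading, a higher value means the target data contains more feature combinations the model never saw. The paper's contribution, computing such metrics in a latent space learned by metric learning, is not reproduced here.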
The proceedings contain 17 papers. The topics discussed include: creating happier and more productive software engineering teams through AI and machine learning; hybrid work in agile software development; AI assistant to improve experimentation in software startups using large language model and prompt engineering; improving software startup viability: addressing requirements prioritization challenges amid increasing interest rates; the impact of DevOps critical success factors and organizational practices; diversity, equity and inclusion in the age of generative AI; exploring the 6G software business ecosystem: a morphological analysis approach; digital sovereignty from the perspective of IT consultancy in Germany: a model; the effects of quality of technology on learning performance in remote lectures; and transforming software products for intelligent automation services.
ISBN: 9798350311846 (print)
Defect prediction has been a popular research topic where machine learning (ML) and deep learning (DL) have found numerous applications. However, these ML/DL-based defect prediction models are often limited by the quality and size of their datasets. In this paper, we present Defectors, a large dataset for just-in-time and line-level defect prediction. Defectors consists of approximately 213K source code files (approximately 93K defective and approximately 120K defect-free) that span 24 popular Python projects. These projects come from 18 different domains, including machine learning, automation, and internet-of-things. Such scale and diversity make Defectors a suitable dataset for training ML/DL models, especially transformer models that require large and diverse datasets. We also foresee several application areas of our dataset, including defect prediction and defect explanation.
In the modern world of software development, ensuring reliability and performance is of paramount importance. However, despite the best efforts from the developers, software defects can still emerge, causing frustrati...
With technology growing rapidly, especially in the field of software, machine learning (ML) has played an important role in a range of tasks, including voice, video, and computer vision. It is currently being utilised in...
Approximately 270,000 new cases of breast cancer, the most common cancer in women, are reported each year, with instances identified in 2022. Software for detection is required to find it before it becomes deadly becaus...
ISBN: 9780791887233 (print)
Direct ink writing (DIW) is an extrusion-based additive manufacturing technology. It has gained wide attention in both industry and research because of its simple design and versatile platform. In electric-field-assisted direct ink writing (eDIW) processes, an external electric field is added between the nozzle and the printing substrate to manipulate the ink-substrate wetting dynamics and thereby optimize ink printability. eDIW was found effective in printing liquids that are typically difficult to print in conventional DIW processes. In this paper, an eDIW process modeling system based on machine learning (ML) algorithms is developed. The system is found effective in predicting eDIW printing geometry under varied process parameter settings. Image processing approaches to collect experimental data are developed. The accuracies of different machine learning algorithms for predicting printing results and trace width are compared and discussed. The capabilities, applications, and limitations of the proposed machine learning-based modeling approach are presented.
ISBN: 9798350301137 (print)
The main goal of this project is to develop an AI Living Lab providing methods and software tools for AI trustworthiness analysis, running digital twins to simulate Digital Health solutions (hardware and software) integrated with AI elements in vitro for early-stage validation experiments. In this paper, we present the motivation behind the need for AI Living Lab methods for researchers and companies, our idea in practice, and the scheduled roadmap. The insights of the AI Living Lab can enable researchers to understand possible problems with the quality of AI-enabled systems, opening new research topics, and allow companies to understand how to better address quality issues in their systems.