ISBN: 9798350301137 (print)
Software vulnerability detection (SVD) aims to identify potential security weaknesses in software. SVD systems have evolved rapidly from approaches based on testing, static analysis, and dynamic analysis to approaches based on machine learning (ML). Many ML-based approaches have been proposed, but challenges remain: training and testing datasets contain duplicates, and building customized end-to-end pipelines for SVD is time-consuming. We present Tenet, a modular framework for building end-to-end, customizable, reusable, and automated pipelines through a plugin-based architecture that supports SVD for several deep learning (DL) and basic ML models. We demonstrate the applicability of Tenet by building practical pipelines that perform SVD on real-world vulnerabilities.
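The duplicate problem mentioned in the abstract above can be illustrated with a minimal sketch. It is not Tenet's API: the function names `normalize`, `fingerprint`, and `drop_leaked_duplicates` are hypothetical, and whitespace-insensitive hashing is only one assumed notion of "duplicate".

```python
import hashlib
import re


def normalize(code: str) -> str:
    """Collapse runs of whitespace so trivially reformatted copies compare equal."""
    return re.sub(r"\s+", " ", code).strip()


def fingerprint(code: str) -> str:
    """Hash the normalized text; equal fingerprints are treated as duplicates."""
    return hashlib.sha256(normalize(code).encode("utf-8")).hexdigest()


def drop_leaked_duplicates(train, test):
    """Deduplicate the training set, then drop test samples already seen.

    Hypothetical helper: keeps the first occurrence of each sample and
    preserves the original order within each split.
    """
    seen = set()
    clean_train = []
    for sample in train:
        fp = fingerprint(sample)
        if fp not in seen:
            seen.add(fp)
            clean_train.append(sample)
    clean_test = []
    for sample in test:
        fp = fingerprint(sample)
        if fp not in seen:  # not in train and not an earlier test sample
            seen.add(fp)
            clean_test.append(sample)
    return clean_train, clean_test
```

Removing test samples that also occur in the training set matters because otherwise a model can score well simply by memorizing them.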
ISBN: 9781665457019 (print)
Machine learning methods have achieved great success in many software engineering tasks. However, because they follow a data-driven paradigm, how data quality affects their effectiveness remains largely unexplored. In this paper, we explore this problem in the context of just-in-time obsolete comment detection. Specifically, we first clean the existing benchmark dataset and empirically observe that, with only 0.22% of labels corrected and even 15.0% less data, existing obsolete comment detection approaches can achieve up to 10.7% relative accuracy improvement. To further mitigate data quality issues, we propose an adversarial learning framework that simultaneously estimates data quality and makes the final predictions. Experimental evaluations show that this framework can further improve relative accuracy by up to 18.1% compared to the state-of-the-art method. Although our current results come from the obsolete comment detection problem, we believe that the proposed two-phase solution, which handles data quality issues from both the data side and the algorithm side, is generalizable and applicable to other machine learning based software engineering tasks.
ISBN: 9798350322590 (print)
Building and maintaining production-grade ML-enabled components is a complex endeavor that goes beyond the current approach of academic education, which focuses on optimizing ML model performance in the lab. In this paper, we present a project-based learning approach to teaching MLOps, focused on demonstrating and gaining experience with emerging practices and tools that automate the construction of ML-enabled components. We examine the design of a course based on this approach, including laboratory sessions that cover the end-to-end ML component life cycle, from model building to production deployment. Moreover, we report preliminary results from the first edition of the course. During the present year, an updated version of the same course is being delivered at two independent universities; the related learning outcomes will be evaluated to analyze the effectiveness of project-based learning for this specific subject.
ISBN: 9798400706585 (print)
Automated identification of self-admitted technical debt (SATD) has been crucial for advances in managing such debt. However, state-of-the-art studies often overlook chronological factors, leading to experiments that do not faithfully replicate the conditions developers face in their daily routines. This study initiates a chronological analysis of SATD identification with machine learning models, emphasizing the significance of temporal factors in automated SATD detection. The research is in its preliminary phase and is divided into two stages: evaluating the performance of models trained on historical data and tested in prospective contexts, and examining model generalization across various projects. Preliminary results reveal that the chronological factor can influence model performance positively or negatively, and that some models do not generalize sufficiently when trained and tested on different projects.
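The first stage described above, training on historical data and testing on later data, can be sketched as a time-based split. This is an illustrative assumption rather than the study's actual protocol: the function `chronological_split`, the `(date, text, label)` tuple layout, and the single cutoff date are all hypothetical.

```python
from datetime import date


def chronological_split(samples, cutoff):
    """Split (date, text, label) tuples into past (train) and future (test).

    Unlike a random split, no sample written on or after `cutoff` can end
    up in the training set, so the model never sees "future" comments.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    train = [s for s in ordered if s[0] < cutoff]
    test = [s for s in ordered if s[0] >= cutoff]
    return train, test


# Hypothetical code comments with the dates they were committed.
comments = [
    (date(2021, 6, 1), "TODO: remove this hack", 1),
    (date(2019, 3, 9), "compute the checksum", 0),
    (date(2020, 1, 5), "FIXME: temporary workaround", 1),
    (date(2022, 2, 2), "parse the header", 0),
]
train, test = chronological_split(comments, cutoff=date(2021, 1, 1))
```

With the sample data, the two comments from before 2021 form the training set and the two later comments form the test set.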
There are a large number of disabled people in the world whose lives are seriously affected by the lack of upper limbs. Research on related prostheses is crucial to making these individuals’ lives as convenient as th...
The software development life cycle (SDLC) is incomplete without the software testing phase, which checks that a piece of software really does what it is supposed to do. Test case creation is one of the testing tasks with a major impact on the quality and speed of the process. Research into automated test case generation has been extensive because of the time and effort it can save compared with creating test cases manually. While most of the recommended methods are based on UML models, other publications have proposed specification-based methods of creating test cases. This literature analysis focuses on automated test case generation strategies based on use case specifications and on the techniques used to verify them. The analysis also highlights how the methods diverge when used to solve certain pressing problems in software testing.
In the domain of software fault prediction, numerous methods have been introduced and implemented using data mining techniques and machine learning models. Nevertheless, initial and early fault prediction is big ch...
The major purpose of the project is to investigate a range of machine learning techniques in order to improve the accuracy of projecting costs and efforts for software development projects. Numerous strategies in soft...
ISBN: 9798400707551 (print)
Recently, several studies have proposed frameworks for Quantum Federated Learning (QFL). For instance, the Google TensorFlow Quantum (TFQ) and TensorFlow Federated (TFF) libraries have been deployed for realizing QFL. However, most developers are not yet familiar with Quantum Computing (QC) libraries and frameworks. A Domain-Specific Modeling Language (DSML) that provides an abstraction layer over the underlying QC and Federated Learning (FL) libraries would be beneficial. It could enable practitioners to carry out software development and data science tasks efficiently while deploying the state of the art in Quantum Machine Learning (QML). In this position paper, we propose extending existing domain-specific Model-Driven Engineering (MDE) tools for machine learning (ML) enabled systems, such as MontiAnna, ML-Quadrat, and GreyCat, to support QFL.
ISBN: 9798350328837; 9798350328844 (print)
Advancements in mobile technology make it easier to communicate in real time, but at the cost of a wider potential attack surface for phishing. While there has been research on email and SMS, research on instant messaging lags behind. The widespread use of instant messengers by individuals of all ages further motivates the addition of software security features in this context. This research aims to detect phishing in mobile instant messages by analysing the language of each message with Natural Language Processing to detect keywords that point towards phishing. We built the machine learning models using 3 different methods for feature extraction and 3 classification algorithms. Our tests showed that balancing the data with random oversampling increased the performance of the classifiers, which achieved an accuracy of up to 99.2%.
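Random oversampling, the balancing step named in the abstract above, can be sketched in a few lines of standard-library Python. This is a generic illustration and not the authors' implementation: the function `random_oversample`, the fixed seed, and the sample messages are assumptions.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes match.

    Every original sample is kept; each smaller class is padded with
    random repeats up to the size of the largest class.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical messages: 4 benign ("ham") and 1 phishing.
messages = ["see you at 5", "lunch?", "call me", "ok thanks",
            "verify your account now"]
tags = ["ham", "ham", "ham", "ham", "phish"]
balanced_texts, balanced_tags = random_oversample(messages, tags)
```

Oversampling is normally applied to the training split only, so that repeated copies of a message do not also appear in the evaluation data.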