ISBN: (print) 9798400707056
Software vulnerabilities inevitably arise during software development and may leave behind serious security risks. To detect and mitigate vulnerabilities before they can be exploited, various fine-grained deep learning (DL)-based vulnerability detection (VD) approaches have been proposed to locate vulnerable statements, among which Transformer-based methods have shown promising performance. However, existing Transformer-based statement-level approaches still suffer from a crucial limitation: they ignore the intrinsic data/control dependency relations between statements. In this work, we propose a novel Transformer-based model, MatsVD, which addresses this challenge in two ways. First, inspired by the hierarchical structure of code (i.e., tokens, statements, and functions), MatsVD comprises three different Transformer-based layers (i.e., a statement embedding layer, a statement representation layer, and a function representation layer) that gradually aggregate the basic code tokens into meaningful statement/function representations. Second, to further exploit the data/control dependencies between statements, we replace the original attention mechanism of the Transformer with a novel dependency-based attention that masks irrelevant attention scores according to the program dependency graph. We comprehensively evaluate MatsVD on the widely used C/C++ vulnerability dataset Big-Vul. The results show that MatsVD significantly outperforms 6 other statement-level methods on both binary classification and ranking metrics. In particular, MatsVD obtains an F1 score of 86% and a Top-1 Accuracy of 93% at the statement level, improvements of 22.97% and 7.76%, respectively, over the state-of-the-art method VELVET.
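The abstract does not give implementation details for the dependency-based attention, so the following is only a minimal sketch of the general idea of masking attention scores with a dependency graph. It assumes a single attention head with no learned projections, treats dependency edges as undirected, and always lets a statement attend to itself; the function name `dependency_masked_attention` and its parameters are hypothetical and are not taken from MatsVD.

```python
import math


def dependency_masked_attention(queries, keys, values, dep_edges):
    """Scaled dot-product attention over statement vectors, restricted to
    statement pairs linked in a dependency graph (hypothetical helper)."""
    n = len(queries)
    dim = len(queries[0])

    # allowed[i][j] is True when statement i may attend to statement j.
    # Assumption: every statement may attend to itself.
    allowed = [[i == j for j in range(n)] for i in range(n)]
    for src, dst in dep_edges:
        allowed[src][dst] = True
        allowed[dst][src] = True  # assumption: edges treated as undirected

    outputs, all_weights = [], []
    for i in range(n):
        # Masked pairs get a score of -inf, so softmax gives them weight 0.
        scores = []
        for j in range(n):
            if allowed[i][j]:
                dot = sum(q * k for q, k in zip(queries[i], keys[j]))
                scores.append(dot / math.sqrt(dim))
            else:
                scores.append(-math.inf)

        # Numerically stable softmax; the self-edge keeps `top` finite.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        all_weights.append(weights)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights


# Three toy statement vectors; only statements 0 and 2 share a dependency.
stmts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = dependency_masked_attention(stmts, stmts, stmts, dep_edges=[(0, 2)])
```

In this toy input, statement 1 has no dependency edges, so it can only attend to itself, while statements 0 and 2 share their attention between themselves and each other. A real model would apply the same mask inside multi-head attention with learned query/key/value projections.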
ISBN: (print) 9798350301137
Software vulnerability detection (SVD) aims to identify potential security weaknesses in software. SVD systems have been rapidly evolving from those based on testing, static analysis, and dynamic analysis to those based on machine learning (ML). Many ML-based approaches have been proposed, but challenges remain: training and testing datasets contain duplicates, and building customized end-to-end pipelines for SVD is time-consuming. We present Tenet, a modular framework for building end-to-end, customizable, reusable, and automated pipelines through a plugin-based architecture that supports SVD for several deep learning (DL) and basic ML models. We demonstrate the applicability of Tenet by building practical pipelines performing SVD on real-world vulnerabilities.
Vulnerabilities in software source code are critical issues in the realm of software engineering. Coping with them is becoming more challenging due to factors such as the complexity and volume of code. Deep learning has gained popularity over the years as a means of addressing such issues. This paper evaluates vulnerability detection performance on source code representations and examines how machine learning (ML) strategies can improve it. Our experiment combines three deep neural networks (DNNs) with five different source code representations: abstract syntax trees (ASTs), code gadgets (CGs), semantics-based vulnerability candidates (SeVCs), lexed code representations (LCRs), and composite code representations (CCRs). Experimental results show that employing different ML strategies in conjunction with the base model structure influences the performance results to varying degrees. However, ML-based techniques perform poorly at class imbalance handling and dimensionality reduction when used in conjunction with source code representations.
Context: Desirable characteristics in vulnerability-detection (VD) systems (VDSs) include both good detection capability (high accuracy, low false positive rate, low false negative rate, etc.) and low time overheads. The widely used VDSs based on models such as Recurrent Neural Networks (RNNs) have several problems, including low time efficiency, difficulty learning vulnerability features well, and insufficient amounts of vulnerability features. Therefore, it is very important to construct an automatic detection model with high detection accuracy. Objective: This paper reports on training on source code, using deep-learning techniques to analyze and learn from the code's patterns and structures, in order to generate an efficient VD model that does not require manual feature design. Method: We propose a software VD model based on multi-feature fusion and deep neural networks called AIdetectorX-SP. It first uses a Temporal Convolutional Network (TCN) and adds a Self-attention Mechanism (SaM) to the TCN to build a model for extracting vulnerability logic features, then transforms the source code into an image that is input to a Convolutional Neural Network (CNN) to extract structural and semantic information. Finally, we use feature-fusion technology to design and implement an improved deep-learning-based VDS, called AIdetectorX Sequence with Picturization (AIdetectorX-SP). Results: We report on experiments conducted using publicly available and widely used datasets to evaluate the effectiveness of AIdetectorX-SP, with results indicating that AIdetectorX-SP is an effective VDS; that the combination of TCN and SaM can effectively extract vulnerability logic features; and that the pictorial code can extract code structure features, which can further improve the VD capability. Conclusion: In this paper, we propose a novel detection model for software vulnerability based on TCNs, SaM, and software picturization. The proposed model solves some shortcomings and limitations of exist
ISBN: (print) 9798350329964
Automated code vulnerability detection has gained increasing attention in recent years. Deep learning (DL)-based methods, which implicitly learn vulnerable code patterns, have proven effective in vulnerability detection. The performance of DL-based methods usually relies on the quantity and quality of labeled data. However, current labeled data are generally collected automatically, for example by crawling human-generated commits, making it hard to ensure the quality of the labels. Prior studies have demonstrated that non-vulnerable code (i.e., negative labels) tends to be unreliable in commonly used datasets, while vulnerable code (i.e., positive labels) is more reliable. Considering the large amount of unlabeled data available in practice, it is both necessary and worthwhile to explore leveraging the positive data and the large amount of unlabeled data for more accurate vulnerability detection. In this paper, we focus on the Positive and Unlabeled (PU) learning problem for vulnerability detection and propose a novel model named PILOT, i.e., PositIve and unlabeled Learning mOdel for vulnerability deTection. PILOT learns only from positive and unlabeled data for vulnerability detection. It mainly contains two modules: (1) a distance-aware label selection module, which generates pseudo-labels for selected unlabeled data and involves the inter-class distance prototype and progressive fine-tuning; and (2) a mixed-supervision representation learning module, which further alleviates the influence of noise and enhances the discrimination of representations. Extensive experiments in vulnerability detection are conducted to evaluate the effectiveness of PILOT on real-world vulnerability datasets. The experimental results show that PILOT outperforms the popular weakly supervised methods by 2.78%-18.93% in the PU learning setting. Compared with the state-of-the-art methods, PILOT also improves the F1 score by 1.34%-12.46% in the supervised setting. In addition,