Vulnerability detection in software source code is crucial to ensuring software security. Existing models face challenges with dataset class imbalance and long training times. To address these issues, this paper introduces a multi-feature screening and integrated sampling model (MFISM) to enhance vulnerability detection efficiency and accuracy. The key innovations include (i) utilizing the abstract syntax tree (AST) representation of source code to extract potential vulnerability-related features through multiple feature screening techniques; (ii) conducting analysis of variance (ANOVA) and evaluating feature selection techniques to identify representative and discriminative features; (iii) addressing class imbalance by applying an integrated over-sampling strategy that creates synthetic samples from vulnerable code to expand the minority class; and (iv) employing outlier detection to filter out abnormal synthetic samples, ensuring high-quality synthesized samples. The model employs a bidirectional long short-term memory network (Bi-LSTM) to identify vulnerabilities in the source code. Experimental results demonstrate that MFISM improves the F1 score by approximately 10% compared to existing DeepBalance methods and reduces the training time to 2-3 h. These results confirm the effectiveness and superiority of MFISM in source code vulnerability detection tasks.
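Steps (ii) to (iv) above can be illustrated with a minimal sketch. It is not the MFISM implementation: the function names are hypothetical, the over-sampler interpolates between random pairs of minority samples (real SMOTE-style methods use nearest neighbours), and the outlier filter is a simple distance-to-centroid rule standing in for whatever detector the paper uses.

```python
import math
import random
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")

def oversample(minority, n_new, rng):
    """Create synthetic samples on the segment between two random minority samples."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

def filter_outliers(samples, reference, max_dist):
    """Keep samples within max_dist of the centroid of the reference set (assumed rule)."""
    centroid = [mean(col) for col in zip(*reference)]
    return [s for s in samples if math.dist(s, centroid) <= max_dist]
```

A feature with a high F statistic separates vulnerable from non-vulnerable samples well and would be kept; synthetic samples that drift far from the real minority samples are discarded before training.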
ISBN: 9781665494250 (print)
Existing source code vulnerability detection methods do not explicitly preserve the vulnerability-related semantic information in the source code, which makes it difficult for the detection model to extract the features of vulnerable statements and leads to a high false positive rate. To address this, a source code vulnerability detection method based on the vulnerability dependency graph is proposed. First, the candidate vulnerability statements of a function are matched, and the vulnerability dependency representation graph corresponding to the function is generated by analyzing the multi-layer control dependencies and data dependencies of the candidate vulnerability statements. Second, the function names and variable names in the code statement nodes are abstracted, and the initial representation vectors of the code statement nodes in the vulnerability dependency representation graph are generated. Finally, a source code vulnerability detection model based on the heterogeneous graph transformer is used to learn the context information of the code statement nodes in the vulnerability dependency representation graph. The proposed method was verified on three datasets. The experimental results show that the proposed method performs better than existing methods in source code vulnerability detection: the recall rate increases by 1.50% to 22.27%, and the F1 score increases by 1.86% to 16.69%.
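The abstraction of function and variable names in the second step can be sketched as follows. This is an illustrative assumption, not the paper's procedure: the `KEEP` set, the `VAR<n>`/`FUN<n>` placeholder scheme, and the regex-based tokenization are hypothetical, and string literals and comments are not handled.

```python
import re

# Keywords and library calls left untouched (hypothetical, incomplete list).
KEEP = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof",
        "strcpy", "strlen", "memcpy", "malloc", "free"}

# An identifier, optionally followed by "(" which marks it as a function call.
TOKEN = re.compile(r"\b([A-Za-z_]\w*)(\s*\()?")

def abstract_identifiers(statement, var_map, fun_map):
    """Replace user-defined names with VAR<n>/FUN<n>, reusing earlier mappings."""
    def rename(match):
        name, call = match.group(1), match.group(2) or ""
        if name in KEEP:
            return name + call
        table, prefix = (fun_map, "FUN") if call else (var_map, "VAR")
        if name not in table:
            table[name] = f"{prefix}{len(table) + 1}"
        return table[name] + call
    return TOKEN.sub(rename, statement)
```

Sharing `var_map` and `fun_map` across the statements of one function keeps the renaming consistent, so `buf` maps to the same placeholder wherever it appears.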
To enhance the effectiveness of vulnerability detection in software developed using the C and C++ programming languages, our study introduces a novel correlation calculation method for analyzing and evaluating Code Property Graphs (CPG). The intelligent computation method proposed in this study comprises three key stages. In the first stage, we present a method for extracting features from the CPG of the source code. To accomplish this, we integrate three distinct data exploration methods: employing a Graph Convolutional Network (GCN) to extract node features from the CPG, utilizing a Convolutional Neural Network (CNN) to extract edge features from the CPG, and employing the Doc2vec natural language processing algorithm to extract source code from CPG nodes. The second stage proposes a method for synthesizing the CPG source code features. Building on the features acquired in the first stage, our paper introduces a synthesis and construction method to generate feature vectors for the source code. The third and final stage detects source code vulnerabilities. The experimental results demonstrate that our proposed model achieves higher efficiency than other studies, with an improvement ranging from 3% to 4%.
The growing complexity and volume of modern software have led to an increase in source code vulnerabilities, posing significant security risks. In response, deep learning-based automated source code vulnerability detection methods, particularly those utilizing source code similarity analysis, have recently emerged as promising solutions. However, existing similarity-based source code vulnerability detection methods frequently fail to fully utilize information from the hierarchical structure of source code and are often computationally expensive, limiting their practicality in real-world scenarios. In this paper, we introduce XTransformer, a novel deep learning-based source code vulnerability detector tailored for comparing target source code against archived vulnerable code across various levels of the source code's hierarchical structure by leveraging extra cross-attention imposed on the transformer architecture. Additionally, we propose a specialized training strategy based on supervised contrastive learning to improve XTransformer's ability to effectively learn and differentiate between vulnerable and non-vulnerable source code. Comprehensive experiments demonstrate that XTransformer outperforms current state-of-the-art methods across different datasets and code lengths while significantly reducing the inference time compared to other similarity-based methods that utilize hierarchical information from source code.
The rapid evolution of software development, propelled by competitive demands and the continuous integration of new features, frequently leads to inadvertent security oversights. Traditional security practices, often reactive in nature, primarily focus on identifying known vulnerabilities, creating a significant shortfall in detecting emergent, zero-day threats. This paper introduces code-SMASH, a novel deep learning-based source code vulnerability detector that utilizes a Siamese neural network with a hierarchical architecture integrating BiGRU and attention mechanisms. Our experiments using real-world datasets, specifically the Chromium and Debian datasets, demonstrate code-SMASH's superiority over existing methods. It achieves significant improvements in detection performance across all key metrics, including accuracy, precision, recall, and F1-score, with average improvements of approximately 8.3%, 11.6%, 27.75%, and 17.7%, respectively, compared to the best-performing existing methods in our experiments. Moreover, code-SMASH shows superior capability in handling complex and lengthy code sequences, with F1 score improvements for long code (60 to 80 lines) of 4.53 percentage points on the Chromium dataset and 5.62 percentage points on the Debian dataset compared to the second-best model. We believe our research makes a significant contribution to the field of automated vulnerability detection by providing a high-precision solution to the growing challenges in software security. Furthermore, based on our findings, we anticipate that future research could enhance code-SMASH by expanding its generalizability to various programming languages and reducing computational demands to improve efficiency.
It is essential to detect potential vulnerabilities in software to ensure its safety. As software systems become more complex, traditional static vulnerability detection methods perform poorly. Currently, deep learning-based vulnerability detection models extract source code vulnerability features using only sequences or graphs. Sequential neural networks ignore structural information in the code, such as control flow graphs and data flow graphs. Additionally, graph neural networks cannot accurately extract features due to the lack of effective methods for extracting node features and aggregating global information. To address these issues, we propose a vulnerability detection algorithm based on residual graph attention networks for source code imbalance (RGAN). First, a local feature extraction module (PE-BL-A module) is designed. Using a sequence neural network, the module extracts various useful features, including node features in a control flow graph based on local semantic features. Second, we present the Residual Graph Attention Network module (RGAT). To learn and update node features along the control flow direction, the module uses a graph attention network with residual connections. In this module, a mean biaffine attention pooling mechanism is proposed that extracts whole-graph vulnerability features more effectively. Third, a dynamic cross-entropy loss function is designed to handle sample imbalance during training. Finally, experiments conducted on several benchmark datasets demonstrate that the proposed model achieves state-of-the-art results.
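The abstract does not give the form of the dynamic cross-entropy loss. As a rough illustration of how a loss can compensate for sample imbalance, the sketch below recomputes inverse-frequency class weights for each batch; the weighting rule and the function name are assumptions, not the RGAN loss.

```python
import math
from collections import Counter

def weighted_cross_entropy(probs, labels, eps=1e-12):
    """Binary cross-entropy with per-batch inverse-frequency class weights.

    probs[i] is the predicted probability that sample i is vulnerable (label 1).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    # Rare classes get larger weights; weights are recomputed for every batch.
    weights = {c: n / (k * counts[c]) for c in counts}
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(min(max(p_true, eps), 1.0))
    return total / n
```

With one vulnerable sample among four, the vulnerable class receives weight 2 and the other class weight 2/3, so an error on the rare class costs three times as much as the same error on the common class.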
The rapid expansion of smart devices leads to an increasing demand for vulnerability detection in the cyber security field. Writing secure source code is crucial to protect applications and software. Recent vulnerability detection methods mainly use machine learning and deep learning. However, some challenges remain: how to learn accurate source code semantic embeddings at the function level, how to effectively perform vulnerability detection using the learned semantic embeddings, and how to solve the overfitting problem of learning-based models. In this paper, we consider code as various graphs with node features and propose a tensor-based gated graph neural network called TensorGNN to produce code embeddings for function-level vulnerability detection. First, we propose a high-dimensional tensor for combining different code graph representations. Second, inspired by work on tensor technology, we propose the TensorGNN model to produce accurate code representations using the graph tensor. We evaluate our model on seven large open-source C and C++ code corpora (SARD&NVD, Debian, SATE IV, FFmpeg, libpng&LibTiff, Wireshark and Github datasets), which contain 13 types of vulnerabilities. Our TensorGNN model improves on existing state-of-the-art works by 10%-30% on average in terms of vulnerability detection accuracy and F1, while needing less training time and fewer model parameters. Specifically, compared with other existing works, our model reduces the number of parameters by a factor of 25-47 and the training time by a factor of 3-10. Results of evaluations show that TensorGNN has better performance while using fewer training parameters and less training time. Our paper proposes a high-dimensional tensor to integrate heterogeneous code graphs and a tensor-based gated graph neural network to effectively learn accurate code embeddings. Evaluations show that TensorGNN is higher than existing state-of-the-art works by 10%-30% of vuln
Detecting vulnerabilities in source code using graph neural networks (GNN) has gained significant attention in recent years. However, the detection performance of these approaches relies highly on the graph structure, and constructing meaningful graphs is expensive. Moreover, they often operate at a coarse level of granularity (such as function level), which limits their applicability to scripting languages like Python and their effectiveness in identifying vulnerabilities. To address these limitations, we propose DetectVul, a new approach that accurately detects vulnerable patterns in Python source code at the statement level. DetectVul applies self-attention to directly learn patterns and interactions between statements in a raw Python function; thus, it eliminates the complicated graph extraction process without sacrificing model performance. In addition, information about the type of each statement is leveraged to enhance the model's detection accuracy. In our experiments, we used two datasets, CVEFixes and Vudenc, with 211,317 Python statements in 21,571 functions from real-world projects on GitHub, covering seven vulnerability types. Our experiments show that DetectVul outperforms GNN-based models using control flow graphs, achieving the best F1 score of 74.47%, which is 25.45% and 18.05% higher than the best GCN and GAT models, respectively.
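Self-attention between statements can be illustrated with a minimal scaled dot-product sketch. It assumes each statement has already been encoded as a vector and, for brevity, uses the embeddings directly as queries, keys, and values; a real model such as DetectVul would add learned projections, multiple heads, and further layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over statement embeddings X (n x d).

    Returns the contextualized embeddings and the n x n attention weights.
    """
    d = len(X[0])
    outputs, weights = [], []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w[j] * X[j][i] for j in range(len(X))) for i in range(d)])
    return outputs, weights
```

Each output row mixes information from every statement in the function, weighted by similarity, which is how statement interactions are captured without building a graph.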
The detection of vulnerabilities in software written in the C and C++ languages attracts a lot of attention and interest. This paper proposes a new framework called DrCSE to improve software vulnerability detection. DrCSE uses an intelligent computation technique based on the combination of two methods, rebalancing data and representation learning, to analyze and evaluate the code property graph (CPG) of the source code for detecting abnormal behavior of software. To do that, DrCSE combines three main processing techniques: (i) building the source code feature profiles, (ii) rebalancing data, and (iii) contrastive learning. Of these, method (i) extracts the source code's features based on the vertices and edges of the CPG. The method of rebalancing data supports the training process by balancing the experimental dataset. Finally, contrastive learning techniques learn the important features of the source code by finding and pulling similar ones together while pushing the outliers away. The experimental part of this paper demonstrates the superiority of the DrCSE framework for detecting source code security vulnerabilities using the Verum dataset. As a result, the proposed method achieves good performance on all metrics, especially Precision and Recall scores of 39.35% and 69.07%, respectively, proving the efficiency of the DrCSE framework. It performs better than other approaches, with a 5% boost in Precision and a 5% boost in Recall. According to our survey, this is the best research result to date for the software vulnerability detection problem using the Verum dataset.
Code vulnerability detection has long been a critical issue due to its potential threat to computer systems. It is imperative to detect source code vulnerabilities in software and remediate them to avoid cyber attacks. To automate detection and reduce labor costs, many deep learning-based methods have been proposed. However, these approaches have been found to be either ineffective in detecting multiple classes of vulnerabilities or limited by treating original source code as a natural language sequence without exploiting the structural information of code. In this paper, we propose VDoTR, a model that leverages a new tensor representation of comprehensive code graphs, including AST, CFG, DFG, and NCS, to detect multiple types of vulnerabilities. First, a tensor structure is introduced to represent the structured information of code, which deeply captures code features. Second, a new Circle Gated Graph Neural Network (CircleGGNN) is designed based on the tensor for hidden state embedding of nodes. CircleGGNN can perform heterogeneous graph information fusion more directly and effectively. Last, a 1-D convolution-based output layer is applied to the hidden embedding features for classification. The experimental results demonstrate that the detection performance of VDoTR is superior to other approaches, with higher accuracy, precision, recall, and F1-measure on multiple datasets for vulnerability detection. Moreover, we illustrate through ablation experiments which code graph contributes the most to the performance of VDoTR and which code graph is more sensitive in representing vulnerability features for different types of vulnerabilities. © 2023 Elsevier Ltd. All rights reserved.
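One simple way to picture a tensor representation of several code graphs is to stack one adjacency matrix per graph type over a shared node set. The sketch below assumes this layout and hypothetical edge lists; the actual VDoTR tensor construction may differ.

```python
# Graph types named in the abstract: abstract syntax tree, control flow graph,
# data flow graph, and natural code sequence.
GRAPH_TYPES = ("AST", "CFG", "DFG", "NCS")

def build_graph_tensor(num_nodes, edges_by_type):
    """Return a tensor of shape (len(GRAPH_TYPES), num_nodes, num_nodes).

    tensor[g][src][dst] is 1 if graph type g has an edge from src to dst.
    """
    tensor = [[[0] * num_nodes for _ in range(num_nodes)] for _ in GRAPH_TYPES]
    for g, name in enumerate(GRAPH_TYPES):
        for src, dst in edges_by_type.get(name, []):
            tensor[g][src][dst] = 1
    return tensor
```

Because all slices share the same node indexing, a model can compare how one pair of nodes is related across syntax, control flow, data flow, and token order at once.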