Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark's stability without having to execute it. Our approach relies on 58 statically computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach's effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with good prediction performance ranging from 0.79 to 0.90 in terms of AUC and from 0.43 to 0.68 in terms of MCC. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively use machine learning models to predict whether a benchmark will be stable or not ahead of execution. This enables spending precious testing time on stable benchmarks.
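As a rough illustration of this kind of pipeline (not the authors' implementation), the sketch below trains a Random Forest on a table of static benchmark features and evaluates it with the two metrics named in the abstract, AUC and MCC. The file name benchmark_features.csv, the column names, and the hyperparameters are assumptions for the sake of the example.

# Minimal sketch, assuming a CSV with 58 static feature columns and a binary
# "unstable" label per benchmark (all names here are hypothetical).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, matthews_corrcoef

data = pd.read_csv("benchmark_features.csv")
X = data.drop(columns=["unstable"])
y = data["unstable"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Random Forest was the best-performing of the 11 classifiers in the paper.
model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

# Report AUC (on predicted probabilities) and MCC (on hard predictions).
probs = model.predict_proba(X_test)[:, 1]
preds = model.predict(X_test)
print("AUC:", roc_auc_score(y_test, probs))
print("MCC:", matthews_corrcoef(y_test, preds))

# Per-feature importances, analogous to the paper's feature importance analysis.
importances = sorted(
    zip(X.columns, model.feature_importances_), key=lambda p: p[1], reverse=True
)
for name, score in importances[:7]:
    print(name, round(score, 3))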
Detecting source code vulnerabilities is an essential issue today. In this paper, to improve the efficiency of detecting vulnerabilities in software written in C/C++, we propose to use a combination of a Deep Graph Convolutional Neural Network (DGCNN) and the code property graph (CPG). Specifically, the proposed method consists of three main phases. Phase 1: building feature profiles of the source code; at this step, we use analysis techniques such as Word2vec and one-hot encoding to standardize and analyze the source code. Phase 2: extracting features of the source code based on the feature profiles; at this phase, we use the DGCNN model to analyze and extract features of the source code. Phase 3: classifying the source code based on the features extracted in phase 2 to distinguish normal source code from source code containing security vulnerabilities. Evaluation scenarios comparing the proposed method with other approaches show the superior effectiveness of our approach. This result also demonstrates that our method is not only correct and reasonable, but also opens up a new approach to the task of detecting source code vulnerabilities.
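For illustration only, the sketch below shows a minimal DGCNN-style graph classifier of the kind described in phase 2 and phase 3, written with PyTorch Geometric. It is not the authors' implementation; the class name, layer sizes, and the assumption that node features have already been produced (e.g., via Word2vec and one-hot encoding) and packaged as batched graphs are all hypothetical.

# Minimal sketch of a DGCNN-style classifier over code property graphs.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_sort_pool
# Note: newer PyTorch Geometric releases expose SortAggregation instead of
# global_sort_pool; the idea (DGCNN's SortPooling) is the same.

class VulnGraphClassifier(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=64, k=30):
        super().__init__()
        self.k = k  # number of nodes kept by SortPooling per graph
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.conv3 = GCNConv(hidden_dim, 1)
        # 1-D convolution over the sorted node sequence, as in the original DGCNN.
        channels = hidden_dim * 2 + 1
        self.conv1d = torch.nn.Conv1d(1, 16, kernel_size=channels, stride=channels)
        self.fc = torch.nn.Linear(16 * k, 2)  # two classes: normal vs. vulnerable

    def forward(self, x, edge_index, batch):
        h1 = torch.tanh(self.conv1(x, edge_index))
        h2 = torch.tanh(self.conv2(h1, edge_index))
        h3 = torch.tanh(self.conv3(h2, edge_index))
        h = torch.cat([h1, h2, h3], dim=1)        # concatenate all layer outputs
        h = global_sort_pool(h, batch, k=self.k)  # fixed-size graph representation
        h = h.unsqueeze(1)                        # shape: (num_graphs, 1, k * channels)
        h = F.relu(self.conv1d(h))
        return self.fc(h.flatten(start_dim=1))    # logits for phase-3 classification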