In software testing, generating test cases is a critical process for detecting system errors and bugs. However, automated test case generation for smart contracts often faces challenges related to automation, vulnerability diversity, and coverage. This paper presents a novel method, the self-adaptive learning genetic algorithm (self-adaptive learning GA), designed to address these issues. Our methodology incorporates several construction models, namely the control dependence graph (CDG), the control flow graph (CFG), and the Application Binary Interface (ABI). First, the ABI model provides the essential information for generating and executing test cases. Next, the CFG model captures the potential execution paths through the functions of a smart contract. Finally, the CDG model identifies potential vulnerabilities in smart contracts. Using these models, our method improves automatic test case generation for smart contracts by increasing coverage and reducing execution time. We selected a variety of smart contracts from the Decentralized Finance (DeFi) ecosystem for data collection and comparative analysis. The experimental results show superior performance: an average code coverage of 98.1%, a total of 3500 vulnerabilities detected, a vulnerability detection rate of 98.7%, a false positive rate of 1.3%, a recall of 98.2%, a precision of 98.8%, a path uniqueness rate of 96.4%, a false negative rate of 3.5%, an execution time of 25 s, and a test case generation time of 16 s. In conclusion, the proposed approach significantly improves on existing test case generation methods and offers a promising way to strengthen the robustness and security of smart contracts in the DeFi ecosystem.
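The abstract does not spell out how the self-adaptive learning GA works, so the following is only a minimal sketch of the general idea under stated assumptions: a hypothetical Python function `toy_withdraw` stands in for a contract function (no ABI, CFG, or CDG modelling), fitness is the number of not-yet-covered branches an input reaches, and the mutation rate adapts by shrinking while coverage grows and growing when coverage stagnates. All names and constants here are illustrative, not the authors' design.

```python
import random

# Hypothetical branch IDs of the toy function below.
ALL_BRANCHES = {"zero", "nonzero", "revert", "ok", "low_balance"}

def toy_withdraw(balance, amount):
    """Toy stand-in for a contract function; returns the IDs of the branches it executes."""
    if amount == 0:
        return {"zero"}
    covered = {"nonzero"}
    if amount > balance:
        covered.add("revert")
    else:
        covered.add("ok")
        if balance - amount < 10:
            covered.add("low_balance")
    return covered

def run_ga(pop_size=20, generations=40, seed=0):
    """GA over (balance, amount) inputs with a coverage-driven, self-adapting mutation rate."""
    rng = random.Random(seed)
    pop = [(rng.randint(0, 100), rng.randint(0, 100)) for _ in range(pop_size)]
    covered, rate, rates = set(), 0.2, []
    for _ in range(generations):
        # Fitness = number of branches an input covers that the archive has not seen yet.
        scored = [(len(toy_withdraw(*ind) - covered), ind) for ind in pop]
        before = len(covered)
        for _, ind in scored:
            covered |= toy_withdraw(*ind)
        # Self-adaptation: shrink the mutation rate while coverage grows, enlarge it on stagnation.
        rate = max(0.05, rate * 0.8) if len(covered) > before else min(0.9, rate * 1.5)
        rates.append(rate)
        if covered == ALL_BRANCHES:
            break
        # Truncation selection: the fitter half become parents.
        scored.sort(key=lambda s: s[0], reverse=True)
        parents = [ind for _, ind in scored[: pop_size // 2]]
        # One-point crossover on the (balance, amount) pair, then per-gene random-reset mutation.
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = [p1[0], p2[1]]
            for g in range(2):
                if rng.random() < rate:
                    child[g] = rng.randint(0, 100)
            children.append(tuple(child))
        pop = children
    return covered, rates
```

Calling `run_ga()` returns the set of covered branch IDs and the per-generation mutation rates; a fixed `seed` makes runs reproducible. The adaptive rate is one simple way to escape plateaus, such as the rare `amount == 0` branch, without hand-tuning.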
Automated code summarization tools generate natural-language descriptions of code snippets, which benefits software development and maintenance. Recent studies demonstrate that the quality of generated summaries can be improved by using code representations beyond token sequences. Most contemporary approaches focus on extracting syntactic and structural information from abstract syntax trees (ASTs). However, at the level of macro-structure, identifying and capturing semantically meaningful features is challenging because ASTs contain many fine-grained syntactic nodes. To fill this gap, we investigate how to learn more code semantics and control-flow features from the perspective of code statements. Accordingly, we propose a novel code summarization model named CoSS. CoSS adopts a Transformer-based encoder and a graph attention network-based encoder to capture token-level and statement-level semantics from the code token sequence and the control flow graph, respectively. After receiving the two levels of embeddings from the encoders, a joint decoder with a multi-head attention mechanism predicts the output sequence token by token. Evaluations on Java, Python, and Solidity datasets show that CoSS outperforms nine state-of-the-art (SOTA) neural code summarization models in effectiveness and is competitive in execution efficiency. Furthermore, an ablation study reveals the contribution of each model component.
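The abstract does not give the internals of CoSS's graph attention encoder. As a rough illustration of what a statement-level graph attention step computes, here is a minimal pure-Python sketch assuming the standard single-head formulation: scores LeakyReLU(a · [W h_i || W h_j]), softmax-normalised over each node's CFG neighbours plus a self-loop, followed by a weighted sum. The statement names, features, and weights are made up, and the output nonlinearity and multiple heads are omitted.

```python
import math
from typing import Dict, List

Vec = List[float]

def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x

def matvec(W: List[Vec], v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gat_layer(h: Dict[str, Vec], neighbors: Dict[str, List[str]],
              W: List[Vec], a: Vec) -> Dict[str, Vec]:
    """One single-head graph attention layer over statement nodes (no output nonlinearity).

    h: statement node -> feature vector; neighbors: node -> adjacent nodes in the CFG;
    W: projection matrix (rows); a: attention vector of length 2 * output dimension.
    """
    z = {n: matvec(W, v) for n, v in h.items()}  # projected features W h_n
    out = {}
    for i in h:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]  # self-loop first
        # Unnormalised score e_ij = LeakyReLU(a . [W h_i || W h_j])
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the neighbours' projected features
        out[i] = [sum(al * z[j][k] for al, j in zip(alphas, nbrs)) for k in range(len(z[i]))]
    return out

# Hypothetical three-statement CFG with made-up 2-d statement features.
h = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [1.0, 1.0]}
neighbors = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
out = gat_layer(h, neighbors, W=[[1.0, 0.0], [0.0, 1.0]], a=[0.0, 0.0, 0.0, 0.0])
```

With an all-zero attention vector every score is equal, so each output is simply the mean of the node and its neighbours (for example, `s1` becomes `[0.5, 0.5]`); learned `W` and `a` let the encoder weight some neighbouring statements more than others.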
Pursuing product quality is essential today. Source code complexity is one of the many quality aspects of software development. Neglecting complexity during development can result in unexpected costs caused by the difficulty of understanding the source code. The goal of this paper is to introduce an initial approach to identifying unnecessary complexity in source code. Beyond identifying it, the approach also shows the user how to rewrite the source code properly without the unnecessary complexity. The approach is based on static analysis of the source code's control flow graph. Once unnecessary complexity is identified, the graph is refactored so that the user can understand the improvement to the source code. The approach was implemented in a software tool as a proof of concept. A performance evaluation showed high accuracy. Two experimental studies were also performed to assess its feasibility with real users. The evidence from these studies suggests that the approach supports the removal of unnecessary complexity.
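The abstract does not say which patterns count as unnecessary complexity or how complexity is measured. A minimal sketch, assuming one hypothetical pattern (a decision node whose branches are all empty blocks jumping to the same join node, as in `if c: pass else: pass`) and using McCabe's cyclomatic complexity M = E - N + 2P as the measure, might identify and refactor such a pattern on the CFG like this:

```python
from typing import Dict, List, Tuple

# Hypothetical CFG encoding: node -> (statements in the basic block, successor nodes).
Cfg = Dict[str, Tuple[List[str], List[str]]]

def cyclomatic_complexity(cfg: Cfg) -> int:
    """McCabe's M = E - N + 2P, taking P = 1 for a single routine."""
    edges = sum(len(succ) for _, succ in cfg.values())
    return edges - len(cfg) + 2

def find_empty_branches(cfg: Cfg) -> List[Tuple[str, str]]:
    """Decision nodes whose successors are all empty blocks that jump to one common node."""
    found = []
    for node, (_, succ) in cfg.items():
        if len(succ) < 2:
            continue
        targets = set()
        for s in succ:
            stmts, s_succ = cfg[s]
            if stmts or len(s_succ) != 1:
                break  # a branch does real work, so the decision is needed
            targets.add(s_succ[0])
        else:
            if len(targets) == 1:
                found.append((node, targets.pop()))
    return found

def remove_empty_branches(cfg: Cfg) -> Cfg:
    """Refactor each detected pattern: the decision node jumps straight to the join node."""
    new = {n: (list(stmts), list(succ)) for n, (stmts, succ) in cfg.items()}
    for node, join in find_empty_branches(cfg):
        for s in cfg[node][1]:
            new.pop(s, None)  # drop the empty branch blocks (assumes they are not shared)
        new[node] = (new[node][0], [join])
    return new

# Hypothetical CFG for: x = read(); if check(x): pass else: pass; print(x)
example: Cfg = {
    "entry": (["x = read()"], ["cond"]),
    "cond": (["check(x)"], ["then", "else"]),
    "then": ([], ["join"]),
    "else": ([], ["join"]),
    "join": (["print(x)"], []),
}
```

Here `cyclomatic_complexity(example)` is 2, `find_empty_branches(example)` reports `("cond", "join")`, and the refactored graph has complexity 1. The condition evaluation is kept as a plain statement in case it has side effects.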
Cross-site scripting (XSS) vulnerabilities are mainly caused by web applications failing to sanitise user inputs embedded in web pages. Even though developers and security auditors often use state-of-the-art defensive coding and vulnerability detection methods, XSS flaws still remain in many applications because of (i) the difficulty of adopting these methods, (ii) their inadequate implementation, and/or (iii) a lack of understanding of the XSS problem. To address this issue, this study proposes a code-auditing approach that recovers the defence model implemented in program source code and suggests guidelines for checking the adequacy of the recovered model against XSS attacks. Based on the possible implementation patterns of defensive coding methods, our approach extracts all such defences implemented to secure each potentially vulnerable HTML output. It then introduces a variant of the control flow graph, called the tainted-information flowgraph, as a model for auditing the adequacy of XSS defence artefacts. The authors evaluated the proposed method in experiments on seven Java-based web applications. In the auditing experiments, our approach was effective in recovering all the XSS defence features implemented in the test subjects. The extracted artefacts were also shown to be useful for filtering the false-positive cases reported by a vulnerability detection method and helpful in fixing the vulnerable code sections.
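The tainted-information flowgraph itself is not defined in the abstract. As a much simplified illustration of the kind of adequacy question such a model answers, the sketch below assumes a plain control-flow approximation (it follows CFG edges rather than tracking individual variables) and reports every HTML output sink reachable from an input source along some path that passes no sanitiser. The node names are hypothetical.

```python
from collections import deque

def unsanitised_flows(cfg, sources, sanitisers, sinks):
    """Return (source, sink) pairs where some CFG path from source to sink avoids every sanitiser.

    cfg: node -> list of successor nodes. Taint stops at sanitiser nodes (simplifying assumption).
    """
    results = []
    for src in sources:
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node in sinks:
                results.append((src, node))
            for nxt in cfg.get(node, []):
                if nxt in sanitisers or nxt in seen:
                    continue  # sanitised here, or already explored
                seen.add(nxt)
                queue.append(nxt)
    return sorted(results)

# Hypothetical servlet CFG: one branch escapes the parameter, the other prints it raw.
servlet_cfg = {
    "read_param": ["check_admin"],
    "check_admin": ["escape_html", "print_raw"],
    "escape_html": ["print_safe"],
    "print_raw": [],
    "print_safe": [],
}
flows = unsanitised_flows(servlet_cfg, {"read_param"}, {"escape_html"},
                          {"print_raw", "print_safe"})
```

For this graph `flows` is `[("read_param", "print_raw")]`: `print_safe` is only reachable through `escape_html`, whereas the other branch leaves its output undefended.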
A simple, inexpensive, and time- and space-efficient signature technique for process monitoring is presented. In this technique, a known signature function is applied to the instruction stream at compile time, and the instructions at which the accumulated signature forms an m-out-of-n code are tagged. Errors are checked at run time by monitoring whether the signatures accumulated at the tagged locations form m-out-of-n codes. Because this approach does not require embedding reference signatures at compile time, it saves both memory and execution time. The m-out-of-n code approach offers high error coverage and controllable latency. The results of experiments conducted to verify the controllability of the latency are discussed. A distinguishing feature of the proposed scheme is the elimination of reference signatures, which are the main source of memory and time overhead in existing techniques.
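The abstract does not specify the signature function or the code parameters. The minimal sketch below assumes a hypothetical rotate-left-then-XOR signature over 8-bit instruction words with an m-out-of-n code of exactly 4 one-bits out of 8. It shows the key property: the compiler stores only tag positions, never reference signatures, and the run-time check just tests whether the accumulated signature is still a valid code word at each tag.

```python
N_BITS = 8   # n: signature width (assumed)
M_ONES = 4   # m: required number of 1 bits (assumed)
MASK = (1 << N_BITS) - 1

def step(sig, instr):
    """Hypothetical signature function: rotate left by one bit, then XOR the instruction word."""
    rotated = ((sig << 1) | (sig >> (N_BITS - 1))) & MASK
    return rotated ^ (instr & MASK)

def is_m_of_n(sig):
    return bin(sig).count("1") == M_ONES

def tag_at_compile_time(instrs):
    """Tag every position after which the accumulated signature is an m-out-of-n code."""
    sig, tags = 0, set()
    for i, instr in enumerate(instrs):
        sig = step(sig, instr)
        if is_m_of_n(sig):
            tags.add(i)
    return tags

def check_at_run_time(executed, tags):
    """Re-accumulate the signature; return the first tagged position where it is not a code word."""
    sig = 0
    for i, instr in enumerate(executed):
        sig = step(sig, instr)
        if i in tags and not is_m_of_n(sig):
            return i  # error detected here
    return None  # no error observed at any tagged location

program = [0x0F, 0x00, 0x01, 0x0B]
tags = tag_at_compile_time(program)  # {0, 1, 3}
```

Running the unmodified `program` passes every check. Corrupting the untagged instruction at position 2 (for example `[0x0F, 0x00, 0x03, 0x0B]`) is caught one instruction later at tag 3, which illustrates the detection latency that the spacing of tags controls.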
Vulnerability detection is essential for protecting software systems. Various deep learning-based approaches have been proposed to learn vulnerability patterns and identify vulnerabilities. Although these approaches have shown great potential for this task, they still suffer from the following issues: (1) They have difficulty distinguishing vulnerability-related information from a large amount of irrelevant information, which hinders their effectiveness in capturing vulnerability features. (2) They are less effective at handling long code, because many neural models limit the input length, which hinders their ability to represent long vulnerable code snippets. To mitigate these two issues, in this work we propose decomposing the syntax-based control flow graph (CFG) of a code snippet into multiple execution paths to detect vulnerabilities. Specifically, given a code snippet, we first build its CFG from its Abstract Syntax Tree (AST), refer to such a CFG as a syntax-based CFG, and decompose the CFG into multiple paths from the entry node to the exit node. Next, we adopt a pre-trained code model and a convolutional neural network to learn path representations with intra- and inter-path attention. The feature vectors of the paths are combined into the representation of the code snippet and fed into a classifier to detect the vulnerability. Decomposing the code snippet into multiple paths can filter out some redundant information unrelated to the vulnerability and help the model focus on vulnerability features. In addition, since the decomposed paths are usually shorter than the code snippet, information located at the tail of long code is more likely to be processed and learned. To evaluate the effectiveness of our model, we build a dataset with over 231 k code snippets containing 24 k vulnerabilities. Experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines by at least 22.30%, 42.92%, and 32.5
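The abstract does not state how loops are handled when the CFG is decomposed into entry-to-exit paths. A minimal sketch, assuming each edge may be traversed at most once per path (so a loop body is taken once, and the loop is also skipped) and capping the number of paths to avoid blow-up, could look like this; the node names are hypothetical.

```python
def enumerate_paths(cfg, entry, exit_node, max_paths=1000):
    """Entry-to-exit paths of a CFG, using each edge at most once per path (assumed loop policy)."""
    paths = []

    def dfs(node, path, used):
        if len(paths) >= max_paths:
            return  # cap the number of paths; path counts can grow exponentially
        if node == exit_node:
            paths.append(path)
            return
        for nxt in cfg.get(node, []):
            edge = (node, nxt)
            if edge not in used:
                dfs(nxt, path + [nxt], used | {edge})

    dfs(entry, [entry], frozenset())
    return paths

# Hypothetical syntax-based CFG: a length check, then a copy loop, or an error branch.
snippet_cfg = {
    "entry": ["if_len"],
    "if_len": ["copy", "error"],
    "copy": ["loop"],
    "loop": ["loop_body", "exit"],
    "loop_body": ["loop"],
    "error": ["exit"],
}
paths = enumerate_paths(snippet_cfg, "entry", "exit")
```

This yields three paths: one through the loop body, one that skips the loop, and one through the error branch. Each path is shorter than the whole snippet and could be encoded separately before the path vectors are combined.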
We propose an efficient method for computing dynamic slices of programs. Our method is based on constructing the data dependence edges of the program dependence graph at run time. We introduce the concept of compact dynamic dependence graphs (CDDGs) of programs and show that computing dynamic slices using CDDGs is more efficient than existing methods. (C) 2002 Elsevier Science B.V. All rights reserved.
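How CDDGs are compacted is not described in the abstract, so the sketch below covers only the underlying idea of building data dependence edges at run time. Each executed statement instance records which earlier instance last defined each variable it uses, and a backward traversal from the criterion collects the slice. The trace format is a hypothetical choice, and control dependences are deliberately left out.

```python
def dynamic_slice(trace, criterion_var):
    """Backward dynamic slice (data dependences only) for the final value of criterion_var.

    trace: list of (stmt_id, defined_var or None, used_vars) in execution order.
    Returns the set of statement IDs in the slice.
    """
    last_def = {}  # variable -> trace index of its most recent definition
    deps = []      # deps[i] = trace indices that instance i depends on
    for i, (_, defined, used) in enumerate(trace):
        deps.append([last_def[v] for v in used if v in last_def])
        if defined is not None:
            last_def[defined] = i
    if criterion_var not in last_def:
        return set()
    stack, seen = [last_def[criterion_var]], set()
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        stack.extend(deps[i])
    return {trace[i][0] for i in seen}

# Trace of: 1 a=in; 2 b=in; 3 i=0; 4 while i<2: 5 a=a+1; 6 i=i+1; 7 c=b*2
trace = [
    (1, "a", []), (2, "b", []), (3, "i", []),
    (4, None, ["i"]), (5, "a", ["a"]), (6, "i", ["i"]),
    (4, None, ["i"]), (5, "a", ["a"]), (6, "i", ["i"]),
    (4, None, ["i"]), (7, "c", ["b"]),
]
```

Here `dynamic_slice(trace, "a")` is `{1, 5}` and `dynamic_slice(trace, "c")` is `{2, 7}`. A full slicer would also add control dependences, which would pull in the loop predicate at statement 4.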
The rich semantic information in the control flow graphs (CFGs) of executable programs has made graph neural networks (GNNs) a key focus of malware detection. However, existing CFG-based detection techniques face limitations in node feature extraction, such as information loss, neglect of execution sequence information, and redundancy in representation vectors. These limitations make it difficult to balance efficiency and precision when training detectors. To address this, we introduce an innovative Malware CFG Node Embedding (MalGNE) method. This approach uses a novel instruction encoding rule to address the out-of-vocabulary (OOV) problem and generate high-quality initial vectors. It then employs an aggregation layer and a sequence layer to extract node aggregation features and execution sequence features, in conjunction with GNNs, to build a pre-trained node embedding model. The model maps the semantic information of node assembly instruction sequences into a compact, low-dimensional continuous space, ensuring high-quality feature extraction and enhancing the performance and efficiency of the detector. We trained the MalGNE model on the BIG 2015 dataset and validated the MalGNE-enhanced detector on the SOREL-20M and BODMAS datasets. The MalGNE-enhanced detector demonstrates outstanding performance and efficiency in low-dimensional spaces, especially when the dimensionality of the node feature vector is reduced to 16. At that dimensionality it still maintains a high detection accuracy of 95.49%, sacrificing only about 1.7% of accuracy to save approximately 73% of training time compared with 128 dimensions.
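MalGNE's instruction encoding rule is not given in the abstract. To illustrate why normalising operands helps with the OOV problem, here is a minimal sketch of a common, hypothetical abstraction (not MalGNE's actual rule): lowercase the instruction, replace memory operands with `MEM` and numeric immediates with `IMM`, and map any remaining unseen token to `<unk>`.

```python
import re

MEM_OPERAND = re.compile(r"\[[^\]]*\]")            # e.g. [ebp-8]
IMMEDIATE = re.compile(r"\b(?:0x[0-9a-f]+|\d+)\b")  # hex or decimal constants

def normalize(instr):
    """Hypothetical normalisation: lowercase, memory operands -> MEM, immediates -> IMM."""
    text = instr.strip().lower()
    text = MEM_OPERAND.sub("MEM", text)
    text = IMMEDIATE.sub("IMM", text)
    mnemonic, _, operands = text.partition(" ")
    ops = [re.sub(r"\s+", "_", op.strip()) for op in operands.split(",") if op.strip()]
    return "_".join([mnemonic] + ops)

def build_vocab(blocks):
    """Assign an integer ID to every normalised instruction seen in training blocks."""
    vocab = {"<unk>": 0}
    for block in blocks:
        for instr in block:
            vocab.setdefault(normalize(instr), len(vocab))
    return vocab

def encode(block, vocab):
    """Map a basic block's instructions to token IDs, falling back to <unk>."""
    return [vocab.get(normalize(i), vocab["<unk>"]) for i in block]

train_blocks = [["push ebp", "mov ebp, esp", "mov eax, 0x10", "ret"]]
vocab = build_vocab(train_blocks)
```

Because constants and addresses are abstracted away, `encode(["mov eax, 0xdeadbeef", "xor ecx, ecx"], vocab)` maps the first instruction to the known `mov_eax_IMM` token. Only the genuinely unseen `xor_ecx_ecx` falls back to `<unk>`.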
As the number of available multiprocessors increases, so does the importance of providing software support for these systems, including parallel compilers. Data flow analysis, an important component of software tools, may be computed many times during the compilation of a program, especially when compiling for a multiprocessor. Although converting a sequential data flow algorithm into a parallel one can present some opportunities for computing data flow in parallel, more parallelism can be exposed by developing new parallel data flow algorithms. In this paper, we present a technique that computes rapid data flow problems in parallel and is thus applicable to commonly used classical data flow problems, including reaching definitions, reachable uses, available expressions, and very busy expressions. Unlike previous techniques, our technique exploits the inherent parallelism in data flow computation that occurs across independent paths, within linear paths, and in paths through loops of a control flow graph. The technique first transforms cyclic structures in the control flow graph into acyclic structures and then builds a combining directed acyclic graph (DAG) that represents the paths through the control flow graph needed to compute data flow. Data flow is then computed in two passes over the DAG, with the data flow for the nodes on each level of the DAG computed in parallel. We also present experimental results comparing the performance of our algorithm with a sequential algorithm and a parallelized sequential algorithm.
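The combining DAG and the two-pass scheme are not detailed in the abstract. The sketch below only illustrates the level-parallel idea on a graph that is assumed to be already acyclic, using reaching definitions as the example problem. Each node's level is the length of the longest path reaching it, so all predecessors of a level-k node lie on lower levels and the nodes of one level can be evaluated concurrently. Python threads share the GIL, so this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def levels(preds, nodes):
    """Group nodes by longest-path depth from a source; nodes on one level are independent."""
    level = {}

    def lv(n):
        if n not in level:
            level[n] = 1 + max((lv(p) for p in preds[n]), default=-1)
        return level[n]

    for n in nodes:
        lv(n)
    by_level = {}
    for n, d in level.items():
        by_level.setdefault(d, []).append(n)
    return [sorted(by_level[d]) for d in sorted(by_level)]

def reaching_definitions(nodes, preds, gen, kill, workers=4):
    """OUT[n] = GEN[n] | (IN[n] - KILL[n]), where IN[n] is the union of the predecessors' OUT sets."""
    out = {}

    def transfer(n):
        in_n = set().union(*(out[p] for p in preds[n]))
        return n, gen[n] | (in_n - kill[n])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for layer in levels(preds, nodes):
            results = list(pool.map(transfer, layer))  # one level evaluated concurrently
            out.update(results)
    return out

# Hypothetical diamond: n1 defines x (d1); n2 redefines x (d2); n3 defines y (d3); n4 joins.
nodes = ["n1", "n2", "n3", "n4"]
preds = {"n1": [], "n2": ["n1"], "n3": ["n1"], "n4": ["n2", "n3"]}
gen = {"n1": {"d1"}, "n2": {"d2"}, "n3": {"d3"}, "n4": set()}
kill = {"n1": {"d2"}, "n2": {"d1"}, "n3": set(), "n4": set()}
result = reaching_definitions(nodes, preds, gen, kill)
```

In this example `n2` and `n3` form one level and are evaluated together. All three definitions reach the join `n4`, because `d1` survives along the `n3` branch.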
Detecting hardware Trojans is a very difficult challenge. However, combining symbolic execution with metamorphic testing is useful for detecting hardware Trojans in Verilog code. In this paper, symbolic execution and metamorphic testing were combined to detect internal conditionally triggered hardware Trojans in register-transfer level (RTL) designs. First, control flow graphs of the Verilog code were generated. Next, parallel symbolic execution and a satisfiability modulo theories (SMT) solver generated test patterns. Finally, metamorphic testing detected the hardware Trojans. The experiments used Trust-Hub benchmarks. (C) 2018 Elsevier Ltd. All rights reserved.
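To show how the pieces of this pipeline fit together, here is a heavily simplified sketch. Several things are assumptions: the Verilog design is modelled as a Python function (a 4-bit adder with a hypothetical rare-trigger Trojan), the path conditions are written by hand instead of being derived by symbolic execution, and a brute-force search over the 4-bit input space stands in for the SMT solver. Commutativity of addition serves as the metamorphic relation.

```python
from itertools import product

WIDTH = 4
MASK = (1 << WIDTH) - 1

def adder_with_trojan(a, b):
    """Toy model of an RTL adder's behaviour with a hypothetical conditionally triggered Trojan."""
    if a == 0b1011 and b == 0b0110:  # rare internal trigger condition
        return (a + b + 1) & MASK    # payload: corrupt the sum
    return (a + b) & MASK

# Path conditions for the design's two branches; symbolic execution would derive these.
PATH_CONDITIONS = {
    "trigger": lambda a, b: a == 0b1011 and b == 0b0110,
    "normal": lambda a, b: not (a == 0b1011 and b == 0b0110),
}

def solve(cond):
    """Brute-force stand-in for an SMT solver: first (a, b) that satisfies the path condition."""
    for a, b in product(range(MASK + 1), repeat=2):
        if cond(a, b):
            return a, b
    return None

def metamorphic_violations(design):
    """Check the relation design(a, b) == design(b, a) on one generated pattern per path."""
    violations = []
    for name, cond in PATH_CONDITIONS.items():
        pattern = solve(cond)
        if pattern is None:
            continue  # infeasible path
        a, b = pattern
        if design(a, b) != design(b, a):
            violations.append((name, a, b))
    return violations
```

`metamorphic_violations(adder_with_trojan)` returns `[("trigger", 11, 6)]`: the swapped input misses the trigger, so the two outputs disagree, even though no golden reference output was ever needed. A clean adder yields no violations.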