Finding and fixing bugs in programs is perhaps one of the most difficult, yet most important, tasks in software maintenance. For this reason, a great deal of work has been done on the topic in recent decades, most of it based on machine learning methods. Studies on bug prediction exist for almost all programming languages. The proposed solutions generally predict bugs from information that can be easily extracted from the source code, rather than relying on more expensive techniques that require a deeper understanding of the program. They also usually predict faults at a high level (module/file/class), which is useful but still leaves locating the bug itself a difficult task. This work presents a solution that predicts bugs at the method level while also tracking the dependencies in the program using an efficient algorithm, resulting in an approach that can predict bugs more accurately. Practical measurements show that the approach outperforms predictions based on traditional metrics in most cases, and with proper filtering, RandomForest, the best-performing algorithm according to the F-measure, achieves an improvement of up to 11%. Finally, the introduced metrics are shown to be suitable even for predicting bugs that will appear later in a given project, provided sufficient training data is available.
The conventional enhancement-and-perturbation approach to establishing Gaussian extremal inequalities is refined via a novel monotone path argument in the product probability space. This refined approach is illustrated with simplified/corrected proofs of the Liu-Viswanath extremal inequality and a vector generalization of Costa's entropy power inequality. The power of this refinement is further demonstrated by characterizing two information-theoretic limits, namely, the capacity region of the multiple-input multiple-output (MIMO) Gaussian broadcast channel with private and common messages and the rate-distortion-equivocation function of vector Gaussian secure source coding, which have previously resisted the attack of the conventional approach.
Within the Android mobile operating system, Android permissions act as a system of safeguards designed to restrict access to potentially sensitive data and privileged components. Multiple research studies indicate flaws and limitations in the Android permission system, prompting Google to implement a more regulated and fine-grained permission model. This newly introduced complexity creates confusion for developers, leading to incorrect permissions and a significant risk to users' security and privacy. We present a systematic study of theoretical and practical misuse of permissions. For this analysis, we derive unified permission and call mappings that represent the theoretical requirements of permissions and calls. We develop PChecker, an approach that uses these mappings to identify discrepancies between the official Android permissions documentation and the permission implementation in the Android platform source code. We evaluate four versions of the Android Open Source Project code (major versions 10-13) and shed light on the prevalence of discrepancies between the official Android guidelines for permissions and their implementation in the Android platform source code. We further show that these discrepancies result in miscompliance in third-party Android apps.
Free software is software that gives users the right to use the software, to modify the software, and to pass on the software, modified or not, all free of charge and without restrictions on what the software is used for. Open source software provides users with the same rights as free software. For all practical purposes, they are the same.
The Source Code Control System (SCCS) was first introduced in 1975 (Rochkind, 1975). It controlled computer program source code by tracking versions and recording who made changes, when, and why. The present retrospective paper assesses the strengths and weaknesses of SCCS and traces its influence on software engineering over the past fifty years.
The widespread use of virtual assistants (e.g., GPT4 and Gemini) by students in their academic assignments raises concerns about academic integrity. Consequently, various machine-generated text (MGT) detection methods, developed from metric-based and model-based approaches, have been proposed and shown to be highly effective. Model-based MGT methods often encounter difficulties when dealing with source code because its semantics differ from those of natural languages. Meanwhile, the efficacy of metric-based MGT methods on source code has not been investigated. Moreover, the challenge of identifying machine-generated code (MGC) has received less attention, and existing solutions demonstrate low accuracy and high false positive rates across diverse human-written code. In this paper, we take into account both semantic features extracted from Large Language Models (LLMs) and the applicability of metrics (e.g., Log-Likelihood, Rank, and Log-rank) for source code analysis. Concretely, we propose MageCode, a novel method for identifying machine-generated code. MageCode uses the pre-trained model CodeT5+ to extract semantic features from source code inputs and incorporates metric-based techniques to enhance accuracy. To assess the proposed method, we introduce a new dataset comprising more than 45,000 code solutions generated by LLMs for programming problems. These solutions, obtained from three advanced LLMs (GPT4, Gemini, and Code-bison-32k), are written in Python, Java, and C++. The evaluation of MageCode on this dataset demonstrates superior performance compared to existing baselines, achieving up to 98.46% accuracy while maintaining a false positive rate of less than 1%.
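The metrics named in this abstract (Log-Likelihood, Rank, Log-rank) can be illustrated with a minimal sketch. It assumes the usual definitions from metric-based detection work: averages, over the tokens of a sample, of the log-probability the scoring model assigns to each observed token, of that token's rank in the model's predicted distribution, and of the logarithm of that rank. The function name `token_metrics` and the toy distributions are hypothetical; this is not MageCode's implementation, and a real system would take the distributions from a language model.

```python
import math
from typing import Dict, List


def token_metrics(tokens: List[str], dists: List[Dict[str, float]]) -> Dict[str, float]:
    """Average log-likelihood, rank, and log-rank of the observed tokens.

    dists[i] is a (hypothetical) next-token distribution predicted by a
    scoring model at position i; tokens[i] is the token actually observed.
    """
    log_probs = []
    ranks = []
    for tok, dist in zip(tokens, dists):
        p = dist[tok]
        log_probs.append(math.log(p))
        # Rank 1 means the observed token was the model's most probable choice.
        ranks.append(1 + sum(1 for q in dist.values() if q > p))
    n = len(tokens)
    return {
        "log_likelihood": sum(log_probs) / n,
        "rank": sum(ranks) / n,
        "log_rank": sum(math.log(r) for r in ranks) / n,
    }


# Toy example: three tokens with made-up distributions.
tokens = ["x", "=", "1"]
dists = [
    {"x": 0.5, "y": 0.3, "z": 0.2},
    {"=": 0.9, "+": 0.1},
    {"0": 0.6, "1": 0.4},
]
metrics = token_metrics(tokens, dists)
print(metrics)
```

Under these definitions, text that the scoring model finds highly predictable yields a higher average log-likelihood and lower ranks; a detector would typically threshold such values or feed them to a classifier.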
Software optimization refines programs for resource efficiency while preserving functionality. Traditionally, it is a process done by developers and compilers. This paper introduces a third option, automated optimization at the source code level. We present Supersonic, a neural approach targeting minor source code modifications for optimization. Using a seq2seq model, Supersonic is trained on C/C++ program pairs (x(t), x(t+1)), where x(t+1) is an optimized version of x(t), and outputs a diff. Supersonic's performance is benchmarked against OpenAI's GPT-3.5-Turbo and GPT-4 on competitive programming tasks. The experiments show that Supersonic not only outperforms both models on the code optimization task but also minimizes the extent of the change with a model more than 600x smaller than GPT-3.5-Turbo and 3700x smaller than GPT-4.
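To make the idea of a diff-shaped output concrete, the sketch below builds a unified diff between a program x(t) and a modified version x(t+1) using Python's standard `difflib`. The two C snippets, the file names, and the helper `make_diff` are hypothetical illustrations; the abstract does not specify the diff format Supersonic emits, and no model is involved here.

```python
import difflib


def make_diff(before: str, after: str) -> str:
    """Return a unified diff that turns `before` into `after`."""
    return "\n".join(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="x_t.c",          # hypothetical file names
            tofile="x_t_plus_1.c",
            lineterm="",
        )
    )


# Hypothetical program pair: the multiplication is hoisted out of the loop.
x_t = """int sum2(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i] * 2;
    }
    return s;
}"""

x_t_plus_1 = """int sum2(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];
    }
    return s * 2;
}"""

diff_text = make_diff(x_t, x_t_plus_1)
print(diff_text)
```

A diff of this kind touches only the changed lines, which is one way to read the abstract's emphasis on minimizing the extent of the change.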
We present the multi-language software platform eknows for building reverse engineering tools and documentation generators as a concrete example of how to successfully translate research on software analysis into innovative products and services. Platform development includes domain-specific requirements and an architecture supporting reuse of components.
In recent years, Decentralized Finance (DeFi) has grown rapidly due to the development of blockchain technology and smart contracts. As of March 2023, the estimated global cryptocurrency market cap had reached approximately $949 billion. However, security incidents continue to plague the DeFi ecosystem, and one of the most notorious examples is the "Rug Pull" scam. This type of cryptocurrency scam occurs when the developer of a particular token project intentionally abandons the project and disappears with investors' funds. Despite emerging only in recent years, Rug Pull events have already caused significant financial losses. In this work, we manually collected and analyzed 103 real-world rug pull events, categorizing them based on their scam methods. Two primary categories were identified: Contract-related Rug Pull (through malicious functions in smart contracts) and Transaction-related Rug Pull (through cryptocurrency trading without utilizing malicious functions). Based on the analysis of rug pull events, we propose CRPWarner (short for Contract-related Rug Pull Risk Warner) to identify malicious functions in smart contracts and issue warnings regarding potential rug pulls. We evaluated CRPWarner on 69 open-source smart contracts related to rug pull events and achieved 91.8% precision, 85.9% recall, and an 88.7% F1-score. Additionally, when evaluated on 13,484 real-world token contracts on Ethereum, CRPWarner successfully detected 4,168 smart contracts with malicious functions, including zero-day examples. The precision in the large-scale experiment reaches 84.9%.
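As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The sketch below applies that standard formula to the rounded values quoted in the abstract; the helper name `f1_score` is illustrative, and because the inputs are already rounded, the result can only be expected to match the reported 88.7% to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract for the 69-contract evaluation.
f1 = f1_score(0.918, 0.859)
print(f"F1 from rounded precision/recall: {f1:.4f}")
```

The value computed from the rounded inputs is about 0.8875, which agrees with the reported F1-score up to the rounding of precision and recall.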
Vulnerability detection in source code has been a focal point of research in recent years. Traditional rule-based methods fail to identify complex and unknown vulnerabilities, leading to poor performance. While deep learning (DL)-based methods have improved these shortcomings, there is still room for enhancement. For C/C++ source code, effective vulnerability detection requires considering both the information in code statements and the structural information of the code. Graph-based code representation methods can address this need, but existing approaches often use homogeneous graphs that do not differentiate between various types of code statements or dependencies. Few methods use heterogeneous graphs for C/C++ code representation. This study explores this potential and proposes a new C/C++ vulnerability detection method named HeVulD. HeVulD introduces two node definition approaches and a key-node-based program slicing method, generating heterogeneous graph representations for source code. These representations consist of both heterogeneous nodes and edges, providing a more precise representation of source code. HeVulD achieves an F1-score of 96.4% on the SARD dataset, outperforming nine baseline C/C++ vulnerability detection methods. HeVulD has been tested under adversarial attack scenarios to assess its robustness. Additionally, HeVulD has been tested on ten open-source software projects and the latest CVEs, demonstrating its detection and generalization capabilities in real-world scenarios and its ability to identify unknown vulnerabilities.