ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Many techniques have been proposed to assess and prioritize vulnerabilities. To evaluate their performance, researchers often craft datasets from limited data sources and therefore lack a global overview of broad vulnerability intelligence. The repetitive data preparation process complicates the evaluation of new solutions. To solve this issue, we propose VULZOO, a comprehensive vulnerability intelligence dataset that covers 17 vulnerability data sources. We also construct connections among these sources, enabling more straightforward configuration and adaptation for different tasks. VULZOO provides utility scripts for automatic data synchronization and cleaning, relationship mining, and statistics generation. We make VULZOO publicly available and maintain it with incremental updates. We believe that VULZOO serves as a valuable input to vulnerability assessment and prioritization studies. The video is at https://***/EvoxQmUAHtw. The dataset is at https://***/NUS-Curiosity/VulZoo.
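As a rough illustration of what "constructing connections among sources" could look like, the sketch below links records from two sources through a shared vulnerability identifier. It is not VULZOO's actual schema or code: the record layouts, the field names (`cve_id`, `exploit_id`), the placeholder identifiers, and the function `link_by_cve` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical records from two sources; field names and IDs are placeholders,
# not taken from the VULZOO dataset.
cve_records = [
    {"cve_id": "CVE-0000-0001", "summary": "example overflow"},
    {"cve_id": "CVE-0000-0002", "summary": "example injection"},
]
exploit_records = [
    {"exploit_id": "EX-1", "cve_id": "CVE-0000-0001"},
    {"exploit_id": "EX-2", "cve_id": "CVE-0000-0001"},
]


def link_by_cve(cves, exploits):
    """Map each CVE identifier to the exploit IDs that reference it."""
    links = defaultdict(list)
    for exploit in exploits:
        links[exploit["cve_id"]].append(exploit["exploit_id"])
    # CVEs with no matching exploit keep an empty list so gaps stay visible.
    return {cve["cve_id"]: links.get(cve["cve_id"], []) for cve in cves}


linked = link_by_cve(cve_records, exploit_records)
```

Keeping unlinked entries with an empty list, rather than dropping them, makes coverage gaps between sources easy to count when generating statistics.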
ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Software vulnerabilities pose serious threats to the security of modern software systems. Deep Learning-based automated Vulnerability Repair (AVR) has gained attention as a potential solution to accelerate the remediation of vulnerabilities. However, recent studies indicate that existing AVR approaches often only generate patches, which may not align with developers' current repair practices or expectations. In this paper, we introduce VulAdvisor, an automated approach that generates natural language suggestions to guide developers or AVR tools in repairing vulnerabilities. VulAdvisor comprises two main components: oracle extraction and suggestion learning. To address the challenge of limited historical data, we propose an oracle extraction method that uses ChatGPT to construct a comprehensive and high-quality dataset. For suggestion learning, we take a supervised fine-tuned CodeT5 model as the basis, integrating local context into Multi-Head Attention and introducing a repair action loss to improve the relevance and meaningfulness of the generated suggestions. Extensive experiments on a large-scale dataset from real-world C/C++ projects demonstrate the effectiveness of VulAdvisor, which surpasses several alternatives in terms of both lexical and semantic metrics. Moreover, we show that the generated suggestions enhance the patch generation capabilities of existing AVR tools. Human evaluations further validate the quality and utility of VulAdvisor's suggestions, confirming their potential to improve software vulnerability repair practices.
ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Open-source software is crucial to modern development, but its complexity creates challenges in quality, security, and management. Current governance approaches excel at collaboration but struggle with decentralized management and security. With the rise of large language model (LLM)-based software engineering, the need for a finer-grained understanding of software composition is more urgent than ever. To address these challenges, inspired by the Human Genome Project, we treat the software source code as software DNA and propose the Software Genome Project (SGP), which is geared towards the secure monitoring and exploitation of open-source software. By identifying and labeling integrated and classified code features at a fine-grained level, and effectively identifying safeguards for functional implementations and non-functional requirements at different levels of granularity, the SGP could build a comprehensive set of software genome maps to help developers and managers gain a deeper understanding of software complexity and diversity. By dissecting and summarizing functional and undesirable genes, SGP could help facilitate targeted software optimization, provide valuable insight and understanding of the entire software ecosystem, and support critical development tasks such as open-source governance. SGP could also serve as a comprehensive dataset with abundant semantic labeling to enhance the training of LLMs for code. Based on these, we expect SGP to drive the evolution of software development towards more efficient, reliable, and sustainable software solutions.
ISBN (print): 9798400710988
The proceedings contain 10 papers. The topics discussed include: same same but different: a comparative analysis of static type checkers in Erlang; nominal types for Erlang; Erlang on TOAST: generating Erlang stubs with inline TOAST monitors; modeling Erlang compiler IR as SMT formulas; is this really a refactoring? automated equivalence checking for Erlang projects; controlled scheduling of concurrent Elixir programs; unsafe impedance: safe languages and safe by design software; the benefits of tierless Elixir/potato for engineering IoT systems; and Elixir-powered low-income animal shelter support: an experience report from conception to production.
ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Smart contracts are susceptible to exploitation by attackers, especially when they contain real-world vulnerabilities. To mitigate this risk, developers often rely on third-party audit services to identify potential vulnerabilities before project deployment. Nevertheless, repairing the identified vulnerabilities is still complex and labor-intensive, particularly for developers lacking security expertise. Moreover, existing pattern-based repair tools mostly fail to address real-world vulnerabilities due to their lack of high-level semantic understanding. To fill this gap, we propose CONTRACTTINKER, a Large Language Model (LLM)-empowered tool for real-world vulnerability repair. The key insight is our adoption of the Chain-of-Thought approach to break down the entire generation task into sub-tasks. Additionally, to reduce hallucination, we integrate program static analysis to guide the LLM. We evaluate CONTRACTTINKER on 48 high-risk vulnerabilities. The experimental results show that among the patches generated by CONTRACTTINKER, 23 (48%) are valid patches that fix the vulnerabilities, while 10 (21%) require only minor modifications. A video of CONTRACTTINKER is available at https://***/HWFVi-YHcPE.
ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Rust is renowned for its robust memory safety capabilities, yet its distinctive memory management model poses substantial challenges in both writing and understanding programs. Within Rust source code, comments are employed to clearly delineate conditions that might cause panic behavior, thereby warning developers about potential hazards associated with specific operations. Therefore, comments are particularly crucial for documenting Rust's program logic and design. Nevertheless, as modern software frequently undergoes updates and modifications, maintaining the accuracy and relevance of these comments becomes a labor-intensive endeavor. In this paper, inspired by the remarkable capabilities of Large Language Models (LLMs) in understanding software programs, we propose a code-comment inconsistency detection tool, namely RustC⁴, that combines program analysis and LLM-driven techniques to identify inconsistencies in code comments. RustC⁴ leverages LLMs' ability to interpret natural language descriptions within code comments, facilitating the extraction of design constraints. Program analysis techniques are then employed to accurately verify the implementation of these constraints. To evaluate the effectiveness of RustC⁴, we construct a dataset from 12 large-scale real-world Rust projects. The experimental results demonstrate that RustC⁴ detects 176 real inconsistent cases, 23 of which had been confirmed and fixed by developers by the time this paper was submitted.
ISBN (print): 9781450381352
The proceedings contain 94 papers. The topics discussed include: automatic assessment of students’ software models using a simple heuristic and machine learning; from classic to agile: experiences from more than a decade of project-based modeling education; towards a better understanding of interactions with a domain modeling assistant; on teaching descriptive and prescriptive modeling; a language agnostic approach to modeling requirements: specification and verification; SysML models: studying safety and security measures impact on performance using graph tainting; validity frame concept as effort-cutting technique within the verification and validation of complex cyber-physical systems; and metrics for OCL expressions: development, realization, and applications for validation.
ISBN (print): 9798400705915
More than ever, Machine Learning (ML) as a subfield of Artificial Intelligence (AI) is on the rise and is finding its way into safety-critical software applications. However, when it comes to quality assurance (QA) and trustworthiness, integrating ML models into software comes with challenges that may not be apparent at first glance. The European Union (EU) aims to tackle this problem with new regulatory requirements in the form of harmonized rules on AI (AI Act). It is a risk-based approach with extensive requirements for high-risk systems as well as for foundation models that can be used in various downstream AI systems. Reliable software engineering processes in the form of ML-enabled automated pipelines are likely to become a discerning factor for legally compliant ML systems. Our research project aims to contribute to the field by establishing an empirically grounded foundation on how to achieve trustworthy, AI Act-compliant ML systems. Both a literature review and an interview study are ongoing. At a later stage, concrete tools shall be developed, ideally in cooperation with an industry partner, possibly by utilizing the concept of regulatory sandboxes.
ISBN (print): 9781450359375
The proceedings contain 108 papers. The topics discussed include: software heritage: collecting, preserving, and sharing all our source code; automated requirements engineering challenges with examples from small unmanned aerial systems; implementation science for software engineering: bridging the gap between research and practice; on adopting linters to deal with performance concerns in Android apps; PerfLearner: learning from bug reports to understand and generate performance test frames; AutoConfig: automatic configuration tuning for distributed message systems; is this class thread-safe? inferring documentation using graph-based learning; scalable incremental building with dynamic task dependencies; automated directed fairness testing; DeepGauge: multi-granularity testing criteria for deep learning systems; and DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems.
ISBN (print): 9781450370196
The proceedings contain 35 papers. The topics discussed include: SceML - a graphical modeling framework for scenario-based testing of autonomous vehicles; to build, or not to build: ModelFlow, a build solution for MDE projects; efficient generation of graphical model views via lazy model-to-text transformation; an extensible framework for customizable model repair; a compositional framework for systematic modeling language reuse; interactive metamodel/model co-evolution using unsupervised learning and multi-objective search; model-driven digital twin construction: synthesizing the integration of cyber-physical systems with their information systems; model-based fleet deployment of edge computing applications; and a parametric model for creating customized fabrication machines.