Details
ISBN (digital): 9798400712487
ISBN (print): 9798400712487
Software architectures are usually meticulously designed to address multiple quality concerns and support long-term maintenance. However, developers may lack motivation to document design rationales (i.e., the design alternatives and the underlying arguments for making or rejecting decisions) when they gain no immediate benefit, resulting in a lack of systematically captured rationales. With the turnover of developers, the architecture inevitably becomes eroded. This issue has motivated a number of studies in recent years to extract design knowledge from open-source communities. Unfortunately, none of the existing research has successfully extracted solutions along with their corresponding arguments, due to challenges such as the intricate semantics of online discussions and the lack of benchmarks for design rationale extraction. In this paper, we propose a novel approach, named DRMiner, to automatically mine latent design rationales from developers' live discussions in open-source communities (i.e., issue logs in Jira). To better identify solutions and their relevant arguments, DRMiner skillfully decomposes the problem into multiple text classification tasks and tackles them using prompt tuning of large language models (LLMs) and specific heuristic features. To evaluate DRMiner, we acquire issue logs from the Cassandra, Flink, and Solr repositories in Jira and construct a dataset for design rationale mining. Experimental results show that DRMiner outperforms all baselines, achieving F1 improvements of 24%, 22%, and 20% for mining design rationales, solutions, and arguments, respectively, over the best baseline. Furthermore, we investigate the usefulness of the design rationales mined by DRMiner for automated program repair (APR) and find that advanced LLMs, when prompted with these extracted rationales, generate 10x-18x more full-match patches and achieve a 10%-13% gain in CodeBLEU scores.
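The abstract describes decomposing rationale mining into sentence-level text classification driven by prompt tuning. A minimal sketch of that idea, assuming a cloze-style prompt and a three-way label set (the label names, prompt wording, and the `fake_llm` stub are illustrative assumptions, not DRMiner's actual design):

```python
# Hypothetical sketch: classifying issue-log sentences as solutions or
# arguments via an LLM prompt, as the abstract's decomposition suggests.

LABELS = ("solution", "argument", "other")

def build_prompt(sentence: str, context: str) -> str:
    """Wrap an issue-log sentence in a cloze-style prompt for an LLM classifier."""
    return (
        f"Issue context: {context}\n"
        f"Sentence: {sentence}\n"
        "This sentence is a [MASK]."  # prompt tuning fills the mask with a label word
    )

def classify(sentence: str, context: str, llm) -> str:
    """Map the LLM's label-word output onto the task's label set."""
    answer = llm(build_prompt(sentence, context)).strip().lower()
    return answer if answer in LABELS else "other"

# Stub standing in for a prompt-tuned model (pure illustration).
fake_llm = lambda prompt: "solution" if "we could cache" in prompt else "other"

print(classify("Maybe we could cache the index.", "CASSANDRA-1234", fake_llm))  # → solution
```

The real system would additionally feed heuristic features (e.g., discussion position, author role) alongside the prompt; the stub here only shows how per-sentence classification composes into rationale extraction.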
In large-scale cluster or cloud computing environments, executing data-intensive operations causes a large amount of redundant energy consumption. In order to improve computing energy efficiency and ensure...
Details
Semantic features encoded in the labels have a strong influence on multi-label text classification (MLTC) performance. This paper follows that assumption and proposes an MLTC approach with correlation learning. Unlike ...
Details
Cryptography plays a significant role in software engineering, and its application is paramount for the security of software systems. However, comprehending and securely utilizing cryptography in software development is c...
Details
Details
ISBN (print): 9781914587962; 9781914587979
In the cyber operations community there is a commonly accepted starting point for describing cyberspace as comprising multiple planes through which information flows. However, the model is not a tool that facilitates planning and executing cyber operations. Tools do exist in the form of technical cybersecurity ontologies. At the moment, the link between technical ontologies, which are the tools of experts, and the operational planning process is limited. These technical ontologies provide automated information that would support operational planning. At the moment, cybersecurity experts translate the information that military professionals need, which may cause insufficiencies or distortions in communication or cause inconsistencies in the planning process. This paper presents the ongoing work of developing a model of cyberspace in the form of a core ontology. The ontology describes the flow of digital information between persons and the enabling technology, as well as geographical data. It is intended as a tool that supports operational planning and decision-making in and through cyberspace by enabling automation and reasoning. The model is created using the well-established Constructive Research Approach (CRA) methodology and builds on earlier research. CRA consists of six phases in which (1) the problem is defined, (2) an understanding of the topic is generated, and (3) a solution (model) is constructed, which then is (4) demonstrated. Then the model's (5) theoretical connections are presented and the (6) scope of applicability is assessed. The challenges of developing an ontology of cyberspace as part of the third phase of the methodology are in focus. The ontology serves as an operational core ontology, aiming to link cybersecurity domain ontologies to the DOLCE+DnS Ultralite (DUL) foundational ontology. The ontology is based on research in Cyberspace Geography and Cyber Terrain. No earlier attempts at creating a core ontology of cyberspace grounded in a foundational ...
Artificial intelligence has been widely used in the field of music and has achieved good results. Chords are an important concept in music, and chord generation is one of the basic tasks in composition. The application of machine lea...
Details
FFT processors are widely used in digital signal processing, but they incur a large amount of hardware consumption when computing large point sequences. This paper introduces a high-performance, low-complexity, and configu...
Details
Details
ISBN (print): 9798400703751
This paper presents the FormAI dataset, a large collection of 112,000 AI-generated, compilable, and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks like network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model checking, abstract interpretation, constraint programming, and satisfiability modulo theories to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. We have associated the identified vulnerabilities with Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112,000 programs, accompanied by a separate file containing the vulnerabilities detected in each program, making the dataset ideal for training LLMs and machine learning algorithms. Our study unveiled that, according to ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities, thereby presenting considerable risks to software safety and security.
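The abstract says each program is labeled with vulnerability type, line number, function name, and a CWE number. A minimal sketch of consuming such labels, assuming illustrative record fields (the actual field names in the released FormAI files may differ):

```python
# Hypothetical sketch: summarizing FormAI-style vulnerability labels.
# Field names ("file", "cwe", "line", "function") are assumptions for
# illustration; a record with cwe=None stands for a program ESBMC cleared.

records = [
    {"file": "prog_00001.c", "cwe": "CWE-787", "line": 42,   "function": "copy_buf"},
    {"file": "prog_00002.c", "cwe": None,      "line": None, "function": None},
    {"file": "prog_00003.c", "cwe": "CWE-190", "line": 17,   "function": "add_sizes"},
]

def vulnerable_ratio(recs):
    """Fraction of programs carrying at least one confirmed vulnerability label."""
    flagged = [r for r in recs if r["cwe"] is not None]
    return len(flagged) / len(recs)

print(round(vulnerable_ratio(records), 3))  # 2 of 3 toy records are labeled
```

The same aggregation over the full 112,000-program dataset is what yields the 51.24% figure the abstract reports.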
Details
ISBN (print): 9798350376975; 9798350376968
Unprecedented global challenges such as the COVID-19 pandemic necessitated a widespread transition to Work-From-Home (WFH) arrangements for project teams, posing significant challenges in conveying requirements within agile Requirements Engineering (RE). While numerous studies have examined the impact of transitioning work routines during the pandemic, limited research exists on the specific challenges of agile RE operating within the WFH context. Given the pervasive shift in the software development ecosystem worldwide, where WFH is projected to persist even in the post-COVID era, it is imperative to ascertain the challenges associated with WFH-based agile RE. During the pandemic, we collaborated with startups to conduct an industry-academia project. By adopting the methodology of action research, this study comprehensively analyzed agile RE practices and reported the key challenges encountered within the WFH context. To mitigate these challenges, several collaborative RE techniques were employed in three intervention cycles. Interviews were conducted to thoroughly analyze the results. This study also provides insights into collaborative RE techniques and valuable lessons learned. Considering the increasing prevalence of WFH as a working mode in the post-pandemic era, this study equips the community with practical strategies to navigate agile RE challenges and better prepare for unprecedented challenges in the future.
Details
ISBN (print): 9798400703232
The Context-observant LLM-Enabled Autonomous Robots (CLEAR) platform offers a general solution for large language model (LLM)-enabled robot autonomy. CLEAR-controlled robots use natural language to perceive and interact with their environment: contextual descriptions derived from computer vision and optional human commands prompt intelligent LLM responses that map to robotic actions. By emphasizing prompting, system behavior is programmed without manipulating code, and unlike other LLM-based robot control methods, we do not perform any model fine-tuning. CLEAR employs off-the-shelf pre-trained machine learning models for controlling robots ranging from simulated quadcopters to terrestrial quadrupeds. We provide the open-source CLEAR platform, along with sample implementations for a Unity-based quadcopter and the Boston Dynamics Spot(R) robot. Each LLM used, GPT-3.5, GPT-4, and LLaMA2, exhibited behavioral differences when embodied by CLEAR, contrasting in actuation preference, ability to apply new knowledge, and receptivity to human instruction. GPT-4 demonstrated the best performance compared to GPT-3.5 and LLaMA2, showing successful task execution 97% of the time. The CLEAR platform contributes to HRI by increasing the usability of robotics for natural human interaction.
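The abstract's core mechanism is mapping free-text LLM responses to robotic actions. A minimal sketch of that mapping step, assuming a small illustrative action vocabulary and a safe fallback (the action names and response format are assumptions, not CLEAR's actual interface):

```python
# Hypothetical sketch: extracting a robotic action from a free-text LLM
# reply, as in the "LLM responses map to robotic actions" step described
# in the abstract. Action tokens below are invented for illustration.

ACTIONS = {"move_forward", "turn_left", "turn_right", "hover"}

def parse_action(llm_response: str) -> str:
    """Return the first recognized action token in the reply, else a safe default."""
    for token in llm_response.lower().replace(",", " ").split():
        if token in ACTIONS:
            return token
    return "hover"  # safe default when no known action is recognized

print(parse_action("I will turn_left to avoid the obstacle."))  # → turn_left
```

A robust deployment would constrain the LLM's output format via prompting rather than scanning free text, but the fallback-to-safe-action pattern matters either way when an LLM reply is unparseable.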
No comments yet