Code-based adversarial attacks play a crucial role in revealing vulnerabilities of software systems. Recently, pretrained programming language models (PLMs) have demonstrated remarkable success in various significant software engineering tasks, progressively transforming the paradigm of software development. Despite their impressive capabilities, these powerful models are vulnerable to adversarial attacks. Therefore, it is necessary to carefully investigate the robustness and vulnerabilities of PLMs by means of adversarial attacks. Adversarial attacks entail imperceptible input modifications that cause target models to make incorrect predictions. Existing approaches for attacking PLMs often employ either identifier renaming or a greedy algorithm, which may yield sub-optimal performance or lead to high inference times. In response to these limitations, we propose CARL, an unsupervised black-box attack model that leverages reinforcement learning to generate imperceptible adversarial examples. Specifically, CARL comprises a programming language encoder and a perturbation prediction layer. To achieve a more effective and efficient attack, we cast the task as a sequential decision-making process, optimized through policy gradient with a suite of reward functions. We conduct extensive experiments to validate the effectiveness of CARL on code summarization, code translation, and code refinement tasks, covering various programming languages and PLMs. The experimental results demonstrate that CARL surpasses state-of-the-art code attack models, achieving the highest attack success rate across multiple tasks and PLMs while maintaining high attack efficiency, imperceptibility, consistency, and fluency.
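The abstract frames adversarial example generation as a sequential decision-making process optimized by policy gradient. The sketch below illustrates that general idea with a REINFORCE-style update over token positions; it assumes a generic frozen code encoder and a hypothetical black-box attack_reward function, and is an illustration of the technique rather than the authors' CARL implementation.

import torch
import torch.nn as nn

class PerturbationPolicy(nn.Module):
    """Scores each token position; a high score means 'perturb this token'."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1)

    def forward(self, token_embeddings: torch.Tensor) -> torch.distributions.Categorical:
        # token_embeddings: (seq_len, hidden_dim) from a frozen code encoder
        logits = self.scorer(token_embeddings).squeeze(-1)  # (seq_len,)
        return torch.distributions.Categorical(logits=logits)

def reinforce_step(policy, optimizer, token_embeddings, attack_reward):
    """One REINFORCE update: sample a position to perturb, query a black-box
    reward (e.g. drop in the victim PLM's output quality), and push the
    policy toward high-reward perturbations."""
    dist = policy(token_embeddings)
    position = dist.sample()                  # which token to rename/perturb
    reward = attack_reward(position.item())   # hypothetical black-box query
    loss = -dist.log_prob(position) * reward  # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return position.item(), reward

In a full attack, the reward would combine attack success with imperceptibility and fluency terms, in line with the "suite of reward functions" mentioned in the abstract.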
ISBN: (Print) 9798350366266; 9798350366259
Design patterns (DPs) facilitate effective software architecture and design and must be maintained and enforced in existing complex software products, for example, automotive software. Implementing DPs in source code facilitates the development of high-quality software products with less effort. However, recognizing DPs in program code is challenging, and this makes it difficult to keep architectural evolution under control in large software products over time. As DPs are abstract solutions, the programs used to recognize them in source code have significant limitations. In this paper, we employ four programming language models based on Bidirectional Encoder Representations from Transformers (BERT) to study to what extent these models can recognize an exemplar DP, in this case, Singleton. We compare four language representation models (OpenAI CodeX, Facebook AI TransCoder, ACoRA/BERT, and CCFlex/bag-of-words) and relate the models' rankings to a simple baseline metric. We found a discrepancy between the models in identifying Singletons and observed that the models are inconsistently sensitive to name and semantic changes. Specifically, CodeX recognizes the existence of Singletons better than the other models, while only ACoRA shows some signs of recognizing DP semantics.
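To make concrete how such a study can probe whether a language model "sees" a pattern, the sketch below ranks candidate classes by embedding similarity to a Singleton exemplar. It uses mean-pooled CodeBERT embeddings purely as a stand-in encoder; this is an assumed setup for illustration, not one of the four models compared in the paper.

import torch
from transformers import AutoTokenizer, AutoModel

# Stand-in encoder; the study itself compares CodeX, TransCoder, ACoRA/BERT, CCFlex.
tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
enc = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    """Mean-pooled last-hidden-state embedding of a code snippet."""
    inputs = tok(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = enc(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)

SINGLETON_EXEMPLAR = """
class Config {
    private static Config instance;
    private Config() {}
    public static Config getInstance() {
        if (instance == null) instance = new Config();
        return instance;
    }
}
"""

def singleton_score(candidate: str) -> float:
    """Cosine similarity to the exemplar; used to rank candidate classes."""
    a, b = embed(SINGLETON_EXEMPLAR), embed(candidate)
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

Renaming identifiers or altering the lazy-initialization logic in the candidate and observing how the score shifts is one simple way to test sensitivity to name versus semantic changes, as the study does.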
ISBN: (Print) 9798400703751
Design patterns (DPs) provide reusable and general solutions for frequently encountered problems. Patterns are important to maintain the structure and quality of software products, in particular in large and distributed systems like automotive software. Modern language models (like Code2Vec or Word2Vec) indicate a deep understanding of programs, which has been shown to help in tasks such as program repair or program comprehension, and therefore show promise for design pattern recognition (DPR) in industrial contexts. The models are trained in a self-supervised manner, using a large unlabelled code base, which allows them to quantify such abstract concepts as programming styles, coding guidelines, and, to some extent, the semantics of programs. This study demonstrates how two language models, Code2Vec and Word2Vec, trained on two public automotive repositories, can separate programs containing specific DPs. The results show that Code2Vec and Word2Vec produce average F1-scores of 0.781 and 0.690, respectively, on open-source Java programs, showing promise for DPR in practice.
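As a concrete illustration of the Word2Vec side of such a setup, the sketch below trains token embeddings on a code corpus, averages them into per-program vectors, and fits a classifier to separate pattern from non-pattern programs. The tokenizer, labels, and evaluation are simplified placeholders, not the paper's automotive data or protocol.

import re
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def tokenize(source: str) -> list[str]:
    """Naive code tokenizer: identifiers/keywords plus punctuation symbols."""
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", source)

def program_vector(model: Word2Vec, tokens: list[str]) -> np.ndarray:
    """Average the embeddings of tokens seen during training."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def evaluate_separation(programs: list[str], labels: list[int]) -> float:
    """programs: source strings; labels: 1 = contains the DP, 0 = does not."""
    corpus = [tokenize(p) for p in programs]
    w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)
    X = np.stack([program_vector(w2v, toks) for toks in corpus])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    # Training-set F1 for illustration; a real study would use held-out data.
    return f1_score(labels, clf.predict(X))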
Software Engineering (SE) pre-trained language models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has shown success in transferring into downstream tasks (e.g., code clone detection) through fine-tuning of the PLMs. In Natural Language Processing (NLP), an alternative way of transferring the knowledge of PLMs is explored through the use of adapters, compact and parameter-efficient modules that are inserted into a PLM. Although the use of adapters has shown promising results in many NLP-based downstream tasks, their application and exploration in SE-based downstream tasks are limited. Here, we study knowledge transfer using adapters on multiple downstream tasks, including cloze test, code clone detection, and code summarization. These adapters are trained on code corpora and are inserted into a PLM that is pre-trained on English corpora or code corpora; we call these PLMs NL-PLM and C-PLM, respectively. We observed an improvement in results using an NL-PLM over a PLM that does not have adapters, suggesting that adapters can transfer and utilize useful knowledge from an NL-PLM to SE tasks. The results are sometimes on par with, or exceed, those of a C-PLM, while being more efficient in terms of the number of parameters and training time. Interestingly, adapters inserted into a C-PLM generally yield better results than a traditionally fine-tuned C-PLM. Our results open new directions to build more compact models for SE tasks.
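The adapters referred to here are small bottleneck modules inserted inside each transformer layer; only the adapter weights are trained while the PLM stays frozen, which is where the parameter efficiency comes from. A minimal PyTorch sketch of that idea, not tied to any specific PLM or to the paper's exact configuration, looks like this:

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Only these weights are fine-tuned for the downstream task."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection preserves the frozen PLM's representation.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def freeze_plm_train_adapters(plm: nn.Module, adapters: nn.ModuleList):
    """Parameter-efficient setup: freeze the PLM, optimize only the adapters."""
    for p in plm.parameters():
        p.requires_grad = False
    return torch.optim.AdamW(adapters.parameters(), lr=1e-4)

With a typical bottleneck of 64 units per layer, the trainable parameter count is a small fraction of the full PLM, which is consistent with the efficiency claims in the abstract.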
We have developed novel techniques for component-based specification of programming languages. In our approach, the semantics of each fundamental programming construct is specified independently, using an inherently modular framework such that no reformulation is needed when constructs are combined. A language specification consists of an unrestricted context-free grammar for the syntax of programs, together with an analysis of each language construct in terms of fundamental constructs. An open-ended collection of fundamental constructs is currently being developed. When supported by appropriate tools, our techniques allow a more agile approach to the design, modelling, and implementation of programming and domain-specific languages. In particular, our approach encourages language designers to proceed incrementally, using prototype implementations generated from specifications to test tentative designs. The components of our specifications are independent and highly reusable, so initial language specifications can be rapidly produced, and can easily evolve in response to changing design decisions. In this paper, we outline our approach, and relate it to the practices and principles of agile modelling.
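To make the "analysis of each language construct in terms of fundamental constructs" more concrete, the toy sketch below (an illustration only, not the authors' specification framework or notation) expresses a C-style for-loop by translation into reusable fundamental constructs whose semantics would be specified once and reused unchanged across language specifications.

from dataclasses import dataclass

# Fundamental constructs: each has its semantics specified once, independently,
# and is reused unchanged when new languages or constructs are specified.
@dataclass
class Seq:
    first: object
    second: object

@dataclass
class WhileTrue:
    cond: object
    body: object

def analyse_for(init, cond, step, body):
    """for (init; cond; step) body  ~>  seq(init, while-true(cond, seq(body, step)))

    The source-language construct gets no semantics of its own; it is
    analysed in terms of the fundamental constructs above."""
    return Seq(init, WhileTrue(cond, Seq(body, step)))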