ISBN (Print): 9783031368882; 9783031368899
Programming language processing (PLP) using machine learning has made vast improvements in the past few years, and a growing number of people are interested in exploring this promising field. However, it is challenging for new researchers and developers to find the right components to construct their own machine learning pipelines, given the diverse PLP tasks to be solved, the large number of datasets and models being released, and the set of complex compilers and tools involved. To improve the findability, accessibility, interoperability, and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts, including PLP tasks, model architectures, and supportive tools. Finally, we show example use cases of leveraging the reusable components to construct machine learning pipelines that solve a set of PLP tasks.
ISBN (Print): 9783031407437; 9783031407444
In recent years, language models (LMs) such as GPT-4 have been widely used in multiple domains, including natural language processing, visualization, and so on. However, applying them to analyzing and optimizing high-performance computing (HPC) software is still challenging due to the lack of HPC-specific support. In this paper, we design the LM4HPC framework to facilitate the research and development of HPC software analyses and optimizations using LMs. Tailored for supporting HPC datasets, AI models, and pipelines, our framework is built on top of a range of components from different levels of the machine learning software stack, with Hugging Face-compatible APIs. Using three representative tasks, we evaluated the prototype of our framework. The results show that LM4HPC can help users quickly evaluate a set of state-of-the-art models and generate insightful leaderboards.
ISBN (Print): 9798400703270
Developers spend a significant amount of time editing code for a variety of reasons, such as fixing bugs or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research due to the diversity of code edits and the difficulty of capturing developer intent. In this work, we address these challenges by endowing pre-trained large language models (LLMs) with knowledge of relevant prior associated edits, a method we call GRACE (Generation conditioned on Associated Code Edits). The generative capability of the LLMs helps address the diversity of code changes, and conditioning code generation on prior edits helps capture the latent developer intent. We evaluate two well-known LLMs, CODEX and CODET5, in zero-shot and fine-tuning settings, respectively. In our experiments with two datasets, GRACE boosts the performance of the LLMs significantly, enabling them to generate 29% and 54% more correctly edited code in top-1 suggestions relative to the current state-of-the-art symbolic and neural approaches, respectively.
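The conditioning step described above can be illustrated with a simple prompt-construction sketch; the function name and the diff-style format below are illustrative assumptions, not the paper's exact implementation.

```python
def build_edit_prompt(prior_edits, code_before):
    """Prepend associated prior edits (before/after pairs) to the code
    to be edited, so a generative LLM can pick up the latent intent."""
    parts = []
    for i, (before, after) in enumerate(prior_edits, 1):
        parts.append(f"# Associated edit {i}\n- {before}\n+ {after}")
    parts.append("# Code to edit\n" + code_before)
    return "\n\n".join(parts)

# Hypothetical example: a prior edit that added an encoding argument
prompt = build_edit_prompt(
    [("open(path)", "open(path, encoding='utf-8')")],
    "read_config(cfg_path)",
)
```

The resulting string would then be fed to the LLM as the generation context, with the model asked to produce the edited version of the final snippet.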
ISBN (Print): 9789869721493
Task constraint feedback is the collective name for any kind of feedback system that checks whether problem-defined constraints were fulfilled by students upon submission of work. This can be as simple as checking whether certain programming constructs exist, or whether a specific algorithm or data structure required by the problem is used. Most of these systems use static analysis (Fischer, 2006; Gotel, 2008) or natural language processing techniques (Lane, 2005) to generate feedback. A transformer is a neural network for processing sequences, such as natural language. Previous work has shown that transformers can be generalized to programming language tasks such as code summarization. In this study, we used the CodeBERT transformer to classify, or tag, algorithms implemented in code snippets to check constraint satisfaction. Using a custom dataset containing source code that implements algorithms, we show that CodeBERT is capable of learning the structure of how code is implemented regardless of how a programmer names identifiers. Averaging each label's F1-score, the model obtained an average of 0.85, a promising result on this dataset.
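The reported score is an unweighted (macro) average of per-label F1; a minimal sketch of that computation follows, where the label names and precision/recall values are invented for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall for a single label."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_label):
    """Unweighted mean of per-label F1 scores (macro-averaging)."""
    scores = [f1_score(p, r) for p, r in per_label.values()]
    return sum(scores) / len(scores)

# Invented example: two algorithm labels with different precision/recall
example = {"bubble_sort": (0.9, 0.8), "binary_search": (0.85, 0.9)}
avg = macro_f1(example)
```

Macro-averaging weights every label equally, so rare algorithm labels count as much as common ones in the reported 0.85.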
ISBN (Print): 9781665450997
Pre-trained models for natural language (NL) have recently been shown to transfer well to programming languages (PL) and to largely benefit different intelligent code-related tasks, such as code search, clone detection, program translation, and code document generation. However, existing pre-training methods for programming languages mainly rely on masked language modeling and next sentence prediction at the token or graph level. This restricted form limits their performance and transferability, since PL and NL have different syntax rules and the downstream tasks require a multimodal representation. Here we introduce C3P, a Contrastive Code-Comment Pre-training approach, to solve various downstream tasks by pre-training multi-representation features on both programming and natural syntax. The model encodes the code syntax and the natural language description (comment) with two encoders, and the encoded embeddings are projected into a multi-modal space for learning the latent representation. In the latent space, C3P jointly trains the code and comment encoders with a symmetric loss function, which maximizes the cosine similarity of correct code-comment pairs while minimizing the similarity of unrelated pairs. We verify the empirical performance of the proposed pre-trained models on multiple downstream code-related tasks. Comprehensive experiments demonstrate that C3P outperforms previous work on the understanding tasks of code search and code clone detection, as well as the generation tasks of program translation and document generation. Furthermore, we validate the transferability of C3P to new programming languages not seen in the pre-training stage. The results show our model surpasses all supervised methods and, for some programming languages, even outperforms prior pre-trained approaches. Code is available at https://***/TerryPei/C3P.
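The symmetric loss described here is in the spirit of CLIP-style contrastive training; a minimal NumPy sketch is shown below, where the temperature value and batch shapes are assumptions rather than the paper's hyperparameters.

```python
import numpy as np

def symmetric_contrastive_loss(code_emb, comment_emb, temperature=0.07):
    """CLIP-style symmetric loss: maximize cosine similarity of matched
    code-comment pairs (the diagonal) and minimize it for mismatches."""
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    t = comment_emb / np.linalg.norm(comment_emb, axis=1, keepdims=True)
    logits = c @ t.T / temperature            # (N, N) cosine similarities
    n = logits.shape[0]

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the code->comment and comment->code directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Because the loss is computed in both directions, each encoder is pushed toward a shared multi-modal space in which the matching pair is the nearest neighbor of both the code and its comment.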
The way software developers edit code day-to-day tends to be repetitive, often reusing existing code elements. Many researchers have tried to automate the repetitive code-editing process by mining specific change templates. However, such templates are often manually implemented for automated application; consequently, template-based automated code editing is very tedious to implement. In addition, template-based code editing is often narrowly scoped and poorly noise-tolerant. Machine learning, especially deep learning-based techniques, could help us solve these problems because of their generalization and noise-tolerance capacities. The advancement of deep neural networks and the availability of vast open-source evolutionary data open up the possibility of automatically learning those templates from the wild and applying them in the appropriate context. However, deep neural network-based modeling of code changes, and of code in general, introduces specific problems that need attention from the research community. For instance, source code exhibits strictly defined syntax and semantics inherited from the properties of the programming language (PL). In addition, the source code vocabulary (the possible number of tokens) can be arbitrarily large. This dissertation formulates the problem of automated code editing as a multi-modal translation problem where, given a piece of code, the context, and some guidance, the objective is to generate the edited code. In particular, we divide the problem into two sub-problems: source code understanding and generation. We empirically show that deep neural networks (models in general) for these problems should be aware of PL properties (i.e., syntax and semantics). This dissertation investigates two primary directions for endowing models with knowledge of PL properties: (i) explicit encoding, where we design models catering to a specific property, and (ii) implicit encoding, where we train a very large model to learn these properties implicitly.
ISBN (Print): 9781728165790
Release notes (RNs) are among the important artifacts in software development and maintenance, as they are required whenever a new release of a software product is deployed. They contain all the changes made in the new release of a project, i.e., descriptions of new features, improvements, bug fixes, deprecated features, etc. Generating these notes manually is a complex and time-consuming task. In this paper, we present an approach for generating RNs automatically. We implemented the approach in Python and generated these notes for *** projects. Our system extracts changes from the Git repository, summarizes changes, identifies deprecated features and library changes, fetches issues from the issue tracker, and links those issues to code. It then organizes these changes hierarchically and produces a document as output. We evaluated our results manually with 14 industry developers; the results show that the generated RNs are of good quality and more accurate than those produced manually.
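The hierarchical grouping step can be sketched as bucketing commit messages into the release-note sections named above; the keyword rules and function name here are invented for illustration, not the paper's implementation.

```python
def categorize_commits(messages):
    """Group commit messages into release-note sections by keyword."""
    sections = {"New Features": [], "Improvements": [],
                "Bug Fixes": [], "Deprecated Features": [], "Other": []}
    for msg in messages:
        low = msg.lower()
        if "deprecat" in low:
            sections["Deprecated Features"].append(msg)
        elif "fix" in low or "bug" in low:
            sections["Bug Fixes"].append(msg)
        elif low.startswith(("add", "feat", "introduce")):
            sections["New Features"].append(msg)
        elif "improve" in low or "optimize" in low:
            sections["Improvements"].append(msg)
        else:
            sections["Other"].append(msg)
    return sections

# Invented commit messages illustrating the bucketing
notes = categorize_commits([
    "Add OAuth2 login support",
    "Fix null pointer in parser",
    "Deprecate legacy config format",
])
```

A real pipeline would draw the messages from `git log` and additionally link each entry to issues from the tracker before rendering the sectioned document.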
ISBN (Print): 9781538682463
In this paper, a generic static end-to-end detection framework with a deep neural network for WebShell is designed, which is free from human labor and domain knowledge. We simultaneously introduce word embedding from natural language processing (NLP) and lexical analysis from programming language processing (PLP) to obtain an accurate, structured, semantically rich vector representation of the script code. To this end, a series of effective tricks are designed to dig out the high-value information in the script while filtering noise. Then, we provide a down-sampling algorithm that drastically reduces the computational cost at a relatively small information loss. Finally, we achieve high detection accuracy by employing a deep neural network (DNN) composed of LSTM and pooling layers. The framework shows a significant advantage, at least on the dataset of our experiment.
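The down-sampling idea can be illustrated with a generic max-pooling sketch over a token-embedding sequence; this is a standard stand-in technique, not the paper's specific algorithm.

```python
import numpy as np

def max_pool_downsample(seq, window=4):
    """Shrink a (T, D) embedding sequence by taking the element-wise
    max over non-overlapping windows of `window` time steps, cutting
    downstream cost roughly window-fold while keeping the strongest
    activations."""
    t, d = seq.shape
    pad = (-t) % window
    if pad:  # pad with -inf so padding rows never win the max
        seq = np.vstack([seq, np.full((pad, d), -np.inf)])
    return seq.reshape(-1, window, d).max(axis=1)
```

Pooling like this trades positional precision for a much shorter sequence, which is what makes long script files tractable for an LSTM-based classifier.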