The way software developers edit code day-to-day tends to be repetitive, often using existing code elements. Many researchers have tried to automate the repetitive code editing process by mining specific change templa...
详细信息
The way software developers edit code day-to-day tends to be repetitive, often using existing code elements. Many researchers have tried to automate the repetitive code editing process by mining specific change templates. However, such templates are often manually implemented for automated applications. Consequently, such template-based automated code editing is very tedious to implement. In addition, template-based code editing is often narrowly-scoped and low noise tolerant. Machine Learning, specially deep learning-based techniques, could help us solve these problems because of their generalization and noise tolerance capacities. The advancement of deep neural networks and the availability of vast open-source evolutionary data opens up the possibility of automatically learning those templates from the wild and applying those in the appropriate context. However, deep neural network-based modeling for code changes, and code, in general, introduces some specific problems that need specific attention from the research community. For instance, sourcecode exhibit strictly defined syntax and semantics inherited from the properties of Programming Language (PL). In addition, sourcecode vocabulary (possible number of tokens) can be arbitrarily large. This dissertation formulates the problem of automated code editing as a multi-modal translation problem, where, given a piece of code, the context, and some guidance, the objective is to generate edited code. In particular, we divide the problem into two sub-problems—sourcecode understanding and generation. We empirically show that the deep neural networks (models in general) for these problems should be aware of the PL-properties (i.e., syntax, semantics). This dissertation investigates two primary directions of endowing the models with knowledge about PL-properties—(i) explicit encoding: where we design models catering to a specific property, and (ii) implicit encoding: where we train a very-large model to learn these pro
暂无评论