Automated program repair (APR) has great potential to reduce bug-fixing effort and many approaches have been proposed in recent years. APRs are often treated as a search problem where the search space consists of all ...
详细信息
ISBN:
(纸本)9781450356992
Automated program repair (APR) has great potential to reduce bug-fixing effort and many approaches have been proposed in recent years. APRs are often treated as a search problem where the search space consists of all the possible patches and the goal is to identify the correct patch in the space. Many techniques take a data-driven approach and analyze data sources such as existing patches and similar source code to help identify the correct patch. However, while existing patches and similar code provide complementary information, existing techniques analyze only a single source and cannot be easily extended to analyze both. In this paper, we propose a novel automatic program repair approach that utilizes both existing patches and similar code. Our approach mines an abstract search space from existing patches and obtains a concrete search space by differencing with similar code snippets. Then we search within the intersection of the two search spaces. We have implemented our approach as a tool called SimFix, and evaluated it on the Defects4J benchmark. Our tool successfully fixed 34 bugs. To our best knowledge, this is the largest number of bugs fixed by a single technology on the Defects4J benchmark. Furthermore, as far as we know, 13 bugs fixed by our approach have never been fixed by the current approaches.
As current tree-differencing approaches ignore changes that occur within a statement or do not present them in an abstract way, it is difficult to automatically understand revisions involving statement updates. We pro...
详细信息
ISBN:
(纸本)9781728130941
As current tree-differencing approaches ignore changes that occur within a statement or do not present them in an abstract way, it is difficult to automatically understand revisions involving statement updates. We propose a tree-differencing approach to identifying the within-statement changes. It calculates edit operations based on an element-sensitive strategy and the longest common sequence algorithm. Then, it generates the metadata for each edit operation. Meta-data include the type of operation, the type of entity and the name of the element part, the content, the content pattern and all references involved. We have implemented the approach as a free accessible tool. It is built upon ChangeDistiller and refines its statement-update type. Finally, to demonstrate how to use the proposed approach for change understanding, we studied the condition-expression changes in four open projects. We analyzed the non-essential condition changes, the effective changes that definitely affect the condition, and other changes. The results show that for revisions with condition-expression changes, nearly 20% contain non-essential changes, while more than 60% have effective changes. Furthermore, we found many common patterns. For example, we found that half of the revisions with effective changes were caused by adding or removing expressions in logical expressions. And, in these revisions, 47% enhanced the condition, while 49% weakened it.
Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the c...
详细信息
ISBN:
(纸本)9781538605356
Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studying the differences between auto-generated and human-written messages, we found that 82% of the human-written messages have only one sentence, while the automatically generated messages often have multiple lines. Furthermore, we found that the commit messages often begin with a verb followed by an direct object. This finding inspired us to use a "verb+object" format in this paper to generate short commit summaries. We split the approach into two parts: verb generation and object generation. As our first try, we trained a classifier to classify a diff to a verb. We are seeking feedback from the community before we continue to work on generating direct objects for the commits.
Analyzing and understanding source code changes is important in a variety of software maintenance tasks. To this end, many code differencing and code change summarization methods have been proposed. For some tasks (e....
详细信息
ISBN:
(数字)9781450359375
ISBN:
(纸本)9781450359375
Analyzing and understanding source code changes is important in a variety of software maintenance tasks. To this end, many code differencing and code change summarization methods have been proposed. For some tasks (e.g. code review and software merging), however, those differencing methods generate too fine-grained a representation of code changes, and those summarization methods generate too coarse-grained a representation of code changes. Moreover, they do not consider the relationships among code changes. Therefore, the generated differences or summaries make it not easy to analyze and understand code changes in some software maintenance tasks. In this paper, we propose a code differencing approach, named CLDIFF, to generate concise linked code differences whose granularity is in between the existing code differencing and code change summarization methods. The goal of CLDIFF is to generate more easily understandable code differences. CLDIFF takes source code files before and after changes as inputs, and consists of three steps. First, it pre-processes the source code files by pruning unchanged declarations from the parsed abstract syntax trees. Second, it generates concise code differences by grouping fine-grained code differences at or above the statement level and describing high-level changes in each group. Third, it links the related concise code differences according to five pre-defined links. Experiments with 12 Java projects (74,387 commits) and a human study with 10 participants have indicated the accuracy, conciseness, performance and usefulness of CLDIFF.
Software written in legacy programming languages is notoriously ubiquitous and often comprises business-critical portions of codebases and portfolios. Some of these languages, like COBOL, mature, grow, and acquire mod...
详细信息
ISBN:
(纸本)9783030641481;9783030641474
Software written in legacy programming languages is notoriously ubiquitous and often comprises business-critical portions of codebases and portfolios. Some of these languages, like COBOL, mature, grow, and acquire modern tooling that makes maintenance activities more bearable. Others, like many fourth generation languages (4GLs), stagnate and become obsolete and unmaintained, which first urges and eventually forces migrating to other languages, if the software needs to be kept in production. In this paper, we dissect a software modernisation process endorsed by Raincode Labs, utilised in particular to migrate software from a 4GL called PACBASE, to pure COBOL. Having migrated upwards of 500 MLOC of production code to COBOL using this process, the company has ample experience with this process. Nevertheless, we identify some improvement points and explain the technical side of a possible solution, based on migration log differencing, that is currently being put to the test by Raincode migration engineers.
Understanding code changes is of crucial importance in a wide range of software evolution activities. The traditional approach is to use textual differencing, as done with success since the 1970s with the ubiquitous d...
详细信息
ISBN:
(纸本)9798400702174
Understanding code changes is of crucial importance in a wide range of software evolution activities. The traditional approach is to use textual differencing, as done with success since the 1970s with the ubiquitous diff tool. However, textual differencing has the important limitation of not aligning the changes to the syntax of the source code. To overcome these issues, structural (i.e. syntactic) differencing has been proposed in the literature, notably GumTree which was one of the pioneering approaches. The main drawback of GumTree's algorithm is the use of an optimal, but expensive tree-edit distance algorithm that makes it difficult to diff large ASTs. In this article, we describe a less expensive heuristic that enables GumTree to scale to large ASTs while yielding results of better quality than the original GumTree. We validate this new heuristic against 4 datasets of changes in two different languages, where we generate edit-scripts with a median size 50% smaller and a total speedup of the matching time between 50x and 281x.
暂无评论