ISBN:
(Print) 9781450329651
Program comprehension usually focuses on the significance of textual information for capturing the programmers' intent and knowledge embedded in software, in particular the source code. Most of the data in source code is unstructured, such as the natural language text in comments and identifier names. Researchers in the software engineering community have developed many techniques for handling such unstructured data, drawing on natural language processing (NLP) and information retrieval (IR). Before applying IR techniques to unstructured source code, the source code must be preprocessed, since this data differs from the natural language text used in daily life. During this process, several operations, e.g., tokenization, identifier splitting, and stemming, are usually applied. These preprocessing operations affect the quality of the data fed into the IR process, but how they affect the results of IR remains an open problem. To the best of our knowledge, no studies have focused on this problem. This paper attempts to fill that gap and conducts empirical studies to show the differences before and after these preprocessing operations are applied. The empirical results reveal some interesting phenomena depending on whether these preprocessing operations are used.
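The preprocessing pipeline named in the abstract (tokenization, identifier splitting, stemming) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`split_identifier`, `preprocess`) and the choice of Porter stemming via NLTK are assumptions made for the example.

```python
# Hypothetical sketch of the preprocessing steps the abstract describes:
# tokenization, identifier splitting (snake_case / camelCase), and stemming.
import re
from nltk.stem import PorterStemmer  # Porter stemming, a common IR choice

stemmer = PorterStemmer()

def split_identifier(token):
    """Split snake_case and camelCase identifiers into word parts."""
    words = []
    for part in token.split('_'):
        # Uppercase runs, capitalised words, lowercase runs, and digits.
        words.extend(re.findall(r'[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+', part))
    return [w for w in words if w]

def preprocess(source_text):
    """Tokenize source text, split identifiers, lowercase, and stem."""
    tokens = re.findall(r'[A-Za-z_][A-Za-z0-9_]*', source_text)
    terms = []
    for tok in tokens:
        for word in split_identifier(tok):
            terms.append(stemmer.stem(word.lower()))
    return terms

print(preprocess("int maxRetryCount = parse_config(configFile);"))
# e.g. ['int', 'max', 'retri', 'count', 'pars', 'config', 'config', 'file']
```

Note how stemming already changes the index terms ("retry" becomes "retri", "parse" becomes "pars"), which is exactly the kind of effect on IR results the study examines.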
Global source code transformations, such as Global Loop Transformations (GLT), are usually performed on a Geometrical Model (GM), which is very effective for dealing with complex transformations. However, this model imposes strict limitations on the input code and cannot handle data-dependent conditions. The technique presented in this paper can deal with data-dependent conditions at any loop level. At the outermost loop level, hot code paths are grouped into a limited number of clusters, called scenarios, to maximise the GLT benefit for a given code-size growth. At the middle and innermost loop levels, we manipulate the abstract syntax tree to move the data-dependent conditions out of the GLT optimisation scope. Results show up to a 45.8% improvement compared to the state of the art.
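To make the idea of moving a data-dependent condition out of the optimisation scope concrete, here is a minimal, hypothetical before/after sketch of the general pattern (written in Python for readability, not the paper's actual AST transformation): the branch is evaluated once to partition the iteration space, leaving branch-free loops that a geometric model can analyse. All names and the partitioning strategy are illustrative assumptions.

```python
# Hypothetical sketch: hoisting a data-dependent condition out of a loop body.

def before(a, b, mode):
    # The branch on mode[i] is data dependent: a geometric model cannot
    # reason about which iterations take which path.
    for i in range(len(a)):
        if mode[i]:
            a[i] += b[i]
        else:
            a[i] -= b[i]
    return a

def after(a, b, mode):
    # The condition is evaluated once, up front, partitioning the
    # iteration space; each resulting loop body is branch-free and
    # thus inside the scope a GLT-style optimiser can handle.
    add_iters = [i for i in range(len(a)) if mode[i]]
    sub_iters = [i for i in range(len(a)) if not mode[i]]
    for i in add_iters:
        a[i] += b[i]
    for i in sub_iters:
        a[i] -= b[i]
    return a

assert before([1, 2], [10, 20], [True, False]) == \
       after([1, 2], [10, 20], [True, False])  # both yield [11, -18]
```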