algorithm recognition, which is the problem of verifying whether a program implements a given algorithm, is an important topic in program analysis. We propose an approach for algorithm recognition in binary code. For ...
详细信息
ISBN:
(纸本)9781450341486
algorithm recognition, which is the problem of verifying whether a program implements a given algorithm, is an important topic in program analysis. We propose an approach for algorithm recognition in binary code. For this paper, we have chosen the Dalvik Virtual Machine (DVM) bytecode. Given an algorithm A that is compiled into a DVM method M-0, and a DVM program P that includes a series of methods {M-1,...M-n}, the approach is able to identify those blocks M from P that essentially implement the algorithm A. The technique we propose first translates binary code into Horn clauses. Then we consider programs as implementing the same algorithm if their Horn clause representations can be reduced to a single common set of Horn clauses by means of a sequence of transformations.
The automated recognition of algorithm implementations can support many software maintenance and reengineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing ineff...
详细信息
ISBN:
(纸本)9798350366464;9798350366471
The automated recognition of algorithm implementations can support many software maintenance and reengineering activities by providing knowledge about the concerns present in the code base. Moreover, recognizing inefficient algorithms like Bubble Sort and suggesting superior alternatives from a library can help in assessing and improving the quality of a system. Approaches from related work suffer from usability as well as scalability issues and their accuracy is not evaluated. In this paper, we investigate how well our approach based on the abstract syntax tree of a program performs for automatic algorithm recognition. To this end, we have implemented a prototype consisting of: A domain-specific language designed to capture the key features of an algorithm and used to express a search pattern on the abstract syntax tree, a matching algorithm to find these features, and an initial catalog of "ready to use" patterns. To create our search patterns we performed a web search using the algorithm name and described key features of the found reference implementations with our domain-specific language. We evaluate our prototype on a subset of the BigCloneEval benchmark containing algorithms like Fibonacci, Bubble Sort, and Binary Search. We achieve an average F-1-score of 0.74 outperforming the large language model Codellama which attains 0.35. Additionally, we use multiple code clone detection tools as a baseline for comparison, achieving a recall of 0.62 while the best-performing tool reaches 0.20.
This paper deals with the problem of deciding whether a System of Affine Recurrent Equations (SARE) is an instantiation of a SARE template. A solution to this problem would be a step toward algorithm template recognit...
详细信息
This paper deals with the problem of deciding whether a System of Affine Recurrent Equations (SARE) is an instantiation of a SARE template. A solution to this problem would be a step toward algorithm template recognition and open new perspectives in program analysis, optimization and parallelization. The problem is known to be undecidable and we show that there exists a semi-decision procedure, in which the key ingredient is the computation of transitive closures of affine relations. This is a non-effective process which has been extensively studied. We then describe the limitations of our algorithm and point to unsolved problems.
Program comprehension (PC) is a research field that has been extensively studied from different points of view, including human program understanding and mental models, automated program understanding, etc. In this pa...
详细信息
Program comprehension (PC) is a research field that has been extensively studied from different points of view, including human program understanding and mental models, automated program understanding, etc. In this paper, we discuss algorithm recognition (AR) as a subfield of PC and explain their relationship. We present a method for automatic AR from Java source code. The method is based on static analysis of program code including various statistics of language constructs, software metrics, as well as analysis of roles of variables in the target program. In the first phase of the method, a number of different implementations of the supported algorithms are analyzed and stored in the knowledge base of the system as learning data, and in the second phase, previously unseen algorithms are recognized using this information. We have developed a prototype and successfully applied the method for recognition of sorting algorithms. This process is explained in the paper along with the experiment we have conducted to evaluate the performance of the method. Although the method, at its current state, is still sensitive to changes made to target algorithms, the encouraging results of the experiment demonstrate that it can be further developed to be used as a PC method in various applications, as an example, in automatic assessment tools to check the algorithms used by students, the functionality that is currently missing from these tools.
We discuss algorithm recognition (AR) and present a method for recognizing algorithms automatically from Java source code. The method consists of two phases. In the first phase, the recognizable algorithms are convert...
详细信息
We discuss algorithm recognition (AR) and present a method for recognizing algorithms automatically from Java source code. The method consists of two phases. In the first phase, the recognizable algorithms are converted into the vectors of characteristics, which are computed based on static analysis of program code, including various statistics of language constructs and analysis of Roles of Variables in the target program. In the second phase, the algorithms are classified based on these vectors using the C4.5 decision tree classifier. We demonstrate the performance of the method by applying it to sorting algorithms. Using leave-one-out cross-validation technique, we have conducted an experimental evaluation of the classification performance showing that the average classification accuracy is 98.1% (the data set consisted of five different types of sorting algorithms). The results show the applicability and usefulness of roles of variables in AR, and illustrate that the C4.5 algorithm is a suitable decision tree classifier for our purpose. The limitations of the method are also discussed.
In this study, we examined freshmen students' sorting algorithm implementations in data structures and algorithms' course in two phases: at the beginning of the course before the students received any instruct...
详细信息
In this study, we examined freshmen students' sorting algorithm implementations in data structures and algorithms' course in two phases: at the beginning of the course before the students received any instruction on sorting algorithms, and after taking a lecture on sorting algorithms. The analysis revealed that many students have insufficient understanding of implementing sorting algorithms. For example, they include unnecessary swaps in their Insertion or Selection sort implementations resulting in more complicated and inefficient code. Based on the data, we present a categorization of these types of variations and discuss the implications of the results. In addition, we introduce an instrument to recognize these algorithms automatically. This is done in terms of white-box testing. Our aim is to develop an automatic assessment system to help teachers in the burden of marking students' assignments and give feedback to the students on their algorithmic solutions. We outline how the presented results can be used to develop the instrument further.
暂无评论