In this work, we analyze the state of the art in source code analysis, with a focus on plagiarism detection, and propose directions for future work in this area. Plagiarism detection combines clone detection with methods for determining similarity. Current approaches can be divided into three levels. The first is text based and takes plain text as input. The second is token based. The top level is model based and uses models to represent source code. The more advanced algorithms (token based and model based) cannot work with large datasets. We believe the future belongs to algorithms that can handle large amounts of source code. Such algorithms should use one of the model-based representations. They could be used to build large-scale anti-plagiarism systems, and also in the area of source code optimization. (C) 2017 The Authors. Published by Elsevier Ltd.
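As a rough illustration of the first two levels described in this abstract, the sketch below compares two Python snippets first as plain text and then as token streams in which identifiers and literals are replaced by placeholders. The function names (`text_similarity`, `normalized_tokens`, `token_similarity`), the sample snippets, and the choice of Jaccard similarity over token trigrams are all assumptions made for illustration; they are not taken from the surveyed tools.

```python
import difflib
import io
import keyword
import tokenize


def text_similarity(a: str, b: str) -> float:
    """Text-based level: similarity ratio computed on the raw characters."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def normalized_tokens(source: str) -> list[str]:
    """Token-based level: replace identifiers and literals with placeholders."""
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")        # any identifier, whatever its name
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # keywords and operators are kept as-is
    return out


def token_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity over the sets of token n-grams of both sources."""
    def grams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    ga, gb = grams(normalized_tokens(a)), grams(normalized_tokens(b))
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


# Hypothetical example: the second snippet only renames identifiers.
original = "def total(values):\n    s = 0\n    for v in values:\n        s += v\n    return s\n"
renamed = "def add_all(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"

print(text_similarity(original, renamed))   # below 1.0: the raw text differs
print(token_similarity(original, renamed))  # 1.0: identical after normalization
```

Renaming identifiers lowers the text-based score but leaves the token-based score unchanged, which is one reason token-based comparison is considered the more robust of the two levels. A model-based comparison (for example over syntax trees or dependence graphs) is not shown here.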
ISBN: 9780769547367 (print)
There has been an ongoing trend towards open and shared source code that is published on the Internet in large software repositories to support collaborative development processes. While traditional source code analysis techniques perform well in single-project contexts, new types of global source code analysis techniques are slowly being introduced to address the analysis of globally distributed and often incomplete source code. In this article, we discuss how the Semantic Web, an enabling technology for these emerging source code analysis domains, can support a standardized, formal, and semantically rich representation to model these corpora. We also illustrate how inference services can be used to support emerging source code analysis approaches on this data, such as search, call graph construction, and clone detection.
ISBN: 9780769541785 (print)
The optimal number of latent topics required to model the most accurate latent substructure for a source code corpus is an open question in source code analysis. Most estimates of the number of latent topics that exist in a software corpus are based on the assumption that the data is similar to natural language, but there is little empirical evidence to support this. To help determine the appropriate number of topics needed to accurately represent the source code, we generate a series of Latent Dirichlet Allocation models with varying topic counts. We use a heuristic to evaluate the ability of each model to identify related source code blocks, and demonstrate the consequences of choosing too few or too many latent topics.
ISBN: 9780769543475 (print)
Source Code Analysis and Manipulation (SCAM) underpins virtually every operational software system. Despite the impact and ubiquity of SCAM principles and techniques in software engineering, there are still frontiers to be explored. Looking "inward" to existing techniques, one finds frontiers of performance, efficiency, accuracy, and usability; looking "outward", one finds new languages, new problems, and thus new approaches. This paper presents a reflective framework for characterizing source languages and domains. It draws on current research projects in music program analysis, musical score processing, and machine knitting to identify new frontiers for SCAM. The paper also identifies opportunities for SCAM to inspire, and be inspired by, problems and techniques in other domains.
ISBN: 9780738112657 (print)
Industrial Control Systems (ICS) and their sub-processes, hardware, and software make possible the management and operation of critical industrial infrastructure and services such as energy, water, defense, and transportation. The biggest vendors have started developing new systems for the ICS marketplace with more power, control, and stability, but these complex systems are susceptible to threats such as insider attacks, third parties, technical or physical failures, and external attacks. It is therefore critical to protect ICS assets. With attention to the ISA/IEC 62443 standard, this paper proposes methods for source code analysis using open source tools that ICS professionals can use in the development or testing phase to detect new vulnerabilities and bugs (e.g., weak encryption, code disclosure, clear-text passwords), together with a vulnerability remediation management tool that gives a complete view of new and existing security breaches. The purpose of this paper is to provide valuable information to ICS developers so that they can increase the security level in the production area, with very little effort, for Internet-exposed Programmable Logic Controllers (PLC).
Author:
Harman, Mark
UCL, Dept Comp Sci, CREST Ctr, Malet Pl, London WC1E 6BT, England
ISBN: 9780769541785 (print)
This paper makes a case for Source Code Analysis and Manipulation. The paper argues that it will not only remain important, but that its importance will continue to grow. This argument is partly based on the 'law' of tendency to executability, which the paper introduces. The paper also makes a case for source code analysis purely for the sake of analysis. Analysis for its own sake may not be merely indulgent introspection. The paper argues that it may ultimately prove to be hugely important as source code gradually gathers together all aspects of human socioeconomic and governmental processes and systems.
In recent years, developers have become interested in competitive programming, which has produced a burst of online platforms that run competitions in which a set of programming problems must be solved. This format has also proven useful for recruiting and training purposes. As developers participate in contests, they gain points and are placed in rankings according to their performance in each competition. Today, these ranks are analogous to reputation systems, since they indicate programmers' expertise. This thesis explores the relationship between this reputation and source code features, extracted via source code analysis, of the solutions submitted by participants. The aim of this research is to find a set of features, grouped into lexical, syntactic, quality, and readability categories, that are shared by developers at the same level of knowledge. These features allow us to build a model that predicts programmers' expertise, assigning each developer to one of two classes: beginner or advanced. To that end, an experiment was designed to create a classifier model for this task. Overall, the results look promising: the model achieves an accuracy of approximately 80%, with the lexical features being the most relevant.
There has been an ongoing trend toward collaborative software development using open and shared source code published in large software repositories on the Internet. While traditional source code analysis techniques perform well in single-project contexts, new types of source code analysis techniques are emerging that focus on global source code analysis challenges. In this article, we discuss how the Semantic Web can become an enabling technology that provides standardized, formal, and semantically rich representations for modeling and analyzing large global source code corpora. Furthermore, inference services and other services provided by Semantic Web technologies can be used to support a variety of core source code analysis techniques, such as semantic code search, call graph construction, and clone detection. In this paper, we introduce SeCold, the first publicly available online linked data source code dataset for software engineering researchers and practitioners. Along with its dataset, SeCold also provides some Semantic Web enabled core services to support the analysis of Internet-scale source code repositories. We illustrate through several examples how this linked data, combined with Semantic Web technologies, can be harvested for different source code analysis tasks to support software trustworthiness. For the case studies, we combine both our linked data set and our Semantic Web enabled source code analysis services with knowledge extracted from StackOverflow, a crowdsourcing website. In these case studies, we demonstrate that our approach is not only capable of crawling, processing, and scaling to traditional types of structured data (e.g., source code), but also supports emerging non-structured data sources, such as crowdsourced information (e.g., ***), to support a global source code analysis context. (C) 2013 Elsevier Inc. All rights reserved.