Search Results - Inner Mongolia University Library

ACM TRANSACTIONS ON COMPUTING EDUCATION 2019, Vol. 19, No. 3, p. 27

Authors: Novak, Matija; Joy, Mike; Kermek, Dragutin. Affiliations: Univ Zagreb Fac Org & Informat, Pavlinska 2, Varazhdin 42000, Croatia; Univ Warwick Dept Comp Sci, Coventry CV4 7AL, W Midlands, England

Teachers deal with plagiarism on a regular basis, so they try to prevent and detect it, a task complicated by the large size of some classes. Students who cheat often try to hide their plagiarism (obfuscation), and many different similarity detection engines (often called plagiarism detection tools) have been built to help teachers. This article focuses only on plagiarism detection and presents a detailed systematic review of the field of source-code plagiarism detection in academia. The review gives an overview of definitions of plagiarism, plagiarism detection tools, comparison metrics, obfuscation methods, datasets used for comparison, and algorithm types. Perspectives on the meaning of source-code plagiarism detection in academia are presented, together with categorisations of the available detection tools and analyses of their effectiveness. The review also yields insights into the metrics and datasets used for quantitative tool comparison and into the categorisation of detection algorithms. Finally, existing classifications of obfuscation methods are expanded, and a new definition of "source-code plagiarism detection in academia" is proposed.

Keywords: source-code plagiarism similarity detection academia education programming systematic review

Overview of the PAN@FIRE 2020 Task on the Authorship Identification of Source Code

12th Annual Meeting of the Forum for Information Retrieval Evaluation (FIRE)

Authors: Fadel, Ali; Musleh, Husam; Tuffaha, Ibraheem; Al-Ayyoub, Mahmoud; Jararweh, Yaser; Benkhelifa, Elhadj. Affiliations: Jordan Univ Sci & Technol, Irbid, Jordan; Duquesne Univ, Pittsburgh, PA 15219, USA; Staffordshire Univ, Stoke On Trent, Staffs, England

ISBN: (print) 9781450389785

Authorship identification is essential for detecting the deceptive misuse of others' content and for exposing the authors of anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for programming languages. Accordingly, a PAN@FIRE task named Authorship Identification of Source Code (AI-SOCO) is proposed, focusing on identifying the authors of source code. The dataset consists of source code crawled from the Codeforces online judge platform, submitted by the top 1,000 human users who each have 100 or more correct C++ submissions. Participating systems are asked to predict the author of a given piece of source code from a predefined list of authors. In total, 60 teams registered on the task's CodaLab page; of these, 14 teams submitted 94 runs. The results are surprisingly high, with many teams and baselines breaking the 90% accuracy barrier. These systems used a wide range of models and techniques, from pretrained word embeddings (especially those tweaked to handle source code) to stylometric features.

Keywords: authorship-identification source-code datasets
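The AI-SOCO setup is a closed-set classification problem: given a snippet, pick one author from a fixed list. As a rough illustration only, the sketch below assumes a simple stylometric-style baseline (per-author character 3-gram frequency profiles, matched by cosine similarity). It is not the official baseline or any participating team's system, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (n=3 is an arbitrary choice)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def train_profiles(samples):
    """Build one summed n-gram profile per author from (author, code) pairs."""
    profiles = {}
    for author, code in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(code))
    return profiles


def predict_author(profiles, code):
    """Return the known author whose profile is most similar to `code`."""
    query = char_ngrams(code)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


profiles = train_profiles([
    ("alice", "for (int i = 0; i < n; ++i) sum += a[i];"),
    ("bob", 'while(x>0){x--;}printf("%d",x);'),
])
print(predict_author(profiles, "for (int j = 0; j < m; ++j) total += b[j];"))  # alice
```

Character n-grams pick up formatting habits (spacing, brace placement, loop idioms) that tend to persist across one author's submissions, which is why such profiles are a common starting point before moving to learned embeddings.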

Human Languages in Source Code: Auto-Translation for Localized Instruction

7th Conference on Learning at Scale

Authors: Piech, Chris; Abu-El-Haija, Sami. Affiliations: Stanford Univ, Stanford, CA 94305, USA; USC Informat Sci Inst, Marina Del Rey, CA, USA

ISBN: (print) 9781450379519

Computer science education has promised open access around the world, but access is largely determined by the human language you speak. As younger students learn computer science, it is less appropriate to assume that they should learn English beforehand. To that end, we present codeInternational, the first tool to translate code between human languages. To develop a theory of non-English code and inform our translation decisions, we conduct a study of public code repositories on GitHub. To the best of our knowledge, the study is the first on human language in code, and it covers 2.9 million Java repositories. To demonstrate codeInternational's educational utility, we build an interactive version of the popular English-language Karel reader and translate it into 100 spoken languages. Our translations have already been used in classrooms around the world and represent a first step on an important open problem in CS education.

Keywords: human-language translation source-code github

Overview of the PAN@FIRE 2020 Task on the Authorship Identification of Source Code

Proceedings of the 12th Annual Meeting of the Forum for Information Retrieval Evaluation

Authors: Ali Fadel, Husam Musleh, Ibraheem Tuffaha, Mahmoud Al-Ayyoub, Yaser Jararweh, Elhadj Benkhelifa, Paolo Rosso. Affiliations: Jordan University of Science and Technology, Jordan; Duquesne University, USA; Staffordshire University, UK; Universitat Politècnica de València, Spain

ISBN: (print) 9781450389785

Authorship identification is essential for detecting the deceptive misuse of others’ content and for exposing the authors of anonymous malicious content. While it is widely studied for natural languages, it is rarely considered for programming languages. Accordingly, a PAN@FIRE task named Authorship Identification of Source Code (AI-SOCO) is proposed, focusing on identifying the authors of source code. The dataset consists of source code crawled from the Codeforces online judge platform, submitted by the top 1,000 human users who each have 100 or more correct C++ submissions. Participating systems are asked to predict the author of a given piece of source code from a predefined list of authors. In total, 60 teams registered on the task’s CodaLab page; of these, 14 teams submitted 94 runs. The results are surprisingly high, with many teams and baselines breaking the 90% accuracy barrier. These systems used a wide range of models and techniques, from pretrained word embeddings (especially those tweaked to handle source code) to stylometric features.

Keywords: source-code authorship-identification datasets

Calibration of source-code similarity detection tools for objective comparisons

41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

Authors: Novak, M.; Kermek, D.; Joy, M. Affiliations: Univ Zagreb Fac Org & Informat, Varazhdin, Croatia; Univ Warwick Dept Comp Sci, Coventry, W Midlands, England

ISBN: (print) 9789532330953

Today there are many source-code similarity detection tools. They are used for many purposes, one of which is plagiarism detection, the context in which this paper is written. Every time a new tool is developed, its authors want to show that it is better than existing ones, so they perform comparisons. These comparisons are often unfair to the existing tools, for several possible reasons, such as the lack of calibration of those tools. Almost all tools have configuration parameters, but these are often not calibrated before the comparison. The paper presents a way of calibrating the tools to make the comparison more objective.

Keywords: source-code plagiarism similarity detection calibration
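The abstract does not spell out the calibration procedure. As a hedged illustration of the general idea only, the sketch below assumes that a tool outputs one similarity score per pair of submissions and that a labelled set of pairs is available; it then picks the decision threshold that maximises F1 for that tool before any cross-tool comparison. The function names are hypothetical, and the paper may calibrate other parameters or optimise a different objective.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when pairs scoring >= threshold are flagged as plagiarism."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(scores, labels):
    """Try every observed score as a cut-off and keep the one with the best F1."""
    if not scores:
        raise ValueError("need at least one labelled pair")
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Scores from one (hypothetical) tool on four labelled pairs.
print(calibrate_threshold([0.9, 0.8, 0.3, 0.2], [True, True, False, False]))  # (0.8, 1.0)
```

Running the same search separately for every tool means each one is compared at its own best setting rather than at its defaults, which is the unfairness the abstract describes.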

Perceptual Comparison of Source-Code Plagiarism within Students from UK, China, and South Cyprus Higher Education Institutions

ACM TRANSACTIONS ON COMPUTING EDUCATION 2017, Vol. 17, No. 2, p. 8

Authors: Cosma, Georgina; Joy, Mike; Sinclair, Jane; Andreou, Margarita; Zhang, Dongyong; Cook, Beverley; Boyatt, Russell. Affiliations: Nottingham Trent Univ Sch Sci & Technol, Nottingham, England; Univ Warwick Dept Comp Sci, Warwick, England; PA Coll Dept Business Comp, Larnax, Cyprus; Henan Agr Univ Coll Informat & Management Sci, Zhengzhou, Peoples R China

Perspectives of students on what constitutes source-code plagiarism may differ based on their educational background. Surveys have been conducted with home students undertaking computing and joint computing subject degrees at higher education institutions throughout the UK, China, and South Cyprus, and a total of 984 responses have been statistically analysed to determine the common areas of understanding and misunderstanding among students on various topics related to source-code plagiarism. The study identifies those topics which are well understood, and those topics which are not properly understood across the different groups of students, and is the first study which specifically discusses Cypriot student perceptions on source-code plagiarism. This study provides useful information to educators (teaching home and international students) who wish to better inform their students on the issues of plagiarism and source-code plagiarism. Finally, the survey results revealed that although students who were informed about plagiarism better understood what actions constitute plagiarism, some topics were still unclear among students regardless of the students' educational background and whether they had been previously informed about plagiarism.

Keywords: Software source-code plagiarism academic integrity China Cyprus UK

Energy-Aware GPU Programming at Source-Code Levels

Tsinghua Science and Technology 2012, Vol. 17, No. 3, pp. 278-286

Authors: Changyou Zhang, Kun Huang, Xiang Cui, Yifeng Chen. Affiliation: Key Laboratory of High Confidence Software Technologies (Peking University), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

To enhance the energy efficiency and performance of algorithms running on Graphics Processing Unit (GPU) accelerators during source-code development, we consider power efficiency based on data transfer bandwidth and power consumption in key situations. First, a set of primitives is abstracted from program statements. Then, data transfer bandwidth and power consumption at different granularity sizes are considered and mapped onto the appropriate primitives. With these mappings, a programmer can intuitively determine the power efficiency and performance of a thread in its different running states. Finally, this intuition enables the programmer to tune the algorithm to achieve the best energy efficiency and performance. Using these power-aware principles, two Fast Fourier Transform (FFT) methods are compared. The mapping between power consumption and primitives is helpful for tuning algorithms at the source-code level.

Keywords: GPU power-aware source-code primitive

Batch Source-Code Plagiarism Detection Using an Algorithm for the Bounded Longest Common Subsequence Problem

9th International Conference on Electrical Engineering, Computing Science and Automatic Control

Authors: Campos, R. A. Castro; Martinez, F. J. Zaragoza. Affiliation: UAM Azcapotzalco Dept Sistemas, Mexico City, DF, Mexico

ISBN: (print) 9781467321686; 9781467321709

Source-code plagiarism detection is an unfortunate but necessary activity when reviewing assignments in programming courses. Although reasonably easy to fool, string-based comparisons offer a high degree of accuracy with almost no false positives, and a good similarity metric for two strings is usually the length of their longest common subsequence. Unfortunately, the dynamic programming algorithm for this calculation takes quadratic time, even if the two strings are equal. In this paper we present an algorithm that, given a batch of source-code files, efficiently finds all pairs of similar files by preprocessing the files and then using a fast branch-and-bound algorithm to find only those pairs whose longest common subsequence is indicative of plagiarism.

Keywords: Plagiarism detection longest common subsequence branch and bound source-code
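The quadratic dynamic program mentioned in the abstract can be sketched as follows. This is the textbook LCS recurrence, not the paper's branch-and-bound algorithm. The normalisation by the longer input, the 0.8 threshold, and the simple length-based pruning in `similar_pairs` are assumptions chosen for illustration; all function names are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence via the classic O(len(a) * len(b)) DP.

    `a` and `b` may be strings or lists of tokens; only two DP rows are kept.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length normalised by the longer input (assumed normalisation)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def similar_pairs(files, threshold=0.8):
    """Return index pairs whose LCS similarity reaches `threshold`.

    Since LCS(a, b) <= min(len(a), len(b)), a pair whose length ratio is already
    below the threshold is skipped without running the quadratic DP. This is a
    simple bound, NOT the paper's branch-and-bound search.
    """
    pairs = []
    for i, j in combinations(range(len(files)), 2):
        a, b = files[i], files[j]
        longest = max(len(a), len(b))
        if longest and min(len(a), len(b)) / longest < threshold:
            continue
        if lcs_similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


print(similar_pairs(["abcdef", "abcdeg", "zzzzzz"], threshold=0.8))  # [(0, 1)]
```

In practice the inputs would more likely be token sequences produced after preprocessing (for example, with identifiers normalised), which the DP handles unchanged because it only compares elements for equality.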

Plagiarism Detection in Programming Assignments Using Deep Features

4th IAPR Asian Conference on Pattern Recognition (ACPR)

Authors: Yasaswi, Jitendra; Purini, Suresh; Jawahar, C. V. Affiliation: IIIT Hyderabad, Hyderabad, India

ISBN: (print) 9781538633540

This paper proposes a method for detecting plagiarism in source code using deep features. The embeddings for programs are obtained using a character-level Recurrent Neural Network (char-RNN) pre-trained on Linux kernel source code. Many popular plagiarism detection tools are based on n-gram techniques at the syntactic level. However, these approaches fail to capture long-term dependencies (non-contiguous interactions) present in the source code. In contrast, the proposed deep features capture non-contiguous interactions within n-grams. They are generic in nature, and there is no need to fine-tune the char-RNN model again on the program submissions for each individual problem set. Our experiments show the effectiveness of deep features in classifying assignment submissions as copy, partial-copy, and non-copy. Compared with handcrafted features (source-code metrics and textual features), the proposed features improve the F1-score by 9.5% for binary classification and by 5% for three-way classification.

Keywords: deep features recurrent neural networks plagiarism detection source-code
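For context on the syntactic n-gram baseline that this abstract contrasts with, here is a minimal sketch. It assumes a crude regex tokenizer, a small hypothetical C keyword list, normalisation of identifiers and numbers to placeholders, and Jaccard overlap of token 4-grams. It is not any specific tool's algorithm and not the paper's char-RNN method; it only shows why renaming variables does not fool such tools and why they only see contiguous token runs.

```python
import re

# Hypothetical, deliberately incomplete keyword list for C-like code.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void", "double", "char"}


def tokens(source):
    """Crude lexer: identifiers, integer literals, or any single non-space character."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def normalise(toks):
    """Replace identifiers with ID and numbers with NUM so renaming has no effect."""
    out = []
    for t in toks:
        if re.fullmatch(r"[A-Za-z_]\w*", t) and t not in KEYWORDS:
            out.append("ID")
        elif t.isdigit():
            out.append("NUM")
        else:
            out.append(t)
    return out


def ngrams(seq, n):
    """Set of contiguous n-grams; interactions farther apart than n are invisible."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard_ngram_similarity(src_a, src_b, n=4):
    ga = ngrams(normalise(tokens(src_a)), n)
    gb = ngrams(normalise(tokens(src_b)), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


a = "int sum = 0; for (int i = 0; i < n; i++) { sum += x[i]; }"
b = "int total = 0; for (int k = 0; k < len; k++) { total += arr[k]; }"
print(jaccard_ngram_similarity(a, b))  # 1.0: only names differ
```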

Towards a Source-Code Oriented Attestation

China Communications 2009, Vol. 6, No. 4, pp. 82-87

Authors: Ruan Anbang, Shen Qingni, Wang Li, Qin Chao, Gu Liang, Chen Zhong. Affiliations: School of Software and Microelectronics, Peking University, Beijing 102600, China; Network and Information Security Laboratory, Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Key Laboratory of High Confidence Software Technologies of the Ministry of Education, Peking University, Beijing 100871, China; First Research Institute of Ministry of Public Security of China, Beijing 100048, China

The Binary-based Attestation (BA) mechanism presented by the Trusted Computing Group (TCG) can equip an application with the capability of genuinely identifying the configuration of a remote system. However, BA only supports attestation of specific patterns of binary code defined by a trusted party, usually the software vendor, for a particular version of a piece of software. In this paper, we present a Source-Code Oriented Attestation (SCOA) framework that enables custom-built applications to be attested within the TCG attestation architecture. In SCOA, security attributes are bound to the source code of an application instead of its binary code. With a proof chain, generated by a Trusted Building System (TBS) to record the building procedure, challengers can determine whether the binary they interact with was genuinely built from a particular set of source code. Moreover, with the security attribute certificates assigned to the source code, they can determine the trustworthiness of the binary. We also present a TBS implementation based on virtualization.

Keywords: remote attestation trusted building system virtualization source-code
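The abstract mentions a proof chain that records the building procedure. One common way to build such a chain is to fold each measured step into a running SHA-256 value, in the style of the TPM PCR "extend" operation (new = H(old || H(step))). This construction is assumed here rather than taken from the paper. A challenger who knows the expected steps recomputes the final value and compares. The step strings and function names below are hypothetical.

```python
import hashlib


def extend(chain_value: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new = SHA-256(old || SHA-256(measurement))."""
    return hashlib.sha256(chain_value + hashlib.sha256(measurement).digest()).digest()


def build_proof_chain(steps):
    """Fold every recorded build step into one 32-byte value.

    The all-zero start value mirrors a freshly reset PCR (an assumption for this sketch).
    """
    value = bytes(32)
    for step in steps:
        value = extend(value, step)
    return value


def verify(expected_steps, reported_value: bytes) -> bool:
    """Challenger side: recompute the chain from the expected steps and compare."""
    return build_proof_chain(expected_steps) == reported_value


# Hypothetical build log entries for illustration.
steps = [b"source main.c", b"compile main.c -> main.o", b"link main.o -> app"]
reported = build_proof_chain(steps)
print(verify(steps, reported))
```

Because each value depends on all earlier ones, reordering, omitting, or altering any step changes the final digest, which is what lets a short value stand in for the whole build record.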
