Previous studies have shown that there is a non-trivial amount of duplication in source code. This paper analyzes a corpus of 4.5 million non-fork projects hosted on GitHub representing over 428 million files written in Java, C++, Python, and JavaScript. We found that this corpus has a mere 85 million unique files. In other words, 70% of the code on GitHub consists of clones of previously created files. There is considerable variation between language ecosystems. JavaScript has the highest rate of file duplication: only 6% of its files are distinct. Java, on the other hand, has the least duplication: 60% of its files are distinct. Lastly, a project-level analysis shows that between 9% and 31% of the projects consist of at least 80% files that can be found elsewhere. These rates of duplication have implications for systems built on open source software as well as for researchers interested in analyzing large code bases. As a concrete artifact of this study, we have created DejaVu, a publicly available map of code duplicates in GitHub repositories.
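The headline numbers above come from file-level clone detection, which at its simplest reduces to hashing file contents and counting how many files share a hash. The sketch below is a minimal illustration of that idea, not the authors' DejaVu pipeline; the directory layout, extension filter, and hash choice are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def duplication_stats(root: str, extensions=(".java", ".py", ".js", ".cpp")):
    """Group files under `root` by content hash and report how many are exact clones."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in extensions:
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    total = sum(len(paths) for paths in by_hash.values())
    unique = len(by_hash)
    duplication_rate = 1 - unique / total if total else 0.0
    return total, unique, duplication_rate

# Example: total, unique, rate = duplication_stats("corpus/")
```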
Building parsers is an essential task in the development of many tools, from software maintenance tools to any kind of business-specific, programmable environment with a command-line interface. Whilst grammars for many programming languages are available, they are very often of limited use because of the wide spread of dialects and variants not covered by the standard grammars. Writing a grammar by hand is clearly feasible; however, it can be a tedious and error-prone task, requiring skills that are not always available. Grammar inference is a possible, though challenging, approach for obtaining suitable grammars from program examples. However, inference from scratch poses serious scalability issues and tends to produce correct but meaningless grammars that are hard to understand and to use for building tools. This paper describes an approach, based on genetic algorithms, for evolving existing grammars towards target (dialect) grammars, inferring the changes from examples written in the dialect. Results obtained from experiments on the inference of C dialect rules show that the algorithm is able to evolve the grammar successfully. Inspections indicated that the changes automatically made to the grammar during its evolution preserved its meaningfulness and were comparable to what a developer could have done by hand.
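A genetic search over grammar variants can be sketched as the usual evolve-and-evaluate loop: mutate candidate rule sets, score each candidate by how many dialect examples it parses, and keep the fittest. The sketch below only illustrates that loop under stated assumptions; `mutate_grammar` and `parses` are hypothetical helpers standing in for the paper's mutation operators and parser generation, not its actual algorithm.

```python
import random

def evolve_grammar(base_grammar, examples, mutate_grammar, parses,
                   population_size=30, generations=100):
    """Evolve `base_grammar` toward a dialect grammar that parses all `examples`.

    `mutate_grammar(g)` returns a modified copy of a grammar; `parses(g, src)`
    reports whether grammar `g` accepts source text `src` (both are assumed helpers).
    """
    def fitness(grammar):
        return sum(parses(grammar, src) for src in examples)

    population = [base_grammar] + [mutate_grammar(base_grammar)
                                   for _ in range(population_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == len(examples):
            break  # every dialect example is now accepted
        survivors = population[: population_size // 2]
        population = survivors + [mutate_grammar(random.choice(survivors))
                                  for _ in range(population_size - len(survivors))]
    return max(population, key=fitness)
```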
Due to the growing variety of Android malware, it is important to distinguish between its distinct types. In this paper, we introduce the use of decompiled source code for malicious code classification. This decompiled source code provides deeper analysis opportunities and an understanding of the nature of the malware. Malicious code differs from natural-language text due to the syntax rules of compilers and the effort of attackers to evade potential detection. Hence, we adapt Natural Language Processing-based techniques, under some constraints, for malicious code classification. First, the proposed methodology decompiles the Android Package Kit files; then API calls, keywords, and non-obfuscated tokens are extracted from the source code and categorized into stop-tokens, feature-tokens, and long-tail-tokens. We also introduce the use of generalized N-tokens to represent tokens that are typically less frequent. Our approach was evaluated against baselines using API calls, permissions, and their combination as features, as well as against neural network architectures based on decompiled Android Package Kits. A rigorous evaluation was performed on comprehensive public real-world Android malware datasets, comprising 24,553 apps categorized into 71 families for malicious family classification and 60,000 apps for malicious code detection. Our approach outperformed the baselines in both tasks.
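The pipeline described above is essentially a bag-of-tokens text classifier applied to decompiled code. The sketch below shows one way such a pipeline could look with scikit-learn; the stop-token list, the tokenizing regular expression, and the classifier choice are illustrative assumptions, not the paper's exact feature extraction.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

STOP_TOKENS = {"public", "private", "void", "return", "new"}  # assumed stop-token list

def tokenize(decompiled_source: str):
    """Split decompiled Java-like source into identifier/API tokens, dropping stop-tokens."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_.]*", decompiled_source)
    return [t for t in tokens if t not in STOP_TOKENS]

def train_family_classifier(texts, labels):
    """`texts`: decompiled APK sources as strings; `labels`: their malware families."""
    model = make_pipeline(
        CountVectorizer(analyzer=tokenize, min_df=2),  # very rare tokens fall below min_df
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model
```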
Static analysis is one of the most popular methods of software code analysis. It allows checking code for compliance with the language specification as well as finding potential vulnerabilities. In this work, a static analysis of a corpus of listings of open-source Python applications is performed. Using the Bandit library, statistics for various categories of potential vulnerabilities are collected, and a ranking table of the vulnerabilities detected in the dataset is constructed. A qualitative analysis of the threats is performed according to their severity, based on the CWE data.
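Bandit can be run from the command line with JSON output, which makes this kind of tally straightforward to reproduce. The sketch below runs Bandit over a directory and aggregates findings by test ID and severity; the directory path is an assumption, and the exact JSON fields may vary between Bandit versions.

```python
import json
import subprocess
from collections import Counter

def bandit_severity_counts(source_dir: str):
    """Run Bandit recursively over `source_dir` and tally issues by (test_id, severity)."""
    # Bandit exits with a non-zero status when it finds issues, so don't use check=True.
    proc = subprocess.run(
        ["bandit", "-r", source_dir, "-f", "json", "-q"],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout)
    counts = Counter(
        (issue["test_id"], issue["issue_severity"]) for issue in report["results"]
    )
    return counts.most_common()

# Example:
# for (test, severity), n in bandit_severity_counts("apps_corpus/"):
#     print(test, severity, n)
```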
With the growing awareness of the importance of software maintenance has come a re-evaluation of software maintenance tools. Such tools range from source code analysers to semi-intelligent tools which seek to reconstruct system designs and specification documents from source code. However, it is clear that relying solely upon source code as a basis for reverse engineering has many problems. These problems include poor abstraction, which leads to over-detailed specification models, and the inability to link other parts of a software system, such as documentation and user expertise, to the underlying code. This paper describes the work of the Esprit DOCKET project, which has developed a prototype environment to support the development of a system model linking the user-oriented, business aspects of a system to operational code using a variety of knowledge source inputs: code, documents, and user expertise. The aim is to provide a coherent model that forms the basis for system understanding and supports the software change and evolution process.
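The central idea is a traceability model that ties business-level concepts back to the knowledge sources that evidence them: code, documents, and captured user expertise. The sketch below shows one minimal way such linked sources could be represented; the class and field names are assumptions made for illustration, not the DOCKET model itself.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeSource:
    kind: str        # "code", "document", or "expertise"
    identifier: str  # e.g. a module name, a manual section, or an interview note
    content: str

@dataclass
class SystemModelEntity:
    """A business-level concept linked back to the knowledge sources that evidence it."""
    name: str
    sources: list = field(default_factory=list)

    def link(self, source: KnowledgeSource):
        self.sources.append(source)

# Example:
# order = SystemModelEntity("Order processing")
# order.link(KnowledgeSource("code", "billing/order.c", "..."))
# order.link(KnowledgeSource("document", "User manual, section 3.2", "..."))
```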
Querying source code is an essential aspect of a variety of software engineering tasks such as program understanding, reverse engineering, program structure analysis, and program flow analysis. In this paper, we present and demonstrate the use of an algebraic source code query technique that blends expressive power with query compactness. The query framework, Source Code Algebra (SCA), permits users to express complex source code queries and views as algebraic expressions. Queries are expressed against an extensible, object-oriented database that stores program source code. The SCA algebraic approach offers multiple benefits, such as an applicative query language, high expressive power, seamless handling of structural and flow information, clean formalism, and potential for query optimization. We present a case study where SCA expressions are used to query a program in terms of program organization, resource flow, control flow, metrics, and syntactic structure. Our experience with an SCA-based prototype query processor indicates that an algebraic approach to source code queries combines the benefits of expressive power and compact query formulation.
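Algebraic source-code queries compose operators such as selection and projection over a database of program entities. The sketch below mimics that style over an in-memory model of functions; the entity schema, the operator names, and the example query are illustrative assumptions, not SCA's actual operators.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    name: str
    module: str
    calls: tuple      # names of functions this function calls
    cyclomatic: int   # a simple complexity metric stored with the entity

def select(entities, predicate):
    """Algebraic selection: keep the entities satisfying `predicate`."""
    return [e for e in entities if predicate(e)]

def project(entities, *fields):
    """Algebraic projection: keep only the named attributes of each entity."""
    return [{f: getattr(e, f) for f in fields} for e in entities]

# "Which functions in module 'io' call 'open' and are non-trivial?"
def example_query(functions):
    return project(
        select(functions, lambda f: f.module == "io"
                          and "open" in f.calls
                          and f.cyclomatic > 5),
        "name", "cyclomatic",
    )
```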
Large software systems need to be modified to remain useful. Changes can be performed more easily when their design has been carefully documented. This paper presents an approach to quickly find design patterns that have been implemented in a software system. The devised solution greatly reduces the number of checks performed by organising the search for a design pattern as tree traversals, where candidate classes are carefully positioned into trees. By automatically tagging classes with design pattern roles, we make it easier for developers to reason about large software systems. Our approach can provide documentation that lets developers understand the role each class plays, assess the quality of the code, and obtain assistance in refactoring and enhancing the functionality of the software system.
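Pattern detection of this kind amounts to matching structural role constraints over class relationships. The sketch below tags candidate Observer-pattern roles from a simple class-relationship table; the relationship encoding and the single hard-coded rule are illustrative assumptions, not the paper's tree-based search.

```python
from dataclasses import dataclass, field

@dataclass
class ClassInfo:
    name: str
    methods: set = field(default_factory=set)
    references: set = field(default_factory=set)  # classes this class holds or calls

def tag_observer_roles(classes):
    """Tag classes that structurally look like Subject/Observer roles."""
    roles = {}
    by_name = {c.name: c for c in classes}
    for c in classes:
        # A Subject keeps references to observers and exposes attach/notify methods.
        if {"attach", "notify"} <= c.methods:
            roles[c.name] = "Subject"
            for ref in c.references:
                if ref in by_name and "update" in by_name[ref].methods:
                    roles[ref] = "Observer"
    return roles

# Example:
# subject = ClassInfo("EventBus", {"attach", "notify"}, {"Listener"})
# observer = ClassInfo("Listener", {"update"})
# tag_observer_roles([subject, observer])  # {"EventBus": "Subject", "Listener": "Observer"}
```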
Identifying code duplication in large multi-platform software systems is a challenging problem. This is due to a variety of reasons, including the presence of high-level programming languages and structures interleaved with hardware-dependent low-level resources and assembler code, the use of GUI-based configuration scripts generating commands to compile the system, and the extremely high number of possible configurations. This paper studies the extent and the evolution of code duplication in the Linux kernel. Linux is a large, multi-platform software system; it is open source, and so there are no obstacles to discussing its implementation. In addition, it is decidedly too large to be examined manually: the current Linux kernel release (2.4.18) is about three million lines of code. Nineteen releases, from 2.4.0 to 2.4.18, were processed and analyzed, identifying code duplication among Linux subsystems by means of a metric-based approach. The obtained results support the hypothesis that the Linux system does not contain a significant fraction of duplicated code. Furthermore, code duplication tends to remain stable across releases, suggesting a fairly stable structure that evolves smoothly without any evidence of degradation.
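A metric-based approach to duplicate detection compares functions by a vector of cheap metrics rather than by full text; functions with identical (or very close) vectors become candidate clones. The sketch below computes a crude metric vector for C functions given as strings and groups exact matches; the choice of metrics is an assumption made for illustration, not the fingerprint set used in the study.

```python
import re
from collections import defaultdict

def metric_vector(function_body: str):
    """A crude fingerprint of a C function: size, statements, branch points, call sites."""
    return (
        len(function_body.splitlines()),                               # lines of code
        function_body.count(";"),                                      # rough statement count
        len(re.findall(r"\b(?:if|for|while|case)\b", function_body)),  # branch points
        len(re.findall(r"\w+\s*\(", function_body)),                   # call-like sites
    )

def candidate_clones(functions):
    """Group functions (name -> body) whose metric vectors coincide exactly."""
    groups = defaultdict(list)
    for name, body in functions.items():
        groups[metric_vector(body)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```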
One approach to measuring and managing the complexity of software, as it evolves over time, is to exploit software metrics. Metrics have been used to estimate the complexity of the maintenance effort, to facilitate change impact analysis, and as an indicator for automatically detecting transformations that can improve the quality of a system. However, there has been little effort directed at applying software metrics to the maintenance of grammar-based software applications, such as compilers, editors, program comprehension tools, and embedded systems. In this paper, we adapt the software metrics that are commonly used to measure program complexity and apply them to measuring the complexity of grammar-based software applications. Since the behaviour of a grammar-based application is typically choreographed by the grammar rules, the measure of complexity that our metrics provide can guide maintainers in locating problematic areas in grammar-based applications.
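A simple way to adapt program-complexity metrics to grammars is to treat each production rule like a routine and count its decision points (alternatives, optional and repeated parts), in the spirit of McCabe's cyclomatic complexity. The sketch below scores the rules of a BNF-like grammar held as a dictionary; the grammar encoding and the scoring rule are assumptions made for illustration, not the metrics defined in the paper.

```python
def rule_complexity(alternatives):
    """McCabe-style score for one rule: 1 plus its decision points.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    symbols ending in '?' or '*' are treated as optional/repeated parts.
    """
    decisions = max(len(alternatives) - 1, 0)
    decisions += sum(sym.endswith(("?", "*")) for alt in alternatives for sym in alt)
    return 1 + decisions

def grammar_report(grammar):
    """Rank the nonterminals of a {nonterminal: [rhs, ...]} grammar by complexity."""
    scores = {nt: rule_complexity(alts) for nt, alts in grammar.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example (toy C-like statement grammar):
# grammar = {"stmt": [["if_stmt"], ["while_stmt"], ["expr", ";"]],
#            "if_stmt": [["if", "(", "expr", ")", "stmt", "else_part?"]]}
# grammar_report(grammar)  # [("stmt", 3), ("if_stmt", 2)]
```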