During compilation from java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of java code is not symmetric. Consequently, decompilation, which aims at prod...
详细信息
During compilation from java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of java code is not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different java decompilers use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled has a direct impact on the quality of the source code produced by decompilers. In this paper, we assess the strategies of eight java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion and semantic equivalence modulo inputs. Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. The highest ranking decompiler in this study produces syntactically correct, and semantically equivalent code output for 84%, respectively 78%, of the classes in our dataset. Our results demonstrate that each decompiler correctly handles a different set of bytecode classes. We propose a new decompiler called Arlecchino that leverages the diversity of existing decompilers. To do so, we merge partial decompilation into a new one based on compilation errors. Arlecchino handles 37.6% of bytecode classes that were previously handled by no decompiler. We publish the sources of this new bytecode decompiler. (C) 2020 Published by Elsevier Inc.
Modern software is bloated. Demand for new functionality has led developers to include more and more features, many of which become unneeded or unused as software evolves. This phenomenon, known as software bloat, res...
详细信息
ISBN:
(纸本)9781450370431
Modern software is bloated. Demand for new functionality has led developers to include more and more features, many of which become unneeded or unused as software evolves. This phenomenon, known as software bloat, results in software consuming more resources than it otherwise needs to. How to effectively and automatically debloat software is a long-standing problem in software engineering. Various debloating techniques have been proposed since the late 1990s. However, many of these techniques are built upon pure static analysis and have yet to be extended and evaluated in the context of modern java applications where dynamic language features are prevalent. To this end, we develop an end-to-end bytecode debloating framework called JSHRINK. It augments traditional static reachability analysis with dynamic profiling and type dependency analysis and renovates existing bytecode transformations to account for new language features in modern java. We highlight several nuanced technical challenges that must be handled properly and examine behavior preservation of debloated software via regression testing. We find that (1) JSHRINK is able to debloat our real-world java benchmark suite by up to 47% (14% on average);(2) accounting for dynamic language features is indeed crucial to ensure behavior preservation-reducing 98% of test failures incurred by a purely static equivalent, Jax, and 84% for ProGuard;and (3) compared with purely dynamic approaches, integrating static analysis with dynamic profiling makes the debloated software more robust to unseen test executions-in 22 out of 26 projects, the debloated software ran successfully under new tests.
BISM (bytecode-level Instrumentation for Software Monitoring) is a lightweight java bytecode instrumentation tool which features an expressive high-level control-flow-aware instrumentation language. The language follo...
详细信息
ISBN:
(纸本)9783030605087;9783030605070
BISM (bytecode-level Instrumentation for Software Monitoring) is a lightweight java bytecode instrumentation tool which features an expressive high-level control-flow-aware instrumentation language. The language follows the aspect-oriented programming paradigm by adopting the joinpoint model, advice inlining, and separate instrumentation mechanisms. BISM provides joinpoints ranging from bytecode instruction to method execution, access to comprehensive context information, and instrumentation methods. BISM runs in two modes: build-time and load-time. We demonstrate BISM effectiveness using two experiments: a security scenario and a general runtime verification case. The results show that BISM instrumentation incurs low runtime and memory overheads.
Sequences of duplicate code, either with or without modification, are known as code clones or just clones. Code clones are generally considered undesirable for a number of reasons, although they can offer some conveni...
详细信息
ISBN:
(纸本)9781538603673
Sequences of duplicate code, either with or without modification, are known as code clones or just clones. Code clones are generally considered undesirable for a number of reasons, although they can offer some convenience to developers. The detection of code clones helps to improve the quality of source code through software re-engineering. Numerous methods have been proposed for code clone detection in java code. However, the existing methods are mostly based on the java source code, while only a few focus on its bytecode;in fact, the java bytecode reflects more of the semantic nature of the source code than the source code itself does. In this paper, we propose a novel code clone detection method based on java bytecode. Using the block-level code fragments extracted from bytecode, it can simultaneously detect code clones at both method level and block level. In addition, during the process of code clone detection, the similarities of both method call sequences and instruction sequences are calculated in order to improve accuracy. We conduct two extensive experiments to evaluate the performance of our method. The results show that the proposed method can detect code clones more effectively than the state-of-the-art methods.
This paper presents a method for translation of java byte-code sequences into synthesizable hardware, using the Bluespec SystemVerilog (BSV) environment. At the core of our approach lies a BSV description of a subset ...
详细信息
ISBN:
(纸本)9781450316880
This paper presents a method for translation of java byte-code sequences into synthesizable hardware, using the Bluespec SystemVerilog (BSV) environment. At the core of our approach lies a BSV description of a subset of java bytecodes, that can be used to directly translate bytecode sequences into a BSV specification. The result is intended as an accelerator for existing java processors (JOP, BlueJEP) or even standalone hardware. Preliminary evaluation shows our solution to produce hardware on par with established methods (area/performance), while supporting rare features (e.g. easy to automate, method calls and recursion).
Code coverage measurement is an important element in white-box testing, both in industrial practice and academic research. Other related areas are highly dependent on code coverage as well, including test case generat...
详细信息
ISBN:
(纸本)9781509018550
Code coverage measurement is an important element in white-box testing, both in industrial practice and academic research. Other related areas are highly dependent on code coverage as well, including test case generation, test prioritization, fault localization, and others. Inaccuracies of a code coverage tool sometimes do not matter that much but in certain situations they can lead to serious confusion. For java, the prevalent approach to code coverage measurement is to use bytecode instrumentation due to its various benefits over source code instrumentation. However, if the results are to be mapped back to source code this may lead to inaccuracies due to the differences between the two program representations. In this paper, we systematically investigate the amount of differences in the results of these two java code coverage approaches, enumerate the possible reasons and discuss the implications on various applications. For this purpose, we relied on two widely used tools to represent the two approaches and a set of benchmark programs from the open source domain.
This paper proposes to explore the following question: can software evolution systems like FINCH, that evolve linear representations originating from a higher-level structural language, take advantage of building bloc...
详细信息
ISBN:
(纸本)9781450349390
This paper proposes to explore the following question: can software evolution systems like FINCH, that evolve linear representations originating from a higher-level structural language, take advantage of building blocks inherent to that original language?
In [5, 15] we presented an approach to prove termination of non-recursive java bytecode (JBC) programs automatically. Here, JBC programs are first transformed to finite termination graphs which represent all possible ...
详细信息
ISBN:
(纸本)9783939897309
In [5, 15] we presented an approach to prove termination of non-recursive java bytecode (JBC) programs automatically. Here, JBC programs are first transformed to finite termination graphs which represent all possible runs of the program. Afterwards, the termination graphs are translated to term rewrite systems (TRSs) such that termination of the resulting TRSs implies termination of the original JBC programs. So in this way, existing techniques and tools from term rewriting can be used to prove termination of JBC automatically. In this paper, we improve this approach substantially in two ways: (1) We extend it in order to also analyze recursive JBC programs. To this end, one has to represent call stacks of arbitrary size. (2) To handle JBC programs with several methods, we modularize our approach in order to reuse termination graphs and TRSs for the separate methods and to prove termination of the resulting TRS in a modular way. We implemented our approach in the tool AProVE. Our experiments show that the new contributions increase the power of termination analysis for JBC significantly.
This paper argues that graph grammars naturally model dynamic data structures such as lists, trees and combinations thereof. These grammars can be exploited to obtain finite abstractions of pointer-manipulating progra...
详细信息
This paper argues that graph grammars naturally model dynamic data structures such as lists, trees and combinations thereof. These grammars can be exploited to obtain finite abstractions of pointer-manipulating programs, thus enabling model checking. Experimental results for verifying Lindstrom's variant of the Deutsch-Schorr-Waite tree traversal algorithm illustrate this. (C) 2013 Elsevier B.V. All rights reserved.
It is important to prove that supposedly terminating programs actually terminate, particularly if those programs must be run on critical systems or downloaded into a client such as a mobile phone. Although termination...
详细信息
It is important to prove that supposedly terminating programs actually terminate, particularly if those programs must be run on critical systems or downloaded into a client such as a mobile phone. Although termination of computer programs is generally undecidable, it is possible and useful to prove termination of a large, nontrivial subset of the terminating programs. In this article, we present our termination analyzer for sequential java bytecode, based on a program property called path-length. We describe the analyses which are needed before the path-length can be computed such as sharing, cyclicity, and aliasing. Then we formally define the path-length analysis and prove it correct with respect to a reference denotational semantics of the bytecode. We show that a constraint logic program P-CLP can be built from the result of the path-length analysis of a java bytecode program P and formally prove that if P-CLP terminates, then P also terminates. Hence a termination prover for constraint logic programs can be applied to prove the termination of P. We conclude with some discussion of the possibilities and limitations of our approach. Ours is the first existing termination analyzer for java bytecode dealing with any kind of data structures dynamically allocated on the heap and which does not require any help or annotation on the part of the user.
暂无评论