Variational execution is a novel dynamic analysis technique for exploring highly configurable systems and accurately tracking information flow. It can efficiently analyze many configurations by aggressively sharing redundancies across program executions. Variational execution has been demonstrated to be effective at exploring variations in a program, especially when the configuration space grows out of control. Existing implementations of variational execution often require heavy modification of the runtime interpreter, which is painstaking and error-prone, and the performance of this approach is suboptimal. For example, VarexJ, the state-of-the-art variational execution interpreter for Java, slows down executions by 100 to 800 times over a single execution for small- to medium-sized Java programs. Instead of modifying existing JVMs, we propose to transform existing bytecode to make it variational, so it can be executed on an unmodified commodity JVM. Our evaluation shows a dramatic performance improvement over the state of the art, with speedups of 2 to 46 times, and high efficiency in sharing computations.
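A minimal, illustrative sketch of the sharing idea can make this concrete: a value that differs across configurations is stored once per distinct alternative, and each operation is applied to all alternatives in a single traversal. The class VInt below is our own illustration, not VarexJ's or the described bytecode transformation's actual representation.

```java
// Sketch of variational execution's core sharing idea: a value split over
// one configuration option holds both alternatives and is operated on once
// for all configurations. VInt is illustrative, not a real API.
class VInt {
    final int whenTrue;   // value in configurations where the option is on
    final int whenFalse;  // value in configurations where the option is off

    VInt(int whenTrue, int whenFalse) {
        this.whenTrue = whenTrue;
        this.whenFalse = whenFalse;
    }

    // Apply an operation to both alternatives in a single traversal,
    // instead of re-running the program once per configuration.
    VInt map(java.util.function.IntUnaryOperator op) {
        return new VInt(op.applyAsInt(whenTrue), op.applyAsInt(whenFalse));
    }

    // If both alternatives agree, the value is fully shared again.
    boolean shared() { return whenTrue == whenFalse; }
}

public class VariationalSketch {
    public static void main(String[] args) {
        VInt price = new VInt(100, 80);             // an option toggles a discount
        VInt taxed = price.map(v -> v * 110 / 100); // one execution covers both configurations
        System.out.println(taxed.whenTrue + " " + taxed.whenFalse); // 110 88
        System.out.println(price.map(v -> v % 2).shared());         // true: both alternatives agree
    }
}
```

When an operation yields the same result for every alternative, the split collapses back to a single shared value, which is where the efficiency of the approach comes from.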
Reflective supertype information (RSI) is useful for many instrumentation-based type-specific analyses on the Java virtual machine (JVM). On the one hand, while such information can be obtained when performing the instrumentation within the same JVM process executing the instrumented program, in-process instrumentation severely limits the bytecode coverage of the analysis. On the other hand, performing the instrumentation in a separate process can achieve full bytecode coverage, but complete RSI is generally not available, often requiring the insertion of expensive runtime type checks in the instrumented program. In this article, we present a novel technique to accurately reify complete RSI in a separate instrumentation process. This is challenging, because the observed application may make use of custom classloaders, and the classes loaded in one application execution are generally only known upon termination of the application. We implement our technique in an extension of the dynamic analysis framework DiSL. The resulting framework guarantees full bytecode coverage while providing RSI. Evaluation results on a task profiler demonstrate that our technique can achieve speedups up to a factor of 6.24x with respect to resorting to runtime type checks in the instrumentation code for an analysis with full bytecode coverage.
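The cost being avoided can be sketched in plain Java: without supertype information at instrumentation time, every inserted callback must guard itself with a runtime check; with complete RSI, the instrumenter knows statically which classes have the supertype of interest and emits an unguarded callback only into those. The names below are illustrative, not DiSL's actual API.

```java
import java.util.ArrayList;
import java.util.Collection;

// Sketch of the RSI trade-off: a guarded callback (the expensive fallback)
// versus an unguarded one that an RSI-aware instrumenter would emit only
// into classes statically known to implement Collection.
class RsiSketch {
    static int collectionEvents = 0;

    // Fallback without RSI: a runtime type check on every event.
    static void onEventChecked(Object o) {
        if (o instanceof Collection) collectionEvents++;
    }

    // With RSI: emitted only where the receiver is known to be a Collection,
    // so no per-event check is needed.
    static void onCollectionEvent(Collection<?> c) {
        collectionEvents++;
    }

    public static void main(String[] args) {
        onEventChecked("not a collection");   // filtered out by the check
        onEventChecked(new ArrayList<>());    // counted
        onCollectionEvent(new ArrayList<>()); // counted, no check executed
        System.out.println(collectionEvents); // 2
    }
}
```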
ISBN (print): 9781450356176
Task granularity, i.e., the amount of work performed by parallel tasks, is a key performance attribute of parallel applications. On the one hand, fine-grained tasks (i.e., small tasks carrying out few computations) may introduce considerable parallelization overheads. On the other hand, coarse-grained tasks (i.e., large tasks performing substantial computations) may not fully utilize the available CPU cores, resulting in missed parallelization opportunities. In this paper, we provide a better understanding of task granularity for applications running on a Java virtual machine. We present a novel profiler which measures the granularity of every executed task. Our profiler collects carefully selected metrics from the whole system stack with little overhead, and helps the developer locate performance problems. We analyze task granularity in the DaCapo and ScalaBench benchmark suites, revealing several inefficiencies related to fine-grained and coarse-grained tasks. We demonstrate that the collected task-granularity profiles are actionable by optimizing task granularity in two benchmarks, achieving speedups up to 1.53x.
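A toy version of per-task granularity measurement can be sketched as follows. The real profiler instruments the whole system stack; this sketch only times a task's run() method, and all names are our own, not the tool's actual API.

```java
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of per-task granularity profiling: wrap each task so its
// execution time (one proxy for granularity) is recorded with low overhead.
class GranularityProbe {
    static final LongAdder taskCount  = new LongAdder();
    static final LongAdder totalNanos = new LongAdder();

    // Wrap a task so its duration is recorded even if it throws.
    static Runnable profiled(Runnable task) {
        return () -> {
            long start = System.nanoTime();
            try {
                task.run();
            } finally {
                taskCount.increment();
                totalNanos.add(System.nanoTime() - start);
            }
        };
    }

    // Mean work per task: very small values hint at fine-grained tasks
    // (parallelization overhead dominates), very large ones at coarse-grained
    // tasks (cores left idle).
    static long meanNanosPerTask() {
        long n = taskCount.sum();
        return n == 0 ? 0 : totalNanos.sum() / n;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            profiled(() -> { /* simulated work */ }).run();
        }
        System.out.println(taskCount.sum()); // 4
    }
}
```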
ISBN (print): 9781450349116
Fast, byte-addressable non-volatile memory (NVM) embraces both near-DRAM latency and disk-like persistence, which has generated considerable interest in revolutionizing the system software stack and programming models. However, it is less understood how NVM can be combined with a managed runtime like the Java virtual machine (JVM) to ease persistence management. This paper proposes Espresso, a holistic extension to Java and its runtime, to enable Java programmers to exploit NVM for persistence management with high performance. Espresso first provides a general persistent heap design called Persistent Java Heap (PJH) to manage persistent data as normal Java objects. The heap is then strengthened with a recoverable mechanism to provide crash consistency for heap metadata. Espresso further provides a new abstraction called Persistent Java Object (PJO) to offer an easy-to-use but safe persistence programming model for programmers to persist application data. Evaluation confirms that Espresso significantly outperforms state-of-the-art NVM support for Java (i.e., JPA and PCJ) while being compatible with data structures in existing Java programs.
ISBN (print): 9781728119700
Spark is increasingly becoming the platform of choice for several big-data analyses, mainly due to its fast, fault-tolerant, and in-memory processing model. Despite the popularity and maturity of the Spark framework, tuning Spark applications to achieve high performance remains challenging. In this paper, we present Ipt, a novel tool that assists users in improving the level of parallelism of applications running on top of Spark in the local mode. Ipt helps users tune the level of parallelism of Spark applications to spawn a number of tasks able to fully exploit the available computing resources. Our evaluation results show that optimizations guided by Ipt can achieve speedups up to 2.72x.
ISBN (print): 9781450360067
The Java runtime frees applications from manual memory management through automatic garbage collection (GC), at the cost of stop-the-world pauses. State-of-the-art collectors leverage multiple generations but inevitably suffer from a full GC phase that scans the whole heap and induces a pause tens of times longer than normal collections, which largely affects both the throughput and latency of the entire system. In this paper, we comprehensively analyze the full GC performance of the HotSpot Parallel Scavenge garbage collector and study its algorithm design in depth. We find that heavy dependencies among heap regions cause poor thread utilization. Furthermore, many heap regions contain mostly live objects (referred to as dense regions), which are unnecessary to collect. To solve these problems, we introduce two optimizations: dynamically allocating shadow regions as compaction destinations to eliminate region dependencies, and skipping dense regions to reduce GC workload. Evaluation results show that these optimizations lead to an average 2.6X (up to 4.5X) improvement in full GC throughput and thereby boost application performance by 18.2% on average (58.4% at best).
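The dense-region heuristic can be illustrated with a tiny sketch: compacting a region that is almost entirely live copies nearly all of its bytes for little space gain, so such regions are better left in place. The class shape and threshold below are our own illustration, not HotSpot's actual data structures.

```java
// Illustrative sketch of dense-region skipping during full-GC compaction:
// a region whose live ratio meets a threshold is not compacted.
class HeapRegion {
    final long capacityBytes;
    final long liveBytes;

    HeapRegion(long capacityBytes, long liveBytes) {
        this.capacityBytes = capacityBytes;
        this.liveBytes = liveBytes;
    }

    // Skip compaction when the live ratio is at or above the threshold:
    // evacuating a mostly-live region reclaims almost nothing.
    boolean isDense(double liveRatioThreshold) {
        return (double) liveBytes / capacityBytes >= liveRatioThreshold;
    }
}

public class DenseRegionSketch {
    public static void main(String[] args) {
        HeapRegion sparse = new HeapRegion(1 << 20, 1 << 16); // ~6% live
        HeapRegion dense  = new HeapRegion(1 << 20, 1 << 20); // 100% live
        System.out.println(sparse.isDense(0.9)); // false: worth compacting
        System.out.println(dense.isDense(0.9));  // true: skip it
    }
}
```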
ISBN (print): 9781538652060
Emerging cloud computing has aroused the need for large-scale data processing, which in turn drives vigorous development of big-data platforms running on the Java virtual machine (JVM), such as Hadoop, Spark, and Flink. Storing large amounts of data in memory allows those platforms to benefit from satisfying performance and from Java's powerful memory management and garbage collection services. Non-volatile memory (NVM) provides non-volatility, byte-addressability, and fast access speed, and thus becomes a superior alternative to volatile memory in future cloud systems and the Java world. This paper presents a recoverable garbage collector named DwarfGC to manage Java objects in NVM so as to ensure crash consistency and durability. DwarfGC persists heap-related metadata into NVM at the beginning of GC and relies on it for recovery. The metadata is stored in a space-efficient fashion while incurring little time overhead.
ISBN (print): 9781450355841
The proliferation of applications, frameworks, and services built on Java has led to an ecosystem critically dependent on the underlying runtime system, the Java virtual machine (JVM). However, many applications running on the JVM, e.g., big data analytics, suffer from long garbage collection (GC) time. The long pause time due to GC not only degrades application throughput and causes long latency, but also hurts overall system efficiency and scalability. In this paper, we present an in-depth performance analysis of GC in the widely adopted HotSpot JVM. Our analysis uncovers a previously unknown performance issue: the design of dynamic GC task assignment, the unfairness of mutex lock acquisition in HotSpot, and imperfect operating system (OS) load balancing together cause loss of concurrency in Parallel Scavenge, a state-of-the-art garbage collector and the default in HotSpot. To address these issues, we propose a number of solutions, including enforcing GC thread affinity to aid multicore load balancing and designing a more efficient work-stealing algorithm. Performance evaluation demonstrates that the proposed approaches improve overall completion time, GC time, and application tail latency by as much as 49.6%, 87.1%, and 43%, respectively.
A memory consistency model (or simply memory model) defines the possible values that a shared-memory read may return in a multithreaded programming language. Choosing a memory model involves an inherent performance-programmability tradeoff. The Java language has adopted a relaxed (or weak) memory model that is designed to admit most traditional compiler optimizations and obviate the need for hardware fences on most shared-memory accesses. The downside, however, is that programmers are exposed to a complex and unintuitive semantics and must carefully declare certain variables as volatile in order to enforce program orderings that are necessary for proper behavior. This paper proposes a simpler and stronger memory model for Java through a conceptually small change: every variable has volatile semantics by default, but the language allows a programmer to tag certain variables, methods, or classes as relaxed and provides the current Java semantics for these portions of code. This volatile-by-default semantics provides sequential consistency (SC) for all programs by default. At the same time, expert programmers retain the freedom to build performance-critical libraries that violate the SC semantics. At the outset, it is unclear if the volatile-by-default semantics is practical for Java, given the cost of memory fences on today's hardware platforms. The core contribution of this paper is to demonstrate, through comprehensive empirical evaluation, that the volatile-by-default semantics is arguably acceptable for a predominant use case for Java today - server-side applications running on Intel x86 architectures. We present VBD-HotSpot, a modification to Oracle's widely used HotSpot JVM that implements the volatile-by-default semantics for x86. To our knowledge VBD-HotSpot is the first implementation of SC for Java in the context of a modern JVM. VBD-HotSpot incurs an average overhead versus the baseline HotSpot JVM of 28% for the DaCapo benchmarks, which is significant tho
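The classic message-passing idiom shows what volatile-by-default buys: under the current Java memory model the programmer must remember to mark the flag volatile, whereas under the proposed semantics every field would behave this way unless explicitly tagged relaxed. The class below is our own illustration of that idiom.

```java
// Message passing under the Java memory model. Without `volatile` on
// `ready`, a reader thread may observe ready == true while still seeing the
// default value of `data`, because the JMM permits reordering. Marking
// `ready` volatile restores the ordering; volatile-by-default would make
// this the behavior of every field unless explicitly tagged relaxed.
class MessagePassing {
    int data = 0;
    volatile boolean ready = false;

    void writer() {
        data = 42;    // ordinary write...
        ready = true; // ...ordered before this volatile write
    }

    int reader() {
        // Once the volatile read sees ready == true, data == 42 is guaranteed
        // to be visible (happens-before via the volatile write/read pair).
        return ready ? data : -1;
    }
}
```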
Embedded systems provide limited storage capacity. This limitation conflicts with the demands of modern virtual machine platforms, which require large amounts of library code to be present on each client device. These conflicting requirements are often resolved by providing specialized embedded versions of the standard libraries, but even these stripped-down libraries consume significant resources. We present a solution for "always connected" mobile devices based on a zero-footprint client paradigm. In our approach, all code resides on a remote server. Only those parts of applications and libraries that are likely to be needed are transferred to the mobile client device. Since it is difficult to predict statically which library parts will be needed at run time, we combine static analysis, opportunistic off-target linking, and lazy code loading to transfer code with a high likelihood of execution ahead of time, while the other code, such as exception code, remains on the server and is transferred only on demand. This allows us to perform not only dead code elimination, but also aggressive elimination of unused code. The granularity of our approach is flexible, from class files all the way down to individual basic blocks. Our method achieves total code size reductions of up to 95%. (C) 2010 Elsevier B.V. All rights reserved.