Hepatocellular carcinoma (HCC) represents the most prevalent form of primary liver cancer, accounting for 75 % of all cases. Individuals with metabolic dysfunctions are at risk of developing significant symptoms, incl...
A novel escape analysis framework that handles Java's open-world features is proposed and evaluated. The approach analyzes a Java program under the optimistic assumption that the program runs in a closed world, and applies optimizations aggressively. The framework also provides a mechanism that controls the analysis complexity. The results show that the escape analysis framework, which has been implemented in Intel's Open Runtime Platform on X86, eliminated about 70% and 94% of synchronization operations and improved runtime performance by 15.77% and 31.28% for SPECjbb2000 and 209_db, respectively.
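The optimistic closed-world idea can be illustrated with a toy model. The sketch below is not the framework from the abstract: the statement encoding, the `Method` class, and the helper names are all hypothetical, and it assumes a non-recursive call graph. It shows how a synchronization on a freshly allocated object looks removable while only the currently loaded callee is considered, and stops being removable once a newly loaded override lets the object escape.

```python
from dataclasses import dataclass


@dataclass
class Method:
    name: str
    # Hypothetical mini-IR. Each statement is a tuple:
    #   ("new", var), ("store_static", var), ("return", var),
    #   ("call", callee_name, var), ("sync", var)
    body: list


def escaping_vars(method, known_methods):
    """Return the variables in `method` whose objects may escape it."""
    escaped = set()
    for stmt in method.body:
        op = stmt[0]
        if op in ("store_static", "return"):
            escaped.add(stmt[1])
        elif op == "call":
            callee = known_methods.get(stmt[1])
            # Optimistic view: trust the callee that is loaded right now.
            # A callee that is not loaded at all is treated conservatively.
            if callee is None or "param" in escaping_vars(callee, known_methods):
                escaped.add(stmt[2])
    return escaped


def removable_syncs(method, known_methods):
    """List variables whose synchronization could be dropped."""
    escaped = escaping_vars(method, known_methods)
    allocated = {s[1] for s in method.body if s[0] == "new"}
    return [s[1] for s in method.body
            if s[0] == "sync" and s[1] in allocated and s[1] not in escaped]


log = Method("log", [("sync", "param")])            # keeps its argument local
leak = Method("log", [("store_static", "param")])   # override loaded later
caller = Method("run", [("new", "buf"), ("call", "log", "buf"), ("sync", "buf")])

world = {"log": log}
print(removable_syncs(caller, world))   # ['buf']
world["log"] = leak                     # dynamic class loading changes the world
print(removable_syncs(caller, world))   # []
```

In a real virtual machine, the second result would mean that code already optimized under the first assumption has to be invalidated or compensated for.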
This paper describes an experimental message-driven programming system for fine-grain multicomputers. The initial target architecture is the J-machine designed at MIT. This machine combines a unique collection of architectural features that include fine-grain processes, on-chip associative memory, and hardware support for process synchronization. The programming system uses these mechanisms via a simple message-driven process model that blurs the distinction between processes and messages: messages correspond to processes that are executed elsewhere in the network. This model allows code and data to be distributed across the computers in the machine, and is supported at every stage of the program development cycle. The prototype system we have developed includes a basic set of programming tools to support the model; these include a compiler, linker, archiver, loader and microkernel. Although the concepts are language independent, our prototype system is based on GNU-C.
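A rough way to picture the "message equals remote process" model is a set of nodes, each with a queue of pending messages, where delivering a message means running its handler on the destination node. The sketch below is only an illustration under that assumption; `Node`, `send`, `run`, and the two handlers are hypothetical names and are not taken from the J-machine tools.

```python
from collections import deque


class Node:
    """One computer in the machine: local memory plus a message queue."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}
        self.inbox = deque()


def send(node, handler, *args):
    # Sending a message schedules a short process on the destination node.
    node.inbox.append((handler, args))


def run(nodes):
    """Deliver messages until every inbox is empty."""
    progressed = True
    while progressed:
        progressed = False
        for node in nodes:
            while node.inbox:
                handler, args = node.inbox.popleft()
                handler(node, *args)   # the message runs as a process here
                progressed = True


def store(node, key, value, reply_to):
    # Runs on the node that owns the data, then replies with another message.
    node.memory[key] = value
    send(reply_to, ack, key)


def ack(node, key):
    node.memory.setdefault("acks", []).append(key)


nodes = [Node(0), Node(1)]
send(nodes[1], store, "x", 42, nodes[0])
run(nodes)
print(nodes[1].memory["x"], nodes[0].memory["acks"])
```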
This paper introduces optimization and deoptimization techniques for escape analysis in an open world. These techniques are used in a novel escape analysis framework that has been implemented in Open Runtime Platform, Intel's open-source Java virtual machine. We describe the optimization techniques for synchronization removal and object stack allocation, as well as the runtime deoptimization and compensation work. The deoptimization and compensation techniques are crucial for practical escape analysis in an open world. We evaluate the runtime efficiency of the deoptimization and compensation work on benchmarks such as SPECjbb2000 and SPECjvm98.
Language definitions by abstract interpreters are appropriate to the design and development of a language. Axiomatic definitions are more appropriate to proving program properties and verification of compilers. The pr...
Networking technology, undoubtedly, plays a vital role in modern warfare especially in Network Centric Operations (NCOs) and Global Information Grid (GIG) concept. However, the current popular network infrastructure, ...
A new field in distributed computing, called Ambient Intelligence, has emerged as a consequence of the increasing availability of wireless devices and the mobile networks they induce. Developing software for such mobi...
Cache miss stalls hurt performance because of the large gap between memory and processor speeds; for example, the popular server benchmark SPEC JBB2000 spends 45% of its cycles stalled waiting for memory requests on the Itanium® 2 processor. Traversing linked data structures causes a large portion of these stalls. Prefetching for linked data structures remains a major challenge because serial data dependencies between elements in a linked data structure preclude the timely materialization of prefetch addresses. This paper presents Mississippi Delta (MS Delta), a novel technique for prefetching linked data structures that closely integrates the hardware performance monitor (HPM), the garbage collector's global view of heap and object layout, the type-level metadata inherent in type-safe programs, and JIT compiler analysis. The garbage collector uses the HPM's data cache miss information to identify cache miss intensive traversal paths through linked data structures, and then discovers regular distances (deltas) between these linked objects. JIT compiler analysis injects prefetch instructions using deltas to materialize prefetch addresses. We have implemented MS Delta in a fully dynamic profile-guided optimization system: the StarJIT dynamic compiler and the ORP Java virtual machine. We demonstrate a 28-29% reduction in stall cycles attributable to the high-latency cache misses targeted by MS Delta and a speedup of 11-14% on the cache miss intensive SPEC JBB2000 benchmark.
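The delta idea can be sketched in a few lines. The code below is an illustration only, not the MS Delta implementation: the function names, the sample addresses, and the 75% threshold are assumptions. Given sampled (source address, target address) pairs along a traversal path, it looks for one distance that dominates, and then uses that distance to form a prefetch address without first loading the pointer.

```python
from collections import Counter


def dominant_delta(pairs, min_share=0.75):
    """Return the most common target-minus-source distance, or None.

    `pairs` is a list of (source_addr, target_addr) samples. The delta is
    reported only if it covers at least `min_share` of the samples
    (the threshold is a made-up value for this sketch).
    """
    if not pairs:
        return None
    counts = Counter(target - source for source, target in pairs)
    delta, hits = counts.most_common(1)[0]
    return delta if hits / len(pairs) >= min_share else None


def prefetch_address(base_addr, delta):
    # With a regular delta, the address is computed from the base object,
    # avoiding the serial dependency on the pointer load.
    return base_addr + delta


samples = [(0x1000, 0x1040), (0x2000, 0x2040), (0x3000, 0x3040), (0x4000, 0x4100)]
delta = dominant_delta(samples)
print(delta)                                   # 64
print(hex(prefetch_address(0x5000, delta)))    # 0x5040
```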
GPGPUs are increasingly being used as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense Technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer the reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based compiler-directed partial recomputing method, for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method, which recovers from errors detected in a code region by partially re-computing the region, describe a checkpoint-based fault-tolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC significantly reduces the fault recovery overheads incurred by FullRC, on average by 73.5% when errors occur earlier during execution and by 74.6% when they occur later. In addition, PartialRC also reduces error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault happens.
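The difference between full and partial recomputation can be shown with a small CPU-side simulation. This is only a sketch under strong assumptions: it treats each output element as the work of one independent thread, assumes the error detector reports exactly which elements are wrong, and uses hypothetical names throughout. It does not reproduce the PartialRC compiler analysis or its CUDA implementation.

```python
def kernel(i, x):
    """Work done by simulated thread i on input x."""
    return x * x + i


def run_region(inputs, faulty=frozenset()):
    # Simulated execution of a code region; threads listed in `faulty`
    # produce a corrupted result.
    return [kernel(i, x) + (1 if i in faulty else 0)
            for i, x in enumerate(inputs)]


def full_recompute(inputs):
    """Roll back and rerun the whole region; returns (outputs, threads rerun)."""
    return [kernel(i, x) for i, x in enumerate(inputs)], len(inputs)


def partial_recompute(inputs, outputs, bad_indices):
    """Rerun only the threads flagged as faulty; returns (outputs, threads rerun)."""
    fixed = list(outputs)
    for i in bad_indices:
        fixed[i] = kernel(i, inputs[i])
    return fixed, len(bad_indices)


inputs = [1, 2, 3, 4]
corrupted = run_region(inputs, faulty={2})
fixed, work = partial_recompute(inputs, corrupted, [2])
print(fixed, work)                 # [1, 5, 11, 19] 1
print(full_recompute(inputs)[1])   # 4
```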