In deep-submicron (DSM) and nanometer technologies, more and more new fault types will appear that are difficult to predict and avoid. Applying fault-tolerant algorithms to achieve reliable on-chip communication is one of the m...
Superimposing one protein tertiary structure onto another can help find similarities between them and further identify functional and evolutionary relationships. We first extract invariant features under rigid-body trans...
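The abstract is truncated, but one classic family of features that is invariant under rigid-body transformation is the set of pairwise residue distances. The toy C sketch below (illustrative only, not the paper's actual feature set) shows that a distance matrix is unchanged by rotation plus translation.

/* Illustrative only: pairwise C-alpha distances are unchanged by any
 * rotation + translation, so a distance matrix is a classic
 * rigid-body-invariant feature for comparing tertiary structures. */
#include <math.h>
#include <stdio.h>

#define N 3  /* toy structure with 3 residues */

typedef struct { double x, y, z; } Point;

static double dist(Point a, Point b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrt(dx*dx + dy*dy + dz*dz);
}

int main(void) {
    Point ca[N] = { {0,0,0}, {1.5,0,0}, {1.5,2.0,0} };
    /* Apply a rigid-body transform: rotate 90 deg about z, then translate. */
    Point moved[N];
    for (int i = 0; i < N; i++) {
        moved[i].x = -ca[i].y + 5.0;
        moved[i].y =  ca[i].x - 3.0;
        moved[i].z =  ca[i].z + 2.0;
    }
    /* The pairwise distances (the invariant feature) are identical. */
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            printf("d(%d,%d): original %.3f  transformed %.3f\n",
                   i, j, dist(ca[i], ca[j]), dist(moved[i], moved[j]));
    return 0;
}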
Optimization-directed inlining is a promising direction for inlining, but it does not take into account the execution frequency and size of the function. A traditional inlining model, on the other hand, considers execution frequency and function size but ignores the optimizations that become possible after inlining. In this paper, a new inline model, the loop-fusion-conscious inline model, is proposed to avoid both drawbacks: it considers execution frequency, function size, and post-inlining optimization together. An inlining method that considers only loop fusion is first implemented and added to ORC's original inline model; the new inline model is then built on top of it and tuned for high performance. The experiments also reveal that temperature (execution frequency) is not effective in some cases, and the reason is analyzed. Experimental results show that the new model can significantly improve compiler performance: the peak performance of some SPEC CPU 2000 benchmarks increases by as much as 6%, and by 1% on average.
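As a toy illustration (not taken from the paper) of why an inliner should credit post-inlining optimization, the C fragment below shows a call whose inlining exposes two adjacent loops over the same range that the compiler can then fuse; a size/frequency-only heuristic would see no such benefit.

/* Toy example: inlining exposes a loop-fusion opportunity that a
 * size/frequency-only inlining heuristic would not account for. */
#define N 1024

static void scale(double *a, double s) {      /* callee contains a loop */
    for (int i = 0; i < N; i++)
        a[i] *= s;
}

void caller(double *a, double *b, double s) {
    scale(a, s);                  /* after inlining, this loop ...            */
    for (int i = 0; i < N; i++)   /* ... and this one iterate over the same   */
        b[i] += a[i];             /* range and can be fused, producing and    */
}                                 /* consuming a[i] in a single pass.         */

A fusion-conscious model would give scale() extra inlining benefit even when its size or call frequency alone would not justify inlining it.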
ISBN (print): 9781595939036
The trend toward ISPs providing heterogeneous services concurrently, together with the low utilization of servers, makes it necessary to consolidate various services onto a single computing platform. In such a shared environment, meeting application-level QoS goals and avoiding interference among services become challenging, because each application consumes a different amount of resources and requires a different QoS. Video-on-demand (VoD) has been identified as an important application among multimedia services. In this paper, we study a case of interaction among services running concurrently on a virtualized computing environment. The contributions of this paper are as follows. 1) We design a novel capability service computing framework for service computing consolidation (CSCF) to study the interaction among concurrent services in a VM-based virtualized computing environment; 2) we propose a dynamic and lazy memory flowing algorithm among VMs (DLMFaVM) to partially fulfill this computing model through resource flowing; 3) we develop a virtualized computing platform to study the feasibility of service computing consolidation; 4) we analyze the interaction between the VoD service and other typical enterprise services in our capability service computing environment and conclude that running the VoD streaming service alongside other services on a VM-based virtualized computing platform is a trend for ISPs, but one that presents challenges for the designers of both the service software and the service computing platform. Copyright 2007 ACM.
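The abstract does not spell out DLMFaVM itself; the sketch below only illustrates the general idea of "lazy" memory flowing among co-located VMs (move memory only when a VM's pressure crosses a threshold, taking it from the VM with the most slack). All structures, names, and thresholds here are hypothetical, not the paper's algorithm.

/* Hypothetical sketch of lazy memory flowing between co-located VMs. */
#include <stddef.h>

struct vm {
    const char *name;
    size_t allocated_mb;   /* memory currently assigned to the VM   */
    size_t working_set_mb; /* estimated memory the VM actually needs */
};

/* Pressure = needed / allocated, in percent. */
static size_t pressure(const struct vm *v) {
    return v->working_set_mb * 100 / v->allocated_mb;
}

/* Called periodically; moves memory lazily, only when really needed. */
void rebalance(struct vm *vms, int n, size_t chunk_mb) {
    for (int i = 0; i < n; i++) {
        if (pressure(&vms[i]) < 90)        /* not under pressure: do nothing */
            continue;
        int donor = -1;                    /* find the donor with most slack */
        size_t best_spare = 0;
        for (int j = 0; j < n; j++) {
            size_t spare = vms[j].allocated_mb > vms[j].working_set_mb
                         ? vms[j].allocated_mb - vms[j].working_set_mb : 0;
            if (j != i && spare > best_spare) {
                best_spare = spare;
                donor = j;
            }
        }
        if (donor >= 0 && best_spare >= chunk_mb) {
            vms[donor].allocated_mb -= chunk_mb;   /* e.g. via ballooning */
            vms[i].allocated_mb     += chunk_mb;
        }
    }
}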
The memory bandwidth demands of modern microprocessors require a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose a technique for using a simplified dual-ported cache instead, composed mostly of single-ported SRAMs, without noticeably decreasing processor performance. We evaluate this technique using realistic applications that include the operating system. Our technique, using a simplified multi-ported banking cache, reduces the delay of the select logic in the LSQ by 16.1% and achieves 98.1% of the performance of an ideal dual-ported cache.
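A minimal sketch of the banking idea (not the paper's exact design): two accesses proceed in the same cycle only if they map to different single-ported SRAM banks; a bank conflict forces one port to replay, which is why performance approaches but does not equal a true dual-ported cache.

/* Bank-conflict check for a "dual-ported" cache built from
 * single-ported SRAM banks (names and sizes are illustrative). */
#include <stdbool.h>
#include <stdint.h>

#define NUM_BANKS  8       /* power of two */
#define LINE_BYTES 64

static unsigned bank_of(uint64_t addr) {
    return (addr / LINE_BYTES) % NUM_BANKS;  /* low line-index bits pick the bank */
}

/* Returns true if both ports can be serviced this cycle. */
bool can_issue_both(uint64_t addr_port0, uint64_t addr_port1, bool port1_valid) {
    if (!port1_valid)
        return true;                         /* only one access: no conflict */
    return bank_of(addr_port0) != bank_of(addr_port1);
}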
This paper presents a framework for implementing the x86 FP stack in an x86-compatible processor based on a general RISC architecture. Architectural support is added to a typical RISC architecture to maintain the FP stack status. Speculative techniques are applied in the decode stage to enable pipelined and efficient FP operations. An optimized register renaming scheme is proposed to eliminate redundant micro-ops in FP programs, increasing performance while reducing the pressure on the register rename table. Simulation results show that on average more than 10% of fmov micro-ops are removed. Eliminating these micro-ops significantly speeds up program execution: IPC increases are as high as 30% for some programs and nearly 10% on average.
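The paper's exact renaming scheme is not given in the abstract; a common way to remove stack-manipulation micro-ops (shown as a hedged sketch below) is to track st(0)..st(7) with a top-of-stack pointer plus a slot-to-physical-register map, so an FXCH or stack-to-stack fmov is resolved by editing the map at rename time instead of emitting a data-movement micro-op.

/* Illustrative FP-stack rename state; the paper's scheme may differ. */
#include <stdint.h>

#define FP_STACK_DEPTH 8

struct fp_rename_state {
    int top;                         /* current top-of-stack, 0..7      */
    int phys[FP_STACK_DEPTH];        /* stack slot -> physical register */
};

/* Map an st(i) operand to its physical register. */
int lookup(const struct fp_rename_state *s, int sti) {
    return s->phys[(s->top + sti) % FP_STACK_DEPTH];
}

/* FXCH st(i): swap map entries; no data-movement micro-op is emitted. */
void rename_fxch(struct fp_rename_state *s, int sti) {
    int a = s->top % FP_STACK_DEPTH;
    int b = (s->top + sti) % FP_STACK_DEPTH;
    int tmp = s->phys[a];
    s->phys[a] = s->phys[b];
    s->phys[b] = tmp;
}

/* An FLD-style push just adjusts the top-of-stack pointer. */
void rename_push(struct fp_rename_state *s, int new_phys) {
    s->top = (s->top + FP_STACK_DEPTH - 1) % FP_STACK_DEPTH;
    s->phys[s->top] = new_phys;
}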
A 64-bit low-power, high-speed floating-point adder design is presented in this paper. The proposed floating-point adder is based on a dual-path architecture, and both dynamic and leakage power are reduced by exploiting architectural opportunities to minimize switching activity and maximize the stack effect of the circuits concurrently. Experimental results based on a 130 nm CMOS standard-cell design show that the average power consumption of the FP adder can be reduced by 61.4% with the proposed low-power techniques.
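The circuit-level power techniques are not reproducible from the abstract, but the dual-path split itself is standard and gives the architectural hook for gating switching activity: effective subtractions with exponent difference at most 1 go to the "near" path (possible massive cancellation), everything else to the "far" path, and the unselected path need not toggle. A small sketch of that selection:

/* Standard dual-path selection in an FP adder (illustrative only). */
#include <stdbool.h>
#include <stdlib.h>

enum fp_add_path { NEAR_PATH, FAR_PATH };

enum fp_add_path select_path(int exp_a, int exp_b, bool effective_subtract) {
    int exp_diff = abs(exp_a - exp_b);
    /* Massive cancellation (large normalization shift) can only occur when
     * subtracting operands with nearly equal exponents. */
    if (effective_subtract && exp_diff <= 1)
        return NEAR_PATH;
    return FAR_PATH;   /* far path: large alignment shift, at most 1-bit normalize */
}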
Grid system software is inherently complex and hard to build and maintain. In this paper, we propose a self-managing building block, the grid unit, which facilitates constructing grid systems with higher availability and lower management overhead. We present an agent organization as the autonomic management framework and propose a self-recovering protocol to eliminate most of the tough jobs from system administrators' routines. The system has been deployed since 2004 on Dawning 4000A, the biggest node of the China grid system. We have carried out extensive experiments to evaluate the grid unit, and the collected log data shows that the availability of a grid parallel process management service built on top of the grid unit reaches 99.997%.
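The self-recovering protocol is not detailed in the abstract; the watchdog-style sketch below only conveys the general idea of a management agent probing the service it wraps and restarting it on failure. probe_service() and restart_service() are hypothetical placeholders, not the grid unit's real interfaces.

/* Hypothetical self-recovery loop run by a management agent. */
#include <stdbool.h>
#include <unistd.h>

bool probe_service(void);      /* e.g. heartbeat RPC; assumed defined elsewhere */
void restart_service(void);    /* e.g. respawn the wrapped service process      */

void self_recovery_loop(unsigned period_sec, unsigned max_failures) {
    unsigned consecutive_failures = 0;
    for (;;) {
        if (probe_service()) {
            consecutive_failures = 0;          /* healthy: reset the counter  */
        } else if (++consecutive_failures >= max_failures) {
            restart_service();                 /* declare failure and recover */
            consecutive_failures = 0;
        }
        sleep(period_sec);
    }
}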
In recent years, the power efficiency of the NoC (network on chip) has become a new research direction. For a tiled CMP (chip multiprocessor), a notable characteristic of the data transmitted over the NoC is that the probability that a transmitted bit is zero is much higher than the probability that it is one. This paper proposes an innovative power-efficient architecture for the NoC input buffer that exploits this characteristic and can significantly improve the power efficiency of the NoC in a tiled CMP.
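One natural way to exploit zero-dominant traffic (shown below as an illustration; the paper's buffer organization may differ) is to split each flit into byte segments, keep a per-segment zero mask, and skip writing and reading all-zero segments in the buffer SRAM, so most segment accesses are gated away.

/* Illustrative zero-gated input-buffer slot; names and widths are assumptions. */
#include <stdint.h>
#include <string.h>

#define FLIT_BYTES 16

struct buffered_flit {
    uint16_t nonzero_mask;          /* bit i set => byte i is non-zero */
    uint8_t  bytes[FLIT_BYTES];     /* only non-zero bytes are written */
};

void buffer_write(struct buffered_flit *slot, const uint8_t flit[FLIT_BYTES]) {
    slot->nonzero_mask = 0;
    for (int i = 0; i < FLIT_BYTES; i++) {
        if (flit[i] != 0) {                   /* gate the SRAM write for zero bytes */
            slot->bytes[i] = flit[i];
            slot->nonzero_mask |= (uint16_t)(1u << i);
        }
    }
}

void buffer_read(const struct buffered_flit *slot, uint8_t flit[FLIT_BYTES]) {
    memset(flit, 0, FLIT_BYTES);              /* zero bytes are reconstructed for free */
    for (int i = 0; i < FLIT_BYTES; i++)
        if (slot->nonzero_mask & (1u << i))
            flit[i] = slot->bytes[i];
}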
The application range of clusters has expanded beyond scientific computing, but present cluster system software fails to provide a flexible architecture that promotes code reuse and facilitates building cluster system software for different computing contexts; most such software is developed from scratch case by case, or integrated or packaged as "the best practice". In this paper, we propose a layered design methodology for building a cluster system stack in which different layers concentrate on different functions, and we develop a common set of core services as a reuse framework for different computing contexts. Following this methodology, we have built Phoenix, a complete cluster system stack for both scientific and business computing, which has been verified and deployed on the Dawning 4000A supercomputer for scientific computing and on other cluster systems for business computing. Qualitative evaluation and our practice show that the design methodology of Phoenix has advantages over other methodologies.
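A minimal sketch of the layering idea, with all names invented for illustration: a shared core-service layer is exposed through a narrow interface, and context-specific layers (scientific or business computing) are built on top of it and reused rather than rewritten.

/* Hypothetical layered interfaces; not Phoenix's actual APIs. */

/* Core-service layer: shared by every computing context. */
struct core_services {
    int (*join_cluster)(const char *node);
    int (*send)(const char *node, const void *msg, unsigned len);
    int (*report_health)(const char *node);
};

/* Context layer for scientific computing, built only on core services. */
struct scientific_layer {
    const struct core_services *core;
    int (*launch_parallel_job)(const struct scientific_layer *self,
                               const char *binary, int nprocs);
};

/* A business-computing layer would reuse the same core_services
 * interface, which is what makes the stack reusable across contexts. */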