Optimization directed inlining is a good direction for inlining, but it does not consider the factor of the execution frequency and size of the function. Although a traditional inlining model considers the factor of e...
详细信息
Optimization directed inlining is a good direction for inlining, but it does not consider the factor of the execution frequency and size of the function. Although a traditional inlining model considers the factor of execution frequency and size of the function, it does not consider the optimization after inlining. In this paper, a new inline model, loop fusion conscious inline model, is proposed to avoid these drawbacks of the inline model of the past. It considers both execution frequency and size and optimization. The inlining method which only considers loop fusion is implemented and is added into the ORC's original inline model. Then the new inline model is built and the model is tuned for high performance. In the experiment, some fact is found that temperature (execution frequency) isn't effective in some cases, and the reason is analyzed. Experiment result shows that the new model can greatly improve the performance of the compiler, and some SPEC CPU 2000 benchmark's peak performance can increase as high as 6%, and 1% on average.
The proposed method is focused on synthesis-based static circuits, and a power modeling library is developed for modeling processors by means of parametric RTL and physical annotation, and all kinds of processor modul...
详细信息
The proposed method is focused on synthesis-based static circuits, and a power modeling library is developed for modeling processors by means of parametric RTL and physical annotation, and all kinds of processor modules are mapped into combinations of basic components. Those models are linked to an architectural simulator, running benchmarks to get power results. The power analysis of benchmark platforms proves to be effective and highly correlated, with an average 10% error and little speed penalty compared with the gate-level power analysis.
The bandwidth becomes the major bottleneck of the performance improvement for modern microprocessors. A cache adaptive write allocate policy that improves the bandwidth of microprocessor significantly is proposed by i...
详细信息
The bandwidth becomes the major bottleneck of the performance improvement for modern microprocessors. A cache adaptive write allocate policy that improves the bandwidth of microprocessor significantly is proposed by investigating cache store misses. The cache adaptive write allocate policy collects fully modified blocks in miss queue. Fully modified blocks are written to lower level memory based on non-write allocate policy which can switch to write allocate policy adaptively. Compared with other cache store miss policies, the cache adaptive write allocate policy avoids unnecessary memory traffic, reduces cache pollution and decreases load and store queue full rate without increasing hardware overhead. Experiment results indicate that on average 62.6% memory bandwidth in STREAM benchmarks is improved by utilizing the cache adaptive write allocate policy. The performance of SPEC CPU 2000 benchmarks is also improved efficiently. The average IPC speedup is 5.9%.
Now the research of computer architecture focuses on how to utilize the energy of CPU to attain high performance as much as possible. Obviously the architecture-level power estimation tool is important. Existing archi...
详细信息
Now the research of computer architecture focuses on how to utilize the energy of CPU to attain high performance as much as possible. Obviously the architecture-level power estimation tool is important. Existing architecture-level power simulators only focus on full-custom dynamic circuits modeling, but ignores the power modeling of ASIC designs which are mainly composed of static circuits or standard cell libraries. So this paper is concerned with the implementation of a high performance and low power general purpose CPU, the Godson-2 processor, and analyzes the power characteristics of the CPU, and implements an architecture-level power estimation methodology which aims at the Godson-2 processor. This methodology takes the power modeling methodology of CMOS static circuits into account carefully, so it is better for the estimation of current high performance CPU architecture which is designed with ASIC methodology. Compared with the RTL power estimating method, this methodology has high speed and high flexibility and the accuracy is also very good. On the platform of Intel Xeon 2.4 GHz, the speed of this methodology is about 300 K instructions per second, which is 5000 times that of the RTL power estimating method with only little error penalty.
This paper proposes a random routing algorithm with end-to-end feedback. Random routing has the capability of handling random transmission errors efficiently with high forwarding speed. End-to-end feedback promises th...
详细信息
This paper proposes a random routing algorithm with end-to-end feedback. Random routing has the capability of handling random transmission errors efficiently with high forwarding speed. End-to-end feedback promises the correctness of transmission and reduces the power consumption. Experimental results demonstrated that our random routing algorithm has lower latency, lower power consumption, and can provide on-chip communication with high reliability.
It can be observed from looking backward that processor architecture is improved through spirally shifting from simple to complex and from complex to simple. Nowadays we are facing another shifting from complex to sim...
详细信息
It can be observed from looking backward that processor architecture is improved through spirally shifting from simple to complex and from complex to simple. Nowadays we are facing another shifting from complex to simple, and new innovative architecture will emerge to utilize the continuously increasing transistor budgets. The growing importance of wire delays, changing workloads, power consumption, and design/verification complexity will drive the forthcoming era of Chip Multiprocessors (CMPs). Furthermore, typical CMP projects both from industries and from academics are investigated. Through going into depths for some primary theoretical and implementation problems of CMPs, the great challenges and opportunities to future CMPs are presented and discussed. Finally, the Godson series microprocessors designed in China are introduced.
In a cluster or a database server system, the performance of some data intensive applications will be degraded much because of the limited local memory and large amount of interactions with slow disk. In high speed ne...
详细信息
In a cluster or a database server system, the performance of some data intensive applications will be degraded much because of the limited local memory and large amount of interactions with slow disk. In high speed network, utilizing remote memory of other nodes or customized memory server to be as second level buffer can decrease access numbers to disks and benefit application performance. With second level buffer mode, this paper made some improvements for a recently proposed buffer cache replacement algorithm-LIRS, and brings forward an adaptive algorithm-LIRS-A. LIRS-A can adaptively adjust itself according to application characteristic, thus the problem of not suiting for time locality of LIRS is avoided. In TPC-H benchmarks, LIRS-A could improve hit rate over LIRS by 7.2% at most. In a Groupby query with network stream analyzing database, LIRS-A could improve hit rate over LIRS by 31.2% at most. When compared with other algorithms, LIRS-A also show similar or better performance.
Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, their performance is limited due to the drastic increase in both the number of biolo...
详细信息
Under visualization idea based on large-scale dismantling and sharing, the implementing of network interconnection of calculation components and storage components by loose coupling, which are tightly coupling in trad...
详细信息
Under visualization idea based on large-scale dismantling and sharing, the implementing of network interconnection of calculation components and storage components by loose coupling, which are tightly coupling in traditional server, achieves computing capacity, storage capacity and service capacity distribution according to need in application-level. Under the new server model, the segregation and protection of user space and system space as well as the security monitoring of virtual resources are the important factors of ultimate security guarantee. This article presents a large-scale and expansible distributed invasion detection system of virtual computing environment based on virtual machine. The system supports security monitoring management of global resources and provides uniform view of security attacks under virtual computing environment, thereby protecting the user applications and system security under capacity services domain.
Under virtualization idea based on large-scale dismantling and sharing, the implementing of network interconnection of calculation components and storage components by loose coupling, which are tightly coupling in tra...
详细信息
Under virtualization idea based on large-scale dismantling and sharing, the implementing of network interconnection of calculation components and storage components by loose coupling, which are tightly coupling in traditional server, achieves computing capacity, storage capacity and service capacity distri- bution according to need in application-level. Under the new server model, the segregation and protection of user space and system space as well as the security monitoring of virtual resources are the important factors of ultimate security guarantee. This article presents a large-scale and expansible distributed invasion detection system of virtual computing environment based on virtual machine. The system supports security monitoring management of global resources and provides uniform view of security attacks under virtual computing environment, thereby protecting the user applications and system security under capacity services domain.
暂无评论