With the increasing demand and the wide application of high-performance commodity multi-core processors, both the quantity and scale of data centers grow dramatically and they bring heavy energy *** and engineers have applied much effort to reducing hardware energy consumption, but software is the true consumer of power and another key in making better use of *** software is critical to better energy utilization, because it is not only the manager of hardware but also the bridge and platform between applications and *** this paper, we summarize some trends that can affect the efficiency of data ***, we investigate the causes of software *** on these studies, major technical challenges and corresponding possible solutions to attain green system software in programmability, scalability, efficiency and software architecture are ***, some of our research progress on trusted energy-efficient system software is briefly introduced.
The Godson project with an R&D history of 10 years is an independent national program of China that aims at developing advanced microprocessor technologies based on fundamental research and commercialization of the chip technology. We will give a comprehensive presentation of the Godson project, including its history, technical roadmaps, and several unique technical merits.
Due to the huge size of patterns to be searched, multiple pattern searching remains a challenge to several newly-arising applications like network intrusion *** this paper, we present an attempt to design efficient multiple pattern searching algorithms on multi-core *** observe an important feature which indicates that the multiple pattern matching time mainly depends on the number and minimal length of *** multi-core algorithm proposed in this paper leverages this feature to decompose pattern set so that the parallel execution time is *** formulate the problem as an optimal decomposition and scheduling of a pattern set, then propose a heuristic algorithm, which takes advantage of dynamic programming and greedy algorithmic techniques, to solve the optimization *** results suggest that our decomposition approach can increase the searching speed by more than 200% on a 4-core AMD Barcelona system.
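The decomposition idea can be illustrated with a small sketch. It is not the authors' heuristic: it uses plain dynamic programming over patterns sorted by length, and the cost model in `group_cost` (pattern count divided by minimal pattern length) is a hypothetical stand-in for the measured matching time, chosen only because the abstract says time depends on those two quantities. The names `group_cost` and `decompose` are illustrative.

```python
from functools import lru_cache
from typing import List, Sequence, Tuple


def group_cost(patterns: Sequence[str]) -> float:
    """Hypothetical cost model (an assumption, not from the paper):
    cost grows with the number of patterns and shrinks with their minimal length."""
    return len(patterns) / min(len(p) for p in patterns)


def decompose(patterns: Sequence[str], cores: int) -> List[List[str]]:
    """Split patterns into at most `cores` groups, one per core, so that the
    most expensive group (the parallel bottleneck) is as cheap as possible."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if any(len(p) == 0 for p in patterns):
        raise ValueError("empty patterns are not supported")
    if not patterns:
        return []

    # Sorting by length keeps short patterns together, so they penalize
    # the minimal length of only one group.
    ordered = sorted(patterns, key=len)
    n = len(ordered)
    k = min(cores, n)

    @lru_cache(maxsize=None)
    def best(start: int, parts: int) -> Tuple[float, Tuple[int, ...]]:
        """Best (bottleneck cost, cut positions) for ordered[start:] in `parts` groups."""
        if parts == 1:
            return group_cost(ordered[start:]), (n,)
        result = None
        # Leave at least one pattern for each of the remaining groups.
        for cut in range(start + 1, n - parts + 2):
            head = group_cost(ordered[start:cut])
            tail_cost, tail_cuts = best(cut, parts - 1)
            candidate = (max(head, tail_cost), (cut,) + tail_cuts)
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    _, cuts = best(0, k)
    groups, start = [], 0
    for cut in cuts:
        groups.append(ordered[start:cut])
        start = cut
    return groups
```

Each returned group would be handed to one core, which scans the full input with its own subset of patterns. A real implementation would replace `group_cost` with timings measured for the chosen matching algorithm.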
This paper describes the design for testability (DFT) challenges and techniques of the Godson-3 microprocessor, a scalable multicore processor that is based on the scalable mesh of crossbar (SMOC) on-chip network and targets high-end applications. Advanced techniques are adopted to make the DFT design scalable and to achieve low-power, low-cost test with limited I/O resources. To achieve scalable and flexible test access, an elaborate test access mechanism (TAM) is implemented to support multiple test instructions and test modes. Taking advantage of the multiple identical cores embedded in the processor, scan partitioning and on-chip comparison are employed to reduce test power and test time. Test compression is also used to decrease test time. To further reduce test power, clock control logic is designed with the ability to turn off the clocks of partitions not under test. In addition, scan collars for the caches are designed to perform functional tests with low-speed ATE for speed binning; this approach has low complexity and shows good correlation results.
Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 list released in June 2010. In this paper, key issues in the system design of Dawning Nebulae are introduced. System tuning methodologies aimed at a petaFLOPS Linpack result are presented, including algorithmic optimization and communication improvement. The design of its file I/O subsystem, including HVFS and the underlying DCFS3, is also described. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and that the 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s, respectively. The success of Dawning Nebulae has demonstrated the viability of the CPU/GPU heterogeneous structure for future supercomputer designs.
On-body and body-to-body channels in an indoor, highly scattering environment are compared by characterizing the channels and evaluating the achievable capacity of a MIMO PIFA array system. For the on-body channels, the belt-head channel offers a better capacity than the belt-chest channel at high SNR because of its richer scattering. However, the strong LOS signal of the belt-chest channel compensates for this limitation and allows it to yield a capacity similar to that of the belt-head channel at low SNR. The body-to-body belt-belt and belt-head channels yield the same capacity values because they exhibit the same statistical parameters. Their average capacity is comparable to that of the on-body belt-chest channel, which is viable for high-data-rate communications.
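The abstract does not state which capacity expression was used. As a reference point, the standard MIMO capacity with equal power allocation and no channel knowledge at the transmitter is given below, where the symbols are assumptions introduced here: $\mathbf{H}$ is the $N_r \times N_t$ channel matrix, $\rho$ is the average receive SNR, and the normalization of $\mathbf{H}$ is left open.

```latex
% Capacity of one channel realization, in bits/s/Hz
C = \log_2 \det\!\left( \mathbf{I}_{N_r} + \frac{\rho}{N_t}\, \mathbf{H}\mathbf{H}^{\mathsf{H}} \right)

% Low-SNR approximation: only the total channel gain matters
C \approx \frac{\rho}{N_t}\, \lVert \mathbf{H} \rVert_F^{2}\, \log_2 e
\qquad (\rho \to 0)
```

Under this formula, capacity at high SNR is driven by how many eigenvalues of $\mathbf{H}\mathbf{H}^{\mathsf{H}}$ are significant, which favors rich scattering, whereas at low SNR it depends only on the total gain $\lVert \mathbf{H} \rVert_F^2$, which a strong LOS component increases. This is consistent with the trend reported for the belt-head and belt-chest channels.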
ISBN (print): 9781467344975
Instruction-level redundancy is an effective scheme to reduce the susceptibility of microprocessors to soft errors, offering high error detection and recovery capability; however, it usually incurs significant performance degradation due to resource racing. Motivated by the fact that narrow-width operands are common in applications, we exploit data-level parallelism to accelerate instruction-level redundancy. For instructions within the sphere of replication (SoR) of data-level redundancy, the normal and redundant versions of the instruction's narrow-width operand are folded into one register so that they share the same functional unit during execution, hence alleviating resource racing. All other instructions are protected by instruction-level redundancy. We run SPECint2000 benchmarks on a modified version of the SimpleScalar simulator, and synthesize the extra hardware to evaluate the area overhead of the proposed pipeline. Experimental results show that our acceleration scheme outperforms conventional instruction-level redundancy by 13% in IPC. In addition, the extra area overhead is negligible.
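The folding step can be pictured with a small software model. This is a sketch under stated assumptions rather than the paper's hardware design: the 16-bit lane width, the 32-bit register, and the names `fold`, `folded_add`, and `check` are all hypothetical, and the abstract does not say how carries between the two halves are isolated.

```python
MASK16 = 0xFFFF  # assumed lane width: a "narrow" operand fits in 16 bits


def fold(value: int) -> int:
    """Pack a narrow operand and its redundant copy into one 32-bit word."""
    if not 0 <= value <= MASK16:
        raise ValueError("operand is not narrow enough to fold")
    return (value << 16) | value


def folded_add(a: int, b: int) -> int:
    """Add two folded words lane by lane, modelling one shared functional unit.
    Each lane wraps at 16 bits, so no carry leaks from the low lane into the high lane."""
    low = ((a & MASK16) + (b & MASK16)) & MASK16
    high = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (high << 16) | low


def check(word: int) -> int:
    """Compare the normal and redundant lanes; return the result if they agree."""
    low, high = word & MASK16, (word >> 16) & MASK16
    if low != high:
        raise RuntimeError("soft error detected: lanes disagree")
    return low
```

For example, `check(folded_add(fold(1000), fold(2345)))` yields 3345 from a single pass through the adder, while flipping any one bit of the folded result makes `check` raise. Operands too wide to fold would fall back to ordinary instruction-level redundancy, as the abstract describes.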
The wide application of General Purpose Graphic Processing Units (GPGPUs) results in large manual efforts on porting and optimizing algorithms on them. However, most existing automatic ways of generating GPGPU code fa...
Cloud computing is a new computing model, and its resource monitoring tools are immature compared with those of traditional distributed computing and grid computing. To better monitor virtual resources in cloud computing, a periodic and event-driven push (PEP) monitoring model is proposed. By combining periodic push with an event-driven mechanism, the model can provide comparatively adequate information about the usage and status of resources. It simplifies the communication between the Master and Work Nodes without missing important events that occur during the push interval. In addition, we develop "mon" to make up for the deficiencies of Libvirt in monitoring virtual CPU and memory.
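The combination of periodic and event-driven push can be sketched as follows. This is an illustrative model only, not the authors' implementation or the "mon" tool: the class name `PepAgent`, the change threshold used to define an "event", and the choice to restart the periodic timer after every push are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PepAgent:
    """Worker-side agent that pushes a metric to the master node.

    push      -- callback taking (reason, timestamp, value); stands in for the network send
    interval  -- seconds between periodic pushes
    threshold -- change in the metric that counts as an event (assumed definition)
    """
    push: Callable[[str, float, float], None]
    interval: float = 10.0
    threshold: float = 0.2
    _last_push_time: float = field(default=float("-inf"), init=False)
    _last_pushed_value: float = field(default=0.0, init=False)

    def observe(self, now: float, value: float) -> None:
        """Feed one local sample; push only if the period elapsed or an event occurred."""
        if now - self._last_push_time >= self.interval:
            self._send("periodic", now, value)
        elif abs(value - self._last_pushed_value) >= self.threshold:
            # A significant change inside the push interval is reported immediately.
            self._send("event", now, value)

    def _send(self, reason: str, now: float, value: float) -> None:
        self.push(reason, now, value)
        self._last_push_time = now
        self._last_pushed_value = value
```

A worker would call `observe` on every local sample, for instance once per second, so the master receives a steady periodic report plus an immediate report whenever usage jumps between two periodic pushes.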
Moore's law continues to grant computer architects ever more transistors in the foreseeable future, and parallelism is the key to continued performance scaling in modern microprocessors. In this paper, the achiev...