With the increasing demand for and wide application of high-performance commodity multi-core processors, both the quantity and scale of data centers grow dramatically, and they bring heavy energy *** and engineers have applied much effort to reducing hardware energy consumption, but software is the true consumer of power and another key to making better use of *** software is critical to better energy utilization, because it is not only the manager of hardware but also the bridge and platform between applications and *** this paper, we summarize some trends that can affect the efficiency of data ***, we investigate the causes of software *** on these studies, major technical challenges and corresponding possible solutions to attain green system software in programmability, scalability, efficiency and software architecture are ***, some of our research progress on trusted energy-efficient system software is briefly introduced.
ADML is an architectural description language based on Dynamic Description Logic for defining and simulating the behavior of system architecture. ADML is being developed as a new formal language and/or conceptual mode...
ISBN: 9781622765034 (print)
Decoding algorithms for syntax-based machine translation suffer from high computational complexity, a consequence of intersecting a language model with a context-free grammar. Left-to-right decoding, which generates the target string in order, can improve decoding efficiency by simplifying the language model evaluation. This paper presents a novel left-to-right decoding algorithm for tree-to-string translation, using a bottom-up parsing strategy and dynamic future cost estimation for each partial translation. Our method outperforms previously published tree-to-string decoders, including a competing left-to-right method.
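The point about left-to-right generation simplifying language model evaluation can be illustrated with a minimal sketch. It is not the paper's decoder: it omits the bottom-up parsing strategy and the future cost estimation, and the bigram table, the penalty for unseen pairs, and the function name `extend` are all hypothetical. It only shows that when words are appended in target order, the language model state of a partial translation reduces to its last word, so the score can be updated incrementally.

```python
import math

# Hypothetical toy bigram log-probabilities (assumption, not from the paper).
BIGRAM_LOGPROB = {
    ("<s>", "the"): math.log(0.5),
    ("the", "cat"): math.log(0.2),
    ("cat", "sleeps"): math.log(0.1),
}
# Assumed fixed penalty for bigrams missing from the toy table.
UNSEEN_LOGPROB = math.log(1e-4)


def extend(hypothesis, words):
    """Append target words on the right and update the LM score incrementally.

    A hypothesis is (last_word, score, output). Because the output grows
    strictly left to right, the last word is the only LM context needed.
    """
    last_word, score, output = hypothesis
    for word in words:
        score += BIGRAM_LOGPROB.get((last_word, word), UNSEEN_LOGPROB)
        last_word = word
    return (last_word, score, output + tuple(words))


start = ("<s>", 0.0, ())
hyp = extend(start, ["the", "cat"])   # first translated fragment
hyp = extend(hyp, ["sleeps"])         # next fragment, scored from state "cat"
```

Scoring the fragments one after another gives the same total as scoring the full string at once, which is why no boundary words need to be kept on the left side of a partial translation.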
ISBN: 9781457720529 (print)
This paper exploits sink mobility to prolong the network lifetime in wireless sensor networks (WSNs) where the information delay caused by moving the sink should be bounded. We build a unified framework for analyzing this joint sink mobility and routing problem. We offer a general mathematical model that captures diverse issues, e.g., sink mobility, routing, and delay. We discuss the induced subproblems and present efficient solutions for them. Then, we generalize these solutions and propose a polynomial-time optimal algorithm for the original problem. In simulations, we show the benefits of involving a mobile sink. We also show the impact of the delay bound on the network lifetime.
Productive information systems have high-level security requirements, but the Trusted Computing Group's trust chain solution is not adequate for them. After analysis and comparison of two different trust transfer ways, a ...
This paper describes the design for testability (DFT) challenges and techniques of the Godson-3 microprocessor, a scalable multicore processor that is based on the scalable mesh of crossbar (SMOC) on-chip network and targets high-end applications. Advanced techniques are adopted to make the DFT design scalable and to achieve low-power, low-cost test with limited IO resources. To achieve scalable and flexible test access, a highly elaborate test access mechanism (TAM) is implemented to support multiple test instructions and test modes. Taking advantage of the multiple identical cores embedded in the processor, scan partition and on-chip comparison are employed to reduce test power and test time. Test compression is also used to decrease test time. To further reduce test power, clock control logic is designed with the ability to turn off the clocks of partitions that are not under test. In addition, scan collars of the caches are designed to perform functional test with low-speed ATE for speed-binning purposes, an approach that has low complexity and shows good correlation results.
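The idea behind on-chip comparison for identical cores can be sketched in software. The abstract does not describe the comparator design, so everything here is an assumption for illustration: the function name `on_chip_compare`, the representation of scan responses as bit strings, and the use of one sticky fail flag per core. The sketch assumes every core receives the same stimulus, so a single expected response stream can be checked against all cores and only the flags need to leave the chip.

```python
def on_chip_compare(expected_bits, core_responses):
    """Return one sticky fail flag per core (hypothetical model, not the Godson-3 logic).

    expected_bits: expected scan-out stream, e.g. "1011"
    core_responses: one scan-out stream per identical core, same length
    """
    fail_flags = [False] * len(core_responses)
    for cycle, expected in enumerate(expected_bits):
        for core, response in enumerate(core_responses):
            # A single mismatching bit marks the core as failing for good.
            if response[cycle] != expected:
                fail_flags[core] = True
    return fail_flags


# Three identical cores; the second one returns a wrong bit in cycle 2.
flags = on_chip_compare("1011", ["1011", "1001", "1011"])
```

Under these assumptions the tester observes one flag per core instead of every scan-out bit, which is consistent with the abstract's goal of testing with limited IO resources.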
Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 list released in June 2010. In this paper, key issues in the system design of Dawning Nebulae are introduced. System tuning methodologies aimed at a petaFLOPS Linpack result are presented, including algorithmic optimization and communication improvement. The design of its file I/O subsystem, including HVFS and the underlying DCFS3, is also described. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and that the 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s respectively. The success of Dawning Nebulae has demonstrated the viability of the CPU/GPU heterogeneous structure for future supercomputer designs.
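The aggregate bandwidth figures can be put on a per-node scale with simple arithmetic. The sketch below assumes the load is spread evenly over the 1024 nodes and that 1 GB equals 1000 MB; neither assumption comes from the abstract. Because the reported aggregates are lower bounds ("exceed"), the results are lower bounds on the per-node average as well.

```python
NODES = 1024  # node count used in the reported bandwidth test


def per_node_mb_per_s(aggregate_gb_per_s, nodes=NODES):
    """Average per-node bandwidth in MB/s, assuming an even split and 1 GB = 1000 MB."""
    return aggregate_gb_per_s * 1000 / nodes


read_floor = per_node_mb_per_s(100)   # from the "exceeds 100 GB/s" read figure
write_floor = per_node_mb_per_s(70)   # from the "exceeds 70 GB/s" write figure
print(f"read  > {read_floor:.2f} MB/s per node")
print(f"write > {write_floor:.2f} MB/s per node")
```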
The wide application of General Purpose Graphics Processing Units (GPGPUs) entails substantial manual effort in porting and optimizing algorithms for them. However, most existing automatic ways of generating GPGPU code fa...
With the growing scale of high-performance computing (HPC) systems, today and more so tomorrow, faults are a norm rather than an exception. HPC applications typically tolerate fail-stop failures under the stop-and-wai...
ISBN: 9781467344975 (print)
Instruction-level redundancy is an effective scheme to reduce the susceptibility of microprocessors to soft errors, offering high error detection and recovery capability; however, it usually incurs significant performance degradation due to resource contention. Motivated by the fact that narrow-width operands are common in applications, we exploit data-level parallelism to accelerate instruction-level redundancy. For instructions within the sphere of replication (SoR) of data-level redundancy, the normal and redundant versions of the instruction's narrow-width operand are folded into one register and share the same functional unit during execution, which alleviates resource contention. All other instructions are protected by instruction-level redundancy. We run SPECint2000 benchmarks on a modified version of the SimpleScalar simulator, and synthesize the extra hardware to evaluate the area overhead of the proposed pipeline. Experimental results show that our acceleration scheme outperforms conventional instruction-level redundancy by 13% in IPC. Moreover, the extra area overhead is negligible.
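Folding the normal and redundant copies of a narrow operand into one register can be modeled in software as follows. This is a sketch under stated assumptions, not the proposed pipeline: the abstract does not give the operand width, the register width, or the hardware mechanism. The example assumes 16-bit operands in a 32-bit register and uses a standard SIMD-within-a-register masking trick to keep carries from crossing between the two copies. The names `fold`, `packed_add`, and `check` are hypothetical.

```python
LANE_BITS = 16            # assumed narrow-operand width
LANE_MASK = 0xFFFF
HIGH_BITS = 0x80008000    # top bit of each 16-bit lane
LOW_BITS = 0x7FFF7FFF     # all remaining bits of each lane


def fold(value):
    """Place the normal copy in the low lane and the redundant copy in the high lane."""
    v = value & LANE_MASK
    return (v << LANE_BITS) | v


def packed_add(a, b):
    """Add both lanes in one pass; masking stops carries from crossing lanes."""
    partial = (a & LOW_BITS) + (b & LOW_BITS)
    return (partial ^ ((a ^ b) & HIGH_BITS)) & 0xFFFFFFFF


def check(folded):
    """Return (result, ok); ok is False when the two copies disagree."""
    primary = folded & LANE_MASK
    redundant = (folded >> LANE_BITS) & LANE_MASK
    return primary, primary == redundant


total = packed_add(fold(3), fold(4))
clean = check(total)                  # both copies agree
faulty = check(total ^ (1 << 20))     # simulated bit flip in the redundant copy
```

In this model a single pass through one adder produces both the result and its redundant copy, and a mismatch between the two halves signals an error. Operands wider than the assumed 16 bits would not fit and would have to be handled another way, which corresponds to the abstract's statement that the remaining instructions rely on instruction-level redundancy.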