This paper describes the low-power test challenges and features of Godson-T, a multi-core processor that contains 16 identical cores. As silicon design technology scales to ultra-deep submicron and even nanometer nodes, the complexity and cost of testing keep growing, and the test power of such designs becomes a serious concern, especially for multi-core processors. In this paper, we use a modular design methodology and a scalable design-for-testability (DFT) structure to achieve low test power; in addition, an improved test pattern generation method is studied to reduce test power further. Experimental results from the real chip show that test power and test time are well balanced while acceptable test coverage and cost are achieved.
ISBN: (print) 9781424437696
Bugs tend to be unavoidable in the design of complex integrated circuits, so it is imperative to identify them as early as possible through post-silicon debug. The main challenge for post-silicon debug is the observability of internal signals. This paper exploits the fact that error-free states need not be observed. We introduce the "suspect window" and present a method for determining its boundary. Based on the suspect window, we propose a debug approach that achieves high observability by reusing the scan chain. Since scan dumps take place only within the suspect window, debug time is greatly reduced. Experimental results demonstrate the effectiveness of the proposed approach.
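The abstract does not say how the suspect window boundary is determined. As a rough sketch of the general idea only, the code below assumes (hypothetically) that a compact signature is recorded at fixed cycle intervals and compared against a golden model, so that the first mismatching interval becomes the window in which scan dumps are taken. The function names and the signature scheme are assumptions, not the paper's technique.

```python
def first_mismatch(golden_signatures, observed_signatures):
    """Index of the first interval whose signature differs, or None if all match."""
    for i, (g, o) in enumerate(zip(golden_signatures, observed_signatures)):
        if g != o:
            return i
    return None


def suspect_window(golden_signatures, observed_signatures, interval):
    """Cycle range to re-run with scan dumps enabled, or None if no error is seen.

    Assumption: signature i summarizes cycles [i * interval, (i + 1) * interval),
    so the first mismatching signature bounds where the error first appeared.
    """
    i = first_mismatch(golden_signatures, observed_signatures)
    if i is None:
        return None
    return range(i * interval, (i + 1) * interval)


# Usage: the error first shows up in the third interval (index 2).
golden = [0xA1, 0xB2, 0xC3, 0xD4]
observed = [0xA1, 0xB2, 0xFF, 0x00]
window = suspect_window(golden, observed, interval=100)
```

With this assumed scheme, only the cycles in `window` need scan dumps, while the error-free intervals before it are skipped.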
Superimposing one protein tertiary structure onto another can help to find similarities between them and to further identify functional and evolutionary relationships. We first extract invariant features under rigid body trans...
A personal high performance computer (PHPC) requires low cost and high performance. Teraflops PHPC systems with special accelerator units such as GPGPUs have been presented, but they are difficult to program, ...
Test power consumption is becoming a major concern in low-power integrated circuits (ICs). This paper presents a revised low-power compression architecture for scan test. In this paper, the variance in power consumptio...
For many operating systems and device drivers, memory copy is the most time-consuming operation and has always received special attention. In this paper, we propose a processor DMA based memory copy hardware accele...
With the dramatic increase in network speed during the past ten years, network processing efficiency has been significantly decreased. In this paper, we propose a network accelerating scheme, which employs cache locki...
The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose a technique for using a simplified dual-ported cache instead, composed mostly of single-ported SRAMs, without noticeably decreasing processor performance. We evaluate this technique using realistic applications that include the operating system. Our technique, which uses a simplified multi-ported banked cache, reduces the delay of the select logic in the LSQ by 16.1% and achieves 98.1% of the performance of an ideal dual-ported cache.
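How a cache built from single-ported banks can serve two accesses per cycle is sketched below. The abstract gives no bank count, line size, or mapping, so the values here (4 banks, 32-byte lines, bank chosen from the low bits of the line number) and all function names are assumptions for illustration, not the paper's design.

```python
def bank_of(address, line_bytes=32, num_banks=4):
    """Bank index taken from the low bits of the cache-line number (assumed mapping)."""
    return (address // line_bytes) % num_banks


def can_issue_together(addr_a, addr_b, line_bytes=32, num_banks=4):
    """Two accesses can share a cycle only if they map to different banks.

    Simplification: two accesses to the same line count as a conflict here,
    although a real design might merge them.
    """
    return bank_of(addr_a, line_bytes, num_banks) != bank_of(addr_b, line_bytes, num_banks)


def conflict_rate(access_pairs, line_bytes=32, num_banks=4):
    """Fraction of access pairs that would need to be serialized."""
    if not access_pairs:
        return 0.0
    conflicts = sum(
        1 for a, b in access_pairs
        if not can_issue_together(a, b, line_bytes, num_banks)
    )
    return conflicts / len(access_pairs)
```

In this sketch, each bank needs only one port; the cost is that pairs mapping to the same bank are serialized, which is why such a cache can approach but not exactly match an ideal dual-ported one.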
ISBN: (print) 1424401607
Due to decreasing supply voltages and increasing power consumption of today’s VLSI chips, IR drops on on-chip power/ground (P/G) grids have to be explicitly considered during the floor-planning stage of today’s physical design flow. It is therefore very important to adjust the double-mesh P/G grids during floor-planning to efficiently minimize the worst-case IR drop subject to limited routing resources in early-stage P/G network design of high-end *** this paper, we present a novel, feasible methodology to efficiently optimize mesh-structured center-bumped P/G grids under given routing ***[11], we have proposed the approximate current distribution (ACD) simulation method and the OSMACD optimization approach for early-stage single-level P/G meshes. In this work, a feasible theory is derived to directly compute the optimal solutions OSMACD for practical double-mesh P/G grids of high-end *** results show that OS DMACD matches very well with the exact counterparts, which are obtained inefficiently with ICCG, and this can lead to significant speedup in today’s IR-drop-aware floor-planning.
The kinetic Monte Carlo (KMC) algorithm has been widely applied to the simulation of radiation damage, grain growth and chemical reactions. To simulate at large temporal and spatial scales, domain decomposition is commonly used to parallelize the KMC algorithm. However, through experimental analysis, we find that communication overhead is the main bottleneck that affects overall performance and limits the scalability of the parallel KMC algorithm on large-scale clusters. To alleviate these problems, we present a communication aggregation approach that reduces the total number of messages and eliminates communication redundancy, and we further utilize neighborhood collective operations to optimize the communication scheduling. Experimental results show that the optimized KMC algorithm exhibits better performance and scalability than the well-known open-source library SPPARKS. On a 32-node Xeon E5-2680 cluster (640 cores in total), the optimized algorithm reduces the total execution time by 16%, reduces the communication time by 50% on average, and achieves a 24-fold speedup over single-node (20-core) execution.
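The aggregation idea can be sketched without any MPI dependency. The code below is a minimal illustration, not the paper's implementation: it assumes, hypothetically, that each boundary update is a `(dest_rank, site_id, value)` tuple, that one message per destination is sent per synchronization step, and that a later update to the same site supersedes an earlier one. The function name and tuple layout are assumptions.

```python
from collections import defaultdict


def aggregate_updates(updates):
    """Group (dest_rank, site_id, value) updates into one message per destination.

    A later update to the same site replaces an earlier one, so each site is
    sent at most once per destination (the redundancy being removed here).
    """
    per_dest = defaultdict(dict)
    for dest, site, value in updates:
        per_dest[dest][site] = value
    # One sorted payload per destination rank.
    return {dest: sorted(sites.items()) for dest, sites in per_dest.items()}


# Usage: four individual updates collapse into two messages.
updates = [(1, 10, "A"), (2, 11, "B"), (1, 10, "C"), (1, 12, "D")]
messages = aggregate_updates(updates)
```

In this example the four updates become two messages, and site 10 is sent once with its latest value. As a separate piece of arithmetic on the reported figures, a 24-fold speedup on 32 nodes relative to one node corresponds to a parallel efficiency of 24/32 = 75%.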