Different from the traditional wireless networks, in wireless sensor networks(WSNs) , power consumption is a critical aspect, because current nodes are always supported by constrained-energy battery. A particular chal...
详细信息
Different from the traditional wireless networks, in wireless sensor networks(WSNs) , power consumption is a critical aspect, because current nodes are always supported by constrained-energy battery. A particular challenge in maintaining extended battery lifetime lies in achieving communications with low power. For better understanding of the design trade offs of WSNs, a more accurate power consumption model for wireless sensor node is proposed. Different from models ever showed, in which the power consumption of each component in WSN node was assumed constant , the new one takes into account energy dissipation of circuits in practical physical layer. It shows that there are some parameters such as data rate, carrier frequency, RF output power et al, which have a significant effect on the WSN node with the metric energy per useful bit (EPUB). Then an optimal design method to make the network energy efficient is described by adjusting one or more of these parameters. The efficiency of the strategy was validated by mathematical analysis and simulations. The results showed that the EPUB can be reduced by optimally choosing the rate-power combination based on the proposed power consumption model for a given load.
This paper proposed hierarchical fault tolerance techniques for ultrahigh-density memories based on 3- dimension interconnect technology. It describes how to implement hierarchical architecture with different granular...
详细信息
This paper proposed hierarchical fault tolerance techniques for ultrahigh-density memories based on 3- dimension interconnect technology. It describes how to implement hierarchical architecture with different granularity redundancies and how to combine error correction code (ECC), built-in self-test (BIST), built-in repair-analysis (BIRA), and built-in self-repair (BISR) capabilities. Simulation is employed to estimate the memory behavior of various configurations, and experimental results indicate that the proposed method has substantial reliability improvements over conventional techniques. For a memory with 1% bit-level failure rate and 50% device-level defect density, the proposed method can gain 100% reliability by using less than 30% extra overhead. It proves the availability of the proposed architecture.
This paper describes the scan test challenges and techniques used in the Godson-3 microprocessor, which is a scalable multicore processor based on the SMOC (scalable mesh of crossbar) on-chip network and targets high-...
详细信息
This paper describes the scan test challenges and techniques used in the Godson-3 microprocessor, which is a scalable multicore processor based on the SMOC (scalable mesh of crossbar) on-chip network and targets high-end applications. Advanced techniques are adopted to achieve the scalable, low-power and low-cost scan architecture at the challenge of limited I/O resources and large scale of transistors. To achieve a scalable and flexible test access, a highly elaborate TAM (test access mechanism) is implemented with supporting multiple test instructions and test modes. Taking advantage of multiple cores embedding in the processor, scan partitions are employed to reduce test power and test time, and test compression with more than 10X compression ratio are utilized to decrease the scan chain length. To further decrease test time, a data-synchronous-comparator (DSC) is proposed for comparing the scan responses of the identical cores.
Soft Errors have emerged as a key challenge to microprocessor design. Traditional soft error tolerance techniques (such as redundant multithreading and instruction duplication) can achieve high fault coverage but at t...
详细信息
Soft Errors have emerged as a key challenge to microprocessor design. Traditional soft error tolerance techniques (such as redundant multithreading and instruction duplication) can achieve high fault coverage but at the cost of significant performance degradation. Prior research reports that soft errors can be masked at the architecture level, and the degree of such masking, named as architecture vulnerability factor (AVF), can vary significantly across workloads and individual structures, hence strict redundant execution may not be necessary for soft error tolerance. In this work, we exploit the AVF varying feature to adaptively tune reliability and performance. We present an infrastructure to online compute and predict AVF for three microprocessor structures (IQ, ROB, and LSQ), guiding when the protection scheme should be activated to improve reliability. Experimental results show that our method can efficiently compute the AVF for different structures independent of hardware configurations. The average differences between our method and a prior offline AVF computing method are 0.10, 0.01, and 0.039 for IQ, ROB, and LSQ, respectively.
Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, their performance is limited due to the drastic increase in both the number of biolo...
详细信息
Dynamic programming has been one of the most efficient approaches to sequence analysis and structure prediction in biology. However, their performance is limited due to the drastic increase in both the number of biological data and variety of the computerarchitectures. With regard to such predicament, this paper creates excellent algorithms aimed at addressing the challenges of improving memory efficiency and network latency tolerance for nonserial polyadic dynamic programming where the dependences are nonuniform. By relaxing the nonuniform dependences, we proposed a new cache oblivious scheme to enhance its performance on memory hierarchy architectures. Moreover we develop and extend a tiling technique to parallelize this nonserial polyadic dynamic programming using an alternate block-cyclic mapping strategy for balancing the computational and memory load, where an analytical parameterized model is formulated to determine the tile volume size that minimizes the total execution time and an algorithmic transformation is used to schedule the tile to overlap communication with computation to further minimize communication overhead on parallel architectures. The numerical experiments were carried out on several high performance computersystems. The new cache-oblivious dynamic programming algorithm achieve 2-10 speedup and the parallel tiling algorithm with communication-computation overlapping shows a desired potential for fine-grained parallel computing on massively parallel computersystems
Scan-based test methodology is used to resolve the sequential-test problem but suffers from high power dissipation. In this paper, we propose a scheme to prevent transitions of scan chain from reflecting into the circ...
详细信息
Scan-based test methodology is used to resolve the sequential-test problem but suffers from high power dissipation. In this paper, we propose a scheme to prevent transitions of scan chain from reflecting into the circuit line. It not only can save 23% power consumption without performance loss, but also can be easily implemented with popular industrial design tools.
Nondeterminism of multi-clock systems often complicates various system validation processes such as post silicon debugging and at-speed testing, which has brought many difficulties to system designers and testers. The...
详细信息
ISBN:
(纸本)9783981080162
Nondeterminism of multi-clock systems often complicates various system validation processes such as post silicon debugging and at-speed testing, which has brought many difficulties to system designers and testers. The major source of nondeterministic behaviors is clock domain crossing, because the clocks that determine the timing of events are sensitive to variations. In this paper, we propose a general method to eliminate the nondeterminism resulted from clock domain crossing. This method does not assume any specific relationship among the clocks. Instead, to adapt to various clock conditions, an automatic configuration procedure and a periodic error canceling mechanism, which only require trivial hardware support, are proposed by analyzing the deterministic boundaries theoretically. To demonstrate the applicability of our method in practice, we implement it on a FPGA platform. Experiment results validate that the performance loss brought by our method over conventional multi-clock FIFO is less than 2%.
Faster-than-at-speed testing provides an effective way for detecting and debugging small delay defects in modern fabricated chips. However, the use of external automatic test equipment for faster-than-at-speed delay t...
详细信息
ISBN:
(纸本)9783981080162
Faster-than-at-speed testing provides an effective way for detecting and debugging small delay defects in modern fabricated chips. However, the use of external automatic test equipment for faster-than-at-speed delay testing could be costly. In this paper, we present an on-chip clock generation scheme which facilitates faster-than-at-speed delay testing for both launch on capture and launch on shift test frameworks. The required test clock frequency with a high resolution can be obtained by specifying the information in the test patterns, which is then shifted into the delay control stages to configure the launch and capture clock generation circuit (LCCG) embedded on-chip. Similarly, the control information for selecting various test frameworks and clock signals can also be embedded in the test patterns. Experimental results are presented to validate the proposed scheme.
CAM is widely used in microprocessors and SOC TLB modules. It gives great advantage for software development. And TLB operations become bottleneck of the microprocessor performance. The test cost of normal BIST approa...
详细信息
CAM is widely used in microprocessors and SOC TLB modules. It gives great advantage for software development. And TLB operations become bottleneck of the microprocessor performance. The test cost of normal BIST approach of the CAM can not be ignored. The paper analyses the fault models of CAM and proposes an instruction suitable march-like algorithm. The algorithm requires 14N+2L operations, where N is the number of words of the CAM and L is the width of a word. The algorithm covers 100% targeted faults. Instruction-level test using the algorithm has not any test cost on area and performance. Moreover the algorithm can be used in BIST approaches and have less performance lost for microprocessors. The paper instances the algorithm in a MIPS compatible microprocessor and have good results.
Enhanced scan delay testing approach can achieve high transition delay fault coverage by a small size of test pattern set but with significant hardware overhead. Although the implementation cost of launch on capture (...
详细信息
Enhanced scan delay testing approach can achieve high transition delay fault coverage by a small size of test pattern set but with significant hardware overhead. Although the implementation cost of launch on capture (LOC) approach is relatively low, the generated pattern set for testing delay faults is typically very large. In this paper, we present a novel flip-flop selection method to combine the respective advantages of the two approaches, by replacing a small number of selected regular scan cells with enhanced scan cells, thus to reduce the overall volume of transition delay test patterns effectively. Moreover, higher fault coverage can also be obtained by this approach compared to the standard LOC approach. Experimental results on larger ISCAS-89 and ITC-99 benchmark circuits using a commercial test generation tool show that the volume of test patterns can be reduced by over 70% and the transition delay fault coverage can be improved by up to 8.7%.
暂无评论