To estimate the rare probability of correlated failure events in a complex system containing a large number of replicated cells, we summarize two novel techniques in this paper. First, the Asymptotic Probability Approximation (APA) method is introduced to capture the correlations among the failures of different cells by exploring a series of carefully defined partial failure events. The failure rate of the entire system can then be approximated by a linear function of the total number of cells. Second, to further improve the estimation accuracy, we present the Asymptotic Probability Evaluation (APE) approach. The key idea of APE is to approximate the failure rate of the entire system by solving a set of nonlinear equations derived from a general analytical model. Numerical experiments demonstrate that APA and APE can accurately and efficiently estimate the overall failure rate of correlated rare failure events.
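For context, the sketch below shows the textbook baseline in which cell failures are assumed independent, which is exactly the assumption the abstract moves away from. It is not APA or APE; the function names and the example values are hypothetical and serve only to illustrate why a linear function of the cell count is a natural form for a rare-event approximation.

```python
import math


def independent_failure_rate(p_cell, n_cells):
    """System failure probability 1 - (1 - p)^N, assuming independent cells.

    log1p/expm1 keep precision when p_cell is tiny; the naive formula
    would lose most significant digits for rare events.
    """
    return -math.expm1(n_cells * math.log1p(-p_cell))


def linear_approximation(p_cell, n_cells):
    """First-order approximation N * p, valid only while N * p << 1."""
    return n_cells * p_cell


if __name__ == "__main__":
    p, n = 1e-9, 10**6  # hypothetical per-cell failure rate and cell count
    print(independent_failure_rate(p, n), linear_approximation(p, n))
```

Under independence the slope of the linear term is simply the per-cell failure rate. With correlated failures that slope is no longer the per-cell rate, which is the gap the partial failure events in APA are described as addressing.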
LU decomposition is widely used in numerical analysis and engineering to solve large-scale sparse systems of linear equations. Its complex data dependencies make LU decomposition difficult to parallelize. In this paper, an FPGA architecture with an efficient cache for parallel sparse LU decomposition is proposed. The proposed architecture is based on the Gilbert-Peierls (G-P) algorithm. Using the elimination graph, we find the column dependencies of the LU decomposition, which makes it possible to exploit parallelism. Through a dependency table, a simple but efficient cache strategy and its corresponding architecture are proposed. The proposed cache strategy avoids cache misses and reduces the size of the cache used to store all the intermediate data on chip. Experiments demonstrate that our design achieves a speedup of 2.85x-10.27x compared with UMFPACK running on general-purpose processors. The cache size is reduced by 50.93% on average with the proposed cache strategy.
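The column-dependency idea can be sketched in software as follows. In a left-looking factorization such as G-P, column j needs column k < j whenever U[k, j] is nonzero, so columns can be grouped into levels whose members do not depend on each other. This is a minimal sketch, assuming the nonzero pattern of U (including fill-in) is already known from symbolic analysis; `column_levels`, `schedule`, and the example pattern are hypothetical and do not describe the paper's dependency table or hardware.

```python
def column_levels(upper_pattern, n):
    """Assign each column a dependency level.

    upper_pattern[j] is the set of columns k < j with U[k, j] != 0,
    i.e. the columns that must be finished before column j starts.
    """
    level = [0] * n
    for j in range(n):
        deps = upper_pattern.get(j, ())
        # A column with no dependencies gets level 0.
        level[j] = 1 + max((level[k] for k in deps), default=-1)
    return level


def schedule(level):
    """Group columns by level; each group can be processed in parallel."""
    groups = {}
    for j, lv in enumerate(level):
        groups.setdefault(lv, []).append(j)
    return [groups[lv] for lv in sorted(groups)]


if __name__ == "__main__":
    pattern = {2: {0}, 3: {1}, 4: {2, 3}}  # hypothetical 5-column example
    print(schedule(column_levels(pattern, 5)))
```

For the example pattern, columns 0 and 1 form the first group, columns 2 and 3 the second, and column 4 the last, so five sequential steps collapse into three.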
ISBN: 9781467397209 (print)
In this paper, a highly flexible and energy-efficient reconfigurable symmetric cryptographic processor architecture based on a very long instruction word (VLIW) structure is presented. By analyzing the basic operations and storage characteristics of symmetric ciphers, an application-specific instruction set for symmetric ciphers is proposed. Eleven kinds of reconfigurable cryptographic arithmetic units are designed to support different operation modes and parameters for symmetric ciphers. The processor has been fabricated in 0.18 μm CMOS technology, and test results show that the maximum frequency reaches 200 MHz. Ten kinds of block, stream, and hash ciphers were mapped onto the processor. The encryption throughput of the AES, IDEA, Grain128, SNOW2.0, and SHA-2 algorithms reaches 882 Mbps, 449 Mbps, 60 Mbps, 840 Mbps, and 287 Mbps, respectively. Moreover, the energy efficiency of the AES implementation is 0.47 nJ/bit. The results demonstrate that the proposed processor outperforms other designs in terms of energy efficiency, throughput, and flexibility.
ISBN: 9781467397209 (print)
In this work, p-type SnO thin films were deposited by DC sputtering at low temperature, and TFT structures were fabricated. A probable process window for the Ar-O₂ sputtering atmosphere was found for SnO TFT applications. The fabricated p-type SnO TFTs on Al₂O₃ dielectrics show an Ion/Ioff ratio of 5×10³ and a mobility of 0.17 cm²/V·s. An unusual drain-current fluctuation in the subthreshold region was found; more measurements and analysis are needed to explain this phenomenon.
ISBN: 9781467397209 (print)
Coarse-grained reconfigurable block encryption arrays (REBA) provide massively parallel computing resources, but traditional mapping schemes do not exploit these advantages. In this paper, aiming to improve the performance and resource efficiency of algorithm mapping, we study the structure of common block cipher algorithms and propose a speed-up model based on loop unrolling, together with a modified unrolling strategy for resource-constrained situations. Experimental results show that the proposed scheme exploits the parallel resources, achieving 25-55 times higher throughput and 3-11 times higher throughput per unit of array area than the traditional scheme. Compared with other methods, our scheme has a higher performance-area ratio and lower solving complexity.
ISBN: 9781467397209 (print)
To realize high-speed processor performance, an efficient and flexible interconnection structure is needed. In this paper, we propose a crossbar-based multistage interconnect structure for the Coarse-Grained Reconfigurable Logic Array (CGRLA). The inner network flexibly connects the functional operation units, and the outer network handles data transmission between function units at different levels. Simulation results show that the proposed structure outperforms similar designs, with small area, low resource occupancy, high flexibility, and a high area transfer rate. It can effectively reduce the routing time during algorithm implementation and improve the processing performance of the processor.
ISBN: 9781467397209 (print)
Stereo matching is one of the key fields in computer vision. Although many dense two-frame stereo algorithms have been developed in this domain, few utilize cross checking and disparity-gradient-based refinement. This paper proposes: (1) a cross-check method using two disparity maps generated from the original left and right images; (2) a novel region-growing method for occluded and low-texture regions based on the disparity gradient; and (3) a disparity voting method to reduce random error. These practical methods are hardware friendly and can notably improve matching accuracy while lowering computational and space complexity. The proposed algorithm reaches the highest matching accuracy for HR images among existing local methods, attesting to its effectiveness.
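A cross check of the kind named in item (1) is commonly implemented as a left-right consistency test, sketched below. The sketch assumes the usual convention that a left pixel at column x with disparity d corresponds to the right pixel at column x - d; the `INVALID` marker, the `max_diff` tolerance, and the list-of-lists image layout are illustrative choices, not details taken from the paper.

```python
INVALID = -1  # hypothetical marker for pixels that fail the check


def cross_check(disp_left, disp_right, max_diff=1):
    """Keep a left disparity only if the right map agrees with it.

    disp_left and disp_right are row-major lists of integer disparities
    with identical shape. Pixels whose match falls outside the image or
    whose two disparities differ by more than max_diff become INVALID.
    """
    height, width = len(disp_left), len(disp_left[0])
    checked = [[INVALID] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disp_left[y][x]
            xr = x - d  # matching column in the right image
            if 0 <= xr < width and abs(d - disp_right[y][xr]) <= max_diff:
                checked[y][x] = d
    return checked
```

Pixels marked `INVALID` are typically the occluded or mismatched ones, which is where a later refinement step such as region growing would fill in values.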
In this work, a novel hybrid computing architecture with memristor-based processing-in-memory (MPiM) is proposed to resolve the memory-wall issue for data-intensive applications. The datapath and control logic are redesigned to incorporate MPiM, so that some data can be processed in the memory array or in specific logic near the memory array. The amount of data transferred over the bus is therefore reduced considerably, saving power and improving performance. Evaluations on data-intensive applications show reductions of 80% in energy consumption and 75% in processing latency.
In this paper, barrier and seed processes using physical vapor deposition (PVD) for Cu interconnects were developed for the 28 nm node generation. We show that metal filling can be improved by optimizing the seed process. Under non-optimized conditions, metal diffusion can be observed, which is mainly caused by the resputter steps during PVD. By increasing the barrier thickness, the metal diffusion can be eliminated. In these demonstrations, we measure the electrical characteristics, including metal line resistance and line leakage (<10 pA).
A standing wave oscillator (SWO) is a perfect clock source which can be used to produce a high-frequency clock signal with low skew and high reliability. However, it is difficult to tune the SWO over a wide range of frequencies. We introduce a frequency-tunable SWO which uses an inversion-mode metal-oxide-semiconductor (IMOS) field-effect transistor as a varactor, and give simulation results for the frequency tuning range and power dissipation. Based on the frequency-tunable SWO, a new phase-locked loop (PLL) architecture is presented. This PLL can be used not only as a clock source, but also as a clock distribution network to provide high-quality clock signals. The PLL achieves an approximately 50% frequency tuning range when designed in GlobalFoundries 65 nm 1P9M complementary metal-oxide-semiconductor (CMOS) technology, and can be used directly in a high-performance multi-core microprocessor.