ISBN: (print) 9781509066261; 9781509066254
LU decomposition is widely used in numerical analysis and engineering to solve large-scale sparse linear systems. However, its complex data dependency makes LU decomposition difficult to parallelize. In this paper, an architecture with an efficient cache for parallel sparse LU decomposition on FPGA is proposed. The proposed architecture is based on the Gilbert-Peierls (GP) algorithm. By using the elimination graph, we identify the column dependencies of the LU decomposition, which makes it possible to exploit the parallelism. With a dependency table, a simple but efficient cache strategy and its corresponding architecture are proposed. The proposed cache strategy avoids cache misses and reduces the size of the cache used to store all the intermediate data on chip. Experiments demonstrate that our design achieves a speedup of 2.85x to 10.27x compared with UMFPACK running on a general-purpose processor. With the proposed cache strategy, the cache size is reduced by 50.93% on average.
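The column dependency that the abstract derives from the elimination graph can be illustrated with a short symbolic sketch of the Gilbert-Peierls left-looking step. It assumes no pivoting (or a pivot order fixed in advance) and ignores numerical cancellation; the function names and the level-based scheduling are hypothetical illustrations, not the paper's dependency table or hardware. Column j depends on every earlier column k reached by a depth-first search from the nonzeros of A[:, j] through the graph of the already-computed L columns; columns on the same level have no mutual dependency and could, in principle, be processed in parallel.

```python
def column_dependencies(n, col_pattern):
    """Symbolic Gilbert-Peierls pass (no pivoting assumed).

    col_pattern[j] is the set of row indices i with A[i][j] != 0.
    Returns deps[j]: the earlier columns k < j that column j must wait for
    (the above-diagonal nonzeros of U[:, j]).
    """
    l_rows = [set() for _ in range(n)]  # strictly-lower nonzero rows of each L column
    deps = []
    for j in range(n):
        # Depth-first reach from the pattern of A[:, j] in the graph of L[:, :j]:
        # edge k -> i whenever L[i][k] != 0 and k < j.
        seen, stack = set(), list(col_pattern[j] | {j})
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            if i < j:  # only already-factored columns propagate fill
                stack.extend(l_rows[i])
        deps.append(sorted(k for k in seen if k < j))
        l_rows[j] = {i for i in seen if i > j}
    return deps


def parallel_levels(deps):
    """Level of each column; columns sharing a level are mutually independent."""
    level = []
    for d in deps:
        level.append(1 + max((level[k] for k in d), default=-1))
    return level


# Arrow matrix: diagonal plus a dense last row and last column.
n = 5
arrow = [{j, n - 1} for j in range(n - 1)] + [set(range(n))]
print(parallel_levels(column_dependencies(n, arrow)))  # prints [0, 0, 0, 0, 1]
```

In the arrow example, the first four columns are independent and only the last column must wait for all of them, which is the kind of structure a column-level scheduler can exploit.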
Band matrix multiplication is widely used in concurrent systems. However, the traditional Kung-Leiserson systolic array for band matrix multiplication cannot achieve high cell efficiency, because only about 1/3 of the cells are active in each step. Three alternative designs are therefore presented, based on the ideas of "matrix compression" and "super-pipelining". The new arrays rearrange and compress the data matrices, and either add processing elements (PEs) or readjust the operation sequence to increase cell efficiency. These changes achieve higher cell efficiency and faster operation at the cost of more intricate architectures. The results show that the best systolic array for band matrix multiplication keeps almost 100% of its processing elements busy in each step, nearly three times the efficiency of the traditional Kung-Leiserson array. These modifications also increase operation speed: at best, the multiplication completes in only 1/3 of the original processing time.
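The 1/3 figure for the classic array can be checked with a small space-time model. The sketch below assumes a common way of explaining the hexagonal Kung-Leiserson array: the multiply-accumulate for the term a[i][k]·b[k][j] runs on cell (i - k, j - k) at time step i + j + k. Under that mapping, a given cell only fires on time steps of one residue class modulo 3, so it is idle about two steps out of three. The helper names are hypothetical, and the efficiency measure (busy steps over each cell's own active window) is an assumption that ignores pipeline fill and drain.

```python
import random
from collections import defaultdict


def band_matrix(n, lower, upper, seed=0):
    """Random n x n matrix whose nonzeros lie within `lower` sub- and `upper` super-diagonals."""
    rng = random.Random(seed)
    return [[rng.randint(1, 9) if -lower <= j - i <= upper else 0 for j in range(n)]
            for i in range(n)]


def hex_array_schedule(a, b):
    """Compute C = A B while recording which cell fires at which time step."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    firings = defaultdict(list)  # cell (i - k, j - k) -> time steps i + j + k
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if a[i][k] and b[k][j]:
                    c[i][j] += a[i][k] * b[k][j]
                    firings[(i - k, j - k)].append(i + j + k)
    return c, firings


def cell_efficiency(firings):
    """Busy steps divided by the steps between each cell's first and last firing."""
    busy = sum(len(ts) for ts in firings.values())
    window = sum(max(ts) - min(ts) + 1 for ts in firings.values())
    return busy / window


a, b = band_matrix(30, 1, 1, seed=1), band_matrix(30, 1, 1, seed=2)
c, firings = hex_array_schedule(a, b)
print(f"{len(firings)} cells, efficiency = {cell_efficiency(firings):.3f}")
```

For two tridiagonal operands, the model uses nine cells, and each cell's efficiency is close to 1/3, matching the limitation the abstract describes.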
Hafnium tetrachloride is one of the most commonly used precursors for atomic layer deposition of hafnium-based gate dielectrics. Previously reported experimental results show that chlorine residue is almost unavoidably incorporated and piles up near the interface. We performed first-principles calculations to study the effect of chlorine residue in HfSiO₄, and the results explain the experimental observations. Chlorine at an interstitial site serves as a source of negative fixed charge, while chlorine at an oxygen substitutional site changes its charge state depending on the position of the electron chemical potential within the HfSiO₄ band gap, which may enlarge the hysteresis of the gate dielectrics. Moreover, chlorine incorporation also reduces the band gap of HfSiO₄ by inducing lattice strain. © 2008 American Institute of Physics.
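For context, the dependence of a defect's charge state on the electron chemical potential is conventionally analyzed with the standard first-principles formation-energy expression below. This is the textbook form, shown as background; the abstract does not state which correction terms or reference energies the authors used.

```latex
E^{f}[X^{q}] = E_{\mathrm{tot}}[X^{q}] - E_{\mathrm{tot}}[\mathrm{bulk}]
  - \sum_{i} n_{i}\,\mu_{i} + q\,(E_{\mathrm{VBM}} + E_{F}) + E_{\mathrm{corr}}

\varepsilon(q/q') = \frac{E^{f}[X^{q}]\big|_{E_F=0} - E^{f}[X^{q'}]\big|_{E_F=0}}{q' - q}
```

Here n_i is the number of atoms of species i added (n_i > 0) or removed (n_i < 0); for Cl on an O site in HfSiO₄, n_Cl = +1 and n_O = -1. Because E^f is linear in E_F with slope q, the lowest-energy charge state switches at ε(q/q'), and a transition level lying inside the band gap is what lets a substitutional Cl change its charge as E_F moves.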
ISBN: (print) 0780392108
A robust, low-complexity synchronization architecture for Digital Video Broadcasting-Terrestrial (DVB-T) systems over fading channels is presented in this paper. It consists of symbol timing recovery, sampling clock recovery, and carrier frequency recovery. All these problems are analyzed, and an optimal system solution is proposed. The results show that the proposed scheme works well even in the presence of a 10% carrier frequency offset (CFO) and a 330 ppm sampling frequency offset (SFO). Meanwhile, the circuit area is reduced by more than 40% compared with the conventional scheme.
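Symbol timing and fractional carrier-frequency-offset estimation in OFDM receivers such as DVB-T are often built on cyclic-prefix correlation. The sketch below shows that classic technique in its noise-free form; it is not the paper's architecture. Assumptions: the stand-in OFDM symbol is random complex Gaussian samples (a reasonable proxy for an IFFT output), the CFO `eps` is normalized to the subcarrier spacing with |eps| < 0.5, and N = 64 with a 16-sample prefix is chosen only for illustration (DVB-T itself uses 2K and 8K modes). The function names are hypothetical.

```python
import cmath
import math
import random


def make_received(delay, n_fft, n_cp, eps, seed=1):
    """One CP-OFDM symbol embedded in unrelated samples, rotated by a CFO of eps subcarriers."""
    rng = random.Random(seed)

    def samples(count):
        return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(count)]

    body = samples(n_fft)                  # proxy for the IFFT output of one symbol
    symbol = body[-n_cp:] + body           # prepend the cyclic prefix
    s = samples(delay) + symbol + samples(n_fft // 2)
    return [x * cmath.exp(2j * math.pi * eps * n / n_fft) for n, x in enumerate(s)]


def cp_sync(r, n_fft, n_cp):
    """Return (timing, eps_hat) maximizing |gamma| - energy/2 (noise-free ML metric)."""
    best = (-math.inf, 0, 0j)
    for theta in range(len(r) - n_fft - n_cp + 1):
        window = range(theta, theta + n_cp)
        gamma = sum(r[n] * r[n + n_fft].conjugate() for n in window)
        energy = sum(abs(r[n]) ** 2 + abs(r[n + n_fft]) ** 2 for n in window)
        metric = abs(gamma) - 0.5 * energy
        if metric > best[0]:
            best = (metric, theta, gamma)
    _, theta, gamma = best
    # On the prefix, r[n] * conj(r[n + N]) = |s[n]|^2 * exp(-j*2*pi*eps).
    return theta, -cmath.phase(gamma) / (2 * math.pi)


r = make_received(delay=37, n_fft=64, n_cp=16, eps=0.1)
print(cp_sync(r, n_fft=64, n_cp=16))
```

With noise, the `0.5 * energy` term is usually weighted by ρ = SNR/(SNR + 1) instead of 1. Integer CFO and sampling-clock offset need separate estimators (for example, pilot-based ones), which this sketch omits.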
ISBN: (print) 1424401615
This paper proposes a simplified AES algorithm resistant to zero-value DPA (differential power analysis) attacks, together with its VLSI implementation. The paper improves the additive-masking AES algorithm to decrease its complexity. In addition, methods such as module reuse and altered calculation order are used to reduce chip area while maintaining performance. In the HHNEC 0.25 μm CMOS process, the design occupies about 43K equivalent gates, and its system frequency is up to […]. The throughput of 128-bit data encryption and decryption is as high as 470 Mbit/s.
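To make the masking idea concrete, the sketch below builds the AES S-box from its definition (inversion in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, followed by the affine map with constant 0x63) and then a Boolean-masked lookup table satisfying T[x ⊕ m_in] = S(x) ⊕ m_out, so the unmasked byte never appears during SubBytes. This table-recomputation scheme is a swapped-in, textbook first-order countermeasure, not the paper's simplified additive-masking datapath. It is shown because zero-value attacks target multiplicative masks on the inversion (a zero input stays zero under any multiplicative mask), a weakness a purely XOR-masked table does not have at first order. Function names are hypothetical.

```python
import secrets


def gf_mul(a, b):
    """Multiply in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a):
    """Multiplicative inverse as a^254; AES maps 0 to 0."""
    if a == 0:
        return 0
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def rotl8(x, k):
    return ((x << k) | (x >> (8 - k))) & 0xFF


def aes_sbox():
    sbox = []
    for x in range(256):
        b = gf_inv(x)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox


def masked_sbox(sbox, m_in, m_out):
    """Recompute the table so that T[x ^ m_in] == S[x] ^ m_out for every byte x."""
    table = [0] * 256
    for x in range(256):
        table[x ^ m_in] = sbox[x] ^ m_out
    return table


SBOX = aes_sbox()
m_in, m_out = secrets.randbelow(256), secrets.randbelow(256)  # fresh masks per run
T = masked_sbox(SBOX, m_in, m_out)
x = 0x53
masked_out = T[x ^ m_in]           # only masked values are handled here
print(hex(masked_out ^ m_out))     # unmasking recovers S(0x53); prints 0xed
```

A hardware version would recompute the table (or use a masked combinational S-box) whenever the masks are refreshed; the trade-off between that cost and chip area is what area-reduction techniques such as module reuse address.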
ISBN: (print) 0780392108
With the rapid development of software-hardware co-verification for SoCs, FPGA verification has become increasingly critical in VLSI design, and it takes up a growing share of the chip design life cycle. The time spent on FPGA verification should therefore be reduced to shorten the time-to-market of IC products. This paper proposes several strategies that use both dynamic and static methods to carry out this verification. Techniques such as software static breakpoint monitoring and interrupt vector remapping accelerate software verification. A bus analyzer provides real-time bus monitoring and an intuitive evaluation of the system. Experiments show that these methods greatly improve the efficiency and speed of the FPGA co-verification process.
ISBN: (print) 1424401607
This paper presents a scalable design of an RSA crypto-coprocessor that supports variable key lengths up to 4096 bits. By analyzing and improving the modified multiple-word Montgomery multiplication algorithm, the pipeline architecture is optimized and the critical path is greatly shortened. As a result, the performance is much higher than that of previous work. The proposed design is therefore well suited to low-cost, high-performance RSA cryptosystems and can be easily implemented in VLSI technology.
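The core operation behind such a coprocessor is Montgomery multiplication. The sketch below is a word-serial (radix 2^w) version that scans the multiplier one w-bit word at a time, the loop structure that multiple-word and scalable designs pipeline in hardware. It is a generic textbook form, not the paper's modified algorithm, and it uses Python integers for the full-width inner products that hardware would split into words. It assumes an odd modulus n and operands below n, and returns a·b·2^(-w·s) mod n. Function names are hypothetical.

```python
def mont_mul(a, b, n, w, s, n0_inv):
    """Word-serial Montgomery product a*b*R^-1 mod n, with R = 2**(w*s).

    n0_inv = -n^-1 mod 2**w; requires n odd and 0 <= a, b < n < R.
    """
    mask = (1 << w) - 1
    t = 0
    for i in range(s):
        b_i = (b >> (w * i)) & mask       # i-th w-bit word of the multiplier
        t += a * b_i
        m = ((t & mask) * n0_inv) & mask  # makes the low word of t + m*n zero
        t = (t + m * n) >> w              # exact division by 2**w
    return t - n if t >= n else t         # invariant t < 2n, so one subtraction


def mont_exp(x, e, n, w=32):
    """Left-to-right square-and-multiply for x**e mod n in the Montgomery domain."""
    s = -(-n.bit_length() // w)           # number of w-bit words
    r = 1 << (w * s)
    n0_inv = (-pow(n, -1, 1 << w)) % (1 << w)
    x_bar = (x * r) % n                   # map into the Montgomery domain
    acc = r % n                           # Montgomery form of 1
    for bit in bin(e)[2:]:
        acc = mont_mul(acc, acc, n, w, s, n0_inv)
        if bit == "1":
            acc = mont_mul(acc, x_bar, n, w, s, n0_inv)
    return mont_mul(acc, 1, n, w, s, n0_inv)  # map back out


print(mont_exp(65, 17, 3233, w=8))  # textbook RSA example, n = 61 * 53; prints 2790
```

Because only the word width w and the word count s change with the key, the same loop handles any key length, which is the property a scalable coprocessor exploits. `pow(n, -1, m)` requires Python 3.8 or later.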
ISBN: (print) 0780392108
A direct-conversion receiver for WLAN 802.11b is presented in 0.18 μm CMOS technology. It contains a complete receiver chain with a low-noise amplifier, an I/Q mixer, a programmable gain amplifier, and a baseband filter. A 4.8 GHz divider is used to generate the 2.4 GHz quadrature clocks for the I/Q mixer. The reception path is DC-coupled, and a feedback low-pass filter is added to reduce the DC offset and 1/f noise. The noise figure of the receiver is 5.2 dB, and the IIP3 is -14.5 dBm in high-gain mode. With a 1.8 V supply voltage, the overall power consumption is about 100 mW. The chip area including pads is 2.6 mm × 2.5 mm.
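A common way to obtain quadrature clocks from a divider is a master-slave latch pair with inverted feedback, clocked at twice the LO frequency. The abstract does not say which divider topology was used, so the behavioral sketch below is an assumption for illustration, and its function name is hypothetical. Each list entry is one half-period of the 4.8 GHz input clock: the master latch is transparent while the clock is high and the slave while it is low. Both outputs toggle at half the input rate (2.4 GHz), and the slave lags the master by one input half-period, i.e. a quarter of the output period (90°).

```python
def divide_by_two_iq(half_cycles):
    """Behavioral master-slave divide-by-2; returns (I, Q) sampled once per input half-cycle."""
    master = slave = 0
    i_out, q_out = [], []
    for step in range(half_cycles):
        clk_high = step % 2 == 0
        if clk_high:
            master = 1 - slave   # master latch transparent: D = NOT slave
        else:
            slave = master       # slave latch transparent: copies master
        i_out.append(master)
        q_out.append(slave)
    return i_out, q_out


i_wave, q_wave = divide_by_two_iq(16)
print("I:", "".join(map(str, i_wave)))  # I: 1100110011001100
print("Q:", "".join(map(str, q_wave)))  # Q: 0110011001100110
```

In a real circuit, the 90° accuracy depends on the input clock's duty cycle and on latch matching; this ideal model captures only the logic.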
ISBN: (print) 9781467397209
Stereo matching is one of the most challenging topics in artificial intelligence. For HD real-time stereo matching, local algorithms are preferable to global ones because of their lower complexity and higher speed. This paper proposes an optimized local stereo algorithm, adaptive window disparity estimation with cross-check (AWDE-CC). A region-growing algorithm that fills bad pixels is also proposed, which efficiently improves performance in low-texture regions. The average error rate of the generated disparity maps is approximately 8%.
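The local-matching pipeline described above (window-based disparity estimation, a left-right cross-check, then filling of rejected pixels) can be sketched on a tiny synthetic pair. Several simplifications are assumptions, not AWDE-CC itself: a fixed square window with a sum-of-absolute-differences (SAD) cost instead of an adaptive window, an exact-match cross-check, and a nearest-valid-neighbor fill along each scanline standing in for the paper's region-growing step. Images are lists of rows of integer intensities, and a left-image pixel at column x is assumed to match right-image column x - d. All function names are hypothetical.

```python
import random


def sad(left, right, y, xl, xr, r):
    """Sum of absolute differences over a (2r+1)x(2r+1) window, clipped to both images."""
    h, w = len(left), len(left[0])
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, a, b = y + dy, xl + dx, xr + dx
            if 0 <= yy < h and 0 <= a < w and 0 <= b < w:
                total += abs(left[yy][a] - right[yy][b])
    return total


def disparity_map(left, right, max_d, r, reference="left"):
    """Winner-take-all disparity for every pixel of the reference image."""
    h, w = len(left), len(left[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            costs = []
            for d in range(max_d + 1):
                xl, xr = (x, x - d) if reference == "left" else (x + d, x)
                if 0 <= xl < w and 0 <= xr < w:
                    costs.append((sad(left, right, y, xl, xr, r), d))
            out[y][x] = min(costs)[1]  # d = 0 is always a valid candidate
    return out


def cross_check(d_left, d_right):
    """Keep a left disparity only if the matched right pixel maps back to it."""
    return [[d if d_right[y][x - d] == d else None for x, d in enumerate(row)]
            for y, row in enumerate(d_left)]


def fill_invalid(d_map):
    """Replace None with the nearest valid disparity on the same scanline."""
    filled = []
    for row in d_map:
        valid = [x for x, v in enumerate(row) if v is not None]
        if not valid:
            filled.append([0] * len(row))
            continue
        filled.append([v if v is not None else row[min(valid, key=lambda k: abs(k - x))]
                       for x, v in enumerate(row)])
    return filled


def synthetic_pair(h, w, true_d, seed=0):
    """Random-texture pair in which every left pixel at x >= true_d has disparity true_d."""
    rng = random.Random(seed)
    right = [[rng.randrange(256) for _ in range(w)] for _ in range(h)]
    left = [[right[y][x - true_d] if x >= true_d else rng.randrange(256) for x in range(w)]
            for y in range(h)]
    return left, right


left, right = synthetic_pair(8, 16, true_d=3)
d_l = disparity_map(left, right, max_d=6, r=1, reference="left")
d_r = disparity_map(left, right, max_d=6, r=1, reference="right")
print(fill_invalid(cross_check(d_l, d_r))[4])
```

Pixels near the left border have no true match in the right image (occlusion), which is exactly where the cross-check tends to reject estimates and the fill step takes over.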
A standing wave oscillator (SWO) is an ideal clock source that can produce a high-frequency clock signal with low skew and high reliability. However, it is difficult to tune an SWO over a wide frequency range. We introduce a frequency-tunable SWO that uses an inversion-mode metal-oxide-semiconductor (IMOS) field-effect transistor as a varactor, and give simulation results for its frequency tuning range and power dissipation. Based on the frequency-tunable SWO, a new phase-locked loop (PLL) architecture is presented. This PLL can be used not only as a clock source but also as a clock distribution network that provides high-quality clock signals. Designed in GlobalFoundries 65 nm 1P9M complementary metal-oxide-semiconductor (CMOS) technology, the PLL achieves a frequency tuning range of approximately 50% and can be used directly in a high-performance multi-core microprocessor.
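The reason a varactor tunes an SWO can be seen from a first-order line model. The sketch below assumes an ideal, lossless quarter-wave line whose per-unit-length capacitance C' is increased uniformly by the varactor's distributed capacitance C_v, so f = 1/(4·l·sqrt(L'(C' + C_v))). The line length, L', C', and varactor range in the example are hypothetical round numbers, not the paper's design values. Tuning range is taken as (f_max - f_min) divided by the mid-band frequency, which may differ from the authors' definition.

```python
import math


def swo_frequency(length_m, l_per_m, c_per_m, c_var_per_m):
    """Fundamental of an ideal lossless quarter-wave resonator with extra shunt capacitance."""
    phase_velocity = 1.0 / math.sqrt(l_per_m * (c_per_m + c_var_per_m))
    return phase_velocity / (4.0 * length_m)


def tuning_range(length_m, l_per_m, c_per_m, c_var_min, c_var_max):
    """Return (f_min, f_max, fractional range relative to the mid-band frequency)."""
    f_max = swo_frequency(length_m, l_per_m, c_per_m, c_var_min)  # least loading
    f_min = swo_frequency(length_m, l_per_m, c_per_m, c_var_max)  # most loading
    return f_min, f_max, (f_max - f_min) / ((f_max + f_min) / 2)


# Hypothetical values: 8 mm line, 400 nH/m, 100 pF/m, varactor adds 20 to 150 pF/m.
f_lo, f_hi, frac = tuning_range(8e-3, 400e-9, 100e-12, 20e-12, 150e-12)
print(f"{f_lo / 1e9:.2f}-{f_hi / 1e9:.2f} GHz, range {frac:.0%}")
```

Because frequency scales as 1/sqrt(C' + C_v), the achievable range grows with the varactor's capacitance ratio relative to the line's fixed capacitance, which is why a varactor with a large tuning ratio matters for a wide-range SWO.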