Band matrix multiplication is widely used in concurrent systems. However, the traditional Kung-Leiserson systolic array for band matrix multiplication cannot achieve high cell efficiency, because only about 1/3 of the cells operate in each step. Three alternative designs are therefore presented, based on the ideas of "matrix compression" and "super pipelining". These new arrays rearrange and compress the data matrix, and either add processing elements (PEs) or reorder the operation sequence to increase cell efficiency. The changes yield higher cell efficiency and faster operation at the cost of more intricate architectures. The results show that the best systolic array for band matrix multiplication can use almost 100% of its processing elements in each step, nearly triple the utilization of the traditional Kung-Leiserson array. These modifications also increase operation speed; in the best case, the multiplication completes in only 1/3 of the processing time.
The adaptive support-weight algorithm can generate high-quality disparity maps for stereo matching. However, due to its complexity, it requires large internal memory and bandwidth to meet the real-time constraint. In this ...
Presented herein is a fast but accurate quantum C-V simulation, capable of extracting effective oxide thickness and other parameters based strictly on C-V data alone. The apparent C-V degradation in leaky dielectric M...
In this paper, a new Hybrid Field Programmable Gate Array (FPGA) architecture is proposed. The logic tile, which consists of a logic cluster and related Connection Boxes (CBs), can be configured as either Programmable...
A novel programmable security processor for cryptography algorithms is presented in this paper. The 16-bit length RISC-like instruction set and 3-stage pipeline provide low code density, low hardware cost and low powe...
In this paper, a novel-structured electrochemical sensor array with five disk working electrodes, one arc counter electrode and one reference electrode is introduced. The array is fabricated by micro-electro-mechanica...
This paper describes a 1.8-V, 8-bit, 125 Msample/s analog-to-digital converter (ADC) with a power-efficient architecture designed in a 0.18-μm CMOS technology. Through sharing an amplifier between two successive pipe...
This paper proposes a VLSI architecture for long-echo-resistant channel estimation, based on the algorithm proposed in [1]. FFT module reuse and clock gating are used to reduce the hardware complexit...
A hardware/software co-processing system for speech recognition applications is proposed in this paper. The system consists of a soft-core microprocessor and a dedicated hardware accelerator implemented on an FPGA. Th...
This paper presents a high-performance design for Context-Based Adaptive Variable-Length Coding (CAVLC) used in the H.264/AVC standard. To reduce the number of cycles needed to process one macroblock (MB), a two-stage residual enco...