ISBN: 9781479999880 (print)
Runtime-reconfigurable, mixed-radix FFT/IFFT engines are essential for modern wireless communication systems. To comply with varying standards requirements, these engines are customized for each modem. This work uses the Chisel hardware construction language to create a generator of runtime-reconfigurable 2^n·3^m·5^k FFT engines that targets software-defined radios (SDR) for modern communications while remaining flexible enough to support a wide range of applications. The generator uses a conflict-free, in-place, multi-bank SRAM design, and exploits the duality of decimation-in-frequency (DIF) and decimation-in-time (DIT) FFTs to support continuous data flow with only 2N memory blocks. DFT decomposition using the prime-factor algorithm (PFA) followed by the Cooley-Tukey algorithm (CTA) reduces twiddle ROM sizes. A programmable Winograd Fourier transform algorithm (WFTA) butterfly supporting radix-2/3/4/5/7 operations reuses radix-7 hardware to support reconfigurability with minimal area penalty. The generated FFTs use 50% less memory than iterative FFTs from Spiral. The twiddle ROM size of the generated LTE/WiFi FFT engine is 16% smaller than that of a 2048-pt Spiral design.
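The twiddle savings attributed to the PFA can be illustrated in software. The following is a minimal sketch, not the generator's hardware implementation: it assumes Good's index mapping for two coprime factors, and the names `naive_dft` and `pfa_dft` are hypothetical. No twiddle multiplications appear between the two stages, which is why a PFA stage needs no twiddle ROM entries.

```python
import cmath
from math import gcd


def naive_dft(x):
    """Direct O(n^2) DFT, used here as the small-transform building block."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def pfa_dft(x, n1, n2):
    """Two-factor prime-factor DFT (assumes gcd(n1, n2) == 1)."""
    n = n1 * n2
    assert len(x) == n and gcd(n1, n2) == 1

    # Input map: n = (n2*a + n1*b) mod N arranges x as an n1-by-n2 grid.
    grid = [[x[(n2 * a + n1 * b) % n] for b in range(n2)] for a in range(n1)]

    # n2-point DFT along each row; note there is no twiddle step afterwards.
    grid = [naive_dft(row) for row in grid]

    # n1-point DFT along each column; cols[k2][k1] holds the 2-D result.
    cols = [naive_dft([grid[a][k2] for a in range(n1)]) for k2 in range(n2)]

    # Output map (Chinese remainder theorem): k = k1 mod n1 and k = k2 mod n2.
    inv_n2 = pow(n2, -1, n1)  # modular inverse, Python 3.8+
    inv_n1 = pow(n1, -1, n2)
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            k = (k1 * n2 * inv_n2 + k2 * n1 * inv_n1) % n
            out[k] = cols[k2][k1]
    return out
```

For example, `pfa_dft(list(range(15)), 3, 5)` should agree with `naive_dft(list(range(15)))` up to floating-point rounding.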
ISBN: 9781479953424 (print)
Abstract: This paper presents an efficient memory-based fast Fourier transform processor that supports 35 different working sizes for LTE systems. A factorization method named high-radix-small-butterfly, combined with a conflict-free address scheme for 2^p·3^q·5^r-point memory-based FFT processors, is proposed. The processor provides not only conflict-free concurrent data access from different memory banks but also a continuous-flow working mode. Moreover, the prime factor algorithm is exploited to reduce the number of multiplications and the twiddle factor storage. In addition, a unified Winograd Fourier transform algorithm butterfly core was designed for the small 2-, 3-, 4-, and 5-point DFTs. The FFT processor was implemented in a SMIC 55 nm CMOS process with a core area of 1.063 mm^2. The chip consumes 40.8 mW at a 122.88 MHz operating frequency with a 1.08 V supply voltage.
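As a small illustration of the 2^p·3^q·5^r size family mentioned above, the sketch below extracts the exponents of a candidate transform length. The function name `smooth_exponents` is hypothetical, and the sketch does not reproduce the paper's list of 35 sizes or its factorization method.

```python
def smooth_exponents(n):
    """Return (p, q, r) with n == 2**p * 3**q * 5**r, or None if n has other prime factors."""
    if n < 1:
        return None
    exponents = []
    for prime in (2, 3, 5):
        count = 0
        while n % prime == 0:
            n //= prime
            count += 1
        exponents.append(count)
    # Anything left over is a prime factor other than 2, 3, or 5.
    return tuple(exponents) if n == 1 else None
```

For instance, 1200 = 2^4·3·5^2 yields `(4, 1, 2)`, while 14 yields `None` because of its factor of 7.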
ISBN: 9781538631782 (print)
Dedicated hardware accelerators enable energy-efficient implementations of radio and imaging basebands. Multi-standard, multi-mode radio basebands require an on-the-fly reconfigurable fast Fourier transform (FFT) accelerator that implements many different FFT sizes. An instance of a runtime-reconfigurable 2^n·3^m·5^k FFT accelerator was generated by a custom hardware generator to meet the requirements of common wireless standards (Wi-Fi, LTE). The accelerator is integrated with a RISC-V processor, and the measured 16 nm FinFET chip runs at up to 940 MHz and consumes 0.46 to 22.6 mW of power when running FFT benchmarks for Wi-Fi and LTE symbol lengths.
ISBN: 9781424439874 (print)
A fast Fourier transform algorithm for computing an N=N_1×N_2-point DFT, where both factors N_1 and N_2 are smaller positive integers, is developed and called the double factors algorithm (DFA). The DFA subdivides a DFT of length N=N_1×N_2 into smaller transforms of length N_1 and N_2 in three steps: (1) compute N_1 DFTs of length N_2, (2) multiply the DFT outputs by twiddle factors, (3) compute N_2 DFTs of length N_1. The structure of the DFA is similar to that of the simplest PFA and WFTA, but N_1 and N_2 are not necessarily relatively prime. When N=2^M or 4^M, the total number of computations in the DFA is lower than in the radix-2 and radix-4 FFT algorithms but slightly higher than in the split-radix FFT algorithm. For other values of N, the total number of computations in the DFA is lower than in the PFA and WFTA.
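The three steps can be sketched in software as follows. This is a minimal illustration assuming the standard two-factor (Cooley-Tukey style) index maps n = N_1·n_2 + n_1 and k = k_2 + N_2·k_1, not the paper's optimized DFA; the names `small_dft` and `two_factor_dft` are hypothetical, and the small DFTs are computed directly rather than with the paper's butterflies.

```python
import cmath


def small_dft(x):
    """Direct O(n^2) DFT standing in for the short N_1- and N_2-point transforms."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def two_factor_dft(x, n1, n2):
    """DFT of length n1*n2; n1 and n2 need not be relatively prime."""
    n = n1 * n2
    assert len(x) == n

    # Step 1: n1 DFTs of length n2 over the decimated sequences x[a], x[a+n1], ...
    y = [small_dft(x[a::n1]) for a in range(n1)]

    # Step 2: multiply by the twiddle factors W_N^(a*k2).
    for a in range(n1):
        for k2 in range(n2):
            y[a][k2] *= cmath.exp(-2j * cmath.pi * a * k2 / n)

    # Step 3: n2 DFTs of length n1; the output index is k = k2 + n2*k1.
    out = [0j] * n
    for k2 in range(n2):
        col = small_dft([y[a][k2] for a in range(n1)])
        for k1 in range(n1):
            out[k2 + n2 * k1] = col[k1]
    return out
```

Calling `two_factor_dft(x, 4, 6)` on a 24-point input exercises the non-coprime case that the PFA cannot handle; the result should match a direct 24-point DFT up to rounding.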