Digital signal processing (DSP) systems are becoming popular with the emergence of artificial intelligence and machine learning based applications. Residue number system is one of most sought representation for implem...
详细信息
Digital signal processing (DSP) systems are becoming popular with the emergence of artificial intelligence and machine learning based applications. Residue number system is one of most sought representation for implementing the high speed DSP systems. This paper presents an efficient implementation of memory less distributed arithmetic (MLDA) architecture in finite impulse response filter with residual number system. The input data and filter coefficients of MLDA are in residue number form and the output data from MLDA is converted into binary form using Chinese remainder theorem. In addition, compressor adders are used to reduce the area. For real time validation, the proposed design has been simulated and synthesized in application specific integrated circuit platform using synopsis design compiler with CMOS 90 nm technology. The results show that the proposed design has very high computation speed with total delay of only 20 ns and occupies 20% less area in comparison with the existing designs.
In this paper, we propose a low complex architectural design for hearing aid applications. In this, we recast the hearing aid using distributed arithmetic (DA), which enables the implementation of hearing aid without ...
详细信息
In this paper, we propose a low complex architectural design for hearing aid applications. In this, we recast the hearing aid using distributed arithmetic (DA), which enables the implementation of hearing aid without multipliers. The design is based on the distributed arithmetic based formulation of it. It is further shown that high order filters, which are required to implement high-speed hearing aid can be realized using only look-up-tables and shift-accumulate operations. A novel approach was proposed to replace the decimation filter of a hearing aid using multiplier less architecture with a single DA unit. By proper initialization, it is shown that low complexity hearing aid architecture can be obtained. The proposed distributed arithmetic architecture is implemented in ASIC SAED 90 nm technology. The application of hearing aid is implemented in Matlab Simulink and Xilinx system generator tool. The obtained results show 20% less area delay product and 40% less power delay product when compared with the existing architecture.
Real-time data processing systems utilize Digital Signal Processing (DSP) functions as the base modules. Most of the DSP functions involve the implementation of Fast Fourier Transform (FFT) to convert the signals from...
详细信息
Real-time data processing systems utilize Digital Signal Processing (DSP) functions as the base modules. Most of the DSP functions involve the implementation of Fast Fourier Transform (FFT) to convert the signals from one domain to another domain. The major bottleneck of Decimation in frequency-Fast Fourier Transform (DIF-FFT) implementation lies in using a number of Multipliers. distributed arithmetic (DA) is considered as one of the efficient techniques to implement DIF-FFT. In this approach, the multipliers are not used. The proposed technique exploits the very advantage of the look-up table by storing the Twiddle factors, thereby avoiding the multipliers required in the butterfly structure. DIF-FFT using distributed arithmetic (DIF-FFT DA) models, with different adders such as Ripple carry adder (RCA), Carry-lookahead adder (CLA), and Sklansky prefix graph adder, are proposed in this paper. The three proposed models are synthesized using Cadence 6.1 EDA tools with a 45nm CMOS technology. Compared to the traditional method, it is observed that the area is improved by 53.11%, 53.35%, and 50.15%, power is improved by 42.31%, 42.52%, and 40.39%, and delay is improved by 45.26%, 45.42%, 41.80%, respectively.
MAC is an essential core which is used in every Digital signal processor. The primary focal point of this article is to introduce a high performance distributed based (DA) MAC and offset binary coding distributed Arit...
详细信息
ISBN:
(纸本)9781728172231
MAC is an essential core which is used in every Digital signal processor. The primary focal point of this article is to introduce a high performance distributed based (DA) MAC and offset binary coding distributed arithmetic (DA) based MAC for real time Signal Processing Applications. Addition and multiplication are the two hardware resources widely used to design any arithmetic blocks in many fields like video processing, audio processing, speech processing and medical image processing applications. In this article, a literature survey is done on different MAC [2] units with different multipliers to generate partial products and to perform accumulation. Developed a DA based and offset binary coding DA based MAC cores which offers greater speed compared with different conventional MAC's using various multipliers. The coding for DA and offset based architectures are done using Verilog and simulation, synthesis are performed in Xilinx 14.7 Integrated Simulation Environment version. It achieves best area and less delay result when compared with previous approximate adder designs. The results of DA based MAC cores gives much more efficient in delay whereas offset binary coding-based DA offers both speed and area optimization.
In this paper, an efficient VLSI architecture of distributed arithmetic (DA) based block least mean square (BLMS) adaptive finite impulse response (ADFIR) filter implementation with parallel processing is proposed. In...
详细信息
In this paper, an efficient VLSI architecture of distributed arithmetic (DA) based block least mean square (BLMS) adaptive finite impulse response (ADFIR) filter implementation with parallel processing is proposed. In DA scheme, the filter partial products are precomputed and saved in lookup table (LUT) and then by using shift and accumulation operations filtering can be done. To improve the efficiency, a high level of parallelism is incorporated in the design of variable coefficient ADFIR filter. The parallel LUT followed by shift-accumulate operations replaces multiply and accumulate (MAC) operations. By using BLMS algorithm in ADFIR filter with block length P gives P times fast throughput. Since memory reuse concept is used, the design requires less number of registers to calculate output vector and coefficient increment vector. The proposed design is implemented on FPGA. The implementation results indicate that the proposed design is a low power and high speed architecture. The proposed structure provides 47.4% less power and 22.7% less delay when compared to existing designs. (C) 2019 Elsevier Ltd. All rights reserved.
distributed arithmetic (DA) brings area and power benefits to digital designs relevant to the Internet-of-Things. Therefore, a new error resilient technique for DA computation is proposed to improve robustness against...
详细信息
distributed arithmetic (DA) brings area and power benefits to digital designs relevant to the Internet-of-Things. Therefore, a new error resilient technique for DA computation is proposed to improve robustness against process, voltage, and temperature variations. The proposed approach mitigates the effect of timing violations by first providing a guardband for significant (most significant bit) computations. This guardband is initially achieved by modifying the order of DA serial operations and borrowing time from the least significant bit (ISB) group. Therefore, LSB computation can correspond to the critical path, and timing error can be tolerated at the cost of acceptable accuracy loss. Moreover, the shifted-phase clock signals are applied on the end-point registers, thereby increasing the global guardband without any effect on system sampling rate. Our approach is demonstrated on a 16-tap FIR filter using the 65 nm CMOS process. The simulation results demonstrate that this design can maintain error-free operation without worst case timing margin, and achieve up to 42% power savings by voltage scaling when the worst case margin is considered. This is at the cost of a 6.3% delay and 7.3% overhead.
In this paper, a fixed-point finite impulse response adaptive filter is proposed using approximate distributed arithmetic (DA) circuits. In this design, the radix-8 Booth algorithm is used to reduce the number of part...
详细信息
In this paper, a fixed-point finite impulse response adaptive filter is proposed using approximate distributed arithmetic (DA) circuits. In this design, the radix-8 Booth algorithm is used to reduce the number of partial products in the DA architecture, although no multiplication is explicitly performed. In addition, the partial products are approximately generated by truncating the input data with an error compensation. To further reduce hardware costs, an approximate Wallace tree is considered for the accumulation of partial products. As a result, the delay, area, and power consumption of the proposed design are significantly reduced. The application of system identification using a 48-tap bandpass filter and a 103-tap high-pass filter shows that the approximate design achieves a similar accuracy as its accurate counterpart. Compared with the state-of-the-art adaptive filter using bit-level pruning in the adder tree (referred to as the delayed least mean square (DLMS) design), it has a lower steady-state mean squared error and a smaller normalized misalignment. Synthesis results show that the proposed design attains on average a 55% reduction in energy per operation (EPO) and a 3.2x throughput per area compared with an accurate design. Moreover, the proposed design achieves 45%-61% lower EPO compared with the DLMS design. A saccadic system using the proposed approximate adaptive filter based cerebellar model achieves a similar retinal slip as using an accurate filter. These results are promising for the large-scale integration of approximate circuits into high-performance and energy-efficient systems for error-resilient applications.
In this paper an efficient implementation of decision feed back equalizer (DFE) is carried out using novel memory less distributed arithmetic (NMLDA) filter. In wireless transmission systems, DFEs are used to mitigate...
详细信息
In this paper an efficient implementation of decision feed back equalizer (DFE) is carried out using novel memory less distributed arithmetic (NMLDA) filter. In wireless transmission systems, DFEs are used to mitigate the inter-symbol interference (ISI). The ISI is occurred due to multi-path propagation of the transmitted signal. High data rate systems demand higher order filters in DFE architectures which increase complexity in hardware design. In our proposed NMLDA design, we have used multiplexers and enhanced compressor adders in place of memory unit and conventional adders. The proposed design occupies lower area and gives higher throughput, when compared to MAC based filter and all other memory based DA filter architectures. By using proposed NMLDA based DFE, the ISI errors in transmission signal, will be minimized and the performance of the transmission system will be enhanced. We have synthesized the NMLDA of 32-tap, 16-tap, 8-tap and 4-tap filters and implemented them on FPGA device. The proposed design has nearly 70% less number of logical elements than OBC DA and 50% less than MDA and offers better throughput than the existing designs when implemented on Altera Cyclone III EP3C55F484C6.
This paper proposes a new architecture of 2-D block FIR filter using distributed arithmetic (DA) algorithm, which is known for the efficient design of multiply and accumulate block. Hardware-based architecture is prop...
详细信息
This paper proposes a new architecture of 2-D block FIR filter using distributed arithmetic (DA) algorithm, which is known for the efficient design of multiply and accumulate block. Hardware-based architecture is proposed for DA lookup table (DA-LUT) that makes the architecture of 2-D FIR filter reconfigurable. Further, due to block processing, sharing takes place among DA-LUTs at various stages. Thus, a common DA-LUT may be designed for block inputs which reduce the hardware complexity for DA-LUT. Furthermore, memory overlapping is used to reduce the systolic architectures in proposed design over existing designs. For higher-order 2-D FIR filter, the complexity of DA-LUT is reduced by dividing the internal block into parallel and small blocks. With the help of ASIC synthesis results, a comparative analysis of proposed design with the earlier reported designs is presented, and it is shown that the proposed design leads to significant improvements in various performance parameters.
Dedicated hardware for "Discrete Wavelet Transform" (DWT) is at high demand for real-time imaging operations in any standalone electronic devices, as DWT is being extensively utilized for most of the transfo...
详细信息
Dedicated hardware for "Discrete Wavelet Transform" (DWT) is at high demand for real-time imaging operations in any standalone electronic devices, as DWT is being extensively utilized for most of the transform-domain imagery applications. Various DWT algorithms exist in the literature facilitating its software implementations which are generally unsuitable for real-time imaging in any stand-alone devices due to their power intensiveness and huge computation time. In this paper, a convolutional DWT-based pipelined and tunable VLSI architecture of Daubechies 9/7 and 5/3 DWT filter is presented. Our proposed architecture, which mingles the advantages of convolutional and lifting DWT while discarding their notable disadvantages, is made area and memory efficient by exploiting "distributed arithmetic' (DA) in our own ingenious way. Almost 90% reduction in the memory size than other notable architectures is reported. In our proposed architecture, both the 9/7 and 5/3 DWT filters can be realized with a selection input, "mode". With the introduction of DA, pipelining and parallelism are easily incorporated into our proposed 1D/2D DWT architectures. The area requirement and critical path delay are reduced to almost 38.3% and 50% than that of the latest remarkable designs. The performance of the proposed VLSI architecture also excels in real-time applications.
暂无评论