Search Results - Inner Mongolia University Library

Distributed arithmetic-FIR filter design using Approximate Karatsuba Multiplier and VLCSA

EXPERT SYSTEMS WITH APPLICATIONS, 2024, Vol. 249, Part B

Authors: Krishnan, Sakkarai Samy Hari Vidhya, Krishnan SIMATS Saveetha Sch Engn Dept Elect & Commun Engn Chennai India

In this manuscript, a high-throughput, low-latency DA-FIR filter design integrated with an Approximate Karatsuba Multiplier (AKM) and a Variable Latency Carry Skip Adder (VLCSA) is proposed for noise removal in SDR. In this design, the AKM and VLCSA are used to decrease the count of partial products in the DA framework, although no multiplication is explicitly performed. Thus, a significant reduction is achieved in the accumulation circuits. The main implementation problem of the DA system is that the size of the lookup table (LUT) increases exponentially with the length of the inner product. To further reduce the memory complexity, approximate DA architectures based on a truncation approach are needed. Partial products in DA are created by truncating the least significant bits (LSBs) of the inputs. The proposed hybrid technique reduces the count of LUTs by truncating the LSBs of the input operands. To further reduce the latency, this manuscript uses one of the fastest multipliers, the Approximate Karatsuba Multiplier, for accumulation of the partial products. The proposed design is implemented in Verilog and simulated with Xilinx ISE 14.5. The performance of the proposed DA-FIR-Hyb AKE-VLCSA filter is evaluated in terms of delay and static power and compared with existing filters.

Keywords: Approximate Karatsuba Multiplier; FIR Filter; distributed arithmetic; Lookup table; Least significant bits; Variable Latency Carry Skip Adder


Performance Analysis and Optimization of Distributed Arithmetic-Based Convolutional Algorithms for FIR Filters on FPGA

34th Irish Signals and Systems Conference (ISSC)

Authors: Chen, Cheng; Romashchenko, Vladyslav; Brutscheck, Michael; Chmielewski, Ingo (Anhalt Univ Appl Sci, Hsch Anhalt, D-06366 Kothen, Germany)

ISBN: (print) 9798350340570

Distributed arithmetic (DA) implementation for finite impulse response (FIR) filters on field-programmable gate arrays (FPGAs) is highly desirable in digital signal processing due to its fast computational speed and low power consumption. However, traditional LUT-based DA implementation on FPGAs is challenging because of its high memory space requirements. To overcome this challenge, LUT-partition and MUX-incorporation techniques have been proposed to reduce memory space, but they also increase the FPGA resource utilization. Furthermore, the inherent serial nature of DA computing can limit data throughput. Parallel processing of multiple bits can improve computational performance but at the cost of chip area. Therefore, it is beneficial to combine optimization methods to achieve the desired performance. This paper proposes a comprehensive approach for optimizing memory space, computational performance, and chip area by analyzing different LUT partitions and incorporating MUX configurations. The proposed method is evaluated on a Xilinx Zynq 7010 FPGA, demonstrating its effectiveness.

Keywords: FIR-Filter; FPGA; distributed arithmetic; VHDL; Digital Signal Processing; Convolutional Algorithms; Sum of Products; Fixed-Point; Parallel Architecture
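
To illustrate the LUT-based DA scheme and the LUT-partition idea mentioned in this abstract, here is a minimal Python sketch. It is not the paper's implementation: the function names, the unsigned-input assumption, and the `group` parameter are all hypothetical and chosen only for illustration. It computes an inner product bit plane by bit plane from precomputed coefficient sums and reports how many LUT words are needed.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the coefficients (2**len(coeffs) words)."""
    n = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits, group=None):
    """Bit-serial DA inner product for unsigned samples that are `bits` wide.

    `group` (hypothetical parameter) splits the coefficients into partitions,
    each served by its own smaller LUT. Returns (result, total LUT words).
    """
    n = len(coeffs)
    group = group or n
    parts = [(build_lut(coeffs[i:i + group]), samples[i:i + group])
             for i in range(0, n, group)]
    acc = 0
    for b in range(bits):  # one bit plane per iteration, LSB first
        for lut, xs in parts:
            # Bit b of each sample in the partition forms the LUT address.
            addr = sum(((x >> b) & 1) << j for j, x in enumerate(xs))
            acc += lut[addr] << b  # shift-accumulate, no multiplier used
    lut_words = sum(len(lut) for lut, _ in parts)
    return acc, lut_words


coeffs, samples = [3, -1, 4, 2], [5, 7, 1, 6]
full = da_inner_product(coeffs, samples, bits=3)
split = da_inner_product(coeffs, samples, bits=3, group=2)
```

In this sketch a single LUT for four coefficients holds 16 words, while two partitions of two coefficients hold 8 words in total, at the cost of one extra addition per bit plane. That is the memory versus logic trade-off the abstract refers to.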


An Energy-efficient and High-precision Approximate MAC with Distributed Arithmetic Circuits

32nd Great Lakes Symposium on VLSI (GLSVLSI)

Authors: Cui, Ziying; Chen, Ke; Wu, Bi; Yan, Chenggang; Liu, Weiqiang (Nanjing Univ Aeronaut & Astronaut, Nanjing, Jiangsu, Peoples R China)

ISBN: (print) 9781450393225

In this paper, an approximate distributed arithmetic (DA) based parallel MAC is proposed. First, by adopting three kinds of approximation methods, the novel structure significantly reduces hardware complexity. Then, the result is compensated according to a probability analysis to enhance the precision. The hardware and error metric evaluations demonstrate that the proposed MAC achieves a 25% power-delay product reduction while maintaining better precision. Finally, a Gaussian blur application is employed to verify the proposed DA-based MAC, with a 6 dB average PSNR improvement compared with recent state-of-the-art work.

Keywords: Approximate Computing; distributed arithmetic; MAC; Low-power design


Efficient Architecture for the Realization of 2-D Adaptive FIR Filter Using Distributed Arithmetic

CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2021, Vol. 40, No. 3, pp. 1458-1478

Authors: Shrivastava, Prabhat Chandra; Kumar, Prashant; Tiwari, Manish; Dhawan, Amit (MNNIT Allahabad, Dept Elect & Commun Engn, Prayagraj 211004, UP, India)

This paper presents an efficient architecture for a two-dimensional (2-D) adaptive FIR filter using the distributed arithmetic (DA) algorithm. DA-based filter architectures essentially require look-up tables (LUTs). In the proposed filter architecture, the RAM- or ROM-based LUT is replaced by a structure based on adders and logic gates that generates the LUT value corresponding to the input. Therefore, the MAC unit requires fewer logic gates and adders in the DA-based realization. In addition, the memory sharing concept in the architecture reduces the delay elements. Moreover, the complexity of the LUT hardware of higher-order filters is reduced by dividing the internal MAC block in parallel for the DA decomposition, which offers a higher degree of modularity and parallelism in the proposed architecture. Further, the 2-D delayed LMS algorithm is used for updating the filter coefficient weights. Furthermore, two-stage pipelining is used to reduce the critical path of the architecture, and it also makes the critical path delay independent of the order of the filter. ASIC synthesis results reveal the advantages of the proposed structure, which reduces the area, power, ADP and EDP by 54%, 48.19%, 55% and 49%, respectively, compared with the existing architecture for filter size 8 x 8.

Keywords: 2-D adaptive filter; distributed arithmetic; Hardware-based LUT; Multiplier-less filters; Raster scanning; 2-D delayed LMS algorithm


An In-Situ Sliding Window Approximate Inner-Product Scheme Based on Parallel Distributed Arithmetic for Ultra-Low Power Fault-Tolerant Applications

IEEE International Midwest Symposium on Circuits and Systems (MWSCAS)

Authors: Rizk, Dominick; Rizk, Rodrigue; Rizk, Frederic; Kumar, Ashok (Univ Louisiana Lafayette, Ctr Adv Comp Studies, Lafayette, LA 70504, USA)

ISBN: (print) 9781665424615

Approximate computing (AC) provides an efficient solution for reducing the power, area, and complexity of digital systems. When backed with distributed arithmetic (DA), AC makes it possible to implement ultra-efficient inner-product units in terms of area, power, and delay. Such units can be used in any inherently resilient application. This paper presents a novel approximate inner-product scheme based on parallel DA for low-power fault-tolerant applications, backed with a novel in-situ sliding window algorithm. Our model eliminates the need for an explicit error correction scheme, which further reduces the overhead while improving the accuracy. The experimental results show that our model achieves state-of-the-art performance in terms of power-delay product (PDP) and area-power product (APP), with reductions of 39.26% and 48.83%, respectively.

Keywords: Approximate computing; distributed arithmetic; DSP; inner-product; LUT; memory; VLSI; SOP


An Efficient Modified Distributed Arithmetic Architecture Suitable for FIR Filter

6th International Conference on Wireless Communications, Signal Processing and Networking (IEEE WiSPNET)

Authors: Narendiran, S.; Jayakumar, E. P. (Natl Inst Technol, Dept Elect & Commun Engn, Calicut, Kerala, India)

ISBN: (print) 9781665440868

One of the essential components of a Digital Signal Processing (DSP) system is the Finite Impulse Response (FIR) filter. The FIR filter uses the Multiply and Accumulate (MAC) operation for its computation. Conventional MAC units are slow and consume high power, making them unsuitable for energy-constrained devices. The MAC operations in an FIR filter use constant filter coefficients as one of their inputs. This situation is well suited for a bit-serial technique such as distributed arithmetic (DA). However, traditional DA has the drawback of using huge memory resources as the filter order increases. An efficient LUT-less Modified Distributed Arithmetic architecture is proposed in this paper to solve the memory problem. This architecture uses multiplexers and adders to remove the need for precomputing the weighted sums stored in the LUT of a DA. Also, the architecture is designed to extend the range of input values. Further, a 16-tap FIR filter is designed, synthesized with Xilinx ISE, and implemented on an XC4VSX35-FF668-10 based FPGA to measure the performance of this architecture. Our implementation results show that the design uses fewer resources and achieves faster filtering than previous implementations of the filter.

Keywords: distributed arithmetic; Multiply and Accumulate; FIR Filter; FPGA


An Analytical Framework and Approximation Strategy for Efficient Implementation of Distributed Arithmetic-Based Inner-Product Architectures

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2020, Vol. 67, No. 1, pp. 212-224

Authors: Ray, Dwaipayan; George, Nithin V; Meher, Pramod Kumar (Indian Inst Technol Gandhinagar, Dept Elect Engn, Gandhinagar 382355, India; Qualcomm, Bangalore Design Ctr, Bengaluru 560066, India; Nanyang Technol Univ, Cyber Secur Res Ctr, Singapore 639798, Singapore)

Distributed arithmetic (DA)-based approximate structures are used for efficient implementation of inner-products in various error-resilient applications. In the existing literature, most of these approximate architectures are developed by truncating the least significant bits (LSBs) of the inputs and/or the multiplying coefficients. The existing works do not provide any analytical study to evaluate and design an approximate structure. To address this issue, an analytical framework is proposed in this paper. It is shown that the analytical results match very closely with the Monte Carlo simulation results. The proposed framework reveals that the truncation of the LSBs of partial inner-products is a promising alternative to design more efficient DA architectures with less error. Following these observations, a novel approach to truncate the LSBs of partial inner-products, namely, a weight-dependent truncation strategy and its two variants with a suitable error compensation function, is presented in this paper. Synthesis results, accuracy analysis, and evaluation in two commonly used error-tolerant applications demonstrate the superiority of the proposed architectures over the state-of-the-art DA-based approximate structures.

Keywords: Approximate computing; distributed arithmetic; inner-product computation; probabilistic analysis; energy efficient designs
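
The input-LSB truncation that this abstract attributes to earlier approximate DA designs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, inputs are unsigned, and the error is measured exhaustively over a tiny input space rather than with the paper's analytical framework or its weight-dependent truncation strategy.

```python
from itertools import product


def truncate_lsbs(x, t):
    """Zero the t least significant bits of a non-negative integer."""
    return (x >> t) << t


def inner_product(coeffs, samples):
    """Exact inner product, used as the reference."""
    return sum(c * x for c, x in zip(coeffs, samples))


def truncation_error_stats(coeffs, bits, t):
    """Exhaustively measure the error caused by truncating t input LSBs.

    Returns (mean error, maximum absolute error) over all unsigned
    `bits`-wide input vectors. Feasible only for very small sizes.
    """
    errors = []
    for samples in product(range(1 << bits), repeat=len(coeffs)):
        exact = inner_product(coeffs, samples)
        approx = inner_product(coeffs, [truncate_lsbs(x, t) for x in samples])
        errors.append(exact - approx)
    return sum(errors) / len(errors), max(abs(e) for e in errors)


mean_err, max_err = truncation_error_stats([1, 2], bits=3, t=1)
```

With non-negative coefficients the mean error is nonzero, because truncation always rounds the inputs down. A bias of this kind is what an error compensation term would be meant to offset.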


Efficient Real-Time Digital Subcarrier Cross-Connect (DSXC) Based on Distributed Arithmetic DSP Algorithm

JOURNAL OF LIGHTWAVE TECHNOLOGY, 2020, Vol. 38, No. 13, pp. 3495-3505

Authors: Xu, Tong; Fumagalli, Andrea; Hui, Rongqing (Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66044, USA; Univ Texas Dallas, Erik Jonsson Sch Engn & Comp Sci, Richardson, TX 75083, USA)

Two real-time functions of digital subcarrier cross-connect (DSXC) are experimentally demonstrated for the first time using distributed arithmetic (DA) in a field programmable gate array (FPGA) platform. Both frequency translation and channel selection in DSXC are implemented using DA-based resampling filters, achieving flexible modulation format and fine data-rate granularity of many concurrent subcarrier channels. Compared with traditional resampling filters that leverage multipliers, the DA-based approach eliminates the need for DSP slices in the FPGA implementation and significantly reduces the hardware cost. By requiring only a few clock periods, the DA-based resampling filter is also significantly faster than conventional FIR filters, whose overall latency is proportional to the filter order. The DA-based DSXC is therefore able to achieve improved spectral efficiency and programmability of multiple orthogonal subcarrier channels, while keeping cross-connection latency low and requiring low-cost hardware resources when implemented in an FPGA platform.

Keywords: Finite impulse response filters; Optical filters; Field programmable gate arrays; OFDM; Real-time systems; Table lookup; distributed arithmetic; DSXC; FPGA; frequency translation; resampling filter


A memory and area-efficient distributed arithmetic based modular VLSI architecture of 1D/2D reconfigurable 9/7 and 5/3 DWT filters for real-time image decomposition


JOURNAL OF REAL-TIME IMAGE PROCESSING, 2020, Vol. 17, No. 5, pp. 1421-1446

Authors: Chakraborty, Anirban; Banerjee, Ayan (Indian Inst Engn Sci & Technol, Elect & Telecommun Engn Dept, Sibpur, Howrah, India)

In this article, we propose the internal architecture of dedicated hardware for 1D/2D convolution-based 9/7 and 5/3 DWT filters, exploiting bit-parallel 'distributed arithmetic' (DA) to reduce the computation time of the proposed DWT design while keeping the area at a level comparable to other recent designs. Despite using memory-intensive bit-parallel DA, we achieve a 90% reduction in memory size compared with other notable architectures. In the proposed architecture, both the 9/7 and 5/3 DWT filters can be realized with a selection input, mode. With the introduction of DA, we incorporate pipelining and parallelism into the proposed convolution-based 1D/2D DWT architectures. We reduce the area by 38.3% and the memory requirement by 90% compared with the latest notable designs. The critical-path delay of our design is almost 50% of that of the other latest designs. We have successfully applied our prototype 2D design to real-time image decomposition. The quality of the architecture for real-time image decomposition is measured by 'peak signal-to-noise ratio' and 'computation time', where the proposed design outperforms other similar software- and hardware-based implementations.

Keywords: DWT; distributed arithmetic; Memory efficient; Digital VLSI design; Parallelism; Image decomposition; PSNR


Novel DWT/IDWT Architecture for 3D with Nine Stage 2D Parallel Processing using Split Distributed Arithmetic

INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2020, Vol. 20, No. 3

Authors: Divakara, S. S.; Patilkulkarni, Sudarshan; Raj, Cyril Prasanna (Coorg Inst Technol, Kodagu 571216, Karnataka, India; JSS Sci & Technol Univ, Mysore 570006, Karnataka, India; MS Coll Engn, Bangalore 562110, Karnataka, India)

A novel high-speed, memory-optimized distributed arithmetic (DA)-based architecture is developed and modeled for the 3D discrete wavelet transform (DWT). The memory requirement for the proposed architecture is designed to 9 x 9N + 36 pixel dynamic memory space and 52W ROM. The proposed 3D-DWT architecture implements 9/7 Daubechies wavelet filters, synthesizes 7127 bytes of memory for temporary storage and uses 758 adders, 36 multiplexers of 16:1 and 36 up counters to realize the 3D-DWT hardware. The 3D-DWT engine is implemented and tested in a Xilinx Virtex-5 XC5VLX155T FPGA with high area and power efficiency. The maximum delay in the timing path is 2.676 ns and the 3D-DWT works at a maximum clock frequency of 381 MHz.

Keywords: 3D-DWT; split architecture; parallel processing; distributed arithmetic; FPGA
