This paper describes a method based on polynomial approximation for transferring ROM resources used in FPGA designs to multiplication and addition operations. The technique can be applied to any FPGA architecture containing embedded multipliers; this paper, however, focuses on the DSP blocks of the Altera Stratix and Stratix II architectures. The transformation is combined with other resource transfers and integrated into a synthesis flow targeting designs implemented on heterogeneous FPGAs. The main advantage of such a system is that it handles user constraints on each type of resource (DSP block, LUT and ROM) in addition to timing-related constraints. The flow is based on an extension to the Altera Quartus II synthesis software and the Quartus University Interface Program (QUIP) framework. Results are provided for implementations of benchmark algorithms, and a design-space exploration shows that the proposed methods extend the set of achievable designs for these algorithms.
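The core idea of trading a ROM for multiply-add operations can be illustrated with a small software sketch. The example below is not the paper's synthesis flow: the table contents (a quarter-wave sine), the table size, the output width, and the choice of Chebyshev-node interpolation are all assumptions made for illustration. It shows a lookup table being replaced by a degree-5 polynomial evaluated in a Horner-like chain, where each step costs one multiplication and a couple of additions.

```python
import math

ROM_DEPTH = 256      # hypothetical table size (assumption)
FULL_SCALE = 1023    # hypothetical 10-bit output range (assumption)


def target(t):
    """Function assumed to have generated the ROM contents, t in [0, 1]."""
    return FULL_SCALE * math.sin(0.5 * math.pi * t)


def chebyshev_nodes(n, lo, hi):
    """Return n Chebyshev interpolation nodes on [lo, hi]."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]


def newton_coefficients(xs, ys):
    """Divided-difference coefficients of the interpolating polynomial."""
    coef = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef


def eval_newton(coef, xs, x):
    """Horner-like evaluation: one multiply per coefficient, plus adds."""
    acc = coef[-1]
    for c, node in zip(reversed(coef[:-1]), reversed(xs[:-1])):
        acc = acc * (x - node) + c
    return acc


# The original resource: a ROM indexed by address.
rom = [round(target(a / (ROM_DEPTH - 1))) for a in range(ROM_DEPTH)]

# The replacement: six coefficients and a chain of multiply-adds.
nodes = chebyshev_nodes(6, 0.0, 1.0)
coef = newton_coefficients(nodes, [target(t) for t in nodes])


def rom_by_polynomial(addr):
    """Compute the ROM word for addr arithmetically instead of by lookup."""
    return round(eval_newton(coef, nodes, addr / (ROM_DEPTH - 1)))


max_error = max(abs(rom_by_polynomial(a) - rom[a]) for a in range(ROM_DEPTH))
```

In hardware the floating-point arithmetic would become fixed-point, and the polynomial degree and word lengths would have to be chosen against the accuracy the design needs. Smooth table contents approximate well; arbitrary ROM contents generally do not.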
ISBN: 9781424481927 (print)
In this paper, we present an arithmetic sum-of-products (SOP) based realization of the general Multiple Constant Multiplication (MCM) algorithm. We also propose an enhanced SOP-based algorithm, which uses Partial Max-SAT (PMSAT) to further optimize the SOP. The enhanced algorithm attempts to reduce the number of rows (partial products) of the SOP by i) shifting coefficients to realize other coefficients when possible, ii) exploring multiple implementations of each coefficient using a Minimal Signed Digit (MSD) format, and iii) exploiting the mutual exclusiveness within certain groups of partial products. We validate the approach on the Fast Fourier Transform (FFT), an important application whose hardware implementations require the incoming data to be multiplied by one of several constant coefficients. We compare our SOP-based architectures with the best existing implementation of MCM for FFT (which utilizes a cascade of adders) and show that our approaches achieve a significant improvement in area and delay. Our architecture was synthesized using 65 nm technology libraries.
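Two of the ideas named above, realizing one coefficient by shifting another and representing coefficients with signed digits, can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the canonical signed-digit (CSD) form, which is one particular minimal signed-digit representation, it does not explore alternative MSD encodings, and it involves no PMSAT optimization. The function names and the coefficient set are hypothetical.

```python
def csd_digits(c):
    """Canonical signed-digit form of c > 0 as a list of (shift, sign).

    Each entry is one partial product (one SOP row): sign * (x << shift).
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            sign = 2 - (c & 3)   # +1 if c % 4 == 1, -1 if c % 4 == 3
            digits.append((shift, sign))
            c -= sign
        c >>= 1
        shift += 1
    return digits


def odd_part(c):
    """Split c > 0 into (odd, shift) such that c == odd << shift."""
    shift = 0
    while c % 2 == 0:
        c //= 2
        shift += 1
    return c, shift


def mcm(x, coefficients):
    """Multiply x by every coefficient using only shifts, adds and subtracts.

    Coefficients that differ only by a power of two share one product.
    """
    shared = {}     # odd fundamental -> x * odd
    products = {}
    for c in coefficients:
        odd, shift = odd_part(c)
        if odd not in shared:
            shared[odd] = sum(sign * (x << s) for s, sign in csd_digits(odd))
        products[c] = shared[odd] << shift
    return products


coefficients = [3, 6, 7, 12, 45]   # hypothetical coefficient set
products = mcm(5, coefficients)
```

Here 3, 6 and 12 share the single fundamental 3, and 7 is formed as 8 - 1 with two partial products instead of the three that plain binary would need. Choosing among several equally short signed-digit encodings so that rows can be shared across coefficients is the kind of search the abstract assigns to PMSAT.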
System developers have found that exploiting parallel architectures for control systems is challenging and often the resulting implementations do not provide the expected performance advantages over traditional unipro...
Previously, most research on mammalian auditory systems has concentrated on human sensory perception, at frequencies below 20 kHz. The implementations almost always used analog VLSI design. Due to the complexity of the model, it is difficult to implement these algorithms using current digital technology. This paper introduces a simplified model of the biosonic reception system in bats and its implementation in the "Chiroptera Inspired Robotic CEphaloid" (CIRCE) project. The model consists of bandpass filters, a half-wave rectifier, low-pass filters, automatic gain control, and spike generation with thresholds. To meet the real-time requirements, the system employs Butterworth filters and advanced field programmable gate array (FPGA) architectures. The ultrasonic signal processing is implemented in real time on a Xilinx Virtex-II FPGA device. In the system, 12-bit input echo signals from the receivers are sampled at 1 M samples per second for a signal frequency range from 20 to 200 kHz. The system implements a 704-channel-per-ear auditory pipeline operating in real time. The output of the system is a coded time series of threshold crossing points. Compared with a fixed-point software implementation, the hardware shows significant performance gains with no loss of accuracy.
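A single channel of such a pipeline can be sketched in software to make the stages concrete. The example below is a heavily simplified illustration, not the CIRCE implementation: it omits the bandpass stage and the automatic gain control, substitutes a one-pole smoothing filter for the Butterworth low-pass, and uses floating point rather than fixed point. Only the 1 M samples per second rate and the 12-bit amplitude range come from the abstract; the burst frequency, smoothing constant, and threshold are hypothetical.

```python
import math

FS = 1_000_000   # sample rate in Hz, as stated in the abstract


def half_wave_rectify(samples):
    """Keep positive samples and zero out negative ones."""
    return [s if s > 0 else 0 for s in samples]


def one_pole_lowpass(samples, alpha):
    """One-pole smoother (a stand-in for the Butterworth low-pass)."""
    out, y = [], 0.0
    for s in samples:
        y += alpha * (s - y)
        out.append(y)
    return out


def threshold_crossings(envelope, threshold):
    """Sample indices at which the envelope rises through the threshold."""
    return [n for n in range(1, len(envelope))
            if envelope[n - 1] < threshold <= envelope[n]]


def tone_burst(freq_hz, start, length, total, amplitude=2047):
    """Synthetic echo: a sine burst within a 12-bit signed amplitude range."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * (n - start) / FS)
            if start <= n < start + length else 0.0
            for n in range(total)]


# Hypothetical test signal: a 50 kHz burst from sample 200 to sample 599.
echo = tone_burst(50_000, start=200, length=400, total=1000)
envelope = one_pole_lowpass(half_wave_rectify(echo), alpha=0.02)
spikes = threshold_crossings(envelope, threshold=300.0)
```

The list `spikes` plays the role of the coded time series of threshold crossing points for this one channel. Because the simple smoother leaves ripple on the envelope, it can report more than one crossing per burst; a sharper low-pass or hysteresis on the threshold would reduce that.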