ISBN: 9780769531106 (print)
Floating-point (FP) multiplication finds application in image and signal processing. This paper presents a hardware implementation of an optimized IEEE 754 single-precision floating-point multiplier. The design is simulated using ModelSim and synthesized with Xilinx ISE for a Virtex-E device. An improvement of 57.77% in area and 44.52% in delay is shown.
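As a rough illustration of what an IEEE 754 single-precision multiplier computes, the following Python sketch works directly on 32-bit patterns. It is not the paper's optimized design: the function names are hypothetical, only normal operands and results are handled (no zero, subnormal, infinity, or NaN cases), and rounding is assumed to be round-to-nearest-even.

```python
import struct

def f32_bits(x):
    """Return the 32-bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Return the float value encoded by a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fp32_mul(a_bits, b_bits):
    """Multiply two normal float32 bit patterns (illustrative sketch only)."""
    sign = (a_bits ^ b_bits) & 0x80000000        # sign is the XOR of the signs
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127  # remove one bias
    ma = (a_bits & 0x7FFFFF) | 0x800000          # restore the hidden leading one
    mb = (b_bits & 0x7FFFFF) | 0x800000
    prod = ma * mb                               # 48-bit product in [2^46, 2^48)
    if prod & (1 << 47):                         # product >= 2.0: normalize
        exp += 1
        shift = 24
    else:
        shift = 23
    mant = prod >> shift                         # 24-bit significand
    rem = prod & ((1 << shift) - 1)              # discarded low bits
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (mant & 1)):  # round to nearest, ties to even
        mant += 1
        if mant == 1 << 24:                      # rounding overflowed the significand
            mant >>= 1
            exp += 1
    if not 0 < exp < 255:
        raise ValueError("overflow/underflow is not handled in this sketch")
    return sign | (exp << 23) | (mant & 0x7FFFFF)
```

The three stages (sign XOR, exponent addition with bias removal, significand multiplication followed by normalization and rounding) are the parts a hardware design would typically optimize; the integer multiply here stands in for whatever multiplier array the hardware uses.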
ISBN: 0769522319 (print)
The recent explosion in the number of handheld multimedia devices has created a need for energy-efficient computation due to limited battery lifetimes. We focus on multiplication, which is needed in several application domains, e.g., 3D graphics, signal processing, and cryptography. We introduce an asynchronous implementation of a plain Booth multiplier (i.e., radix-2) which is both area- and energy-efficient, and therefore suitable for mobile applications. This paper makes the following contributions. First, a novel counterflow organization is introduced, in which the data bits flow in one direction, and the Booth commands piggyback on the acknowledgments flowing in the opposite direction. Second, the arithmetic and shifter units are merged to obtain significant improvements in area, energy, and speed. Third, our design performs overlapped execution of multiple iterations of the Booth algorithm. Finally, the design is modular, which allows scaling to arbitrary operand widths without gate resizing or cycle-time overheads. SPICE simulations in a 0.18 µm TSMC process at 1.8 V indicate promising performance: the multiplier takes 1.08 ns per Booth iteration, regardless of the operand width, thereby demonstrating the scalability of our approach. In addition, the multiplier is fully functional at reduced supply voltages (e.g., 1.0 V), and is thus capable of dynamically trading off performance for energy efficiency.
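For readers unfamiliar with the radix-2 Booth algorithm that this abstract builds on, here is a minimal behavioral sketch in Python. It models only the arithmetic (one Booth command per multiplier bit), not the asynchronous counterflow pipeline; the function name and signature are assumptions made for illustration.

```python
def booth_multiply(multiplicand, multiplier, width):
    """Radix-2 Booth multiplication (behavioral sketch).

    `multiplier` must fit in `width`-bit two's complement. Each iteration
    inspects the current multiplier bit and the previous one and issues one
    of three commands: subtract, add, or do nothing.
    """
    product = 0
    prev_bit = 0                       # implicit bit to the right of bit 0
    for i in range(width):
        # Python's >> on negative ints is arithmetic, so this reads
        # the two's-complement bit at position i.
        bit = (multiplier >> i) & 1
        if (bit, prev_bit) == (1, 0):      # start of a run of ones: subtract
            product -= multiplicand << i
        elif (bit, prev_bit) == (0, 1):    # end of a run of ones: add
            product += multiplicand << i
        prev_bit = bit                     # (0,0) and (1,1): shift only
    return product
```

The loop runs exactly `width` iterations regardless of the operand values, which is why a fixed time per Booth iteration translates into a multiplication time that grows linearly with operand width.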
The computational abilities of today’s parallel supercomputers are often quite impressive, but these machines can be impractical for some researchers due to prohibitive costs and limited availability. These researchers might be better served by a more personal solution such as a "hardware acceleration" peripheral for a PC. FPGAs are the ideal device for the task: their configurability allows a problem to be translated directly into hardware, and their reconfigurability allows the same chip to be reprogrammed for a different problem.
Efficient FPGA computation of parallel problems calls for cellular computing, which uses an array of independent, locally connected processing elements, or cells, that compute a problem in parallel. The architecture of the computing cells determines the performance of the FPGA-based computer in terms of the cell density possible and the speedup over conventional single-processor computation.
This thesis presents the design and performance results of four computing-cell architectures. MULTIPLE performs all operations in one cycle, which takes the least amount of time but requires the most chip area. BIT performs all operations bit-serially, which takes a long time but allows a large cell density. The two other architectures, SINGLE and BOOTH, lie between these two extremes of the area/time spectrum.
The performance results show that MULTIPLE provides the greatest speedup over common calculation software, but its usefulness is limited by its small cell density. Thus, the best architecture for a particular problem depends on the number of computing cells required. The results also show that with further research, next-generation FPGAs can be expected to accelerate single-processor computations as much as 22,000 times.
ISBN: 0818689145 (print)
This paper connects digital signal representation, the candidate two's complement multipliers for FPGAs, and one of the most original and powerful methods for multiplication: the Booth algorithm. The paper identifies the applications where constant-coefficient multipliers cannot be used and states the advantages and drawbacks of the other techniques. With this background, the description of our Anti-Jamming IIR Notch Filter [3] in a Xilinx FPGA becomes easier. This application note describes the functionality and integration of a real-time IIR filter using two large multipliers at a very high sampling rate (up to 40 Msamples/s) in an XC4020EPG223-2 device. It also reveals the solution to an interesting design problem that emerges, and some enhancements over other papers, introducing a hybrid technique. The Booth algorithm is shown to improve the CLB density and speed of the FPGA circuit without any pipelining in the IIR filter.
High-speed multipliers are essential building blocks for modern computers, signal processing and other digital systems. This paper develops a new parallel multiplier configuration by combining a signed-digit number system with a modified version of Booth's algorithm. The carry propagation chain is broken in the add/subtract operations, so an N×N-bit multiplication can be performed in parallel in a time proportional to log2(N/2). It is almost double the speed of a Wallace tree, whose computing time is proportional to log2(N). The number of computing cells is proportional to N·log2(N/2), which is less than that of a conventional multiplier. The regular array structure of this scheme is suitable for VLSI implementation.
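The N/2 term in the complexity figures is consistent with a recoding that halves the number of partial products. The abstract does not say which modification of Booth's algorithm is used; the sketch below assumes the common radix-4 variant, which recodes an N-bit multiplier into N/2 signed digits in {-2, -1, 0, 1, 2}. The function name is hypothetical, and the signed-digit adder tree itself is not modeled.

```python
def booth_radix4_digits(multiplier, width):
    """Recode a `width`-bit two's-complement multiplier into width/2
    signed radix-4 digits, least significant first (illustrative sketch)."""
    if width % 2 != 0:
        raise ValueError("width must be even for radix-4 recoding")
    digits = []
    prev = 0                                 # implicit bit right of bit 0
    for i in range(0, width, 2):
        b0 = (multiplier >> i) & 1           # low bit of the pair
        b1 = (multiplier >> (i + 1)) & 1     # high bit of the pair
        digits.append(b0 + prev - 2 * b1)    # digit in {-2, -1, 0, 1, 2}
        prev = b1
    return digits

def multiply_with_radix4(multiplicand, multiplier, width):
    """Sum the width/2 partial products selected by the recoded digits."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))
```

Each digit selects one partial product (0, ±M, or ±2M), so only N/2 operands need to be summed; a tree that combines them pairwise then has depth log2(N/2).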
The thesis deals with the conceptual design of an elementary processor that solves differential equations using the Euler method. The thesis is divided into two main parts. The first addresses the design of a processor that works in fixed-point arithmetic. Building on that design, the second part presents the design of a processor that works in floating-point arithmetic.
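To make the fixed-point variant concrete, the following Python sketch applies the explicit Euler step y[n+1] = y[n] + h·f(y[n]) using integer arithmetic only. The Q16.16 format, the helper names, and the test equation y' = -y are assumptions chosen for illustration; the thesis abstract does not specify them.

```python
FRAC_BITS = 16                 # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to the assumed fixed-point format."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a fixed-point value back to a float for inspection."""
    return q / ONE

def fx_mul(a, b):
    """Fixed-point multiply: full-width product, then drop the extra
    fractional bits (the shift rounds toward negative infinity)."""
    return (a * b) >> FRAC_BITS

def euler_fixed(f, y0, h, steps):
    """Explicit Euler integration of y' = f(y) on fixed-point values."""
    y = y0
    for _ in range(steps):
        y = y + fx_mul(h, f(y))    # y[n+1] = y[n] + h * f(y[n])
    return y

# Example: y' = -y, y(0) = 1, integrated to t = 1 with h = 0.01.
result = euler_fixed(lambda y: -y, to_fixed(1.0), to_fixed(0.01), 100)
```

The result approximates exp(-1); the difference comes from both the Euler truncation error and the quantization of `h` and of each product, which is the kind of trade-off that motivates a floating-point version of the same processor.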
This thesis deals with the design of a system for computing multiple integrals for various difference expressions of a spatial variable. Computing integrals is currently one of the important problems in engineering. The reader is first introduced to various methods of computing an integral, and then to numerical integration and the use of the Taylor series in numerical integration. The practical goal of the thesis is the design of a software and hardware system for computing multiple integrals.