This paper proposes efficient fixed-point and floating-point implementations of radix-10 square root in Xilinx FPGA devices. The method implements a digit-recurrence restoring algorithm that supports the three decimal floating-point (DFP) types specified in the IEEE 754-2008 standard. The technique used for restoring is optimal and novel. The designs use new techniques based on the efficient utilization of dedicated resources in the programmable devices. Implementations were made in Xilinx 7-series devices. For fixed-point square root, the designs operate at up to 212 MHz for p=7, 197 MHz for p=16, and 190 MHz for p=34. For DFP square root, the operating frequency obtained is 194 MHz for p=7, 183 MHz for p=16, and 174 MHz for p=34. The proposed architecture achieves better computation times than related works.
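As a rough software illustration of the general idea of a radix-10 restoring digit recurrence (not the FPGA design described in the abstract, whose restoring technique is not detailed here), the sketch below extracts one decimal root digit per iteration by trial subtraction. The function name `decimal_sqrt_restoring` and its integer interface are assumptions made for this example.

```python
def decimal_sqrt_restoring(x: int, digits: int) -> tuple[int, int]:
    """Return (root, remainder) with root = floor(sqrt(x)).

    Assumes 0 <= x < 10**(2 * digits); `digits` is the number of
    decimal root digits to produce. Hypothetical helper for illustration.
    """
    if x < 0 or x >= 10 ** (2 * digits):
        raise ValueError("x out of range for the requested digit count")
    root, rem = 0, 0
    for i in reversed(range(digits)):
        # Bring down the next pair of decimal digits of the operand.
        rem = rem * 100 + (x // 100 ** i) % 100
        digit = 0
        while True:
            # (10*root + digit + 1)**2 - (10*root + digit)**2
            step = 20 * root + 2 * digit + 1
            if rem < step:
                break  # trial fails: keep (restore) the current remainder
            rem -= step
            digit += 1
        root = root * 10 + digit
    return root, rem
```

For example, `decimal_sqrt_restoring(2 * 10**6, 4)` returns `(1414, 604)`, i.e. the first four digits of the square root of 2 and the remainder 2000000 - 1414**2.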
In embedded computing, it is common to find applications such as signal processing, image processing, computer graphics or data compression that might benefit from a hardware implementation for the computation of integer roots of order [GRAPHICS] . However, the scientific literature lacks architectural designs that implement such operations for different values of N using a small amount of resources. This article presents a parameterisable field programmable gate array (FPGA) architecture for an efficient Nth root calculator that uses only adders/subtractors and [GRAPHICS] location memory elements. The architecture was tested for different values of [GRAPHICS] , using a 64-bit number representation. The results show consumption of up to 10% of the logic resources of a Xilinx XC6SLX45-CSG324C device, depending on the value of N. The hardware implementation outperformed the corresponding software implementations by one order of magnitude. The architecture's performance ranges from several thousand to seven million root operations per second.
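A software reference model is useful when checking an integer Nth root unit of this kind. The sketch below is only such a model, under the assumption of unsigned operands of `width` bits: it builds the root one bit at a time and uses Python's built-in exponentiation, so it does not reflect the adder/subtractor-only datapath of the article. The name `integer_nth_root` is hypothetical.

```python
def integer_nth_root(x: int, n: int, width: int = 64) -> int:
    """Return floor(x ** (1/n)) for 0 <= x < 2**width and n >= 1.

    Bit-by-bit reference model; not the hardware architecture itself.
    """
    if n < 1 or x < 0 or x >= 1 << width:
        raise ValueError("expected n >= 1 and 0 <= x < 2**width")
    root = 0
    # The root of a width-bit operand has at most ceil(width / n) bits.
    for bit in reversed(range((width + n - 1) // n)):
        candidate = root | (1 << bit)
        if candidate ** n <= x:  # keep the bit only if it does not overshoot
            root = candidate
    return root
```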
This paper presents the algorithm and architecture of a decimal floating-point (DFP) logarithmic converter based on the digit-recurrence algorithm with selection by rounding. The proposed approach can compute faithful DFP logarithm results for any one of the three DFP formats specified in the IEEE 754-2008 standard. To optimize the latency of the proposed design, we integrate the following novel features: 1) using the redundant carry-save representation for the data path; 2) reducing the number of iterations by determining the number of initial iterations; and 3) retiming and balancing the delay of the proposed architecture. The proposed architecture is synthesized with an STM 90-nm standard cell library, and the results show that the critical path delay and the number of clock cycles of the proposed Decimal64 logarithmic converter are 1.55 ns (34.4 FO4) and 19, respectively, and that the total hardware complexity is 43,572 NAND2 gates. The delay estimation results show that the latency of the proposed architecture is close to that of the binary radix-16 logarithmic converter and significantly lower than that of a recently published high-performance CORDIC implementation.
ISBN: (print) 9780769536705
This paper presents a new design and implementation of a 32-bit decimal floating-point (DFP) logarithmic converter based on the digit-recurrence algorithm. The converter can calculate accurate logarithms of 32-bit DFP numbers as defined in the IEEE 754-2008 standard. The redundant digit e(1) is obtained from a look-up table in the first iteration, and the remaining redundant digits e(j) are selected by rounding the scaled remainder during the succeeding iterations. The sequential architecture of the proposed 32-bit DFP logarithmic converter is implemented on a Xilinx Virtex-II Pro P30 FPGA device and then synthesized with a TSMC 0.18-µm standard cell library. The implementation results indicate that the maximum frequency of the proposed architecture is 47.7 MHz in the FPGA and 107.9 MHz in TSMC 0.18-µm technology. Faithful 32-bit DFP logarithm results can be obtained in 18 cycles.
ISBN: (print) 0769522262
We propose a radix-r digit-recurrence algorithm for complex square root. The operand is prescaled to allow the selection of square-root digits by rounding of the residual. This leads to a simple hardware implementation of digit selection. Moreover, the use of the digit-recurrence approach allows correct rounding of the result if needed. The algorithm, compatible with the complex division presented in Ercegovac and Muller ("Complex Division with Prescaling of the Operands," in Proc. Application-Specific Systems, Architectures, and Processors (ASAP'03), The Hague, The Netherlands, June 24-26, 2003), and its design are described. We also give rough estimates of its latency and cost with respect to an implementation based on standard floating-point instructions as used in software routines for complex square root.
A hardware algorithm is proposed for improving the speed of the linear digit-recurrence logarithmic algorithm. The convergence rate of this logarithmic algorithm is exponential. Furthermore, the lookup tables used in the algorithm are smaller than those used in the digit-recurrence algorithms. When the word length of the operand is less than or equal to 64 bits, the operations involved in each stage of the logarithmic computation include only a small table lookup, digit multiplication, and simple squaring operations. We conclude that the hardware implementation of our proposed algorithm is very efficient.
A digit-recurrence algorithm for cube rooting is proposed. In cube rooting, the digit-recurrence equation of the residual includes the square of the partial result of the cube root. In the proposed algorithm, the square of the partial result is kept, and the square, as well as the residual, is updated by addition/subtraction, shift, and multiplication by one or two digits. Different specific versions of the algorithm are possible, depending on the radix, the digit set of the cube root, and so on. Any version of the algorithm can be implemented as a sequential (folded) circuit or a combinational (unfolded) circuit, which is suitable for VLSI realization.
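The idea of keeping the square of the partial result alongside the residual can be illustrated with a simplified sketch. The version below is an assumption-laden stand-in, not the proposed algorithm: it uses radix 10, a non-redundant digit set, and restoring-style unit increments, so each digit is found by repeated trial subtraction rather than by a single selection. The name `decimal_cube_root` is hypothetical.

```python
def decimal_cube_root(x: int, digits: int) -> tuple[int, int]:
    """Return (root, remainder) with root = floor(cbrt(x)).

    Assumes 0 <= x < 10**(3 * digits). Simplified radix-10 sketch.
    """
    if x < 0 or x >= 10 ** (3 * digits):
        raise ValueError("x out of range for the requested digit count")
    root, square, rem = 0, 0, 0  # invariant: square == root**2
    for i in reversed(range(digits)):
        # Bring down three operand digits and shift the partial results.
        rem = rem * 1000 + (x // 1000 ** i) % 1000
        root *= 10
        square *= 100
        while True:
            # (root + 1)**3 - root**3, using the stored square
            step = 3 * square + 3 * root + 1
            if rem < step:
                break
            rem -= step
            square += 2 * root + 1  # update root**2 by addition only
            root += 1
    return root, rem
```

After every iteration the pair satisfies `root**3 + rem == prefix`, where `prefix` is the part of the operand consumed so far, which is the property the residual recurrence is meant to preserve.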
In this work, we present a reciprocal square root algorithm based on digit recurrence and selection by a staircase function, together with its radix-4 implementation. As in similar algorithms for division and square root, the results are obtained correctly rounded in a straightforward manner (in contrast to existing methods for computing the reciprocal square root). Although, apparently, a single selection function can only be used for j greater than or equal to 2 (the selection constants are different for j = 0, j = 1, and j greater than or equal to 2), we show that it is possible to use a single selection function for all iterations. We perform a rough comparison with existing methods and conclude that our implementation is a low hardware complexity solution with moderate latency, especially for exactly rounded results. We also extend the unit to support division and square root with the same selection function and with slight modifications to the initialization of the reciprocal square root unit.
A digit-recurrence algorithm is proposed for computing the Euclidean norm of a three-dimensional (3D) vector, an operation that often appears in 3D computer graphics. One of the three squarings required for the usual computation is removed, and the other two squarings, as well as the two additions, are overlapped with the square rooting. The Euclidean norm is computed by iteration of carry-propagation-free additions, shifts, and multiplications by one digit. Different specific versions of the algorithm are possible, depending on the radix, the redundancy factor of the digit set, and so on. Each version of the algorithm can be implemented as a sequential (folded) circuit or a combinational (unfolded) circuit, which has a regular array structure suitable for VLSI.
A very-high radix digit-recurrence algorithm for the operation sqrt(x/d) is developed, with residual scaling and digit selection by rounding. This is an extension of the division and square-root algorithms presented previously, for which a combined unit was shown to provide fast execution of these operations. The architecture of a combined unit to execute division, square root, and sqrt(x/d) is described, with inverse square root as a special case. A comparison with the corresponding combined division and square-root unit shows a similar cycle time and an increase of one cycle for the extended operation with respect to square root. To obtain an exactly rounded result for the extended operation, a datapath of about 2n bits is needed. An alternative is proposed which requires approximately the same width as for square root, but produces a result with an error of less than one ulp. The area increase with respect to the division and square-root unit should be no greater than 15 percent. Consequently, whenever a very-high radix unit for division and square root seems suitable, it might be profitable to implement the extended unit instead.