Fixed-point processors are utilized in an enormous variety of applications, often for tasks that require the evaluation of mathematical functions. We present an automated method for mapping functions to such processor...
详细信息
Fixed-point processors are utilized in an enormous variety of applications, often for tasks that require the evaluation of mathematical functions. We present an automated method for mapping functions to such processors via polynomials that explicitly targets the native word length of the processor, thereby significantly reducing the execution time relative to commonly used floating-point emulation approaches based on traditional mathematical libraries. The methods presented here also contrast with hand-tuned processor-specific code, which has the potential to deliver efficient implementations but at the cost of significant design time. We describe an automated design flow utilizing multiword arithmetic to provide overflow protection and precision accurate to one unit in the last place (ulp). Analytical approaches are used to minimize the number of fixed-width operands required for each operation and to ensure that precision requirements are met. This allows automated generation of processor-optimized code and characterization of a design space representing a rich range of trade-offs among precision, latency, and memory cost.
We present an automated bit-width optimization methodology for polynomial-based hardware function evaluation. Due to the analytical nature of the approach, overflow protection and precision accurate to one unit in the...
详细信息
We present an automated bit-width optimization methodology for polynomial-based hardware function evaluation. Due to the analytical nature of the approach, overflow protection and precision accurate to one unit in the last place (ulp) can be guaranteed. A range analysis technique based on computing the root of the derivative of a signal is utilized to determine the minimal number of integer bits. Fractional bit requirements are established using an analytical error expression derived from the functions that occur along the data path. Global fractional bit optimization across multiple computation stages is performed using simulated annealing and circuit area estimation functions.
We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such...
详细信息
We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such as for exploring channel code behavior at very low bit error rates, as low as 10(-12) to 10(-13). The main novelties of this work are accurate analytical error analysis and bit-width optimization for the elementary functions involved in the Box-Muller method. Two 16-bit noise samples are generated every clock cycle and, due to the accurate error analysis, every sample is analytically guaranteed to be accurate to one unit in the last place. An implementation on a Xilinx Virtex-4 XC4VLX100-12 FPGA occupies 1,452 slices, three block RAMs, and 12 DSP slices, and is capable of generating 750 million samples per second at a clock speed of 375 MHz. The performance can be improved by exploiting concurrent execution: 37 parallel instances of the noise generator at 95 MHz on a Xilinx Virtex-II Pro XC2VP100-7 FPGA generate seven billion samples per second and can run over 200 times faster than the output produced by software running on an Intel Pentium-4 3 GHz PC. The noise generator is currently being used at the Jet Propulsion Laboratory, NASA to evaluate the performance of low-density parity-check codes for deep-space communications.
We present a methodology and an automated system for function evaluation unit generation. Our system selects the best function evaluation hardware for a given function, accuracy requirements, technology mapping, and o...
详细信息
We present a methodology and an automated system for function evaluation unit generation. Our system selects the best function evaluation hardware for a given function, accuracy requirements, technology mapping, and optimization metrics, such as area, throughput, and latency. Function evaluation f(x) typically consists of range reduction and the actual evaluation on a small convenient interval such as [0, pi/2) for sin(x). We investigate the impact of hardware function evaluation with range reduction for a given range and precision of x and f(x) on area and speed. An automated bit-width optimization technique for minimizing the sizes of the operators in the data paths is also proposed. We explore a vast design space for fixed-point sin(x), log(x), and root x p accurate to one unit in the last place using MATLAB and ASC, A Stream Compiler for Field-Programmable Gate Arrays (FPGAs). In this study, we implement over 2,000 placed-and-routed FPGA designs, resulting in over 100 million Application-Specific Integrated Circuit (ASIC) equivalent gates. We provide optimal function evaluation results for range and precision combinations between 8 and 48 bits.
暂无评论