ISBN: 9781424438914 (print)
This paper presents a novel class of division algorithms that reduces computation delay by introducing more concurrency. The algorithm is suitable for fixed-point operands and divides in radix r = 2^k, producing k bits at each iteration. The proposed digit-recurrence algorithm has two architectures: the first for general hardware implementation and the second optimized for configurable logic. Results show a speedup of more than three times with respect to a classical non-restoring divider implemented in Xilinx devices. The dividers were also compared against Xilinx CoreGenerator circuits, which they clearly outperform in latency and area.
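To make the radix r = 2^k recurrence concrete, here is a minimal software sketch of a plain restoring digit-recurrence divider that retires k quotient bits per iteration. It is not the architecture proposed in the paper: the function name `radix_2k_divide`, the `width` parameter, and the comparison-based digit selection are assumptions chosen for illustration, and the sketch handles unsigned integers only.

```python
def radix_2k_divide(dividend: int, divisor: int, k: int = 2, width: int = 16):
    """Illustrative restoring division in radix r = 2**k (hypothetical helper).

    Each iteration consumes k dividend bits and produces one radix-r
    quotient digit, i.e. k quotient bits.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("sketch handles dividend >= 0 and divisor > 0 only")
    if dividend >= 1 << width:
        raise ValueError("dividend does not fit in the given width")

    r = 1 << k
    steps = -(-width // k)          # ceil(width / k) iterations
    padded_width = steps * k        # dividend is treated as zero-extended

    quotient = 0
    remainder = 0
    for i in range(steps):
        # Bring in the next k bits of the dividend, most significant first.
        shift = padded_width - (i + 1) * k
        chunk = (dividend >> shift) & (r - 1)
        remainder = (remainder << k) | chunk

        # Digit selection: largest d in [0, r-1] with d * divisor <= remainder.
        # Hardware would do these comparisons in parallel; here it is a loop.
        digit = 0
        for d in range(r - 1, 0, -1):
            if d * divisor <= remainder:
                digit = d
                break

        remainder -= digit * divisor
        quotient = (quotient << k) | digit

    return quotient, remainder


if __name__ == "__main__":
    print(radix_2k_divide(1000, 7, k=3))
```

With k = 3 and a 16-bit dividend the loop runs six times instead of sixteen, which is the source of the latency reduction that higher-radix dividers aim for. The cost is a more complex digit-selection step per iteration.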
ISBN: 9781424438914 (print)
The interconnection networks used by current fine-grain FPGAs do not scale to very large array sizes. To address this issue, we apply the GALS (Globally Asynchronous and Locally Synchronous) paradigm to build scalable FPGAs. The logic resources are divided into locally synchronous tiles, with asynchronous communication among different tiles. To route the asynchronous communications, we build a serial network-on-chip. Targeting streaming applications, we propose a design flow that maps user applications to our new FPGA architecture. To validate our architecture and design flow, we build an emulation prototype and develop a JPEG baseline encoder as the case study. We have successfully demonstrated the concept and predict a maximum frequency of 224 MHz for designs mapped to the sFPGA2 architecture.
ISBN: 9781424403127 (print)
Configurable architectures offer the unique opportunity of customizing the storage allocation to meet specific applications' needs. In this paper we describe a compiler approach to map the arrays of a loop-based computation to internal memories of a configurable architecture with the objective of minimizing the overall execution time. We present an algorithm that considers the data access patterns of the arrays along the critical path of the computation as well as the available storage and memory bandwidth. We demonstrate experimental results of the application of this approach for a set of kernel codes when targeting a field-programmable gate array (FPGA). The results reveal that our algorithm outperforms naive and custom data layouts for these kernels by an average of 33% and 15%, respectively, in terms of execution time, while taking into account the available hardware resources.
ISBN: 9781424438914 (print)
This paper presents a comparison between two technologies for reconfigurable circuits: FPGAs and FPAAs. The comparison is based on a case study in the area of industrial control, using simulations with both types of reconfigurable devices. Several design issues are discussed, including ease of implementation, accuracy, capacity, consumption and size, among others. Based on the case study, we present qualitative guidelines for choosing the most suitable reconfigurable device for similar applications.
ISBN: 9781424438914 (print)
Current trends show that it is increasingly difficult to manage the constraints of cost, power consumption, size and, above all, functional safety with conventional architectures. This paper presents a new architecture to deal with the current and upcoming requirements in safety-critical applications. It proposes the use of diverse redundancy with digital and analog channels to detect random hardware failures as well as systematic failures, which increases functional safety. By exploiting the dynamic and partial hardware reconfiguration capability of FPGAs and FPAAs and by using the appropriate failure recovery scenario, the system availability can also be increased. Furthermore, the architecture offers the possibility to combine high accuracy with short response time.
ISBN: 9781424403127 (print)
The PhD project described in this paper aims to use word-length optimization techniques to automatically optimize the dynamic power consumption of high-level descriptions of DSP algorithms intended for implementation on FPGA, before or during synthesis. By developing models which can quickly estimate the power consumed by a system from a high-level description of the algorithm it implements, our work will allow for existing word-length optimization techniques to minimize the power consumption of a system, subject to acceptable signal distortion constraints.
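The constraint side of word-length optimization can be illustrated with a small sketch: find the fewest fractional bits for which fixed-point quantization keeps the worst-case error within a tolerance. This does not reproduce the project's power models. The helper names `quantize` and `min_frac_bits`, the sample values, and the use of maximum absolute error as the distortion measure are all assumptions made for illustration; a shorter word length stands in here as a rough proxy for lower dynamic power.

```python
def quantize(x: float, frac_bits: int) -> float:
    """Round x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def min_frac_bits(samples, max_abs_error: float, limit: int = 32) -> int:
    """Smallest number of fractional bits meeting the error constraint.

    Hypothetical helper: a linear search from short to long word lengths,
    stopping at the first one whose worst-case error is acceptable.
    """
    for bits in range(limit + 1):
        worst = max(abs(x - quantize(x, bits)) for x in samples)
        if worst <= max_abs_error:
            return bits
    raise ValueError("no word length up to the limit meets the constraint")


if __name__ == "__main__":
    coefficients = [0.1, 0.25, 0.7]   # illustrative values, not from the paper
    bits = min_frac_bits(coefficients, max_abs_error=0.01)
    print(bits, [quantize(c, bits) for c in coefficients])
```

A real flow would replace the word-length proxy with an estimated power figure and search per-signal word lengths jointly, since the distortion contributed by each signal depends on where it sits in the datapath.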
ISBN: 9781424419609 (print)
This paper presents a fast and scalable method of computing signal toggle rate in FPGA-based circuits. Our technique is a vectorless estimation technique, which can be used in a CAD tool to identify the parts of the circuit that can benefit from power optimization. A key advantage of our approach is its ability to efficiently account for spatial correlation of related logic cones, which is accomplished using a novel XOR-based decomposition. In addition, our approach uses post-routing circuit delays to account for glitches in a logic circuit. The proposed approach was tested on 14 MCNC benchmark circuits compiled for the Altera Stratix II devices. The results indicate that our method improves the vectorless estimation technique available in the latest version of Altera's Quartus II commercial CAD tool, reducing the average error by 37% and standard deviation by 59%.
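For context, the simplest vectorless model propagates static signal probabilities through gates under the assumption that all inputs are independent. The sketch below shows that baseline model and the error it makes on reconvergent logic. It is not the XOR-based decomposition described in the paper, it ignores glitches, and the function names and the toggle formula 2p(1 - p) (which additionally assumes temporal independence between cycles) are illustrative assumptions.

```python
def not_prob(pa: float) -> float:
    """P(output = 1) for an inverter."""
    return 1.0 - pa


def and_prob(pa: float, pb: float) -> float:
    """P(output = 1) for a 2-input AND, assuming independent inputs."""
    return pa * pb


def or_prob(pa: float, pb: float) -> float:
    """P(output = 1) for a 2-input OR, assuming independent inputs."""
    return pa + pb - pa * pb


def toggle_rate(p: float) -> float:
    """Expected toggles per cycle if consecutive values are independent."""
    return 2.0 * p * (1.0 - p)


if __name__ == "__main__":
    pa = pb = pc = 0.5

    # y = (a AND b) OR (NOT c)
    p_y = or_prob(and_prob(pa, pb), not_prob(pc))
    print(p_y, toggle_rate(p_y))

    # Reconvergent case: z = a AND a is just a, so the true probability
    # is pa, but the independence assumption reports pa * pa instead.
    print(and_prob(pa, pa), "vs true value", pa)
```

The second print shows why spatial correlation between related logic cones matters: when the same signal reaches a gate along two paths, the independence assumption misestimates the output probability and therefore the toggle rate.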
ISBN: 9781479900046 (print)
Large digital signal processing applications like particle filtering require a tradeoff between execution time and area in order to scale on FPGAs. This research focuses on developing a methodology to make this tradeoff based on structure in the mathematical description of the application. Structure is expressed using higher-order functions which are transformed using tradeoff rules to reduce area usage on FPGA.
Molecular dynamics (MD) is of central importance to computational chemistry. Here the authors show that MD can be implemented efficiently on a commercial off-the-shelf (COTS) field-programmable gate array (FPGA) board, and that speed-ups from 31x to 88x over a PC implementation can be obtained. Although the extent of speed-up depends on the stability required, 46x can be obtained with virtually no detriment, and the upper end of the range is apparently viable in many cases. The authors sketch the FPGA implementations and describe the effects of precision on the trade-off between the performance and quality of the MD simulation.
In recent years, the RapidSmith CAD tool [1] has been used with ISE to create custom CAD tools targeting Xilinx FPGAs. This tool flow was based on the Xilinx Design Language (XDL), a human-readable representation of a...