ISBN: 9781467381239 (print)
Implementing Dynamic Voltage and Frequency Scaling (DVFS) is a non-trivial task on FPGAs and requires knowledge about the feasible voltage and frequency (VF) ranges as a first step. The feasible VF ranges depend not only on the size of the critical path in the design but also on the inter- and intra-die variability on the FPGA die. Moreover, variations in the configuration of the FPGA strongly affect the feasible VF ranges. Therefore, it is crucial to characterise feasibility by studying the relationship between feasible VF regions and these sources of variability in FPGAs. In this paper we employ a self-checking multiplier which uses residue codes and DVFS implemented on the programmable logic component of a Xilinx Zynq ZC702 device as an error-detection circuit to study these feasible regions. Results show that, as expected, feasible VF ranges vary with FPGA configuration. More interestingly, significant variation of the feasible VF regions is found for different dies. These results highlight the necessity of dynamic self-testing as a part of an adaptive DVFS implementation on FPGAs. Employing the techniques presented in this work enables the implementation of efficient adaptive on-line DVFS on programmable logic while ensuring reliability.
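The residue-code check mentioned in this abstract can be illustrated in software. The sketch below is not the authors' circuit: the modulus 3, the function names, and the bit-flip fault model are all assumptions chosen for illustration. It relies only on the identity (a * b) mod m = ((a mod m) * (b mod m)) mod m.

```python
def residue_check(a, b, product, modulus=3):
    """Return True if `product` is consistent with a * b under a residue code.

    The modulus of 3 is an assumption for this sketch; the abstract does not
    state which residue code the authors use.
    """
    expected = ((a % modulus) * (b % modulus)) % modulus
    return product % modulus == expected


def faulty_multiply(a, b, flip_bit=None):
    """Multiply two non-negative integers, optionally flipping one result bit.

    The bit flip is a hypothetical stand-in for a timing error caused by an
    infeasible voltage/frequency operating point.
    """
    product = a * b
    if flip_bit is not None:
        product ^= 1 << flip_bit
    return product


# A correct product passes the check; a single flipped bit is detected.
ok = residue_check(13, 11, faulty_multiply(13, 11))
bad = residue_check(13, 11, faulty_multiply(13, 11, flip_bit=4))
```

With modulus 3, any single-bit flip changes the product by a power of two, which is never a multiple of 3, so the check fails for every such fault. Multi-bit errors can still alias to a matching residue.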
ISBN: 9781424410590 (print)
Reconfigurable logic devices are classified as the fine-grained or coarse-grained type on the basis of their basic logic cell architecture. In general, each architecture has its own merit; therefore, it is difficult to achieve a balance between the operation speed and implementation area in various applications. In this paper, we propose a Variable Grain Logic Cell (VGLC) architecture, which consists of a 4-bit ripple carry adder with configuration memory bits, and also develop a technology mapping tool. Its key feature is variable granularity, a trade-off between the coarse-grained and fine-grained types required for implementing arithmetic and random logic, respectively. As a result, the critical path delay and the number of configuration memory bits are reduced by 49.7% and 48.5%, respectively, in the benchmark circuits.
ISBN: 9781424419609 (print)
The SMILE project accelerates scientific and industrial applications by means of a cluster of low-cost FPGA boards. With this approach the intensive calculation tasks are accelerated using the FPGA logic, while the communication patterns of the applications remain unchanged by using a Message Passing Library over Linux. This paper explains the cluster architecture: the SMILE nodes and the developed high-speed communication network for the FPGA RocketIO interfaces. A SystemC model developed to simulate the cluster is also detailed. In order to show the potential of the SMILE proposal, a Content-Based Information Retrieval parallel application has been developed and compared with an HP cluster architecture in terms of response time and power consumption.
ISBN: 9781424438914 (print)
Random number generators play an important role in the field of cryptography and security. It is often required that a random number generator consists of digital logic blocks only, so that it can be implemented on reconfigurable platforms. Since randomness cannot be proved by statistical tests, there is a need for a provably secure hardware random number generator. In order to provide a proof of security, an experimental investigation of various physical effects on reconfigurable platforms is needed. In this paper we focus on the effect of narrow transition suppression in logic gates. The estimation of this effect may be crucial for the validity of the security proof of an RNG design. We explain our views on how experiments on FPGAs should be performed and we give a description of the measurement setup. We show that up to 98% of the transitions are suppressed in our experimental FPGA setup.
ISBN: 9781424438914 (print)
This paper presents an FPGA implementation of a low-cost 8-bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by media processing and other domains, as well as to make it easily integrable into a 2D array. This paper presents an investigation of the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized on the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model, which allows low-level programming. Throughput results for popular benchmarks coded using the programming model and a cycle-accurate simulator are presented.
ISBN: 9789090304281 (print)
This paper explores hardware acceleration to significantly improve the runtime of computing the forward algorithm on Pair-HMM models, a crucial step in analyzing mutations in sequenced genomes. We describe 1) the design and evaluation of a novel accelerator architecture that can efficiently process real sequence data without performing wasteful work; and 2) aggressive memoization techniques that can significantly reduce the number of invocations of, and the amount of data transferred to, the accelerator. We describe our demonstration of the design on a Xilinx Virtex 7 FPGA in an IBM Power8 system. Our design achieves a 14.85x higher throughput than an 8-core CPU baseline (that uses SIMD and multi-threading) and a 147.49x improvement in throughput per unit of energy expended on the NA12878 sample.
ISBN: 9781424419609 (print)
FPGAs have become complex, heterogeneous platforms targeting a multitude of different applications. How a design maps to them and consumes various FPGA resources can be difficult to predict, so designers are typically forced to run full synthesis on each iteration of the design. For complex designs that involve many iterations and optimizations, the run-time of synthesis can be quite prohibitive. In this paper, we describe a fast and accurate method of estimating the FPGA resources of any RTL-based design. We achieve run-times that are more than 60 times faster than synthesis, with estimates that are on average within 22% of the actual mapped slices across a large benchmark suite targeting three different FPGA families. This resource estimator tool is first provided in Xilinx PlanAhead 10.1.
ISBN: 9782839918442 (print)
Convolutional neural networks (CNNs) are revolutionizing a variety of machine learning tasks, but they present significant computational challenges. Recently, FPGA-based accelerators have been proposed to improve the speed and efficiency of CNNs. Current approaches construct an accelerator optimized to maximize the overall throughput of iteratively computing the CNN layers. However, this approach leads to dynamic resource underutilization because the same accelerator is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator design that improves the dynamic resource utilization. Using the same FPGA resources, we build multiple accelerators, each specialized for specific CNN layers. Our design achieves 1.3x higher throughput than the state of the art when evaluating the convolutional layers of the popular AlexNet CNN on a Xilinx Virtex-7 FPGA.
ISBN: 9781728148847 (print)
In this paper we investigate using low-level loop analysis to identify common loop patterns in the netlist generated by the synthesis flow and use loop optimization techniques to increase Fmax of applications implemented on Xilinx FPGAs. Ordinarily, feed-forward paths in the netlist can be easily pipelined. The focus of this study is only sequential loops (with feedback cycles), which are more challenging to optimize. We show that, using low-level loop analysis, we can improve Fmax on average by 57% and achieve an average Fmax of 714 MHz across seven industrial designs. Using aggressive loop combining, we also show that we can save 18% area on average while still improving the Fmax by 15% to 41% on four of the seven designs.
ISBN: 9782839918442 (print)
FPGAs are promising platforms to efficiently execute distributed graph algorithms. Unfortunately, they are notoriously hard to program, especially when the problem size and system complexity increase. In this paper, we propose GraVF, a high-level design framework for distributed graph processing on FPGAs. It leverages the vertex-centric paradigm, which is naturally distributed and requires the user to define only very small kernels and their associated message semantics for the target application. The user design may subsequently be elaborated and compiled to the target system automatically by the framework. To demonstrate the flexibility and capabilities of the proposed framework, four graph algorithms with distinct requirements have been implemented, namely breadth-first search, PageRank, single-source shortest path, and connected components. Results show that the proposed framework is capable of producing FPGA designs with performance comparable to similar custom designs while requiring only minimal input from the user.
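The vertex-centric paradigm named in this abstract can be sketched in software for breadth-first search. This is only an illustration of the general idea (a small per-vertex kernel plus message passing); it is not GraVF's interface, and the names `bfs_kernel` and `run_vertex_centric` as well as the adjacency-dictionary graph format are hypothetical.

```python
from collections import defaultdict


def bfs_kernel(state, messages):
    """Per-vertex kernel: keep the smallest level seen so far.

    Returns (new_state, outgoing), where outgoing is the level to send to
    neighbours, or None if the vertex state did not improve.
    """
    best = min(messages)
    if state is None or best < state:
        return best, best + 1
    return state, None


def run_vertex_centric(graph, source):
    """Run the kernel in synchronous rounds until no messages remain.

    `graph` maps every vertex to a list of its out-neighbours (assumed
    format for this sketch); unreachable vertices keep the level None.
    """
    levels = {vertex: None for vertex in graph}
    inbox = {source: [0]}
    while inbox:
        next_inbox = defaultdict(list)
        for vertex, messages in inbox.items():
            levels[vertex], outgoing = bfs_kernel(levels[vertex], messages)
            if outgoing is not None:
                for neighbour in graph[vertex]:
                    next_inbox[neighbour].append(outgoing)
        inbox = next_inbox
    return levels


example_graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: []}
example_levels = run_vertex_centric(example_graph, source=0)
```

Only the kernel is specific to BFS; swapping it for a different update rule and message value is what lets the same driver express other algorithms in this style.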