ISBN (print): 9781424438914
One of the obvious advantages of FPGA-based reconfigurable computing is the ability to customize the tradeoff point between performance and hardware cost. However, this tradeoff has rarely been discussed at the whole-application level, which is the view that matters most to application users. This paper presents an empirical evaluation of a hardware module sharing technique that can shift the area-performance tradeoff point of an FPGA-based biochemical simulator. The biochemical simulation results are discussed in terms of hardware cost, simulation throughput, parallelism extracted in the simulation hardware, and data transfer overheads.
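To make the kind of tradeoff curve such an evaluation explores concrete, the following sketch models area and relative throughput as a function of how many physical modules are instantiated and time-multiplexed; the module sizes, operation counts, and the cost model itself are illustrative assumptions, not figures from the paper.

# Illustrative area/throughput model for hardware module sharing (hypothetical numbers).
def shared_design_point(num_modules, ops_per_step=64, area_per_module=1200, base_area=5000):
    """Return (area in LUTs, relative throughput) when num_modules physical
    modules are time-multiplexed over ops_per_step operations per simulation step."""
    area = base_area + num_modules * area_per_module
    # Each step needs ceil(ops_per_step / num_modules) sequential passes through the shared modules.
    passes = -(-ops_per_step // num_modules)
    throughput = 1.0 / passes  # relative to a fully parallel (unshared) design
    return area, throughput

for m in (1, 4, 16, 64):
    area, tput = shared_design_point(m)
    print(f"{m:3d} modules: area={area:6d} LUTs, relative throughput={tput:.3f}")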
ISBN (print): 9781424419609
This paper considers the implementation of an annealing technique for dynamic power reduction in FPGAs. The proposed method comprises a power-aware objective function for placement and is implemented in a commercial tool. In particular, a capacitance model based on multi-dimensional nonlinear regression is described, as well as a new capacitance model for global nets. The importance and advantages of these models are highlighted in terms of the overall attainable power reduction in a real, commercially available architecture and tool flow. The results are quantified across a range of industrial benchmarks targeting the Actel® IGLOO™ FPGA architecture. Power measurements show that, across a suite of 120 industrial designs, the technique described in this paper reduces dynamic power by 13% on average, with only a 1% degradation in timing performance.
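As a minimal sketch of how a power-aware objective can be folded into annealing-based placement, the snippet below combines a wirelength term with a dynamic-power estimate driven by a regression-style capacitance model; the feature set, coefficients, activity values, and weights are hypothetical, not the models described above.

import math, random

def net_capacitance(fanout, bbox_halfperim, coeffs=(0.8, 0.35, 0.02)):
    """Hypothetical nonlinear regression model: C = a + b*HPWL + c*fanout^1.5 (pF)."""
    a, b, c = coeffs
    return a + b * bbox_halfperim + c * fanout ** 1.5

def dynamic_power(nets, vdd=1.2, freq=100e6):
    """P = sum over nets of alpha * C * Vdd^2 * f, with per-net switching activity alpha."""
    return sum(n["activity"] * net_capacitance(n["fanout"], n["hpwl"]) * 1e-12 * vdd**2 * freq
               for n in nets)

def anneal_cost(nets, w_wire=1.0, w_power=0.5):
    """Placement cost mixing total wirelength with estimated dynamic power."""
    wirelength = sum(n["hpwl"] for n in nets)
    return w_wire * wirelength + w_power * dynamic_power(nets) * 1e3  # scale mW into cost units

def accept(delta_cost, temperature):
    """Standard Metropolis acceptance rule used by simulated annealing."""
    return delta_cost <= 0 or random.random() < math.exp(-delta_cost / temperature)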
ISBN (digital): 9781538685174
ISBN (print): 9781538685174
We present our latest FPGA acceleration card, NFB-200G2QL, which is specifically designed to enable traffic processing at 200 Gbps. Unique high-speed DMA engines in the FPGA, together with highly optimized Linux drivers, enable data transfer through PCIe interfaces with minimal CPU overhead. Captured traffic can be independently distributed between individual cores of two physical CPUs (NUMA nodes) without utilizing QPI. As a result, wire-speed packet capture to host memory from two fully saturated 100 Gbps Ethernet interfaces (QSFP28 cages) is achieved, and various network monitoring applications can utilize the power of the latest FPGAs and CPUs for data processing. This is especially useful when both directions of a single 100GbE link are monitored. The live demonstration shows how packets are received from two 100 Gbps Ethernet links at wire speed and captured to host memory at 200 Gbps without loss. The opposite direction of communication is also shown, i.e., how packets are transmitted from host memory and fully saturate the two 100GbE network interfaces. The achieved speeds are demonstrated by counters and gauges showing generated, received/transmitted, and captured packets. We also show CPU load statistics during packet capture/transmission for different packet lengths.
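A rough illustration of the NUMA-aware distribution idea follows, assuming flows are hashed so that each flow lands on a single core and each CPU only reads DMA rings placed in its local memory; the hash choice, node and core counts, and function names are illustrative, not the card's actual driver logic.

# Illustrative flow-to-core distribution across two NUMA nodes (hypothetical parameters).
import zlib

NUMA_NODES = 2
CORES_PER_NODE = 8

def dispatch(src_ip: str, dst_ip: str, src_port: int, dst_port: int):
    """Hash the 4-tuple so every flow sticks to one core, and all cores of a node
    only read DMA ring buffers allocated in that node's local memory."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    h = zlib.crc32(key)
    node = h % NUMA_NODES                      # which physical CPU receives the flow
    core = (h // NUMA_NODES) % CORES_PER_NODE  # which core on that CPU
    return node, core

print(dispatch("10.0.0.1", "10.0.0.2", 443, 51512))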
ISBN (print): 9789090304281
Heterogeneous platforms that include diverse architectures such as multicore CPUs, FPGAs, and GPUs are becoming very popular due to their superior performance and energy efficiency. Besides heterogeneity, a promising approach to minimizing energy consumption is approximate computing, which relaxes the requirement that all parts of a program are equally important to the output quality and must therefore all be executed at full accuracy. Our work extends a traditional OpenMP-like programming model and runtime system to support seamless execution on hybrid architectures with approximation semantics. Starting from a common application code annotated with our programming model, the programmer can not only target heterogeneous architectures comprising CPU, FPGA, and GPU components, but also regulate the amount of approximation. We evaluate our framework on a number of large-scale applications and demonstrate that the combination of heterogeneous and approximate computing provides a powerful dynamic interplay between performance and output quality.
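As a conceptual analogue of such annotations (written in plain Python rather than the paper's OpenMP-like directives, with made-up names), the sketch below tags a task with a significance level and lets a runtime quality knob decide whether the exact or the approximate variant runs.

# Conceptual analogue of significance-annotated tasks (not the paper's actual syntax).
import math

def task(significance, approx=None):
    """Decorator: run the exact body if the task's significance meets the global
    quality knob, otherwise fall back to the cheaper approximate variant."""
    def wrap(exact):
        def run(*args, quality=1.0, **kwargs):
            if significance >= quality or approx is None:
                return exact(*args, **kwargs)
            return approx(*args, **kwargs)
        return run
    return wrap

def fast_inv_sqrt(x):
    return 1.0 / math.sqrt(x)  # stand-in for a cheaper, less accurate kernel

@task(significance=0.3, approx=fast_inv_sqrt)
def inv_sqrt(x):
    return x ** -0.5           # "exact" reference implementation

print(inv_sqrt(2.0, quality=0.8))  # low-significance task -> approximate variant is chosen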
ISBN (print): 9798331530082; 9798331530075
Understanding how FPGAs age and how to control that aging is crucial for ensuring the reliability and security of FPGAs in critical applications. Due to the proprietary nature of commercial FPGAs, it can be challenging to validate aging models on real silicon, and most previous work has relied on circuit simulations to study the effects of FPGA aging. In this work, we leverage low-level placement and routing APIs provided by RapidWright to create a series of stressor and characterization circuits that allow us to measure the effects of aging on individual LUTs and routing resources in a 28nm FPGA. We demonstrate how these techniques allow fine-grained control of the relative aging of different FPGA resources, even to the point of aging individual paths within a single LUT. Several different aging experiments are demonstrated, and in a cumulative test, we show how different signal and LUT configurations can influence the aging rate by over 2x.
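One common way such characterization circuits quantify aging is by comparing ring-oscillator frequencies before and after stress; the short calculation below illustrates that comparison with made-up frequencies and does not reproduce the paper's RapidWright-based measurement flow.

# Illustrative aging-rate comparison from oscillator frequencies (hypothetical data).
def degradation(f_fresh_hz, f_aged_hz):
    """Fractional frequency loss of a characterization oscillator after stress."""
    return (f_fresh_hz - f_aged_hz) / f_fresh_hz

# Two LUT configurations stressed for the same wall-clock time (made-up numbers).
config_a = degradation(412.0e6, 404.7e6)   # e.g. inputs held at an aging-heavy static pattern
config_b = degradation(411.5e6, 408.1e6)   # e.g. inputs toggled to spread the stress

print(f"config A degraded {config_a:.2%}, config B degraded {config_b:.2%}")
print(f"relative aging rate A/B = {config_a / config_b:.1f}x")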
ISBN (print): 9781424438914
As FPGA technology and related EDA tools develop, design IP protection and licensing require increasing consideration. The current multi-player, Partial-Reconfiguration (PR) design flow does not facilitate bitstream-level IP core license enforcement, e.g., time-limited or pay-per-use licensing. This paper proposes the use of a Secure Reconfigurable Controller (SeReCon) for accounting of IP core usage in a PR system, e.g., total runtime, number of activations, etc. This paper extends the previously reported SeReCon root-of-trust to support license enforcement within the PR flow and to facilitate confidentiality of the IP core throughout the PR system life-cycle. A prototype IP-aware SeReCon demonstrator, implemented on a Virtex-5 and supporting reconfiguration of a PCIe accelerator with cryptographic IP cores, is described.
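A hypothetical sketch of the per-core usage record such accounting implies is shown below, with illustrative license limits on total runtime and activation count; the field names and policy are assumptions, not SeReCon's actual data structures.

# Hypothetical per-IP-core usage record of the kind a license-enforcing controller might keep.
import time
from dataclasses import dataclass, field

@dataclass
class CoreUsage:
    core_id: str
    max_activations: int          # license terms (illustrative)
    max_runtime_s: float
    activations: int = 0
    runtime_s: float = 0.0
    _started: float = field(default=0.0, repr=False)

    def activate(self):
        if self.activations >= self.max_activations or self.runtime_s >= self.max_runtime_s:
            raise PermissionError(f"license exhausted for {self.core_id}")
        self.activations += 1
        self._started = time.monotonic()

    def deactivate(self):
        self.runtime_s += time.monotonic() - self._started

usage = CoreUsage("aes_gcm_core", max_activations=100, max_runtime_s=3600.0)
usage.activate(); usage.deactivate()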
ISBN (digital): 9781538685174
ISBN (print): 9781538685174
FPGAs are rising in popularity for acceleration in all kinds of systems. However, even in cloud environments, FPGA devices are typically still used exclusively by one application. To overcome this, and as an approach to managing FPGA resources with OS functionality, this paper introduces the concept of resource-elastic virtualization, which allows shrinking and growing of accelerators in the spatial domain with the help of partial reconfiguration. With this, we can serve multiple applications simultaneously on the same FPGA and optimize resource utilization and, consequently, overall system performance. We demonstrate how an implementation of resource elasticity can be realized for OpenCL accelerators, achieving 23x better FPGA utilization and 49% better performance on average while simultaneously lowering waiting times for tasks.
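As a simple illustration of resource elasticity, the sketch below redistributes a fixed pool of partially reconfigurable regions among running applications in proportion to their pending work; the pool size, policy, and application names are hypothetical.

# Illustrative allocator for resource-elastic accelerators over a fixed pool of PR regions.
TOTAL_PR_REGIONS = 8

def rebalance(apps):
    """Give every running application at least one region, then hand out the
    remaining regions roughly in proportion to each application's pending work."""
    alloc = {name: 1 for name in apps}
    spare = TOTAL_PR_REGIONS - len(apps)
    total_work = sum(apps.values()) or 1
    for name, work in sorted(apps.items(), key=lambda kv: -kv[1]):
        extra = min(spare, round(spare * work / total_work))
        alloc[name] += extra
        spare -= extra
    return alloc

print(rebalance({"fft": 120, "crypto": 40, "filter": 20}))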
ISBN (print): 9782839918442
FPGAs can provide high performance and energy efficiency to many applications; therefore, they are attractive computing platforms in a cloud environment. However, FPGA application development requires extensive hardware design knowledge, which significantly limits the potential user base. Moreover, in a cloud setting, allocating a whole FPGA to a user is often wasteful and not cost effective due to low device utilization. To make FPGA application development easier, we first propose a methodology that provides clean abstractions with high-level APIs and a simple execution model that supports both software and hardware execution. Second, to improve device utilization and share the FPGA among multiple users, we developed a lightweight runtime system that provides hardware-assisted memory virtualization and memory protection, enabling multiple applications to execute simultaneously on the device.
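The following sketch illustrates the flavor of such an abstraction: a single allocate/run call sequence that works against a software functional model and could be redirected to a shared, memory-virtualized FPGA backend; the class and method names are invented for illustration and are not the paper's API.

# Hypothetical high-level API: one execution model for software and hardware backends.
class Accelerator:
    def __init__(self, backend="software"):
        self.backend = backend            # "software" or "fpga" (names illustrative)

    def alloc(self, nbytes):
        # On the FPGA backend this would map into a protected, virtualized region
        # of device memory owned by this tenant; here it is just a host buffer.
        return bytearray(nbytes)

    def run(self, kernel, *buffers):
        if self.backend == "software":
            return kernel(*buffers)       # functional model executed on the CPU
        raise NotImplementedError("hardware dispatch is outside this sketch")

def scale_by_two(buf):
    for i, b in enumerate(buf):
        buf[i] = (b * 2) & 0xFF
    return buf

acc = Accelerator()
data = acc.alloc(4); data[:] = b"\x01\x02\x03\x04"
print(acc.run(scale_by_two, data))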
ISBN (print): 9781424438914
This paper advocates the use of 3D integration technology to stack a DRAM on top of an FPGA. The DRAM stores future FPGA contexts. A configuration is read from the DRAM into a latch array on the DRAM layer while the FPGA executes; the new configuration is then loaded from the latch array into the FPGA in 60 ns (5 cycles). The latency between reconfigurations, 8.42 μs, is dominated by the time to read data from the DRAM into the latch array. We estimate that the DRAM can cache 289 FPGA contexts.
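The quoted figures decompose as follows; the implied configuration clock rate is an inference from the numbers above, not a value stated in the abstract:

t_{\text{load}} = 5 \times 12\,\text{ns} = 60\,\text{ns} \;\Rightarrow\; f_{\text{cfg}} \approx 83\,\text{MHz}, \qquad t_{\text{DRAM}\to\text{latch}} \approx 8.42\,\mu\text{s} - 0.06\,\mu\text{s} = 8.36\,\mu\text{s}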
ISBN (print): 9781424419609
A post-beamforming second-order Volterra filter (SOVF) was previously introduced for decomposing the pulse-echo ultrasonic radio-frequency (RF) signal into its linear and quadratic components. Using singular value decomposition (SVD), an optimal minimum-norm least-squares algorithm for deriving the coefficients of the linear and quadratic kernels of the SOVF was developed and verified. A "separable" implementation algorithm for the SOVF, based on the eigenvalue decomposition (EVD) of the quadratic kernel, was then introduced and verified. In this paper, the separable version of the second-order Volterra filter is implemented on a Xilinx Virtex-E FPGA. The parallel operation, efficient use of instructions per task, and data-streaming capabilities of the FPGA are identified. This implementation should allow real-time quadratic filtering on commercial ultrasound scanners.
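For reference, the standard form of the second-order Volterra filter and the separable structure the EVD yields are (generic notation with memory length N assumed; not taken verbatim from the paper):

y[n] = \sum_{i=0}^{N-1} h_1[i]\,x[n-i] + \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} h_2[i,j]\,x[n-i]\,x[n-j]

Writing the symmetric quadratic kernel as H_2 = \sum_k \lambda_k\, v_k v_k^{\mathsf{T}}, the quadratic term becomes \sum_k \lambda_k\,(v_k^{\mathsf{T}} \mathbf{x}_n)^2 with \mathbf{x}_n = [x[n], \ldots, x[n-N+1]]^{\mathsf{T}}, i.e. a bank of linear FIR filters followed by squaring and weighted summation, a structure that maps directly onto the FPGA's parallel multiply-accumulate resources.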