SRAM-based field-programmable gate arrays (FPGAs) have an inherent capacity for defect tolerance. A simple scheme that exploits this potential in multiple-FPGA systems is proposed. The symmetry of the system is exploited to yield a large number of possible mappings of bitstreams onto FPGAs, which results in a high probability that at least one functional mapping exists. It is shown that the behaviour of a system built using a large number of defective FPGAs approaches that of the ideal defect-free system. Various interconnection topologies, such as the tree, the crossbar and a hybrid form, are compared.
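The claim that many possible mappings make a functional one likely can be illustrated with a small simulation. The sketch below is not the scheme from the paper: it assumes a hypothetical defect model in which every FPGA has the same number of resources, each resource is defective independently with probability `p_defect`, each bitstream uses a random subset of resources, and full symmetry lets any bitstream run on any FPGA. Under those assumptions, a functional mapping exists exactly when the bitstream/FPGA compatibility graph has a perfect matching.

```python
import random


def has_functional_mapping(compat):
    """compat[b] lists the FPGAs on which bitstream b works.

    Returns True if every bitstream can be given its own working FPGA
    (a perfect bipartite matching, found with augmenting paths).
    """
    owner = {}  # fpga index -> bitstream currently assigned to it

    def try_place(b, seen):
        for f in compat[b]:
            if f in seen:
                continue
            seen.add(f)
            # Take a free FPGA, or evict its owner if the owner can move.
            if f not in owner or try_place(owner[f], seen):
                owner[f] = b
                return True
        return False

    return all(try_place(b, set()) for b in range(len(compat)))


def estimate_yield(n_fpgas, n_resources, used_per_bitstream, p_defect,
                   trials, seed=0):
    """Monte Carlo estimate of P(at least one functional mapping exists).

    All parameters describe the hypothetical defect model, not the paper's.
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        defects = [{r for r in range(n_resources) if rng.random() < p_defect}
                   for _ in range(n_fpgas)]
        needs = [set(rng.sample(range(n_resources), used_per_bitstream))
                 for _ in range(n_fpgas)]
        # Bitstream b works on FPGA f if it touches no defective resource.
        compat = [[f for f in range(n_fpgas) if not (needs[b] & defects[f])]
                  for b in range(n_fpgas)]
        ok += has_functional_mapping(compat)
    return ok / trials
```

Calling `estimate_yield` with increasing `n_fpgas` at a fixed `p_defect` is one way to explore how the number of available mappings affects system yield under this toy model.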
ISBN: 9789090304281 (print)
A ring oscillator physical unclonable function (RO PUF) is an application-constrained hardware security primitive that can be used for authentication and key generation. PUFs depend on variability during the fabrication process to produce random outputs that are nevertheless stable across multiple measurements. Unfortunately, RO PUFs are known to be unstable, especially when implemented on a field-programmable gate array (FPGA). In this work, we comprehensively evaluate the RO PUF's stability on FPGAs, and we propose a phase calibration process to improve the stability of RO PUFs. The results show that the bit errors in our PUFs are reduced to less than 1%.
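As a rough illustration of how an RO PUF derives bits and how its stability is measured, the sketch below compares the frequencies of disjoint oscillator pairs and computes the bit error rate between a reference response and a re-measurement. The pairing scheme, the Gaussian noise model and all names are assumptions made for illustration; the phase calibration process proposed in the paper is not modelled here.

```python
import random


def puf_response(freqs):
    """One bit per disjoint oscillator pair: 1 if the first one is faster."""
    return [1 if freqs[i] > freqs[i + 1] else 0
            for i in range(0, len(freqs) - 1, 2)]


def measure(true_freqs, noise_sigma, rng):
    """Hypothetical measurement: true frequency plus Gaussian noise."""
    return [f + rng.gauss(0.0, noise_sigma) for f in true_freqs]


def bit_error_rate(reference, response):
    """Fraction of response bits that differ from the reference bits."""
    diffs = sum(a != b for a, b in zip(reference, response))
    return diffs / len(reference)


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed process variation around a nominal 200 MHz oscillator.
    chip = [rng.gauss(200.0, 1.0) for _ in range(256)]
    reference = puf_response(chip)
    noisy = puf_response(measure(chip, 0.1, rng))
    print("bit error rate:", bit_error_rate(reference, noisy))
```

Pairs whose frequencies lie close together are the ones most likely to flip under noise, which is why stability is a central concern for this PUF type.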
ISBN: 9789090304281 (print)
Physically unclonable functions are used for IP protection, hardware authentication and supply chain security. While many PUF constructions have been put forward in the past decade, only a few of them are applicable to FPGA platforms. Strict constraints on placement and routing are the main disadvantage of the existing PUFs on FPGAs, because they demand considerable effort from the designer. In this paper we propose a new delay-based PUF construction, called the Monte Carlo PUF, that does not require low-level placement and routing control. This construction relies on an on-chip Monte Carlo method that is applied to measure the delays of logic elements in order to extract a unique device fingerprint. The proposed construction allows a trade-off between the evaluation time and the error rate. The Monte Carlo PUF is implemented and evaluated on Xilinx Spartan-6 FPGAs.
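The trade-off between evaluation time and error rate is typical of Monte Carlo estimation in general. The sketch below is a generic software analogy, not the circuit from the paper: it assumes a signal that settles `true_delay` after the start of a clock period and is sampled at uniformly random instants, so the fraction of samples taken before settling estimates the delay. The function name and parameters are hypothetical.

```python
import random


def estimate_delay(true_delay, period, n_samples, rng):
    """Estimate a delay from random sampling instants within one period.

    A sample is counted as 'early' if it falls before the signal has
    settled; the early fraction times the period estimates the delay.
    """
    early = sum(rng.uniform(0.0, period) < true_delay
                for _ in range(n_samples))
    return period * early / n_samples


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (100, 1000, 10000):
        print(n, estimate_delay(3.0, 10.0, n, rng))
```

Because the early count is binomial, the standard error of the estimate shrinks in proportion to one over the square root of `n_samples`: halving the error costs roughly four times as many samples, and hence four times the evaluation time.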
ISBN: 9789090304281 (print)
Many CPU design houses, including Intel, IBM, and ARM, have added dedicated support for cryptography in recent processor generations. While adding accelerators and/or dedicated instructions boosts performance on cryptography, we are investigating a different approach that does not add extra silicon area: we study replacing the hardened NEON SIMD unit of an ARM Cortex-A9 with an identically sized FPGA fabric, called an interlay. This fabric will be used for implementing cryptographic instructions in soft logic. We show that this approach can outperform the hardened NEON by up to 7.7x on AES and provide functionality that is not available in the hardened ARM.
ISBN: 9781728199023 (print)
Chromatic dispersion is one of the error sources limiting the transmission capacity in coherent optical communication, and it can be mitigated with digital signal processing. In this paper, the current status of and plans for implementing chromatic dispersion compensation (CDC) filters on FPGAs are discussed. As these high-speed filters are most efficiently implemented in the frequency domain, different approaches for high-speed FFT-based architectures are considered, and preliminary results of a fully parallel FFT implementation that utilizes FPGA hardware features are presented.
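To make the frequency-domain approach concrete, the sketch below models dispersion as an all-pass filter with a quadratic phase and undoes it by multiplying with the complex conjugate response. It is a toy model under stated assumptions: `beta` is a hypothetical lumped dispersion coefficient in units of the frequency bin index, a naive DFT stands in for the FFT, and the filtering is circular over a single block, whereas a streaming implementation would typically use overlap-save or overlap-add.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def quadratic_phase_filter(n, beta):
    """All-pass response exp(-j * beta * f^2) over signed bin indices f."""
    freqs = [k if k < n // 2 else k - n for k in range(n)]
    return [cmath.exp(-1j * beta * f * f) for f in freqs]


def apply_filter(x, h):
    """Filter one block in the frequency domain (circular convolution)."""
    spectrum = dft(x)
    return dft([a * b for a, b in zip(spectrum, h)], inverse=True)


if __name__ == "__main__":
    signal = [complex(i % 4 - 1.5, (i * 3) % 5 - 2) for i in range(16)]
    h = quadratic_phase_filter(len(signal), beta=0.05)
    dispersed = apply_filter(signal, h)
    restored = apply_filter(dispersed, [v.conjugate() for v in h])
    print(max(abs(a - b) for a, b in zip(signal, restored)))
```

The per-bin multiplication is what makes the frequency-domain form attractive: the cost of the filter itself does not grow with the length of the dispersion response, leaving the forward and inverse transforms as the dominant cost.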
ISBN: 9789090304281 (print)
Stencil computations represent a highly recurrent class of algorithms in various high-performance computing scenarios. The Streaming Stencil Time-step (SST) architecture is a recent implementation of stencil computations on field-programmable gate arrays (FPGAs). In this paper, we propose an automated framework for SST-based architectures capable of achieving the maximum performance level for a given FPGA device through 1) maximization of the number of basic modules instantiated in the design and 2) optimization of the design floorplanning. Experimental results show that the proposed approach reduces design time by up to 15x with respect to naive design space exploration approaches, and improves performance by 13%.
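For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed neighbourhood and repeats this over many time steps. The sketch below shows a generic 1D three-point averaging stencil with fixed boundaries; it is chosen only for illustration and does not model the SST architecture or the proposed framework.

```python
def stencil_step(grid):
    """One time step of a 3-point averaging stencil; boundaries are fixed."""
    out = grid[:]
    for i in range(1, len(grid) - 1):
        out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out


def run_time_steps(grid, steps):
    """Apply the stencil repeatedly, one full sweep per time step."""
    for _ in range(steps):
        grid = stencil_step(grid)
    return grid


if __name__ == "__main__":
    print(run_time_steps([0.0, 0.0, 9.0, 0.0, 0.0], 2))
```

Each sweep depends only on the previous one, so consecutive time steps form a natural pipeline. That dependency structure is what makes instantiating many per-time-step modules on one device worthwhile.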
ISBN: 9781424410590 (print)
The traditional approach to FPGA packing and CLB-level placement has been shown to yield significantly worse quality than approaches which allow BLEs to move during placement. In practice, however, modern FPGA architectures require expensive DRC checks which can render full BLE-level placement impractical. We address this problem by proposing a novel clustering framework that uses physical information to produce better initial packings which can, in turn, reduce the amount of BLE-level placement that is required. We quantify our packing technique across accepted benchmarks and show that it produces results with 16% less wire length, 19% smaller minimum channel widths, and 8% less critical path delay, on average, than leading methods.
ISBN: 9781728199023 (print)
FPGA hardware accelerators have recently enjoyed significant attention as platforms for further accelerating computation in the datacenter, but they potentially add layers of hardware and software interfacing that can increase communication latency. In this paper, we characterize these overheads for streaming applications where latency can be an important consideration. We examine the latency and throughput characteristics of traditional server-based PCIe-connected accelerators, and of the more recent approach of network-attached FPGA accelerators. We also quantify the extra overhead introduced by virtualising accelerators on FPGAs.
ISBN: 9781467381239 (print)
Finding placement locations for modules on an FPGA in a limited amount of time is a crucial task that determines the efficiency of a dynamic partially reconfigurable system. In this work, we define a placement method based on transforming the inherent two-dimensional (2D) structure of the FPGA into a one-dimensional string and employing string matching. Moreover, our model is suited to computing a module placement over multiple chained reconfigurable regions. Our algorithm is based on a hybrid approach consisting of an offline precompute phase at design time, which in turn is used to speed up module placement at run time.
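The idea of placement by string matching can be sketched as follows. The encoding here is an assumption made for illustration, not the one used in the paper: each FPGA column is represented by one character for its resource type (`C` for logic, `B` for block RAM, `D` for DSP), a module is the string of column types it needs, and a placement is valid wherever the module string occurs in the fabric string and all covered columns are free.

```python
def free_positions(fabric, occupied, module):
    """Return start columns where the module fits.

    fabric   -- string with one resource-type character per column
    occupied -- list of booleans, True where a column is already in use
    module   -- string of column types required by the module
    """
    positions = []
    start = fabric.find(module)
    while start != -1:
        if not any(occupied[start:start + len(module)]):
            positions.append(start)
        # Continue from the next column so overlapping matches are found.
        start = fabric.find(module, start + 1)
    return positions


if __name__ == "__main__":
    fabric = "CCBCCDCCBCC"
    occupied = [False] * len(fabric)
    print(free_positions(fabric, occupied, "CBC"))
```

In a hybrid scheme of the kind described, the matching positions for each module could be computed once at design time, leaving only the occupancy check for run time.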