ISBN (Print): 9781424410590
A novel routing fabric is introduced that offers high flexibility at significantly lower silicon cost compared to routing fabrics currently incorporated in many field-programmable gate array (FPGA) devices, IP cores, and IP-core wrappers. The novel fabric is entirely constructed from multiplexers and unidirectional point-to-point connections, controlled by configuration bits, and proves very efficient when mapping applications. For a fabric connecting 4-input Look-Up Tables, area savings of 60% are demonstrated when routing applications from the MCNC benchmark set.
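As an illustration of the kind of structure the abstract describes, the Python sketch below models a single multiplexer-based routing stage whose outputs are steered by configuration bits; the topology, class name, and encoding are illustrative assumptions, not the paper's actual fabric.

```python
# Minimal behavioral sketch of a multiplexer-based routing stage: each output
# selects one input via configuration bits over unidirectional point-to-point
# connections. Illustrative only; not the fabric topology from the paper.

class MuxRoutingStage:
    def __init__(self, num_inputs: int, num_outputs: int):
        self.num_inputs = num_inputs
        # One select value per output mux, set at configuration time.
        self.config = [0] * num_outputs

    def configure(self, output: int, selected_input: int) -> None:
        if not 0 <= selected_input < self.num_inputs:
            raise ValueError("selected input out of range")
        self.config[output] = selected_input

    def route(self, input_signals):
        # Each output is driven by exactly one input, chosen by its mux select.
        return [input_signals[sel] for sel in self.config]

stage = MuxRoutingStage(num_inputs=4, num_outputs=2)
stage.configure(output=0, selected_input=3)
stage.configure(output=1, selected_input=1)
print(stage.route([0, 1, 0, 1]))  # -> [1, 1]
```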
ISBN (Digital): 9781538685174
ISBN (Print): 9781538685174
Long-distance interconnect delays are not scaling well with process technology, so long routes strongly impact the critical path of large FPGA designs. This forces the designer to pipeline long connections, which necessitates time-consuming logic redesign in traditional latency-sensitive systems. Latency-insensitive design (LID) is an increasingly attractive alternative as the typical latency of long-distance interconnect grows, since LID decouples the design of the interconnect from that of the computational modules. By doing so, LID simplifies timing closure, improves forward compatibility (migration of systems to future FPGAs), and makes automated system-level pipelining feasible. Modern FPGAs, such as Stratix 10 with its pipelined interconnect, make it difficult to use traditional LID solutions without significant area and frequency overhead. We present two LID styles that are better suited to FPGAs and compare them to traditional LID. Our best system achieves 2x the area efficiency and 18% higher speed efficiency than traditional LID. Additionally, our designs incur a speed overhead of only 3% compared to a latency-sensitive design.
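To make the latency-insensitive idea concrete, here is a minimal Python model of a pipelined link whose depth can change without touching the producer or consumer; it is a generic sketch, not the specific LID styles proposed in the paper.

```python
from collections import deque

# Toy model of a latency-insensitive channel: tokens flow through a pipelined
# link whose depth can vary without modifying the modules at either end.

class PipelinedChannel:
    def __init__(self, depth: int):
        # Each pipeline register holds at most one token (valid data item).
        self.stages = deque([None] * depth, maxlen=depth)

    def step(self, new_token):
        # Advance the pipeline by one cycle; the oldest token pops out.
        out = self.stages[-1]
        self.stages.pop()
        self.stages.appendleft(new_token)
        return out

# The same producer/consumer work unmodified whether the link is 2 or 8 stages deep.
for depth in (2, 8):
    link = PipelinedChannel(depth)
    received = [link.step(i) for i in range(depth + 3)]
    print(depth, [t for t in received if t is not None])  # tokens arrive 'depth' cycles later
```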
ISBN (Digital): 9781538685174
ISBN (Print): 9781538685174
Inserting soft logic analyzers into FPGA circuits is a common way to provide signal visibility at run-time, helping users locate bugs in their designs. However, this can become infeasible for highly (70-90+%) utilized designs, which leave few logic resources or block RAMs available for internal logic analyzers. This paper presents a fast, low-impact method of enabling signal visibility in these situations using LUT-based distributed memory. Trace buffers are inserted post-PAR, allowing users to quickly change the set of observed nets. Results from routing-based experiments demonstrate that, even in highly utilized designs, many design signals can be observed with this technique.
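A rough Python sketch of the mechanism the abstract relies on: a circular trace buffer that captures a chosen set of nets each cycle, as a LUT-based distributed memory would in hardware. The depth, net names, and capture interface are illustrative assumptions; the paper's post-PAR insertion flow is not modeled.

```python
# Behavioral sketch of a circular trace buffer for a fixed set of observed nets.

class TraceBuffer:
    def __init__(self, depth: int, observed_nets):
        self.depth = depth
        self.observed_nets = list(observed_nets)
        self.samples = [None] * depth
        self.wr_ptr = 0

    def capture(self, design_state: dict) -> None:
        # Each cycle, store the current value of every observed net.
        self.samples[self.wr_ptr] = {n: design_state[n] for n in self.observed_nets}
        self.wr_ptr = (self.wr_ptr + 1) % self.depth  # wrap like a circular RAM

    def dump(self):
        # Oldest sample first, as a logic analyzer would present the waveform.
        return self.samples[self.wr_ptr:] + self.samples[:self.wr_ptr]

tb = TraceBuffer(depth=4, observed_nets=["state", "fifo_full"])
for cycle in range(6):
    tb.capture({"state": cycle % 3, "fifo_full": cycle >= 4})
print(tb.dump())  # the last 4 captured samples, oldest first
```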
ISBN (Print): 9789090304281
FPGAs are being incorporated into contemporary datacenters in order to improve computational capacity, power consumption, and processing latency. Efficiently integrating FPGAs in datacenters is, however, quite challenging. Ideally, smaller tasks could share a device, and the cloud management layer would be able to partially reconfigure the device to allocate its free resources to incoming tasks. Moreover, to facilitate FPGA hardware upgrades without undue porting effort for previously developed accelerator tasks, the complexities associated with board-specific system-level integration should be abstracted away from designers. By meeting these requirements, FPGAs in the cloud would become multi-user virtualized resources with increased availability and elasticity. The virtualization of FPGAs, however, comes with two major costs in current FPGAs: lower application operating frequency and extravagant use of routing resources. In this paper, we quantify the costs of FPGA virtualization and demonstrate that, for an FPGA that supports four independent tasks, virtualization reduces the average task frequency by 18% to 46% and increases wire usage to 2.6x. We also investigate the cause of these costs and show that the use of hard NoCs in future datacenter-optimized FPGAs would facilitate FPGA virtualization without sacrificing operating frequency or routing resources.
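As a back-of-the-envelope reading of the reported figures, the snippet below applies the 18-46% frequency loss and 2.6x wire-usage factor to an assumed 300 MHz, 100,000-wire baseline; the baseline numbers are purely hypothetical.

```python
# Apply the paper's reported virtualization costs to an assumed baseline design.

baseline_fmax_mhz = 300.0     # hypothetical non-virtualized task frequency
baseline_wires = 100_000      # hypothetical routing wire usage

for freq_penalty in (0.18, 0.46):                     # reported range of frequency loss
    virtualized_fmax = baseline_fmax_mhz * (1.0 - freq_penalty)
    print(f"penalty {freq_penalty:.0%}: {virtualized_fmax:.0f} MHz")

print(f"wire usage: {baseline_wires * 2.6:,.0f}")      # 2.6x the baseline routing demand
```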
SRAM-based field-programmable gate arrays (FPGAs) have an inherent capacity for defect tolerance. A simple scheme that exploits this potential in multiple-FPGA systems is proposed. The symmetry of the system is exploited to yield a large number of possible mappings of bitstreams onto FPGAs, which results in a high probability that at least one functional mapping exists. It is shown that the behaviour of a system built from a large number of defective FPGAs approaches that of the ideal defect-free system. Various interconnection topologies such as the tree, the crossbar, and a hybrid form are compared.
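The core probabilistic argument can be illustrated with a toy Monte Carlo model: with N interchangeable FPGAs there are N! candidate bitstream-to-device mappings, so some functional mapping usually exists even when any single placement is likely to hit a defect. The compatibility probabilities below are assumed values, not data from the paper.

```python
import random
from itertools import permutations

# Estimate the probability that at least one permutation of bitstreams onto
# FPGAs avoids every defect, given an assumed per-placement success rate.

def prob_some_mapping_works(n_fpgas: int, p_compatible: float, trials: int = 2000) -> float:
    successes = 0
    for _ in range(trials):
        # compat[i][j] is True if bitstream i avoids the defects of FPGA j.
        compat = [[random.random() < p_compatible for _ in range(n_fpgas)]
                  for _ in range(n_fpgas)]
        if any(all(compat[i][j] for i, j in enumerate(perm))
               for perm in permutations(range(n_fpgas))):
            successes += 1
    return successes / trials

print(prob_some_mapping_works(n_fpgas=4, p_compatible=0.7))  # single placements fail often...
print(prob_some_mapping_works(n_fpgas=4, p_compatible=0.9))  # ...yet some permutation usually works
```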
ISBN (Print): 9781728148847
We present the first open-source TensorFlow-to-FPGA tool capable of running state-of-the-art DNNs. Running TensorFlow on Amazon cloud FPGA instances, we provide competitive performance and higher accuracy compared to a proprietary tool, thus providing a public framework for research exploration in the DNN inference space. We also detail the optimizations needed to map modern DNN frameworks to FPGAs, provide a novel analysis of design tradeoffs for FPGA DNN accelerators, and present experiments across a range of DNNs.
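For context, the snippet below defines the kind of TensorFlow/Keras network such a flow would take as input; the tool's actual entry point, supported layer set, and export format are not given in the abstract, so this is only a representative model definition.

```python
import tensorflow as tf

# A small CNN of the sort a TensorFlow-to-FPGA mapping flow would consume.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()

# A saved model is a typical hand-off point to a hardware tool chain
# (the specific format expected by the tool is an assumption here).
model.save("cnn_for_fpga.h5")
```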
ISBN (Print): 9781728148847
Flow-in-Cloud (FiC) is an acceleration platform designed to present a virtual monolithic large FPGA image built from a number of mid-range, economical FPGAs. We will give a live demonstration of an acceleration example on FiC with 24 boards connected through the network.
ISBN (Print): 9789090304281
This paper presents the FISH (FPGA-Initiated Software-Handled) framework, which allows FPGA accelerators to make system calls to the Linux operating system in CPU-FPGA systems. A special FISH Linux kernel module running on the CPU provides a system call interface for FPGA accelerators, much like the ABI that exists for software programs. We provide a proof-of-concept implementation of this framework running on the Intel Cyclone V SoC device, and show that an FPGA accelerator can seamlessly make system calls as if it were the host program. We see the FISH framework being especially useful for high-level synthesis (HLS) by making it possible to synthesize software code that contains system calls.
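A purely hypothetical sketch of the hand-off such a framework implies: the accelerator writes a request record (syscall number plus arguments) into shared memory, and a software proxy performs the call on its behalf. The record layout, field widths, and write() example are assumptions; the real FISH interface is defined by its kernel module and is not reproduced here.

```python
import os
import struct

# Assumed request record: a 32-bit syscall number plus three 64-bit arguments.
REQUEST_FMT = "<I3Q"

def encode_request(syscall_no: int, *args: int) -> bytes:
    # What an accelerator might write into a shared-memory mailbox.
    padded = list(args) + [0] * (3 - len(args))
    return struct.pack(REQUEST_FMT, syscall_no, *padded)

def software_proxy(record: bytes, data: bytes) -> int:
    # The CPU-side handler decodes the record and issues the call.
    syscall_no, fd, _, length = struct.unpack(REQUEST_FMT, record)
    if syscall_no == 1:  # write(fd, buf, count) on Linux x86-64
        return os.write(fd, data[:length])
    raise NotImplementedError(syscall_no)

msg = b"hello from the accelerator\n"
record = encode_request(1, 1, 0, len(msg))  # write(stdout, ..., len)
software_proxy(record, msg)
```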
ISBN (Print): 9781728199023
A True Random Number Generator (TRNG) is an essential component for security applications of FPGAs. Its requirements include small logic area, high throughput, sufficient randomness backed by a mathematical model, and feasibility, i.e., ease of implementation. This paper focuses on TRNGs based on a Transition Effect Ring Oscillator (TERO) and presents a three-path configurable TERO (TC-TERO), an improved implementation of TERO that achieves high feasibility with a minimal amount of hardware. In an evaluation on a Xilinx Artix-7 FPGA, a TC-TERO with a 20-bit configurable parameter required only 40 LUTs. With one of the promising parameters selected, the proposed TRNG passed AIS-31 Procedure A without post-processing and NIST SP 800-22 with simple debiasing.
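The "simple debiasing" mentioned is illustrated below with Von Neumann debiasing, a classic lightweight corrector; whether TC-TERO uses exactly this scheme is not stated in the abstract.

```python
import random

# Von Neumann debiasing: examine non-overlapping bit pairs, keep the first bit
# of each unequal pair, and discard equal pairs. Output is unbiased if the raw
# bits are independent, at the cost of a reduced output rate.

def von_neumann_debias(raw_bits):
    out = []
    for a, b in zip(raw_bits[0::2], raw_bits[1::2]):
        if a != b:          # 01 -> 0, 10 -> 1; 00 and 11 are dropped
            out.append(a)
    return out

# A biased source (70% ones) still yields roughly balanced output bits.
raw = [1 if random.random() < 0.7 else 0 for _ in range(10_000)]
debiased = von_neumann_debias(raw)
print(len(debiased), sum(debiased) / len(debiased))  # close to 0.5
```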