ISBN: 9781728148847 (print)
With the introduction of Zynq FPGAs that provide an ARM SoC with an attached FPGA fabric, it is possible to build complex software-centric systems that are both software and hardware programmable. To harness the full potential of this approach, we developed FOS, an FPGA Operating System built on components from the open-source FPGA community and from the vendor, Xilinx. A distinct feature shown in this demo is a heterogeneous, resource-elastic scheduler that dynamically and automatically adjusts the allocation of tasks to hardware and software resources according to the current load. We will also show the FOS ecosystem, which makes it easy to implement relocatable, partially reconfigurable modules directly from RTL or HLS.
A torque sensor is a precision instrument that measures torque, speed, and mechanical power. To improve measurement accuracy and stability, a torsion sensor electrical signal acquisition system w...
A fixed-point divider computes the result of a division to a fixed number of digits in its fractional part. The divider does so with good accuracy so that the result can be used for further applica...
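As a rough illustration of what a fixed-point divider computes, the sketch below uses the textbook shift-and-subtract (restoring) scheme in Python. It is not the design from the abstract above; the function names and the `frac_bits` parameter are assumptions made for this example, and it handles only a non-negative dividend and a positive divisor.

```python
def fixed_point_divide(dividend: int, divisor: int, frac_bits: int) -> int:
    """Return floor(dividend / divisor * 2**frac_bits) as a raw fixed-point integer.

    Illustrative restoring division; not taken from the listed paper.
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("sketch handles dividend >= 0 and divisor > 0 only")
    # Pre-scale the dividend so the quotient carries frac_bits fractional bits.
    scaled = dividend << frac_bits
    quotient = 0
    remainder = 0
    # Process the scaled dividend one bit at a time, most significant bit first.
    for bit in range(scaled.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((scaled >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient


def to_float(raw: int, frac_bits: int) -> float:
    """Interpret a raw fixed-point integer as a real number."""
    return raw / (1 << frac_bits)
```

For example, `to_float(fixed_point_divide(7, 2, 4), 4)` evaluates 7 / 2 with four fractional bits. Increasing `frac_bits` trades extra iterations for a finer fractional resolution.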
ISBN: 9781728148847 (print)
True Random Number Generators (TRNGs) are essential in all security systems. Unfortunately, a large design effort is required to ensure that a TRNG design on a Field-Programmable Gate Array (FPGA) generates sufficient entropy density at its output. This effort stems from the fact that a manual placement and routing procedure has to be carried out for each FPGA family. On top of this often comes the additional effort of finding a suitable location inside the target FPGA, a search that has to be repeated for every device separately. In this demo, we show the operation of a novel entropy source for the Coherent Sampling Ring Oscillator (COSO) based TRNG. This entropy source eliminates the need for any manual intervention during the implementation process. It generates two oscillating signals that can be matched with a precision of a few picoseconds. A controller regulates this entropy source based on predefined bounds on the period length difference of the two oscillating signals.
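The relationship between the period difference and the coherent-sampling beat can be sketched in a few lines of Python. This is only a numerical illustration under simplifying assumptions (ideal, jitter-free oscillators whose periods are known in picoseconds); the function names, the list of per-setting periods, and the selection policy are hypothetical and do not describe the controller from the abstract.

```python
from typing import Optional, Sequence


def beat_count(t_sampled_ps: float, t_sampling_ps: float) -> float:
    """Sampling-clock cycles per beat period when one oscillator samples the other.

    The phase of the sampled signal advances by |dT| / T_sampled per sample,
    so one full beat takes T_sampled / |dT| samples.
    """
    diff = abs(t_sampled_ps - t_sampling_ps)
    if diff == 0:
        raise ValueError("identical periods produce no beat")
    return t_sampled_ps / diff


def pick_setting(periods_by_setting_ps: Sequence[float],
                 t_sampling_ps: float,
                 min_diff_ps: float,
                 max_diff_ps: float) -> Optional[int]:
    """Return the first (hypothetical) oscillator setting whose period
    difference to the sampling oscillator lies within the given bounds."""
    for setting, t_sampled_ps in enumerate(periods_by_setting_ps):
        diff = abs(t_sampled_ps - t_sampling_ps)
        if min_diff_ps <= diff <= max_diff_ps:
            return setting
    return None  # no setting satisfies the bounds
```

With a 1000 ps sampled oscillator and a 1005 ps sampling oscillator, `beat_count(1000.0, 1005.0)` gives 200 cycles per beat, which shows why a difference of only a few picoseconds yields a long, finely resolved beat period.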
ISBN: 9781728148847 (print)
Most internal FPGA debug methods require the use of Block-RAM (BRAM) memory for trace buffers. Recent work has shown the viability of replacing BRAMs with distributed, LUT-based memory. Distributed memory (DIME) trace buffers are lean and can be utilized in large designs where other debug methods are unlikely to fit. Since LUTs are abundant on FPGA devices, there are nearly always some left unused after the user's design is placed, even for designs that utilize more than 90% of the FPGA's resources. DIME trace buffers are inserted into highly utilized designs within minutes using RapidWright. In this paper we contrast the previously used method of scavenging leftover LUT resources with a preallocation scheme that ensures a certain amount of memory LUTs are left available for distributed memory trace buffers. While causing virtually no penalty to the user design, preallocating memory LUT resources allows the very largest designs to utilize higher numbers of distributed memory trace buffers at lower timing penalties. We also show that the depth of DIME trace buffers can be extended from 16 to 256 bits.
ISBN: 9783030343569 (digital); 9783030343569, 9783030343552 (print)
To keep an HPC cluster economically viable, serious cost limitations on hardware and software deployment must be considered, prompting researchers to reconsider the design of modern HPC platforms. In this paper we present a cross-layer communication architecture suitable for emerging HPC platforms based on heterogeneous multiprocessors. We propose simple hardware primitives that enable protected, reliable, and virtualized user-level communication and that can easily be integrated in the same package with the processing unit. Using an efficient user-space software stack, the proposed architecture provides low-latency communication mechanisms to HPC applications. Our implementation of the MPI standard that exploits these capabilities delivers point-to-point and collective primitives with low overheads, including an eager protocol with an end-to-end latency of 1.4 µs. We port and evaluate our communication stack using real HPC applications in a cluster of 128 ARMv8 processors that are tightly coupled with FPGA logic. The network interface primitives occupy less than 25% of the FPGA logic and only 3 Mbits of SRAM, while they can easily saturate the 16 Gb/s links in our platform.
ISBN: 9789492859143 (print)
FPGAs have proven their efficiency over other digital hardware platforms and DSP processors, and still outperform them, especially for real-time applications. This is due to the highly configurable hardware that FPGAs are built from, since they consist of very large numbers of reconfigurable gate structures. Other important figures of merit for FPGAs are MIPS (millions of instructions per second) and MMACS (millions of multiply-accumulate operations per second). Despite these strengths, there is a major drawback when processing very large data sets: the limited number of I/O pins of FPGA processors restricts the number of data samples that can be processed simultaneously. Different methods have been suggested to resolve this problem, such as segmentation techniques or the use of external RAM. Intelligent systems can determine the most suitable technique for large-scale data processing by drawing on experience and an extended knowledge base about segmentation techniques and partitioning algorithms. FPGA-based digital system design using different Xilinx processors also requires extensive experience with these processors, their special features and capabilities, and full information on the resources available in each processor. A smart expert system with a comprehensive database can therefore make decisions in two directions of processing strategy. The first is to choose an efficient processing technique for a specific large-data application. The second is to determine the most suitable Xilinx processor, one whose available resources and processing criteria, such as speed and operating conditions, match the application requirements without wasting processor components and utilities. An expert FPGA design is adopted in this paper, depending on the features and possibilities of the used FPGA kit, as well as the demanded features of the implemented system to make a genius segmentatio
ISBN: 9781728148847 (print)
Network security is increasing in importance as systems become more interconnected. Much research has been conducted on large appliances for network security, but these do not scale well to lightweight systems such as those used in the Internet of Things (IoT). Meanwhile, the low-power processors used in IoT devices do not have the performance required for detailed packet analysis. We present an approach to network intrusion detection using neural networks, implemented on FPGA SoC devices, that can achieve the required performance on embedded systems. The design is flexible, allowing model updates in order to adapt to emerging attacks.
ISBN: 9781728148847 (print)
Commodity FPGA boards with advanced networking facilities have great potential in the construction of high-performance compute clusters that scale. However, low-level design tools and long synthesis times are major barriers to productivity for application developers. In this paper, we explore the potential of a distributed soft-processor overlay, programmed in software at a high level of abstraction, to deliver a useful level of performance for FPGA clusters. In particular, we demonstrate the use of hardware multithreading to achieve a fast, space-efficient, high-throughput overlay, and compare a 12-FPGA instance of it (12,288 RISC-V threads) against a conventional Xeon cluster on the problem of distributed graph processing.
ISBN: 9781728148847 (print)
In this paper we investigate using low-level loop analysis to identify common loop patterns in the netlist generated by the synthesis flow, and we use loop optimization techniques to increase the Fmax of applications implemented on Xilinx FPGAs. Ordinarily, feed-forward paths in the netlist can be easily pipelined. This study therefore focuses only on sequential loops (with feedback cycles), which are more challenging to optimize. We show that, using low-level loop analysis, we can improve Fmax by 57% on average and achieve an average Fmax of 714 MHz across seven industrial designs. Using aggressive loop combining, we also show that we can save 18% area on average while still improving Fmax by 15% to 41% on four of the seven designs.
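One generic way to separate feedback loops from feed-forward paths in a netlist is to compute the strongly connected components of its connectivity graph. The Python sketch below uses Tarjan's algorithm for this purpose. It is an assumption made for illustration, not the analysis used in the paper, and the dictionary-based netlist representation and cell names are hypothetical.

```python
from typing import Dict, List


def find_feedback_loops(netlist: Dict[str, List[str]]) -> List[List[str]]:
    """Return groups of cells that lie on feedback cycles.

    `netlist` maps each cell name to the cells it drives (hypothetical format).
    Uses Tarjan's strongly connected components algorithm; recursive, so it is
    suited to small example graphs only.
    """
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack = set()
    loops: List[List[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in netlist.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop all of its members.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            # Feed-forward cells form single-node components without a self-edge.
            if len(component) > 1 or node in netlist.get(node, []):
                loops.append(sorted(component))

    for node in netlist:
        if node not in index:
            visit(node)
    return loops
```

For a toy netlist such as `{"in": ["add"], "add": ["reg"], "reg": ["add", "out"], "out": []}`, only the `add`/`reg` pair is reported, since `in` and `out` sit on feed-forward paths that could be pipelined directly.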