ISBN: 9781424419609 (print)
An ATCA-based computation platform for data acquisition and trigger applications in nuclear and particle physics experiments has been developed. Each Compute Node (CN), which appears as a Field Replaceable Unit (FRU) in an ATCA shelf, features 5 Xilinx Virtex-4 FX60 FPGAs and up to 10 GBytes of DDR2 memory. Connectivity is provided by 8 optical links and 5 Gigabit Ethernet ports, which are mounted on each board to receive data from detectors and forward results to outer shelves or PC farms with attached mass storage. Fast point-to-point on-board interconnections between FPGAs, as well as the full-mesh shelf backplane, provide flexibility and high bandwidth to partition algorithms and correlate results among them. The system represents a highly reconfigurable and scalable solution for multiple applications.
ISBN: 9781424419609 (print)
In recent years, aside from fine-grained reconfigurable architectures such as FPGAs, coarse-grained reconfigurable architectures (CGRAs), which typically have building blocks of a fixed bit-width (8 bit, 16 bit, etc.), have gained importance in academia as well as in industry. CGRAs are usually used for domain-specific computations and have advantages over traditional FPGAs in terms of area and power cost, performance, and reconfiguration time. Thus, architectures with coarse-grained reconfiguration features have also been studied in projects (Sec. 1, 2, 4) within the priority program Reconfigurable Computing Systems and the project CoMap (Sec. 3), which are all sponsored by the German Science Foundation.
ISBN: 9781728148847 (print)
With the advent of machine learning as perhaps the most high-profile application area for FPGAs, there is a compelling reason to improve the provision of smaller precision arithmetic on these devices. INT8 is commonly used for AI inferencing and, along with some additional soft logic for exponent handling, can be an effective solution for training as well. This paper describes techniques for efficiently extracting INT8 multipliers from the INT18 multipliers commonly found in many modern FPGAs. A small amount of soft logic, as little as 7 ALMs per INT8 multiplier, is required to provide pre- or post-multiplier correction to calculate two INT8 multiplies from a single 18x18 multiplier. We present two configurations for both signed and unsigned representations where two multiplications share one input operand. In addition to the individual INT8 variants, we present full device cases of 22,400 INT8 multipliers organized as DOT32 product arrays, with the soft logic tightly bound to the INT18-based DSP blocks. A majority of the soft logic and routing in the device is left untouched and available for application development.
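The operand-sharing idea can be illustrated with a minimal Python sketch for the unsigned case. This is a model under stated assumptions, not the paper's correction circuit: the 10-bit packing offset, the function name `packed_int8_pair`, and the way the overlap term is obtained (computed directly here as a stand-in for the soft logic) are hypothetical choices made only for illustration.

```python
from typing import Tuple

SHIFT = 10                      # assumed packing offset: (a << 10) | b fits in 18 bits
LOW_MASK = (1 << SHIFT) - 1     # mask for the low 10 bits of the product


def packed_int8_pair(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (a*c, b*c) for unsigned 8-bit a, b, c using one wide multiply."""
    for v in (a, b, c):
        if not 0 <= v <= 0xFF:
            raise ValueError("operands must be unsigned 8-bit values")

    packed = (a << SHIFT) | b   # one 18-bit operand holding both a and b
    product = packed * c        # the single shared-operand multiplication

    # product == (a*c << 10) + b*c, and the 16-bit b*c overlaps a*c in
    # bits 10..15, so a correction term is needed to separate the results.
    low = product & LOW_MASK    # low 10 bits of b*c come out directly

    # Stand-in for the soft-logic correction (assumption): the top 6 bits
    # of b*c, computed directly here purely to keep the model short.
    overlap = (b * c) >> SHIFT

    ac = (product >> SHIFT) - overlap   # subtract the overlap to recover a*c
    bc = (overlap << SHIFT) | low       # reassemble b*c from its two parts
    return ac, bc


if __name__ == "__main__":
    print(packed_int8_pair(200, 150, 37))  # (7400, 5550)
```

The sketch only shows why a correction step is unavoidable when two 8-bit operands are packed into an 18-bit input: the partial products overlap. How the correction is implemented in 7 ALMs, and how the signed case is handled, is specific to the paper and is not reproduced here.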
ISBN: 9781728148847 (print)
ZytleBot is an autonomous driving robot with an FPGA-integrated development platform that uses the Xilinx programmable system-on-chip (SoC). ZytleBot can run a course, turn right or left at intersections, avoid obstacles, detect traffic signals, and stop. All judgments and calculations necessary for driving are performed on the embedded system mounted on the robot. In ZytleBot, the main autonomous driving system uses the Robot Operating System (ROS) running on a CPU, and high-load processing is offloaded to the FPGA to enable real-time operation. The FPGA preprocesses road surface images acquired from the camera and detects traffic signals. We demonstrate ZytleBot running on a miniature course to win the FPT'18 FPGA design competition (1). We also provide ZytleBot as a platform for the efficient development of FPGA-integrated ROS robots.
ISBN: 9789090304281 (print)
In this paper we present a framework for the seamless utilization of hardware accelerators in heterogeneous SoCs that are used to speed up the processing of Spark data analytics applications.
ISBN: 9781467381239 (print)
We propose embedding hard NoCs on FPGAs to improve system-level communication, as detailed in our previous studies [1-6]. This demo paper outlines the three main design and simulation tools that we have been using to experiment with embedded NoCs on FPGAs.
Modelling the interactions of biological molecules, or docking, is critical both to understanding basic life processes and to designing new drugs. The field-programmable gate array (FPGA) based acceleration of a recently developed, complex, production docking code is described. The authors found that it is necessary to extend their previous three-dimensional (3D) correlation structure in several ways, most significantly to support simultaneous computation of several correlation functions. The result for small-molecule docking is a 100-fold speed-up of a section of the code that represents over 95% of the original run-time. An additional 2% is accelerated through a previously described method, yielding a total acceleration of 36x over a single core and 10x over a quad-core. This approach is found to be an ideal complement to graphics processing unit (GPU) based docking, which excels in the protein-protein domain.
ISBN: 9782839918442 (print)
Microphone arrays are able to recognize, profile and locate sound sources in noisy environments, but their quality is determined by the number of microphones. A higher number of microphones increases the computational demand, making real-time response challenging. In this demo, we present a scalable and runtime-reconfigurable architecture able to support a variable number of microphones and orientations in order to provide accurate sound-source localization in real time.