ISBN: 9781467381239 (print)
All-digital radios allow the full digitalization of the radio system, an important step towards the complete software description of RF signals proposed in software-defined radio (SDR). By using digital signal processing techniques and integrating the radio into a single digital chip, these systems are expected to offer the high flexibility that will be fundamental for the next generation of wireless networks. In addition, FPGA-based architectures take advantage of the high and heterogeneous processing power of modern FPGAs, as well as the dynamic configurability needed for highly flexible radio transceivers.
ISBN: 9781424419609 (print)
In image processing, FPGAs have shown very high performance in spite of their low operating frequency. This performance comes from (1) the high parallelism of image processing applications, (2) the high ratio of 8-bit operations, and (3) the large number of internal memory banks on FPGAs, which can be accessed in parallel. Recent microprocessors can execute SIMD instructions on 128-bit data in one clock cycle. Furthermore, these processors provide multiple cores and large cache memories that can hold all the image data for each core. In this paper, we compare the performance of FPGAs with that of these processors using three image processing applications: two-dimensional filters, stereo vision and k-means clustering. We show how fast an FPGA is in image processing and how many hardware resources are required to achieve that performance.
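As an illustration of the first two points (parallelism and 8-bit arithmetic), the following is a minimal Python sketch of a two-dimensional filter. It is not the implementation evaluated in the paper; the function name `filter_2d` and the clamping to the 0..255 range are assumptions made for this example. Every output pixel depends only on a small window of 8-bit inputs, so all iterations of the two outer loops are independent and could be computed in parallel by hardware.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit pixel values (0..255)


def filter_2d(image: Image, kernel: List[List[int]]) -> Image:
    """Apply a 2D filter (correlation, no kernel flip) over the valid region.

    Hypothetical helper for illustration only. Each output pixel is an
    independent multiply-accumulate over a kh x kw window.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out: Image = []
    for y in range(h - kh + 1):          # independent per output row
        row = []
        for x in range(w - kw + 1):      # independent per output pixel
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            # Clamp back to the 8-bit range (an assumption of this sketch).
            row.append(max(0, min(255, acc)))
        out.append(row)
    return out
```

On an FPGA, the window pixels would typically come from several memory banks read in the same cycle, which is the third point made in the abstract.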
ISBN: 9781665473903 (print)
We introduce SERVE, a cloud platform for agile hardware-software co-design that integrates a cloud IDE and cloud FPGAs. SERVE lets users focus on logic design without the hassle of setting up FPGA tools and a development environment. Users can write and simulate hardware logic in the cloud IDE and then generate bitstream files through a Continuous Integration (CI) pipeline. Finally, the bitstream files are deployed on an FPGA board. A large number of testbenches are executed to ensure the correctness of the hardware logic. We will demo a workflow of modifying a RISC-V processor and quickly evaluating the design change using SERVE.
ISBN: 9781479900046 (print)
The conventional matrix multiplication algorithms that are suitable for dense matrices do not perform well on the corresponding Sparse Matrix-Matrix Multiplication (SMMM) operation. In particular, they do not utilize the sparsity of the matrix. This paper describes a new technique for performing the SMMM operation using a novel storage format for sparse matrices. To demonstrate the feasibility of this technique, the SMMM operation is implemented on an FPGA and various parameters that affect the performance of the design are explored.
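To make the idea of exploiting sparsity concrete, here is a minimal Python sketch of sparse matrix-matrix multiplication. The abstract does not describe the paper's novel storage format, so this example substitutes the standard compressed sparse row (CSR) layout and a row-by-row accumulation scheme; the names `dense_to_csr` and `csr_matmul` are hypothetical. Only non-zero entries are stored and multiplied, which is the work a dense algorithm would waste on zeros.

```python
from typing import Dict, List, Tuple

# CSR triple: (non-zero values, their column indices, row start pointers).
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(matrix: List[List[float]]) -> CSR:
    """Convert a dense row-major matrix to CSR (illustrative helper)."""
    values: List[float] = []
    cols: List[int] = []
    row_ptr = [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr


def csr_matmul(a: CSR, b: CSR) -> CSR:
    """Multiply two CSR matrices, touching only non-zero entries."""
    a_val, a_col, a_ptr = a
    b_val, b_col, b_ptr = b
    out_val: List[float] = []
    out_col: List[int] = []
    out_ptr = [0]
    for i in range(len(a_ptr) - 1):
        acc: Dict[int, float] = {}  # sparse accumulator for output row i
        for k in range(a_ptr[i], a_ptr[i + 1]):
            b_row = a_col[k]  # A[i, b_row] selects row b_row of B
            for t in range(b_ptr[b_row], b_ptr[b_row + 1]):
                j = b_col[t]
                acc[j] = acc.get(j, 0) + a_val[k] * b_val[t]
        for j in sorted(acc):
            if acc[j] != 0:  # drop entries that cancelled to zero
                out_val.append(acc[j])
                out_col.append(j)
        out_ptr.append(len(out_val))
    return out_val, out_col, out_ptr
```

The number of multiplications is proportional to the number of matching non-zero pairs rather than to the full matrix dimensions.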
ISBN: 9782839918442 (print)
A Virtual Private Network (VPN) encrypts and decrypts the private traffic it tunnels over a public network. Maximizing the available bandwidth is an important requirement for network applications, but the cryptographic operations add significant computational load to VPN applications, limiting network throughput. This work presents a coprocessor designed to offer hardware acceleration for these encryption and decryption operations. The open-source SigmaVPN application is used as the base solution, and a coprocessor is designed for the parts of the Networking and Cryptography library (NaCl) that underlie the cryptographic operation of SigmaVPN. The hardware-software codesign is implemented on a Xilinx Zynq-7000 SoC and shows a 93% reduction in the execution time of encrypting a 1024-byte frame. For 1024-byte frames, this improves the TCP and UDP communication bandwidths by factors of 4.36 and 5.36, respectively, compared to a pure software solution.
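The reported percentage reduction and the bandwidth factors measure different things, and a short calculation helps relate them. The sketch below is only arithmetic on the figures quoted in the abstract; the function name `speedup_from_reduction` is hypothetical. A reduction r in execution time corresponds to a speedup of 1 / (1 - r) for that operation alone.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional execution-time reduction into a speedup factor.

    A reduction of r means the new time is (1 - r) of the old time,
    so the operation runs 1 / (1 - r) times faster.
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


# 93% reduction in encryption time for a 1024-byte frame (from the abstract).
encrypt_speedup = speedup_from_reduction(0.93)
print(f"encryption speedup: about {encrypt_speedup:.1f}x")
```

The encryption step alone therefore speeds up by roughly a factor of 14, while the end-to-end bandwidth gains are smaller (4.36 and 5.36), which is consistent with encryption being only one part of the per-frame processing.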
ISBN: 9782839918442 (print)
With the introduction of the Stratix V family, the FPGA vendor Altera now fully supports partial reconfiguration in all its recent FPGA devices. A distinctive feature of the Altera architecture is that reconfigurable regions can be defined arbitrarily, which is made possible by writing a configuration mask before writing the actual configuration data to the FPGA fabric. In this paper, we present the details and the flow for implementing partial reconfiguration on Altera FPGAs, as well as a study of configuration bitstream sizes and configuration speeds for various resource and bounding-box aspect ratio variants. The results are used to build a partial reconfiguration controller that features a lightweight but effective bitstream decompression module, greatly improving configuration speed on a DE5-net board.
ISBN: 9781424438914 (print)
Custom operators, working at custom precisions, are a key ingredient in fully exploiting the flexibility advantage of FPGAs for high-performance computing. Unfortunately, such operators are costly to design, and application designers tend to rely on less efficient off-the-shelf operators. To address this issue, an open-source architecture generator framework is introduced. Its salient features are an easy learning curve from VHDL, the ability to embed arbitrary synthesizable VHDL code, portability to mainstream FPGA targets from Xilinx and Altera, automatic management of complex pipelines with support for frequency-directed pipelining, and automatic test-bench generation. The generator is presented through the simple example of a collision detector, which it significantly improves in accuracy, DSP count, logic usage, frequency and latency with respect to an implementation using standard floating-point operators.
Identifying and locating objects in images and videos, including elements like traffic signs, vehicles, buildings, and people, constitutes a fundamental and demanding task in computer vision, known as object detection...
ISBN: 9781424438914 (print)
Fast carry chains featuring dedicated adder circuitry are a distinctive feature of modern FPGAs. The carry chains bypass the general routing network and are embedded in the logic blocks of FPGAs for fast addition. Conventional intuition is that such carry chains can be used only for implementing carry-propagate addition; state-of-the-art FPGA synthesizers can exploit the carry chains only for these specific circuits. This paper demonstrates that the carry chains can be used to build compressor trees, i.e., multi-input addition circuits used for parallel accumulation and for partial product reduction in parallel multipliers implemented in FPGA logic. The key to our technique is to program the lookup tables (LUTs) in the logic blocks to stop the propagation of carry bits along the carry chain at appropriate points. This approach significantly improves the area of compressor trees compared to previous methods that synthesized compressor trees solely on LUTs, without compromising the performance gain over trees built from ternary carry-propagate adders.
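For readers unfamiliar with compressor trees, the following Python sketch shows the general principle using plain 3:2 compressors (carry-save adders) on non-negative integers. It does not model the paper's LUT and carry-chain mapping; the function names are hypothetical. Each 3:2 stage reduces three operands to two without propagating carries across bit positions, and only the final two operands need a carry-propagate addition.

```python
from typing import Iterable, List, Tuple


def compress_3_to_2(a: int, b: int, c: int) -> Tuple[int, int]:
    """Carry-save step: three operands in, (sum bits, carry bits) out.

    Each bit position is handled independently, so no carry ripples
    across the word inside this step.
    """
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits


def compressor_tree_sum(operands: Iterable[int]) -> int:
    """Add many non-negative integers with layers of 3:2 compressors."""
    ops: List[int] = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3  # operands that form complete triples
        nxt: List[int] = []
        for i in range(0, full, 3):
            nxt.extend(compress_3_to_2(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[full:])  # leftovers pass through to the next layer
        ops = nxt
    # Final stage: a single carry-propagate addition of at most two operands.
    return sum(ops)
```

In hardware, each layer of the loop corresponds to one level of the tree, and the carry-propagate adder appears only once at the output.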
ISBN: 9781479900046 (print)
This paper discusses a novel RTL-to-RTL partitioner flow to better cope with the challenges of modern rapid system prototyping on multi-FPGA systems. The proposed partitioner flow is timing-driven in order to find the best achievable system performance within a given search space, reduces turn-around times when the RTL code changes, and enables more efficient debugging. System prototyping is often compared to emulation as an alternative verification method, with system performance used as one of the comparison points. This paper highlights some fundamental aspects of the achievable system performance of rapid system prototyping on multi-FPGA systems.