ISBN: (print) 9781450338561
A real-time global stereo matching algorithm is implemented on an FPGA. Stereo matching is widely used in stereo vision systems, e.g., for applications such as object detection and autonomous vehicles. Global algorithms perform significantly better than local algorithms, but they are generally not implemented on FPGAs because they rely on high-end hardware resources. In this implementation, the stereo pairs are divided into blocks, and hardware resource usage is reduced by processing one block at a time. The hardware implementation is based on a Xilinx® Kintex-7 FPGA. Experimental results show that the implementation performs well, and 30 […] is achieved.
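The abstract does not say which global algorithm is used or how the blocks are sized. As a rough illustration of block-at-a-time processing, the sketch below assumes horizontal blocks of `block_rows` rows and uses scanline dynamic programming (an optimization over an entire row with a data term plus a smoothness penalty) as a stand-in matcher. The names `row_dp_disparity`, `blockwise_disparity`, `max_disp`, and `penalty` are hypothetical, and none of this is the paper's design.

```python
"""Hypothetical block-wise disparity sketch; not the paper's algorithm."""


def row_dp_disparity(left_row, right_row, max_disp, penalty):
    """Scanline dynamic programming along one image row.

    Minimizes sum_x |L[x] - R[x - d(x)]| + penalty * |d(x) - d(x - 1)|,
    i.e. a data term plus a smoothness term, optimized over the whole row.
    """
    n = len(left_row)
    inf = float("inf")
    # cost[x][d]: cheapest accumulated cost of a path ending at pixel x with disparity d
    cost = [[inf] * (max_disp + 1) for _ in range(n)]
    # back[x][d]: disparity chosen at pixel x - 1 on that cheapest path
    back = [[0] * (max_disp + 1) for _ in range(n)]
    for x in range(n):
        for d in range(max_disp + 1):
            if x - d < 0:
                continue  # the matching right-image pixel would fall off the left edge
            data = abs(left_row[x] - right_row[x - d])
            if x == 0:
                cost[x][d] = data
                continue
            best, arg = inf, 0
            for prev in range(max_disp + 1):
                c = cost[x - 1][prev] + penalty * abs(d - prev)
                if c < best:
                    best, arg = c, prev
            cost[x][d] = data + best
            back[x][d] = arg
    # Backtrack from the cheapest disparity at the last pixel of the row.
    d = min(range(max_disp + 1), key=lambda k: cost[n - 1][k])
    disparities = [0] * n
    for x in range(n - 1, -1, -1):
        disparities[x] = d
        d = back[x][d]
    return disparities


def blockwise_disparity(left, right, block_rows, max_disp, penalty):
    """Process the image pair one horizontal block at a time, so only
    block_rows rows of each image need to be buffered at once."""
    result = []
    for top in range(0, len(left), block_rows):
        block_left = left[top:top + block_rows]
        block_right = right[top:top + block_rows]
        for l_row, r_row in zip(block_left, block_right):
            result.append(row_dp_disparity(l_row, r_row, max_disp, penalty))
    return result


if __name__ == "__main__":
    row_l = [10, 50, 20, 80, 30, 90, 40, 70]
    row_r = row_l[2:] + [0, 0]  # right view shifted by 2 pixels
    print(blockwise_disparity([row_l, row_l], [row_r, row_r], 1, 3, 1))
```

Because scanline DP optimizes each row independently, splitting the image into row blocks does not change the result in this sketch; a two-dimensional global method would need overlapping blocks or explicit boundary handling, which is not attempted here.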
RapidSmith is an open-source framework that allows for the exploration of novel approaches to the FPGA CAD flow for Xilinx devices. However, RapidSmith has poor support for manipulating designs below the slice level...
ISBN: (print) 9781450333153
Combining multiprocessing with the high level of configurability possible with FPGA-based soft-processors, this paper presents a multiprocessing framework based on the MicroBlaze soft-processor that provides multicore support and fully coherent, independently configurable Level 1 caches, with multicore Linux support. This architecture allows fine-grained configurability of the system, so that FPGA resources can be better optimized for a specific embedded application. We use our framework to explore the L1 data cache configuration, developing an efficiency metric based on resource usage and static application runtime. We find that a pseudo-random replacement policy is consistently the more efficient choice for FPGA systems.
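To make "pseudo-random replacement" concrete, here is a minimal software model of a set-associative cache whose victim way is chosen by a linear-feedback shift register (LFSR). This is a generic illustration, not the MicroBlaze cache logic: the class name, parameters, seed `0xACE1`, and tap positions (the common maximal-length 16-bit Fibonacci example) are all assumptions.

```python
class PseudoRandomCache:
    """Hypothetical set-associative cache model with LFSR-based victim selection."""

    def __init__(self, num_sets, ways, line_bytes, seed=0xACE1):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # tags[set][way] holds the stored tag, or None for an empty way
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.lfsr = seed  # assumed non-zero 16-bit seed
        self.hits = 0
        self.misses = 0

    def _next_random(self):
        # 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (assumed polynomial)
        bit = ((self.lfsr >> 0) ^ (self.lfsr >> 2) ^ (self.lfsr >> 3) ^ (self.lfsr >> 5)) & 1
        self.lfsr = (self.lfsr >> 1) | (bit << 15)
        return self.lfsr

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.tags[index]
        if tag in ways:
            self.hits += 1  # no per-access bookkeeping is updated on a hit
            return True
        self.misses += 1
        if None in ways:
            victim = ways.index(None)  # fill an empty way before evicting
        else:
            victim = self._next_random() % self.ways
        ways[victim] = tag
        return False
```

One commonly cited hardware advantage of this policy, offered here as general background rather than as the paper's explanation, is that it needs no per-line usage bits or update logic on hits, unlike LRU.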
ISBN: (print) 9781450338561
This paper presents a novel technique for testing FPGA local interconnects based on repeatable configuration modules (RCMs). To detect all possible faults, the local interconnects, together with the adjacent logic blocks in an FPGA, are programmed to form a set of RCMs that repeat across the entire FPGA array. After the RCMs for configurable logic blocks (CLBs) and other types of embedded cores (such as digital signal processors and block random access memory) are constructed, test configurations are generated by connecting the RCMs one by one throughout the whole FPGA array. The number of test configurations depends on the structure of the FPGA and the exact types of hard cores it contains. Experimental results show that a total of 47 test configurations suffice to achieve 96.2% fault coverage for the local interconnects of the Xilinx XC4VLX200 FPGA. This project is supported by the State Key Laboratory of ASIC and System, Fudan University, No. 2015MS007.
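Fault coverage over several test configurations is the fraction of all modeled faults detected by at least one configuration, so faults caught by more than one configuration count only once. The sketch below shows that bookkeeping; the function names and the toy fault sets are hypothetical and are not the paper's fault model or data.

```python
def cumulative_coverage(total_faults, detected_per_config):
    """Coverage after each test configuration: |union of detected faults| / total."""
    covered = set()
    history = []
    for detected in detected_per_config:
        covered |= set(detected)  # a fault detected twice still counts once
        history.append(len(covered) / total_faults)
    return history


def configs_to_reach(total_faults, detected_per_config, target):
    """Number of configurations (applied in order) needed to reach `target`
    coverage, or None if the sequence never reaches it."""
    history = cumulative_coverage(total_faults, detected_per_config)
    for count, coverage in enumerate(history, start=1):
        if coverage >= target:
            return count
    return None


if __name__ == "__main__":
    # Toy example: 10 hypothetical faults, three hypothetical configurations
    configs = [{0, 1, 2}, {2, 3}, {4, 5, 6, 7, 8}]
    print(cumulative_coverage(10, configs))
```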
This tutorial describes tools for efficiently implementing floating-point applications on FPGAs. We present both the SDK for OpenCL and DSP Builder Advanced Blockset and show that they can be effectively used to imple...
This work describes the architecture of a new FPGA DSP block supporting both fixed- and floating-point arithmetic. Each DSP block can be configured to provide one single-precision IEEE-754 floating-point multiplier and one I...
ISBN: (print) 9781450333153
The personal computer market grew exponentially in the 1980s for vendors such as Apple, Microsoft, and Intel, when there was a healthy mix of software, tools, and microprocessor devices. At the time, the killer applications that drove the market were spreadsheets, compilers, and games that ran on the personal computer. Thirty years later, we have a similar opportunity to grow a healthy ecosystem as developers and vendors bring killer applications, tools, and programmable logic devices to the market to accelerate datacenters for cloud computing. Copyright is held by the author/owner(s).
ISBN: (print) 9781450338561
Field-programmable gate arrays (FPGAs) are well established as fine-grained, hardware-reconfigurable computing platforms. However, FPGA energy usage is dominated by programmable interconnects, which scale poorly across technology generations. In this work, we propose ENFIRE, a novel, energy-efficient, fine-grained, spatio-temporal, memory-based reconfigurable computing framework that provides the flexibility of bit-level information processing, which is not available in conventional coarse-grained reconfigurable architectures (CGRAs). A dense two-dimensional memory array is the main computing element in the proposed framework; it stores not only the data to be processed but also the functional behavior of a mapped application, in the form of lookup tables (LUTs) of various input/output sizes. Spatially distributed configurable computing elements (CEs) communicate with each other over a mesh network according to data dependencies, while execution inside each CE occurs temporally. A custom software framework has also been co-developed to map applications onto a set of CEs. By finding the right balance between spatial and temporal computing, the framework achieves a highly energy-efficient mapping, significantly reducing programmable interconnect overhead compared with FPGAs. Simulation results show improvements of 7.6X in overall energy, 1.6X in energy efficiency, 1.1X in leakage energy, and 5.3X in Unified Energy-Efficiency (a metric that considers energy and area together) compared with comparable FPGA implementations for a set of random logic benchmarks.
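The core idea of a memory-based CE, where LUT contents live in one memory array and are evaluated one after another rather than in parallel, can be modeled in a few lines. This is a simplified sketch, not ENFIRE's actual microarchitecture: the class `ComputeElement`, its methods, the single-bit LUT outputs, and the one-LUT-per-cycle schedule are assumptions for illustration.

```python
class ComputeElement:
    """Hypothetical model of a memory-based CE: all LUT truth tables are stored
    in one flat memory array and evaluated sequentially (temporal execution)."""

    def __init__(self):
        self.memory = []    # flat memory array holding every mapped LUT's contents
        self.schedule = []  # (base address, input signal names, output signal name)

    def map_lut(self, truth_table, inputs, output):
        if len(truth_table) != 2 ** len(inputs):
            raise ValueError("truth table size must be 2 ** len(inputs)")
        base = len(self.memory)
        self.memory.extend(truth_table)
        self.schedule.append((base, tuple(inputs), output))

    def run(self, primary_inputs):
        """Evaluate the schedule in order; returns (signal values, cycles used)."""
        signals = dict(primary_inputs)
        cycles = 0
        for base, inputs, output in self.schedule:
            addr = 0
            for name in inputs:  # the first input forms the MSB of the LUT address
                addr = (addr << 1) | signals[name]
            signals[output] = self.memory[base + addr]  # one memory read per LUT
            cycles += 1
        return signals, cycles


if __name__ == "__main__":
    # A full adder mapped as two 3-input LUTs, address = a<<2 | b<<1 | cin
    ce = ComputeElement()
    ce.map_lut([0, 1, 1, 0, 1, 0, 0, 1], ["a", "b", "cin"], "sum")
    ce.map_lut([0, 0, 0, 1, 0, 1, 1, 1], ["a", "b", "cin"], "cout")
    print(ce.run({"a": 1, "b": 1, "cin": 0}))
```

In this model the trade-off the abstract describes shows up directly: more LUTs per CE means more cycles per evaluation but fewer CEs (and less interconnect) overall.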
ISBN: (print) 9781450338561
Serial arithmetic has been shown to offer attractive advantages in area, clock frequency, and functional density for FPGA datapaths, but it suffers from a significant reduction in throughput compared with traditional bit-parallel designs, which is prohibitive for many applications. In this work, we present a full-bandwidth SerDes architecture specialized for Xilinx FPGAs that enables serial pipelines to accept inputs and generate outputs at the same rate as bit-parallel pipelines. We show that, when combined with the clock-frequency improvements of serial pipelines, this approach offers an average throughput increase of more than 2.1x compared with bit-parallel pipelines. Although previous work has shown that serial pipelines can achieve similar results in some limited situations, the key contribution of this work is the ability to replace potentially any existing FPGA pipeline with a higher-throughput serialized alternative. We also present a serialized sliding-window architecture that improves throughput by up to 4x.
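A bit-serial datapath processes one bit per clock cycle, so a `width`-bit operation occupies a unit for `width` cycles. The sketch below is a generic textbook bit-serial adder (one full adder plus a carry register), not the paper's SerDes architecture, together with a simple throughput model; the function names and parameters are hypothetical. Under this model, `lanes` serial units in parallel deliver `lanes * f_clk / width` results per second, so `lanes == width` at the same clock matches a bit-parallel pipeline's one result per cycle, and any further gain must come from the higher clock a serial design can reach.

```python
def bit_serial_add(a, b, width):
    """Add two unsigned `width`-bit integers one bit per simulated cycle,
    LSB first, using a single full adder and a carry register.
    Returns (sum modulo 2**width, cycles used)."""
    carry = 0  # models the carry flip-flop between cycles
    result = 0
    for cycle in range(width):
        abit = (a >> cycle) & 1
        bbit = (b >> cycle) & 1
        s = abit ^ bbit ^ carry                          # full-adder sum bit
        carry = (abit & bbit) | (carry & (abit ^ bbit))  # full-adder carry out
        result |= s << cycle
    return result, width


def serial_throughput(width, lanes, f_clk_hz):
    """Results per second from `lanes` independent bit-serial units, each
    needing `width` cycles per result (simplified model, no SerDes overhead)."""
    return lanes * f_clk_hz / width


if __name__ == "__main__":
    print(bit_serial_add(13, 7, 8))
    print(serial_throughput(16, 16, 100e6))
```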
ISBN: (print) 9781450338561
Due to rapidly growing data sizes, there is an increasing need for scalable, high-performance, low-energy frameworks for large-scale data computation. We build a dataflow architecture that harnesses FPGA resources within a distributed analytics platform, creating a heterogeneous data analytics framework. This approach leverages the scalability of existing distributed processing environments and provides easy access to custom hardware accelerators for large-scale data analysis. We prototype our framework within the Apache Spark analytics tool running on a CPU-FPGA heterogeneous cluster. As a specific application case study, we chose the MapReduce paradigm to implement a multipurpose, scalable, and customizable RTL accelerator inside the FPGA, capable of incorporating custom High-Level Synthesis (HLS) MapReduce kernels. We demonstrate how a typical MapReduce application can be easily adapted to our distributed framework while retaining the scalability of the Spark platform.
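For readers unfamiliar with the MapReduce paradigm, the pure-Python sketch below shows its three stages (map, shuffle/group by key, reduce) on the classic word-count example. It is a generic illustration only: it does not use Spark or model the paper's RTL accelerator or HLS kernels, and the function names and `num_partitions` parameter are hypothetical. In PySpark, the same computation is typically written with the RDD methods `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(records, num_partitions=2):
    # Split the input as a distributed platform would spread it across workers
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    mapped = [pair for part in partitions for rec in part for pair in map_phase(rec)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```

The map and reduce functions are the natural candidates for hardware offload, since they are applied independently to many records or keys.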