This paper presents a Zynq-capable version of GNU Radio (an open-source rapid radio deployment tool) with an enhanced flow that exploits the processing capability of FPGAs. This work features TFlow, an FPGA back-end compilation accelerator for instant FPGA assembly. The Xilinx Zynq FPGA architecture integrates the FPGA fabric and a CPU on a single chip. This eliminates the need for a controlling host computer and thus provides a single, portable, low-power embedded platform. By exploiting the computational advantages of FPGAs in the GNU Radio flow, a larger class of software-defined radios can be implemented. Once the FPGA is programmed with a design, modules can be parameterized to realize an even larger class of applications, further solidifying the concept of rapid assembly of software-defined radios.
We introduce a library for the productive development of image processing accelerators using C-based high-level synthesis. The key concept of our approach is to provide a set of generic building blocks that are applicable to a multitude of image processing applications. The centerpiece of the library is an efficient memory architecture that facilitates easy integration of point and local image processing operators. The generic building blocks are kept very compact and can be tailored to support sophisticated processing techniques. This representation enables the designer to meet specific design requirements, such as stringent timing constraints or limited resource budgets. Results show a significant gain in productivity compared to hand-coded implementations, with comparable performance and resource requirements.
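The abstract does not show the library's actual building blocks. As a hedged illustration of the kind of memory architecture that local operators need, the following Python sketch models a line-buffer scheme. Pixels arrive one per cycle in raster order, two line buffers hold the previous two rows, and a 3x3 register window slides across them. The function name `stream_local_op` and the border handling (interior outputs only) are assumptions for illustration.

```python
def stream_local_op(pixels, width, kernel):
    """Apply a 3x3 local operator to pixels arriving one per cycle in raster order.

    Hypothetical model of a line-buffer memory architecture: two line
    buffers hold the previous two image rows, and a 3x3 register window
    shifts left by one column per incoming pixel. Only interior outputs
    are produced, so an h x w image yields (h-2) x (w-2) results.
    """
    lb_top = [0] * width  # holds row y-2
    lb_mid = [0] * width  # holds row y-1
    window = [[0, 0, 0] for _ in range(3)]  # window[r][2] is the newest column
    out = []
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        column = (lb_top[x], lb_mid[x], p)  # vertical 3-pixel column at x
        lb_top[x], lb_mid[x] = lb_mid[x], p  # shift the new pixel into the line buffers
        for r in range(3):
            window[r] = [window[r][1], window[r][2], column[r]]
        if y >= 2 and x >= 2:  # window lies fully inside the image
            out.append(sum(kernel[r][c] * window[r][c]
                           for r in range(3) for c in range(3)))
    return out


img = [[4 * r + c for c in range(4)] for r in range(4)]
flat = [v for row in img for v in row]
print(stream_local_op(flat, 4, [[1, 1, 1]] * 3))  # [45, 54, 81, 90]
```

Only two rows of storage are needed regardless of image height. This is why line buffers are a common choice for streaming local operators in hardware. A point operator would need no buffering at all.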
We propose an approach to estimating the performance of FPGA architectures based on a semi-supervised model tree algorithm. The approach avoids synthesis, mapping, packing, placement, and routing, which are essential steps in the traditional flow for obtaining FPGA performance. It is therefore time-efficient, while its predictions remain close to the results of the traditional method (the VTR tool flow). It can be used effectively in the early stages of FPGA architecture design to choose an optimal architecture under a given metric. The performance predicted by the proposed approach is compared with that obtained by VTR on a commercial 40nm technology. Results show that the proposed approach has a mean relative error (MRE) below 7.62% with respect to VTR. It also reduces the time cost by a factor of thousands when used in architecture design space exploration.
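For reference, the accuracy figure above can be computed as follows. This is a minimal sketch assuming the standard definition of mean relative error, MRE = (1/n) Σ |p_i − r_i| / |r_i|, with the VTR result as the reference value r_i. The delay values are hypothetical and are not data from the paper.

```python
def mean_relative_error(predicted, reference):
    """Mean relative error: (1/n) * sum(|p_i - r_i| / |r_i|), as a fraction."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical critical-path delays (ns): model predictions vs. full VTR runs
predicted = [1.05, 2.00, 2.85]
vtr = [1.00, 2.00, 3.00]
print(f"MRE = {100 * mean_relative_error(predicted, vtr):.2f}%")  # MRE = 3.33%
```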
Graph mining is an important research area within the domain of data mining, and frequent subgraph mining is one of its most challenging tasks. This work presents, to the best of our knowledge, the first FPGA-based implementation of gSpan, the most efficient and well-known algorithm for the Frequent Subgraph Mining (FSM) problem. The proposed system, named High Performance Computing-gSpan (HPC-gSpan), achieves a manyfold speedup over the official software implementation in the gboost library running on a high-end CPU, across various real-world datasets.
High-speed and energy-efficient computation is mandatory in the financial and insurance industry, both to survive competition and to meet federal reporting requirements. On a hybrid CPU/FPGA system, we propose a modular pricing engine and derive a novel algorithmic extension that exploits online dynamic reconfiguration. The result is a high-performance, energy-efficient pricing system suitable for pricing exotic options in the state-of-the-art Heston market model. With the online reconfiguration extension, our hybrid pricing system is nearly two orders of magnitude faster than high-end Intel CPUs while consuming the same power.
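The abstract names neither the exotic product nor the discretization scheme. The sketch below therefore assumes an arithmetic-average Asian call priced by Monte Carlo under the Heston model, using a common Euler full-truncation scheme. It shows the kind of per-path workload such a pricing engine evaluates, not the authors' engine. The function name and default parameters are hypothetical.

```python
import math
import random


def heston_asian_call_mc(s0, k, r, t, v0, kappa, theta, xi, rho,
                         n_steps=50, n_paths=5000, seed=1):
    """Monte Carlo price of an arithmetic-average Asian call under Heston.

    Heston dynamics: dS = r S dt + sqrt(v) S dW1,
                     dv = kappa (theta - v) dt + xi sqrt(v) dW2,
                     corr(dW1, dW2) = rho.
    Uses an Euler full-truncation scheme (negative variance clipped to 0
    inside the drift and diffusion terms).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):  # paths are independent of each other
        log_s, v, running_sum = math.log(s0), v0, 0.0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)  # full truncation
            log_s += (r - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
            running_sum += math.exp(log_s)  # accumulate for the arithmetic average
        payoff_sum += max(running_sum / n_steps - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths


price = heston_asian_call_mc(s0=100.0, k=100.0, r=0.02, t=1.0, v0=0.04,
                             kappa=2.0, theta=0.04, xi=0.3, rho=-0.7)
print(f"Asian call estimate: {price:.4f}")
```

Each path is independent, which makes this kind of workload a natural candidate for parallel and pipelined hardware.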
Mapping complex mathematical expressions to DSP blocks through standard inference from pipelined code is inefficient and results in significantly reduced throughput. In this paper, we demonstrate the benefit of considering the structure and pipeline arrangement of DSP blocks during mapping. We have developed a tool that can map mathematical expressions in three ways: using RTL inference, through high-level synthesis with Vivado HLS, and through a custom approach that incorporates DSP block structure. We show that the proposed method produces circuits that run at roughly twice the frequency of those produced by the other methods. This demonstrates that the structure of the DSP block must be considered when scheduling complex expressions.
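The idea of mapping "with the DSP block's structure in mind" can be illustrated with a greedy tree cover. The sketch assumes a simplified block template P = (A + D) × B + C, loosely modeled on the pre-adder, multiplier, and post-adder of Xilinx DSP48E1-style slices. It counts how many blocks an expression tree needs when each multiply absorbs a neighboring pre-add and post-add. This covers only structural matching; it says nothing about pipelining or timing, which the paper also considers. The tree encoding and function names are hypothetical.

```python
# Expression tree: ("add", l, r), ("mul", l, r), or a leaf input name (str).

def is_op(node, op):
    return isinstance(node, tuple) and node[0] == op


def count_ops(node):
    """Baseline: one DSP block per arithmetic operator."""
    if isinstance(node, str):
        return 0
    return 1 + count_ops(node[1]) + count_ops(node[2])


def count_dsp_blocks(node):
    """Greedy top-down cover with the template P = (A + D) * B + C."""
    if isinstance(node, str):
        return 0
    op, left, right = node
    if op == "mul":
        return cover_mul(node)
    if op == "add":
        for mul, addend in ((left, right), (right, left)):
            if is_op(mul, "mul"):  # fold this add into the block's post-adder
                return cover_mul(mul) + count_dsp_blocks(addend)
        return 1 + count_dsp_blocks(left) + count_dsp_blocks(right)  # add-only block
    raise ValueError(f"unsupported operator: {op}")


def cover_mul(mul):
    """One block for a multiply, folding at most one add operand into the pre-adder."""
    _, a, b = mul
    for pre, other in ((a, b), (b, a)):
        if is_op(pre, "add"):
            _, x, y = pre
            return 1 + count_dsp_blocks(x) + count_dsp_blocks(y) + count_dsp_blocks(other)
    return 1 + count_dsp_blocks(a) + count_dsp_blocks(b)


expr = ("add", ("mul", ("add", "a", "d"), "b"), "c")  # (a + d) * b + c
print(count_ops(expr), count_dsp_blocks(expr))  # 3 1
```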
Partial reconfiguration allows some applications to save substantial FPGA area by time-sharing resources among multiple modules. In this paper, we push this approach further by introducing hierarchical reconfiguration, in which reconfigurable modules can have reconfigurable submodules. This is useful for complex systems where many modules have common parts or can share components. For such systems, we show that the number of bitstreams and the bitstream storage requirements scale additively rather than multiplicatively with the number of modules and submodules. We present a case study of different reconfigurable softcore CPUs with hierarchically reconfigurable custom instruction set extensions. Compared with conventional single-level module-based reconfiguration, hierarchical reconfiguration yields an 18.7× lower bitstream storage requirement and up to 10× faster reconfiguration.
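The multiplicative-versus-additive claim can be made concrete with simple counting. This sketch assumes M modules and S submodule variants, where every combination is valid and each submodule bitstream can be reused under every module. Under those assumptions, single-level reconfiguration needs M × S bitstreams and hierarchical reconfiguration needs M + S. The region sizes are hypothetical and are not the case-study figures.

```python
def bitstream_count(n_modules, n_submodules):
    """Bitstreams needed to cover every module x submodule combination."""
    flat = n_modules * n_submodules          # single-level: one bitstream per combination
    hierarchical = n_modules + n_submodules  # modules and submodules loaded separately
    return flat, hierarchical


def storage_kb(n_modules, n_submodules, module_kb, submodule_kb):
    """Total bitstream storage in KB (module_kb excludes the submodule region)."""
    flat = n_modules * n_submodules * (module_kb + submodule_kb)
    hierarchical = n_modules * module_kb + n_submodules * submodule_kb
    return flat, hierarchical


# Hypothetical system: 4 softcore CPUs, 8 custom-instruction extensions
print(bitstream_count(4, 8))         # (32, 12)
print(storage_kb(4, 8, 300, 100))    # (12800, 2000)
```

The same reasoning suggests why reconfiguration can be faster. Swapping only an instruction-set extension loads a submodule-sized bitstream rather than one covering the whole CPU region.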
Heterogeneous Multiprocessor System-on-Chip (Ht-MPSoC) architectures are a promising approach because they offer a better trade-off between performance and energy consumption. In such systems, the processor instruction set is extended with application-specific custom instructions implemented on reconfigurable fabric, namely FPGAs. To increase area utilization and guarantee that application constraints are met, we propose a new architecture in which Ht-MPSoC hardware accelerators are shared intelligently among different processors. In this paper, we propose a Mixed Integer Linear Programming (MILP) model to systematically explore the complex design space of the different configurations.
The variety of applications for field-programmable gate arrays (FPGAs) is continuously growing, so it is important to address power consumption during operation. As technology nodes shrink, leakage power becomes an increasingly critical part of overall FPGA power consumption. Configuration pre-fetching (loading configurations as early as possible) is a technique adopted to achieve high performance. It is also one of the major sources of wasted leakage, because regions holding configuration data cannot be powered down between reconfiguration and execution. In this work, we present a heuristic approach to minimize leakage power consumption for two-dimensional reconfigurable FPGA architectures. The heuristic scheduler is based on list scheduling. It uses dynamic priorities to sort tasks into schedule order and a cost function for cell allocation. A farthest-placement scheme is adopted to reduce fragmentation. The cost function provides control over the trade-off between leakage dissipation and schedule length.
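The abstract does not give the cost function itself. The sketch below is a hypothetical weighted-sum form showing how one parameter can trade leakage against schedule length when choosing among candidate allocations for a task. Idle leakage is modeled as proportional to the occupied cells times the gap between configuration completion and execution start. The sketch covers only this selection step, not the full list scheduler with dynamic priorities or the farthest-placement scheme. All names, weights, and numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    cells: int          # number of FPGA cells the task occupies
    config_end: float   # time at which configuration finishes
    exec_start: float   # time at which execution starts (>= config_end)
    duration: float     # execution time


def idle_leakage(c, leak_per_cell):
    # Region holds configuration data but cannot be powered down until execution.
    return leak_per_cell * c.cells * (c.exec_start - c.config_end)


def choose(cands, alpha, leak_per_cell=1.0):
    """Pick the candidate minimizing a normalized weighted cost.

    alpha = 1 minimizes idle leakage only; alpha = 0 minimizes finish time only.
    """
    max_leak = max(idle_leakage(c, leak_per_cell) for c in cands) or 1.0
    max_finish = max(c.exec_start + c.duration for c in cands)

    def cost(c):
        return (alpha * idle_leakage(c, leak_per_cell) / max_leak
                + (1 - alpha) * (c.exec_start + c.duration) / max_finish)

    return min(cands, key=cost)


cands = [
    # Pre-fetch: configure early and sit idle (leaking) until the task is ready.
    Candidate("prefetch", cells=16, config_end=2.0, exec_start=10.0, duration=5.0),
    # Just-in-time: delay configuration so no idle gap remains, but start later.
    Candidate("just_in_time", cells=16, config_end=12.0, exec_start=12.0, duration=5.0),
]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, choose(cands, alpha).name)
# 0.0 prefetch
# 0.5 just_in_time
# 1.0 just_in_time
```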
Floating-point computing with more than one TFLOP of peak performance is already a reality in recent field-programmable gate arrays (FPGAs). General-purpose graphics processing units (GPGPUs) and recent many-core CPUs have also taken advantage of recent innovations in integrated circuit (IC) design and have dramatically improved their peak performance. In this paper, we compare the trends of these computing architectures for high-performance computing. We also survey how these platforms perform when executing algorithms from different scientific application domains. Trends in peak performance, power consumption, and sustained performance for particular applications show a widening gap between FPGAs on one side and GPUs and many-core CPUs on the other. This gap moves FPGAs away from high-performance computing with intensive floating-point calculations. FPGAs are competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems, and for parallel map-reduce problems.