In classical FPGAs, LUTs and DFFs are pre-packed into BLEs, and the BLEs are then grouped into logic blocks. We propose a novel logic block architecture with fast combinational paths between LUTs, called pattern-based logic blocks. A new clustering algorithm is developed to exploit the potential of pattern-based logic blocks. Experimental results show that the novel architecture and the associated clustering algorithm lead to a 14% performance gain and an 8% wirelength reduction with a 3% area overhead compared to a conventional architecture on large control-intensive benchmarks.
In this paper, we propose a design support tool set for asynchronous circuits with bundled-data implementation that makes it easy to implement them on commercial FPGAs under a latency constraint. The design support tool set consists of six tools that automate constraint generation, timing verification, and delay adjustment for bundled-data implementation. In the experiments, we synthesize two circuits using the proposed tool set and compare area, performance, power consumption, and energy consumption with the synchronous counterparts.
In this work, we present an FPGA hardware implementation for phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Indirect Calculation of Tree Lengths method and the Progressive Neighborhood. In our implementation, we define a tree structure and accelerate the search by parallel and pipelined processing. We show results for six real-world biological datasets. We compare execution times against our previous hardware approach and against TNT, the fastest available parsimony program. Against our previous approach, we obtain acceleration rates of 34 to 45 per rearrangement and 2 to 6 for the whole search. Against TNT, we obtain acceleration rates of 2 to 4 per rearrangement and 18 to 112 for the whole search. We estimate that these acceleration rates could increase for even larger datasets.
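For readers unfamiliar with parsimony scoring, the quantity such a search minimizes can be illustrated with the standard Fitch algorithm. This is a minimal software sketch, not the Indirect Calculation of Tree Lengths method or the hardware design described in the abstract. The function name `fitch_length`, the nested-tuple tree encoding, and the toy sequences are assumptions made for illustration.

```python
def fitch_length(tree, seqs):
    """Parsimony length of a rooted binary tree using Fitch's algorithm.

    tree: a leaf name (str) or a pair (left, right) of subtrees (hypothetical encoding).
    seqs: dict mapping each leaf name to an aligned sequence of equal length.
    """
    def visit(node):
        # A leaf contributes one singleton state set per site and no changes.
        if isinstance(node, str):
            return [{c} for c in seqs[node]], 0
        left, right = node
        left_sets, left_cost = visit(left)
        right_sets, right_cost = visit(right)
        sets, cost = [], left_cost + right_cost
        for x, y in zip(left_sets, right_sets):
            common = x & y
            if common:
                sets.append(common)   # children agree: no extra change
            else:
                sets.append(x | y)    # children disagree: count one change
                cost += 1
        return sets, cost

    return visit(tree)[1]


# Toy data (made up for illustration only).
seqs = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "CCG"}
good_tree = (("A", "B"), ("C", "D"))
worse_tree = (("A", "C"), ("B", "D"))
```

A local search of the kind the abstract mentions would repeatedly rearrange the tree and keep rearrangements that lower this length; the per-site loop is independent across sites, which is one reason the computation lends itself to parallel hardware.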
In this paper, a novel low-latency family of high-radix Parallel Prefix Network adders and modular adders is proposed. This family efficiently takes advantage of the fast carry chains of modern FPGAs. The implementation results reveal that these adders have great potential for efficient implementation of modular addition with the long integers used in various public key cryptography schemes.
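As background, the idea of a parallel prefix adder and of modular addition built on top of it can be sketched in software. The sketch below uses a generic radix-2 Kogge-Stone network, which is a substitute chosen for brevity and not the high-radix, carry-chain-based family proposed in the abstract. The function names `kogge_stone_add` and `modular_add` are assumptions.

```python
def kogge_stone_add(a: int, b: int, width: int) -> int:
    """Add two width-bit unsigned integers with a radix-2 Kogge-Stone prefix network.

    Returns the (width + 1)-bit sum, including the carry out.
    """
    # Per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    d = 1
    while d < width:
        # One prefix level: combine (G, P)[i] with (G, P)[i - d].
        new_G, new_P = G[:], P[:]
        for i in range(d, width):
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2
    # After the last level, G[i] is the carry out of bit position i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    total |= G[width - 1] << width
    return total


def modular_add(a: int, b: int, m: int, width: int) -> int:
    """Compute (a + b) mod m for 0 <= a, b < m < 2**width.

    Computes a + b and a + b - m, then selects one; hardware would
    typically evaluate both candidates in parallel.
    """
    w1 = width + 1
    total = kogge_stone_add(a, b, width)
    # Subtract m by adding its two's complement on w1 bits.
    diff = kogge_stone_add(total, (1 << w1) - m, w1)
    if diff >> w1:  # carry out set means total >= m
        return diff & ((1 << w1) - 1)
    return total
```

The prefix network needs about log2(width) levels, which is where the low latency comes from; the abstract's contribution lies in mapping higher-radix versions of such networks onto FPGA carry chains, which this sketch does not model.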
Field-Programmable Gate Arrays (FPGAs) are an ideal platform for building systems with custom hardware accelerators; however, managing these systems is still a major challenge. The OpenCL standard has become accepted as a good programming model for managing heterogeneous platforms due to its rich constructs. Although commercial OpenCL frameworks are now emerging, there is a need for an open-source OpenCL framework that facilitates the exploration of the overall system architecture and software, as well as the implementation and architectures of the custom hardware accelerators (devices). In this paper, we use an OpenCL framework to compare interconnect implementations for a simple multiprocessor accelerator.
Physically Unclonable Functions (PUFs) based on the evaluation of uninitialized SRAM are one of the most promising PUF candidates to date. However, transferring their concept to Xilinx FPGAs is not straightforward since all SRAM-based block memories in these FPGAs are automatically cleared on power-up, destroying the desired initial bits of information. In this work we therefore propose a novel strategy to convert block memories of 28nm Xilinx FPGAs into SRAM-PUFs by exploiting their recently introduced feature of power-gating and partial reconfiguration.
Most modern field-programmable gate arrays (FPGAs) employ a look-up table (LUT) as their basic logic cell. Although a k-input LUT can implement any k-input logic function, its functionality relies on a large amount of configuration memory. As FPGAs scale up, the growing number of configuration memory cells will require a larger area and consume more power. Moreover, the soft-error rate per device will also increase as more configuration memory cells are embedded. We propose scalable logic modules (SLMs), logic cells that require less configuration memory by making use of partial functions of the Shannon expansion for frequently appearing logic functions. Experimental results show that SLM-based FPGAs use much less configuration memory and have a smaller area than conventional LUT-based FPGAs.
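To make the memory argument concrete: a k-input LUT stores one configuration bit per input combination, so 2^k bits, and the Shannon expansion f = x'·f0 + x·f1 splits a function into two (k-1)-input partial functions (cofactors). The sketch below only illustrates these two facts; it does not reproduce the SLM cell, and the names `lut_config_bits`, `cofactors`, and `shannon_eval` are assumptions.

```python
def lut_config_bits(k: int) -> int:
    """Number of configuration bits in a k-input LUT (one per input combination)."""
    return 2 ** k


def cofactors(table, k, var):
    """Split a 2**k-entry truth table into its cofactors for input `var`.

    Bit `var` of the table index holds the value of input `var`.
    Returns (f0, f1): the tables with that input fixed to 0 and to 1.
    """
    f0 = [table[i] for i in range(2 ** k) if not (i >> var) & 1]
    f1 = [table[i] for i in range(2 ** k) if (i >> var) & 1]
    return f0, f1


def shannon_eval(table, k, var, inputs):
    """Evaluate f through the expansion f = (not x) * f0 + x * f1."""
    f0, f1 = cofactors(table, k, var)
    # Index into the cofactor using the remaining inputs, in order.
    rest = [bit for j, bit in enumerate(inputs) if j != var]
    idx = sum(bit << j for j, bit in enumerate(rest))
    return f1[idx] if inputs[var] else f0[idx]


# 3-input majority function as a truth table indexed by the input bits.
majority = [1 if bin(i).count("1") >= 2 else 0 for i in range(8)]
```

For the majority function, the two cofactors with respect to any input are the 2-input AND and OR. Both are very common functions, which hints at why a cell specialised for frequently appearing partial functions might need fewer bits than a full LUT.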
ISBN: 9781849199247 (print)
The Remote FPGA Lab (RFL) provides interactive web-based visual control and probing of reconfigurable logic hardware in the Cloud in real time, and supports a learn-by-doing approach to digital system education. The authors have previously reported the RFL architecture, as well as RFL usage and survey data that highlight its effectiveness for enhanced learning, achievement and engagement. This paper illustrates an RFL use case based on a counter example. A range of animated interactive web page console views are presented, from the top-level block diagram to FPGA Lookup Table and D Flip-Flop hardware implementation views, and Finite State Machine animation. The paper illustrates the additional interactive real-time timing diagram functionality and proposes an automatic online assessment strategy using the RFL.
High-level synthesis (HLS) promises to increase designer productivity in the face of steadily increasing FPGA sizes and to broaden the market of use, allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of a debugging infrastructure. To debug, designers can run their source code on a processor; however, this does not capture interactions with other system components. The alternative is to debug using the RTL, which is beyond the expertise of software designers and impractical for hardware designers, as the RTL may not resemble the original source code.
Despite a decade of activity in the development of soft vector processors for FPGAs, high-level language support remains thin. We attribute this problem to a design method in which the high-level vector programming interface is only really considered once the processor architecture has been perfected, by which point the designer may be committed to the time-consuming development of a complicated compiler. In this paper, we present the codesign of a soft vector processor and a lightweight compiler, which together lift the level of abstraction for the programmer while allowing a rapid compiler implementation. We demonstrate the effectiveness of our approach on a range of applications from digital signal processing, neuroscience, and machine learning.