In this paper, we propose a design support tool set that makes it easy to implement asynchronous circuits with bundled-data implementation on commercial FPGAs while taking a latency constraint into account. The tool set consists of six tools that automate constraint generation, timing verification, and delay adjustment for bundled-data implementation. In the experiments, we synthesize two circuits using the proposed tool set and compare their area, performance, power consumption, and energy consumption with those of their synchronous counterparts.
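To make the "delay adjustment" step concrete, here is a minimal sketch of the kind of check a bundled-data flow performs. It assumes the usual bundled-data constraint (the request signal must reach the capture latch no earlier than the slowest data bit plus setup time and a safety margin) and a delay line built from identical elements. The function names, parameters, and picosecond figures are hypothetical illustrations, not the interface or numbers of the proposed tool set.

```python
# Hypothetical sketch: names and numbers are illustrative, not from the paper's tools.
from typing import Sequence


def meets_constraint(data_path_delays_ps: Sequence[int], req_path_delay_ps: int,
                     setup_ps: int, margin_ps: int) -> bool:
    """Assumed bundled-data constraint: req must not beat the slowest data bit."""
    return req_path_delay_ps >= max(data_path_delays_ps) + setup_ps + margin_ps


def required_delay_elements(data_path_delays_ps: Sequence[int], req_path_delay_ps: int,
                            setup_ps: int, margin_ps: int, element_delay_ps: int) -> int:
    """Return how many extra delay elements the req path needs to meet the constraint."""
    required = max(data_path_delays_ps) + setup_ps + margin_ps
    slack = req_path_delay_ps - required
    if slack >= 0:
        return 0  # constraint already met, no delay adjustment needed
    # Integer ceiling of (-slack / element_delay_ps), computed without floats
    return -(slack // element_delay_ps)


data = [3200, 4100, 2700]  # hypothetical post-route data-path delays in ps
n = required_delay_elements(data, 2000, 300, 200, 500)
print(n)  # 6
print(meets_constraint(data, 2000 + n * 500, 300, 200))  # True
```

Integer picoseconds are used so the ceiling division is exact; a real flow would take these delays from static timing analysis of the placed and routed design.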
In this paper, a novel low-latency family of high-radix parallel prefix network adders and modular adders is proposed. This family efficiently exploits the fast carry chains of modern FPGAs. The implementation results show that these adders have great potential for the efficient implementation of modular addition with the long integers used in various public-key cryptography schemes.
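As background for what a parallel prefix network computes, the sketch below models a textbook radix-2 Kogge-Stone prefix adder at the bit level, plus a common select-based modular addition (compute a + b and a + b - m, keep whichever is in range). It is a generic illustration, not the high-radix, carry-chain-based design proposed in the paper; `prefix_add` and `mod_add` are hypothetical names.

```python
# Generic radix-2 Kogge-Stone model; NOT the paper's high-radix carry-chain adder.
from typing import Tuple


def prefix_add(a: int, b: int, n: int) -> Tuple[int, int]:
    """Add two n-bit operands with a Kogge-Stone prefix network; return (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # bit generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    d = 1
    while d < n:  # log2(n) prefix levels
        newG, newP = G[:], P[:]
        for i in range(d, n):  # combine group [i-d+1..i] with the group below it
            newG[i] = G[i] | (P[i] & G[i - d])
            newP[i] = P[i] & P[i - d]
        G, P, d = newG, newP, d * 2
    # After the last level, G[i] is the carry out of bits 0..i (carry-in is 0)
    s = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        s |= (p[i] ^ carry_in) << i
    return s, G[n - 1]


def mod_add(a: int, b: int, m: int, n: int) -> int:
    """(a + b) mod m for 0 <= a, b < m < 2**n, by select between a+b and a+b-m."""
    s, c = prefix_add(a, b, n)
    full = s | (c << n)  # exact (n+1)-bit sum
    # full - m in two's complement; carry out is 1 exactly when full >= m
    diff, no_borrow = prefix_add(full, (1 << (n + 1)) - m, n + 1)
    return diff if no_borrow else full


print(mod_add(200, 100, 251, 8))  # 49
```

The select structure maps naturally to hardware because both candidate results can be computed in parallel and a single carry bit chooses between them.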
Field-programmable gate arrays (FPGAs) are an ideal platform for building systems with custom hardware accelerators; however, managing these systems is still a major challenge. The OpenCL standard has become accepted as a good programming model for managing heterogeneous platforms due to its rich constructs. Although commercial OpenCL frameworks are now emerging, there is a need for an open-source OpenCL framework that facilitates the exploration of the overall system architecture and software, as well as the implementation and architecture of the custom hardware accelerators (devices). In this paper, we use an OpenCL framework to compare interconnect implementations for a simple multiprocessor accelerator.
Physically Unclonable Functions (PUFs) based on the evaluation of uninitialized SRAM are among the most promising PUF candidates to date. However, transferring their concept to Xilinx FPGAs is not straightforward, since all SRAM-based block memories in these FPGAs are automatically cleared on power-up, destroying the desired initial bits of information. In this work, we therefore propose a novel strategy for converting the block memories of 28nm Xilinx FPGAs into SRAM-PUFs by exploiting their recently introduced power-gating feature together with partial reconfiguration.
Most modern field-programmable gate arrays (FPGAs) employ a look-up table (LUT) as their basic logic cell. Although a k-input LUT can implement any k-input logic function, its functionality relies on a large amount of configuration memory. As FPGAs scale up, the larger number of configuration memory cells they require will occupy more area and consume more power. Moreover, the soft-error rate per device will also increase as more configuration memory cells are embedded. We propose scalable logic modules (SLMs), logic cells that require less configuration memory; they achieve this reduction by using the partial functions of the Shannon expansion of frequently appearing logic functions. Experimental results show that SLM-based FPGAs use much less configuration memory and have a smaller area than conventional LUT-based FPGAs.
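The Shannon expansion behind SLMs splits a function on one input, f = x̄·f|x=0 + x·f|x=1, so a k-input truth table becomes two (k-1)-input partial functions (cofactors) selected by a multiplexer. The sketch below computes cofactors of a truth table and checks that the mux recombination reproduces the original function. It only illustrates the expansion itself; the actual SLM cell structure is not described in the abstract, and all names here are hypothetical.

```python
# Illustration of Shannon expansion on truth tables; not the SLM cell design itself.
from itertools import product
from typing import List, Sequence, Tuple

TruthTable = List[int]  # entry idx holds f(x), where bit i of idx is input x_i


def cofactors(tt: TruthTable, k: int, var: int) -> Tuple[TruthTable, TruthTable]:
    """Split a 2**k-entry truth table on input `var` into (f|var=0, f|var=1)."""
    f0: TruthTable = []
    f1: TruthTable = []
    for idx in range(1 << k):  # increasing idx keeps the remaining inputs in order
        (f1 if (idx >> var) & 1 else f0).append(tt[idx])
    return f0, f1


def eval_tt(tt: TruthTable, inputs: Sequence[int]) -> int:
    """Look up f(inputs) the way a LUT does: inputs form the table address."""
    return tt[sum(bit << i for i, bit in enumerate(inputs))]


def eval_shannon(f0: TruthTable, f1: TruthTable, inputs: Sequence[int], var: int) -> int:
    """Evaluate f as a 2:1 mux on inputs[var] choosing between the two cofactors."""
    rest = list(inputs[:var]) + list(inputs[var + 1:])
    return eval_tt(f1 if inputs[var] else f0, rest)


# A 4-input AND: its cofactor on x3 = 0 is the constant 0, which needs no table at all.
and4 = [1 if idx == 15 else 0 for idx in range(16)]
f0, f1 = cofactors(and4, 4, 3)
print(f0 == [0] * 8)  # True
print(all(eval_shannon(f0, f1, x, 3) == eval_tt(and4, x) for x in product((0, 1), repeat=4)))  # True
```

A plain 2:1 Shannon split of a k-LUT still stores 2**k bits; savings of the kind the abstract reports would presumably come from cofactors of frequent functions that are constant, shared, or otherwise cheap, which this sketch only hints at with the AND example.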
High-level synthesis (HLS) promises to increase designer productivity in the face of steadily increasing FPGA sizes and to broaden the FPGA user base, allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of a debugging infrastructure. To debug, designers can run their source code on a processor; however, this does not capture interactions with other system components. The alternative is to debug using the RTL, which is beyond the expertise of software designers and impractical for hardware designers, as the RTL may not resemble the original source code.
Despite a decade of activity in the development of soft vector processors for FPGAs, high-level language support remains thin. We attribute this problem to a design method in which the high-level vector programming interface is only really considered once the processor architecture has been perfected, by which point the designer may be committed to the time-consuming development of a complicated compiler. In this paper, we present the codesign of a soft vector processor and a lightweight compiler, which together raise the level of abstraction for the programmer while allowing a rapid compiler implementation. We demonstrate the effectiveness of our approach on a range of applications from digital signal processing, neuroscience, and machine learning.
The design and implementation of a multitasking runtime system for mixed-architecture applications on a tightly coupled FPGA-CPU platform are presented. The runtime environment and the user applications assume an underlying machine that encompasses multiple computing architectures within a unified machine model. Using this model, a unified process scheduling mechanism was developed that enables the concurrent execution of multiple mixed-architecture processes. Scheduling and allocation strategies, including blocking and preemption, were implemented and evaluated with respect to performance and fairness on a Xilinx Zynq platform using a mix of synthetic workloads.
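The trade-off between blocking and preemption can be illustrated with a toy model. The sketch below assumes a single shared hardware resource (for example, one accelerator region) and zero context-switch or reconfiguration cost, and compares a blocking policy (a process holds the resource until it finishes, served in arrival order) with preemptive round-robin time slicing. The `Task` type, policies, and workload are hypothetical simplifications, not the runtime system described above.

```python
# Toy model: one shared accelerator, zero switch cost. Not the paper's runtime system.
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    name: str
    arrival: int  # time the process requests the accelerator
    work: int     # accelerator time units it needs


def blocking(tasks: List[Task]) -> Dict[str, int]:
    """Non-preemptive FIFO: each task keeps the accelerator until it completes."""
    t, done = 0, {}
    for task in sorted(tasks, key=lambda x: x.arrival):
        t = max(t, task.arrival) + task.work
        done[task.name] = t
    return done


def preemptive_rr(tasks: List[Task], quantum: int) -> Dict[str, int]:
    """Round-robin: the running task is preempted after `quantum` time units."""
    pending = deque(sorted(tasks, key=lambda x: x.arrival))
    ready: deque = deque()
    remaining = {task.name: task.work for task in tasks}
    t, done = 0, {}
    while pending or ready:
        while pending and pending[0].arrival <= t:
            ready.append(pending.popleft())
        if not ready:  # accelerator idle until the next arrival
            t = pending[0].arrival
            continue
        task = ready.popleft()
        run = min(quantum, remaining[task.name])
        t += run
        remaining[task.name] -= run
        # Admit arrivals from this slice before re-queueing the preempted task
        while pending and pending[0].arrival <= t:
            ready.append(pending.popleft())
        if remaining[task.name] == 0:
            done[task.name] = t
        else:
            ready.append(task)
    return done


def avg_turnaround(tasks: List[Task], done: Dict[str, int]) -> float:
    return sum(done[x.name] - x.arrival for x in tasks) / len(tasks)


workload = [Task("A", 0, 10), Task("B", 1, 2)]  # a long job, then a short one
print(avg_turnaround(workload, blocking(workload)))          # 10.5
print(avg_turnaround(workload, preemptive_rr(workload, 2)))  # 7.5
```

Here preemption lets the short task B finish early at the cost of delaying A; on real hardware, saving and restoring accelerator state would add overhead that this model ignores.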
This paper presents a novel methodology for generating and compressing configuration bitstreams for modules that can be executed at different positions on an FPGA. The presented methodology for bitstream generation and compression does not require deep knowledge of the bitstream format, and it is independent of the target (Xilinx) FPGA family. The approach consists of a design phase in which partial bitstreams are decomposed into sequences of module-dependent and module-independent pieces of configuration data. At run time, these data can then be recomposed for the individual placement positions by a special DMA configuration controller in one atomic operation, without any further software interaction. Our experiments demonstrate that module relocation and fast partial reconfiguration can be implemented at low logic cost.
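The decompose/recompose idea can be shown on a toy word-level model. The sketch below assumes equal-length partial bitstreams for the same module at several placement positions, represented as plain lists of integers, and treats words identical across all positions as position-independent (stored once) and the rest as position-dependent (stored per position). This is a hypothetical simplification for illustration only; it does not model the Xilinx bitstream format or the paper's actual decomposition.

```python
# Toy word-level model; real Xilinx bitstreams and the paper's method are more involved.
from typing import Dict, List, Optional, Tuple

Bitstream = List[int]


def decompose(streams: Dict[int, Bitstream]) -> Tuple[List[Optional[int]], List[int], Dict[int, Bitstream]]:
    """Split per-position bitstreams into shared words and per-position differing words."""
    positions = list(streams)
    length = len(streams[positions[0]])
    assert all(len(s) == length for s in streams.values()), "toy model needs equal lengths"
    # A word is position-dependent if it differs between any two placements
    dependent_idx = [i for i in range(length) if len({streams[p][i] for p in positions}) > 1]
    dep = set(dependent_idx)
    shared = [None if i in dep else streams[positions[0]][i] for i in range(length)]
    per_position = {p: [streams[p][i] for i in dependent_idx] for p in positions}
    return shared, dependent_idx, per_position


def recompose(shared: List[Optional[int]], dependent_idx: List[int], pieces: Bitstream) -> Bitstream:
    """Rebuild the bitstream for one position by filling the dependent slots."""
    out = list(shared)
    for i, word in zip(dependent_idx, pieces):
        out[i] = word
    return out  # type: ignore[return-value]


def stored_words(shared: List[Optional[int]], per_position: Dict[int, Bitstream]) -> int:
    return sum(w is not None for w in shared) + sum(len(v) for v in per_position.values())


streams = {0: [1, 2, 3, 4], 5: [1, 9, 3, 7]}  # hypothetical module at positions 0 and 5
shared, idx, pieces = decompose(streams)
print(recompose(shared, idx, pieces[5]))  # [1, 9, 3, 7]
print(stored_words(shared, pieces))       # 6 words instead of 8
```

In the abstract's setting, the recomposition loop is the part a DMA configuration controller would carry out in hardware, streaming the shared and position-specific pieces directly to the configuration port.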
The effect of variability has become increasingly significant as a result of technology geometry scaling. This paper describes Asynchronous Assisting Logic (AAL) blocks and a method for introducing them into modern FPGA architectures in order to increase tolerance of the wide-range latency variations caused by parametric variation and by temperature and supply-voltage fluctuations. The proposed method leverages the availability of variation maps and suggests deploying configurable AAL blocks only into the variation-critical paths, reinforcing these paths rather than rerouting or remapping them. This significantly reduces the size overhead that fully asynchronous designs would normally incur. The proposed technique preserves the existing FPGA architecture, allowing potential reuse of the design flow. Simulations show correct functionality under regularly varying, randomly varying, and capacitor-switching energy-harvester voltage supplies.