Hardware-software co-design is the new trend for deep neural network and FPGA accelerator development, which iteratively revises and tunes the full system. The bottleneck of the approach lies in the time-consuming har...
In recent years, the challenge of building high-performance, low-power, retargetable embedded systems has been addressed with different technological and architectural solutions. In this paper we present a new configurable unit explicitly designed to implement additional reconfigurable pipelined datapaths, suitable for the design of reconfigurable processors. A VLIW reconfigurable processor has been implemented on silicon in a standard 0.18 μm CMOS technology to prove the effectiveness of the proposed unit. Testing on a benchmark of signal processing algorithms showed speedups from 4.3x to 13.5x and energy consumption reductions of up to 92%.
ISBN: 9781581134520 (print)
This paper presents a review of some existing architectures for the implementation of Montgomery modular multiplication and exponentiation on field-programmable gate arrays (FPGAs). Some new architectures are presented, including a pipelined architecture exploiting the maximum carry chain length of the FPGA, which is used to implement the modular exponentiation operation required for RSA encryption and decryption. Speed and area comparisons are performed on the optimised designs. The issues of targeting a design specifically for a reconfigurable device are considered, taking into account the underlying architecture imposed by the target technology.
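For readers unfamiliar with the operation named in this abstract, the following is a minimal software sketch of Montgomery multiplication and square-and-multiply exponentiation. It is not the paper's hardware architecture; the function names (`montgomery_setup`, `mont_mul`, `mont_pow`) and the choice R = 2^k with k equal to the bit length of the modulus are assumptions made for illustration. It requires Python 3.8 or later for `pow(n, -1, r)`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n with R = 2**k > n (illustrative helper)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return n_prime, r2


def mont_mul(a, b, n, k, n_prime):
    """Montgomery reduction of a*b: returns a * b * R^-1 mod n for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by R
    u = (t + m * n) >> k               # exact division by R is a shift
    return u - n if u >= n else u      # single conditional subtraction


def mont_pow(base, exp, n):
    """Compute base**exp mod n for odd n using right-to-left binary exponentiation."""
    k = n.bit_length()
    n_prime, r2 = montgomery_setup(n, k)
    x = mont_mul(base % n, r2, n, k, n_prime)  # base in Montgomery form
    acc = mont_mul(1, r2, n, k, n_prime)       # 1 in Montgomery form (R mod n)
    while exp:
        if exp & 1:
            acc = mont_mul(acc, x, n, k, n_prime)
        x = mont_mul(x, x, n, k, n_prime)
        exp >>= 1
    return mont_mul(acc, 1, n, k, n_prime)     # leave Montgomery form
```

The appeal for hardware is that the reduction step replaces division by the modulus with a multiplication, an addition, and a shift by k bits.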
Multi-point distributed random variables whose moments match those of a Gaussian random variable up to a certain order play an important role in Monte Carlo simulations of weak approximations of stochastic differential equations. In applications such as finance, where "real time" execution is required, there is a strong need for highly efficient implementations. In this paper a fast and flexible dedicated hardware solution on a field-programmable gate array (FPGA) is presented. A comparative performance analysis between a software-only solution and the proposed hardware solution demonstrates that the FPGA solution is bottleneck-free, retains the flexibility of the software solution, and significantly increases the computational efficiency.
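As a concrete illustration of moment matching, the sketch below uses a commonly cited three-point variable that takes the values -√3, 0, +√3 with probabilities 1/6, 2/3, 1/6. This particular distribution is an assumption chosen for the example and is not taken from the abstract; the names `POINTS`, `WEIGHTS`, `moment`, `gaussian_moment`, and `sample_three_point` are likewise hypothetical. Its moments agree with those of a standard Gaussian up to order 5 and first differ at order 6.

```python
import math
import random

# Assumed example distribution: three points with symmetric weights.
POINTS = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def moment(order):
    """Exact moment E[X**order] of the three-point variable."""
    return sum(w * x ** order for x, w in zip(POINTS, WEIGHTS))


def gaussian_moment(order):
    """Moment of a standard Gaussian: 0 for odd orders, (order-1)!! for even orders."""
    if order % 2 == 1:
        return 0.0
    result = 1.0
    for j in range(1, order, 2):
        result *= j
    return result


def sample_three_point(rng):
    """Draw one sample by inverting the cumulative distribution with a single uniform."""
    u = rng.random()
    if u < 1.0 / 6.0:
        return POINTS[0]
    if u < 5.0 / 6.0:
        return POINTS[1]
    return POINTS[2]


if __name__ == "__main__":
    for order in range(7):
        print(order, moment(order), gaussian_moment(order))
    rng = random.Random(0)
    print([sample_three_point(rng) for _ in range(5)])
```

Sampling needs only one uniform draw and two comparisons, which is part of why such variables are attractive when throughput matters.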
ISBN: 9780897919784 (print)
To implement high-density and high-speed FPGA circuits, designers need tight control over the circuit implementation process. However, current design tools are unsuited for this purpose as they lack fast turnaround times, interactivity, and integration. We present a system for the Xilinx XC6200 FPGA that addresses these issues. It consists of a suite of tightly integrated tools for the XC6200 architecture centered around an architecture-independent tool framework. The system lets the designer easily intervene at various stages of the design process and features design cycle times (from an HDL specification to a complete layout) on the order of seconds.
ISBN: 1595932925 (print)
This paper examines the tradeoffs between flexibility, area, and power dissipation of programmable clock networks for field-programmable gate arrays (FPGAs). The paper begins by describing a parameterized clock network model that describes a broad range of programmable clock network architectures. Specifically, the model supports architectures with multiple local and global clock domains and varying amounts of flexibility at various levels of the clock network. Using the model, the architectural parameters that control the flexibility of the clock network are varied to determine the cost of this flexibility in terms of area and power dissipation. From these experiments, the study finds that area and power costs are highest for networks with flexibility close to the logic blocks. Furthermore, it finds that clock networks with local clock domains have little overhead and are significantly more efficient than clock networks without local clock domains for applications with multiple clocks. Copyright 2006 ACM.
Field-programmable gate arrays (FPGAs) are widely employed in network-interface cards across applications including cloud services, machine learning, and high-frequency trading. These applications often share a common...
This article is a concise literature review of the current state of the art in arithmetic for field-programmable gate arrays (FPGAs), including studies, implementation techniques, operators, and structures, in various area-time tradeoffs. It covers the integer operations of addition/subtraction, multiplication, squaring, division, and square root, in parallel mode and in both serial modes (least-significant digit first, and online). Many people, including researchers in computer arithmetic, parallel computing, and digital signal and image processing, system-on-a-programmable-chip (SoPC) designers, and others who need to implement special-purpose arithmetic circuits on FPGAs, might find such a review useful, either as an introduction to the topic, as a knowledge update, or for reference.
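To make the least-significant-digit-first serial mode concrete, here is a small behavioural sketch of a bit-serial adder, using radix 2 so that each digit is one bit. It models one full adder plus a carry register processing one bit pair per step; the helper names `to_bits`, `from_bits`, and `serial_add_lsd_first` are hypothetical and the sketch is not drawn from the reviewed article.

```python
def to_bits(value, width):
    """Return the bits of a non-negative integer, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from a least-significant-bit-first list of bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add_lsd_first(a_bits, b_bits):
    """Add two equal-length bit streams, one bit pair per step, LSB first."""
    carry = 0  # models the single carry flip-flop of a bit-serial adder
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                # sum bit of a full adder
        carry = (a & b) | (carry & (a ^ b))      # carry into the next step
    out.append(carry)                            # final carry is the top bit
    return out
```

For example, `from_bits(serial_add_lsd_first(to_bits(13, 4), to_bits(11, 4)))` evaluates to 24. The area cost is one full adder regardless of word length, at the price of one clock step per bit, which is the kind of area-time tradeoff the review refers to.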
ISBN: 9781605584102 (print)
Via-programmable gate arrays (VPGAs) offer a middle ground between application-specific integrated circuits and field-programmable gate arrays in terms of flexibility, manufacturing, speed, power and area. In this paper, we present a VPGA logic cell, the complementary universal logic gate (CULG), which can be used to implement both sequential and combinatorial elements. Its performance is compared with a number of other designs, including transmission, differential cascode voltage switch with pass gate, and standard cell. The CULG is found to have comparable delay product and process variation sensitivity to the other designs while offering the lowest power consumption. Copyright 2009 ACM.