ISBN: 9781595939340 (print)
To improve FPGA performance for arithmetic circuits, this paper proposes a new architecture for FPGA logic cells that includes a 6:2 compressor. The new cell features additional fast carry chains that concatenate adjacent compressors and can be routed locally without the global routing network. Unlike previous carry chains for binary and ternary addition, the carry chain used by the new cell spans only 2 logic blocks, which significantly improves the delay of multi-input addition operations mapped onto the FPGA. The delay and area overhead that arises from augmenting a traditional FPGA logic cell with the new compressor structure is minimal. Using this new cell, we observed an average speedup in combinational delay of 1.41x compared to adder trees synthesized using ternary adders. Copyright 2008 ACM.
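The idea of compressing six operands down to two before a single carry-propagate addition can be illustrated in software. The following is a minimal sketch, assuming a generic carry-save reduction built from word-wide 3:2 compressors; it is not the logic cell proposed in the paper, and the function names `csa` and `compress_6_to_2` are hypothetical.

```python
def csa(a, b, c):
    """Word-wide 3:2 compressor (carry-save adder).

    Returns (s, carry) such that a + b + c == s + carry.
    """
    s = a ^ b ^ c                                # per-bit sum, ignoring carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, moved to the next weight
    return s, carry


def compress_6_to_2(operands):
    """Reduce six non-negative integers to two values with the same total."""
    if len(operands) != 6:
        raise ValueError("expected exactly six operands")
    a, b, c, d, e, f = operands
    s1, c1 = csa(a, b, c)        # first layer: two independent 3:2 compressors
    s2, c2 = csa(d, e, f)
    s3, c3 = csa(s1, c1, s2)     # second layer
    return csa(s3, c3, c2)       # third layer leaves two words


operands = [13, 7, 22, 5, 9, 31]
x, y = compress_6_to_2(operands)
total = x + y                    # the only carry-propagate addition
```

No carry ripples across bit positions until the final `x + y`, which is why compressor trees tend to shorten the critical path of multi-input addition.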
ISBN: 9781450361378 (print)
As the complexity of programmable architectures increases with advances in silicon process technology, there is a growing need to extract greater productivity and performance from the tools. Due to their inherent reconfigurability, FPGAs are proving to be valuable targets for more efficient domain-specific architectures. However, FPGA implementation tools are designed for a broad set of applications. In this paper we describe RapidWright, an open source framework that enables customized implementations for Xilinx FPGAs. RapidWright enables implementation tools that can take advantage of the great potential of domain-specific attributes, leading to greater productivity and performance. The focus of this paper is to provide an introductory reference for RapidWright and its use cases so that others may be empowered to adapt their implementations to their domain-specific applications.
ISBN: 9781450305549 (print)
The ability to measure the delay of arbitrary circuits on an FPGA offers many opportunities for on-chip characterisation and optimisation. This paper describes an improved delay measurement method that monitors the transition probability at the output nodes as the operating frequency is swept. The new method uses optimised test vector generation to improve the accuracy of the test method. It is demonstrated on a fourth-order IIR filter circuit implemented on an Altera Cyclone III FPGA.
ISBN: 9781581131932 (print)
Embedded memory blocks (EMBs) are used in modern field-programmable gate arrays (FPGAs) for implementation of on-chip memories or specialized logic functions. In this paper, we propose an integrated approach with structural clustering and functional decomposition to minimize the circuit area using EMBs while preserving the circuit delay. The structural clustering method is based on the concepts of Maximum Fanout Free Cone (MFFC) and Maximum Fanout Free Subgraph (MFFS). In order to use EMBs effectively in large clusters, single-output and multiple-output functional decompositions are used to decompose large clusters so that the encoding functions or base functions can be implemented by EMBs. The approach also considers multiple EMBs for an individual large cluster so that better area reduction can be obtained. We have developed an algorithm called EMB_Syn that can be used as a postprocessing tool in the FPGA synthesis flow. MCNC benchmarks are used to test EMB_Syn on Altera's FLEX10K device family, and the experimental results are compared with those of EMB_Pack and SMAP. When EMB_Syn is used for postmapping processing, it shows 45.06% and up to 5.23% improvements over EMB_Pack and SMAP, respectively, in terms of the area covered by EMBs.
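The MFFC concept mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the common definition in which a node belongs to the cone of a root when every one of its fanouts is already in the cone; it is not EMB_Syn itself. The netlist, the node names, and the choice to treat primary inputs like any other node are all assumptions made for the example.

```python
def mffc(root, fanins, fanouts):
    """Return the maximum fanout-free cone of `root` as a set of node names.

    A predecessor joins the cone only when all of its fanouts are already
    inside the cone, so every path from it to an output passes through `root`.
    """
    cone = {root}
    changed = True
    while changed:
        changed = False
        for node in list(cone):                      # snapshot, since the cone grows
            for pred in fanins.get(node, ()):
                if pred not in cone and all(s in cone for s in fanouts.get(pred, ())):
                    cone.add(pred)
                    changed = True
    return cone


# Hypothetical netlist: each gate maps to the list of signals it reads.
fanins = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "d"],
}

# Derive the fanout lists from the fanin lists.
fanouts = {}
for gate, preds in fanins.items():
    for pred in preds:
        fanouts.setdefault(pred, []).append(gate)

cone_g3 = mffc("g3", fanins, fanouts)
```

Here `g2` stays outside the cone of `g3` because it also drives `g4`, and `b` stays outside because it also drives `g2`.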
This paper gives a representation for graph data structures as electronic circuits in reconfigurable hardware. Graph properties, such as vertex reachability, are computed quickly by exploiting a graph's edge parallelism: signals propagate along many graph edges concurrently. This new representation admits arbitrary graphs in which vertices and edges may be inserted and deleted dynamically at low cost, because graph modification does not entail any re-fitting of the graph's circuit. Dynamic modification is achieved by rewriting cells in a reconfigurable hardware array. Dynamic graph algorithms are given for vertex reachability, transitive closure, shortest unit path, cycle detection, and connected-component identification. On the task of computing a graph's transitive closure, for example, simulation of such a dynamic graph processor indicates possible speedups greater than three orders of magnitude compared to an efficient software algorithm running on a contemporaneously fast uniprocessor. Implementation of a prototype in an FPGA verifies the accuracy of the simulation and demonstrates that a practical and efficient (compact) mapping of the graph construction is possible in existing FPGA architectures. In addition to speeding conventional graph computations with dynamic graph processors, we note their potential as parallel graph reducers implementing general (Turing equivalent) computation.
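The edge-parallel propagation described above can be mimicked in software by advancing a signal along every edge in the same step. The following is a minimal sketch, assuming a synchronous model in which one loop iteration stands for one propagation step; it is a software analogue only, not the circuit representation from the paper, and the function names and example graph are hypothetical.

```python
def reachable(edges, source):
    """Vertices reachable from `source` through at least one edge."""
    active = set()
    while True:
        # One step: every edge whose tail carries the signal drives its head.
        nxt = active | {v for (u, v) in edges if u == source or u in active}
        if nxt == active:        # no new vertex was reached, so the signal has settled
            return active
        active = nxt


def transitive_closure(edges):
    """All pairs (u, v) such that v is reachable from u."""
    vertices = {u for u, _ in edges} | {v for _, v in edges}
    return {(u, v) for u in vertices for v in reachable(edges, u)}


# Hypothetical graph: a cycle a -> b -> c -> a with an extra edge c -> d.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
closure = transitive_closure(edges)
```

In hardware all edges are evaluated at once, so the number of steps is bounded by the longest shortest path rather than by the edge count; the set comprehension above only models that behaviour sequentially.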
ISBN: 9781450311557 (print)
The radiation dose associated with computerized tomography (CT) is significant. Optimization-based iterative reconstruction approaches, such as compressive sensing, provide ways to reduce the radiation exposure without sacrificing image quality. However, the computational requirement of such algorithms is much higher than that of the conventional Filtered Back Projection (FBP) reconstruction algorithm. This paper describes an FPGA implementation of one important iterative kernel called EM, which is the major computation kernel of a recent EM+TV reconstruction algorithm. We show that a hybrid approach (CPU+GPU+FPGA) can deliver better performance and energy efficiency than GPU-only solutions, providing a 13x throughput improvement over a dual-core CPU implementation.
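The abstract does not give the EM update formula. As a rough illustration, the sketch below assumes the kernel resembles the classical maximum-likelihood EM (MLEM) iteration for emission and transmission tomography, in which the estimate is multiplied by the back-projected ratio of measured to predicted data. The tiny system matrix `A`, the measurements `b`, and the function name `mlem` are hypothetical.

```python
def mlem(A, b, iterations=50):
    """Classical MLEM iteration: x <- x * A^T(b / Ax) / A^T 1.

    A is a list of rows (system matrix), b is the measured data.
    """
    m, n = len(A), len(A[0])
    # Sensitivity of each unknown: the column sums of A.
    sens = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = [1.0] * n                                    # strictly positive start
    for _ in range(iterations):
        # Forward projection of the current estimate.
        proj = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Ratio of measured to predicted data (guarding against division by zero).
        ratio = [b[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(m)]
        # Back projection of the ratio, then the multiplicative update.
        back = [sum(A[i][j] * ratio[i] for i in range(m)) for j in range(n)]
        x = [x[j] * back[j] / sens[j] for j in range(n)]
    return x


# Hypothetical 3-ray, 2-pixel problem whose exact solution is [2, 3].
A = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [2.0, 5.0, 3.0]
estimate = mlem(A, b)
```

The forward and back projections dominate the cost and are regular, data-parallel loops, which is the kind of structure that accelerators such as GPUs and FPGAs can exploit.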
ISBN: 9781450305549 (print)
The popularity of FPGAs is rapidly growing due to the unique advantages that they offer. However, their distinctive features also raise new questions concerning the security and communication capabilities of an FPGA-based hardware platform. In this paper, we explore some of the limits of FPGA side-channel communication. Specifically, we identify a previously unexplored capability that significantly increases both the potential benefits and risks associated with side-channel communication on an FPGA: an in-device receiver. We designed and implemented three new communication mechanisms: speed modulation, timing modulation and pin hijacking. These non-traditional interfacing techniques have the potential to provide reliable communication with an estimated maximum bandwidth of 3.3 bits/sec, 8 Kbits/sec, and 3.4 Mbits/sec, respectively.
ISBN: 9780897919784 (print)
An algorithm is presented for partitioning a design in time. The algorithm divides a large, technology-mapped design into multiple configurations of a time-multiplexed FPGA. These configurations are rapidly executed in the FPGA to emulate the large design. The tool includes facilities for optimizing the partitioning to improve routability, for fitting the design into more configurations than the depth of the critical path, and for compressing the critical path of the design into fewer configurations, both to fit the design into the device and to improve performance. Scheduling results are shown for mapping designs into an 8-configuration time-multiplexed FPGA and for architecture investigation for a time-multiplexed FPGA.
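One simple way to picture partitioning in time is to levelize the netlist and map logic levels onto configurations. The sketch below assumes an as-soon-as-possible (ASAP) levelization followed by a proportional mapping of levels to configurations; the abstract does not describe the paper's actual algorithm, so this is only an illustration, and the netlist and function names are hypothetical.

```python
def asap_levels(fanins):
    """Assign each node its ASAP level (nodes without fanins are level 1)."""
    levels = {}

    def level(node):
        if node not in levels:
            preds = fanins.get(node, ())
            levels[node] = 1 + max((level(p) for p in preds), default=0)
        return levels[node]

    for node in fanins:
        level(node)
    return levels


def schedule(fanins, n_configs):
    """Map levels onto configurations 0..n_configs-1, preserving precedence."""
    levels = asap_levels(fanins)
    depth = max(levels.values())
    # Proportional mapping: it is monotone in the level, so a node never lands
    # in an earlier configuration than any of its predecessors.
    return {node: (lvl - 1) * n_configs // depth for node, lvl in levels.items()}


# Hypothetical netlist with a critical path of five levels.
fanins = {
    "n1": ["a", "b"],
    "n2": ["n1", "c"],
    "n3": ["n2"],
    "n4": ["n3", "n1"],
}
plan = schedule(fanins, n_configs=2)
```

With five levels and two configurations, the critical path is compressed so that several logic levels share one configuration; asking for more configurations than levels spreads the levels out instead.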
This article is a concise literature review of the current state of the art in arithmetic for field-programmable gate arrays (FPGAs), including studies, implementation techniques, operators, and structures, in various area-time tradeoffs. It covers the integer operations of addition/subtraction, multiplication, squaring, division, and square root, in parallel mode and in both serial modes (least-significant digit first, and online). Many people, including researchers in computer arithmetic, parallel computing, and digital signal and image processing, system-on-a-programmable-chip (SoPC) designers, and others who need to implement special-purpose arithmetic circuits on FPGAs, might find such a review useful as an introduction to the topic, as a knowledge update, or for reference.
ISBN: 9781450305549 (print)
This paper introduces the Delaware Enhanced Emulation Platform (DEEP), an FPGA-based emulation system for hardware/software co-verification of many-core chip architectures. This platform exhibits the following three characteristics: fast compilation of logic designs, debugging support, and affordability. It is based on a novel iterative emulation methodology for hardware design and verification. We also designed and integrated a new architectural feature that provides Full/Empty bit fine-grain synchronization for the IBM Cyclops-64 many-core architecture and evaluated its performance against existing synchronization constructs.