ISBN: 9781581131932 (print)
In this paper we present new technology mapping algorithms for use in a programmable logic device (PLD) that contains both lookup tables (LUTs) and PLA-like blocks. The technology mapping algorithms partially collapse circuits to reduce either area or depth, and pack the circuits into a minimum number of LUTs and PLA-like blocks. Since no other technology mapping algorithm for this problem has been previously published, we cannot compare our approach to others. Instead, to illustrate the importance of this problem, we use our algorithms to investigate the benefits provided by a PLD architecture with both LUTs and PLA-like blocks compared to a traditional LUT-based FPGA. The experimental results indicate that our mixed PLD architecture is more area-efficient than LUT-based FPGAs by up to 29%, or more depth-efficient by up to 75%.
ISBN: 9781581131932 (print)
It has become clear that on-chip storage is an essential component of high-density FPGAs. These arrays were originally intended to implement storage, but recent work has shown that they can also be used to implement logic very efficiently. This previous work has only considered single-port arrays. Many current FPGAs, however, contain dual-port arrays. In this paper we present an algorithm that maps logic to these dual-port arrays. Our algorithm can either optimize area with no regard for circuit speed, or optimize area under the constraint that the combinational depth of the circuit does not increase. Experimental results show that, on average, our algorithm packs between 29% and 35% more logic than an algorithm that targets single-port arrays. We also show, however, that even with this algorithm, dual-port arrays are still not as area-efficient as single-port arrays when implementing logic.
With increased logic density due to the shift towards deep submicron (DSM) technologies, FPGAs have become a viable option for implementing large designs. However, most commercial FPGAs, due to their general-purpose architectural nature, cannot handle designs that require very high throughput. In this paper, we propose a novel high-throughput FPGA architecture that tries to combine the high performance of Application Specific Integrated Circuits (ASICs) with the flexibility afforded by the reconfigurability of FPGAs. This architecture utilizes the concept of 'Wave-Steering' and works best for designs that are highly regular and have almost equal delays along all paths. It has enormous potential in Digital Signal and Image Processing applications, since a good portion of these applications are regular in nature. Preliminary results for some commonly used DSP designs are encouraging and yield throughputs in the neighborhood of 770 MHz in 0.5 μm CMOS technology.
With the expiration of the Data Encryption Standard (DES) in 1998, the Advanced Encryption Standard (AES) development process is well underway. It is hoped that the result of the AES process will be the specification of a new non-classified encryption algorithm that will have the global acceptance achieved by DES as well as the capability of long-term protection of sensitive information. The technical analysis used in determining which of the potential AES candidates will be selected as the Advanced Encryption Algorithm includes efficiency testing of both hardware and software implementations of candidate algorithms. Reprogrammable devices such as field-programmable gate arrays (FPGAs) are highly attractive options for hardware implementations of encryption algorithms, as they provide cryptographic algorithm agility, physical security, and potentially much higher performance than software solutions. This contribution investigates the significance of an FPGA implementation of Serpent, one of the Advanced Encryption Standard candidate algorithms. Multiple architecture options of the Serpent algorithm are explored, with a strong focus on a high-speed implementation within an FPGA in order to support security for current and future high-bandwidth applications. One of the main findings is that Serpent can be implemented with encryption rates beyond 4 Gbit/s on current FPGAs.
This paper presents a power consumption estimation method for the novel Virtex architecture. Because the XC4000 and Virtex core architectures are very similar, we took the basic approach used for XC4000 FPGA power consumption estimation and extended it to the new Virtex family. We determined an appropriate technology-dependent power factor Kp to calculate the power consumption of Virtex chips, and developed a special benchmark test design to conduct our investigations. Additionally, the derived formulas are evaluated on two typical industrial designs. We used our own emulation environments, SPYDER-ASIC-X1 and SPYDER-VIRTEX-X2, which are well suited to the emulation of hardware designs for embedded systems.
ISBN: 9781581131932 (print)
The Embedded System Block (ESB) of the APEX E programmable logic device family from Altera Corporation includes the capability of implementing content addressable memory (CAM) as well as product term macrocells, ROM, and dual-port RAM. In CAM mode, each ESB can implement a 32-word CAM with 32 bits per word. In product term mode, each ESB has 16 macrocells built out of 32 product terms with 32 literal inputs. The ability to reconfigure memory blocks in this way represents a new and innovative use of resources in a programmable logic device, requiring creative solutions in both the hardware and software domains. The architecture and features of this Embedded System Block are described.
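To make the CAM mode concrete, the following is a minimal behavioral sketch in Python of a 32-word, 32-bit content addressable memory. It is an illustration only and not Altera's implementation: the class name `Cam`, its methods, and the choice to report every matching address are assumptions made for this example.

```python
class Cam:
    """Behavioral model of a small content-addressable memory (hypothetical)."""

    def __init__(self, words=32, width=32):
        self.mask = (1 << width) - 1
        # None marks an entry that has not been written yet.
        self.entries = [None] * words

    def write(self, address, value):
        """Store a word at the given address, truncated to the word width."""
        self.entries[address] = value & self.mask

    def match(self, key):
        """Return the addresses of all stored words equal to key."""
        key &= self.mask
        return [a for a, v in enumerate(self.entries) if v == key]


cam = Cam()
cam.write(3, 0xDEADBEEF)
cam.write(17, 0xDEADBEEF)
cam.write(5, 0x12345678)
print(cam.match(0xDEADBEEF))
```

Where a RAM takes an address and returns data, this model takes data and returns the addresses that hold it. In hardware, all comparisons would happen in parallel rather than in a loop.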
ISBN: 9781581131932 (print)
Embedded memory blocks (EMBs) are used in modern field-programmable gate arrays (FPGAs) for implementation of on-chip memories or specialized logic functions. In this paper, we propose an integrated approach with structural clustering and functional decomposition to minimize the circuit area using EMBs while preserving the circuit delay. The structural clustering method is based on the concepts of Maximum Fanout Free Cone (MFFC) and Maximum Fanout Free Subgraph (MFFS). In order to use EMBs effectively for large clusters, single-output and multiple-output functional decompositions are used to decompose large clusters so that the encoding functions or base functions can be implemented by EMBs. The approach also considers multiple EMBs for an individual large cluster so that better area reduction can be obtained. We have developed an algorithm called EMB_Syn that can be used as a postprocessing tool in the FPGA synthesis flow. MCNC benchmarks are used to test EMB_Syn on Altera's FLEX10K device family, and the experimental results are compared with those of EMB_Pack and SMAP. When EMB_Syn is used as postmapping processing, it shows 45.06% and up to 5.23% improvements over EMB_Pack and SMAP, respectively, in terms of the area covered by EMBs.
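As an illustration of the MFFC concept mentioned above, here is a small Python sketch that computes the maximum fanout-free cone of a node using the standard definition: the largest set of nodes feeding the root such that every fanout of each non-root member stays inside the set. This is not the EMB_Syn algorithm. The function name `mffc`, the dictionary-based netlist format, and the choice to exclude primary inputs from the cone are assumptions made for this example.

```python
from collections import defaultdict


def mffc(fanins, root):
    """Return the maximum fanout-free cone of `root`.

    `fanins` maps each gate to the list of nodes driving it. Nodes that do
    not appear as keys are treated as primary inputs and are left out of
    the cone (an assumption of this sketch).
    """
    fanouts = defaultdict(set)
    for node, ins in fanins.items():
        for i in ins:
            fanouts[i].add(node)

    cone = {root}
    changed = True
    while changed:
        changed = False
        for node in list(cone):
            for u in fanins.get(node, ()):
                if u in cone or not fanins.get(u):
                    continue  # already included, or a primary input
                if fanouts[u] <= cone:  # every fanout of u is in the cone
                    cone.add(u)
                    changed = True
    return cone


# Hypothetical netlist: g2 feeds both g3 and g4, so it belongs to neither MFFC.
netlist = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["g1", "g2"],
    "g4": ["g2", "d"],
}
print(sorted(mffc(netlist, "g3")))
```

A cone found this way can be replaced by a single block without duplicating logic, because nothing outside the cone depends on its internal nodes.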
This paper gives a representation for graph data structures as electronic circuits in reconfigurable hardware. Graph properties, such as vertex reachability, are computed quickly by exploiting a graph's edge parallelism - signals propagate along many graph edges concurrently. This new representation admits arbitrary graphs in which vertices/edges may be inserted and deleted dynamically at low cost - graph modification does not entail any re-fitting of the graph's circuit. Dynamic modification is achieved by rewriting cells in a reconfigurable hardware array. Dynamic graph algorithms are given for vertex reachability, transitive closure, shortest unit path, cycle detection, and connected-component identification. On the task of computing a graph's transitive closure, for example, simulation of such a dynamic graph processor indicates possible speedups greater than three orders of magnitude compared to an efficient software algorithm running on a contemporaneously fast uniprocessor. Implementation of a prototype in an FPGA verifies the accuracy of the simulation and demonstrates that a practical and efficient (compact) mapping of the graph construction is possible in existing FPGA architectures. In addition to speeding conventional graph computations with dynamic graph processors, we note their potential as parallel graph reducers implementing general (Turing equivalent) computation.
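The idea of edge parallelism can be sketched in software as a synchronous simulation: at each step, every edge whose source vertex is active drives its destination vertex, and the process stops when no new vertex becomes active. The Python below is a hypothetical illustration of that propagation, not the paper's circuit or simulator. The function name `reachable` and the edge-list format are assumptions.

```python
def reachable(edges, source):
    """Simulate concurrent signal propagation from `source`.

    `edges` is a list of directed (u, v) pairs. Returns the set of
    reachable vertices and the number of propagation steps taken.
    """
    active = {source}
    steps = 0
    while True:
        # All edges are evaluated against the same snapshot, mimicking
        # signals travelling along every edge at once.
        nxt = active | {v for (u, v) in edges if u in active}
        if nxt == active:
            return active, steps
        active = nxt
        steps += 1


edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
vertices, steps = reachable(edges, 0)
print(sorted(vertices), steps)
```

In this sketch the step count equals the largest unit-length shortest-path distance from the source, which hints at why a hardware version, where each step is a single signal delay, could be fast.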
ISBN: 9781581131932 (print)
In this paper we present a 'high-level' FPGA architecture description language which lets FPGA architects succinctly and quickly describe an FPGA routing architecture. We then present an 'architecture generator' built into the VPR CAD tool that converts this high-level architecture description into a detailed and completely specified flat FPGA architecture. This flat architecture is the representation with which CAD optimization and visualization modules typically work. By allowing FPGA researchers to specify an architecture at a high level, an architecture generator enables quick and easy 'what-if' experimentation with a wide range of FPGA architectures. The net effect is a more fully optimized final FPGA architecture. In contrast, when FPGA architects are forced to use more traditional methods of describing an FPGA (such as the manual specification of every switch in the basic tile of the FPGA), far less experimentation can be performed in the same time, and the architectures experimented upon are likely to be highly similar, leaving important parts of the design space completely unexplored. This paper describes the automated routing architecture generation problem and highlights its two key difficulties - creating an FPGA architecture that matches all of an FPGA architect's specifications, while simultaneously determining good values for the many unspecified portions of an FPGA so that a high-quality FPGA results. We describe the method by which we generate FPGA routing architectures automatically, and present several examples.
ISBN: 9781581131932 (print)
In this paper, we study the technology mapping problem for a novel FPGA architecture that is based on k-input single-output PLA-like cells, or k/m-macrocells. Each cell in this architecture can implement a single-output function of up to k inputs and up to m product terms. We develop a very efficient technology mapping algorithm, k_m_flow, for this new type of architecture. The experimental results show that our algorithm can achieve depth-optimality in practically all cases. Furthermore, we show that k/m-macrocell based FPGAs are practically equivalent to traditional k-LUT based FPGAs with only a relatively small number of product terms (m≤k+3). We also investigate the total area and delay of k/m-macrocell based FPGAs on various benchmarks to compare them with commonly used 4-LUT based FPGAs. The experimental results show that k/m-macrocell based FPGAs can outperform 4-LUT based FPGAs in terms of both delay and area after placement and routing by VPR.
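The k/m constraint itself is easy to state in code. The Python sketch below checks whether a given sum-of-products cover fits in one k/m-macrocell, that is, whether it uses at most k distinct inputs and at most m product terms. It is not the k_m_flow algorithm. The function name `fits_macrocell` and the cube representation (a dictionary from variable name to literal polarity) are assumptions, and because the check is applied to the cover as given rather than to a minimized one, it is sufficient but not necessary.

```python
def fits_macrocell(cover, k, m):
    """Check whether a sum-of-products cover fits one k/m-macrocell.

    `cover` is a list of cubes. Each cube maps a variable name to True
    (positive literal) or False (negated literal).
    """
    support = set()
    for cube in cover:
        support.update(cube)  # collect the variables this cube uses
    return len(support) <= k and len(cover) <= m


# a XOR b written as two product terms: a·b' + a'·b
xor2 = [{"a": True, "b": False}, {"a": False, "b": True}]
print(fits_macrocell(xor2, k=2, m=2))
print(fits_macrocell(xor2, k=2, m=1))
```

A k-input LUT accepts any function of k inputs, whereas a macrocell also bounds the number of product terms, which is why the abstract's observation about m≤k+3 matters.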