ISBN: 9781581134520 (print)
As programmable logic grows more viable for implementing full design systems, performance has become a primary issue for programmable logic device architectures. This paper presents the high-level design of Dali, a PLD architecture specifically aimed at performance-driven applications. We present significant portions of the background research that contributed to our architectural decisions, an overview of the core routing architecture, and the benchmarking experiments used to evaluate the prototype device.
As device densities increase, testing cost is becoming a larger portion of the overall FPGA manufacturing cost. We present an approach to speed up testing FPGA interconnect by reconfiguring it during the test. Simple additions are made to create feedback shift register structures, which considerably reduce the number of test configurations for the switching matrix interconnect. This new testing architecture reduces switching matrix test time by 66% and the diagnosis time by 72%. The additions are transparent to the users both in terms of speed and functionality.
Random number generators (RNGs) based upon neighborhood-of-four cellular automata (CA) with asymmetrical, non-local connections are explored. A number of RNGs that pass Marsaglia's rigorous Diehard suite of random number tests have been discovered. A neighborhood size of four allows a single CA cell to be implemented with a four-input lookup table and a one-bit register, which are common building blocks in popular field-programmable gate arrays (FPGAs). The investigated networks all had periodic (wrap-around) boundary conditions with either 1-d, 2-d, or 3-d interconnection topologies. Trial designs of 64-bit networks using a Xilinx XCV 1000-6 FPGA predict a maximum clock rate of 214 MHz to 230 MHz, depending upon interconnection topology.
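The cell structure described in this abstract (a four-input lookup table feeding a one-bit register, with wrap-around connections) can be sketched in software as follows. This is only an illustrative model of a 1-d, 64-cell network: the connection offsets in `OFFSETS` and the truth table in `RULE` are hypothetical placeholders, not the configurations reported in the paper, and nothing here implies that this particular choice passes the Diehard tests.

```python
from typing import List, Sequence

N_CELLS = 64  # matches the 64-bit networks mentioned in the abstract

# Hypothetical values chosen for illustration only; the paper's actual
# connection patterns and rules are not given in the abstract.
OFFSETS = (-7, 0, 3, 11)  # asymmetrical, non-local neighbor offsets
RULE = 0x6A95             # 16-entry truth table of a four-input LUT


def step(state: Sequence[int], offsets=OFFSETS, rule=RULE) -> List[int]:
    """Advance every cell by one clock with periodic boundary conditions."""
    n = len(state)
    nxt = []
    for i in range(n):
        idx = 0
        for off in offsets:
            # Build the 4-bit LUT address from the four neighbor bits.
            idx = (idx << 1) | state[(i + off) % n]
        # Look up the next register value in the packed truth table.
        nxt.append((rule >> idx) & 1)
    return nxt


def generate(seed: int, steps: int) -> List[int]:
    """Return one N_CELLS-bit word per clock, starting from an integer seed."""
    state = [(seed >> i) & 1 for i in range(N_CELLS)]
    words = []
    for _ in range(steps):
        state = step(state)
        words.append(sum(bit << i for i, bit in enumerate(state)))
    return words
```

Each cell maps directly onto one LUT and one flip-flop, which is why the neighborhood size of four suits LUT-based FPGA fabrics; the modulo indexing models the wrap-around boundary.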
Field-programmable core arrays (FPCAs) will include various computing cores for a wide variety of applications ranging from DSP to general-purpose computing. With the increasing gap between core computing speeds and memory access latency, managing and orchestrating the movement of data across multiple cores will become increasingly important. In this paper we propose data reorganization engines that allow a wide variety of data reorganizations within as well as between memory modules for future FPCAs. We have experimented with a suite of data reorganizations pervasive in DSP applications. Our limited set of experiments reveals that the proposed designs for these engines are flexible and use little design area in current FPGA fabrics, making them easy to integrate into future FPCAs as either soft or hard macros.
This paper analyzes the dynamic power consumption in the fabric of field-programmable gate arrays (FPGAs) by taking advantage of both simulation and measurement. Our target is the Xilinx Virtex™-II device family, which contains the most recent and largest programmable fabric. We identify important resources in the FPGA architecture and obtain their utilization using a large set of real designs. Then, using a number of representative case studies, we calculate the switching activity corresponding to each resource. Finally, we combine the effective capacitance of each resource with its utilization and switching activity to estimate its share of power consumption. According to our results, the power dissipation shares of routing, logic, and clocking resources are 60%, 16%, and 14%, respectively. We also conclude that the dynamic power dissipation of a Virtex-II CLB is 5.9 μW per MHz for typical designs, but it may vary significantly depending on the switching activity.
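The estimation flow in this abstract (effective capacitance combined with utilization and switching activity) can be illustrated with a small calculation. The sketch below assumes the textbook CMOS relation P = ½·C·V²·f·α per used resource instance; the abstract does not state the exact formula used, and the class and function names are hypothetical. Only the 5.9 μW per MHz figure comes from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Resource:
    name: str
    c_eff_farads: float  # effective capacitance per used instance
    utilization: float   # number of used instances in the design
    activity: float      # average transitions per clock cycle


def dynamic_power_watts(r: Resource, vdd: float, f_hz: float) -> float:
    """Assumed model: P = 1/2 * C * V^2 * f * activity, scaled by utilization."""
    return 0.5 * r.c_eff_farads * vdd ** 2 * f_hz * r.utilization * r.activity


def power_shares(resources: Iterable[Resource], vdd: float, f_hz: float) -> Dict[str, float]:
    """Fraction of total dynamic power attributed to each resource type."""
    powers = {r.name: dynamic_power_watts(r, vdd, f_hz) for r in resources}
    total = sum(powers.values())
    return {name: p / total for name, p in powers.items()}


CLB_UW_PER_MHZ = 5.9  # typical-design figure quoted in the abstract


def clb_power_uw(f_mhz: float) -> float:
    """Scale the per-MHz CLB figure to a given clock frequency."""
    return CLB_UW_PER_MHZ * f_mhz
```

Under this linear scaling, a typical CLB clocked at 100 MHz would dissipate roughly 590 μW, subject to the switching-activity caveat the abstract itself raises.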
As the capacity of FPGAs increases to millions of equivalent gates, the use of Intellectual Property (IP) cores becomes increasingly important to control design complexity. FPGAs are becoming platforms for integrating a system solution from components supplied by independent vendors, in the same way as printed circuit boards provided a platform for earlier generations of designers. However, the current commercial model for IP cores involves large up-front license fees reminiscent of ASIC NRE charges. In order to match the IP core business model to the low- to medium-volume applications addressed by FPGA customers, it is important to develop cryptographic techniques which allow IP core vendors to sell their product on a pay-per-use basis rather than through up-front license fees.
This paper presents abstract layout techniques for a variety of FPGA switch block architectures. We evaluate the relative density of subset, universal, and Wilton switch block architectures. For subset switch blocks of small size, we find the optimal implementations using a simple metric. We also develop a tractable heuristic that returns the optimal results for small switch blocks, and good results for large switch blocks. For switch blocks with general connectivity, we develop a representation and a layout evaluation technique. We use these techniques to compare a variety of small switch blocks. We find that the traditional Xilinx-style, subset switch block is superior to the other proposed architectures. Finally, we have hand-designed some small switch blocks to confirm our results.
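For readers unfamiliar with the terminology, the connectivity of a subset (also called disjoint) switch block can be sketched as below: a wire on track t of one side connects only to track t on each of the other three sides. This models connectivity only, not the layout techniques or density metric the paper evaluates; the function names and the N/E/S/W side labels are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Tuple

SIDES = ("N", "E", "S", "W")
Endpoint = Tuple[str, int]  # (side of the switch block, track index)


def subset_switch_block(width: int) -> List[Tuple[Endpoint, Endpoint]]:
    """List every programmable connection in a subset switch block.

    Track t on one side connects only to track t on the other sides,
    so the block splits into `width` independent groups of wires.
    """
    connections = []
    for track in range(width):
        for side_a, side_b in combinations(SIDES, 2):
            connections.append(((side_a, track), (side_b, track)))
    return connections


def flexibility(connections: List[Tuple[Endpoint, Endpoint]], endpoint: Endpoint) -> int:
    """Count how many other wires a given incoming wire can reach."""
    return sum(endpoint in pair for pair in connections)
```

With a channel width of 4 this yields 24 connections (six side pairs per track), and every incoming wire reaches exactly three others.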
We present a routability-driven bottom-up clustering technique for area and power reduction in clustered FPGAs. This technique uses a cell connectivity metric to identify seeds for efficient clustering. Effective seed selection, coupled with an interconnect-resource aware clustering and placement, can have a favorable impact on circuit routability. It leads to better device utilization, savings in area, and reduction in power consumption. Routing area reduction of 35% is achieved over previously published results. Power dissipation simulations using a buffered pass-transistor-based FPGA interconnect model are presented. They show that our clustering technique can reduce the overall device power dissipation by an average of 13%.
Video signal processing requires complex algorithms performing many basic operations on a video stream. To perform these calculations in real time in an FPGA, we must use innovative structures to meet speed requirements while managing complexity. As part of a project aiming at the development of a video noise reducer, we developed an optimized processing stream that required some floating-point calculations. This paper presents the rationale for developing a floating-point unit, justifies the data representation used, describes its implementation in a Xilinx VirtexE FPGA, and reports the performance we obtained. A divider using this representation is also presented, with its implementation and performance in the same FPGA.
This paper examines circuit design of buffered routing switches in symmetrical, island-style FPGAs. The effects of switch size, tile length, level-restoring, and slow input slew rates are examined. Two new fanin-based switch designs are used to eliminate nearly all of the increase in delay that arises from fanout with a previous switch design. Alternating between buffers and pass transistors is shown to improve connection delay without fanout by 25%. To take advantage of this, we propose schemes to replace some buffers with pass transistors to simultaneously reduce area and delay. Routing a suite of MCNC benchmark circuits shows that 14% in area-delay, or 7% in delay, can be saved using the new switch schemes. Alternatively, approximately 13% in area can be saved with no degradation to delay.