ISBN: 9781595930293 (print)
Even with the HiCuts algorithm, one of the most effective algorithms for packet classification, the on-line search for each input packet still consumes a large share of the main CPU's computation resources when it is performed in software. An effective alternative is to offload the on-line search to a hardware co-processor. Based on the principle of the HiCuts algorithm, this paper presents the architecture of a hardware on-line search co-processor implemented on an FPGA. In particular, the mapping of the decision tree and of the linear search in each leaf node onto the memory data structure is described in detail. Thanks to a multiple-pipeline structure, a total of 12 search engines work in parallel to achieve a very high search speed (8M packet headers/second). The simulation results provide useful guidance for optimizing the off-line pre-processing and the co-processor design.
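The on-line search this abstract refers to can be illustrated with a minimal software sketch: walk a decision tree whose internal nodes cut one header field into equal-width intervals, then scan the short rule list in the leaf linearly. The class names, field layout, and the tiny two-field rule set below are illustrative assumptions, not the memory data structure described in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Rule:
    priority: int                          # lower value = higher priority
    ranges: Tuple[Tuple[int, int], ...]    # inclusive (lo, hi) per header field

    def matches(self, header: Tuple[int, ...]) -> bool:
        return all(lo <= v <= hi for v, (lo, hi) in zip(header, self.ranges))


@dataclass
class Leaf:
    rules: List[Rule]                      # kept sorted by priority


@dataclass
class Node:
    dim: int                               # header field cut at this node
    lo: int                                # lower bound of the region in that field
    width: int                             # width of each equal-sized cut
    children: List[Union["Node", Leaf]]


def classify(root: Union[Node, Leaf], header: Tuple[int, ...]) -> Optional[int]:
    """Return the priority of the best matching rule, or None if no rule matches."""
    node = root
    # Tree walk: one index computation per level selects the child region.
    while isinstance(node, Node):
        index = (header[node.dim] - node.lo) // node.width
        node = node.children[index]
    # Leaf: linear search over the few rules that intersect this region.
    for rule in node.rules:
        if rule.matches(header):
            return rule.priority
    return None


# Hypothetical rule set over two 4-bit fields (values 0..15).
r0 = Rule(0, ((0, 7), (0, 15)))
r1 = Rule(1, ((8, 15), (0, 7)))
r2 = Rule(2, ((0, 15), (0, 15)))           # default rule

root = Node(dim=0, lo=0, width=8, children=[
    Leaf([r0, r2]),
    Node(dim=1, lo=0, width=8, children=[Leaf([r1, r2]), Leaf([r2])]),
])
```

In this sketch the header is assumed to lie inside the region covered by the root, so no bounds check is made on the child index. A hardware engine would perform the same two phases, with the tree nodes and leaf rule lists laid out in memory.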
Modern FPGA architectures provide ample routing resources so that designs can be routed successfully. The routing architecture is designed to handle versatile connection configurations. However, providing such great flexibility comes at a high cost in terms of area, delay, and power. We propose a new FPGA routing architecture that utilizes a mixture of hard-wired and traditional flexible switches. The result is a 24% reduction in leakage power consumption, 7% smaller area, and 24% shorter delays, which translates to a 30% increase in clock frequency. Despite the increase in clock speed, the overall power consumption is reduced by 8%. Copyright 2005 ACM.
ISBN: 9781595930293 (print)
Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors do not deliver near their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel. In fact, it is not uncommon for microprocessors to yield only 10-20% of their peak floating-point performance when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput, near-peak floating-point performance. For benchmark matrices from the Matrix Market Suite we project 1.5 double-precision Gflops/FPGA for a single Virtex II 6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA). Copyright 2005 ACM.
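For readers unfamiliar with the SMVM kernel, the following is a minimal reference sketch of y = A·x. It assumes the compressed sparse row (CSR) storage format, which the abstract does not specify, and the function and argument names are illustrative. The indirect access `x[col_idx[k]]` is the irregular memory pattern that typically keeps microprocessors well below peak on this kernel.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x.

    values  : nonzero entries, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # One multiply-accumulate per stored nonzero in this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
result = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Each row's dot product is independent of the others, which is what makes the kernel a candidate for many parallel multiply-accumulate units fed from local memory.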
Although runtime dynamic reconfiguration of FPGA devices has been a topic of interest for the last decade, it has yet to achieve general recognition by the design community. The reasons for this are clear: there exists no straightforward design methodology, and the partitioning and CAD tool support is poor. This paper presents general concepts implemented in a placement and routing tool that provides an environment in which partially and dynamically reconfigurable designs can be processed for implementation on FPGAs that support this technology, such as the Atmel AT40K and AT94K series. The function of the tool is demonstrated on a simple real-world example.
FPGAs are witnessing a large increase in their applications, especially with the introduction of state-of-the-art FPGAs using nanometer technologies. This has been accompanied by a large increase in FPGA power dissipation, which forms a roadblock to the integration of FPGAs in several hand-held applications. Motivated by the growing share of leakage power in the total power dissipation of modern technologies, this work presents a complete CAD flow to mitigate leakage power dissipation in FPGAs. The algorithm is based on an FPGA architecture that employs multi-threshold CMOS technology. The flow is based on the VPR flow and aims to pack and place logic blocks that exhibit similar idleness close to each other so that they can be turned off during their idle time. The flow is tested with a 0.13μm dual-Vth CMOS technology and achieves an average power saving of 22%.
ISBN: 9781595930293 (print)
As the logic capacity of field-programmable gate arrays (FPGAs) increases, they are increasingly used to implement large arithmetic-intensive applications, which often contain a large proportion of datapath circuits. Since datapath circuits usually consist of regularly structured components (called bit-slices) that are connected together by regularly structured signals (called buses), it is possible to utilize datapath regularity to achieve significant area savings through FPGA architectural innovations. This paper describes such an FPGA routing architecture, called the multi-bit routing architecture, which employs bus-based connections in order to exploit datapath regularity. It is experimentally shown that, compared to conventional FPGA routing architectures, the multi-bit routing architecture can achieve a 14% routing area reduction for implementing datapath circuits, which represents an overall FPGA area savings of 10%. This paper also empirically determines the best values of several important architectural parameters for the new routing architecture, including the most area-efficient granularity values and the most area-efficient proportion of bus-based connections. Copyright 2005 ACM.
Sequential control systems in industry have been used in applications based on Programmable Logic Controllers (PLCs). These systems are, in general, highly complex, with an operation cycle of around 1 ms or 10 ms. PLCs are, in general, expensive for these highly complex applications. In this work, a dynamically reconfigurable approach is presented, based on the Xilinx Virtex-II FPGA architecture, operating as a virtual hardware machine. In this context, the control process is specified in the industrial standard language SFC (Sequential Function Chart)/Petri net. For large controllers, a partial and dynamic reconfiguration mechanism takes place and the controller is split into multiple contexts, which are sequentially executed within the same FPGA without violating the operation cycle of the system, in spite of the reconfiguration overhead. The solution is cost-compatible with current PLCs for complex applications and can reach better performance by exploiting the potential parallelism of control descriptions.
ISBN: 9781595930293 (print)
Vdd-programmable FPGAs have been proposed recently to reduce FPGA power, where Vdd levels can be customized for different circuit elements and unused circuit elements can be power-gated. In this paper, we first develop an accurate FPGA power model and then design novel Vdd-programmable interconnect switches with a minimum number of configuration SRAM cells. Applying our power model to placed and routed benchmark circuits, we evaluate Vdd-programmable FPGA architectures using the new switches. The best architecture in our study uses Vdd-programmable logic blocks and Vdd-gateable interconnects. Compared to a baseline architecture similar to the leading commercial architecture, the best architecture reduces the minimal energy-delay product by 44.14% with 48% area overhead and a 3% SRAM cell increase. Our evaluation results also show that LUT size 4 always gives the lowest energy consumption, while LUT size 7 always leads to the highest performance, for all evaluated architectures. Copyright 2005 ACM.
Current FPGA placement algorithms estimate the routability of a placement using architecture-specific metrics. The shortcoming of architecture-specific routability estimates is limited adaptability: a placement algorithm that is targeted to a class of architecturally similar FPGAs may not be easily adapted to other architectures. The subject of this paper is the development of a routability-driven, architecture-adaptive FPGA placement algorithm called Independence. The core of the Independence algorithm is a simultaneous place-and-route approach that tightly couples a simulated annealing placement algorithm with an architecture-adaptive FPGA router (Pathfinder). The results of our experiments demonstrate Independence's adaptability to island-style and hierarchical FPGA architectures. The quality of the placements produced by Independence is within 5% of the quality of VPR's placements and 17% better than the placements produced by HSRA's place-and-route tool. Further, our results show that Independence produces clearly superior placements on routing-poor island-style FPGA architectures.
This paper presents a new universal test approach for FPGA logic resources. It includes a new greedy configuration-generating algorithm and a new FPGA Configurable Logic Block (CLB) test model. The model is based on two directed graphs, a structure graph and a configuration graph, which convey the important information from the CLB gate-level circuit to the greedy configuration-generating algorithm, so that the algorithm can generate the minimum number of test configurations needed to achieve a given fault coverage. With this new approach, researchers can easily obtain test patterns optimized in both test time and fault coverage for different FPGA architectures. Finally, we compare experimental results with other test approaches; the results show that test patterns from the new approach are even more efficient than patterns from manual optimization. They also show that the approach handles different types of FPGAs very well.
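The idea of greedily choosing test configurations until a fault-coverage target is met can be sketched as a set-cover heuristic. This is only an illustration under stated assumptions: it takes as given a table of which faults each candidate configuration detects, whereas the paper derives that information from its structure and configuration graphs. The function name and the example data are hypothetical, and a greedy choice yields a small set of configurations without guaranteeing the true minimum.

```python
def greedy_configurations(coverage, all_faults, target=1.0):
    """Pick configurations greedily until `target` fraction of faults is covered.

    coverage   : dict mapping configuration name -> set of faults it detects
    all_faults : set of all faults under consideration
    Returns (chosen configuration names in order, set of covered faults).
    """
    needed = target * len(all_faults)
    covered = set()
    chosen = []
    remaining = dict(coverage)
    while len(covered) < needed and remaining:
        # Choose the configuration that detects the most still-uncovered faults.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break                      # no candidate adds coverage; stop early
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Hypothetical fault table for four candidate configurations.
coverage = {
    "cfg_a": {1, 2, 3},
    "cfg_b": {3, 4},
    "cfg_c": {4, 5, 6},
    "cfg_d": {1, 6},
}
chosen, covered = greedy_configurations(coverage, {1, 2, 3, 4, 5, 6})
```

Since each configuration must be loaded into the device before its test vectors are applied, fewer configurations generally means shorter test time, which is the quantity the greedy selection tries to reduce.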