Although runtime dynamic reconfiguration of FPGA devices has been a research topic for the past decade, it has yet to achieve general recognition by the design community. The reasons for this are clear: there exists no straightforward design methodology, and partitioning and CAD tool support is poor. This paper presents general concepts implemented in a placement and routing tool that provides an environment where partially and dynamically reconfigurable designs can be processed for implementation on FPGAs that support this technology, such as the Atmel AT40K and AT94K series. The function of the tool is demonstrated on a simple real-world example.
Dynamically Reconfigurable Systems (DRS) offer a very interesting alternative for embedded digital systems design. Task scheduling within a reconfigurable environment allows the development of systems with better execution performance, chip area economy and lower power consumption. This paper describes a Petri-net-based methodology for the design of dynamically reconfigurable systems, in which task scheduling has the best temporal performance of the overall application as its prime objective. The methodology includes the generation of an embedded controller supporting the scheduling process in the target architecture.
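For readers unfamiliar with the formalism, the scheduling behaviour such a methodology models rests on the basic Petri-net firing rule: a transition (for example, "start a task on the reconfigurable fabric") is enabled when every input place holds enough tokens, and firing moves tokens from input to output places. The C sketch below illustrates only this textbook rule with a generic matrix encoding; the sizes, names and encoding are ours and are not taken from the paper's controller.

#include <stddef.h>

#define N_PLACES 4
#define N_TRANS  3

/* pre[t][p] / post[t][p]: tokens consumed from / produced into place p when
 * transition t fires; marking[p]: tokens currently in place p. */
int is_enabled(const int pre[N_TRANS][N_PLACES], const int marking[N_PLACES], int t)
{
    for (int p = 0; p < N_PLACES; ++p)
        if (marking[p] < pre[t][p])
            return 0;               /* some input place lacks tokens */
    return 1;
}

void fire(const int pre[N_TRANS][N_PLACES], const int post[N_TRANS][N_PLACES],
          int marking[N_PLACES], int t)
{
    for (int p = 0; p < N_PLACES; ++p)
        marking[p] += post[t][p] - pre[t][p];   /* consume inputs, produce outputs */
}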
ISBN (print): 9781595930293
FPGA-based designs are more susceptible to single-event upsets (SEUs) than ASIC designs. Soft error rate (SER) estimation is a crucial step in the design of soft error tolerant schemes to balance reliability, performance, and cost of the system. Previous techniques for FPGA SER estimation are based on time-consuming fault injection and simulation methods. In this paper, we present an analytical approach to estimate the failure rate of designs mapped onto FPGAs. Experimental results show that this technique is orders of magnitude faster than fault injection while remaining highly accurate. We also present a highly reliable, low-cost mitigation technique which can significantly improve the availability of FPGA-based designs. This technique is able to tolerate SEUs in both user and configuration bits of mapped designs. Copyright 2005 ACM.
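As a rough illustration of what an analytical SER estimate computes (the paper's exact model may differ), the design failure rate can be expressed as the raw per-bit upset rate weighted by each bit's probability of producing an observable error, summed over the configuration and user bits of the mapped design:

\lambda_{\mathrm{design}} \;=\; \lambda_{\mathrm{bit}} \sum_{i \,\in\, \mathrm{cfg} \,\cup\, \mathrm{user}} P_{\mathrm{fail}}(i)

Fault injection estimates the P_fail terms by repeated simulation; an analytical approach typically derives them from the structure of the mapped design, which is where the speedup comes from.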
Modern FPGA architectures provide ample routing resources so that designs can be routed successfully. The routing architecture is designed to handle versatile connection configurations. However, providing such great flexibility comes at a high cost in terms of area, delay and power. We propose a new FPGA routing architecture that utilizes a mixture of hard-wired and traditional flexible switches. The result is a 24% reduction in leakage power consumption, 7% smaller area and 24% shorter delays, which translates to a 30% increase in clock frequency. Despite the increase in clock speed, the overall power consumption is reduced by 8%. Copyright 2005 ACM.
ISBN (print): 9781595930293
Large, high-density FPGAs with high local distributed memory bandwidth surpass the peak floating-point performance of high-end, general-purpose processors. Microprocessors do not deliver anywhere near their peak floating-point performance on efficient algorithms that use the Sparse Matrix-Vector Multiply (SMVM) kernel. In fact, it is not uncommon for microprocessors to yield only 10-20% of their peak floating-point performance when computing SMVM. We develop and analyze a scalable SMVM implementation on modern FPGAs and show that it can sustain high-throughput, near-peak floating-point performance. For benchmark matrices from the Matrix Market suite we project 1.5 double-precision Gflops per FPGA for a single Virtex II 6000-4 and 12 double-precision Gflops for 16 Virtex IIs (750 Mflops/FPGA). Copyright 2005 ACM.
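For reference, the software form of the SMVM kernel over a matrix in compressed sparse row (CSR) storage is a short loop; the C sketch below (variable names and the CSR layout are our own choice, not the paper's) shows the irregular gather on the input vector that keeps microprocessors far below peak and that an FPGA pipeline with high local memory bandwidth can hide.

#include <stddef.h>

/* y = A * x, with A stored in CSR form. */
void smvm_csr(size_t n_rows,
              const double *val,      /* nonzero values, length nnz              */
              const size_t *col_idx,  /* column index of each nonzero            */
              const size_t *row_ptr,  /* row i spans [row_ptr[i], row_ptr[i+1])  */
              const double *x,        /* dense input vector                      */
              double *y)              /* dense output vector                     */
{
    for (size_t i = 0; i < n_rows; ++i) {
        double acc = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            acc += val[k] * x[col_idx[k]];   /* irregular gather on x */
        y[i] = acc;
    }
}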
FPGAs are seeing a large increase in their applications, especially with the introduction of state-of-the-art FPGAs using nanometer technologies. This has been accompanied by a large increase in power dissipation in FPGAs, which forms a roadblock to the integration of FPGAs in several hand-held applications. Motivated by the growing share of leakage power in total power dissipation in modern technologies, this work presents a complete CAD flow to mitigate leakage power dissipation in FPGAs. The algorithm is based on an FPGA architecture that employs multi-threshold CMOS technology. The flow builds on the VPR flow and aims to pack and place logic blocks that exhibit similar idleness close to each other so that they can be turned off during their idle time. The flow is tested with a 0.13μm dual-Vth CMOS technology and achieves an average power saving of 22%.
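As a toy illustration of the idleness notion driving the packing step (the metric and encoding are ours, not the paper's cost function), each block's activity can be represented as a bitmask over scheduling intervals, and two blocks score well together when their idle periods overlap, since they can then share a power-gated region:

#include <stdint.h>

/* Number of intervals in which both blocks are simultaneously idle
 * (active_a / active_b: bit i set when the block is active in interval i).
 * The higher the count, the stronger the incentive to pack the two blocks
 * into the same sleep region so they can be turned off together. */
int shared_idle_intervals(uint64_t active_a, uint64_t active_b)
{
    uint64_t both_idle = ~active_a & ~active_b;
    int count = 0;
    while (both_idle) {            /* portable popcount */
        count += (int)(both_idle & 1u);
        both_idle >>= 1;
    }
    return count;
}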
FPGAs provide a speed advantage in processing for embedded systems, especially when processing is moved close to the sensors. Perhaps the ultimate embedded system is a neural prosthetic, where probes are inserted into the brain and recorded electrical activity is analyzed to determine which neurons have fired. In turn, this information can be used to manipulate an external device such as a robot arm or a computer mouse. To make the detection of these signals possible, some baseline data must be processed to correlate impulses to particular neurons. One method for processing this data uses a statistical clustering algorithm called Expectation Maximization, or EM. In this paper, we examine the EM clustering algorithm, determine the most computationally intensive portion, map it onto a reconfigurable device, and show several areas of performance gain.
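To make the computational profile concrete, the C sketch below shows one EM iteration for a one-dimensional Gaussian mixture (the parameter names and the 1-D simplification are ours; real spike-sorting feature data is typically multi-dimensional). The per-sample exponentials and normalisation in the E-step are the kind of arithmetic-heavy inner loop that is a natural candidate for mapping onto the reconfigurable device.

#include <math.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* One EM iteration for a 1-D Gaussian mixture with k components.
 * x: n data samples; w/mu/var: mixture weights, means, variances (updated in place). */
void em_step(const double *x, size_t n, double *w, double *mu, double *var, size_t k)
{
    double *r = malloc(n * k * sizeof *r);   /* responsibilities r[i*k + j] */
    if (!r) return;

    /* E-step: responsibility of component j for sample i. */
    for (size_t i = 0; i < n; ++i) {
        double norm = 0.0;
        for (size_t j = 0; j < k; ++j) {
            double d = x[i] - mu[j];
            double p = w[j] * exp(-0.5 * d * d / var[j]) / sqrt(2.0 * M_PI * var[j]);
            r[i * k + j] = p;
            norm += p;
        }
        for (size_t j = 0; j < k; ++j)
            r[i * k + j] /= norm;
    }

    /* M-step: re-estimate weights, means and variances from responsibilities. */
    for (size_t j = 0; j < k; ++j) {
        double nj = 0.0, sum = 0.0, sq = 0.0;
        for (size_t i = 0; i < n; ++i) {
            nj  += r[i * k + j];
            sum += r[i * k + j] * x[i];
        }
        mu[j] = sum / nj;
        for (size_t i = 0; i < n; ++i) {
            double d = x[i] - mu[j];
            sq += r[i * k + j] * d * d;
        }
        var[j] = sq / nj;
        w[j]   = nj / (double)n;
    }
    free(r);
}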
This paper proposes an integrated framework for the high-level design of high-performance implementations of signal processing algorithms on FPGAs. The framework emerged from a constant need to rapidly implement increasingly complicated algorithms on FPGAs while maintaining the high performance needed in many real-time digital signal processing applications. This is particularly important for application developers who often rely on iterative and interactive development methodologies. The central idea behind the proposed framework is to dynamically integrate high-performance structural hardware description languages with higher-level hardware languages in order to help satisfy the dual requirement of high-level design and high-performance implementation. The paper illustrates this by integrating two environments: Celoxica's Handel-C language, and HIDE, a structural hardware environment developed at Queen's University Belfast.
ISBN (print): 9781595930293
Even with the HiCuts algorithm, one of the most effective algorithms for packet classification, on-line searching for each input packet still consumes a large amount of the main CPU's computation resources when implemented in software. An effective alternative is to use a hardware co-processor to perform the on-line searching. Based on the principle of the HiCuts algorithm, this paper presents the architectural design of an FPGA-based hardware on-line searching co-processor. In particular, the mapping of the decision tree and of the linear search in each leaf node to the memory data structure is described in detail. Benefiting from a multiple-pipeline structure, a total of 12 search engines work in parallel to achieve a very high search speed (8 million packet headers per second). The simulation results provide a useful guide for optimizing the off-line pre-processing and the co-processor design.
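To make the lookup path concrete, the C sketch below walks a HiCuts-style decision tree, taking one equal-width cut on a header field per internal node, and then linearly scans the small rule list stored in the leaf. The structure layout and field names are assumptions for illustration; they are not the co-processor's memory data structure described in the paper.

#include <stddef.h>
#include <stdint.h>

#define NUM_FIELDS 5   /* e.g. src/dst IP, src/dst port, protocol */

struct rule {
    uint32_t lo[NUM_FIELDS], hi[NUM_FIELDS];  /* per-field ranges */
    int action;
};

struct hicuts_node {
    int is_leaf;
    /* internal node: equal-width cuts along one header field */
    int dim;                     /* which header field this node cuts     */
    uint32_t lo, width;          /* covered range start and per-cut width */
    struct hicuts_node **child;  /* one child per cut                     */
    /* leaf node: small rule list searched linearly */
    const struct rule *rules;
    size_t n_rules;
};

/* Returns the action of the first matching rule, or -1 if none matches.
 * Assumes the header values lie within the root node's covered ranges. */
int hicuts_lookup(const struct hicuts_node *node, const uint32_t hdr[NUM_FIELDS])
{
    while (!node->is_leaf) {
        size_t cut = (hdr[node->dim] - node->lo) / node->width;
        node = node->child[cut];
    }
    for (size_t i = 0; i < node->n_rules; ++i) {
        const struct rule *r = &node->rules[i];
        int match = 1;
        for (int f = 0; f < NUM_FIELDS; ++f)
            if (hdr[f] < r->lo[f] || hdr[f] > r->hi[f]) { match = 0; break; }
        if (match)
            return r->action;
    }
    return -1;
}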
The purpose of this paper is to detail the method and findings of an architectural exploration of mixed-granularity field-programmable gate arrays (FPGAs). The work carried out for the purposes of this study involves the creation of an analytical framework within which a set of benchmark circuits can be studied. The idea is to maximise the performance over all benchmark circuits by choosing an optimal set of silicon cores to be placed within a given area constraint. When connected with flexible configurable routing, these cores should together be capable of performing any one of the benchmark circuits. In this paper the problem is cast as a formal optimisation, and solved using existing optimisation tools. Any multiplication or memory operation is allowed to be implemented either by configuring fine-grain resources, or by using specialised functional units such as those found in a Xilinx Virtex-II FPGA. The design space is explored by examining the tradeoffs between area, speed and flexibility. The architectures generated are contrasted with commercial architectures with fixed ratios of functional units and, in addition, a sensitivity analysis is performed to see how the results are affected by the architectural parameters of the problem.
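One way such a core-selection problem can be cast as a formal optimisation (a sketch with our own symbols; the paper's exact formulation may differ) is as an integer program over the number of instances n_k of each core type k, with silicon area a_k, an overall area budget A, and perf(b; ·) the achievable performance of benchmark b on a given resource mix:

\max_{n_1,\dots,n_K \in \mathbb{Z}_{\ge 0}} \; \sum_{b \in B} \mathrm{perf}(b;\, n_1,\dots,n_K)
\quad \text{subject to} \quad \sum_{k=1}^{K} n_k\, a_k \;\le\; A

The objective could equally be the worst case over benchmarks rather than the sum; either variant can be handed to existing optimisation tools once perf is tabulated per benchmark and resource mix.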