Atherosclerotic disorders, such as peripheral artery disease (PAD), have a significant negative impact on patient outcomes. Inadequate treatment and poor detection rates can result in cardiovascular complications and ...
A genetic algorithm-based design space exploration technique using parameterised cores is examined. A computer-aided design tool called SCBuild was developed which is capable of applying a genetic algorithm to a core's parameters, and generating hardware description language models of core variants. The tool can also compute estimates of a variant's area and critical path delay on a field-programmable gate array. Using this tool, several experiments were conducted using a soft-core processor with a large design space. It was concluded from these experiments that using a genetic algorithm to explore the design space of a parameterised core can help a designer make intelligent decisions regarding the assignment of values to the parameters of an embedded hardware platform.
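The exploration loop described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter names in `PARAM_SPACE`, the `estimate_cost` formula (a made-up stand-in for area and critical-path estimates), and the selection, crossover and mutation choices. It is not SCBuild's interface or the authors' algorithm.

```python
import random

# Hypothetical parameter space for a soft-core processor (illustrative only).
PARAM_SPACE = {
    "data_width": [8, 16, 32],
    "pipeline_stages": [1, 2, 3, 5],
    "register_count": [8, 16, 32],
    "has_multiplier": [0, 1],
}


def random_variant(rng):
    """Pick one value per parameter to form a core variant."""
    return {name: rng.choice(values) for name, values in PARAM_SPACE.items()}


def estimate_cost(variant):
    """Made-up stand-in for area and critical-path estimates; lower is better."""
    area = variant["data_width"] * (
        variant["register_count"] + 40 * variant["has_multiplier"]
    )
    area += 50 * variant["pipeline_stages"]
    delay = variant["data_width"] / variant["pipeline_stages"]
    delay += 4 * variant["has_multiplier"]
    return area / 100.0 + delay


def crossover(a, b, rng):
    """Uniform crossover: each parameter comes from one of the two parents."""
    return {name: rng.choice((a[name], b[name])) for name in PARAM_SPACE}


def mutate(variant, rng, rate=0.2):
    """Re-draw each parameter with probability `rate`."""
    return {
        name: (rng.choice(PARAM_SPACE[name]) if rng.random() < rate else value)
        for name, value in variant.items()
    }


def explore(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_variant(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=estimate_cost)
        survivors = population[: population_size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=estimate_cost)


if __name__ == "__main__":
    best = explore()
    print(best, estimate_cost(best))
```

Because the better half of each generation is carried over unchanged, the best cost found never gets worse from one generation to the next. A real tool would replace `estimate_cost` with estimates derived from the generated hardware models.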
The authors propose augmenting the FPGA architecture with an embedded network on chip (NoC) to implement the system-level communication infrastructure and mitigate the hardware design challenges faced by current bus-based interconnects. With a flexible interface between the NoC and the FPGA fabric, an embedded NoC maintains configurability and simplifies the distribution of I/O data throughout the chip, and is always more energy-efficient compared to custom buses configured into the fabric.
This paper presents a new design that implements the data-driven (i.e. dataflow) computation paradigm with intelligent memories. Also, a relevant prototype that employs FPGAs is presented for the support of intelligent memory structures. Instead of giving the CPU the privileged right to decide what instructions to fetch in each cycle (as is the case for control-flow CPUs), instructions in dataflow computers enter the execution unit on their own when they are ready to execute. This way, the application-knowledgeable algorithm, rather than the application-ignorant CPU, is in control. This approach could eventually result in outstanding performance and elimination of large numbers of redundant operations that plague current control-flow designs. Control-flow and dataflow machines are two extreme computation paradigms. In their pure form, the former machines follow an inherently sequential execution process while the latter are parallel in nature. The sequential nature of control-flow machines makes them relatively easy to implement compared to dataflow machines, which have to address a number of issues that are easily solved in the realm of the control-flow paradigm. Our dataflow design solves these issues at the intelligent memory level, separating the processor from dataflow maintenance tasks. It is shown that using intelligent memories with basic components similar to those of FPGAs produces a feasible approach. Expected improvements within the next few years in underlying intelligent memory and FPGA technologies will have the potential to make the effect of our approach even more dramatic. (C) 2002 Elsevier Science B.V. All rights reserved.
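The data-driven firing rule contrasted with control flow above can be shown with a small software sketch. It assumes a hypothetical graph encoding (`nodes` maps a name to a function and its operand names) and does not model the paper's intelligent-memory hardware. It only shows instructions executing as soon as their operands exist, with no program counter.

```python
import operator


def run_dataflow(nodes, inputs):
    """Fire each node as soon as all of its operands are available.

    nodes:  {name: (function, [operand names])}
    inputs: {name: value} for the initial tokens
    """
    values = dict(inputs)
    pending = dict(nodes)
    firing_order = []
    while pending:
        ready = [
            name
            for name, (_, operands) in pending.items()
            if all(op in values for op in operands)
        ]
        if not ready:
            raise ValueError("deadlock: no instruction has all of its operands")
        # Nodes that are ready in the same step are mutually independent,
        # so hardware could fire them in parallel.
        results = {}
        for name in ready:
            func, operands = pending[name]
            results[name] = func(*(values[op] for op in operands))
        for name in ready:
            del pending[name]
        values.update(results)
        firing_order.append(sorted(ready))
    return values, firing_order


# Graph for (a + b) * (c - d)
graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["c", "d"]),
    "product": (operator.mul, ["sum", "diff"]),
}
values, order = run_dataflow(graph, {"a": 2, "b": 3, "c": 7, "d": 4})
print(values["product"], order)
```

Here `sum` and `diff` fire in the same step because neither depends on the other, and `product` fires only once both results are present.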
A survey of field-programmable gate array (FPGA) architectures and the programming technologies used to customize them is presented. Programming technologies are compared on the basis of their volatility, size, parasitic capacitance, resistance, and process technology complexity. FPGA architectures are divided into two constituents: logic block architectures and routing architectures. A classification of logic blocks based on their granularity is proposed and several logic blocks used in commercially available FPGAs are described. A brief review of recent results on the effect of logic block granularity on logic density and performance of an FPGA is then presented. Several commercial routing architectures are described in the context of a general routing architecture model. Finally, recent results on the tradeoff between the flexibility of an FPGA routing architecture, its routability, and its density are reviewed.
This paper addresses several issues involved in routing in field-programmable gate arrays (FPGAs) that have both horizontal and vertical routing channels, with wire segments of various lengths. Routing is studied by using CAD routing tools to map a set of benchmark circuits into FPGAs, and measuring the effects that various parameters of the CAD tools have on the implementation of the circuits. A two-stage routing strategy of global followed by detailed routing is used, and the effects of both of these CAD stages are discussed, with emphasis on detailed routing. We present a new detailed routing algorithm designed specifically for the types of routing structures found in the most recent generation of FPGAs, and show that the new algorithm achieves significantly better results than previously published FPGA routers with respect to the speed-performance of implemented circuits. The experiments presented in this paper address both of the key metrics for FPGA routing tools, namely the effective utilization of available interconnect resources in an FPGA, and the speed-performance of implemented circuits. The major contributions of this research include the following: 1) we illustrate the effect of a global router on both area-utilization and speed-performance of implemented circuits, 2) experiments quantify the impact of the detailed router cost functions on area-utilization and speed-performance, 3) we show the effect on circuit implementation of dividing multi-point nets in a circuit being routed into point-to-point connections, and 4) the paper illustrates that CAD routing tools should account for both routability and speed-performance at the same time, not just focus on one goal.
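Contribution 3 concerns how a multi-point net is divided into point-to-point connections. The abstract does not say which decomposition the authors use, so the sketch below only contrasts two common options on a hypothetical net: a star from the source to every sink, and a minimum spanning tree built with Prim's algorithm under Manhattan distance. The pin coordinates and function names are illustrative assumptions.

```python
def manhattan(p, q):
    """Manhattan distance between two grid positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def star_connections(pins):
    """Connect the source (first pin) directly to every sink."""
    source = pins[0]
    return [(source, sink) for sink in pins[1:]]


def mst_connections(pins):
    """Prim's algorithm: grow a tree from the source, adding the closest
    pin outside the tree at each step."""
    in_tree = [pins[0]]
    remaining = list(pins[1:])
    connections = []
    while remaining:
        a, b = min(
            ((t, r) for t in in_tree for r in remaining),
            key=lambda pair: manhattan(*pair),
        )
        connections.append((a, b))
        in_tree.append(b)
        remaining.remove(b)
    return connections


def total_length(connections):
    return sum(manhattan(a, b) for a, b in connections)


# Hypothetical four-pin net; the first pin is the source.
net = [(0, 0), (4, 0), (4, 3), (5, 3)]
print(total_length(star_connections(net)), total_length(mst_connections(net)))
```

On this net the star uses more total wire length, but every sink is one connection away from the source, while the spanning tree uses less wire and chains sinks behind one another. That is the kind of area versus speed tension the abstract says routing tools must weigh.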
Device speed, or timing, is a critical aspect of system design. A realistic estimate of the achievable system speed is often required early in the design phase to avoid waste of valuable design time. System speed, of course, depends on the operation of all system components. This application note describes how to estimate the timing constraints of a field-programmable gate array (FPGA) design. However, the FPGA is only one component within the design: other devices also affect the system speed.
We present a general technology-mapping methodology (TULIP) for field-programmable gate arrays (FPGAs) that can yield optimal results, and is applicable to any FPGA with a logic block composed of lookup tables (LUTs). We introduce the concept of a virtual switch to model the internal connections of a logic block with multiple LUTs; each configuration of virtual switches is called a multiple-LUT block (MLB). A logic block can be precisely defined by a small but complete set of representative configurations called an MLB basis. The MLB bases for various commercial FPGA families are demonstrated. Given a logic block represented by its MLB basis, technology mapping is precisely formulated as a graph-covering problem, which is transformed into a mixed integer-linear programming (MILP) optimization problem in order to achieve our optimality and generality objectives. The MILP model is solved using a general-purpose MILP solver tool. The results of using TULIP for mapping some ISCAS-85 benchmark circuits to a variety of logic blocks are presented. Circuits of a few hundred gates can be mapped directly in a few minutes. To map larger circuits to complex logic blocks, some approximation techniques are proposed based on partitioning the input circuit and simplifying the MLB basis. We show that these approximations result in close-to-optimal mappings of the benchmark circuits.
ISBN: 9781424416424 (print)
The flexibility of field-programmable gate arrays (FPGAs) encourages design reuse and can greatly enhance the upgradability of digital systems. This flexibility is particularly useful in the design of highly flexible video encoding systems that can accommodate a multitude of existing standards as well as the rapid emergence of new standards. In this paper, we investigate the use of FPGAs in the design of a highly scalable Variable Block Size Motion Estimation (VBSME) architecture for the H.264/AVC video encoding standard. The scalability of the architecture allows one to incorporate the system into low cost single FPGA solutions for low resolution encoding applications as well as into high performance multi-FPGA solutions targeting high-resolution video encoding applications. To overcome the performance gap between FPGAs and Application Specific Integrated Circuits (ASICs), our algorithm intelligently increases its parallelism as the design scales while minimizing the use of memory bandwidth. The core computing unit of the architecture is implemented on FPGAs and its performance is reported in this paper. It is shown that the computing unit is able to achieve real-time 40 fps performance for 640x480 resolution VGA video while incurring only 4% device utilization on a Xilinx XC5VLX330 (Virtex-5) FPGA. With 8 computing units (at 36% device utilization), the architecture is able to achieve real-time 45 fps performance for encoding full 1920x1088 progressive HDTV video.
This paper presents a novel method for implementing massive artificial neural networks (ANNs) with field-programmable gate arrays (FPGAs). Because of the sequential nature of programs, the execution of large ANNs in software is inefficient. On the other hand, in FPGA devices that consist of a large number of programmable circuits, the nodes of an ANN may be executed in parallel. This provides for higher computational rates and a greater degree of robustness or fault tolerance than in conventional computers. The main goal of these studies was to implement as many neurons as possible on a single FPGA device, without giving up the minimal execution times. In the proposed solution, each neuron is implemented by a single multiplier. For the nonlinear behaviour of a neuron, the activation function is approximated using several linear segments. In order to overcome the limited memory capacity of the FPGA, external memory is employed. Execution of the ANN is controlled by an embedded soft processor.
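The idea of approximating an activation function with several linear segments can be sketched as follows. The abstract does not name the activation function, the number of segments, or their breakpoints, so the sigmoid and the `BREAKPOINTS` list below are assumptions chosen only to illustrate the technique.

```python
import math

# Assumed breakpoints; the abstract does not state segment count or positions.
BREAKPOINTS = [-6.0, -3.0, -1.0, 0.0, 1.0, 3.0, 6.0]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Precompute (start, end, slope, intercept) per segment, as a small
# lookup table would hold them.
SEGMENTS = []
for x0, x1 in zip(BREAKPOINTS, BREAKPOINTS[1:]):
    slope = (sigmoid(x1) - sigmoid(x0)) / (x1 - x0)
    SEGMENTS.append((x0, x1, slope, sigmoid(x0) - slope * x0))


def pwl_sigmoid(x):
    """Piecewise-linear sigmoid: one multiply and one add per evaluation."""
    if x <= BREAKPOINTS[0]:
        return sigmoid(BREAKPOINTS[0])  # saturate at the low end
    if x >= BREAKPOINTS[-1]:
        return sigmoid(BREAKPOINTS[-1])  # saturate at the high end
    for x0, x1, slope, intercept in SEGMENTS:
        if x0 <= x < x1:
            return slope * x + intercept
    raise AssertionError("unreachable: segments cover the open interval")


# Worst-case deviation from the exact sigmoid over [-8, 8] in steps of 0.01.
max_error = max(
    abs(pwl_sigmoid(i / 100.0) - sigmoid(i / 100.0)) for i in range(-800, 801)
)
print(max_error)
```

Each evaluation needs only a segment lookup, one multiplication and one addition, which is why this kind of approximation suits hardware with few multipliers. Adding breakpoints where the curve bends most reduces the error at the cost of a larger table.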