ISBN: 9781424403127 (print)
This work presents a modular FPGA-based architecture that solves the eigenvalue problem using the Jacobi method, which computes eigenvalues and eigenvectors concurrently. The main contributions are a low execution time compared with other sequential algorithms and minimal use of internal FPGA resources, both mainly due to the use of the CORDIC algorithm. Two CORDIC modules were designed to perform the trigonometric operations involved. A parallel CORDIC architecture is proposed, as it is the best option for computing the eigenvalues with this method. Both CORDIC modules can work in rotation and vectoring mode. The whole system is described in VHDL, with an effort to optimize the design.
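The two CORDIC operating modes mentioned in the abstract can be illustrated with a minimal floating-point sketch. It is not the paper's VHDL design: the iteration count, the function names `cordic_rotate` and `cordic_vector`, and the use of floating point instead of fixed-point shift-and-add hardware are all assumptions made for readability.

```python
import math

ITERATIONS = 32  # illustrative iteration count, not taken from the paper

# Elementary rotation angles atan(2^-i) and the constant CORDIC gain.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x, y, theta):
    """Rotation mode: rotate (x, y) by theta, for |theta| up to about 1.74 rad."""
    z = theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0  # steer the residual angle z toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN  # undo the accumulated CORDIC gain


def cordic_vector(x, y):
    """Vectoring mode: return (magnitude, angle) of (x, y), assuming x > 0."""
    z = 0.0
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0 else 1.0  # steer y toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, z


cos_30, sin_30 = cordic_rotate(1.0, 0.0, math.pi / 6)
magnitude, angle = cordic_vector(3.0, 4.0)
```

In a Jacobi iteration, vectoring mode would typically supply the rotation angle and rotation mode would apply it to the matrix entries; how the paper partitions this work between its two modules is not stated in the abstract.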
Process variation affecting timing and power is an important issue for modern integrated circuits in nanometer technologies. FPGAs are as susceptible to these issues as ASICs, but face the unique challenge that critical paths are unknown at test time. This paper presents the first in-depth study on applying statistical timing analysis with cross-chip and on-chip variations to speed-binning and guard-banding in FPGAs. Considering the unique re-programmability of FPGAs, we quantify the effects of the timing model with guard-banding and speed-binning on statistical performance and timing yield. We also develop a new variation-aware placement algorithm, the first statistical algorithm for FPGA layout, which reduces yield loss by 3.4X with guard-banding and 25X with speed-binning for MCNC and QUIP designs.
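The relationship between guard-banding, speed-binning, and timing yield can be sketched under a strong simplifying assumption: that the delay of a chip follows a single Gaussian distribution. This is not the paper's statistical timing model, which accounts for cross-chip and on-chip variation and for critical paths that are unknown at test time. The function names and all numeric values below are hypothetical.

```python
from statistics import NormalDist


def guard_band_yield(mean_delay, sigma, margin_sigmas):
    """Fraction of chips whose delay is within mean + margin_sigmas * sigma."""
    dist = NormalDist(mean_delay, sigma)
    return dist.cdf(mean_delay + margin_sigmas * sigma)


def bin_fractions(mean_delay, sigma, bin_limits):
    """Fraction of chips per speed bin; bin_limits are ascending delay limits.

    The last entry of the result is the fraction slower than every bin.
    """
    dist = NormalDist(mean_delay, sigma)
    fractions = []
    previous = 0.0
    for limit in bin_limits:
        cumulative = dist.cdf(limit)
        fractions.append(cumulative - previous)
        previous = cumulative
    fractions.append(1.0 - previous)
    return fractions


# Hypothetical numbers: 10 ns mean delay, 0.5 ns standard deviation.
yield_3_sigma = guard_band_yield(10.0, 0.5, 3)
bins = bin_fractions(10.0, 0.5, [9.5, 10.0, 10.5])
```

A wider guard band raises yield at the cost of a slower rated speed, while binning sells each chip at the fastest grade it meets; the sketch only makes that trade-off concrete.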
Both an FPGA implementation of a rodent's tactile sensory system and a neural FPGA-based hardware processor mimicking brainstem behaviour are presented. They have been designed principally using biological considerations. The two systems are being ported to a single FPGA platform and will ultimately be embedded on a mobile robot, which will operate in real-world environments for object recognition and surface texture discrimination.
Optical flow computation has been extensively used for object motion estimation in image sequences. However, most optical flow techniques are as computationally intensive as they are accurate, owing to the large amount of data involved. A new strategy for image sequence processing has been developed: pixels of the image sequence that change significantly trigger the execution of the operations related to the image processing algorithm. The data reduction achieved with this strategy allows a significant speed-up of the optical flow computation. Furthermore, FPGAs allow the implementation of a custom data-flow architecture specially suited to this strategy. The basis of change-driven image processing is presented, as well as the custom hardware implementation.
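The change-driven idea can be sketched as a filter that passes on only the pixels whose value changed by more than a threshold between two frames. The function name `changed_pixels`, the absolute-difference test, and the threshold value are assumptions for illustration; the abstract does not specify how a significant change is detected.

```python
def changed_pixels(previous, current, threshold):
    """Return (row, col, new_value) for pixels whose absolute change exceeds threshold."""
    events = []
    for r, (prev_row, cur_row) in enumerate(zip(previous, current)):
        for c, (p, q) in enumerate(zip(prev_row, cur_row)):
            if abs(q - p) > threshold:
                events.append((r, c, q))
    return events


# Two tiny hypothetical frames: only two of the six pixels change significantly.
prev_frame = [[10, 10, 10], [10, 10, 10]]
cur_frame = [[10, 40, 10], [12, 10, 90]]
events = changed_pixels(prev_frame, cur_frame, threshold=5)
```

Downstream processing would then run only on the entries of `events`, which is where the data reduction comes from.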
A field-programmable gate array (FPGA), when used as a platform for implementing special-purpose computing architectures, offers the potential for increased functional parallelism over the alternative approach of software running on a general-purpose microprocessor. However, the increasing disparity between the logic speed and density of a state-of-the-art FPGA and those of a state-of-the-art microprocessor has already begun to negate the benefits of this increased functional parallelism for all but a limited set of applications. We believe that the solution to this problem is to construct distributed multi-FPGA architectures that aggregate the parallelism of multiple FPGAs. Such a system would require a high-capacity interconnect, and thus we propose arranging the FPGAs in a scalable direct network. This strategy requires each FPGA to contain an integrated router that must share the logic fabric with the application logic. In this paper, we propose a novel routing technique that can significantly boost such a network's capacity and can be implemented in compact and efficient routers. We begin with an existing lightweight routing algorithm and augment it with a novel technique called predictive load balancing, in which routers collect information about the blocking behavior on their output ports and use this information when making routing decisions.
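The general idea of using observed blocking behavior to steer routing decisions can be sketched as follows. The abstract does not describe how the blocking information is stored or weighted, so the `Router` class, the exponential moving average, and the `decay` parameter are all hypothetical stand-ins rather than the paper's mechanism.

```python
class Router:
    """Toy router that prefers output ports with a low recent blocking rate."""

    def __init__(self, ports, decay=0.5):
        self.decay = decay  # weight given to history (assumed value)
        self.blocking = {port: 0.0 for port in ports}

    def record(self, port, was_blocked):
        """Update the running blocking estimate for a port."""
        sample = 1.0 if was_blocked else 0.0
        self.blocking[port] = (
            self.decay * self.blocking[port] + (1.0 - self.decay) * sample
        )

    def choose(self, candidates):
        """Pick the candidate output port with the lowest blocking estimate."""
        return min(candidates, key=lambda port: self.blocking[port])


router = Router(["east", "north"])
router.record("east", was_blocked=True)
router.record("east", was_blocked=True)
router.record("north", was_blocked=False)
chosen = router.choose(["east", "north"])
```

The candidate list passed to `choose` would come from the underlying routing algorithm, which decides which output ports are admissible for a given destination.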
This paper presents an implementation of a high-performance network application layer parser in FPGAs. At the core of the architecture reside a pattern matcher and a parser. The pattern matcher scans for patterns in high-speed streaming TCP data streams. The parser core augments each pattern found with semantic information determined from the pattern's location within the data stream. The packet payload parser can provide a higher level of understanding of a data stream for many network applications. Such applications include high-performance XML parsers, content-based/aware routers, and others. Additionally, a TCP processor allows stateful packet payload parsing of up to 8 million simultaneous TCP flows. The payload parser has been implemented in a Xilinx Virtex E 2000 FPGA on the Field-programmable Port Extender platform. The parsing module runs at 200 MHz and parses raw data at 6.4 Gbps. The payload parser, integrated with the TCP processor, runs at 100 MHz for a throughput of 3.2 Gbps.
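The two quoted operating points can be cross-checked with simple arithmetic: dividing throughput by clock frequency gives the number of bits consumed per cycle. The resulting width is an inference from the figures in the abstract, not a datapath width the abstract states, and `bits_per_cycle` is a helper name introduced here.

```python
def bits_per_cycle(throughput_bps, clock_hz):
    """Bits processed per clock cycle implied by a throughput and a clock rate."""
    return throughput_bps // clock_hz


# Figures quoted in the abstract, written as integers to keep the division exact.
standalone = bits_per_cycle(6_400_000_000, 200_000_000)  # parser alone
integrated = bits_per_cycle(3_200_000_000, 100_000_000)  # parser with TCP processor
```

Both configurations work out to the same number of bits per cycle, so the halved throughput of the integrated system follows from the halved clock rate alone.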
In the last decade, skin color has proven to be a useful cue for face and hand recognition and tracking, and skin color segmentation has become the first step in several processing tasks. With the aim of overcoming the weaknesses that existing software solutions show in real-time mobile applications, we propose an FPGA-based implementation of a skin classifier. The skin classification algorithm and its hardware architecture are described. Results in terms of classification performance, processing rate, and hardware resources used are presented.
Computing applications in FPGAs are commonly built from repetitive structures of computing and/or memory elements. In many cases, application performance depends on the degree of parallelism: ideally, the most that will fit into the fabric of the FPGA being used. Several factors complicate determining the largest structure that will fit the FPGA: arrays that grow nonlinearly and in uneven step sizes, coupled structures that grow in different polynomial orders, multiple design parameters controlling different aspects of the computing structure, and interlocked usage of different hardware resources. Combined with resource usage that depends on application-specific data elements and arithmetic details, these factors defeat any simple approach to scaling the computing structures up to the FPGA's capacity. We present a formal analysis of maximizing FPGA utilization, with adaptations that simplify the optimization problem. We also report on design tools containing extensions that support automated sizing of FPGA-based computation arrays.
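The sizing problem can be made concrete with a toy version: a single design parameter `n`, several resources that grow at different polynomial orders, and a search for the largest `n` that fits every budget at once. The resource model, the coefficients, and the budgets below are invented for illustration, and the linear search is a simple stand-in, not the paper's formal analysis.

```python
def fits(n, budget):
    """Hypothetical resource model for an n-by-n array of processing elements."""
    usage = {
        "luts": 120 * n * n + 400 * n,  # quadratic array plus linear control logic
        "block_rams": 2 * n,            # one pair of buffers per row
        "multipliers": n * n,           # one multiplier per element
    }
    return all(usage[name] <= budget[name] for name in usage)


def largest_size(budget, limit=1024):
    """Return the largest n in [0, limit] that fits, assuming usage grows with n."""
    best = 0
    for n in range(1, limit + 1):
        if not fits(n, budget):
            break
        best = n
    return best


# Two hypothetical devices: the binding resource differs between them.
small_device = {"luts": 50_000, "block_rams": 100, "multipliers": 96}
large_device = {"luts": 50_000, "block_rams": 100, "multipliers": 400}
n_small = largest_size(small_device)
n_large = largest_size(large_device)
```

On the first device the multipliers run out first, while on the second the LUT budget becomes the limit, which is a small instance of the interlocked resource usage the abstract describes.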
This paper presents a novel tool flow combining rewriting logic with hardware synthesis. It enables the automated generation of synthesizable VHDL code from mathematical equations and the quick generation of functionally equivalent alternative implementations. The simple but powerful semantics of rewriting logic provide a natural mechanism for manipulating algebraic expressions at a high level of abstraction, which is afterwards automatically converted into lower levels of abstraction. The design flow is validated by generating polynomial approximations for arbitrary continuous functions. The polynomial generation process is completely parameterized with respect to polynomial degree, number representation parameters, word width, and polynomial evaluation approach. Different functionally equivalent implementations of the resulting polynomial approximations were generated and synthesized for a Virtex-4 device.
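What "functionally equivalent alternative implementations" of a polynomial can look like is sketched below with two evaluation approaches: term-by-term evaluation and Horner's rule. The abstract does not name the evaluation approaches the tool flow supports, so Horner's rule is an assumed example, and the function names and coefficients are hypothetical.

```python
def eval_direct(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) term by term."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def eval_horner(coeffs, x):
    """Evaluate the same polynomial with Horner's rule.

    Uses one multiply and one add per coefficient.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


coeffs = [1, -3, 0, 2]  # represents 1 - 3x + 2x^3
direct_value = eval_direct(coeffs, 2)
horner_value = eval_horner(coeffs, 2)
```

The two forms rewrite into each other algebraically but lead to different hardware: the term-by-term form exposes parallel multipliers, while Horner's rule chains fewer operations in sequence.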
This paper presents Archlog, a language and framework for designing multiprocessor architectures in the logic programming domain. Our goal is to enable application developers in areas such as machine learning and cognitive robotics to produce high-performance designs for reconfigurable devices without detailed knowledge of hardware development. The Archlog framework provides a high level of abstraction, enabling rapid system generation while supporting high performance. In this paper we present the Archlog language and its library-based compilation framework, which makes use of a customisable logic programming processor. The system generates multiple designs with different trade-offs in the use of reconfigurable logic and embedded memories. An implementation of a multiprocessor for the machine learning system Progol on a 40 MHz XC2V6000 FPGA is 10 times faster than a 2 GHz Pentium 4 processor.