A multi-threaded microprocessor with a customisable instruction set, the CUStomisable threaded ARchitecture (CUSTARD), is proposed. CUSTARD features include design space exploration and a compiler for automatic selection of custom instructions. Custom instructions, optimised for a specific application, accelerate frequently performed computations by implementing them as dedicated hardware. Field-programmable gate array (FPGA) implementations of CUSTARD are evaluated using media and cryptography benchmarks and compared with the commercial MicroBlaze processor. An area overhead as low as 28% for four interleaved threads and a speedup of up to 355% over a processor without custom instructions are demonstrated.
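Interleaved multithreading, as in the four-thread configuration above, issues instructions from different thread contexts in successive cycles. The Python sketch below illustrates this with a simple round-robin issue policy, assuming each thread is just a straight-line list of instruction mnemonics. The `Thread` class, the `interleave` function and the `custom` mnemonic are hypothetical; the sketch does not model CUSTARD's pipeline, register files or custom-instruction hardware.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Thread:
    """Hypothetical thread context: an id, a straight-line program and a program counter."""
    tid: int
    program: List[str]
    pc: int = 0

    def done(self) -> bool:
        return self.pc >= len(self.program)


def interleave(threads: List[Thread]) -> List[Tuple[int, str]]:
    """Round-robin issue: each slot goes to the next thread in order; finished threads are skipped."""
    trace: List[Tuple[int, str]] = []
    slot = 0
    while any(not t.done() for t in threads):
        t = threads[slot % len(threads)]
        if not t.done():
            trace.append((t.tid, t.program[t.pc]))  # issue one instruction from thread t
            t.pc += 1
        slot += 1
    return trace


if __name__ == "__main__":
    threads = [Thread(0, ["add", "mul"]), Thread(1, ["ld", "custom", "st"])]
    for tid, op in interleave(threads):
        print(tid, op)
```

Because consecutive instructions of one thread are separated by instructions from the others, each thread sees fewer back-to-back dependencies; the sketch only shows issue order, not timing.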
ISBN (print): 9781424403127
We describe architectural enhancements to Xilinx FPGAs that provide better support for the creation of dynamically reconfigurable designs. These are augmented by a new design methodology that uses pre-routed IP cores for communication between static and dynamic modules and permits static designs to route through regions otherwise reserved for dynamic modules. A new CAD tool flow to automate the methodology is also presented. The new tools initially target the Virtex-II, Virtex-II Pro and Virtex-4 families and are derived from Xilinx's commercial CAD tools.
We propose a novel, high-speed, low-memory, fully programmable FPGA decoder architecture for quasi-cyclic LDPC codes. By performing optimizations at the code-construction, algorithmic and architectural levels, we achieve significant throughput and memory-storage advantages over current FPGA decoder implementations. Our decoder employs the modified turbo decoding algorithm to achieve a decoding throughput of 223 Mbps for a frame length of 3200 bits while consuming only 71 Kb of memory on the Xilinx Virtex-4 architecture.
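A quasi-cyclic (QC) LDPC parity-check matrix is commonly specified by a small base matrix whose entries are cyclic-shift values, each expanded into a z x z circulant permutation matrix, with -1 conventionally denoting an all-zero block. The Python sketch below illustrates that expansion and the mod-2 syndrome check. The base matrix, the expansion factor `z` and the function names are illustrative assumptions rather than the paper's code, and the turbo-decoding algorithm itself is not modelled.

```python
from typing import List

Matrix = List[List[int]]


def circulant(shift: int, z: int) -> Matrix:
    """z x z identity with each row's 1 moved right by `shift` (cyclically); shift < 0 gives a zero block."""
    if shift < 0:
        return [[0] * z for _ in range(z)]
    return [[1 if c == (r + shift) % z else 0 for c in range(z)] for r in range(z)]


def expand(base: Matrix, z: int) -> Matrix:
    """Replace every base-matrix entry by its z x z circulant block."""
    H = [[0] * (len(base[0]) * z) for _ in range(len(base) * z)]
    for br, row in enumerate(base):
        for bc, shift in enumerate(row):
            block = circulant(shift, z)
            for r in range(z):
                for c in range(z):
                    H[br * z + r][bc * z + c] = block[r][c]
    return H


def syndrome(H: Matrix, word: List[int]) -> List[int]:
    """Parity checks over GF(2); an all-zero result means `word` is a codeword."""
    return [sum(h & b for h, b in zip(row, word)) % 2 for row in H]


if __name__ == "__main__":
    base = [[0, 1, -1],
            [2, -1, 0]]
    H = expand(base, z=3)  # 6 x 9 binary parity-check matrix
    for row in H:
        print("".join(map(str, row)))
```

Only the small base matrix and `z` need to be stored to describe the full matrix, and the regular block structure lets a decoder process z rows or columns at a time.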
Domain-specific design flows can provide an efficient path to implementation while making the design process intuitive and the designs reusable. When targeting FPGAs, few high-level synthesis techniques enable thorough exploration of the inherent flexibility of the FPGA fabric as an implementation medium. In this paper, we propose a new methodology, based on micro-coded data paths, that enables design space exploration of processing engine architectures implemented in programmable logic, ranging from a fixed finite state machine to a soft processor. As a use case, these processing engines can be embedded within programmable logic threads that carry out network packet processing. We demonstrate the methodology on a network address translation application and show that micro-coded data paths enable both human designers and automated tools to explore the design space in a structured way, thus exploiting the full potential of FPGA technology.
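The network address translation use case comes down to rewriting packet header fields through a translation table. The Python sketch below models a port-based (NAPT-style) translator in software to show the per-packet operations a packet-processing engine would perform. The `Packet` and `Napt` names, the sequential port-allocation policy and the omission of checksum updates and mapping timeouts are all simplifying assumptions; the sketch does not represent the paper's micro-coded data path.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Napt:
    """Toy NAPT: maps (private IP, private port) pairs onto ports of one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port  # hypothetical allocation policy: sequential ports
        self.out_map: Dict[Tuple[str, int], int] = {}
        self.in_map: Dict[int, Tuple[str, int]] = {}

    def outbound(self, p: Packet) -> Packet:
        """Rewrite the source address of a packet leaving the private network."""
        key = (p.src_ip, p.src_port)
        if key not in self.out_map:
            self.out_map[key] = self.next_port
            self.in_map[self.next_port] = key
            self.next_port += 1
        return Packet(self.public_ip, self.out_map[key], p.dst_ip, p.dst_port)

    def inbound(self, p: Packet) -> Optional[Packet]:
        """Rewrite the destination of a reply; None means no mapping (drop)."""
        if p.dst_ip != self.public_ip or p.dst_port not in self.in_map:
            return None
        ip, port = self.in_map[p.dst_port]
        return Packet(p.src_ip, p.src_port, ip, port)


if __name__ == "__main__":
    nat = Napt("203.0.113.1")
    print(nat.outbound(Packet("192.168.0.2", 5000, "198.51.100.7", 80)))
```

A real translator would also update IP and TCP/UDP checksums after each rewrite.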
This paper presents preliminary work exploring adaptive field-programmable gate arrays (AFPGAs). An AFPGA is adaptive in the sense that the functionality of subcircuits placed on the chip can change in response to changes observed on certain control signals. We describe the high-level architecture, which adds control logic and SRAM bits to a traditional FPGA to produce an AFPGA. We also describe a synthesis method that identifies and resynthesizes mutually exclusive pieces of logic so that they can share the resources available in an AFPGA. The architectural feature and its associated synthesis method help reduce circuit size by 28% on average and by up to 40% on selected circuits.
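To illustrate the resource-sharing idea, the Python sketch below models a single lookup table (LUT) that stores one truth table per mode, with a control signal selecting which one is active, so two mutually exclusive functions occupy one LUT rather than separate logic for each. This is a hypothetical toy model (`AdaptiveLut` and `truth_table` are illustrative names); it is not the AFPGA architecture or the synthesis method described in the paper.

```python
from typing import Callable, List, Sequence


def truth_table(k: int, f: Callable[[List[int]], int]) -> List[int]:
    """Enumerate f over all 2**k input combinations; input bit i is bit i of the row index."""
    return [f([(idx >> i) & 1 for i in range(k)]) & 1 for idx in range(2 ** k)]


class AdaptiveLut:
    """Toy k-input LUT with one configuration plane (truth table) per mode."""

    def __init__(self, k: int, planes: List[List[int]]):
        if any(len(p) != 2 ** k for p in planes):
            raise ValueError("each plane needs 2**k entries")
        self.k = k
        self.planes = planes

    def eval(self, mode: int, inputs: Sequence[int]) -> int:
        # The control signal `mode` selects the plane; the data inputs index into it.
        index = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.planes[mode][index]


if __name__ == "__main__":
    # Two mutually exclusive 2-input functions sharing one LUT.
    and_plane = truth_table(2, lambda x: x[0] & x[1])
    xor_plane = truth_table(2, lambda x: x[0] ^ x[1])
    lut = AdaptiveLut(2, [and_plane, xor_plane])
    print(lut.eval(0, [1, 1]), lut.eval(1, [1, 1]))
```

The saving in this toy model is that the second function costs only an extra plane of configuration bits, not a second LUT.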
FPGAs have become an attractive choice for scientific computing. In this paper, we propose a high-performance design for LU decomposition, a key kernel in many scientific and engineering applications. Our design achieves optimal performance for LU decomposition given the available hardware resources. The design is parameterized, so it can easily be adapted to various hardware constraints. Experimental results show that our design achieves high performance and offers good scalability. Our implementation on a Xilinx Virtex-II Pro XC2VP100 achieves higher sustained floating-point performance than existing FPGA-based implementations and optimized libraries on state-of-the-art processors.
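LU decomposition factors a square matrix A into a unit lower-triangular L and an upper-triangular U with A = LU. The Python sketch below is a plain right-looking Doolittle elimination without pivoting, included only to show the computation such a design accelerates. It assumes nonzero pivots, the function names are illustrative, and it is not the paper's parameterized hardware algorithm.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Doolittle LU without pivoting: returns (L, U) with unit-diagonal L and A = L U."""
    n = len(a)
    u = [[float(v) for v in row] for row in a]
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot = u[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}; this sketch does not pivot")
        for i in range(k + 1, n):
            factor = u[i][k] / pivot
            l[i][k] = factor
            # Trailing-submatrix update: the multiply-subtract inner loop.
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return l, u


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Naive matrix product, used here only to check A = L U."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    A = [[4.0, 3.0], [6.0, 3.0]]
    L, U = lu_decompose(A)
    print(L)  # [[1.0, 0.0], [1.5, 1.0]]
    print(U)  # [[4.0, 3.0], [0.0, -1.5]]
```

The multiply-subtract update of the trailing submatrix accounts for almost all of the O(n^3) work, which makes it the natural part to spread across parallel floating-point units.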
The affective content of a video is defined as the expected amount and type of emotion contained in the video. Utilizing this affective content will extend the current scope of possible applications. The dimensional approach to representing emotion can play an important role in the development of an affective video content analyzer. The three basic affect dimensions are defined as valence, arousal and control [1]. This paper presents a novel FPGA-based system for modeling the arousal content of a video based on user saliency and film grammar. The design is implemented on a Xilinx Virtex-II XC2V6000 on an RC300 board, and it runs 25 times faster than a 3.4 GHz Pentium 4-based PC.
This tutorial describes the Why and How of the new 65-nm families of Virtex-5 FPGAs. It describes several aspects of the technology that affect speed, density, and power consumption. The basic device structure and pac...
Block-matching motion estimation accounts for a large part of the processing time in video encoding, so accelerating it is essential for real-time video coding. The best motion vector is obtained by the full-search block-matching algorithm, which usually has to be implemented in hardware. In recent years, several FPGA-based designs have been proposed, since these devices support a large number of processing elements operating in parallel. In this paper, a survey of recent architectures that perform the full-search block-matching algorithm on FPGAs is presented, along with a comparison in terms of frames per second achieved, hardware cost in CLB slices, and system frequency.
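Full-search block matching evaluates every candidate displacement within a search window of plus or minus p pixels and keeps the one with the lowest matching cost; the sum of absolute differences (SAD) is a common choice of cost. The Python sketch below shows the exhaustive (2p+1)^2 search for one block. The function names, the SAD criterion and the skipping of candidates that fall outside the reference frame are assumptions for illustration, not details taken from any of the surveyed architectures.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int, p: int) -> Tuple[Tuple[int, int], int]:
    """Try every (dx, dy) in [-p, p]^2 that keeps the block inside `ref`; return (best vector, its SAD)."""
    h, w = len(ref), len(ref[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("no valid candidate in the search window")
    return best[1], best[0]


if __name__ == "__main__":
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    # Each current-frame pixel (x, y) copies reference pixel (x + 2, y + 1), wrapping at the edges.
    cur = [[ref[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]
    print(full_search(cur, ref, bx=2, by=2, n=4, p=2))  # ((2, 1), 0)
```

With a 16 x 16 block and p = 16, this amounts to 33^2 = 1089 SAD evaluations of 256 pixels each per block, the kind of workload that motivates many processing elements working in parallel.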
This work presents a modular FPGA-based architecture that solves the eigenvalue problem using the Jacobi method, which computes the eigenvalues and eigenvectors concurrently. The main contribution of this work is a low execution time compared with other sequential algorithms, together with minimal use of internal FPGA resources, mainly due to the use of the CORDIC algorithm. Two CORDIC modules have been designed to solve the trigonometric operations involved. A parallel CORDIC architecture is proposed, as it is the best option for computing the eigenvalues with this method. Both CORDIC modules can work in rotation and vectoring modes. The whole system has been described in VHDL, with the aim of optimizing the design.
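CORDIC evaluates rotations and angles using only shifts, additions and a small arctangent table: in rotation mode it rotates a vector by a given angle, and in vectoring mode it rotates the vector onto the x-axis, yielding its magnitude and angle. In CORDIC-based Jacobi solvers, vectoring mode is typically used to obtain a rotation angle and rotation mode to apply it. The Python sketch below is a floating-point model of both modes. The iteration count `ITERATIONS` and the function names are assumptions; a hardware version would use fixed-point arithmetic, where each multiplication by 2^-i becomes a shift. It is not the VHDL design from the paper.

```python
import math
from typing import Tuple

ITERATIONS = 32  # assumed precision; hardware designs choose this per word length
# Elementary angles atan(2^-i) and the accumulated CORDIC gain prod(sqrt(1 + 2^-2i)).
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_rotate(x: float, y: float, theta: float) -> Tuple[float, float]:
    """Rotation mode: rotate (x, y) by theta (|theta| <= ~1.74 rad) by driving the residual angle to 0."""
    z = theta
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN  # undo the constant CORDIC gain


def cordic_vector(x: float, y: float) -> Tuple[float, float]:
    """Vectoring mode (x > 0): rotate (x, y) onto the x-axis; return (magnitude, atan2(y, x))."""
    z = 0.0
    for i in range(ITERATIONS):
        d = -1.0 if y >= 0.0 else 1.0  # rotate towards y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]  # accumulate the angle removed from the vector
    return x / GAIN, z


if __name__ == "__main__":
    mag, ang = cordic_vector(3.0, 4.0)
    print(mag, ang)  # approximately 5 and atan2(4, 3)
```

Because the gain is the same constant for every input, it can be folded into a single final scaling (or pre-scaled into the inputs), which is what keeps the per-iteration hardware to shifts and additions.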