ISBN: (print) 9781450333153
As the complexity of designing electronic systems continues to grow, the most common response has been to move the design process to higher levels of abstraction via software tools. In this work we present one such tool, which automatically generates custom processors and systems-on-chip (SoCs) from C source code or application binary files without requiring the user to understand any of the underlying hardware. Nor does the tool require the application to be profiled for 'hot spots' before the custom processor is generated. We use the toolkit to generate two types of custom processors: area-optimized and performance-optimized. We study the resource utilization of the custom processors and compare it with that predicted by the core density model. We find that the performance-optimized processor results match the predictions of the core density model.
ISBN: (print) 9781450326711
Latency-insensitive communication offers many potential benefits for FPGA designs, including easier timing closure by enabling automatic pipelining, and easier interfacing with embedded NoCs. However, it is important to understand the costs and trade-offs associated with any new design style. This paper presents optimized implementations of latency-insensitive communication building blocks, quantifies their overheads in terms of area and frequency, and provides guidance to designers on how to generate high-speed and area-efficient latency-insensitive systems.
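The idea behind latency-insensitive design can be illustrated with a small cycle-level simulation. The sketch below is not the paper's implementation: the function `simulate`, the two-slot stage depth `SLOTS`, and the stall pattern are assumptions chosen for illustration. It models a chain of buffered stages with valid/ready-style handshaking and shows that the data sequence reaching the consumer is the same however many pipeline stages are inserted; only the cycle count changes.

```python
from collections import deque

# Assumption: two slots per stage, so a ready signal sampled at the start
# of the cycle (i.e. a registered ready) never causes a token to be lost.
SLOTS = 2

def simulate(tokens, num_stages, consumer_ready):
    """Push tokens through num_stages buffered stages (num_stages >= 1).

    consumer_ready is a hypothetical callback: cycle number -> bool.
    It must return True infinitely often, or the loop never ends.
    Returns (tokens received in order, number of cycles taken).
    """
    stages = [deque() for _ in range(num_stages)]
    pending = deque(tokens)
    received = []
    cycle = 0
    while len(received) < len(tokens):
        # Ready is computed from the state at the start of the cycle.
        ready = [len(stage) < SLOTS for stage in stages]
        # Sink first, so a token advances at most one stage per cycle.
        if stages[-1] and consumer_ready(cycle):
            received.append(stages[-1].popleft())
        for i in range(num_stages - 1, 0, -1):
            if stages[i - 1] and ready[i]:
                stages[i].append(stages[i - 1].popleft())
        if pending and ready[0]:
            stages[0].append(pending.popleft())
        cycle += 1
    return received, cycle

if __name__ == "__main__":
    data = list(range(20))
    stall_every_third = lambda c: c % 3 != 0
    for depth in (1, 4, 9):
        out, cycles = simulate(data, depth, stall_every_third)
        print(depth, out == data, cycles)
```

Adding stages changes latency but not functional behaviour, which is what allows a tool to pipeline long wires automatically. The area and frequency cost of the buffers is the overhead the abstract refers to, and this sketch does not model it.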
ISBN: (print) 9781450326711
A novel digital-to-analog converter (DAC) modulates the overall power consumption of an FPGA by disabling and enabling short circuits programmed into the interconnect. The power pin of the FPGA serves as the output of the DAC. The DAC achieves high linearity and can be used to implement applications in communications, security, and other areas. The short-circuit-based DAC occupies one third of the area of an alternative shift-register-based DAC, which is presented for comparison.
ISBN: (print) 9781450326711
Today's SRAM-based FPGAs provide a rich set of computing resources, which makes them attractive in demanding and critical application domains such as avionics and space. Unfortunately, their heavy reliance on SRAM configuration memory raises reliability issues due to single-event upsets (SEUs). Given the criticality of these applications, analyzing the vulnerability of FPGA designs to SEUs becomes an essential part of the design flow. In this context, we present an open-source framework for the soft error vulnerability analysis of Xilinx FPGA devices. The proposed framework will allow researchers to evaluate their reliability-aware CAD algorithms and to estimate the soft error susceptibility of designs at early stages of the implementation flow for the latest Xilinx architectures.
ISBN: (print) 9781450333153
A mixed-grained reconfigurable computing platform targeting multiple-standard video decoding is proposed in this paper. The platform integrates eight coarse-grained Reconfigurable Processing Units (RPUs), each consisting of 16×16 multi-functional Processing Elements (PEs) and implemented in TSMC 65 nm technology, together with two Altera Stratix IV EP4SE820 FPGAs. By exploiting dynamic reconfiguration of the RPUs and static reconfiguration of the FPGAs, the proposed platform achieves scalable performance and cost trade-offs to support a variety of video coding standards, including H.264, MPEG-2, AVS and HEVC. Two platform configurations are tested in this work. One uses two RPUs and targets multiple-standard high-definition (HD) video decoding, while the other uses only one RPU, runs at a lower frequency, and targets standard-resolution (SD) decoding. The HD configuration can decode 1920×1080 H.264 video streams at 30 frames per second (fps) at 200 MHz and 1920×1080 HEVC video streams at 30 fps at 236 MHz. It achieves a 25% performance gain over an industrial coarse-grained reconfigurable processor for H.264 decoding, and a 3.85× performance boost over the Intel i5 general-purpose CPU for HEVC decoding.
ISBN: (print) 9781450333153
Emerging nonvolatile memories (NVMs) have the potential to overcome the issues of conventional reconfigurable logic cell arrays (RLCAs) based on static random-access memory (SRAM). Replacing a CMOS switch element, composed of an SRAM cell and a pass transistor, with an NVM reduces chip size, and non-volatility reduces the stand-by power. More importantly, the compactness of NVM allows fine-grain logic cells (small cluster size), which enables highly efficient cell usage and results in compact circuits for applications. In this paper, we investigate a fine-grain cell architecture using the atom switch, which is one such NVM. We evaluate the effect of the cluster size and the segment length on the atom-switch-based RLCA to find the optimal point in terms of area-delay product. The optimal cluster size is 4, which is smaller than that of the conventional SRAM- and multiplexer-based RLCA. This optimum arises because, for an atom-switch-based RLCA whose routing blocks are formed by crossbar switches, the delay between clusters is only twice the delay within a cluster, owing to the very small capacitance and resistance of atom switches. The optimal segment length, on the other hand, is 4, the same as in the conventional SRAM- and multiplexer-based RLCA.
ISBN: (print) 9781450326711
Timing margins in FPGAs are already significant, and as process scaling continues, they will have to grow to guarantee operation under increased variation. Margins enforce worst-case operation even in typical conditions and result in devices operating more slowly and consuming more energy than necessary. This paper presents a method of dynamic voltage and frequency scaling that uses online slack measurement to determine timing headroom in a circuit while it is operating and scales the voltage and/or frequency in response. Doing so can significantly reduce power consumption or increase throughput with minimal overhead. The method is demonstrated on a number of benchmark circuits under a range of operating conditions, constraints and optimisation targets.
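The feedback loop described above (measure slack, then adjust the supply) might be sketched as follows. Everything here is an illustrative assumption and not the paper's method: the delay model `toy_delay_ns`, the guard band, the hysteresis, and the voltage limits are invented for the example, and a real system would read slack from on-chip measurement circuitry rather than compute it.

```python
def toy_delay_ns(voltage):
    # Toy model (assumption): critical-path delay rises as voltage falls.
    return 4.0 / (voltage - 0.3)

def dvfs_step(voltage, slack_ns, guard_ns=0.2, hysteresis_ns=0.1,
              step_v=0.01, v_min=0.7, v_max=1.2):
    """One control step: move the voltage according to measured slack."""
    if slack_ns < guard_ns:
        # Too little headroom: raise the voltage to restore margin.
        return min(v_max, voltage + step_v)
    if slack_ns > guard_ns + hysteresis_ns:
        # Spare headroom: lower the voltage to save power.
        return max(v_min, voltage - step_v)
    return voltage  # inside the target band: hold

def run_controller(period_ns, steps=200, v_start=1.2):
    """Iterate the control step against the toy delay model."""
    voltage = v_start
    for _ in range(steps):
        slack = period_ns - toy_delay_ns(voltage)  # stands in for a measurement
        voltage = dvfs_step(voltage, slack)
    return voltage

if __name__ == "__main__":
    v = run_controller(period_ns=6.0)
    # Dynamic power scales roughly with V^2 at a fixed frequency.
    print(f"settled at {v:.2f} V, relative dynamic power {(v / 1.2) ** 2:.2f}")
```

The guard band plays the role of the residual margin, and the hysteresis keeps the controller from oscillating between two adjacent voltage steps.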
ISBN: (print) 9781450326711
Frequent item counting is one of the most important operations in time series data mining algorithms, and the Space-Saving algorithm is a widely used approach to solving this problem. With rapidly rising data input speeds, the most challenging problem in frequent item counting is to meet the requirement of wire-speed processing. In this paper, we propose a streaming-oriented PE-ring framework on FPGA for counting frequent items. Compared with the best existing FPGA implementation, our basic PE-ring framework saves 50% of the lookup-table resources and achieves the same throughput in a more scalable way. Furthermore, we adopt a SIMD-like cascaded filter for further performance improvement, which outperforms the previous work by up to 3.24 times for some data distributions.
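For readers unfamiliar with the Space-Saving algorithm mentioned above, here is a minimal software sketch. It is a plain sequential version, not the PE-ring hardware design, and the function name `space_saving` and the linear-time minimum search are simplifications chosen for clarity. The algorithm keeps at most `k` counters. When an unmonitored item arrives and all counters are in use, it replaces the item with the smallest count and inherits that count as a possible overestimate.

```python
def space_saving(stream, k):
    """Return {item: (estimated_count, max_overestimate)} using k counters."""
    counts = {}  # item -> estimated count (never below the true count)
    errors = {}  # item -> upper bound on how much the estimate overshoots
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits it.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    return {item: (counts[item], errors[item]) for item in counts}

if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "a", "b", "d", "a"]
    for item, (count, err) in space_saving(stream, k=2).items():
        print(item, "true count lies in", (count - err, count))
```

Each arriving item raises the total of all counters by exactly one, so the counters always sum to the stream length. The per-item update is small and fixed-size, which is part of why the algorithm suits streaming hardware.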
ISBN: (print) 9781450333153
Geometric algebra (GA) is a powerful and versatile mathematical tool that helps to intuitively express and manipulate complex geometric relationships. It has recently been used in engineering problems such as computer graphics, machine vision, and robotics, among others. The problem with GA in its numeric version is that it requires many arithmetic operations, and, in a generic architecture operating over homogeneous elements, the length of the input vectors is unknown until runtime. Few hardware architectures have been developed to improve the performance of GA applications. In this work, a hardware architecture of a unit for GA operations (the geometric product) on FPGA is presented. The main contribution of this work is the use of parallel memory arrays with access conflict avoidance to deal with the unknown length of the input and output vectors; the intention is to reduce the memory wasted when storing them. In this first stage of the project, we have implemented only a single (fixed-length) access function in the memory array in order to test the geometric product core. In future work we will implement a full set of access functions with different lengths and shapes. Only simulations are presented in this work; experimental results will be presented in the future.
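To make the geometric product concrete, here is a small software sketch. It is not the FPGA architecture described above: it assumes a Euclidean metric (every basis vector squares to +1), encodes each basis blade as a bitmask (bit i set means basis vector e(i+1) is present), and stores a multivector as a sparse dictionary from blade to coefficient. The names `reorder_sign` and `geometric_product` are illustrative. The sparse dictionary also shows why operand length is only known at runtime: a multivector may hold anywhere from one to 2^n non-zero terms.

```python
def reorder_sign(a, b):
    """Sign picked up when the basis vectors of blades a and b are sorted."""
    a >>= 1
    swaps = 0
    while a:
        # Count the vectors of b that each remaining vector of a must cross.
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1.0 if swaps & 1 else 1.0

def geometric_product(x, y):
    """Geometric product of two sparse multivectors (Euclidean metric)."""
    out = {}
    for blade_a, coef_a in x.items():
        for blade_b, coef_b in y.items():
            # Shared basis vectors square to +1 and cancel, hence the XOR.
            blade = blade_a ^ blade_b
            term = reorder_sign(blade_a, blade_b) * coef_a * coef_b
            out[blade] = out.get(blade, 0.0) + term
    # Drop terms that cancelled exactly.
    return {blade: c for blade, c in out.items() if c != 0.0}

if __name__ == "__main__":
    e1, e2 = {0b01: 1.0}, {0b10: 1.0}
    print(geometric_product(e1, e2))  # the bivector e1e2
    print(geometric_product(e2, e1))  # the same blade with opposite sign
```

Every pair of input terms produces one multiply-accumulate, so the operation count grows with the product of the operand lengths.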