ISBN: (print) 9781538673768
We present a custom hardware architecture for fast heavy-hitter detection in large data streams. The architecture probabilistically estimates the frequency of each element in the data stream using the Countmin-CU sketch with the H3 family of hash functions. The sketch is stored in on-chip memory, and the architecture exploits the parallelism available in the data by processing all rows of the sketch simultaneously. The hash functions map each element to a set of counters in the sketch, and the sketch increments only the counters that hold the minimum value, which corresponds to the estimated frequency of the element. The hash functions and sorting network are implemented in hardware as fully pipelined circuits in order to maximize their operating clock frequency. We show a prototype of the architecture running on a Xilinx Kintex-7 XC7K325T FPGA with a 300 MHz clock, which can process a stream of 3,982,496 32-bit elements and detect the heavy hitters with a 4x16,384-element sketch in 13.27 ms, a 768-fold speedup over a modern desktop computer.
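The counter update described in this abstract can be sketched in software. The following is a minimal Python illustration under stated assumptions, not the authors' hardware design: the class name `CountMinCU`, the seeded random H3 matrices, the `heavy_hitters` helper, and its `threshold` parameter are all hypothetical, and the rows are visited sequentially here rather than in parallel.

```python
import random


class CountMinCU:
    """Count-Min sketch with conservative update (CU) and H3 hashing.

    Illustrative only: sizes default to the 4 x 16,384 sketch and 32-bit
    keys mentioned in the abstract; the H3 matrices are random (assumption).
    """

    def __init__(self, rows=4, cols=16384, key_bits=32, seed=1):
        rng = random.Random(seed)
        self.key_bits = key_bits
        index_bits = cols.bit_length() - 1  # cols is assumed to be a power of two
        # H3: one random index_bits-wide word per key bit, for every row.
        self.q = [[rng.getrandbits(index_bits) for _ in range(key_bits)]
                  for _ in range(rows)]
        self.table = [[0] * cols for _ in range(rows)]

    def _h3(self, row, key):
        """XOR together the matrix words selected by the set bits of key."""
        h = 0
        for bit in range(self.key_bits):
            if (key >> bit) & 1:
                h ^= self.q[row][bit]
        return h

    def update(self, key):
        """Increment only the counters holding the minimum; return the new estimate."""
        cells = [(r, self._h3(r, key)) for r in range(len(self.table))]
        low = min(self.table[r][c] for r, c in cells)
        for r, c in cells:
            if self.table[r][c] == low:
                self.table[r][c] += 1
        return low + 1

    def estimate(self, key):
        """Estimated frequency: the minimum counter across all rows."""
        return min(self.table[r][self._h3(r, key)]
                   for r in range(len(self.table)))


def heavy_hitters(stream, threshold, sketch=None):
    """Return the keys whose estimate reaches threshold (hypothetical helper)."""
    if sketch is None:
        sketch = CountMinCU()
    found = set()
    for key in stream:
        if sketch.update(key) >= threshold:
            found.add(key)
    return found
```

Because the sketch never underestimates, a key that truly occurs at least `threshold` times is always reported; collisions can only add false positives. As a consistency check on the reported figures, 3,982,496 elements at 300 MHz take about 13.27 ms if one element enters the pipeline per clock cycle.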
ISBN: (print) 9781728111902
This paper describes a new high-speed combined Octal Serial Peripheral Interface (SPI) and HyperBus flash memory host controller, which can work in a mixed single-data-rate/double-data-rate (SDR/DDR) mode. The proposed controller supports the SPI, Dual/Quad/Octal SPI, and HyperBus protocols. The operating frequency ranges from 5 to 200 MHz for SDR/DDR write and read operations, which allows systems to boot from flash at the low frequency and run software at the high frequency. The controller features an efficient calibration method that maximizes the margin for receiving data. The memory controller prototype is designed and implemented on a Xilinx Zynq-7030 field-programmable gate array (FPGA).
ISBN: (print) 9781612849720
Hummingbird is an ultra-lightweight cryptographic algorithm targeted at resource-constrained devices such as RFID tags, smart cards, and sensor nodes. It has been implemented across different target platforms. In this paper, we present two FPGA-based implementations of Hummingbird Cryptography (HC): a throughput-oriented (TO) design and an area-oriented (AO) design. The throughput-oriented design is optimized for operation speed, while the area-oriented design uses fewer area resources. Both proposed designs have been implemented on a low-cost Xilinx Spartan-3 XC3S200 FPGA. Compared with existing methods, the proposed designs use fewer FPGA slices while achieving the same throughput. The proposed architectures are well suited to adding customizable security to embedded control systems.
ISBN: (print) 9781467378802
Digital signal processing (DSP) is now a key enabling technology in coherent optical transmission systems, driving a growing research effort on the development and optimization of DSP subsystems. The first DSP development step is often based on simulation tools and offline processing of experimental data. Complementarily, real-time implementation is a critical step for assessing the performance and feasibility of advanced DSP subsystems. In this work, we address both the offline and real-time stages of development, supported by an experimental optical testbed. Real-time demonstration is enabled by an FPGA processing platform and 1.25 Gsample/s analog-to-digital conversion.
ISBN: (print) 9781595937094
As technology continues to shrink, leakage power becomes an important issue for modern FPGAs. In this paper, we address the leakage issue of dynamically partially reconfigurable FPGAs. We focus on eliminating the leakage wasted during the delay between reconfiguration and task execution. We propose a post-placement leakage-aware scheduling algorithm that refines a placement generated by a performance-driven scheduler such that leakage waste is minimized and performance is not sacrificed. Experimental results on real and synthetic designs demonstrate the effectiveness and efficiency of our algorithm in leakage optimization.
ISBN: (print) 9781665427036
Embedded systems are built from various hardware components and execute software on one or more microcontroller units (MCUs). These MCUs usually contain a fixed integrated circuit, which rules out modifications to their logic at runtime. While this keeps the instruction set architecture (ISA) fixed as well, it leaves the software as the only flexible part of the system. But what if the MCU logic could easily be changed at runtime in order to fix bugs, or the ISA could be extended on the fly to introduce application-specific instructions and features on demand? This work demonstrates a concept for introducing more hardware flexibility through application-specific MCU modifications. To this end, the MCU is implemented as a soft core on a field-programmable gate array (FPGA), and we reconfigure its logic with the support of the operating system (OS) running on it. The reconfiguration happens on the fly, so neither an interruption of the application code nor a system restart is required. To make this possible, (i) the MCU pipeline is specially designed to be extensible with new instructions, and (ii) the FPGA is selected to support partial self-reconfiguration of its logic cells at runtime. As long as an instruction is not yet part of the ISA, the OS emulates it to provide a consistent interface for applications. Apart from that, no special compiler support is required, but the application must provide either the emulation code or a hardware description for adding the required logic. As a proof of concept, we use a RISC-V based MCU on a Xilinx Artix-7 FPGA, and to evaluate the general benefit of our approach we use an algorithm that is costly when executed with the original ISA but fast with application-specific instructions added at runtime. The experimental evaluation also shows that the on-the-fly hardware update does not disrupt or compromise the software execution flow.
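The emulation fallback described in this abstract can be illustrated with a toy dispatcher. This is a minimal Python sketch of the general idea only: the class `SoftCore`, its methods, the trap counter, and the `popcount` instruction are all hypothetical names chosen for illustration, and nothing here models the actual RISC-V pipeline or the FPGA reconfiguration mechanism.

```python
class SoftCore:
    """Toy model: instructions run in 'hardware' if present, else are emulated."""

    def __init__(self):
        # Instructions assumed to be part of the original ISA.
        self.hw_ops = {"add": lambda a, b: (a + b) & 0xFFFFFFFF}
        # Emulation routines supplied by the application (hypothetical registry).
        self.emulated = {}
        self.trap_count = 0  # how often the OS had to emulate

    def register_emulation(self, name, fn):
        """The application provides emulation code for a not-yet-present instruction."""
        self.emulated[name] = fn

    def reconfigure(self, name, fn):
        """Stand-in for partial reconfiguration: the instruction joins the ISA."""
        self.hw_ops[name] = fn

    def execute(self, name, *args):
        if name in self.hw_ops:      # fast path: instruction exists in logic
            return self.hw_ops[name](*args)
        if name in self.emulated:    # slow path: trap to the OS and emulate
            self.trap_count += 1
            return self.emulated[name](*args)
        raise ValueError(f"illegal instruction: {name}")


def popcount(x):
    """Count the set bits of a 32-bit value (example custom instruction)."""
    return bin(x & 0xFFFFFFFF).count("1")


core = SoftCore()
core.register_emulation("popcount", popcount)
before = core.execute("popcount", 0xF0F0)   # emulated by the OS
core.reconfigure("popcount", popcount)      # logic added at runtime
after = core.execute("popcount", 0xF0F0)    # now handled without a trap
```

The application sees the same result before and after the update; only the path taken changes, which is the consistent interface the abstract refers to.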
ISBN: (print) 9781665451093
In this paper, a novel hardware-efficient integrated cochlear model, whose nonlinear dynamics are described by ergodic sequential logic, is presented. It is shown that the presented cochlear model can reproduce the combination tone generation of a mammalian cochlea, which is one of the most typical nonlinear sound processing functions. In addition, the presented model is implemented on a field-programmable gate array (FPGA) and its operation is verified by experiments. It is then shown that the presented model consumes far fewer hardware resources and much less power than a standard ordinary differential equation (ODE) model of the cochlea.
ISBN: (print) 9781538673768
We present an algorithm for multimodal registration between short-wave and long-wave infrared images. We use a histogram of oriented gradients to extract features in each image, the Chi-square distance to match the features, a projective transformation to map the objective image onto the reference system, and bilinear interpolation to obtain the pixel values. We designed a heterogeneous embedded system that combines a custom hardware accelerator, which performs coordinate transformation and pixel value interpolation, with a programmable processor core, which performs feature extraction and feature association and computes the transformation parameters. We implemented our design on a Xilinx Zynq XC7Z020 system-on-a-chip, where it uses 2.525 W of power, 30% of the logic resources of the chip, and 60% of the available on-chip memory. The system runs at 66.6 MHz, which allows us to process 640x512-pixel images at more than 60 frames per second after the initial calibration to obtain the transformation parameters.
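The two steps assigned to the hardware accelerator, coordinate transformation and pixel value interpolation, can be sketched in a few lines. The Python below is an illustrative sketch rather than the authors' implementation: the function names, the row-major 3x3 matrix layout, the use of inverse mapping (output pixel to source location), and the choice to return 0 outside the image are all assumptions.

```python
def project(h, x, y):
    """Apply a 3x3 projective transformation h (nested lists) to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def bilinear(img, x, y):
    """Sample img (a list of rows) at a real-valued location; 0.0 outside the image."""
    height, width = len(img), len(img[0])
    if not (0 <= x <= width - 1 and 0 <= y <= height - 1):
        return 0.0
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(img, h_inv, out_w, out_h):
    """Inverse mapping: look up the source location of every output pixel."""
    return [[bilinear(img, *project(h_inv, x, y)) for x in range(out_w)]
            for y in range(out_h)]
```

For example, with `img = [[0, 10], [20, 30]]`, `bilinear(img, 0.5, 0.5)` averages the four neighbours and returns `15.0`. Inverse mapping is used here because it assigns every output pixel exactly one value; whether the accelerator works the same way is not stated in the abstract.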
ISBN: (print) 9781424452262
This paper presents the analysis and design of a digitally controlled double voltage-current loop for a buck converter. Taking the implementation requirements into account, the chosen platform was a field-programmable gate array (FPGA). The paper also describes the analog-digital interface in detail. The study is completed by a detailed analysis of the FPGA programming, including state machines and logic diagrams. Finally, the theoretical results are validated with extensive simulations.
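A double voltage-current loop is commonly built as two cascaded PI regulators: the outer loop turns the voltage error into a current reference, and the inner loop turns the current error into a duty cycle. The Python sketch below simulates that structure on an averaged buck model. Everything numeric in it is an assumption made for illustration (component values, gains, time step, current limit), as is the choice of PI regulators; the abstract does not state the paper's controller structure or parameters.

```python
def clamp(value, low, high):
    return min(max(value, low), high)


def simulate_buck(v_ref=5.0, v_in=12.0, l=100e-6, c=100e-6, r_load=5.0,
                  dt=1e-6, steps=20000):
    """Cascaded PI control of an averaged buck converter (all values hypothetical)."""
    kp_v, ki_v = 0.63, 395.0     # outer (voltage) loop gains, assumed
    kp_i, ki_i = 0.52, 3290.0    # inner (current) loop gains, assumed
    i_max = 5.0                  # current reference limit in amperes, assumed
    i_l = v_out = 0.0            # inductor current and output voltage
    int_v = int_i = 0.0          # integrator states
    duties = []
    for _ in range(steps):
        # Outer loop: voltage error -> inductor current reference.
        err_v = v_ref - v_out
        i_ref_raw = kp_v * err_v + int_v
        i_ref = clamp(i_ref_raw, 0.0, i_max)
        if i_ref == i_ref_raw:           # integrate only when not saturated
            int_v += ki_v * err_v * dt
        # Inner loop: current error -> duty cycle.
        err_i = i_ref - i_l
        duty_raw = kp_i * err_i + int_i
        duty = clamp(duty_raw, 0.0, 1.0)
        if duty == duty_raw:             # same anti-windup rule
            int_i += ki_i * err_i * dt
        # Averaged converter model, integrated with a simple Euler step.
        i_l += (duty * v_in - v_out) / l * dt
        v_out += (i_l - v_out / r_load) / c * dt
        duties.append(duty)
    return v_out, i_l, duties
```

The gains were picked so that the inner loop is roughly ten times faster than the outer one, a common rule of thumb for cascaded loops. In an FPGA the same update would run in fixed-point arithmetic once per sampling period, driven by the analog-digital interface.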
ISBN: (print) 9780819488565
We report on the software design of an ultra-parallel, ultra-high-speed spectral-domain optical coherence tomography (SD-OCT) system. In our system, optical de-multiplexers divide an interferogram into 320 light channels at 18.7 GHz frequency intervals, replacing the refractive grating used for spectroscopy in conventional SD-OCT. These optical elements eliminate the re-sampling process and help reduce the computational load. The fast Fourier transform (FFT) is performed by a field-programmable gate array (FPGA), and real-time 3D OCT images are created on a graphics processing unit (GPU). Our system achieves real-time 3D OCT image display (4D display) with A-scan, B-scan, and volume rates of 10 MHz, 4 kHz, and 12 volumes per second, respectively.
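The reported rates imply the size of each frame, and the absence of re-sampling can be illustrated with a direct transform. The Python sketch below does both under stated assumptions: the scan counts assume no dead time between scans, the 320 samples are assumed to be evenly spaced in optical frequency, the fringe is synthetic, and a plain DFT stands in for the FPGA's FFT. The function names are hypothetical.

```python
import cmath
import math

A_SCAN_HZ = 10e6    # A-scan rate from the abstract
B_SCAN_HZ = 4e3     # B-scan rate from the abstract
VOLUME_HZ = 12      # volume rate from the abstract
CHANNELS = 320      # spectral samples per A-scan

# Implied frame sizes, assuming scanning is continuous (no dead time).
a_scans_per_b_scan = A_SCAN_HZ / B_SCAN_HZ    # 2500 A-scans per B-scan
b_scans_per_volume = B_SCAN_HZ / VOLUME_HZ    # about 333 B-scans per volume


def synthetic_fringe(depth_bin, n=CHANNELS):
    """Interferogram of a single reflector, sampled evenly in frequency."""
    return [1.0 + math.cos(2 * math.pi * depth_bin * i / n) for i in range(n)]


def depth_profile(fringe):
    """Magnitude of the DFT for bins 0 .. n/2 - 1 (one A-scan)."""
    n = len(fringe)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(fringe)))
            for k in range(n // 2)]


profile = depth_profile(synthetic_fringe(40))
# Skip bin 0, which only holds the constant (DC) term of the fringe.
peak_bin = max(range(1, len(profile)), key=profile.__getitem__)
```

Because the samples are already evenly spaced in frequency, the transform is applied to them directly and the reflector shows up as a single peak at its depth bin. With a grating-based spectrometer the samples are evenly spaced in wavelength instead, which is why conventional systems interpolate before the FFT.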