Full field laser Doppler imaging (LDI) and single exposure laser speckle contrast imaging (LSCI) are directly compared using a novel instrument which can concurrently image blood flow using both LDI and LSCI signal processing. The instrument incorporates a commercial CMOS camera chip and a field programmable gate array (FPGA), and the flow images of LDI and the contrast maps of LSCI are processed simultaneously from the same detected optical signals. The comparison was carried out by imaging a rotating diffuser. LDI has a linear response to the velocity. In contrast, LSCI is exposure time dependent and does not provide a linear response in the presence of static speckle. It is also demonstrated that LDI and LSCI are related through a power law which depends on the exposure time of LSCI. (C) 2016 Elsevier Ltd. All rights reserved.
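As an illustration of the LSCI side of the comparison, speckle contrast is commonly defined as K = sigma / mean of the intensity over a small neighbourhood of a single exposure. The sketch below assumes non-overlapping spatial blocks and a hypothetical function name `speckle_contrast`; the abstract does not state the window size or the exact processing used on the FPGA.

```python
import numpy as np

def speckle_contrast(image, window=7):
    """Block-wise speckle contrast K = std / mean (window size is an assumption)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Crop so the image divides evenly into window x window blocks.
    h -= h % window
    w -= w % window
    blocks = img[:h, :w].reshape(h // window, window, w // window, window)
    mean = blocks.mean(axis=(1, 3))
    std = blocks.std(axis=(1, 3))
    # Report zero contrast where the block is completely dark.
    return np.divide(std, mean, out=np.zeros_like(mean), where=mean > 0)
```

A uniform image gives K = 0, while fully developed static speckle (exponentially distributed intensity) gives K close to 1; motion during the exposure blurs the speckle and lowers K.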
Purpose - The purpose of this paper is to integrate the functions of a speed controller for an induction motor (IM) drive, such as the speed PI controller, the current vector controller, the slip speed estimator, the space vector pulse width modulation scheme, the quadrature encoder pulse, and the analog to digital converter interface circuit, into one field programmable gate array (FPGA). Design/methodology/approach - First, the mathematical model of an IM drive, the field-oriented control algorithm, and the PI controller are derived. Second, the very high speed IC hardware description language (VHDL) is adopted to describe the behavior of the algorithms above. Third, based on an electronic design automation simulator link, a co-simulation built with ModelSim and Simulink is applied to verify the proposed VHDL code for the speed controller intellectual property (IP). Finally, the developed VHDL code is downloaded to the FPGA to control the IM drive. Findings - In terms of realization, only 5,590 LEs, 196,608 RAM bits, and 14 embedded 9-bit multipliers are needed in the FPGA to build the speed control IP. In terms of computational power, the operation times to complete the computation of the PI controller, the slip speed estimator, and the current vector controller are only 0.28 μs, 0.72 μs, and 0.96 μs, respectively. Practical implications - Fast computation in the FPGA can speed up the speed response of the IM drive system and improve its running performance. Originality/value - This is the first time that all the functions of a speed controller for an IM drive have been realized within one FPGA.
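To make the speed PI controller block concrete, here is a minimal software sketch of a discrete-time PI loop with output clamping. It is not the paper's VHDL design: the class name `DiscretePI`, the gains, the limits, and the first-order plant used in `simulate_speed_loop` are all assumptions chosen for illustration.

```python
class DiscretePI:
    """Discrete PI controller with clamped output (hypothetical illustration)."""

    def __init__(self, kp, ki, dt, out_min, out_max):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0

    def update(self, reference, measurement):
        error = reference - measurement
        candidate = self.integral + self.ki * error * self.dt
        output = self.kp * error + candidate
        if output > self.out_max:
            return self.out_max      # saturated: integrator is held
        if output < self.out_min:
            return self.out_min      # saturated: integrator is held
        self.integral = candidate    # commit the integrator only when unsaturated
        return output


def simulate_speed_loop(steps=5000, dt=0.001, tau=0.1, reference=1.0):
    """Drive an assumed first-order plant (time constant tau) toward the reference."""
    pi = DiscretePI(kp=2.0, ki=5.0, dt=dt, out_min=-10.0, out_max=10.0)
    speed = 0.0
    for _ in range(steps):
        command = pi.update(reference, speed)
        speed += dt * (command - speed) / tau   # forward-Euler plant update
    return speed
```

Holding the integrator while the output is saturated is one simple anti-windup choice; a fixed-point hardware implementation would additionally have to pick word lengths, which this sketch ignores.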
Markov Chain Monte Carlo (MCMC) is a method to draw samples from a given probability distribution. Its frequent use for solving probabilistic inference problems, where large-scale data are repeatedly processed, means that MCMC runtimes can be unacceptably large. This paper focuses on population-based MCMC, a popular family of computationally intensive MCMC samplers. We propose novel, highly optimized accelerators on three parallel hardware platforms (multi-core CPUs, GPUs and FPGAs) in order to address the performance limitations of sequential software implementations. For each platform, we jointly exploit the nature of the underlying hardware and the special characteristics of population-based MCMC. We focus particularly on the use of custom arithmetic precision, introducing two novel methods which employ custom precision in the largest part of the algorithm in order to reduce runtime without causing sampling errors. We apply these methods to all platforms. The FPGA accelerators are up to 114x faster than multi-core CPUs and up to 53x faster than GPUs when doing inference on mixture models.
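For readers unfamiliar with population-based MCMC, the following is a minimal sequential sketch of one common variant, parallel tempering, on a toy two-component Gaussian mixture. The target, the inverse temperatures `betas`, the step size, and the function names are assumptions for illustration; the sketch does not reproduce the paper's accelerators or its custom-precision methods.

```python
import math
import random

def log_target(x):
    """Log density (up to a constant) of an equal mixture of N(-3, 1) and N(3, 1)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def population_mcmc(n_iters, betas=(1.0, 0.5, 0.25, 0.1), step=1.0, seed=0):
    """Return samples from the beta = 1 chain of a tempered population."""
    rng = random.Random(seed)
    chains = [0.0] * len(betas)
    logp = [log_target(x) for x in chains]
    samples = []
    for _ in range(n_iters):
        # Local move: one Metropolis update per chain on its tempered target.
        for i, beta in enumerate(betas):
            proposal = chains[i] + rng.gauss(0.0, step)
            lp = log_target(proposal)
            if rng.random() < math.exp(min(0.0, beta * (lp - logp[i]))):
                chains[i], logp[i] = proposal, lp
        # Global move: propose exchanging the states of a random adjacent pair.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_alpha = (betas[i] - betas[j]) * (logp[j] - logp[i])
        if rng.random() < math.exp(min(0.0, log_alpha)):
            chains[i], chains[j] = chains[j], chains[i]
            logp[i], logp[j] = logp[j], logp[i]
        samples.append(chains[0])
    return samples
```

The per-chain local updates are independent of each other, which is the property that makes this family of samplers a natural fit for parallel hardware.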
The paper deals with FLIPPER, a fault injection tool for SRAM-based field programmable gate array (FPGA) devices developed by INAF, the Italian National Institute for Astrophysics, under a European Space Agency contract. SRAM-based FPGAs offer a number of attractive features, namely high gate density, performance, flexibility and reduced development costs. However, when employed in space or avionic applications, they are vulnerable to ionizing radiation, and thus their ability to cope with radiation must be assessed. This paper summarizes more than ten years of experience with the FLIPPER fault injection platform.
Security problems introduced by the rapid increase in deployment of Internet-of-Things devices can be overcome only with lightweight cryptographic schemes and modules. A compact prime field (GF(p)) elliptic curve digital signature algorithm (ECDSA) engine suitable for use in such applications is presented. The generic architecture of the engine makes it suitable for other elliptic curve (EC) based schemes (EC Diffie-Hellman key exchange, EC integrated encryption, EC factoring, etc.) with slight modifications. The presented engine is composed of a simple microcoded controller and application-specific processing units. It can work with ECs of up to 256 bits, while 160-bit ECDSA signature generation takes 490 K cycles. The engine is implemented as an intellectual property (IP) core in a 180 nm process. However, its architecture allows it to be implemented on any application specific integrated circuit (ASIC) or FPGA platform with dual-port memory support. With a gate count of 11,366 gate equivalents, the presented work is the most compact ECDSA engine with capability for a wide range of curves and different applications.
This paper presents a 2D Delaunay triangulation core for surface reconstruction implemented on a field programmable gate array (FPGA) chip. The core implementation is derived using high-level synthesis from a C++ description of an incremental 2D Delaunay triangulation algorithm. This description was modified so that it can be embedded into an FPGA chip using a hardware description language. The goal of this work is to increase the execution speed of the algorithm so as to allow for real-time operation. Towards this end, we performed an optimization process using high-level synthesis directives which pipeline regions of the code in order to achieve delay optimization. We show preliminary results on standard benchmark models for surface reconstruction, which demonstrate the performance of our design.
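Incremental Delaunay algorithms typically rely on an in-circle predicate to decide which triangles a newly inserted point invalidates. The sketch below shows the standard determinant form of that test; the function name `in_circumcircle` is hypothetical, the abstract does not say which predicate or arithmetic the core uses, and the floating-point version here is not robust for nearly degenerate inputs.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle (a, b, c).

    Assumes a, b, c are given in counter-clockwise order.
    """
    # Translate so that p is the origin; this reduces the test to a 3x3 determinant.
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    return det > 0
```

The predicate uses only multiplications, additions and one comparison, with no loops or data-dependent branches, which is the kind of code that pipelines well.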
This paper presents a hardware module design for the forward Binary Discrete Cosine Transform (BinDCT) and its implementation on a field programmable gate array device. Different architectures of the BinDCT module were explored to ensure maximum efficiency. The elaboration of these architectures included architectural design, timing and pipeline analysis, hardware description language modeling, design synthesis, and implementation. The developed BinDCT hardware module is highly efficient in terms of operating frequency and hardware resources, which makes it suitable for the most recent video standards with high image resolution and refresh frequency. Additionally, the high hardware efficiency of the BinDCT makes it a very good candidate for time- and resource-constrained applications. A comparison with several recent implementations of discrete cosine transform approximations shows that the proposed hardware BinDCT module gives the best performance. (C) 2016 Production and hosting by Elsevier B.V. on behalf of Cairo University.
Characterisation of the standard deviation of a time-series signal has uncommon, yet widespread applications. The usual requirement for a representation of signal standard deviation in real time implies a high computation speed. A method based on a field programmable gate array (FPGA) implementation is presented. The technique is benchmarked against conventional computational approaches, which show that a single windowed standard deviation update for a 16-bit sample can be achieved in 11 ns on a modern CPU. The FPGA implementation is found to be superior to all other approaches examined, with an operation time of below 10 ns, and thus provides a useful tool for the real-time measurement of the standard deviation of signals above 100 MHz.
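One common way to obtain a constant-cost windowed update is to maintain running sums of the samples and of their squares, adding the newest sample and subtracting the oldest. The sketch below assumes integer samples (such as 16-bit ADC codes) and a hypothetical class name `WindowedStd`; the abstract does not specify the update scheme used in the FPGA.

```python
from collections import deque
import math

class WindowedStd:
    """Sliding-window population standard deviation with O(1) work per sample."""

    def __init__(self, window):
        self.window = window
        self.buf = deque()
        self.s1 = 0   # running sum of samples
        self.s2 = 0   # running sum of squared samples

    def update(self, sample):
        self.buf.append(sample)
        self.s1 += sample
        self.s2 += sample * sample
        if len(self.buf) > self.window:
            old = self.buf.popleft()
            self.s1 -= old
            self.s2 -= old * old
        n = len(self.buf)
        # n * sum(x^2) - sum(x)^2 is exact for integer samples; clamp guards float inputs.
        numerator = max(n * self.s2 - self.s1 * self.s1, 0)
        return math.sqrt(numerator) / n
```

Keeping the sums as integers avoids the cancellation error that the sum-of-squares formula suffers from in floating point, at the cost of wider accumulators.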
The authors aimed to develop an application that generates different architectures for implementing the dual tree complex wavelet transform (DTCWT), which has the near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches were realised. For comparison, the DTCWT was implemented in C on a personal computer and on a PIC microcontroller. However, the former approach lacks portability and the latter cannot reach the desired speed. Hence, implementing the DTCWT on a reconfigurable platform such as a field programmable gate array (FPGA), which provides portable, low-cost, low-power, and high-performance computing, is considered the most feasible solution. At first, the authors used the Xilinx System Generator DSP design tool for algorithm design. However, designs implemented with such tools are not optimised in terms of area and power. To overcome these drawbacks, they implemented the DTCWT algorithm in the Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties and to simplify the use and adaptation of the proposed algorithms, a code generator program that can produce different architectures is proposed.
Voice enabled ignition combines the speaker recognition and word recognition aspects of speech recognition. It replaces the function of a key in starting the ignition system of a car. An FPGA design incorporates the required components of a generic speech recognition system and uses the unique capabilities of hardware in terms of parallelism to improve performance. Speech is compressed for storage and playback using the G.729 standard.