ISBN: 9781665482400 (print)
This paper presents the design, implementation, and experimental evaluation of embedded drive-system diagnostics for an interior permanent magnet synchronous motor drive in safety-critical applications. The drive is intended for traction applications and therefore uses a combination of a digital signal processor (DSP) and a field programmable gate array (FPGA), as is often the case in modern industrial drives. Real-time harmonic monitoring is employed to indicate the development or existence of a fault within the drive system. This is achieved in two parts: real-time measurement and control embedded within the DSP, operating in conjunction with the motor control, estimates the machine's transient reactance after exciting it with voltage pulses generated by the inverter switching; a real-time frequency analysis of this parameter runs within the FPGA, independently of the motor control. Experimental testing validates the proposed condition monitoring algorithm on a laboratory prototype of an interior permanent magnet synchronous motor drive with a rated power of 4.5 kW.
ISBN: 9781665486842 (print)
ECG signals are among the most important signals for checking the condition of the human heart. Continuous heart monitoring produces a large amount of ECG signal data, so efficient compression techniques are needed. The Discrete Anamorphic Stretch Transform (DAST) is one of the most efficient techniques. It is a one-dimensional complex transform that includes a phase recovery technique for recovering the phase from the magnitudes. This paper deals with implementing the phase recovery block, which plays a key role in reconstructing the phases from the magnitudes, on a field programmable gate array (FPGA). First, the required signal is passed through the linear filter or phase recovery filter. Then the phase value is estimated using a non-iterative algorithm that depends on the linearity and causality conditions. The new approach for the phase recovery block can also be used for any complex signal transmission. The input ECG signal is taken from the MIT-BIH Arrhythmia database, and the implementation is carried out on an Artix-7 NEXYS 4 DDR FPGA board. The performance of the phase recovery block is quantified in terms of hardware and computational complexity.
Simultaneous Localization and Mapping (SLAM) is pivotal for autonomous robotics, yet feature-based SLAM systems struggle with sparse environmental representations and robustness under dynamic conditions. Optical-flow-based SLAM (OpF-SLAM) addresses these limitations by leveraging pixel-level motion data for dense mapping; however, its computational intensity hinders real-time deployment. This paper presents RT-FLOW, an FPGA-based accelerator for OpF-SLAM that achieves real-time performance through three key innovations: 1) A feature-context encoding engine that exploits inter-frame similarity to resolve data dependency in correlation construction, reducing latency by 77.5%. 2) A heterogeneous mixed-precision flow update engine guided by correlation sparsity, enabling 3.7x faster optical flow computation with negligible accuracy loss. 3) A pivoting-free linear solver using Householder transformations for stable pose optimization. Implemented on a Xilinx XCZU7EV FPGA, RT-FLOW processes full-image pixels per frame at 65 fps with an energy efficiency of 0.358 µJ/point, outperforming previous FPGA designs. Evaluated on benchmark datasets, RT-FLOW demonstrates robustness in diverse environments while maintaining sub-110 mJ/frame energy consumption. This work bridges the gap between algorithmic potential and hardware feasibility for high-density SLAM, empowering next-generation mobile robots with real-time scene understanding capabilities.
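One way to picture the pivoting-free solver named in item 3 is a Householder QR factorization followed by back substitution. The sketch below is a plain-Python illustration under that assumption, not RT-FLOW's hardware design; the function name `householder_solve` and the dense square system are hypothetical choices made for the example.

```python
import math

def householder_solve(A, b):
    """Solve the square system A x = b via Householder QR, without pivoting."""
    n = len(A)
    # Work on copies so the caller's data is untouched.
    R = [row[:] for row in A]
    y = b[:]
    for k in range(n):
        # Norm of the k-th column from the diagonal downward.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, n)))
        if norm == 0.0:
            raise ValueError("matrix is singular")
        # Choose the sign that avoids cancellation in v[0].
        alpha = -norm if R[k][k] >= 0 else norm
        v = [R[i][k] for i in range(k, n)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns of R.
        for j in range(k, n):
            dot = sum(v[i - k] * R[i][j] for i in range(k, n))
            scale = 2.0 * dot / vtv
            for i in range(k, n):
                R[i][j] -= scale * v[i - k]
        # Apply the same reflection to the right-hand side.
        dot = sum(v[i - k] * y[i] for i in range(k, n))
        scale = 2.0 * dot / vtv
        for i in range(k, n):
            y[i] -= scale * v[i - k]
    # Back substitution on the upper-triangular system R x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x
```

Because each reflection is orthogonal, the column norms are preserved and no row exchanges are needed, even when a diagonal entry starts at zero; this is the property that makes the approach attractive for a fixed hardware datapath.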
Reinforcement learning, augmented by the representational power of deep neural networks, has shown promising results on high-dimensional problems, such as game playing and robotic control. However, the sequential nature of these problems poses a fundamental challenge for computational efficiency. Recently, alternative approaches such as evolutionary strategies and deep neuroevolution have demonstrated competitive results with shorter training times on distributed CPU cores. Here we report record training times (running at about 1 million frames per second) for Atari 2600 games using deep neuroevolution implemented on distributed FPGAs. The combined hardware implementation of the game console, image preprocessing, and the neural network in an optimized pipeline, multiplied by system-level parallelism, enabled the acceleration. These results are the first application demonstration on the IBM Neural Computer, a custom-designed system that consists of 432 Xilinx FPGAs interconnected in a 3D mesh network topology. In addition to high performance, the experiments also showed improved accuracy for all games compared to the CPU implementation of the same algorithm.
This paper proposes an efficient high-order finite impulse response (FIR) filter structure for field programmable gate array (FPGA)-based applications that reduces digital signal processing (DSP) and look-up-table (LUT) utilization simultaneously. Real-time updating of the filter coefficients is also considered. To meet these objectives, both the speed and the structure of the FPGA are efficiently exploited. The gap between the required input sampling frequency and the maximum frequency allowed by the FPGA is used to run additional computing sequences. Furthermore, the special structures of the FPGA Look-up-table Shift-Register (LUT-SR) and their internal connections are fully employed for pipelining and selecting the input samples. The FPGA Block RAMs (BRAMs) handle the reconfigurable filter coefficients, and the FPGA DSP slices compute the output data of the BRAMs and the multiplexers. To synchronize the BRAM unit addressing with the LUT multiplexer selection, a single unit is used for simultaneous control. The results show that the proposed reconfigurable 16-tap FIR filter reduces slice utilization by 79.3% and 74.4% compared with the hybrid variable size partitioning (VP-Hybrid) based structure and the Radix-2(R) based structure, respectively, when implemented on a Xilinx Spartan-6 XC6SLX45 FPGA. Moreover, an improvement in efficiency is achieved compared to all reputed FPGA-based architectures.
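The idea of using the gap between the sampling frequency and the FPGA clock can be pictured as time-multiplexing: a few multiply-accumulate (MAC) units are reused over several fast-clock cycles per input sample. The sketch below is a behavioral model under that assumption only. The function name `fir_time_multiplexed`, the default of four MAC units, and the tap-to-MAC mapping are hypothetical and are not taken from the paper's architecture.

```python
def fir_time_multiplexed(samples, coeffs, num_macs=4):
    """Model an FIR filter whose taps share `num_macs` multiply-accumulate units."""
    taps = len(coeffs)
    if taps % num_macs != 0:
        raise ValueError("tap count must be a multiple of num_macs")
    # Fast-clock cycles that must fit within one input sample period.
    cycles_per_sample = taps // num_macs
    # The delay line stands in for the shift-register chain.
    delay_line = [0.0] * taps
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]
        acc = 0.0
        for cycle in range(cycles_per_sample):
            # In each fast-clock cycle, every MAC unit handles one tap.
            for mac in range(num_macs):
                tap = mac * cycles_per_sample + cycle
                acc += coeffs[tap] * delay_line[tap]
        outputs.append(acc)
    return outputs
```

With 16 taps and four MAC units, the model needs the fast clock to run at least four times the sampling rate. Feeding a unit impulse returns the coefficients in order, which is a quick sanity check that the sharing scheme visits every tap exactly once.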
The electrostrictive 2-D field-effect transistor (EFET) is a steep-slope device that promises aggressive length and voltage scalability. Two key features of this device are its high drive strength with a high ON-OFF current ratio and its isolated back-gate terminal, which provides a fourth knob to control the transistor drive strength. One disadvantage of the technology is the increased device capacitance incurred by the additional piezoelectric layer in the transistor structure. Second, although back-gate biasing of EFETs provides a fourth knob of control, statically biasing the back gate increases the static power consumption. Despite the idiosyncrasies of the technology, this work shows the use of EFETs in field-programmable gate arrays (FPGAs) to be advantageous, because the added energy cost of device capacitance is amortized by the improvement in performance and energy efficiency from using high-drive EFET transistors in the FPGA interconnect architecture. We also show that co-optimization of back-bias voltage along with transduction efficiency is essential at the FPGA subcircuit level for achieving an energy-efficient architecture. This work highlights the specific design approach tradeoffs that differ from prior CMOS approaches and provides guidance on the engineering parameters necessary for EFETs to evolve into a competitive technology.
The mammalian spatial navigation system is characterized by an initial divergence of internal representations, with disparate classes of neurons responding to distinct features including location, speed, borders and head direction; an ensuing convergence finally enables navigation and path integration. Here, we report the algorithmic and hardware implementation of biomimetic neural structures encompassing a feed-forward trimodular, multi-layer architecture representing grid-cell, place-cell and decoding modules for navigation. The grid-cell module comprised neurons that fired in a grid-like pattern, and was built of distinct layers that constituted the dorsoventral span of the medial entorhinal cortex. Each layer was built as an independent continuous attractor network with distinct grid-field spatial scales. The place-cell module comprised neurons that fired at one or a few spatial locations, organized into different clusters based on convergent modular inputs from different grid-cell layers, replicating the gradient in place-field size along the hippocampal dorsoventral axis. The decoding module, a two-layer neural network that constitutes the convergence of the divergent representations in preceding modules, received inputs from the place-cell module and provided specific coordinates of the navigating object. After vital design optimizations involving all modules, we implemented the tri-modular structure on a Zynq Ultrascale+ field-programmable gate array silicon chip, and demonstrated its capacity to precisely estimate the navigational trajectory with minimal overall resource consumption, involving a mere 2.92% Look Up Table utilization. Our implementation of a biomimetic, digital spatial navigation system is stable, reliable, reconfigurable, and real-time, with an execution time of about 32 s for 100k input samples (in contrast to 40 minutes on an Intel Core i7-7700 CPU with 8 cores clocked at 3.60 GHz), and thus can be deployed for autonomous-robotic navigation without
ISBN: 9781665435406 (print)
Deep joint source-channel coding and modulation (JSCCM) is a promising technology for efficient communication in extreme environments such as underwater areas. Previous work has shown that deep convolutional neural networks (CNNs) can successfully learn a JSCCM encoder and decoder, outperforming conventional separation-based coding and modulation schemes in low signal-to-noise ratio settings. This paper proposes a new architecture for deep JSCCM based on the self-attention mechanism. We show that the proposed architecture achieves a significant performance improvement over the CNN-based schemes while requiring a smaller network size in terms of the number of weight parameters. Furthermore, we present an efficient hardware implementation of the proposed JSCCM encoder on a field programmable gate array (FPGA). In particular, we demonstrate that a systolic-array-like structure is effective for FPGA implementation of the proposed JSCCM scheme based on the self-attention mechanism.
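For readers unfamiliar with the self-attention mechanism mentioned above, the sketch below shows generic single-head scaled dot-product self-attention in plain Python. It is not the paper's JSCCM encoder; the helper names `matmul`, `softmax`, and `self_attention` and the weight matrices `Wq`, `Wk`, `Wv` are illustrative assumptions. The computation is dominated by matrix products, which is the kind of workload a systolic-array-like structure is typically built to handle.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    # Score every query against every key, scaled by sqrt(d_k).
    K_t = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_t)]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of the value rows.
    return matmul(weights, V)
```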
In distributed deep learning (DL), collective communication algorithms such as Allreduce, which are used to share training results between graphical processing units (GPUs), are an inevitable bottleneck. We hypothesize that the cache access latency incurred at every Allreduce is a significant bottleneck in current computational systems with high-bandwidth interconnects for distributed DL. To reduce how often this latency is incurred, it is important to aggregate data at the network interfaces. We implement a data aggregation circuit in a field-programmable gate array (FPGA). Using this FPGA, we propose a novel Allreduce architecture and training strategy without accuracy degradation. Measurement results show that Allreduce latency is reduced to 1/4. Our system can also conceal about 90% of the communication overhead and improve scalability by 20%. The end-to-end training time in distributed DL with ResNet-50 and ImageNet is reduced to 87.3% without any degradation in validation accuracy.
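As a reminder of what Allreduce computes, the sketch below models a sum-Allreduce functionally: every worker contributes a buffer and every worker ends up with the element-wise sum. The single central reducer is a simplifying assumption that loosely mirrors aggregation at one point in the network; it does not describe the paper's FPGA circuit, and the name `allreduce_sum` is hypothetical.

```python
def allreduce_sum(worker_buffers):
    """Return what every worker holds after a sum-Allreduce."""
    length = len(worker_buffers[0])
    if any(len(buf) != length for buf in worker_buffers):
        raise ValueError("all workers must contribute equally sized buffers")
    # Aggregation step: element-wise sum, modeled here as one central reducer.
    reduced = [sum(buf[i] for buf in worker_buffers) for i in range(length)]
    # Broadcast step: every worker receives its own copy of the result.
    return [reduced[:] for _ in worker_buffers]
```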
ISBN: 9781665473507 (print)
An adaptive Lock-in Amplifier (LIA) is proposed that works for sinusoidal signals in the frequency range of 9 - 11 kHz and the amplitude range of 0.3 - 10 V. LIAs can extract useful signals from a very noisy environment. For an adaptive LIA, the reference signal has to be generated from the incoming signal by a phase locked loop (PLL). Any phase and frequency error between the reference and input signals may reduce the accuracy of the LIA system. To eliminate this error, a PLL with an enhanced phase detector is proposed. Using this quadrature PLL (QPLL), the accuracy of the LIA is effectively increased in the designed frequency range. The simulation results show that the proposed model can extract the amplitude of a signal buried in noise and harmonics, with a signal-to-noise ratio (SNR) as small as 10 dB. The system-on-chip implementation of the adaptive LIA is carried out on an Altera Stratix III FPGA device. The implementation has been tested for noise as well as harmonics in the designed frequency and amplitude range.
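The core lock-in operation can be sketched as quadrature demodulation: multiply the input by in-phase and quadrature references, average, and combine the two results. The plain-Python sketch below assumes the reference frequency is already known exactly, as if the PLL had locked; it does not model the proposed QPLL. The function name `lock_in_amplitude` and its parameters are illustrative assumptions.

```python
import math

def lock_in_amplitude(signal, f_ref, f_sample):
    """Estimate the amplitude of the f_ref component of `signal`."""
    n = len(signal)
    i_sum = 0.0
    q_sum = 0.0
    for k, s in enumerate(signal):
        phase = 2.0 * math.pi * f_ref * k / f_sample
        i_sum += s * math.cos(phase)  # in-phase mixer
        q_sum += s * math.sin(phase)  # quadrature mixer
    # Averaging acts as the low-pass filter; the factor 2 undoes the mixing loss.
    return 2.0 * math.hypot(i_sum / n, q_sum / n)
```

Averaging over a whole number of reference periods makes the estimate insensitive to the unknown input phase and rejects harmonics of the reference, which is why the example below uses 100 periods of a 10 kHz tone.

```python
f_ref, f_sample = 10_000.0, 1_000_000.0
samples = [
    0.5 * math.sin(2.0 * math.pi * f_ref * k / f_sample + 0.7)
    + 0.2 * math.sin(2.0 * math.pi * 3.0 * f_ref * k / f_sample)
    for k in range(10_000)
]
estimate = lock_in_amplitude(samples, f_ref, f_sample)  # close to 0.5
```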