This work aims to develop a novel system, including software and hardware, to perform independent control tasks in a genuine parallel manner. Currently, to control processes with various sampling periods, distributed control systems are most commonly utilized. The main goal of this system is to propose an alternative solution, which allows simultaneous control of both fast and slow processes. The presented approach utilizes FPGA (field programmable gate array) with Nios II processor (Intel Soft Processor Series) to implement and maintain instances of independent controllers. Instances can implement FDMC (Fast Dynamic Matrix Control) and PID (Proportional-Integral-Derivative) control algorithms with various sampling times. The FPGA-based design allows for true independence of controllers' execution both from one another and the managing processor. Also, pure parallel execution allows for implementing slow and fast controllers in the same device. The complete flexible system with a matrix of controllers working in parallel in real-time was tested with both simulated and actual control processes (servomotor), yielding the same results as fully simulated experiments.
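The idea of independent controller instances with different sampling times can be illustrated in software. The following is a minimal sketch, not the authors' FPGA design: it steps several PID instances sequentially on a shared base tick, whereas the abstract describes genuinely parallel hardware execution. The class name `PID`, the `period_ticks` field and the base-tick scheme are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Discrete PID instance with its own sampling period (illustrative only)."""
    kp: float
    ki: float
    kd: float
    period_ticks: int          # sampling period in base ticks (hypothetical scheme)
    integral: float = 0.0
    prev_error: float = 0.0
    output: float = 0.0

    def step(self, tick, setpoint, measurement, base_dt):
        # Update only on this controller's own sampling instants;
        # between samples the last output is held.
        if tick % self.period_ticks != 0:
            return self.output
        dt = self.period_ticks * base_dt
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        self.output = (self.kp * error
                       + self.ki * self.integral
                       + self.kd * derivative)
        return self.output
```

A fast controller (`period_ticks=1`) and a slow one (`period_ticks=10`) can then share one loop over ticks, each updating only at its own rate.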
To improve the operating efficiency of intelligent optimization algorithms, implementing some of them in hardware has been considered. However, existing design schemes suffer from poor versatility. This paper therefore proposes a general software-hardware co-design scheme for intelligent optimization algorithms. In this scheme, the initialization module and fitness module of the algorithm are deployed on the Advanced RISC Machines (ARM) processor to increase the flexibility of the program, while the update module is deployed on the field programmable gate array (FPGA) for hardware acceleration. Data is transferred between the ARM and the FPGA over the Advanced eXtensible Interface (AXI) bus. The PSO, BA, WOA, GWO, CMAES and EO algorithms are implemented with the proposed design scheme, and the six algorithms are tested on thirteen benchmark functions of different types. The experimental results demonstrate the feasibility of the design scheme. In addition, comparisons with software and other implementation methods in terms of execution time, resource occupancy and convergence demonstrate the effectiveness and superiority of the proposed scheme. (c) 2022 Elsevier B.V. All rights reserved.
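The module split described above can be sketched for PSO, one of the six algorithms named. This is a pure-software illustration under stated assumptions, not the paper's implementation: `initialize` and `sphere` stand in for the modules the abstract places on the ARM side, `pso_update` stands in for the update module placed on the FPGA, and the sphere benchmark, the coefficients `w`, `c1`, `c2` and all function names are hypothetical choices.

```python
import random


def sphere(x):
    """Fitness module (ARM side in the abstract); sphere function is an assumed benchmark."""
    return sum(v * v for v in x)


def initialize(n, dim, lo, hi, rng):
    """Initialization module (ARM side in the abstract)."""
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    return pos, vel


def pso_update(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """Update module (the part the abstract offloads to the FPGA)."""
    for i in range(len(pos)):
        for d in range(len(pos[i])):
            r1, r2 = rng.random(), rng.random()
            vel[i][d] = (w * vel[i][d]
                         + c1 * r1 * (pbest[i][d] - pos[i][d])
                         + c2 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]


def run_pso(n=20, dim=3, iters=100, seed=0):
    rng = random.Random(seed)
    pos, vel = initialize(n, dim, -5.0, 5.0, rng)
    pbest = [p[:] for p in pos]
    pbest_fit = [sphere(p) for p in pos]
    g = min(range(n), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    initial_best = gbest_fit
    for _ in range(iters):
        pso_update(pos, vel, pbest, gbest, rng)
        for i in range(n):
            f = sphere(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    # Return the best fitness before and after the search.
    return initial_best, gbest_fit
```

Keeping `pso_update` free of any fitness evaluation mirrors the split in the abstract: only positions, velocities and best-so-far values would need to cross the ARM-FPGA boundary.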
An effective System-on-Chip (SoC) for smart Quality-of-Service (QoS) management over a virtual local area network (LAN) is presented in this study. The SoC is implemented by field programmable gate array (FPGA) for accelerating the delivery quality prediction for a service. The quality prediction is carried out by the general regression neural network (GRNN) algorithm based on a time-varying profile consisting of the past delivery records of the service. A novel record replacement algorithm is presented to update the profile, so that the bandwidth usage of the service can be effectively tracked by GRNN. Experimental results show that the SoC provides self-aware QoS management with low computation costs for applications over virtual LAN.
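For readers unfamiliar with GRNN, its prediction is a kernel-weighted average of stored targets. The sketch below shows that standard form with a Gaussian kernel, treating the profile as a list of `(features, target)` records. The function name, the record layout and the `sigma` default are assumptions; the paper's record replacement algorithm is not reproduced here.

```python
import math


def grnn_predict(profile, x, sigma=1.0):
    """Standard GRNN prediction: Gaussian-weighted average of past targets.

    profile: list of (feature_tuple, target) records (assumed layout).
    """
    num = 0.0
    den = 0.0
    for xi, yi in profile:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * yi
        den += w
    # Fall back to 0.0 if every weight underflows to zero.
    return num / den if den > 0.0 else 0.0
```

Because the prediction depends only on the records currently in the profile, the choice of which record to replace directly shapes how quickly the estimate follows changes in the service.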
Finite impulse response (FIR) filters are widely used in signal processing on account of their stability and linear phase response. These digital filters are used in applications such as biomedical engineering, wireless communication, image processing, speech processing, and digital audio and video processing. Low-power design of FIR filters is one of the major goals that researchers are working to achieve. This paper presents the implementation of a novel power-efficient design of a 4-tap 16-bit FIR filter using a modified Vedic multiplier (MVM) and a modified Han Carlson adder (MHCA). The units are coded in the Verilog hardware description language and simulated using Xilinx Vivado Design Suite 2015.2. The filter is synthesized for the 7-series Artix field programmable gate array with xc7a100tcsg324-1 as the target device. Compared to the existing models, the proposed filter design showed an improvement in power consumption of at most 57.44% and at least 2.44%.
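As a behavioural reference, a 4-tap FIR filter computes y[n] = h[0]x[n] + h[1]x[n-1] + h[2]x[n-2] + h[3]x[n-3]. The sketch below models only that arithmetic with Python integers. It does not model the MVM or MHCA units, the 16-bit word length, or power, and the function name `fir4` is an assumption.

```python
def fir4(samples, taps):
    """Direct-form 4-tap FIR filter over a sequence of integer samples."""
    if len(taps) != 4:
        raise ValueError("expected exactly 4 taps")
    delay = [0, 0, 0, 0]            # delay line: x[n], x[n-1], x[n-2], x[n-3]
    out = []
    for x in samples:
        delay = [x] + delay[:3]     # shift the new sample into the delay line
        out.append(sum(h * d for h, d in zip(taps, delay)))
    return out
```

Feeding a unit impulse returns the taps themselves, which is a quick check for any hardware implementation; symmetric taps give the linear phase response mentioned in the abstract.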
This paper presents a high-throughput hardware architecture for the deblocking filter in the high efficiency video coding (H.265/HEVC) standard. The architecture uses efficient hybrid pipelining and parallel processing techniques for the intra encoder. A single edge filter is designed to perform horizontal and vertical filtering of pixels one after the other. In the proposed architecture, the video frame is divided into 32 x 32 blocks, and each block is processed by splitting it into blocks of 8 x 32 pixels in a pipelined manner. Parallel processing is employed for filtering the edges, which improves the throughput by reducing the number of processing clock cycles. The largest coding tree unit block that the hardware architecture can process is 64 x 64 pixels, which is processed in 64 clock cycles. Synthesis results of the Verilog design for the proposed architecture, using a 180 nm application specific integrated circuit standard cell library, show that it consumes 102k 2-input NAND gates and can work at a maximum clock frequency of 250 MHz. The proposed design is capable of supporting 8k ultra high definition video sequences at 322 frames per second (fps), which is the best among present-day architectures.
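The frame-rate figure can be roughly cross-checked from the other numbers in the abstract. The sketch below is a back-of-the-envelope estimate, not the authors' derivation: it assumes an 8k frame of 7680 x 4320 pixels and a `sample_factor` of 1.5 to account for 4:2:0 chroma samples, neither of which is stated in the abstract. The function name and parameters are hypothetical.

```python
def estimated_fps(width, height, clock_hz,
                  ctu_size=64, cycles_per_ctu=64, sample_factor=1.0):
    """Estimate frames per second from cycles spent per coding tree unit."""
    ctus_per_frame = (width * height) / (ctu_size * ctu_size)
    cycles_per_frame = ctus_per_frame * cycles_per_ctu * sample_factor
    return clock_hz / cycles_per_frame


# Luma samples only (assumed 7680 x 4320 frame, 250 MHz clock).
luma_only = estimated_fps(7680, 4320, 250e6)

# Assumed 4:2:0 chroma adds half as many samples again.
with_chroma = estimated_fps(7680, 4320, 250e6, sample_factor=1.5)
```

Under these assumptions the luma-only estimate is about 482 fps and the estimate with chroma is about 321.5 fps, which is close to the reported 322 fps, although the abstract does not say how its figure was obtained.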
The synchronization of chaotic systems plays a fundamental role in the fields of science and engineering. Notably, external noise disturbances have a great impact on the synchronization of chaotic systems because chaotic systems are highly sensitive to changes in their initial values. Consequently, the robustness of chaotic system synchronization must be considered in practical applications. From this viewpoint, the present paper proposes a disturbance suppression zeroing neural network (DSZNN) for robust synchronization of chaotic and hyperchaotic systems, and the DSZNN is implemented on a field programmable gate array (FPGA) for further hardware validation. The distinctive feature of the proposed DSZNN controller is its ability to suppress disturbances with faster convergence and higher accuracy than the super-exponential zeroing neural network (SEZNN) and the conventional zeroing neural network (CZNN). Moreover, theoretical analysis, comparative numerical simulations and hardware validations for the synchronization of a hyperchaotic system are presented to demonstrate the superior performance of the proposed DSZNN.
In this manuscript, previously trained Convolutional Neural Network (CNN), Quantum Neural Network (QNN), and Binarized Neural Network (BNN) models are run using TensorFlow's Application Programming Interface (API) for real-time object detection and implemented on an FPGA. The proposed real-time object detection classifier models based on CNN, QNN and BNN deep neural networks are run in Python, with the dataset taken from PASCAL VOC. In the accuracy analysis, the CNN-based classifier provides 3.458% and 1.600% higher accuracy than the other proposed real-time object detection classifiers. The proposed models are then verified using the Verilog programming language in the Xilinx ISE 14.5 design tools on the ZYNQ FPGA development platform. These results show that the FPGA implementation of the CNN-based real-time object detection classifier model meets the objective efficiently.
The work aims to develop an original software and hardware structure for the regulation of fast dynamic processes. The structure will allow simultaneous predictive regulation of many processes with independent dynamics. Designing the matrix of regulators managed by a specialized subsystem allows the user to use advanced regulation algorithms within a single embedded system, which will improve the design and development processes of an industrial regulation system. The presented innovative approach will allow the reproduction of many control scenarios such as parallel regulation of several processes or securing the regulation of a critical process by a redundancy of controllers.
A p-norm extreme learning machine (ELM) based on a sparsity constraint is presented in this study for tracking the fundamental frequency, harmonic and dc components in current power signals, with application to phasor measurement units for wide area power networks in a smart grid environment. Real-time power applications are typically furnished with an on-board controller and hence are constrained in their capacity to host a complex architecture. Moreover, the data acquired online on a sample-by-sample basis are polluted by noise with diverse statistical features. Hence, approaches with an improved learning paradigm and a close model that handle noise of varied statistical characteristics are essential. The proposed approach formulates a cost function with a recursive p-norm error criterion and a sparsity penalty. It updates the output weights in succession while driving some coefficients of the output weights to zero, which promotes quicker convergence and higher accuracy. Exhaustive computer simulations have been carried out with synthetic signals and real-time signals to track the dynamic changes in the power signal amplitude, phase and frequency; they demonstrate the accuracy, efficiency and robustness of the proposed p-norm ELM. Additionally, the new ELM network is validated on field programmable gate array (FPGA) hardware to prove its practicability for current developments in phasor measurement units.
Graph Attention Networks (GATs) exhibit outstanding performance in multiple authoritative node classification benchmark tests (both transductive and inductive). The purpose of this research is to implement an FPGA-based accelerator for graph attention networks, called FPGAN, that achieves significant improvements in performance and energy efficiency without losing accuracy compared with the PyTorch baseline. It eliminates the dependence on digital signal processors (DSPs) and large amounts of on-chip memory and can even work well on low-end FPGA devices. We design FPGAN with software and hardware co-optimization across the full stack from algorithm through architecture. Specifically, we compress the model to reduce its size, quantize features to perform fixed-point calculation, replace multiply-accumulate (MAC) cells with shift addition units (SAUs) to eliminate the dependence on DSPs, and design an efficient algorithm to approximate the SoftMax function. We also adjust the activation functions and fuse operations to further reduce the computation requirement. Moreover, all data is vectorized and aligned for scalable vector computation and efficient memory access. All the above optimizations are integrated into a universal hardware pipeline for various structures of GATs. We evaluate our design on an Inspur F10A board with an Intel Arria 10 GX1150 and 16 GB DDR3 memory. Experimental results show that FPGAN can achieve a 7.34 times speedup over the Nvidia Tesla V100 and 593 times over the Xeon CPU Gold 5115 while maintaining accuracy, and 48 times and 2400 times higher energy efficiency, respectively.
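The general idea behind replacing multipliers with shifts and additions can be shown with integer arithmetic. The sketch below is a generic shift-and-add multiplication, not FPGAN's SAU design, whose details the abstract does not give. The function name is an assumption.

```python
def shift_add_multiply(x, w):
    """Multiply integers x and w using only shifts, adds and a sign flip."""
    sign = -1 if w < 0 else 1
    w = abs(w)
    acc = 0
    shift = 0
    while w:
        if w & 1:                 # this bit of w contributes x * 2**shift
            acc += x << shift
        w >>= 1
        shift += 1
    return sign * acc
```

The number of additions equals the number of set bits in the weight, so weights quantized to few nonzero bits need only a handful of shift-and-add steps and no DSP multiplier.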