ISBN: (print) 9798350383638; 9798350383645
Deep learning hardware accelerators commonly incorporate a large number of multiplier units. Yet the considerable complexity of multiplier circuits makes them a bottleneck, contributing to increased cost and latency. Approximate computing is an effective strategy for mitigating the overhead associated with multipliers. This paper introduces an original approximation technique for signed multiplication on FPGAs. The approach applies a novel segmentation method to the Baugh-Wooley multiplication algorithm. Each segment is optimally accommodated within the look-up table (LUT) resources of modern AMD-Xilinx FPGA families. The paper details the design of an INT8 multiplier using the proposed approach and presents implementation results and accuracy assessments for the inference of benchmark deep learning models. The implementation results reveal savings of 53.6% in LUT utilization compared to the standard INT8 Xilinx multiplier. Accuracy measurements conducted on four popular deep learning benchmarks show an average accuracy degradation of 4.8% in post-training deployment and 0.7% after retraining. The source code for this work is available on GitHub.
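As background for the abstract above, the following is a minimal sketch of the exact (not approximate) Baugh-Wooley scheme for signed multiplication. It does not reproduce the paper's segmentation or LUT mapping; the function name `baugh_wooley_mul` and the bit-list representation are assumptions made for illustration only.

```python
def baugh_wooley_mul(a: int, b: int, n: int = 8) -> int:
    """Exact signed n-bit multiply built from Baugh-Wooley partial products.

    Hypothetical helper for illustration; not the paper's approximate design.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in n-bit two's complement")

    mask = (1 << n) - 1
    a_bits = [((a & mask) >> i) & 1 for i in range(n)]
    b_bits = [((b & mask) >> j) & 1 for j in range(n)]

    acc = 0
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]
            # Partial products that involve exactly one sign bit are inverted.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            acc += pp << (i + j)

    # Correction constants: a 1 at bit n and a 1 at bit 2n-1.
    acc += (1 << n) + (1 << (2 * n - 1))
    acc &= (1 << (2 * n)) - 1  # keep the 2n-bit product

    # Reinterpret the 2n-bit pattern as a signed integer.
    if acc >= 1 << (2 * n - 1):
        acc -= 1 << (2 * n)
    return acc
```

The inversion plus the two constants replaces the subtraction of the sign-weighted partial products, so every row of the array can be summed with plain unsigned adders. An approximate design would then trade accuracy for area by simplifying some of these partial products.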
ISBN: (print) 9788957083130
Power converters are used for grid integration of renewable sources, and their control allows specific system objectives to be met. Finite-state model predictive control (FS-MPC) is one of the techniques used for grid integration of a voltage source inverter (VSI); it offers distinctive features such as fast dynamic performance and the inherent ability to incorporate constraints. However, system development is a concern for FS-MPC because of the computational delay problem. Field-programmable gate array (FPGA)-based system development is one way to tackle this problem because of the FPGA's parallel processing nature. In this paper, FS-MPC is presented for a three-phase grid-connected VSI system using a modeling-based digital system design approach, which is advantageous for analysis, easy debugging, and FPGA-based system development. The integrated platform of MATLAB-Simulink and System Generator is used for modeling and hardware-in-the-loop (HIL) simulation to validate the system.
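To make the FS-MPC idea concrete, here is a minimal sketch of one control step for a two-level VSI with an RL grid filter, using the common textbook formulation: enumerate the eight switching states, predict the next current with a forward-Euler model, and pick the state with the smallest current error. The parameter values, the function names, and the choice of cost function are all assumptions; the paper's model and cost function may differ.

```python
import cmath
import itertools

# Hypothetical parameters, chosen only for illustration.
V_DC = 600.0   # DC-link voltage [V]
L_F = 10e-3    # filter inductance [H]
R_F = 0.1      # filter resistance [ohm]
T_S = 1e-4     # sampling period [s]

A = cmath.exp(2j * cmath.pi / 3)  # 120-degree rotation operator


def voltage_vector(state):
    """Inverter output voltage as a complex space vector (alpha + j*beta)."""
    sa, sb, sc = state
    return (2.0 / 3.0) * V_DC * (sa + A * sb + A * A * sc)


def predict_current(i_now, v_inv, e_grid):
    """Forward-Euler prediction of the grid current one sample ahead."""
    return (1.0 - R_F * T_S / L_F) * i_now + (T_S / L_F) * (v_inv - e_grid)


def best_switching_state(i_now, i_ref, e_grid):
    """Evaluate all 8 switching states and return the one with minimum cost."""
    def cost(state):
        i_pred = predict_current(i_now, voltage_vector(state), e_grid)
        return abs(i_ref - i_pred)

    return min(itertools.product((0, 1), repeat=3), key=cost)
```

The eight cost evaluations are independent of one another, which is why this loop maps naturally onto parallel FPGA logic and helps with the computational delay mentioned in the abstract.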
A plasma control system to sustain divertor configurations has been developed on QUEST (Q-shu University Experiment with Steady-State Spherical Tokamak). Magnetic fluxes are numerically integrated at 100 kHz using FPGA (field-programmable gate array) modules and transferred to a main calculation loop at 4 kHz. With these signals, plasma shapes are identified in real time at 2 kHz under the assumption that the plasma current can be represented as a single filament current. This calculation runs in a separate, parallel calculation loop that takes advantage of the multi-core processor of the plasma control system. The inside and outside plasma edge positions are controlled to their target positions using PID (proportional-integral-derivative) control loops. Whereas the outside edge position cannot be controlled by the outer PF coil current, the inside edge position can be controlled by the inner PF coil current. (C) 2013 Elsevier B.V. All rights reserved.
ISBN: (print) 9798350316643
This paper investigates the design of a Parallel Hybrid Converter topology for DC/AC applications connected using a three-winding transformer. The original Parallel Hybrid Converter (PHC) topology consists of a Silicon IGBT bridge and a partially rated Silicon Carbide MOSFET bridge with a shared DC bus. A common-mode inductor is needed in this topology to mitigate the circulating current between the two bridges. This paper introduces a novel variant of the PHC with a three-winding transformer that removes the need for the common-mode inductor. Component sizing optimisation for this topology is performed using a Non-Denominated Set Genetic Algorithm. A long-horizon model predictive control method is implemented on the FPGA real-time system, with the MOSFET bridge's current limit constraint included in the cost functions. Experimental tests validate the performance of this topology and of its FPGA implementation, achieving an IGBT bridge frequency of about 1000 Hz and a grid-side output current THD below 3%.
ISBN: (print) 9781665494663
Updating the state of reservoir nodes is one of the essential operations of reservoir computing (RC) and strongly affects the system's performance. In an echo state network (ESN), one of the primary types of RC, the state update can be divided into two stages: multiplying the weight matrix by the input-state vector, and applying a nonlinear activation function to the sum of products. The weight matrix is typically large and sparse, providing opportunities for optimizing the matrix multiplication; the choice of activation function may also affect hardware resource utilization. This paper introduces an optimized reservoir node architecture for FPGA-based RC systems. Specifically, we adopt a bit-serial matrix multiplier and a direct spatial implementation of the weight matrix to fully exploit the sparseness property. The canonical signed digit representation is also employed to further optimize the multiplier logic. Furthermore, a hyperbolic tangent activation function is designed and optimized to maintain the nonlinearity of the neural network without affecting its accuracy. Compared with existing hardware ESN designs, our reservoir node architecture significantly reduces resource utilization while maintaining comparable performance.
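The two-stage state update described above can be sketched in software as follows. This is a floating-point illustration only, not the paper's bit-serial hardware; the sparse row format (a list of `(column, weight)` pairs per node) and the name `esn_update` are assumptions.

```python
import math


def esn_update(sparse_rows, u, x):
    """One ESN state update: sparse multiply-accumulate, then tanh.

    sparse_rows[k] lists the nonzero weights of node k as (column, weight)
    pairs, where columns index the concatenated input-state vector [u; x].
    Hypothetical data layout, for illustration only.
    """
    v = list(u) + list(x)  # input-state vector
    new_state = []
    for row in sparse_rows:
        # Stage 1: sum of products over the nonzero weights only.
        acc = sum(w * v[col] for col, w in row)
        # Stage 2: nonlinear activation.
        new_state.append(math.tanh(acc))
    return new_state
```

Because only nonzero weights are stored and visited, the work per node scales with the number of nonzeros rather than with the full matrix width, which is the property a spatial hardware implementation can exploit.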
ISBN: (print) 9781665404761
An IP-core is a block with a complex function that can be reused in integrated circuit design. There are two types of FPGA IP-cores: hard IP-cores and soft IP-cores. Hard IP-cores have an exact location and pre-routed interconnects, while soft IP-cores are synthesized from logic elements and must be placed and routed. To use IP-cores in an automated design flow for integrated circuits on FPGAs, it is necessary to develop IP-core libraries that allow blocks to be identified at every stage of the flow. This article presents the types and forms of IP-core libraries used in the design flow developed by IPPM RAS for Russian FPGAs. It describes the challenges of developing libraries for logic synthesis and automatic mapping onto an existing basis. The paper presents the libraries needed by the CAD system at every stage of physical design: clustering, placement, and routing. It also considers the distinctive features of soft and hard IP-core libraries and the methods of forming them with the FPGA architecture taken into account.
ISBN: (print) 9781665414906
The performance of radar receiver signal processing is a critical factor in identifying radar pulses. The use of field-programmable gate arrays (FPGAs) for hardware acceleration provides multiple advantages in radar signal processing. High-level synthesis (HLS) tools allow systems developed in high-level languages, such as C, to be converted flexibly to a register-transfer level (RTL) design. A direct HLS translation for an FPGA target may not always improve performance, so analysis is warranted in systems where speed and performance are crucial. System on a Chip (SoC) FPGAs include processing system (PS) and programmable logic (PL) architectures on a single device. The performance of high-level language designs executed on the PS and of HLS adaptations implemented on the PL can therefore be compared directly. This paper presents performance comparisons of Python and HLS versions of a radar pulse-on-pulse identification algorithm from previous work, implemented on an SoC FPGA.
ISBN: (print) 9781665404761
This paper reviews recent research on cellular automata algorithms, models, and methods for solving the shortest-path search problem and the logic-element placement task within the FPGA design flow. Classical and hybrid cellular automata models are considered. Particular attention is paid to cellular automata with multi-agent semantics for shortest-path search algorithms. A detailed analysis is given of a model that solves the placement problem based on a systolic structure. The main difficulties and possible problems of cellular automata models and algorithms for the placement and routing stages of an FPGA design flow are highlighted and discussed.
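As a simple illustration of shortest-path search with a classical cellular automaton, the sketch below uses synchronous wavefront expansion on a grid: in each generation every free cell takes the minimum of its current value and one plus the smallest value among its four neighbours. This is a generic textbook-style model, not one of the specific models surveyed in the paper; the grid encoding (`'#'` for blocked cells) and the name `ca_wavefront` are assumptions.

```python
INF = float("inf")


def ca_wavefront(grid, source):
    """Distance map from `source` computed by a synchronous cellular automaton.

    grid is a list of equal-length strings; '#' marks a blocked cell.
    Assumes the source cell is not blocked. Hypothetical encoding.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0

    changed = True
    while changed:
        changed = False
        nxt = [row[:] for row in dist]  # next generation, built from the old one
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#":
                    continue  # blocked cells never change state
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        if dist[rr][cc] + 1 < nxt[r][c]:
                            nxt[r][c] = dist[rr][cc] + 1
                            changed = True
        dist = nxt
    return dist
```

Each cell's rule depends only on its immediate neighbours, so all cells can be updated in parallel. A shortest route is then recovered by walking from the target back to the source along strictly decreasing values.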
ISBN: (print) 9798350348132
Reconfigurable System-on-Chip (SoC) is a computing architecture that allows users to program and reprogram the hardware to perform different tasks, making it highly flexible and adaptable. Because of its potential to significantly accelerate various applications, reconfigurable SoC has become a subject of much research in academia and industry. Meanwhile, the classical practice of experimenting in the laboratory has been changing radically with the advanced use of computers, electronic devices, and the internet. This has given rise to a new concept of laboratory practice, the Internet-of-Things (IoT)-based collaborative simulation laboratory, which lets students access and use laboratory equipment at their convenience and learn at their own pace. This paper first reviews the state of the art in reconfigurable SoC laboratory architectures, especially those for field-programmable gate arrays (FPGAs). It then develops an IoT-based collaborative simulation laboratory for reconfigurable SoC that allows multiple students to work together to solve laboratory tasks remotely. The laboratory architecture targets reconfigurable FPGAs and can manage several tasks automatically, such as logic synthesis, logic simulation, and FPGA runs on multiple FPGAs or SoCs. Its capability to virtualize user input/output interfaces and to measure the design's response remotely further enhances its advantages. The proposed laboratory architecture is valuable not only for carrying out laboratory tasks remotely but also for simplifying existing teaching methods and helping to evaluate student performance in examinations.
ISBN: (print) 9781467365406
Field-programmable gate arrays (FPGAs) are used extensively for a wide range of digital applications because of their flexibility and reprogrammability. This paper presents an FPGA implementation of the second-order difference plot (SODP) technique, which can be used to classify ictal and seizure-free electroencephalogram (EEG) signals. Empirical mode decomposition (EMD) can break down an EEG signal into simple oscillatory modes called intrinsic mode functions (IMFs). The hardware design takes a sampled IMF of an EEG signal as input and generates its SODP while simultaneously calculating the 95 percent confidence ellipse area of the SODP. The ellipse area can be used as a parameter for detecting epileptic seizures in EEG signals. The digital circuit was designed in the Vivado integrated development environment (IDE) using the Verilog hardware description language (HDL), and a Xilinx Artix-7 xc7a100tcsg324 FPGA was used to verify operation of the physical implementation. The hardware was tested on EEG data made publicly available by the University of Bonn, and the results were found to be consistent with MATLAB simulations.
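A software sketch of the two computations named above might look like the following. The SODP plots successive first differences against each other. For the ellipse area, this sketch assumes the standard bivariate-Gaussian 95% confidence ellipse derived from the sample covariance; the paper may use a different closed-form expression, and the function names are hypothetical.

```python
import math

# Chi-square quantile for 2 degrees of freedom at 95%: -2*ln(0.05), about 5.991.
CHI2_95_2DOF = -2.0 * math.log(0.05)


def sodp(x):
    """Second-order difference plot coordinates of a sampled signal x."""
    X = [x[n + 1] - x[n] for n in range(len(x) - 2)]
    Y = [x[n + 2] - x[n + 1] for n in range(len(x) - 2)]
    return X, Y


def ellipse_area_95(X, Y):
    """Area of the 95% confidence ellipse of the point cloud (X, Y).

    Assumes a bivariate Gaussian model: area = pi * chi2 * sqrt(det(cov)).
    """
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxx = sum((a - mx) ** 2 for a in X) / n
    syy = sum((b - my) ** 2 for b in Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y)) / n
    det = max(sxx * syy - sxy * sxy, 0.0)  # guard against rounding below zero
    return math.pi * CHI2_95_2DOF * math.sqrt(det)
```

A typical call would be `X, Y = sodp(imf)` followed by `ellipse_area_95(X, Y)`, where `imf` is one sampled intrinsic mode function. Both steps reduce to running sums of differences and products, which is what makes a streaming hardware implementation practical.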