ISBN: 9781665484855 (print)
Recently, deconvolutional neural networks (DeCNNs) have attracted widespread attention in various applications. Deconvolution (DeConv), the main operation in a DeCNN, has become the bottleneck of acceleration because of its high computational complexity. Previous works have introduced fast algorithms such as the cascaded fast FIR algorithm (CFFA) and the Winograd algorithm to reduce the computational complexity of DeConv for applications on mobile devices. Because these fast algorithms need different computing parameters to accelerate different operations, applying them directly to DeCNNs with different kernels usually limits flexibility. To address this problem, we propose a reconfigurable scheme based on the fast transformation algorithm (FTA) that accelerates multiple types of DeConv while minimizing the hardware overhead of reconfigurability. Based on this scheme, a reconfigurable hardware architecture is developed to support several types of DeConv. In addition, an adaptive dataflow is proposed to handle different convolutional layers. The presented design supports several types of operations and achieves up to 222.54 GOPS at 210 MHz on the Intel Arria 10 SX FPGA platform, showing better flexibility and computational efficiency than prior art.
ISBN: 9789810594237 (print)
In this paper, a zero-vector distribution variable k(1), which varies from zero to one, is introduced. Continuous SVPWM is generated when 0<k(1)<1, and several typical kinds of discontinuous SVPWM are gener[...] fast algorithm suitable for the digital implementation of all continuous and discontinuous SVPWM strategies is proposed according to the relationship between carrier-based PWM and SVPWM methods. Compared with the conventional SVPWM algorithm, the proposed algorithm is easier to implement on a microprocessor because it calculates the actual gating times for each inverter leg directly from the three-phase reference voltages and does not need complex coordinate transformations, trigonometric function calculations, sector identification, or recombination of the actual gating times. Finally, the influence of the zero-vector distribution factor and the modulation index on the voltage harmonic characteristics of all SVPWM strategies is investigated. Simulation and experimental results show that the proposed method is valid and feasible.
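The idea of computing leg gating times directly from the three-phase references can be illustrated with the well-known carrier-based equivalent of SVPWM, in which a zero-sequence offset is added to the references. The sketch below is not taken from the paper: the function name `gating_duties` is hypothetical, and it assumes that k(1) is the share of the zero-vector time assigned to the all-upper-switches-on vector (111) and that the references stay within the linear modulation range.

```python
def gating_duties(va, vb, vc, vdc, k1):
    """Return upper-switch duty ratios (0..1) for the three inverter legs.

    va, vb, vc: three-phase reference voltages, same unit as vdc.
    k1: ASSUMED to be the fraction of zero-vector time given to vector 111.
    """
    # Normalize the references to half the DC-link voltage.
    u = [2.0 * v / vdc for v in (va, vb, vc)]
    u_max, u_min = max(u), min(u)
    # Zero-sequence offset that splits the zero time as k1 : (1 - k1).
    u0 = (2.0 * k1 - 1.0) - k1 * u_max - (1.0 - k1) * u_min
    # No sector lookup, coordinate transform, or trigonometry is needed.
    return [(1.0 + x + u0) / 2.0 for x in u]


if __name__ == "__main__":
    for k1 in (0.0, 0.5, 1.0):
        print(k1, gating_duties(50.0, -10.0, -40.0, 200.0, k1))
```

Under these assumptions, k1 = 0.5 gives the usual symmetric (continuous) pattern, while k1 = 0 or k1 = 1 clamps one leg to a DC rail, which is how discontinuous patterns arise. Multiplying each duty by the switching period gives the gating time of that leg.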
ISBN: 9781665478960 (print)
A fast Dynamic Matrix Control (DMC) algorithm is proposed to satisfy the requirements of an embedded temperature regulator. The DMC algorithm involves online inversion of a high-dimensional matrix, so it is difficult to apply on an embedded temperature regulator with limited computing capacity. In this work, the high-dimensional matrix is decomposed by Cholesky decomposition into the product of a triangular matrix and its transpose, so that the matrix inverse can be computed recursively. As a result, the computation cost of the fast algorithm is greatly reduced compared with the traditional DMC algorithm. Based on the proposed algorithm, an embedded temperature regulator has been developed. The experimental results verify that the algorithm can be applied to the embedded temperature regulator and provides better regulation than the PID control algorithm.
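A minimal sketch of how a Cholesky factorization avoids an explicit matrix inverse is given below. It is an illustration only, not the paper's implementation: the names `cholesky` and `solve_spd` are hypothetical, the input is assumed to be symmetric positive definite (as the regularized normal-equation matrix in a DMC control law typically is), and the system is solved by forward and back substitution rather than by forming the inverse.

```python
import math


def cholesky(m):
    """Return lower-triangular L with L * L^T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_spd(m, b):
    """Solve m * x = b using two triangular solves instead of an inverse."""
    low = cholesky(m)
    n = len(b)
    # Forward substitution: L * y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T * x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


if __name__ == "__main__":
    print(solve_spd([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0]))
```

Each substitution step uses only values already computed, which is what makes this kind of recursion attractive on a processor with limited computing capacity.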
The propagator method (PM) belongs to a class of subspace-based methods for direction-of-arrival estimation that require only linear operations and do not involve any eigendecomposition or singular value decomposition, unlike common subspace techniques. In this paper, we apply the PM to estimating the frequencies of multiple real sinusoids in noise and develop a computationally simple, high-resolution multiple-frequency estimation algorithm. The estimation accuracy of the proposed method is compared with that of conventional MUSIC and with the Cramér-Rao lower bound under different noise conditions.
In this letter, fast one-dimensional (1-D) and two-dimensional (2-D) algorithms for realizing a low-complexity 4 x 4 discrete cosine transform (DCT) for H.264 applications are developed. By applying matrix operations with the Kronecker product and direct sum, the efficient fast 2-D 4 x 4 DCT algorithm is derived from the proposed fast 1-D 4 x 4 DCT algorithm through matrix decompositions. The fast 1-D and 2-D low-complexity 4 x 4 DCT algorithms require fewer multiplications and additions than other fast DCT algorithms. Owing to their regular modularity, the proposed fast algorithms can achieve real-time H.264 video signal processing in a VLSI implementation.
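For orientation, the sketch below shows the standard H.264 4 x 4 integer core transform computed with a butterfly, and its 2-D form obtained by applying the 1-D transform to rows and then columns. This is the commonly documented transform, not the algorithm proposed in the letter; the function names `core_1d` and `core_2d` are hypothetical, and the scaling that H.264 folds into quantization is omitted.

```python
def core_1d(x):
    """1-D H.264 core transform of four samples using adds and doublings only."""
    a0, a1 = x[0] + x[3], x[1] + x[2]
    a2, a3 = x[1] - x[2], x[0] - x[3]
    # Equivalent to multiplying by
    # [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]].
    return [a0 + a1, 2 * a3 + a2, a0 - a1, a3 - 2 * a2]


def core_2d(block):
    """2-D transform C * X * C^T: 1-D transform on rows, then on columns."""
    rows = [core_1d(r) for r in block]
    cols = [core_1d(list(c)) for c in zip(*rows)]
    # cols holds transformed columns; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    flat = [[1, 1, 1, 1] for _ in range(4)]
    print(core_2d(flat))
```

The row-then-column structure is the separability that the Kronecker-product formulation expresses in matrix form.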
A new modification of the fast algorithm based on the Barnes–Hut (BH) and multipole (FMM) methods is developed for the problem of velocity calculation in the vortex particle method. It provides a quasilinear computationa...
ISBN: 9780819497826 (print)
3D dynamic holographic display is one of the most attractive techniques for achieving real 3D vision with full depth cues and without any extra devices. However, a huge amount of 3D information and data must be processed and computed in real time to generate the hologram in a 3D dynamic holographic display, which is a challenge even for the most advanced computers. Many fast algorithms have been proposed to speed up the calculation and reduce memory usage, such as the look-up table (LUT), compressed look-up table (C-LUT), split look-up table (S-LUT), and novel look-up table (N-LUT) algorithms, which are based on the point-based method, and the full analytical polygon-based method and the one-step polygon-based method, which are based on the polygon-based method. In this presentation, we review various fast algorithms based on the point-based method and the polygon-based method, focusing on the C-LUT algorithm, which has low memory usage, and on the one-step polygon-based method, which uses 2D Fourier analysis of the 3D affine transformation. Numerical simulations and optical experiments are presented, and several other algorithms are compared. The results show that the C-LUT algorithm and the one-step polygon-based method are efficient at saving calculation time. These methods could be used in real-time 3D holographic display in the future.
To improve the performance of Saitou and Nei's algorithm (SN) and Studier and Keppler's improved algorithm (SK) for constructing neighbor-joining phylogenetic trees, and to reduce the time complexity of the computation, a fast algorithm is proposed. The proposed algorithm includes three techniques. First, a linear array A[N] is introduced to store the sum of every row of the distance matrix (the same as in SK), which eliminates many repeated computations. Second, the value of A[i] is computed only once at the beginning of the algorithm and is then updated using three elements in each iteration. Third, a very compact formula for the sum of all the branch lengths of operational taxonomic units (OTUs) i and j is designed, and the correctness of the formula is proved. The experimental results show that, as N increases, the proposed algorithm is tens to hundreds of times faster than SN and roughly two times faster than SK, constructing a tree with 2,000 OTUs in 3 min on a current desktop computer. Trading space for time and reducing the computation in the innermost loop are the basic strategies for speeding up algorithms with many loops.
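The row-sum bookkeeping can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the names `best_pair` and `join_pair` are hypothetical, the distance matrix is held as a dict of dicts, and the pair-selection criterion is the standard neighbor-joining criterion (n - 2) d(i, j) - A[i] - A[j]. The point shown is that, after a join, each remaining row sum is corrected from three distances instead of being re-summed.

```python
def best_pair(dist, sums, active):
    """Pick the pair minimizing (n - 2) * d(i, j) - A[i] - A[j]."""
    n = len(active)
    nodes = sorted(active)
    pairs = ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:])
    return min(pairs, key=lambda p: (n - 2) * dist[p[0]][p[1]] - sums[p[0]] - sums[p[1]])


def join_pair(dist, sums, active, i, j, new):
    """Merge active nodes i and j into node `new`, updating row sums in place."""
    dij = dist[i][j]
    active.remove(i)
    active.remove(j)
    dist[new] = {}
    sums[new] = 0.0
    for k in active:
        dik, djk = dist[i][k], dist[j][k]
        duk = 0.5 * (dik + djk - dij)  # distance from the new node to k
        dist[new][k] = dist[k][new] = duk
        sums[k] += duk - dik - djk     # three-element update of A[k]
        sums[new] += duk
    active.add(new)


if __name__ == "__main__":
    d = {0: {1: 5.0, 2: 9.0, 3: 9.0}, 1: {0: 5.0, 2: 10.0, 3: 10.0},
         2: {0: 9.0, 1: 10.0, 3: 8.0}, 3: {0: 9.0, 1: 10.0, 2: 8.0}}
    a = {k: sum(row.values()) for k, row in d.items()}  # computed once
    nodes = {0, 1, 2, 3}
    i, j = best_pair(d, a, nodes)
    join_pair(d, a, nodes, i, j, new=4)
    print(sorted(nodes), a[2], a[3], a[4])
```

Branch-length estimation and the final tree assembly are left out to keep the sketch focused on the row sums.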
This paper proposes a fast software method for a certain class of nonlinear image filters, such as maximum, minimum, logical sum, and logical product. The computation time of the proposed filtering is independent of the filter size and of the image content. Each output of a two-dimensional gray-scale image can be computed with at most two 2-term operations. When the image to be processed is binary, measurements over the whole image show that the computation time is reduced by a factor of more than 20 compared with the gray-scale case. In the proposed method, ring-type buffers are connected in multiple stages, so that the computation can be executed without storing the whole image in main memory. This property suits large-scale image filtering, as in the case of satellite images or drawn diagrams. The proposed method has been implemented. Using gray-scale and binary images, the method is compared experimentally with the conventional method, and the high speed of the proposed method is verified.
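To make the notion of a filter whose cost does not grow with the window size concrete, the sketch below uses a monotonic deque to compute a 1-D sliding-window maximum in amortized constant time per sample. This is a different, widely used technique and not the multistage ring-buffer method of the paper; the function name `sliding_max` is hypothetical. For a rectangular window, a 2-D maximum filter can be obtained by running it over rows and then over columns.

```python
from collections import deque


def sliding_max(values, size):
    """Return the maximum of every full window of `size` consecutive values."""
    out = []
    window = deque()  # indices whose values are strictly decreasing
    for i, v in enumerate(values):
        # Drop candidates that can never be the maximum again.
        while window and values[window[-1]] <= v:
            window.pop()
        window.append(i)
        # Drop the front index once it slides out of the window.
        if window[0] <= i - size:
            window.popleft()
        if i >= size - 1:
            out.append(values[window[0]])
    return out


if __name__ == "__main__":
    print(sliding_max([1, 3, 2, 5, 4, 1], 3))
```

Each index enters and leaves the deque at most once, so the total work is linear in the number of samples regardless of `size`. A minimum filter follows by reversing the comparison.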
Numerical analysis is presented for the nonlocal Allen-Cahn equation, which contains a spatial nonlocal operator and a time-fractional derivative. By employing the spatial quadrature-based finite difference method and the nonuniform L1 formula combined with the scalar auxiliary variable (SAV) approach for the temporal discretization, a nonuniform numerical scheme is established. Thanks to the SAV approach, the nonlinear solver can be effectively transformed into a linear one. The proposed scheme is proven to be energy stable by using the positive definiteness of the kernel function. Moreover, the fast algorithm based on the nonuniform L1 formula is applied in the numerical example to improve computational efficiency. Finally, the numerical results demonstrate the temporal convergence of the numerical scheme, its energy property, comparisons between the nonlocal and local cases, and the maximum principle of the numerical solution. (C) 2021 Elsevier Ltd. All rights reserved.