With the rise of IoT and edge computing, deploying neural networks (NNs) on low-power edge computing devices is drawing more and more attention. In NNs, convolutional layers take up the majority of the computing cycles, especially when NNs are implemented on ARM processors. Therefore, it is necessary to optimize the convolutional implementation on ARM Cortex-M MCUs. This paper proposes an efficient im2row-based fast convolution algorithm with two innovations. First, a novel im2row method that reuses the data of adjacent convolutional windows is presented. This method uses a reusable im2row buffer, significantly reducing the amount of data copied during im2row and improving efficiency. Second, in the algorithm implementation, a q7_t to q15_t data type extension technique that avoids data reordering is employed. This technique eliminates data reordering instructions, thus reducing the runtime of the algorithm. We evaluate our algorithm on individual convolutional layers and on complete NNs. The results for convolutional layers show that, compared to the baseline, the proposed algorithm speeds up the convolutional layer by an average of 1.42x, with a maximum speedup of 2.9x. Experiments on different NNs demonstrate that our algorithm can speed up the overall NN by up to 2.15x.
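The data-reuse idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the column-major buffer layout, the single-channel stride-1 setting, and the function names `conv2d_im2row_reuse` and `conv2d_direct` are all assumptions made for illustration, and the q7_t to q15_t extension is not modeled.

```python
def conv2d_direct(image, kernel):
    """Reference: valid 2D cross-correlation, stride 1, single channel."""
    H, W, K = len(image), len(image[0]), len(kernel)
    return [[sum(image[i + r][j + c] * kernel[r][c]
                 for r in range(K) for c in range(K))
             for j in range(W - K + 1)]
            for i in range(H - K + 1)]


def conv2d_im2row_reuse(image, kernel):
    """Same result, but each window is built from the previous window's buffer.

    Returns (output, copied), where `copied` counts the elements read from
    the image into the im2row buffer.
    """
    H, W, K = len(image), len(image[0]), len(kernel)
    # Flatten the kernel column by column so it matches the buffer layout.
    k_flat = [kernel[r][c] for c in range(K) for r in range(K)]
    out, copied = [], 0
    for i in range(H - K + 1):
        # First window of an output row: fill the whole buffer (K*K reads).
        buf = [image[i + r][c] for c in range(K) for r in range(K)]
        copied += K * K
        row_out = [sum(a * b for a, b in zip(buf, k_flat))]
        for j in range(1, W - K + 1):
            # Slide right by one: drop the oldest column, read only the new one.
            # (List slicing stands in for an in-place buffer on a real MCU.)
            new_col = [image[i + r][j + K - 1] for r in range(K)]
            buf = buf[K:] + new_col
            copied += K
            row_out.append(sum(a * b for a, b in zip(buf, k_flat)))
        out.append(row_out)
    return out, copied
```

For a 4x5 image and a 3x3 kernel, this sketch reads 30 image elements into the buffer, whereas rebuilding every window from scratch would read 54.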
In this paper, an effective finite difference scheme of high-order accuracy is proposed for the nonlinear time-fractional Burgers' equation. Specifically, we apply Alikhanov's scheme on a graded mesh in the temporal direction and a novel fourth-order compact scheme in the spatial discretization. The proposed scheme resolves the initial weak singularity of the solution and preserves high resolution in the spatial direction. It is rigorously proved that the finite difference scheme is uniquely solvable, satisfies a discrete variational energy dissipation law, and is unconditionally stable and convergent in the sense of the discrete $L^2$-norm. With an appropriate choice of the grading parameter, the convergence accuracy is of order $\min\{r\alpha, 2\}$ in time and fourth order in space, where $r$ is the mesh grading parameter. In the numerical implementation, a fast convolution technique and an adaptive time-stepping strategy are adopted to accelerate the presented solver and to capture the evolution of the solution. Numerical experiments are carried out to verify the validity and effectiveness of the proposed scheme for solving the nonlinear time-fractional Burgers' equation.
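As a small illustration of the graded temporal mesh mentioned above, the sketch below assumes the commonly used form $t_k = T (k/N)^r$; the abstract does not state the mesh formula, so this form, the function names, and the reading of `alpha` as the fractional order are assumptions.

```python
def graded_mesh(T, N, r):
    """Time points t_k = T * (k / N) ** r for k = 0..N (assumed mesh form).

    r = 1 gives a uniform mesh; r > 1 clusters points near t = 0, where the
    solution of a time-fractional problem is typically weakly singular.
    """
    return [T * (k / N) ** r for k in range(N + 1)]


def temporal_order(r, alpha):
    """Temporal convergence order min{r * alpha, 2} quoted in the abstract."""
    return min(r * alpha, 2.0)
```

With these assumptions, choosing `r >= 2 / alpha` is what makes `temporal_order` reach its maximum value of 2.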
In this work, an effective and fast finite element numerical method with high-order accuracy is discussed for solving a nonlinear time-fractional diffusion equation. A two-level linearized finite element scheme is constructed, and a temporal-spatial error splitting argument is established to split the error into two parts: the temporal error and the spatial error. Based on the regularity of the time-discrete system, the temporal error estimate is derived. Using the properties of the Ritz projection operator, the spatial error estimate is deduced. An unconditional superclose result in the $H^1$-norm is obtained, with no additional regularity assumption on the exact solution of the problem considered. The global superconvergence error estimate is then obtained through the interpolated postprocessing technique. To reduce storage and computation time, a fast finite element evaluation scheme for solving the nonlinear time-fractional diffusion equation is developed. Numerical results are provided to confirm the theoretical error analysis.
A fast, easily implemented, and highly efficient algorithm for the time-fractional Maxwell's system is constructed. The algorithm is based on the recently developed sum-of-exponentials (SOE) approximation and the Finite-Difference Time-Domain (FDTD) method. A particular feature of our proposed algorithm is that it achieves high efficiency with no loss in accuracy. The computing process of our algorithm is derived in detail. Numerical experiments in 2D and 3D are presented to verify the efficiency and correctness of our proposed algorithm. (c) 2020 Elsevier Ltd. All rights reserved.
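The reason an SOE approximation speeds up a history convolution can be shown with a discrete analogue. The sketch below assumes the kernel is already exactly a sum of geometric modes, $k_m = \sum_j w_j q_j^m$; in the actual SOE method the fractional kernel is only approximated by such a sum, and the weights, rates, and function names here are hypothetical.

```python
def direct_convolution(weights, rates, f):
    """y_n = sum_{m<=n} k_{n-m} f_m with k_m = sum_j w_j * q_j**m.

    Cost grows quadratically with the number of time steps, and the whole
    history of f must be stored.
    """
    kernel = [sum(w * q ** m for w, q in zip(weights, rates))
              for m in range(len(f))]
    return [sum(kernel[n - m] * f[m] for m in range(n + 1))
            for n in range(len(f))]


def recursive_convolution(weights, rates, f):
    """Same y_n, using one running state per exponential mode.

    Each mode obeys h_j <- q_j * h_j + f_n, so a time step costs O(number of
    modes) and no history of f is kept.
    """
    modes = [0.0] * len(weights)
    out = []
    for value in f:
        modes = [q * h + value for q, h in zip(rates, modes)]
        out.append(sum(w * h for w, h in zip(weights, modes)))
    return out
```

Both functions return the same sequence up to rounding; only the recursive one has per-step cost independent of the number of past time steps.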
In this article, an efficient algorithm for the evaluation of the Caputo fractional derivative and the superconvergence property of the fully discrete finite element approximation for the time-fractional subdiffusion equation are considered. First, the space-semidiscrete finite element approximation scheme for the constant-coefficient problem is derived and a supercloseness result is proved. Then, with the time discretization based on the L1-type formula and the space discretization done using the finite element method, the fully discrete scheme is developed. Under some regularity assumptions, the superconvergence estimate is proposed and analyzed. The extension to the case of variable coefficients is also discussed. To reduce the computational cost, a fast evaluation scheme of the Caputo fractional derivative for solving fractional diffusion equations is designed. Finally, numerical experiments are presented to support the theoretical results.
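For readers unfamiliar with the L1-type formula, here is a minimal sketch of the standard L1 approximation of the Caputo derivative of order $0 < \alpha < 1$ on a uniform mesh. The uniform mesh and the function name `caputo_l1` are assumptions; this is the plain quadratic-cost formula, not the fast evaluation scheme the abstract describes.

```python
import math


def caputo_l1(u, dt, alpha):
    """L1 approximation of the Caputo derivative at t_n = n * dt, n = 1..N.

    u holds samples u(t_0), ..., u(t_N). The formula replaces u by its
    piecewise linear interpolant and integrates the kernel exactly:
        D^alpha u(t_n) ~ dt**(-alpha) / Gamma(2 - alpha)
                         * sum_{k=0}^{n-1} b_{n-1-k} * (u_{k+1} - u_k),
    with b_j = (j + 1)**(1 - alpha) - j**(1 - alpha).
    """
    N = len(u) - 1
    b = [(j + 1) ** (1 - alpha) - j ** (1 - alpha) for j in range(N)]
    scale = dt ** (-alpha) / math.gamma(2 - alpha)
    return [scale * sum(b[n - 1 - k] * (u[k + 1] - u[k]) for k in range(n))
            for n in range(1, N + 1)]
```

Because the sum at step `n` runs over all previous steps, the total cost is quadratic in the number of time steps, which is the cost a fast evaluation scheme aims to reduce.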
Nonreflecting boundary conditions for problems of wave propagation are nonlocal in space and time. While the nonlocality in space can be efficiently handled by Fourier or spherical expansions in special geometries, the arising temporal convolutions still form a computational bottleneck. In the present article, a new algorithm for the evaluation of these convolution integrals is proposed. To compute a temporal convolution over $N_t$ successive time steps, the algorithm requires $O(N_t \log N_t)$ operations and $O(\log N_t)$ memory. In the numerical examples, this algorithm is used to discretize the Neumann-to-Dirichlet operators arising from the formulation of nonreflecting boundary conditions in rectangular geometries for Schrödinger and wave equations.
Symmetric filters and symmetric extension of image edges have been widely used in wavelet image compression. Since the filters are symmetric, it is possible to take advantage of this symmetry to reduce the computational complexity of the filtering. In this paper, we present a fast convolution algorithm for the discrete wavelet transform (DWT) and the inverse DWT (IDWT) that greatly reduces the transform time. Compared with regular convolution, the new algorithm can decrease the number of multiplications by nearly one half. In an actual software implementation, it sped up the DWT and IDWT in our experiments by at least 12% and 55%, respectively. Combined with enhanced zerotree coding, the proposed algorithm results in a rapid and efficient coder. Experimental results showed that the coder is competitive with other high-performance coders. The proposed convolution algorithm is also suitable for many types of wavelet-based coding, including wavelet video coding.
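The saving from filter symmetry can be sketched as follows. This is only the basic folding idea for an odd-length symmetric filter with whole-sample symmetric edge extension, under the assumption that the filter half-length is smaller than the signal length; it omits the DWT downsampling and is not the paper's algorithm. The function names are hypothetical.

```python
def reflect(i, n):
    """Whole-sample symmetric extension of an index into range(n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def filter_direct(x, h):
    """Regular convolution: 2M + 1 multiplications per output sample."""
    n, M = len(x), len(h) // 2
    return [sum(h[M + k] * x[reflect(i - k, n)] for k in range(-M, M + 1))
            for i in range(n)]


def filter_folded(x, h):
    """Symmetric filter (h[M + k] == h[M - k]): add the two mirrored samples
    first, then multiply once, giving M + 1 multiplications per sample."""
    n, M = len(x), len(h) // 2
    out = []
    for i in range(n):
        acc = h[M] * x[i]
        for k in range(1, M + 1):
            acc += h[M + k] * (x[reflect(i - k, n)] + x[reflect(i + k, n)])
        out.append(acc)
    return out
```

For a filter of length 2M + 1, the folded form uses M + 1 multiplications per sample instead of 2M + 1, which is consistent with the "nearly one half" reduction stated above.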
ISBN: (print) 0780374029
In this paper, we propose an efficient approach based on a fast convolution algorithm to reduce the computational complexity of the Least Mean Square (LMS) adaptive algorithm for the quadratic filter, i.e., the quadratic part of the second-order Volterra filter (SOVF). Previous works using fast convolution in adaptive LMS filtering are limited to the linear case. We show that this approach reduces the number of multiplications by close to 25%, at the expense of only 25% more additions. The steady-state performance of this algorithm is studied for Gaussian inputs in a stationary setting. The steady-state excess mean-square error is evaluated. The theoretical performance predictions are shown to be in good agreement with simulation results, especially for small step sizes.