ISBN: (print) 9781450380751
Convolutional neural networks (CNNs) are among the most widely used deep learning architectures. Their execution time is dominated by the convolution steps. Most CNN implementations lower the convolution into a matrix operation through the im2col (image to column) process, after which it can be parallelized easily with highly efficient BLAS libraries. This paper observes significant but intricately patterned data redundancy in this matrix representation of convolution, a redundancy that has not previously been exploited to improve CNN performance. We analyze the origin of the redundancy generated by the im2col process and reveal a new data pattern that describes the matrix representation of convolution more concisely in mathematical terms. Based on this redundancy-minimized matrix representation, we implement an FFT-based convolution with finer FFT granularity. It achieves an average speedup of 23% (maximum 50%) over regular FFT convolution and an average speedup of 93% (maximum 286%) over the im2col+GEMM method in NVIDIA's cuDNN library, one of the most widely used CNN libraries. Moreover, after replacing the existing methods with our new convolution method in the popular deep learning framework Caffe, we observe an average speedup of 74% for multiple synthetic CNNs in closer-to-real-world application scenarios and a 25% speedup for a variant of the VGG network.
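The lowering step described in this abstract can be illustrated with a minimal pure-Python sketch, assuming a single-channel input, stride 1, and no padding. The function names `im2col` and `conv2d_via_im2col` are illustrative and are not taken from the paper or from cuDNN. The sketch also shows where the redundancy comes from: each input pixel is copied into several patches of the lowered matrix.

```python
def im2col(image, k):
    """Lower a 2D image (list of rows) into a list of flattened k x k patches.

    Each patch corresponds to one column of the usual im2col matrix
    (stored here as a row for convenience). Stride 1, no padding.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - k + 1, w - k + 1
    patches = []
    for i in range(out_h):
        for j in range(out_w):
            patch = [image[i + di][j + dj] for di in range(k) for dj in range(k)]
            patches.append(patch)
    return patches, out_h, out_w


def conv2d_via_im2col(image, kernel):
    """Cross-correlation (the CNN 'convolution') as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    patches, out_h, out_w = im2col(image, k)
    flat_out = [sum(a * b for a, b in zip(p, flat_kernel)) for p in patches]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]

result = conv2d_via_im2col(image, kernel)

# Redundancy: number of lowered entries per original input pixel.
patches, _, _ = im2col(image, 2)
redundancy = sum(len(p) for p in patches) / (len(image) * len(image[0]))
```

For this 4x4 input and 2x2 kernel, the lowered matrix holds 36 entries for 16 input pixels, which is the kind of duplication the abstract refers to.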
ISBN: (print) 9781467363433
In Differential Evolution (DE), many adaptive algorithms have been proposed for parameter adaptation. However, they mainly focus on tuning the mutation factor F and the crossover probability CR. The adaptation of the population size NP has not been widely studied in the DE literature. Reducing the population size without jeopardizing the performance of an algorithm can save computational resources and hence accelerate its convergence. This is beneficial for optimization problems that require expensive evaluations. In this paper, we propose an improved population reduction method for DE, called dynNPMinD-DE, which considers the difference between individuals. When the reduction criterion is satisfied, dynNPMinD-DE selects the best individual and pairs of individuals with minimal-step difference vectors to form a new population. dynNPMinD-DE is tested on a set of 13 scalable benchmark functions with D=30 and D=50 dimensions. The results show that dynNPMinD-DE outperforms the peer DE algorithms in both solution accuracy and convergence speed on most test functions.
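One possible reading of the reduction step can be sketched as follows. This is an assumption-laden illustration, not the dynNPMinD-DE procedure itself: the abstract does not give the reduction criterion or the exact pairing rule, so the hypothetical function `reduce_population` simply keeps the best individual (for a minimization problem) and then fills the new population from the pairs whose difference vectors have the smallest Euclidean length.

```python
import itertools
import math


def reduce_population(population, fitness, new_size):
    """Shrink a population to new_size individuals.

    Assumed rule (illustrative only): keep the individual with the lowest
    fitness, then add individuals from pairs ordered by the Euclidean
    length of their difference vector, shortest first.
    """
    best = min(range(len(population)), key=lambda i: fitness[i])
    selected = [best]
    pairs = sorted(
        itertools.combinations(range(len(population)), 2),
        key=lambda p: math.dist(population[p[0]], population[p[1]]),
    )
    for a, b in pairs:
        for idx in (a, b):
            if idx not in selected and len(selected) < new_size:
                selected.append(idx)
        if len(selected) >= new_size:
            break
    return [population[i] for i in selected]


population = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2], [10.0, 10.0]]
fitness = [3.0, 2.0, 1.0, 4.0, 0.5]
reduced = reduce_population(population, fitness, new_size=3)
```

Here the best individual is `[10.0, 10.0]`, and the closest pair, `[0.0, 0.0]` and `[0.1, 0.0]`, fills the remaining two slots.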
ISBN: (print) 9789897583803
The objective of this work is to make efficient use of several sensors to build a SLAM system. The algorithm has to be fast (real-time), computationally light, and efficient enough to allow the robot to navigate in its environment. Because other embedded processes require a large amount of CPU time, our objective was to use complementary sensors efficiently to obtain a fairly accurate localization with minimal computation. To achieve this, we used a combination of two sensors: a 2D lidar and a camera, mounted one above the other on the robot and oriented in the same direction. The aim is to pinpoint and cross-reference features in the camera and lidar fields of view (FOV). Our optimized algorithms are based on segment detection. We chose to observe the intersections between vertical lines seen by the camera and to locate them in 3D using the ranges provided by the 2D lidar. First, we implemented an RGB vertical line detector using the RGB gradient and a linking process, then a lidar data segmentation with accelerated computation, and finally we used this feature detector in a Kalman filter. The final code is evaluated and validated using an advanced real-time robotic simulator and later confirmed with a real experiment.
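The idea of locating a camera-detected vertical line with lidar ranges can be sketched as below. This is a simplified illustration under several assumptions that the abstract does not state: a pinhole camera with principal point `cx` and focal length `fx` (both in pixels), a lidar whose zero angle coincides with the optical axis, no horizontal offset between the sensors, and a nearest-beam lookup. All names are hypothetical.

```python
import math


def column_to_bearing(u, cx, fx):
    """Bearing of image column u, positive counter-clockwise (to the left)."""
    return math.atan2(cx - u, fx)


def locate_vertical_line(u, cx, fx, angle_min, angle_increment, ranges):
    """Return the (x, y) position of a vertical line seen at image column u.

    The bearing of the column selects the nearest lidar beam, and the range
    of that beam gives the distance. Returns None if the bearing falls
    outside the lidar scan.
    """
    bearing = column_to_bearing(u, cx, fx)
    idx = round((bearing - angle_min) / angle_increment)
    if not 0 <= idx < len(ranges):
        return None
    beam_angle = angle_min + idx * angle_increment
    r = ranges[idx]
    return (r * math.cos(beam_angle), r * math.sin(beam_angle))


# A 90-degree scan in 1-degree steps, with every beam hitting a surface 2 m away.
angle_min = -math.pi / 4
angle_increment = math.pi / 180
ranges = [2.0] * 91

center = locate_vertical_line(320, 320.0, 320.0, angle_min, angle_increment, ranges)
left_edge = locate_vertical_line(0, 320.0, 320.0, angle_min, angle_increment, ranges)
```

A line detected at the image center maps to a point straight ahead, while a line at the left image border maps to the beam at 45 degrees. The resulting points could then serve as landmark observations for a filter such as the Kalman filter mentioned above.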
ISBN: (print) 9781479942626
Public transportation is not only a significant symbol of urban modernization but also an optimal approach to solving the problem of urban traffic congestion. The bus route search algorithm is the key technology of a route query system. This paper discusses an algorithm based on set theory, proposes the workflow of a transfer algorithm, and presents an improved scheme. The algorithm is simple and effective and helps users select a bus route quickly.
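A set-based route search of the kind described here can be sketched as follows. The abstract does not give the paper's actual workflow, so this is an illustrative assumption: direct routes are found by intersecting the sets of routes that serve the origin and the destination, and one-transfer options are found by intersecting the stop sets of candidate route pairs. Travel direction along a route is ignored for brevity, and the route data is invented.

```python
def build_stop_index(routes):
    """Map each stop to the set of routes that serve it."""
    index = {}
    for name, stops in routes.items():
        for stop in stops:
            index.setdefault(stop, set()).add(name)
    return index


def find_routes(routes, origin, destination):
    """Return direct routes, or one-transfer options if no direct route exists."""
    index = build_stop_index(routes)
    from_routes = index.get(origin, set())
    to_routes = index.get(destination, set())

    # Direct travel: routes serving both stops.
    direct = sorted(from_routes & to_routes)
    if direct:
        return {"direct": direct, "one_transfer": []}

    # One transfer: a stop shared by a route from the origin
    # and a route to the destination.
    transfers = []
    for a in sorted(from_routes):
        for b in sorted(to_routes):
            for stop in sorted(set(routes[a]) & set(routes[b])):
                transfers.append((a, stop, b))
    return {"direct": [], "one_transfer": transfers}


routes = {
    "R1": ["A", "B", "C"],
    "R2": ["C", "D", "E"],
    "R3": ["B", "F"],
}

direct_result = find_routes(routes, "A", "C")
transfer_result = find_routes(routes, "A", "E")
```

In this toy network, A to C is served directly by R1, while A to E requires riding R1 to stop C and transferring to R2.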
ISBN: (print) 9798350395334; 9798350395327
This work proposes the design of a latest-generation digital filter that is particularly suitable for optimizing environmental measurements. Filters are important signal processing components used for the correct acquisition and analysis of environmental information. Current designs, however, usually encounter problems with accuracy, response time, and power consumption. In the proposed method, we incorporate novel algorithms and optimization procedures that enhance the filter's efficiency. Experiments substantiate the superiority of the proposed approach over prior art in terms of accuracy and time. Real-life environmental monitoring applications also support our observations and the efficiency of the proposed approach. These results are scientifically significant for advancing emergency and environmental monitoring systems and contribute to higher reliability and accuracy.
ISBN: (print) 9783037850992
To overcome some technical problems in 3D modeling and model interaction, a new geological reserves calculation method is proposed based on 3D entity ***, this paper discusses how to infer and link mine rock boundary lines and generate a geological profile map; with it, the 3D entity model of the ore body can be ***, the principle of geological reserves calculation is discussed in *** calculation results show that, compared with the traditional empirical formula, this method offers high precision and easy operation and provides a new operation and design platform for engineering designers.
ISBN: (print) 9781510636323
Optical tomography imaging is widely used in target detection, aerospace precision instrumentation, and geological material detection because of its non-contact, long-distance, and high-precision imaging characteristics. Because tomography systems differ in application range and structural design, insufficient projection data and an offset of the rotation center are inevitable in actual acquisition and may cause artifacts and blurring in the reconstructed image. Building on research into image reconstruction algorithms, this paper compares the filtered back projection algorithm with the iterative algorithm and analyzes how multiple parameters affect iterative reconstruction, in order to determine an appropriate weighting model, iteration number, relaxation factor, and so on. Combined with a high-quality initial image and convex set constraints, an optimized SART algorithm is proposed. The experiment uses the optimized SART algorithm for image reconstruction. A comparison of the image evaluation parameters sharpness and average gradient verifies that images reconstructed with the optimized SART are better and clearer than those reconstructed with the unoptimized SART and with simple filtered back projection.
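The roles of the relaxation factor, the iteration number, the initial image, and a convex set constraint can be illustrated with a minimal SART-style sketch. It is a generic textbook form, not the optimized algorithm of the paper: all rays are updated in a single block (classical SART cycles through one projection angle at a time), the constraint is assumed to be simple non-negativity, and the tiny system matrix is invented for the example.

```python
def sart(A, b, x0, relaxation=1.0, iterations=100):
    """SART-style iterative reconstruction for A x = b.

    A: list of rays, each a list of non-negative pixel weights.
    b: measured ray sums. x0: initial image (flattened).
    After each update the image is projected onto the convex set x >= 0.
    """
    m, n = len(A), len(A[0])
    row_sums = [sum(row) for row in A]
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]
    x = list(x0)
    for _ in range(iterations):
        # Normalized residual of every ray for the current image.
        residuals = []
        for i in range(m):
            projection = sum(A[i][j] * x[j] for j in range(n))
            residuals.append((b[i] - projection) / row_sums[i] if row_sums[i] else 0.0)
        # Back-project the residuals, scaled by the relaxation factor.
        for j in range(n):
            if col_sums[j]:
                correction = sum(A[i][j] * residuals[i] for i in range(m)) / col_sums[j]
                x[j] += relaxation * correction
            x[j] = max(x[j], 0.0)  # convex constraint: non-negative image
    return x


# 2x2 image [1, 2, 3, 4] observed through row, column, and diagonal sums.
A = [[1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 1, 0], [0, 1, 0, 1],
     [1, 0, 0, 1], [0, 1, 1, 0]]
b = [3, 7, 4, 6, 5, 5]

reconstruction = sart(A, b, x0=[0.0, 0.0, 0.0, 0.0], relaxation=1.0, iterations=100)
```

In this small consistent system the iteration recovers the original pixel values. With incomplete or noisy projections, the choice of relaxation factor, iteration count, and starting image affects the result much more strongly, which is the tuning problem the abstract addresses.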
ISBN: (print) 9781538682463
In this paper, we propose a scalable network for tiny object detection based on Faster RCNN. Compared with previous feature extraction networks, our network is better suited to tiny objects. During feature extraction, the feature representation of large objects tends to be strengthened while important tiny-object information is ignored. By merging the feature maps output by different filters on the same layer, targets of different sizes can be captured. The network thus considers not only width but also achieves deep integration, which prevents it from becoming so deep that tiny-target information is filtered out. Finally, by optimizing the deep learning based algorithm for tiny objects, we achieved the best results, with an accuracy of 34.1% on the Tsinghua-Tencent 100K dataset.
ISBN: (print) 9798350370010; 9798350370003
In view of the low efficiency and high cost of manual cloth loading, the introduction of loading robots has become a key strategy to improve the loading *** three-dimensional detection system was designed on the basis of the loading robot, covering both the hardware design and the software requirements analysis and design. The system uses advanced 3D sensors as its core perception equipment, installed at the front end of the robot arm, to detect environmental information in the truck compartment intelligently. High-quality point cloud data is obtained with time-of-flight (TOF) technology. For point cloud processing, a normal estimation algorithm and a RANSAC circle fitting algorithm are proposed, which aim to improve the quality of the point cloud data, reduce noise, and accelerate the real-time processing *** can generate high-precision three-dimensional point cloud data, obtain the position and size of the detected carriage, and detect the loading status of the cloth. This enables faster and more accurate grasping and placing of the cloth and reduces errors, thereby significantly improving overall loading efficiency and reducing labor costs.
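RANSAC circle fitting, as named in this abstract, can be sketched in a generic form. The version below works on 2D points (for example, a point cloud slice projected onto a plane, which is an assumption here), fits a circle through three random points per iteration, and keeps the model with the most inliers. The threshold, iteration count, and function names are illustrative and do not come from the paper.

```python
import math
import random


def circle_from_points(p1, p2, p3):
    """Circle (cx, cy, r) through three points, or None if they are collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def ransac_circle(points, threshold=0.05, iterations=200, seed=0):
    """Return the circle with the most inliers and the inlier points."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = circle_from_points(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r = model
        # A point is an inlier if its distance to the circle is small.
        inliers = [p for p in points
                   if abs(math.hypot(p[0] - cx, p[1] - cy) - r) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# 20 points on a circle (center (2, 3), radius 1.5) plus 5 outliers.
points = [(2.0 + 1.5 * math.cos(2 * math.pi * k / 20),
           3.0 + 1.5 * math.sin(2 * math.pi * k / 20)) for k in range(20)]
points += [(0.0, 0.0), (5.0, 5.0), (4.0, 0.0), (-1.0, 4.0), (2.0, 3.0)]

model, inliers = ransac_circle(points)
```

Because each candidate is scored only by its inlier count, isolated noise points have little influence on the fitted circle, which is the usual motivation for RANSAC on noisy sensor data.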
ISBN: (print) 9781538636817
Conjunctive Boolean queries are a fundamental operation for document retrieval in many information systems and databases. In its most basic and popular form, a conjunctive query can be seen as the problem of intersecting multiple sets of sorted integers. Various algorithms have been proposed to maximize query efficiency. In recent years, researchers have begun to exploit the parallelism of single-instruction multiple-data (SIMD) instructions to accelerate the intersection procedure and have achieved substantial gains over previous scalar algorithms. However, these works focus only on intersecting two sets at a time and ignore the scenario of multiple-set intersection. Missing from the literature is a thorough study of combining traditional multiple-set intersection algorithms with SIMD instructions. This article discusses software optimizations for the intersection algorithms using the AVX2 and AVX512 SIMD instructions of modern processor architectures. Through an experimental analysis, we show that the proposed approach reduces the number of comparisons executed while improving instruction throughput, and thus outperforms previous methods.
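The multiple-set intersection problem itself can be shown with a scalar baseline. The sketch below is not the SIMD method of the paper (Python does not expose AVX2 or AVX512 instructions); it is a common "smallest set first" strategy, assumed here for illustration, in which sorted lists are merged pairwise so that the running result shrinks as early as possible.

```python
def intersect_two(a, b):
    """Merge-based intersection of two sorted lists of integers."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def intersect_many(lists):
    """Intersect any number of sorted lists, starting from the shortest."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = list(ordered[0])
    for other in ordered[1:]:
        if not result:
            break  # an empty intermediate result cannot grow again
        result = intersect_two(result, other)
    return result


postings = [
    [1, 3, 5, 7, 9, 11],
    [3, 4, 5, 9, 10],
    [0, 3, 9, 12, 15, 18, 21],
]
common = intersect_many(postings)
```

A SIMD implementation would replace the element-by-element comparisons in `intersect_two` with vector comparisons over several integers at once, which is where the instruction-level gains discussed in the abstract come from.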