In addition to static features, dynamic features are also important for smoke recognition. 3D convolution can extract temporal and spatial information from video sequences. Currently, in video smoke detection, 3D convolution is usually used only as a secondary check on the detection results of single-frame approaches. In this work, an end-to-end object detection neural network based on 3D convolution for video smoke detection, named 3DVSD, is proposed for the first time. The network first captures moving objects from the input video sequences with its dynamic feature extraction part and then feeds the resulting feature tensor to its static feature extraction part for recognition and localization; this makes full use of the spatiotemporal features of smoke and improves the reliability of the algorithm. In addition, a time-series smoke video dataset for network training is proposed. The proposed algorithm is compared with other related studies. The experimental results demonstrate that 3DVSD is promising, with an accuracy of 99.54%, a false alarm rate of 1.11%, and a missed detection rate of 0.14%, and that it meets the requirements of real-time detection.
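To make concrete how 3D convolution extracts temporal and spatial information together, here is a minimal pure-Python sketch of a single-channel 3D cross-correlation (the operation deep-learning libraries call convolution), assuming "valid" padding and stride 1. The function name `conv3d_valid` and the hand-set temporal-difference kernel are illustrative assumptions, not part of 3DVSD, whose kernels are learned. Because the kernel spans two frames, it responds to a moving pixel but outputs zero for a static background.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [t][y][x]


def conv3d_valid(clip: Volume, kernel: Volume) -> Volume:
    """Naive single-channel 3D cross-correlation, 'valid' padding, stride 1.

    Each output voxel is a weighted sum over kt frames, kh rows and kw columns,
    so one kernel mixes temporal and spatial neighbourhoods at once.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical hand-set kernel of shape (2, 1, 1): frame t+1 minus frame t.
temporal_diff = [[[-1.0]], [[1.0]]]

# Three 1x3 frames: a static background, and a bright pixel moving left to right.
static = [[[0.5, 0.5, 0.5]]] * 3
moving = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]

print(conv3d_valid(static, temporal_diff))  # → [[[0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
print(conv3d_valid(moving, temporal_diff))  # → [[[-1.0, 1.0, 0.0]], [[0.0, -1.0, 1.0]]]
```

A 2D kernel applied frame by frame sees each frame in isolation and cannot separate the two clips this way; the temporal extent of the 3D kernel is what turns motion into a feature.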
Pansharpening fuses spectral information from a multi-spectral image with spatial information from a panchromatic image to generate multi-spectral images with high spatial resolution. In this paper, we propose a novel 3D multi-scale attention convolutional network (MSAC-Net), based on the typical U-Net framework, for multi-spectral imagery pansharpening. MSAC-Net is built with 3D convolution, and an attention mechanism replaces the skip connection between the contraction and expansion pathways. Multiple pansharpening layers in the expansion pathway compute reconstruction results that preserve multi-scale spatial information. MSAC-Net's performance is verified on datasets from the IKONOS and QuickBird satellites, showing that it achieves performance comparable or superior to state-of-the-art methods. Additionally, 2D and 3D convolution are compared, and the influence of the number of convolutions per convolution block, the weight of multi-scale information, and the network depth on performance is analyzed.
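The abstract does not specify the exact attention form that replaces MSAC-Net's skip connections. As one plausible illustration, the sketch below applies an additive attention gate in the style of Attention U-Net. Each position's skip vector x is scaled by alpha = sigmoid(psi · ReLU(W_x x + W_g g)), where g is the gating signal from the expansion path. The weights, shapes, and the names `attention_gate` and `matvec` are hypothetical; real networks implement W_x and W_g as 1×1×1 convolutions over feature volumes.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product (stands in for a 1x1x1 convolution)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_gate(skip: List[Vector], gate: List[Vector],
                   w_x: Matrix, w_g: Matrix, psi: Vector) -> List[Vector]:
    """Scale each skip-connection feature vector by an attention coefficient in (0, 1).

    skip[i] and gate[i] are the encoder and decoder feature vectors at position i.
    alpha_i = sigmoid(psi . relu(W_x skip[i] + W_g gate[i])).
    """
    gated = []
    for x, g in zip(skip, gate):
        hidden = [max(0.0, a + b) for a, b in zip(matvec(w_x, x), matvec(w_g, g))]
        alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)))
        gated.append([alpha * xi for xi in x])
    return gated


eye = [[1.0, 0.0], [0.0, 1.0]]
# With psi = 0 every alpha is 0.5, so the skip features are simply halved.
print(attention_gate([[1.0, 2.0], [3.0, -1.0]], [[0.5, 0.5], [1.0, 0.0]], eye, eye, [0.0, 0.0]))
```

Compared with a plain U-Net skip connection, which copies encoder features unchanged, the gate lets decoder context suppress positions it deems irrelevant before concatenation.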
Human Action Recognition (HAR) is a challenging domain in computer vision, involving recognizing complex patterns by analyzing the spatiotemporal dynamics of individuals’ movements in videos. These patterns arise in ...
Infrared dim and small target detection is widely used in military and civil fields. Traditional methods for this application rely on the local contrast between target and background for single-frame detection, and on a motion model with fixed parameters for multi-frame association. Because target and background have very similar gray values, and the motion-model parameters change dynamically under low SNR and strong clutter, these methods suffer from weak robustness, low detection probability, and a high false alarm rate. In this paper, an infrared video sequence encoding and decoding model based on a bidirectional convolutional Long Short-Term Memory structure (Bi-Conv-LSTM) and a 3D convolutional structure (3D-Conv) is proposed to address the problems of high similarity and dynamically changing parameters. To handle the dynamic change in parameters, the Bi-Conv-LSTM structure is used to learn the motion model of targets. To handle low local contrast, the 3D-Conv structure is adopted to extend the receptive field in the time dimension. To improve detection precision, the decoding part is divided into two different fully connected layers with distinct activation functions. Simulation results show that the trajectory detection accuracy of the proposed model exceeds 90% under low SNR and maneuvering motion, better than the traditional DB-TBD method (80%) and other methods (20%). Experiments on real data show that the proposed method can detect small infrared targets with a low false alarm rate and a high detection probability.
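The claim that 3D-Conv extends the receptive field in the time dimension can be quantified with the standard receptive-field recurrence. For stacked layers with temporal kernel sizes k_l and temporal strides s_l, one output unit sees 1 + Σ_l (k_l − 1) · Π_{i<l} s_i input frames. The sketch below computes this; the layer configurations are hypothetical, since the abstract gives no architecture details, and the name `temporal_receptive_field` is illustrative. Temporal pooling layers can be included as (pool size, pool stride) entries.

```python
from typing import Sequence, Tuple


def temporal_receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Number of input frames that influence one output of stacked 3D conv layers.

    layers: (temporal kernel size k, temporal stride s) per layer, input to output.
    Implements rf = 1 + sum_l (k_l - 1) * prod_{i<l} s_i.
    """
    rf, jump = 1, 1  # jump: spacing, in input frames, between adjacent units at this depth
    for k, s in layers:
        if k < 1 or s < 1:
            raise ValueError("kernel size and stride must be positive")
        rf += (k - 1) * jump
        jump *= s
    return rf


print(temporal_receptive_field([(3, 1)] * 3))             # → 7  (three 3x3x3 layers)
print(temporal_receptive_field([(3, 1), (3, 2), (3, 1)]))  # → 9  (stride 2 in the middle layer)
print(temporal_receptive_field([(1, 1)] * 3))             # → 1  (1-frame kernels: no temporal context)
```

A kernel of temporal size 1 degenerates to frame-by-frame 2D processing, which is why single-frame local-contrast methods cannot pool evidence for a dim target across frames.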
ISBN: (print) 9781728119854
Recently, as applications of convolutional neural networks in artificial intelligence have become increasingly diverse, a growing number of neural network methods have been put forward. For example, 3D convolution and two-stream convolution based on RGB frames and optical flow have been applied in neural networks. A convolutional neural network with 3D convolution kernels can extract spatio-temporal features directly from a set of video sequences for action recognition. Although a 3D convolutional neural network can obtain partial spatio-temporal information, a new ConvNet architecture called CVDN (Combined Video-stream Deep Network) is proposed to extract more spatio-temporal features from video fragments and so make effective use of the temporal information in the dataset. We evaluate our method on the UCF-101 dataset and obtain good results. The details of our method are as follows. First, we use ResNet models pre-trained on the Kinetics dataset to initialize our models, which are then trained on UCF-101 and used to extract video-stream features. Then, optical flow images computed from the UCF-101 dataset serve as input to the optical flow stream, which extracts optical flow features. Finally, the two-stream features are combined and the results are obtained after a Softmax layer. CVDN obtains good results when the linear fusion ratio of video-stream features to optical-flow-stream features is 5:4, and our method with ResNet-101 reaches an accuracy of 92.2%.
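The 5:4 linear fusion step can be sketched as a weighted average of per-class scores from the two streams followed by a softmax. The abstract does not say whether fusion happens on class scores or on features feeding a shared classifier. Fusing class scores before the softmax, normalizing by the weight sum, and the name `fuse_two_stream` are therefore all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_stream(rgb_scores: Sequence[float], flow_scores: Sequence[float],
                    w_rgb: float = 5.0, w_flow: float = 4.0) -> List[float]:
    """Linearly fuse per-class scores of the video (RGB) and optical-flow streams, then softmax.

    Defaults mirror the 5:4 ratio reported in the abstract; applying the ratio to
    class scores (rather than intermediate features) is an assumption.
    """
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both streams must score the same classes")
    total_w = w_rgb + w_flow
    fused = [(w_rgb * a + w_flow * b) / total_w for a, b in zip(rgb_scores, flow_scores)]
    return softmax(fused)


# The RGB stream alone prefers class 0 and the flow stream prefers class 1;
# the fused prediction is class 1 because (5*1 + 4*3)/9 > (5*2 + 4*0)/9.
probs = fuse_two_stream([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # → 1
```

Dividing by w_rgb + w_flow keeps the fused scores on each stream's scale. It never changes which class wins; it only affects how peaked the softmax output is.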
In contemporary computer vision surveillance systems, human action categorization in videos has become a predominant area involving pattern recognition tasks. In fact, most human actions comprise complex temporal information, and it is quite difficult to recognize the diverse activities of humans precisely under an unpredictable variety of environmental conditions. A deep learning paradigm can tackle this issue by providing additional capabilities to vision-based human action recognition. However, extracting spatio-temporal features poses further challenges, for instance noise in videos and highly vague feature points. This paper proposes a hybrid intelligent Intuitionistic Fuzzy 3D Convolutional Neural Network that uses Chaotic Quantum Swarm Intelligence (CQSI-IFCNN) to optimize video-based human action categorization. The vagueness and ambiguity of input video frames are modeled by the intuitionistic fuzzy network in terms of membership, hesitation, and non-membership components. By applying Chaotic Quantum Swarm Intelligence (CQSI), the learning parameters and error rates that arise in a standard convolutional neural network are considerably reduced. A chaotic search scheme is applied to prevent premature convergence to local optima in Quantum Swarm Intelligence. As a result, the model produces optimized outcomes in the intuitionistic fuzzy 3D convolutional neural network, improving the categorization of human actions in videos. The performance of CQSI-IFCNN is assessed on the KTH and UCF Sports Action datasets. The simulation results show that CQSI-IFCNN attains higher action categorization accuracy than a standard CNN and PSO-CNN.
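An intuitionistic fuzzy set gives each element a membership μ, a non-membership ν, and a hesitation π = 1 − μ − ν, all non-negative. The abstract does not say how CQSI-IFCNN derives ν, so the sketch below assumes the Sugeno complement ν = (1 − μ)/(1 + λμ) with λ > 0, a common way to build an intuitionistic fuzzy set from an ordinary membership degree. It also shows the logistic map z ← 4z(1 − z), a chaotic sequence generator often used in chaotic swarm optimizers to spread candidate positions and escape premature local optima. Using it here is an assumption, as are the function names `sugeno_ifs` and `logistic_chaos`.

```python
from typing import List, Tuple


def sugeno_ifs(mu: float, lam: float = 1.0) -> Tuple[float, float, float]:
    """Return (membership, non-membership, hesitation) for a membership degree mu.

    Non-membership uses the Sugeno complement nu = (1 - mu) / (1 + lam * mu).
    For lam > 0 the denominator is >= 1, so nu <= 1 - mu and hesitation >= 0.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    nu = (1.0 - mu) / (1.0 + lam * mu)
    pi = max(0.0, 1.0 - mu - nu)  # clamp tiny negative values from rounding
    return mu, nu, pi


def logistic_chaos(z0: float, steps: int, lo: float, hi: float) -> List[float]:
    """Generate `steps` candidate values in [lo, hi] from the logistic map z <- 4 z (1 - z).

    z0 must lie in (0, 1) and avoid 0.25, 0.5 and 0.75, which land on the
    fixed points 0 or 0.75 instead of producing a chaotic orbit.
    """
    if not 0.0 < z0 < 1.0 or z0 in (0.25, 0.5, 0.75):
        raise ValueError("z0 must be in (0, 1) and not 0.25, 0.5 or 0.75")
    z, out = z0, []
    for _ in range(steps):
        z = 4.0 * z * (1.0 - z)
        z = min(1.0, max(0.0, z))  # guard against floating-point drift
        out.append(lo + z * (hi - lo))
    return out


print(sugeno_ifs(0.5, 1.0))               # hesitation is positive for an uncertain pixel
print(logistic_chaos(0.1, 5, -2.0, 3.0))  # chaotic candidates spread over [-2, 3]
```

With λ → 0 the hesitation vanishes and the set reduces to an ordinary fuzzy set; larger λ reserves more mass for hesitation, which is how ambiguity in a frame can be represented explicitly.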