In this paper, a novel methodology is presented to settle the region of interest (ROI) detection problem in vehicle color recognition, so as to remove the redundant components of vehicles that interfere greatly with color recognition. In order to make full use of the local color and spatial information, vehicle images are first divided into different superpixels. The spatial relationship between the superpixels and the outermost pixels is then used for the background removal of vehicle images. By comparing with the vehicle window clustering centroids obtained by k-means, the superpixels close to the universal color characteristics of windows are removed, so that the dominant color superpixels are retained. Finally, a linear Support Vector Machine classifier is trained for color recognition. Extensive experiments demonstrate that the proposed methodology is effective for color region of interest detection and thus contributes to vehicle color recognition.
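The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: a fixed grid stands in for the superpixel segmentation, and the block size, cluster count, and assumed window colour are all hypothetical values chosen for the toy example.

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_color_blocks(image, block=8, window_color=(60, 60, 60), n_clusters=2):
    """Grid blocks stand in for superpixels: compute each block's mean colour,
    cluster the means, and drop the cluster closest to the assumed window colour,
    keeping the dominant body-colour regions."""
    h, w, _ = image.shape
    means = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            means.append(image[y:y + block, x:x + block].reshape(-1, 3).mean(axis=0))
    means = np.array(means)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(means)
    # Remove the cluster whose centroid matches the window colour characteristics.
    dists = np.linalg.norm(km.cluster_centers_ - np.array(window_color), axis=1)
    window_cluster = int(np.argmin(dists))
    return means[km.labels_ != window_cluster]

# Toy image: red vehicle body with a dark grey "window" band.
img = np.zeros((32, 32, 3))
img[:, :] = (200, 30, 30)
img[8:16, :] = (60, 60, 60)
kept = dominant_color_blocks(img)
```

The retained block colours would then be fed to the linear SVM classifier for the final recognition step.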
ISBN (digital): 9781728132488
ISBN (print): 9781728132495
Remote sensing image scene classification is one of the key problems in remote sensing image interpretation. Traditional handcrafted features for remote sensing scene classification are not sufficiently discriminative, while extracting semantic features with deep learning is a complex process. This paper proposes a fused-feature remote sensing image scene classification method based on handcrafted features and deep learning semantic features. Firstly, the SURF features of the remote sensing image are extracted and encoded with the VLAD algorithm, and the semantic features of the image are extracted by transfer learning. Then dimensionality reduction is performed with the PCA algorithm and the two kinds of features are fused. Finally, the scene classifier is trained using the random forest algorithm. The experimental results show that the method achieves a higher classification accuracy and Kappa coefficient than the compared methods, demonstrating its effectiveness.
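The fusion step described above (per-modality PCA reduction, concatenation, random forest training) can be sketched as follows. The feature matrices here are random surrogates for the real VLAD-encoded SURF and transfer-learned deep features, and all dimensions are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
# Surrogates for the two modalities: VLAD-encoded SURF and deep semantic features.
handcrafted = rng.normal(size=(n, 64))
deep = rng.normal(size=(n, 128))
labels = (handcrafted[:, 0] + deep[:, 0] > 0).astype(int)

# Reduce each modality with PCA, then concatenate to form the fused feature.
f1 = PCA(n_components=16, random_state=0).fit_transform(handcrafted)
f2 = PCA(n_components=16, random_state=0).fit_transform(deep)
fused = np.hstack([f1, f2])

# Train the scene classifier with a random forest.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(fused, labels)
acc = clf.score(fused, labels)
```

Reducing each modality separately before concatenation keeps either feature type from dominating the fused representation by sheer dimensionality.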
Background/Purpose: Multiple parametric imaging in positron emission tomography (PET) is challenging due to the noisy dynamic data and the complex mapping to kinetic parameters. Although methods like direct parametric reconstruction have been proposed to improve the image quality, limitations persist, particularly for nonlinear and small-value micro-parameters (e.g., k2, k3). This study presents a novel unsupervised deep learning approach to reconstruct and improve the quality of these micro-parameters. Methods: We proposed a direct parametric image reconstruction model, DIP-PM, integrating deep image prior (DIP) with a parameter magnification (PM) strategy. The model employs a U-Net generator to predict multiple parametric images using a CT image prior, with each output channel subsequently magnified by a factor to adjust its intensity. The model was optimized with a log-likelihood loss computed between the measured projection data and the forward-projected data. Two tracer datasets were simulated for evaluation: 82Rb data using the 1-tissue compartment (1TC) model and 18F-FDG data using the 2-tissue compartment (2TC) model, with 10-fold magnification applied to the 1TC k2 and the 2TC k3, respectively. DIP-PM was compared to the indirect method, a direct algorithm (OTEM), and the DIP method without parameter magnification (DIP-only). Performance was assessed on phantom data using peak signal-to-noise ratio (PSNR), normalized root mean square error (NRMSE) and structural similarity index (SSIM), as well as on a real 18F-FDG scan from a male subject. Results: For the 1TC model, OTEM performed well in K1 reconstruction, but both the indirect and OTEM methods showed high noise and poor performance in k2. The DIP-only method suppressed noise in k2 but failed to reconstruct fine structures in the myocardium. DIP-PM outperformed the other methods with well-preserved detailed structures, particularly in k2, achieving the best metrics (PSNR: 19.00, NRMSE: 0.3002, SSIM: 0
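The two core ingredients of DIP-PM, the Poisson log-likelihood objective on projection data and the parameter magnification trick, can be illustrated in a few lines. The 10-fold factor follows the abstract; the count values are toy stand-ins for real sinogram data:

```python
import numpy as np

def poisson_loglik(y, ybar, eps=1e-8):
    """Poisson log-likelihood of measured counts y given expected
    forward-projected counts ybar (the constant log(y!) term is dropped)."""
    ybar = np.maximum(ybar, eps)
    return float(np.sum(y * np.log(ybar) - ybar))

# Parameter magnification: train on k3 * MAG so that the small-valued
# micro-parameter contributes gradients on the same scale as K1, then
# divide the factor back out to obtain the final parametric image.
MAG = 10.0
k3_true = np.array([0.02, 0.05, 0.03])
k3_trained = k3_true * MAG        # what the network is asked to predict
k3_recovered = k3_trained / MAG   # final parametric image

# The likelihood is maximised when the forward projection matches the data.
y = np.array([5.0, 8.0, 3.0])
ll_exact = poisson_loglik(y, y)
ll_off = poisson_loglik(y, 1.3 * y)
```

Since the Poisson log-likelihood peaks when the expected counts equal the measured counts, any mismatch in the magnified parameters is penalised, which is what drives the DIP generator toward the correct kinetic maps.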
ISBN (print): 9781538646748; 9781538646731
Person re-identification is an important task in intelligent video surveillance and has become one of the research hotspots in computer vision. Video-based person re-identification aims to verify the identity of a pedestrian across video sequences captured by non-overlapping cameras at different times. In this paper, we propose a novel feature extractor based on LSTM networks. These LSTM networks are used to extract an effective space-time feature representation named the attribute-constraints space-time feature (ASTF). Unlike other methods, we manually annotate the pedestrians in videos with three attributes, and these attributes, together with the pedestrian IDs, are used as labels to train the feature extractor. The ASTF representation of a testing video is extracted by this feature extractor and serves as an effective space-time representation for video-based re-identification. Extensive experiments on two public datasets demonstrate that our approach outperforms state-of-the-art video-based re-identification methods.
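The idea of summarising a frame sequence into one fixed-length space-time descriptor with an LSTM can be sketched as below. This is a bare single-layer LSTM in NumPy with random weights, a simplified stand-in for the paper's trained attribute-constrained extractor; the dimensions are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_features(frames, Wx, Wh, b):
    """Run a single-layer LSTM over per-frame feature vectors and return the
    final hidden state as a fixed-length space-time descriptor."""
    hdim = Wh.shape[1]
    h = np.zeros(hdim)
    c = np.zeros(hdim)
    for x in frames:
        z = Wx @ x + Wh @ h + b            # stacked gate pre-activations, (4*hdim,)
        i, f, o, g = np.split(z, 4)        # input, forget, output gates + candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)         # update the cell state over time
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
hdim, xdim, T = 8, 16, 5
Wx = rng.normal(scale=0.1, size=(4 * hdim, xdim))
Wh = rng.normal(scale=0.1, size=(4 * hdim, hdim))
b = np.zeros(4 * hdim)
feat = lstm_features(rng.normal(size=(T, xdim)), Wx, Wh, b)
```

In the actual system the per-frame inputs would be appearance features, and the weights would be trained against the joint ID-plus-attribute labels rather than drawn at random.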
Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* mapping. Methods: ...
With the rapid development of artificial intelligence (AI) in medical image processing, deep learning in color fundus photography (CFP) analysis is also evolving. Although there are some open-source, labeled datasets ...
ISBN (print): 9781538653128
This paper presents a study on the use of input codes in the neural network acoustic modeling for expressive TTS. Specifically, we use different kinds of input codes, augmented with the linguistic features, as the input of a BLSTM-based acoustic model, to control the expressivity of the synthesized speech. The input codes, in one-hot representation, include dialogue code, sentiment code and sentence position code. The dialogue code indicates whether the text is a dialogue or narration in an audiobook story. The sentiment code is obtained from a sentiment analysis tool, which labels each sentence as positive, negative and neutral. The sentence position code indicates the position of the sentence in the paragraph. We believe these codes are highly related to the expressiveness of the audiobook speech. Experiments on the data from the Blizzard Challenge 2017 demonstrate the effectiveness of the use of input codes in the neural network approach for expressive TTS.
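Building the augmented input, one-hot codes concatenated with the linguistic feature vector, can be sketched as below. The code vocabularies follow the abstract (dialogue/narration, three sentiment classes, sentence position); the linguistic feature dimension and the position cap are illustrative assumptions:

```python
import numpy as np

DIALOGUE = {"narration": 0, "dialogue": 1}
SENTIMENT = {"negative": 0, "neutral": 1, "positive": 2}

def one_hot(idx, n):
    v = np.zeros(n)
    v[idx] = 1.0
    return v

def make_input(linguistic, dialogue, sentiment, position, n_pos=5):
    """Concatenate the linguistic features with one-hot dialogue,
    sentiment, and sentence-position codes."""
    codes = np.concatenate([
        one_hot(DIALOGUE[dialogue], 2),
        one_hot(SENTIMENT[sentiment], 3),
        one_hot(min(position, n_pos - 1), n_pos),  # clip long paragraphs
    ])
    return np.concatenate([linguistic, codes])

linguistic = np.full(10, 0.5)  # stand-in for real linguistic features
x = make_input(linguistic, "dialogue", "positive", 2)
```

Each frame-level input to the BLSTM acoustic model would carry these codes alongside the usual linguistic context, letting one network render the same text with different expressive styles.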
Different from RGB videos, depth data in RGB-D videos provide key complementary information for tristimulus visual data which potentially could achieve accuracy improvement for action recognition. However, most of the...
Aiming at the problem of low detection accuracy of the multi-mode mean model in complex scenarios, an improved moving-target detection method based on the multi-mode mean model is proposed. Firstly, the background model is constructed using the multi-mode mean model. According to different scene information, different thresholds are set and adjusted adaptively. The foreground image obtained by the background difference method is further verified by the frame difference method, and comparative experiments are conducted and analyzed. The missed-detection and false-detection rates are reduced, and the detection accuracy is improved. Finally, simulation results on three video segments verify the effectiveness of the proposed method.
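The combination of background difference and frame difference can be sketched as below. For simplicity this toy version uses a single static background image and a fixed threshold in place of the paper's multi-mode mean model and adaptive thresholds:

```python
import numpy as np

def detect_moving(frames, bg, thresh=30):
    """Mark a pixel as foreground only when both the background difference
    and the inter-frame difference exceed the threshold (logical AND),
    suppressing ghosting left by either cue alone."""
    masks = []
    prev = frames[0]
    for f in frames[1:]:
        bg_diff = np.abs(f.astype(int) - bg.astype(int)) > thresh
        fr_diff = np.abs(f.astype(int) - prev.astype(int)) > thresh
        masks.append(bg_diff & fr_diff)
        prev = f
    return masks

# Toy sequence: a bright square moves between two positions.
bg = np.zeros((16, 16), dtype=np.uint8)
f1 = bg.copy(); f1[2:6, 2:6] = 255
f2 = bg.copy(); f2[8:12, 8:12] = 255
masks = detect_moving([f1, f2], bg)
```

The frame difference vetoes the stale square position that the background difference alone would miss labelling, which is the kind of false detection the combined check is meant to reduce.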
To address the problems of high time consumption and large error in camera motion estimation during dense trajectory feature extraction from video, a dense trajectory action recognition algorithm based on Improved Speeded-Up Robust Features (SURF) is proposed. The algorithm first performs dense sampling of the video images and then estimates camera motion. In the feature point detection stage, the Gaussian pyramid layers are constructed dynamically to improve the real-time performance and accuracy of feature point extraction. Based on the SURF algorithm, the brightness center (intensity centroid) method is used to obtain the orientation of each feature point. Binary Robust Independent Elementary Features (BRIEF) descriptors are then generated to determine matching points and optimize the images, after which feature tracking and feature extraction are conducted on the images to classify features. The experimental results show that the algorithm is faster at removing camera motion, improving the real-time performance of feature extraction and the accuracy of action recognition.
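The brightness-center orientation step can be sketched as below. This follows the intensity-centroid idea popularised by ORB, which appears to be what the abstract refers to; the patch contents are a toy example:

```python
import numpy as np

def intensity_centroid_angle(patch):
    """Orientation of a feature point by the intensity-centroid
    ('brightness center') method: the angle of the vector from the
    patch center to its intensity-weighted centroid.
    The patch must contain some nonzero intensity."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m00 = patch.sum()
    cx = (xs * patch).sum() / m00 - (w - 1) / 2  # centroid offset in x
    cy = (ys * patch).sum() / m00 - (h - 1) / 2  # centroid offset in y
    return np.arctan2(cy, cx)

# Patch brighter on its right edge: the centroid lies to the right of
# center, so the orientation points along the +x axis (angle 0).
patch = np.zeros((9, 9))
patch[:, 6:] = 1.0
angle = intensity_centroid_angle(patch)
```

Assigning each keypoint this orientation before computing the BRIEF descriptor makes the binary descriptor comparison robust to in-plane rotation during matching.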