ISBN: (print) 9781479951994
Visual tracking algorithms based on deep learning are robust to variations in complex environments because deep learning can learn generic features from large numbers of unlabeled images. However, because of their multi-layer architecture, deep learning trackers are computationally expensive and unsuitable for real-time applications. In this paper, a low-complexity visual tracking scheme with a single-hidden-layer neural network is proposed, based on a denoising autoencoder. To further reduce computational cost, feature selection is applied to simplify the network, and two optimization methods are used during online tracking. The experimental results demonstrate that the proposed algorithm is about six times faster than trackers based on deep nets and fast enough for real-time applications, with encouraging accuracy.
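As an illustration of the building block named in this abstract, the following is a minimal sketch of a single-hidden-layer denoising autoencoder in NumPy. It is not the tracker from the paper: the function names, the tanh activation, the Gaussian corruption, the toy data, and all hyperparameters are assumptions chosen for the example, and feature selection and the online tracking optimizations are not shown.

```python
import numpy as np

def train_dae(X, n_hidden=16, noise_std=0.3, lr=0.05, epochs=300, seed=0):
    """Train a single-hidden-layer denoising autoencoder by gradient descent.

    X has shape (n_samples, n_features). All hyperparameters are illustrative.
    Returns the learned parameters and the per-epoch reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Corrupt the input; the target remains the clean data.
        X_noisy = X + noise_std * rng.normal(size=X.shape)
        H = np.tanh(X_noisy @ W1 + b1)      # single hidden layer
        X_hat = H @ W2 + b2                 # linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error.
        g_out = 2.0 * err / err.size
        g_W2 = H.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_pre = (g_out @ W2.T) * (1.0 - H ** 2)
        g_W1 = X_noisy.T @ g_pre
        g_b1 = g_pre.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, losses

def make_toy_data(n=200, seed=1):
    """Hypothetical low-rank toy data standing in for image features."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, 6))
    return X / X.std()

params, losses = train_dae(make_toy_data())
print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The key design point is that the network sees the corrupted input but is scored against the clean input, which is what distinguishes a denoising autoencoder from a plain one.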
ISBN: (print) 9781665434515
The stable operation of a power system often depends on inscribing the faults that may arise when transmitting and distributing electrical power. Characterizing these faults is necessary to analyze the post-fault oscillography of the power lines. Power lines are prone to noise, which introduces uncertainty in operating conditions. The variation in operating conditions leads to an unbalanced system. Fault diagnosis is essential to ensure the secure operation of a power network. This paper introduces a unified unsupervised learning framework for short-circuit fault analysis of a power transmission line. The proposed approach works with a small data set and reduces the computational cost. It uses a capsule network that investigates the low-level fault-oriented features. To make the proposed framework robust to noise, a stacked denoising autoencoder is integrated and modeled. The performance of the proposed model is measured and compared, in terms of noise, with some of the techniques available in the literature. The test with field data for three types of fault classification results in an accuracy of 9 ms for fault triggering.
ISBN: (print) 9780992862633
A source signal is estimated using an associative memory model (AMM) and used for separation matrix optimization in linear blind source separation (BSS) to yield high-quality, less distorted speech. Linear-filtering-based BSS, such as independent vector analysis (IVA), has been shown to be effective in sound source separation while avoiding non-linear signal distortion. This technique, however, requires several assumptions, namely that the sound sources are independent and generated from a non-Gaussian distribution. We propose a method for estimating a linear separation matrix without any assumptions about the sources by repeating the following two steps: estimating non-distorted reference signals by using an AMM and optimizing the separation matrix to minimize an error between the estimated signal and the reference signal. Experimental comparisons carried out in simultaneous speech separation suggest that the proposed method can reduce the residual distortion caused by IVA.
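The two-step loop described in this abstract can be sketched as follows, under strong assumptions. The abstract does not state the error criterion, so a least-squares error is assumed here, which gives the separation matrix a closed-form update. The associative memory model is not implemented: `estimate_reference` is a hypothetical placeholder callable, and the function names, the single-band matrix layout, and the regularizer `eps` are all choices made for this example.

```python
import numpy as np

def update_separation_matrix(X, R, eps=1e-8):
    """Least-squares W minimizing ||W X - R||^2 (assumed error criterion).

    X: observed mixtures, shape (n_mics, n_frames).
    R: reference signals, shape (n_sources, n_frames).
    """
    Rxx = X @ X.conj().T + eps * np.eye(X.shape[0])
    Rrx = R @ X.conj().T
    # Solve W Rxx = Rrx for W without forming an explicit inverse.
    return np.linalg.solve(Rxx.T, Rrx.T).T

def separate(X, estimate_reference, n_iters=5):
    """Alternate reference estimation and separation-matrix optimization.

    estimate_reference is a stand-in for the AMM: it maps the current
    separated signals to less distorted reference signals.
    """
    W = np.eye(X.shape[0], dtype=X.dtype)
    for _ in range(n_iters):
        Y = W @ X                           # current separated signals
        R = estimate_reference(Y)           # step 1: reference estimate
        W = update_separation_matrix(X, R)  # step 2: refit W to the reference
    return W

# Sanity check with a known linear demixing matrix as the "reference".
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2, 500))
W_demo = np.array([[1.0, -0.5], [0.3, 2.0]])
W_fit = update_separation_matrix(X_demo, W_demo @ X_demo)
print(np.round(W_fit, 3))
```

Because the output is always a linear function of the observations, the non-linear reference estimator influences only how W is chosen, which is consistent with the abstract's aim of avoiding non-linear signal distortion.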
ISBN: (print) 9781538646588
This paper addresses the task of Automatic Speech Recognition (ASR) with music in the background. We consider two different situations: 1) scenarios with a very small amount of labeled training utterances (duration 1 hour) and 2) scenarios with a large amount of labeled training utterances (duration 132 hours). In these situations, we aim to achieve robust recognition. To this end, we investigate the following techniques: a) multi-condition training of the acoustic model, b) denoising autoencoders for feature enhancement and c) joint training of both of the above techniques. We demonstrate that the considered methods can be successfully trained with the small amount of labeled acoustic data. We present substantially improved performance compared to acoustic models trained on clean speech. Further, we show a significant increase in accuracy in the under-resourced scenario when additional non-labeled data are used. Here, the non-labeled dataset is used to improve the accuracy of the feature enhancement via autoencoders. Subsequently, the autoencoders are jointly fine-tuned along with the acoustic model using the small amount of labeled utterances.
ISBN: (print) 9789881476807
In this paper, we explore the potential of using deep learning for extracting speaker-dependent features for noise-robust speaker identification. More specifically, an SNR-adaptive denoising classifier is constructed by stacking two layers of restricted Boltzmann machines (RBMs) on top of a denoising deep autoencoder, where the top-RBM layer is connected to a soft-max output layer that outputs the posterior probabilities of speakers and the top-RBM layer outputs speaker-dependent bottleneck features. Both the deep autoencoder and RBMs are trained by contrastive divergence, followed by backpropagation fine-tuning. The autoencoder aims to reconstruct the clean spectra of a noisy test utterance using the spectra of the noisy test utterance and its SNR as input. With this denoising capability, the output from the bottleneck layer of the classifier can be considered a low-dimensional representation of denoised utterances. These frame-based bottleneck features are then used to train an iVector extractor and a PLDA model for speaker identification. Experimental results based on a noisy YOHO corpus show that the bottleneck features slightly outperform the conventional MFCC under low SNR conditions and that fusion of the two features leads to further performance gain, suggesting that the two features are complementary.
ISBN: (print) 9781538646588
For speech recognition in noisy environments, we propose a multi-task autoencoder which estimates not only clean speech features but also noise features from noisy speech. We introduce the deSpeeching autoencoder, which excludes speech signals from noisy speech, and combine it with the conventional denoising autoencoder to form a unified multi-task autoencoder (MTAE). We evaluate it using the Aurora 2 and CHiME 3 datasets. It reduced WER by 15.7% compared with the conventional denoising autoencoder on test set A of Aurora 2.
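One way to picture the multi-task idea in this abstract is a shared encoder with two output heads, one estimating clean speech features and one estimating noise features. The sketch below shows only a forward pass and a combined loss. The shared-encoder layout, the layer sizes, the `noise_weight` parameter, and the additive toy features are assumptions for illustration; the abstract does not specify how the two autoencoders are combined.

```python
import numpy as np

def init_mtae(n_features, n_hidden, seed=0):
    """Create hypothetical parameters: one shared encoder, two decoder heads."""
    rng = np.random.default_rng(seed)

    def layer(n_in, n_out):
        return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

    return {
        "enc": layer(n_features, n_hidden),
        "speech": layer(n_hidden, n_features),  # denoising head
        "noise": layer(n_hidden, n_features),   # "deSpeeching" head
    }

def mtae_forward(params, noisy):
    """Return (estimated clean speech features, estimated noise features)."""
    W, b = params["enc"]
    h = np.tanh(noisy @ W + b)
    Ws, bs = params["speech"]
    Wn, bn = params["noise"]
    return h @ Ws + bs, h @ Wn + bn

def mtae_loss(params, noisy, clean, noise, noise_weight=1.0):
    """Sum of the two task losses; noise_weight=0 gives a plain denoising loss."""
    speech_hat, noise_hat = mtae_forward(params, noisy)
    denoise_term = np.mean((speech_hat - clean) ** 2)
    despeech_term = np.mean((noise_hat - noise) ** 2)
    return float(denoise_term + noise_weight * despeech_term)

# Toy features; real inputs would be speech features such as filterbank frames.
rng = np.random.default_rng(1)
clean_demo = rng.normal(size=(4, 5))
noise_demo = 0.5 * rng.normal(size=(4, 5))
params_demo = init_mtae(n_features=5, n_hidden=8)
print(mtae_loss(params_demo, clean_demo + noise_demo, clean_demo, noise_demo))
```

Training both heads through the same encoder forces the hidden representation to account for the noise explicitly, which is the intuition behind adding the second task.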
ISBN: (print) 9781510817906
Communication system mismatch is a major cause of lost speaker recognition performance. This paper considers a type of nonlinear communication system mismatch: modulation/demodulation (Mod/DeMod) carrier drift in single-sideband (SSB) speech signals. We focus on the problem of estimating frequency offset in SSB speech in order to improve speaker verification performance on the drifted speech. Based on a two-step framework from previous work, we propose using a multi-layered neural network architecture, the stacked denoising autoencoder (SDA), to determine the unique interval of the offset value in the first step. Experimental results demonstrate that the SDA-based system can produce up to a +16.1% relative improvement in frequency offset estimation accuracy. A speaker verification evaluation shows a +65.9% relative improvement in EER when the SSB speech signal is compensated with the frequency offset value estimated by the proposed method.
The use of computer-aided image analysis for disease diagnosis and prognosis has dramatically increased during the past 10 years. The introduction of computer-assisted image analysis of images produced by equipme...
ISBN: (print) 9783031585463; 9783031585470
In contrast to fully-supervised models, self-supervised representation learning only needs a fraction of data to be labeled and often achieves the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task, making them able to extract meaningful features from raw input data afterwards. Previously, autoencoders and Siamese networks have been successfully employed as feature extractors for tasks such as image classification. However, both have their individual shortcomings and benefits. In this paper, we combine their complementary strengths by proposing a new method called SidAE (Siamese denoising autoencoder). Using an image classification downstream task, we show that our model outperforms two self-supervised baselines across multiple data sets and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available. Empirically, the Siamese component has more impact, but the denoising autoencoder is nevertheless necessary to improve performance.
ISBN: (print) 9798350359329; 9798350359312
Human behavior anomaly detection in video aims to identify unusual behaviors that are crucial for public safety. Recently, there has been an increase in reconstruction-based or prediction-based methods that integrate diverse modal features to enhance anomaly detection. However, these methods use the multimodal features independently or fuse them directly, without fully considering the collaborative potential between them, and are therefore susceptible to interference from semantic differences, which degrades detection performance. In contrast, we design a collaborative framework using multimodal data and adaptive noise for behavior anomaly detection. Our framework detects anomalies by analyzing the contrastive differences between two modalities alongside single-frame reconstruction errors. Specifically, we first learn the correlation between RGB and skeletal modalities for normal behavior through contrastive learning and use the inter-modal contrast difference to detect motion anomalies. Additionally, we propose a single-frame reconstruction network that adaptively adds noise based on the importance of foreground features to detect appearance anomalies. Anomalies often occur in the motion foreground, and increasing noise in this area can make it more difficult to reconstruct anomalies. Extensive experiments validate the state-of-the-art performance of our method on three public datasets.
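The adaptive-noise step in this abstract can be pictured with a small sketch: noise is scaled per pixel by a foreground importance map before the frame is passed to a reconstruction network, and the reconstruction error serves as an appearance anomaly score. The reconstruction network itself is not shown. The function names, the Gaussian noise, the max-normalization of the importance map, and `max_std` are assumptions for illustration only.

```python
import numpy as np

def add_adaptive_noise(frame, importance, max_std=0.5, seed=0):
    """Add Gaussian noise whose strength follows a foreground importance map.

    frame and importance are 2-D arrays of the same shape; importance is
    assumed non-negative, with larger values marking the motion foreground.
    """
    rng = np.random.default_rng(seed)
    weight = importance / (importance.max() + 1e-8)  # scale to roughly [0, 1]
    return frame + max_std * weight * rng.normal(size=frame.shape)

def reconstruction_score(frame, reconstruction):
    """Mean squared reconstruction error, used here as an anomaly score."""
    return float(np.mean((frame - reconstruction) ** 2))

# Toy frame whose right half is treated as foreground.
frame_demo = np.ones((4, 4))
importance_demo = np.zeros((4, 4))
importance_demo[:, 2:] = 1.0
noisy_demo = add_adaptive_noise(frame_demo, importance_demo)
print(reconstruction_score(frame_demo, noisy_demo))
```

In this toy case the background pixels are left untouched and only the foreground is perturbed, so a network trained on normal behavior has to work hardest exactly where anomalies tend to appear.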