ISBN: (print) 9781479928934
Speech data typically contain task-irrelevant information within their features. Specifically, phonetic information, speaker characteristics, emotional information and noise are always mixed together and tend to interfere with one another in a given task. We propose a new type of auto-encoder for feature learning, the contrastive auto-encoder. Unlike other auto-encoder variants, the contrastive auto-encoder can leverage class labels when constructing its representation layer. We achieve this by modeling two auto-encoders jointly and making their differences contribute to the total loss function. The transformation learned by the contrastive auto-encoder can be viewed as a task-specific, invariant feature extractor. Our experiments on TIMIT show that features extracted by the contrastive auto-encoder outperform the original acoustic features, features extracted by a deep auto-encoder, and features extracted by the model from which the contrastive auto-encoder originates.
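The abstract does not give the exact loss, so the following is only a minimal sketch under an explicit assumption: two auto-encoders each reconstruct one member of a pair of frames that share a class label (for example, the same phoneme spoken by different speakers), and the squared distance between their hidden codes is added to the two reconstruction errors, which discourages label-irrelevant variation in the representation. The tied-weight sigmoid encoder, the linear decoder, the function names and the weight `lam` are all hypothetical choices for illustration, not the paper's formulation.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def autoencode(W, x):
    """Tied-weight auto-encoder (assumed): h = sigmoid(W x), x_hat = W^T h."""
    h = sigmoid(matvec(W, x))
    x_hat = matvec(transpose(W), h)
    return h, x_hat


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def contrastive_ae_loss(W1, W2, x1, x2, lam=0.5):
    """Hypothetical contrastive loss for a same-label pair (x1, x2).

    Each auto-encoder reconstructs its own input; the difference between
    the two hidden codes is penalized so that both encoders are pushed
    toward a label-specific, nuisance-invariant representation.
    """
    h1, r1 = autoencode(W1, x1)
    h2, r2 = autoencode(W2, x2)
    recon = sq_dist(x1, r1) + sq_dist(x2, r2)
    return recon + lam * sq_dist(h1, h2)
```

Setting `lam=0` reduces the objective to two independent auto-encoders, which makes the contribution of the code-difference term easy to isolate.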
ISBN: (print) 9781713836902
Dysarthric speech recognition is a challenging task because of acoustic variability and the limited amount of available data. The acoustic variability stems from the diverse conditions of dysarthric speakers, which makes it difficult to model precisely. This paper presents a variational auto-encoder based variability encoder (VAEVE) that explicitly encodes this variability in dysarthric speech. The VAEVE reconstructs the input acoustic features from both phoneme information and a low-dimensional latent variable, so the latent variable is forced to encode the phoneme-independent variability. The stochastic gradient variational Bayes algorithm is applied to model the distribution that generates the variability encodings, which are then used as auxiliary features for DNN acoustic modeling. Experiments on the UASpeech corpus show that the VAEVE-based variability encodings are complementary to learning hidden unit contributions (LHUC) speaker adaptation. Systems using the variability encodings consistently outperform comparable baseline systems without them, with absolute word error rate (WER) reductions of up to 2.2% on dysarthric speech at the "Very low" intelligibility level and up to 2% on the "Mixed" type of dysarthric speech with diverse or uncertain conditions.
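Two ingredients that stochastic gradient variational Bayes relies on can be sketched for a diagonal-Gaussian latent variable: the reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1), which keeps sampling differentiable with respect to mu and sigma, and the closed-form KL term against a standard-normal prior. The abstract says the decoder also receives phoneme information; here that conditioning is indicated only by appending a hypothetical one-hot phoneme vector to z. The function names and the one-hot scheme are illustrative assumptions, not the VAEVE implementation.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def decoder_input(z, phoneme_id, num_phonemes):
    """Hypothetical conditioning: append a one-hot phoneme vector to z,
    leaving z to carry the phoneme-independent variability."""
    one_hot = [0.0] * num_phonemes
    one_hot[phoneme_id] = 1.0
    return z + one_hot
```

In such a setup the per-frame training objective would be a reconstruction error on the acoustic features plus `kl_to_standard_normal`; which statistic of the posterior (for example its mean) is exported as the auxiliary feature is an assumption not settled by the abstract.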
ISBN: (electronic) 9781665490627
ISBN: (print) 9781665490627
Face recognition is one of the most widely used biometrics for identifying people. However, face images suffer from several issues that can degrade recognition results, especially in crowded environments, such as facial expression, occlusion, low resolution, noise, illumination and pose variation. In this paper, we propose a robust image representation system for face recognition. First, 3D face data reconstructed from 2D images are used instead of 3D captures. This is accomplished by modeling the difference in the texture map between the 3D-aligned input and reference images. Then, shape and texture local binary patterns (LBP) are fused on a mesh for face recognition using Mesh-LBP. Finally, a deep auto-encoder creates a compact data representation from the face image descriptors obtained with Mesh-LBP. Experiments on the Multi-PIE and Bosphorus databases show that our method is highly competitive with state-of-the-art methods.
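Mesh-LBP extends local binary patterns from the pixel grid to a triangle mesh, and its ring construction is not described in this abstract. For orientation only, the classic 2D operator that it generalizes can be sketched: each of the 8 neighbours of a pixel is thresholded against the centre value, the resulting bits form an 8-bit code, and a histogram of codes serves as a region descriptor. This is the standard grid version, not the mesh variant used in the paper; the neighbour ordering is an arbitrary but fixed choice.

```python
def lbp_code(img, r, c):
    """8-neighbour LBP code of pixel (r, c); neighbours are visited
    clockwise from the top-left and each contributes one bit."""
    center = img[r][c]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

In a pipeline like the one described, descriptors of this kind (computed on the mesh rather than the grid) are what the deep auto-encoder would compress.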
ISBN: (print) 9781728198354
Anomaly detection without prior knowledge of the anomalies is challenging. In unsupervised anomaly detection, traditional auto-encoder (AE) methods rest on the assumption that a model trained only on normal images will be unable to reconstruct abnormal images correctly, and they tend to fail when this assumption does not hold. In contrast, we propose a novel patch-wise auto-encoder (Patch AE) framework that aims to enhance, rather than weaken, the AE's ability to reconstruct anomalies. Each image patch is reconstructed from the corresponding spatially distributed feature vector of the learned feature representation (patch-wise reconstruction), which makes the AE sensitive to anomalies. Our method is simple and efficient. It advances the state of the art on the MVTec AD benchmark, which demonstrates the effectiveness of our model, and it shows great potential for practical industrial applications.
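The patch-wise reconstruction idea can be sketched as follows: a grid of feature vectors, one per spatial location, is decoded by a shared map so that each p x p patch is rebuilt only from its own vector, and a per-patch error map is then computed. The shared linear decoder, non-overlapping patches and mean squared error per patch are assumptions for illustration; the abstract does not specify the decoder or the scoring rule.

```python
def decode_patchwise(features, decoder, p):
    """Rebuild an image from a grid of feature vectors.

    features[i][j] is the d-dim vector for patch (i, j); decoder is a
    hypothetical shared linear map given as p*p rows of length d, so every
    patch is reconstructed only from its own spatial feature vector.
    """
    gh, gw = len(features), len(features[0])
    img = [[0.0] * (gw * p) for _ in range(gh * p)]
    for i in range(gh):
        for j in range(gw):
            f = features[i][j]
            pixels = [sum(w * x for w, x in zip(row, f)) for row in decoder]
            for k, v in enumerate(pixels):
                img[i * p + k // p][j * p + k % p] = v
    return img


def patch_errors(img, recon, p):
    """Mean squared reconstruction error of each non-overlapping p x p patch."""
    gh, gw = len(img) // p, len(img[0]) // p
    errs = [[0.0] * gw for _ in range(gh)]
    for i in range(gh):
        for j in range(gw):
            s = 0.0
            for r in range(i * p, (i + 1) * p):
                for c in range(j * p, (j + 1) * p):
                    s += (img[r][c] - recon[r][c]) ** 2
            errs[i][j] = s / (p * p)
    return errs
```

Taking the maximum of the error map as an image-level score is one common convention, assumed here rather than taken from the paper.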
ISBN: (print) 9783319701363; 9783319701356
Interference in millimeter-wave active radar imaging causes harmful effects such as amplitude fluctuation and phase distortion, which degrade visualization quality in radar systems that employ a complex-valued self-organizing map. We show that a complex-valued auto-encoder can extract features properly even under these influences, effectively improving clustering performance.
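Python's built-in complex type is enough to sketch what "complex-valued" means for an auto-encoder layer: weights, inputs and codes are all complex, and one common choice of activation (assumed here, since the abstract does not name one) squashes the amplitude while leaving the phase untouched, so the phase information carried by the radar signal survives encoding. The tied decoder using the conjugate transpose W^H and the function names are likewise illustrative assumptions.

```python
import cmath
import math


def amp_phase_tanh(z):
    """Amplitude-phase activation (assumed): tanh on |z|, phase of z kept."""
    return cmath.rect(math.tanh(abs(z)), cmath.phase(z))


def encode(W, x):
    """Complex-valued encoder: h = f(W x) with complex W and x."""
    return [amp_phase_tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def decode(W, h):
    """Tied decoder using the conjugate transpose W^H (an assumption)."""
    n_in = len(W[0])
    return [sum(W[k][j].conjugate() * h[k] for k in range(len(W))) for j in range(n_in)]
```

Because the activation only rescales the magnitude, a phase shift applied to the input of a single unit passes through that unit unchanged, which is the property that motivates complex-valued processing of coherent radar data.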
ISBN: (print) 9783319265322; 9783319265315
In this paper, an auto-encoder is proposed to learn conversation representations. First, a long short-term memory (LSTM) neural network encodes the sequence of sentences in a conversation, so that the interactive context is encoded into a fixed-length vector. Then, an LSTM decoder uses the learnt representation to reconstruct the sentence vectors of the conversation. To train our model, we construct a corpus of 32,881 conversations from an online shopping platform. Finally, experiments on a topic recognition task demonstrate the effectiveness of the proposed auto-encoder for learning conversation representations, especially when the training data for topic recognition is relatively small.
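The encoder/decoder flow described above can be sketched with a minimal, untrained LSTM cell written in plain Python: the encoder reads the sentence vectors in order and its final hidden state is the fixed-length conversation vector, and the decoder is unrolled from that vector to emit one reconstructed sentence vector per step. Initializing the decoder's hidden state with the code, feeding back the previous output, and the linear output layer `out_W` are assumptions for illustration; how sentence vectors are obtained is outside this sketch, and training (minimizing reconstruction error) is omitted. A `random.Random` instance is passed in as `rng`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with small random weights (untrained, illustrative)."""

    GATES = "ifoc"  # input, forget, output gates and cell candidate

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        n = n_in + n_hidden
        self.W = {gate: [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n_hidden)]
                  for gate in self.GATES}
        self.b = {gate: [0.0] * n_hidden for gate in self.GATES}

    def step(self, x, h, c):
        xh = x + h  # concatenate current input with previous hidden state
        pre = {gate: [sum(w * v for w, v in zip(row, xh)) + bias
                      for row, bias in zip(self.W[gate], self.b[gate])]
               for gate in self.GATES}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["c"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def encode_conversation(enc, sentence_vectors):
    """Final encoder hidden state = fixed-length conversation representation."""
    h = [0.0] * enc.n_hidden
    c = [0.0] * enc.n_hidden
    for s in sentence_vectors:
        h, c = enc.step(s, h, c)
    return h


def decode_conversation(dec, out_W, code, n_sentences):
    """Unroll the decoder from the code (assumed: code as initial hidden
    state, previous output fed back) and project each hidden state to a
    sentence vector with the hypothetical linear layer out_W."""
    h, c = code, [0.0] * dec.n_hidden
    x = [0.0] * len(out_W)  # no previous output at the first step
    outputs = []
    for _ in range(n_sentences):
        h, c = dec.step(x, h, c)
        x = [sum(w * v for w, v in zip(row, h)) for row in out_W]
        outputs.append(x)
    return outputs
```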
ISBN: (print) 9798350375053; 9798350375046
This paper presents an efficient Deep Neural Network (DNN) design optimized for modulation classification of the received Radio Frequency (RF) signal. Because transmitted signals are exposed to various noise sources in the transmission channel, we propose an adaptive auto-encoder mechanism to suppress the noise efficiently. The proposed auto-encoder becomes adaptive by adopting additional parameters that balance the skip connection against the compression/decompression process. The results show that the proposed adaptive auto-encoder improves classification accuracy, especially in the low signal-to-noise ratio (SNR) regime. The hardware overhead of the additional network required to generate the balancing parameters is minimized by sharing the data compression process that already exists in the auto-encoder.
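The abstract describes the adaptive mechanism only as extra parameters that balance the skip connection against the compression/decompression path, produced by a network that shares the existing compression stage. One plausible reading, sketched here purely as an assumption, is a gated residual: the compressed code z = E(x) feeds both the decoder and a tiny gate that outputs a weight alpha in (0, 1), and the output is alpha * x + (1 - alpha) * D(z). All weights, activations and names are hypothetical.

```python
import math


def linear(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def adaptive_autoencoder(x, W_enc, W_dec, w_gate, b_gate):
    """Assumed gated blend of skip connection and compression/decompression.

    The gate reuses the compressed code z, so no separate feature
    extractor is needed to produce the balancing parameter alpha.
    """
    z = [math.tanh(v) for v in linear(W_enc, x)]  # compression (shared)
    x_rec = linear(W_dec, z)                      # decompression
    gate_in = sum(w * v for w, v in zip(w_gate, z)) + b_gate
    alpha = 1.0 / (1.0 + math.exp(-gate_in))      # balancing parameter in (0, 1)
    y = [alpha * xi + (1.0 - alpha) * ri for xi, ri in zip(x, x_rec)]
    return y, alpha
```

With this formulation, alpha close to 1 passes the input through nearly unchanged, while alpha close to 0 relies almost entirely on the reconstruction; the gate costs only one extra dot product on the already computed code.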
ISBN: (electronic) 9783319710693
ISBN: (print) 9783319710693; 9783319710686
The generalization capability of a multi-layer perceptron (MLP) depends on the initialization of its weights. If the weights of an MLP are not initialized properly, it may fail to generalize well. In this article, we propose a weight initialization technique for MLPs that improves their generalization, based on regularized stacked auto-encoder pre-training. During pre-training, the weights between each pair of adjacent layers of the MLP, up to the penultimate layer, are trained layer-wise by an auto-encoder. To train each auto-encoder, we use a weighted sum of two terms: (i) the mean squared error (MSE) and (ii) the sum of squares of the first-order derivatives of the outputs with respect to the inputs. The second term acts as a regularizer: it penalizes the auto-encoder during pre-training so that it produces better initial weights for each successive layer of the MLP. To compare the proposed initialization technique with random weight initialization, we consider ten standard classification data sets. Empirical results show that the proposed initialization technique improves the generalization of the MLP.
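The two-term pre-training objective can be made concrete for a single auto-encoder with one sigmoid hidden layer. Assuming a linear output layer (the abstract does not state the output activation), the derivative of output k with respect to input j is sum_i W2[k][i] * s_i * (1 - s_i) * W1[i][j], where s_i is the hidden activation, so the penalty has a closed form. The weight `lam` on the second term and the function names are illustrative assumptions.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W1, b1, W2, b2):
    """Sigmoid hidden layer, linear output layer (assumed architecture)."""
    s = [sigmoid(sum(w * xj for w, xj in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sum(w * si for w, si in zip(row, s)) + b for row, b in zip(W2, b2)]
    return s, y


def jacobian(x, W1, b1, W2, b2):
    """Analytic dy_k/dx_j = sum_i W2[k][i] * s_i * (1 - s_i) * W1[i][j]."""
    s, _ = forward(x, W1, b1, W2, b2)
    d = [si * (1.0 - si) for si in s]
    return [[sum(W2[k][i] * d[i] * W1[i][j] for i in range(len(s)))
             for j in range(len(x))] for k in range(len(W2))]


def regularized_loss(x, W1, b1, W2, b2, lam=0.1):
    """Weighted sum of (i) MSE reconstruction error and (ii) the sum of
    squared first-order derivatives of the outputs w.r.t. the inputs."""
    _, y = forward(x, W1, b1, W2, b2)
    mse = sum((yk - xk) ** 2 for yk, xk in zip(y, x)) / len(x)
    penalty = sum(v * v for row in jacobian(x, W1, b1, W2, b2) for v in row)
    return mse + lam * penalty
```

In layer-wise pre-training, an objective like this would be minimized for one layer at a time, with the trained encoder weights (W1, b1) copied into the MLP and the hidden activations used as inputs for the next layer's auto-encoder.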
ISBN: (electronic) 9781510622005
ISBN: (print) 9781510622005
In this paper, we propose a semi-automatic pulmonary nodule segmentation algorithm that operates within a region of interest around each nodule. It consists of two parts: unsupervised training of an auto-encoder and supervised training of a segmentation network. Unsupervised training of the auto-encoder yields a feature extractor consisting of its encoder part. By adding new neural network layers after this feature extractor and training the result with supervision, we obtain the final segmentation network. Compared with the traditional maximum two-dimensional entropy threshold segmentation algorithm, the proposed algorithm achieves a Dice coefficient that is 1% to 9% higher in segmentation experiments on 36 regions of interest.
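The reported metric is the Dice coefficient: for a predicted mask P and ground-truth mask G, Dice = 2|P ∩ G| / (|P| + |G|). A minimal sketch over binary masks stored as nested lists of 0/1 values (an assumed representation) follows; returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
def dice(pred, truth):
    """Dice coefficient 2|P ∩ G| / (|P| + |G|) for binary 0/1 masks.

    Returns 1.0 when both masks are empty (a convention assumed here).
    """
    inter = total = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            inter += p * t
            total += p + t
    return 2.0 * inter / total if total else 1.0
```

For example, a prediction covering two pixels of which one overlaps a one-pixel ground truth gives 2 * 1 / (2 + 1), about 0.667.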
ISBN: (print) 9781665484855
With the popularity of smartphones, abnormal driving detection using smartphone sensors has been proposed in recent years. However, existing methods explore feature extraction insufficiently, and their low accuracy limits their practical value. To address this problem, we propose an attention-based auto-encoder framework for abnormal driving detection that combines the advantages of bidirectional long short-term memory and self-attention. Specifically, these two modules are embedded in the auto-encoder to model the latent vector and to explore the internal correlations of spatial-temporal features, respectively, which improves the ability to reconstruct driving time series from small, representative features. Experiments on real-world datasets show that the proposed framework achieves a recall of 96.2% and an F1-score of 95.0%, outperforming the other baselines.
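The self-attention component can be illustrated with scaled dot-product attention over the time steps of a sensor sequence: each step attends to every other step with weights softmax(q · k / sqrt(d)), which is how correlations between distant time steps enter the representation. To stay short, the sketch assumes identity query/key/value projections; in a trained model these would be learned matrices, and the bidirectional LSTM that the framework pairs with attention is omitted.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    seq is a list of T feature vectors of dimension d; returns the T
    attended vectors and the T x T attention-weight matrix.
    """
    d = len(seq[0])
    weights = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        weights.append(softmax(scores))
    out = [[sum(w * v[j] for w, v in zip(row, seq)) for j in range(d)]
           for row in weights]
    return out, weights
```

Each row of the returned weight matrix sums to 1, so every output step is a convex combination of the input steps.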