Synthesizing talking faces from text and audio is increasingly becoming an active research direction in human-machine and face-to-face interaction. Although progress has been made, several existing methods either model co-articulation unsatisfactorily or ignore the relations between adjacent inputs. Moreover, some of these methods train models on videos with shaky heads or use linear face parameterization strategies, both of which further degrade synthesis quality. To address these issues, this study proposes a sequence-to-sequence convolutional neural network that automatically synthesizes talking-face video with accurate lip sync. First, an advanced landmark location pipeline accurately locates the facial landmarks, which effectively reduces landmark shake. Then, a part-based autoencoder encodes face images into a low-dimensional space to obtain compact representations. A sequence-to-sequence network, trained with multiple loss functions, encodes the relations between neighboring frames, and talking faces are synthesized through a reconstruction strategy using a decoder. Experiments on two public audio-visual datasets and a new dataset called CCTV news demonstrate the effectiveness of the proposed method against other state-of-the-art methods. (C) 2020 Elsevier Ltd. All rights reserved.
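A simple way to see what "encoding the relation of neighboring frames" involves is a context-window construction. In this setup, each output frame t is paired with the input features of frames t-k to t+k. The sketch below is a hypothetical preprocessing step and not the authors' sequence-to-sequence network. The helper name `context_windows`, the toy features, and edge padding by repeating the first or last frame are all assumptions.

```python
from typing import List, Sequence

def context_windows(features: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Hypothetical helper (not from the paper): for each frame t, concatenate
    the feature vectors of frames t-k .. t+k so a model sees neighbouring context.
    Edge frames are padded by repeating the first/last frame (an assumption)."""
    n = len(features)
    windows: List[List[float]] = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp the index into the sequence
            window.extend(features[idx])
        windows.append(window)
    return windows

# Toy (hypothetical) audio features: 4 frames with 2 values each.
audio = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
ctx = context_windows(audio, k=1)
print(ctx[0])  # [0.0, 0.1, 0.0, 0.1, 1.0, 1.1]
```

A learned sequence model can capture longer dependencies than a fixed window, but the window makes explicit that each prediction depends on neighboring inputs rather than on a single frame.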
In this work, we address the problem of real-time dynamic medical image reconstruction (MRI and X-ray CT) from parsimonious samples (Fourier frequency space for MRI and sinogram/tomographic projections for CT). Today, the de facto standard for such reconstruction is compressed sensing (CS). CS produces high-quality images (with minimal perceptual loss), but such reconstructions are time-consuming because they require solving a complex optimization problem. In this work, we propose to 'learn' the reconstruction from training samples using an autoencoder. Our work is based on the universal function approximation capacity of neural networks. The training time for the autoencoder is large, but training is offline and hence does not affect performance during operation. During testing/operation, our method requires only a few matrix-vector products and is therefore significantly faster than CS-based methods. In fact, for MRI it is fast enough for real-time reconstruction (the images are reconstructed as fast as they are acquired) with only slight degradation of image quality; for CT, our reconstruction speed is slightly slower than required for real-time reconstruction. However, to make the autoencoder suitable for our problem, we depart from the standard Euclidean-norm cost function of autoencoders and use a robust ℓ1-norm instead. The ensuing problem is solved using the Split Bregman method.
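An ℓ1 cost needs a splitting method such as Split Bregman because the absolute value is non-differentiable at zero. Split Bregman introduces an auxiliary variable d equal to the residual. It then alternates three steps: a least-squares solve, a closed-form soft-thresholding (shrinkage) step on d, and a Bregman update of the accumulated constraint violation.

The sketch below applies these steps to a toy robust line fit, minimizing the sum of |a*x + c - y|. It is not the paper's autoencoder training. The function name `split_bregman_l1_line`, the data, `lam`, and the iteration count are all illustrative assumptions.

```python
import math
from typing import List, Tuple

def shrink(v: float, t: float) -> float:
    """Soft-thresholding: the proximal operator of t*|v|."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def split_bregman_l1_line(xs: List[float], ys: List[float],
                          lam: float = 1.0, iters: int = 2000) -> Tuple[float, float]:
    """Hypothetical toy solver (not the paper's): fit y ~ a*x + c by minimising
    sum_i |a*x_i + c - y_i| with Split Bregman, using the splitting d = A p - y."""
    n = len(xs)
    d = [0.0] * n            # auxiliary variable standing in for the residual A p - y
    e = [0.0] * n            # Bregman variable: accumulated constraint violation
    sxx, sx = sum(x * x for x in xs), sum(xs)
    det = sxx * n - sx * sx  # determinant of A^T A for A = [x, 1]
    a = c = 0.0
    for _ in range(iters):
        # 1) Least-squares step: solve A p ~= y + d - e via the 2x2 normal equations.
        t = [y + di - ei for y, di, ei in zip(ys, d, e)]
        sxt, st = sum(x * ti for x, ti in zip(xs, t)), sum(t)
        a = (n * sxt - sx * st) / det
        c = (sxx * st - sx * sxt) / det
        # 2) Shrinkage step: the l1 term is handled in closed form, element by element.
        r = [a * x + c - y for x, y in zip(xs, ys)]
        d = [shrink(ri + ei, 1.0 / lam) for ri, ei in zip(r, e)]
        # 3) Bregman update: add back the part of the constraint d = A p - y not yet met.
        e = [ei + ri - di for ei, ri, di in zip(e, r, d)]
    return a, c

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[9] = 100.0                # one gross outlier (the clean value would be 19.0)
a_l1, c_l1 = split_bregman_l1_line(xs, ys)
```

With `iters=1`, the function returns the ordinary least-squares fit, because d and e start at zero. That fit is pulled toward the outlier (slope of roughly 6.4). The ℓ1 fit should instead approach the clean line, with slope 2 and intercept 1, which illustrates why a robust norm is preferred.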
In the big data era, multi-source heterogeneous data have become the biggest obstacle to data sharing because of their high dimensionality and inconsistent structure. Using text classification to solve the ontology construction and mapping problem for multi-source heterogeneous data can not only reduce manual operation but also improve accuracy and efficiency. This paper proposes an ontology construction and mapping scheme based on a hybrid neural network and an autoencoder. First, the proposed text classification method uses a multi-kernel convolutional neural network to capture local features. It also uses an improved bidirectional long short-term memory (BiLSTM) network to compensate for the convolutional network's inability to capture context-related information. Second, a similarity matching method that integrates an autoencoder to improve anti-interference ability is used for ontology mapping. We carried out several sets of experiments to test the validity of the proposed ontology construction and mapping scheme.
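The similarity-matching step of ontology mapping can be sketched as a nearest-neighbor search over concept vectors. In the hypothetical example below, each concept is already represented by a vector, for instance an autoencoder code. Each source concept is mapped to the target concept with the highest cosine similarity, provided that similarity reaches a threshold. The concept names, the vectors, and the 0.8 threshold are illustrative assumptions, not values from the paper.

```python
import math
from typing import Dict, List, Optional

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def map_concepts(source: Dict[str, List[float]],
                 target: Dict[str, List[float]],
                 threshold: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each source concept to its most similar target concept,
    or to None if no target reaches the (hypothetical) threshold."""
    mapping: Dict[str, Optional[str]] = {}
    for name, vec in source.items():
        best, best_sim = None, threshold
        for t_name, t_vec in target.items():
            sim = cosine(vec, t_vec)
            if sim >= best_sim:
                best, best_sim = t_name, sim
        mapping[name] = best
    return mapping

# Hypothetical concept embeddings from two heterogeneous sources.
src = {"Patient": [0.9, 0.1, 0.0], "Invoice": [0.0, 0.2, 0.9]}
tgt = {"Person": [1.0, 0.0, 0.1], "Device": [0.0, 1.0, 0.0]}
print(map_concepts(src, tgt))  # {'Patient': 'Person', 'Invoice': None}
```

Leaving low-similarity concepts unmapped, rather than forcing a match, is one way to limit spurious alignments. The threshold trades recall for precision.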
A key challenge in building machine learning models for time-series prediction is the incompleteness of datasets. Missing data can arise for a variety of reasons, including sensor failures and network outages, resulting in datasets that may lack significant periods of measurements. Models built on these datasets can therefore be biased. Although various methods have been proposed to handle missing data in many application areas, missing-data prediction for air quality requires further investigation. This study proposes an autoencoder model with spatiotemporal considerations to estimate missing values in air quality data. The model consists of one-dimensional convolution layers, which make it flexible enough to capture the spatial and temporal behaviours of air contaminants. The model exploits data from nearby stations to improve predictions at the target station with missing data, and it does not require additional external features such as weather and climate data. The results show that the proposed method effectively imputes missing data for discontinuous and long-interval interrupted datasets. Compared with univariate imputation techniques (most-frequent, median and mean imputation), our model achieves up to 65% RMSE improvement. Against multivariate imputation techniques (decision tree, extra-trees, k-nearest neighbours and Bayesian ridge regressors), it achieves a 20–40% improvement. Imputation performance degrades when neighbouring stations are negatively or weakly correlated.
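Imputation methods are usually compared in three steps:

1. Hide values that are actually known.
2. Impute them.
3. Compute RMSE only on the hidden positions.

An "RMSE improvement" is then the relative reduction, 100 * (RMSE_baseline - RMSE_model) / RMSE_baseline. The sketch below applies this to the univariate mean-imputation baseline named above. The toy series and the masked positions are hypothetical, as is `model_preds`, which stands in for the autoencoder's output.

```python
import math
from typing import List, Optional

def mean_impute(series: List[Optional[float]]) -> List[float]:
    """Univariate baseline: replace each missing value (None) with the mean of observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

def masked_rmse(truth: List[float], pred: List[float], mask: List[int]) -> float:
    """RMSE computed only on the artificially hidden positions."""
    return math.sqrt(sum((truth[i] - pred[i]) ** 2 for i in mask) / len(mask))

truth = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical pollutant readings
mask = [2, 3]                                  # positions hidden for evaluation
series = [None if i in mask else v for i, v in enumerate(truth)]

baseline = mean_impute(series)
model_preds = list(baseline)
model_preds[2], model_preds[3] = 14.5, 15.0    # hypothetical stand-in for the model's output

rmse_base = masked_rmse(truth, baseline, mask)
rmse_model = masked_rmse(truth, model_preds, mask)
improvement = 100.0 * (rmse_base - rmse_model) / rmse_base
print(f"baseline RMSE={rmse_base:.3f}, model RMSE={rmse_model:.3f}, improvement={improvement:.1f}%")
```

Scoring only the masked positions matters. Including observed values, which every method reproduces exactly, would dilute the differences between imputers.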
Complex dynamic characteristics resulting from multi-system coupling and closed-loop control are ubiquitous in modern industrial process data and present significant challenges for process fault detection. However, conventional data-driven fault detection methods assume the data to be static or only slightly dynamic. To address the complex dynamics and nonlinearity inherent in industrial processes, this paper proposes a dual-attention long short-term memory autoencoder (DALSTM-AE) for fault detection in dynamic processes. Long short-term memory (LSTM) and an autoencoder (AE) are combined into a special encoder-decoder LSTM architecture to learn both dynamic features and deep representations of variables in an unsupervised manner. A dual-attention module is then embedded in the decoder to learn the temporal dependencies associated with long input sequences and retain the most critical information. In addition, two monitoring statistics based on the reconstruction results of the DALSTM-AE model are designed for fault detection. Finally, the effectiveness and superiority of the proposed method are demonstrated through case studies on a numerical simulation example, the Tennessee Eastman (TE) benchmark process, and practical coal pulverizing systems in power plants.
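The abstract does not name the two monitoring statistics. One common choice for reconstruction-based monitoring is the squared prediction error (SPE) of each sample, x minus its reconstruction x_hat. A control limit for it is then set from fault-free training data. The sketch below uses an empirical 99th-percentile limit, which is an assumption; kernel density estimation is another frequent choice. The reconstructions are toy numbers standing in for DALSTM-AE outputs.

```python
from typing import List, Sequence

def spe(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Squared prediction error ||x - x_hat||^2 for one sample."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def empirical_limit(stats: List[float], q: float = 0.99) -> float:
    """Control limit as (approximately) the q-quantile of statistics on normal data.
    The percentile rule is an assumption, not necessarily the paper's choice."""
    ordered = sorted(stats)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]

def detect(stats: List[float], limit: float) -> List[bool]:
    """Flag samples whose statistic exceeds the control limit."""
    return [s > limit for s in stats]

# Toy data: SPE values on fault-free training data (0.00 .. 0.99), then two new samples.
train_spe = [0.01 * i for i in range(100)]
limit = empirical_limit(train_spe, q=0.99)
test_spe = [spe([1.0, 2.0], [1.0, 2.1]),   # well reconstructed: small SPE
            spe([1.0, 2.0], [0.0, 3.0])]   # poorly reconstructed: large SPE
alarms = detect(test_spe, limit)
print(alarms)  # [False, True]
```

A second statistic would typically monitor the latent features rather than the residual, so that faults which shift the representation without inflating the reconstruction error are still caught.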
With the development of communication technologies, the Internet of Things (IoT) has been widely deployed in industrial manufacturing, intelligent transportation, and healthcare systems. The time-series nature of IoT data increases data density and dimensionality, and anomaly detection is important to ensure hardware and software security. However, in general anomaly detection methods, an anomaly may be reconstructed so well that the small differences are hard to discover. Moreover, measuring model complexity and the dataset's feature space is a long and inefficient process. In this paper, we propose a memory-augmented autoencoder approach for detecting anomalies in IoT data that is unsupervised, end-to-end, and not easily overgeneralized. First, a memory mechanism is introduced to suppress the generalization ability of the model, and a memory-augmented time-series autoencoder (TSMAE) is designed. Each memory item is encoded and recombined according to its similarity to the latent representation. The new representation is then decoded to generate the reconstructed sample, from which the anomaly score is obtained. Second, the addressing vector is made sparse by adding penalties and rectification functions to the loss. This encourages the memory modules to extract typical normal patterns and thus inhibits the model's generalization ability. Long short-term memory (LSTM) is used for encoding and decoding to capture the contextual characteristics of time-series data. Finally, experiments on the ECG and Wafer datasets verify the validity of TSMAE. The rationality of the hyperparameter settings is discussed by visualizing the addressing vectors of the memory module.
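The memory addressing described above can be sketched in the style of memory-augmented autoencoders (MemAE):

1. Addressing weights are a softmax over the cosine similarities between the latent code and each memory item.
2. Small weights are zeroed by a ReLU-based hard-shrinkage rectification.
3. The remaining weights are renormalized before the memory items are recombined.

This follows the general MemAE formulation and is not necessarily TSMAE's exact rule. MemAE also penalizes the entropy of the weights during training, but that loss term is not shown here. The memory contents, the threshold `lam`, and `eps` are illustrative assumptions.

```python
import math
from typing import List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def address_memory(z: List[float], memory: List[List[float]],
                   lam: float = 0.3, eps: float = 1e-12) -> Tuple[List[float], List[float]]:
    """MemAE-style addressing (an assumption, not TSMAE's published code).
    Keep lam < 1/len(memory): the largest softmax weight is >= 1/len(memory),
    so at least one weight survives the shrinkage."""
    sims = [cosine(z, m) for m in memory]
    exps = [math.exp(s) for s in sims]
    w = [x / sum(exps) for x in exps]                  # softmax addressing weights
    # Hard shrinkage written with a ReLU: weights at or below lam become exactly zero.
    w = [max(wi - lam, 0.0) * wi / (abs(wi - lam) + eps) for wi in w]
    total = sum(w)
    w = [wi / total for wi in w]                       # renormalise to sum to 1
    # Recombine memory items into the new latent representation.
    z_hat = [sum(wi * m[j] for wi, m in zip(w, memory)) for j in range(len(z))]
    return w, z_hat

memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]         # hypothetical memory items
w, z_hat = address_memory([1.0, 1.0], memory, lam=0.3)
print(w, z_hat)  # [0.5, 0.5, 0.0] [0.5, 0.5]
```

Because the decoder only ever sees combinations of stored (normal) patterns, an anomalous input tends to be reconstructed poorly. That is the intended effect of suppressing generalization.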
This study focuses on detecting suspicious transactions conducted through the opaque and complex electronic channels that have emerged with the advancement of electronic financial technology. A model is proposed that can immediately reflect trends in various types of fund and transaction flows and autonomously learn complex transaction types. As a key outcome, an autoencoder-based internal control model for detecting suspicious transactions under the risk-based approach is constructed to enhance anti-money laundering (AML) operations, and this method surpasses traditional AML methods. In addition, the proposed model facilitates the extraction of candidate factors for suspicious transactions and updates the warning models in AML monitoring systems, enabling the analysis of alert cases. As a result, AML operations based on the proposed model are quantitatively and qualitatively superior to those based on traditional approaches. They are also processed swiftly, because exhaustive examination of suspicious transaction types is avoided. This research provides information that can improve the AML operation systems used in the financial sector by evaluating the risk of suspicious transactions and reflecting various elements of funds and transactions.
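The reconstruction-error idea behind autoencoder-based screening can be illustrated with the simplest case: a linear autoencoder trained with squared error. The optimal code subspace of such an autoencoder is the principal subspace found by PCA. The abstract does not specify the paper's autoencoder, so the sketch below is a stand-in and not its AML model. It works as follows:

1. Standardize hypothetical transaction features.
2. Encode and decode them through the top-k principal directions.
3. Rank transactions by reconstruction error.

The feature matrix, the planted anomaly, and k are assumptions.

```python
import numpy as np

def reconstruction_errors(X: np.ndarray, k: int) -> np.ndarray:
    """Per-row squared reconstruction error of a rank-k linear autoencoder (PCA)."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # avoid division by zero for constant columns
    Z = (X - mu) / sd                      # standardise features
    # Right singular vectors are the principal directions; the top k act as encoder/decoder.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    W = vt[:k].T                           # shape (n_features, k)
    Z_hat = (Z @ W) @ W.T                  # encode, then decode
    return ((Z - Z_hat) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
# Hypothetical features: two quantities that move together for normal transactions.
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base + 0.05 * rng.normal(size=(200, 1))])
X[17] = [2.0, -4.0]                        # planted transaction that breaks the usual relationship
errors = reconstruction_errors(X, k=1)
ranked = np.argsort(errors)[::-1]          # most suspicious first
print(ranked[:3])
```

A nonlinear autoencoder replaces the projection with learned encoder and decoder networks, but the ranking logic stays the same. Transactions that the model of "normal" cannot reproduce are surfaced for review.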
Conventionally, autoencoders are unsupervised representation learning tools. In this work, we propose a novel discriminative autoencoder. The use of supervised discriminative learning ensures that the learned representation is robust to variations commonly encountered in image datasets. Using the basic discriminative autoencoder as a unit, we build a stacked architecture aimed at extracting relevant representations from the training data. The efficiency of our feature extraction algorithm yields high classification accuracy even with simple classifiers such as KNN (k-nearest neighbor). We demonstrate the superiority of our model for representation learning through experiments on standard character/image recognition datasets. We compare it with existing supervised deep architectures such as the class sparse stacked autoencoder and the discriminative deep belief network.
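The final classification stage mentioned above can be as simple as k-nearest neighbors on the learned representations. The sketch below assumes the features have already been extracted by the stacked autoencoder. The toy 2-D codes, the labels, and k=3 are hypothetical. Euclidean distance with majority voting is one standard KNN variant, not necessarily the paper's exact configuration.

```python
import math
from collections import Counter
from typing import List, Sequence

def knn_predict(train_x: List[Sequence[float]], train_y: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training codes closest to the query (Euclidean)."""
    by_dist = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in by_dist[:k])
    return votes.most_common(1)[0][0]

# Hypothetical learned codes for two digit classes.
codes = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
labels = ["3", "3", "3", "8", "8", "8"]
print(knn_predict(codes, labels, [0.15, 0.1]))  # prints 3
```

Because KNN has no trainable parameters, its accuracy depends almost entirely on how well the representation separates the classes. This makes it a reasonable probe of feature quality.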
Spatiotemporal irregularities (i.e., uncommon appearance and motion patterns) in videos are difficult to detect because they are usually not well defined and appear rarely. We tackle this problem by learning normal patterns from regular videos and treating irregularities as deviations from these patterns. To this end, we introduce a 3D fully convolutional autoencoder (3D-FCAE) that is trainable end-to-end and detects both temporal and spatiotemporal irregularities in videos using limited training data. Temporal irregularities can then be detected as frames with high reconstruction errors, and irregular spatiotemporal patterns can be detected as blurry regions that are not well reconstructed. Our approach accurately locates temporal and spatiotemporal irregularities thanks to the 3D fully convolutional autoencoder and the effective architecture we explored. We evaluate the proposed autoencoder for detecting irregular patterns on benchmark video datasets with weak supervision. Comparisons with state-of-the-art approaches demonstrate the effectiveness of our approach. Moreover, the learned autoencoder generalizes well across multiple datasets. (C) 2020 Elsevier Inc. All rights reserved.
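Detecting temporal irregularities as frames with high reconstruction error can be sketched as follows. Given an input clip and its reconstruction, the per-frame error is summed over pixels and min-max normalized into a regularity score. Low scores then mark candidate irregular frames. This normalization is common practice in reconstruction-based video anomaly detection and is an assumption here. The toy 2×2 frames, which stand in for 3D-FCAE input and output, and the 0.5 threshold are also assumptions.

```python
from typing import List

Frame = List[List[float]]

def frame_errors(clip: List[Frame], recon: List[Frame]) -> List[float]:
    """Sum of squared pixel differences for each frame."""
    return [sum((a - b) ** 2
                for row_x, row_r in zip(fx, fr)
                for a, b in zip(row_x, row_r))
            for fx, fr in zip(clip, recon)]

def regularity_scores(errors: List[float]) -> List[float]:
    """Min-max normalised regularity: 1 = most regular, 0 = least regular."""
    lo, hi = min(errors), max(errors)
    span = (hi - lo) or 1.0                # avoid division by zero on a constant clip
    return [1.0 - (e - lo) / span for e in errors]

# Hypothetical clip of three 2x2 frames and its reconstruction.
clip = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
recon = [[[0.0, 0.1], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[0.5, 0.5], [0.5, 0.4]]]
scores = regularity_scores(frame_errors(clip, recon))
irregular = [t for t, s in enumerate(scores) if s < 0.5]   # hypothetical threshold
print(irregular)  # [1]
```

Spatial localization works on the same principle: keep the per-pixel squared error map instead of summing it, and look for regions that were not reconstructed well.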
Unsupervised signal modulation clustering is becoming increasingly important because of its applications in dynamic spectrum access for 5G wireless communication and in physical-layer threat detection for the Internet of Things. Achieving better clustering results requires avoiding feature drift and improving feature separability, which is challenging. This article proposes a novel separable loss function to address this issue. In addition, the high-level semantic properties of modulation types make it difficult for networks to extract their features. An autoencoder structure based on random Fourier features (RffAe) is proposed to simulate the demodulation process of unknown signals. Combined with the separable loss (giving RffAe-S), it has excellent feature extraction ability. Extensive experiments were carried out on RADIOML 2016.10 A and RADIOML 2016.10 B. Experimental evaluations on these datasets show that our approach, RffAe-S, achieves state-of-the-art results compared with classical and the most relevant deep clustering methods.
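Random Fourier features (Rahimi and Recht) map an input x to z(x) = sqrt(2/D) * cos(Wx + b). The rows of W are drawn from N(0, sigma^-2 I), and each entry of b is drawn uniformly from [0, 2π). With this construction, the inner product z(x)·z(y) approximates the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)). The sketch below shows only this standard mapping. The abstract does not describe how RffAe places it inside the autoencoder, and D, sigma, the seed, and the sample points are assumptions.

```python
import math
import random
from typing import Callable, List

def make_rff(dim: int, n_features: int, sigma: float,
             seed: int = 0) -> Callable[[List[float]], List[float]]:
    """Sample a random Fourier feature map approximating an RBF kernel of width sigma."""
    rng = random.Random(seed)
    # Frequencies ~ N(0, 1/sigma^2) per coordinate; phases ~ U[0, 2*pi).
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x: List[float]) -> List[float]:
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_j)
                for w, b_j in zip(W, b)]
    return feature_map

def rbf(x: List[float], y: List[float], sigma: float) -> float:
    """Exact Gaussian (RBF) kernel for comparison."""
    return math.exp(-sum((a - c) ** 2 for a, c in zip(x, y)) / (2.0 * sigma ** 2))

z = make_rff(dim=2, n_features=5000, sigma=1.0, seed=42)
x, y = [0.3, -0.2], [0.8, 0.4]
approx = sum(a * c for a, c in zip(z(x), z(y)))
exact = rbf(x, y, sigma=1.0)
print(f"RFF estimate {approx:.3f} vs exact kernel {exact:.3f}")
```

The approximation error shrinks roughly as 1/sqrt(D), so D trades accuracy for the width of the feature layer.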