With growing dependence on industrial robots, the failure of a single robot may interrupt the current operation or even the manufacturing workflow of an entire production line, which can cause significant economic losses. Hence, it is essential to maintain industrial robots to ensure high-level performance. A real-time technique is widely desired that constantly monitors robots by collecting time series data from them and automatically detects incipient failures before the robots shut down completely. Model-based methods are typically used for anomaly detection in robots, yet they require explicit domain knowledge and accurate mathematical models. Data-driven techniques can overcome these limitations. However, a major difficulty for them is the lack of sufficient fault data from industrial robots. Besides, a technique for anomaly detection in robots should capture not only the temporal dependency in the collected time series data but also the inter-correlations between different metrics. In this paper, we introduce an unsupervised anomaly detection method for industrial robots, the sliding-window convolutional variational autoencoder (SWCVAE), which can realize real-time anomaly detection spatially and temporally by coping with multivariate time series data. This method has been verified on a KUKA KR6R 900SIXX industrial robot, and the results show that the proposed model can successfully detect anomalies in the robot. Thus, this work presents a promising tool for condition-based maintenance of industrial robots.
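The sliding-window idea in the abstract above can be illustrated with a minimal sketch. It only shows how a multivariate series might be cut into overlapping windows and scored; the abstract does not describe the SWCVAE architecture, so the scoring function here is a loudly labeled stand-in (mean squared deviation from a reference vector) and not the authors' model. The names `sliding_windows` and `window_scores` are hypothetical.

```python
import numpy as np


def sliding_windows(series: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Cut a (time_steps, metrics) series into overlapping (window, metrics) slices."""
    if series.ndim != 2:
        raise ValueError("expected an array of shape (time_steps, metrics)")
    n_steps = series.shape[0]
    if window > n_steps:
        raise ValueError("window is longer than the series")
    starts = range(0, n_steps - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])


def window_scores(windows: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Stand-in anomaly score: mean squared deviation of each window from a
    per-metric reference vector. ASSUMPTION: a trained model would replace
    this with its own reconstruction-based score."""
    return np.mean((windows - reference) ** 2, axis=(1, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(size=(200, 3))      # synthetic "healthy" data, 3 metrics
    reference = normal.mean(axis=0)
    test = rng.normal(size=(50, 3))
    test[30:] += 5.0                        # injected synthetic fault
    scores = window_scores(sliding_windows(test, window=10, stride=5), reference)
    print(np.round(scores, 2))              # later windows score higher
```

Each window keeps all metrics together, so a model consuming these slices can see both the temporal pattern and the correlations between metrics within one input.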
We present a novel end-to-end partially supervised deep learning approach for video anomaly detection and localization using only normal samples. The insight that motivates this study is that normal samples can be associated with at least one Gaussian component of a Gaussian Mixture Model (GMM), while anomalies do not belong to any Gaussian component. The method is based on a Gaussian Mixture variational autoencoder, which can learn feature representations of the normal samples as a Gaussian Mixture Model trained using deep learning. A Fully Convolutional Network (FCN), which contains no fully-connected layer, is employed for the encoder-decoder structure to preserve relative spatial coordinates between the input image and the output feature map. Based on the joint probabilities of each of the Gaussian mixture components, we introduce a sample-energy-based method to score the anomaly of image test patches. A two-stream network framework is employed to combine the appearance and motion anomalies, using RGB frames for the former and dynamic flow images for the latter. We test our approach on two popular benchmarks (UCSD Dataset and Avenue Dataset). The experimental results verify the superiority of our method compared to the state of the art.
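As a rough illustration of scoring with a Gaussian mixture, a sample energy is commonly defined as the negative log-likelihood under the mixture, $E(z) = -\log \sum_k \pi_k \, \mathcal{N}(z \mid \mu_k, \Sigma_k)$. The abstract does not give the exact formula used in the paper, so the sketch below assumes this common form and, for simplicity, diagonal covariances; the function name `sample_energy` is hypothetical.

```python
import numpy as np


def sample_energy(z, weights, means, variances) -> float:
    """Negative log-likelihood of z under a diagonal-covariance Gaussian mixture.

    z: (D,) latent vector; weights: (K,) mixing weights summing to 1;
    means, variances: (K, D). ASSUMPTION: diagonal covariances.
    """
    z = np.asarray(z, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Log of each component's normalizing constant and exponent.
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_expo = -0.5 * np.sum((z - means) ** 2 / variances, axis=1)
    log_joint = np.log(weights) + log_norm + log_expo
    # Log-sum-exp over components, shifted by the maximum for stability.
    peak = np.max(log_joint)
    return float(-(peak + np.log(np.sum(np.exp(log_joint - peak)))))


if __name__ == "__main__":
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [4.0, 4.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    print(sample_energy([0.1, -0.2], weights, means, variances))   # near a component
    print(sample_energy([10.0, -10.0], weights, means, variances)) # far from both
```

A sample close to some component receives a low energy, while a sample far from every component receives a high energy and would be flagged once a threshold is chosen.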
Prior information regarding subsurface spatial patterns may be used in geophysical inversion to obtain realistic subsurface models. Field experiments require prior information with sufficiently diverse patterns to accurately estimate the spatial distribution of geophysical properties in the sensed subsurface domain. A variational autoencoder (VAE) provides a way to assemble all patterns deemed possible in a single prior distribution. Such patterns may include those defined by different base training images and also their perturbed versions, for example, those resulting from geologically consistent operations such as erosion/dilation, local deformation, and intrafacies variability. Once the VAE is trained, inversion may be done in the latent space, which ensures that inverted models have the patterns defined by the assembled prior. Gradient-based inversion with both a synthetic and a field case of cross-borehole GPR traveltime data shows that using the VAE-assembled prior performs as well as using the VAE trained on the pattern with the best fit, but it has the advantage of lower computational cost and more realistic prior uncertainty. Moreover, the synthetic case shows an adequate estimation of most small-scale structures. The absolute values of wave velocity are computed by assuming a linear mixing model, which involves two additional parameters that effectively shift and scale velocity values and are included in the inversion.
To quickly estimate tsunami hazards along the coastline, we present a data-driven transfer function method to reconstruct onshore tsunami hazard curves from offshore hazard curves with corresponding topographic and bathymetric data. The transfer function is approximated by a type of artificial neural network called a variational autoencoder (VAE). The VAE first encodes input data, including offshore hazard curves and topographic and bathymetric data. Once encoded, the data are represented by a normal distribution of latent variables. The VAE then uses a trained decoder to sample the distribution created by the latent variables and reconstruct a continuous hazard function at the onshore location. As a probabilistic distribution represents the encoded values, the resulting hazard curve output has inherent stochasticity. Thus, model variance can be found through many realizations of the transfer function for a single set of inputs. We developed a set of transfer functions to accurately predict the onshore hazard curves for (a) onshore flow depth, (b) Froude number (dimensionless velocity), and (c) dimensionless momentum flux. We construct two flow depth transfer functions, with one version utilizing an “anchor point” taken from established site-specific numerical modeling data. The VAEs that predict velocity and momentum flux incorporate an approach that leverages condensed topographic information around the point of interest (topographic rings). The resulting VAEs provide estimates of tsunami hazard with accuracy sufficient to perform impact and risk assessments. Overall, the transfer function method efficiently estimates onshore tsunami hazard curves, together with model uncertainty quantification, without requiring computationally expensive numerical simulations. A data-driven transfer function uses variational autoencoder-based regression to estimate onshore tsunami hazard curves from offshore data. Transfer functions allow the tsunami hazard assessment of coastal are
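The statement that model variance can be found through many realizations for a single set of inputs can be sketched as follows. The encoder outputs (`mu`, `log_var`) and the linear `toy_decoder` are hypothetical placeholders, not the trained networks from the paper; only the sampling pattern (draw latent variables, decode each draw, summarize the spread) is the point.

```python
import numpy as np


def sample_latent(mu, log_var, n_samples, rng):
    """Draw latent vectors z = mu + sigma * eps with eps ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * eps


def toy_decoder(z, weights, bias):
    """ASSUMPTION: a linear map standing in for a trained decoder network."""
    return z @ weights + bias


def curve_statistics(mu, log_var, weights, bias, n_samples=1000, seed=0):
    """Decode many latent draws for one input; return pointwise mean and std."""
    rng = np.random.default_rng(seed)
    curves = toy_decoder(sample_latent(mu, log_var, n_samples, rng), weights, bias)
    return curves.mean(axis=0), curves.std(axis=0)


if __name__ == "__main__":
    mu = np.array([0.2, -0.1])             # hypothetical encoder mean
    log_var = np.array([-2.0, -1.0])       # hypothetical encoder log-variance
    weights = np.linspace(0.1, 1.0, 10).reshape(2, 5)
    bias = np.zeros(5)
    mean_curve, std_curve = curve_statistics(mu, log_var, weights, bias)
    print(mean_curve.shape, std_curve.shape)
```

The pointwise standard deviation across realizations serves as a simple uncertainty band around the mean curve.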
Topic models are widely explored for summarizing a corpus of documents. Recent advances in variational autoencoders (VAEs) have enabled the development of black-box inference methods for topic modeling in order to alleviate the drawbacks of classical statistical inference. Most existing VAE-based approaches assume a unimodal Gaussian distribution for the approximate posterior of latent variables, which limits the flexibility in encoding the latent space. In addition, the unsupervised architecture hinders the incorporation of extra label information, which is ubiquitous in many applications. In this paper, we propose a semi-supervised topic model under the VAE framework. We assume that a document is modeled as a mixture of classes, and a class is modeled as a mixture of latent topics. A multimodal Gaussian mixture model is adopted for the latent space. The parameters of the components and the mixing weights are encoded separately. These weights, together with partially labeled data, also contribute to the training of a classifier. The objective is derived under the Gaussian mixture assumption and the semi-supervised VAE framework. Modules of the proposed framework are appropriately designated. Experiments performed on three benchmark datasets demonstrate the effectiveness of our method compared with several competitive baselines.
An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which utilized acoustic features of different properties, to improve the performance of VAE-VC. We believed that the success came from more disentangled latent representations. In this article, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech. More specifically, we first investigate the effectiveness of incorporating generative adversarial networks (GANs) with CDVAE-VC. Then, we consider the concept of domain adversarial training and add an explicit constraint to the latent representation, realized by a speaker classifier, to explicitly eliminate the speaker information that resides in the latent code. Experimental results confirm that the degree of disentanglement of the learned latent representation can be enhanced by both GANs and the speaker classifier. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of our proposed methods.
Online behavior recommendation is an attractive research topic related to social media mining. This topic focuses on suggesting suitable behaviors for users on online platforms, including music listening, video watching, and e-commerce, to name but a few, in order to improve the user experience, an essential factor for the success of online services. A successful online behavior recommendation system should be able to predict behaviors that users have performed before and also suggest behaviors that users have never performed. In this paper, we develop a mixture model that contains two components to address this problem. The first component is the user-specific preference component, which represents the habits of users based on their behavior history. The second component is the latent group preference component, based on a variational autoencoder, a deep generative neural network. This component corresponds to the hidden interests of users and allows us to discover the unseen behavior of users. We conduct experiments on various real-world datasets with different characteristics to show the performance of our model in different situations. The results indicate that our proposed model outperforms the previous mixture models for the recommendation problem.
This paper describes a statistically-principled semi-supervised method of automatic chord estimation (ACE) that can make effective use of music signals regardless of the availability of chord annotations. The typical approach to ACE is to train a deep classification model (neural chord estimator) in a supervised manner by using only annotated music signals. In this discriminative approach, prior knowledge about chord label sequences (model output) has scarcely been taken into account. In contrast, we propose a unified generative and discriminative approach in the framework of amortized variational inference. More specifically, we formulate a deep generative model that represents the generative process of chroma vectors (observed variables) from discrete labels and continuous features (latent variables), which are assumed to follow a Markov model favoring self-transitions and a standard Gaussian distribution, respectively. Given chroma vectors as observed data, the posterior distributions of the latent labels and features are computed approximately by using deep classification and recognition models, respectively. These three models form a variational autoencoder and can be trained jointly in a semi-supervised manner. The experimental results show that the regularization of the classification model based on the Markov prior of chord labels and the generative model of chroma vectors improved the performance of ACE even under the supervised condition. The semi-supervised learning using additional non-annotated data can further improve the performance.
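A Markov model favoring self-transitions, as mentioned in the abstract above, can be illustrated with a small transition matrix whose diagonal carries most of the probability mass. The stay probability, the uniform spread over the other labels, and the uniform initial distribution are all assumptions made for this sketch; the abstract does not specify them, and the function names are hypothetical.

```python
import numpy as np


def self_transition_matrix(n_labels: int, stay_prob: float) -> np.ndarray:
    """Transition matrix with `stay_prob` on the diagonal and the remaining
    mass spread uniformly over the other labels (ASSUMPTION)."""
    if n_labels < 2 or not 0.0 < stay_prob < 1.0:
        raise ValueError("need n_labels >= 2 and 0 < stay_prob < 1")
    off_diagonal = (1.0 - stay_prob) / (n_labels - 1)
    matrix = np.full((n_labels, n_labels), off_diagonal)
    np.fill_diagonal(matrix, stay_prob)
    return matrix


def log_prior(labels, transition: np.ndarray) -> float:
    """Log-probability of a label sequence under the Markov prior, with a
    uniform initial distribution (ASSUMPTION)."""
    labels = np.asarray(labels)
    log_p = -np.log(transition.shape[0])
    log_p += np.sum(np.log(transition[labels[:-1], labels[1:]]))
    return float(log_p)


if __name__ == "__main__":
    transition = self_transition_matrix(n_labels=3, stay_prob=0.9)
    print(log_prior([0, 0, 0, 0], transition))  # stable sequence
    print(log_prior([0, 1, 0, 1], transition))  # rapidly switching sequence
```

Under such a prior, a label sequence that holds each chord for several frames is more probable than one that switches at every frame, which is the kind of regularization the abstract attributes to the Markov prior.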
Image captioning, i.e., generating natural semantic descriptions of a given image, is an essential task for machines to understand the content of the image. Remote sensing image captioning is a part of this field. Most current remote sensing image captioning models suffer from overfitting and fail to utilize the semantic information in images. To this end, we propose a Variational Autoencoder and Reinforcement Learning based Two-stage Multi-task Learning Model (VRTMM) for the remote sensing image captioning task. In the first stage, we fine-tune the CNN jointly with the variational autoencoder. In the second stage, the Transformer generates the text description using both spatial and semantic features. Reinforcement Learning is then applied to enhance the quality of the generated sentences. Our model surpasses the previous state-of-the-art records by a large margin on all seven scores on the Remote Sensing Image Caption Dataset. The experimental results indicate that our model is effective for remote sensing image captioning and achieves a new state-of-the-art result. (C) 2020 Elsevier B.V. All rights reserved.
Recently, deep generative models have become increasingly popular in unsupervised anomaly detection. However, deep generative models aim at recovering the data distribution rather than detecting anomalies. Moreover, deep generative models run the risk of overfitting training samples, which has disastrous effects on anomaly detection performance. To solve these two problems, we propose a self-adversarial variational autoencoder (adVAE) with a Gaussian anomaly prior assumption. We assume that both the anomalous and the normal prior distributions are Gaussian and overlap in the latent space. Therefore, a Gaussian transformer net T is trained to synthesize anomalous but near-normal latent variables. Keeping the original training objective of a variational autoencoder, a generator G tries to distinguish between the normal latent variables encoded by E and the anomalous latent variables synthesized by T, and the encoder E is trained to discriminate whether the output of G is real. The new objectives we added not only give both G and E the ability to discriminate, but also act as an additional regularization mechanism to prevent overfitting. Compared with other competitive methods, the proposed model achieves significant improvements in extensive experiments. The employed datasets and our model are available in a GitHub repository. (C) 2019 Elsevier B.V. All rights reserved.