ISBN: 9781728173030 (print)
The curse of dimensionality is a fundamental difficulty in anomaly detection for high-dimensional data. Autoencoder-based approaches are an elegant way to deal with this problem. However, existing works require a clean training dataset, which is not always guaranteed in real scenarios. In this paper, we propose a novel anomaly detection method named RVAE-ABFA (robust variational autoencoder with attention based feature adaptation for high dimensional data anomaly detection), which significantly improves anomaly detection performance when the training data is contaminated. Rather than relying only on the reconstruction error, we also take into account the low-dimensional embeddings learned by the variational autoencoder. In RVAE-ABFA, these embeddings help to detect anomalies in contaminated data because of the ability of variational inference. We also propose an ABFA (attention based feature adaptation) mechanism to adjust the weights of the low-dimensional embeddings and the reconstruction error. Furthermore, we adopt the adversarial training criterion to perform variational inference with an adversarial network named RAAE-ABFA (robust adversarial autoencoder with attention based feature adaptation for high dimensional data anomaly detection), which can generate extra samples when the training data is insufficient. Experimental results on several benchmark datasets show that the proposed method significantly outperforms state-of-the-art unsupervised anomaly detection methods and is more robust when the training data is contaminated.
ISBN: 9781450379649 (print)
Molecule generation aims to design new molecules with specific chemical properties and to further optimize the desired properties. Following previous work, we encode molecules into continuous vectors in the latent space and then decode the embedding vectors into molecules under the variational autoencoder (VAE) framework. We investigate the posterior collapse problem of the RNN-based VAEs currently in wide use for molecule sequence generation. For the first time, we point out that the underestimated reconstruction loss of VAEs leads to posterior collapse, and we provide both analytical and experimental evidence to support our findings. To fix the problem and avoid posterior collapse, we propose an effective and efficient solution in this work. Without bells and whistles, our method achieves state-of-the-art reconstruction accuracy and a competitive validity score on the ZINC 250K dataset. When generating 10,000 unique valid molecule sequences from random prior sampling, JT-VAE takes 1450 seconds while our method needs only 9 seconds on a regular desktop machine.
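As background for the posterior collapse mentioned in this abstract, the sketch below computes the standard closed-form KL term of a VAE with a diagonal Gaussian posterior and a standard normal prior. It is a generic illustration, not the paper's method or fix; the function name and the example values are assumptions chosen for illustration. A collapsed posterior matches the prior, so its KL term is zero and the latent code carries no information about the input.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


# Hypothetical posterior parameters for a 2-dimensional latent space.
# A collapsed posterior equals the prior: mean 0 and log-variance 0.
collapsed = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# An informative posterior moves away from the prior, so its KL term is positive.
informative = kl_to_standard_normal([1.5, -0.5], [-1.0, -2.0])

print(f"collapsed KL:   {collapsed:.4f}")
print(f"informative KL: {informative:.4f}")
```

Monitoring this term during training is one common way to notice collapse: a value that stays near zero suggests the decoder is ignoring the latent variables.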
ISBN: 9781728156750 (print)
Deep learning (DL) has recently been used in several applications of machine health monitoring systems. Unfortunately, most of these DL models are considered black boxes with low interpretability. In this research, we propose an original PHM framework based on visual data analysis. The most suitable space for data visualization is the 2D space, which necessarily involves a significant reduction from a high-dimensional to a low-dimensional data space. To perform the data analysis and the diagnostic interpretation in a PHM framework, a variational autoencoder (VAE) is used jointly with a classifier. The proposed model was evaluated on the automatic recognition of individual Partial Discharge (PD) sources for hydro-generator monitoring.
ISBN: 9781450370233 (print)
Social media has become popular and has percolated almost all aspects of our daily lives. While online posting proves very convenient for individual users, it also fosters the fast spread of various rumors. The rapid and wide percolation of rumors can cause persistent adverse or detrimental impacts. Therefore, researchers invest great effort in reducing the negative impacts of rumors. To this end, a rumor classification system aims to detect, track, and verify rumors in social media. Such systems typically include four components: (i) a rumor detector, (ii) a rumor tracker, (iii) a stance classifier, and (iv) a veracity classifier. To improve the state of the art in rumor detection, tracking, and verification, we propose VRoC, a tweet-level variational autoencoder-based rumor classification system. VRoC consists of a co-train engine that trains variational autoencoders (VAEs) and rumor classification components. The co-train engine helps the VAEs tune their latent representations to be classifier-friendly. We also show that VRoC is able to classify unseen rumors with high levels of accuracy. For the PHEME dataset, VRoC consistently outperforms several state-of-the-art techniques, on both observed and unobserved rumors, by up to 26.9% in terms of macro-F1 scores.
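The macro-F1 score cited in this abstract is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The sketch below shows the computation in plain Python; the labels and predictions are made-up illustrations, not data from PHEME or from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical tweet-level labels, for illustration only.
y_true = ["rumor", "rumor", "non-rumor", "non-rumor", "non-rumor"]
y_pred = ["rumor", "non-rumor", "non-rumor", "non-rumor", "rumor"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1: {score:.4f}")
```

In this toy case the per-class F1 scores are 1/2 for "rumor" and 2/3 for "non-rumor", giving a macro-F1 of 7/12.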
ISBN: 9781509066315 (print)
We investigate the effect of variational autoencoder (VAE) based data anonymization and its ability to preserve anomalous subgroup properties. We present a Utility Guaranteed Deep Privacy (UGDP) system which casts existing anomalous pattern detection methods as a new utility measure for data synthesis. UGDP's approach shows that the properties of an anomalous subset of records, identified in the original dataset, are preserved through anonymization by a VAE, even though the newly generated records are completely synthetic. More specifically, the Bias-Scan algorithm identifies a subgroup of records that are consistently over- (or under-) risked by a black-box classifier as an area of 'poor fit'. This scanning process is applied to both pre- and post-VAE synthesized data. The areas of poor fit (i.e., anomalous records) persist in both settings. We evaluate our approach using publicly available datasets from the financial industry. Our evaluation confirms that the approach is able to produce synthetic datasets that preserve a high level of the subgroup differentiation initially identified in the original dataset. This distinction is maintained even though the records in the synthetic and original datasets are distinctly different.
ISBN: 9783030630065; 9783030630072 (print)
Speech separation plays an important role in speech-related systems since it can denoise, extract, and enhance speech signals. In recent years, many methods have been proposed to separate the human voice from noise and other sounds. To separate speech from a complicated signal, we propose a more powerful method that uses a VAE model followed by postprocessing with a bandpass filter. This combination can be used to extract the original human speech from a mixture containing not only high-frequency noise but also many different sounds. Our approach can be flexibly applied to new background sounds.
ISBN: 9781728160429 (print)
In this paper, an optimization algorithm for time synchronization in telecommunication networks is proposed based on the VAE (variational autoencoder) framework. First, features are represented in the latent space under the proposed framework while the performance of the synchronization network is measured and evaluated. Second, an optimization algorithm is designed in which the features of abnormal samples and of a benchmark are adaptively merged, allowing smooth, low-risk adjustment in practical network operation. Meanwhile, taking the characteristics of the synchronization network into account as domain knowledge, a novel metric is adopted to reduce fluctuation in the adjustment. Simulation results verify that the performance of the synchronization network is significantly improved by optimization templates reconstructed through the decoder of the VAE model. The results imply that prior knowledge of synchronization is introduced into the latent space with a certain degree of interpretability for assessing monitoring performance, while optimization adjustments can be carried out properly through the novel metric proposed in this algorithm.
ISBN: 9781728169262 (print)
Missing data is common in real-world datasets and is usually handled with imputation strategies that replace the missing values with new data. Recently, generative models such as variational autoencoders have been applied to this imputation task. However, they were always used to perform the entire imputation, which has produced limited results when compared with other state-of-the-art methods. In this work, a new approach called Variational Autoencoder Filter for Bayesian Ridge Imputation is introduced. It uses a variational autoencoder at the beginning of the imputation pipeline to filter the instances that are later fitted to a Bayesian ridge regression, which is used to predict the new values. The approach was compared to four state-of-the-art imputation methods using 10 datasets from the healthcare context covering clinical trials, all injected with missing values at different rates. The proposed approach significantly outperformed the remaining methods in all settings, achieving an overall improvement between 26% and 67%.
Grasp planning, and more specifically grasp space exploration, is still an open issue in robotics. This article presents a data-driven methodology to model the grasp space of a multi-fingered adaptive gripper for known objects. The method relies on a limited dataset of manually specified expert grasps and uses a variational autoencoder to learn the intrinsic features of grasps in a computationally compact way. The learnt model can then be used to generate new gripper configurations not seen during training in order to explore the grasp space.
ISBN: 9781728185262 (print)
Uncertainty in observations about the state of affairs is unavoidable and generally undesirable, so we are motivated to try to minimize its effect on data analysis. Detection of anomalies in data has become an important research area. In this paper, we propose a novel approach to anomaly detection based on the variational autoencoder method with a Mish activation function and a Negative Log-Likelihood loss function. The proposed method is validated on ten standard datasets, comparing performance across various activation functions and loss functions. Experimental results show that our proposed method offers an improvement over existing methods. Statistical properties (i.e., F1 score, AUC, and ROC) of the method are also examined in light of the experimental results.
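For readers unfamiliar with the two ingredients named in this abstract, the sketch below gives plain-Python definitions. Mish is the published activation x * tanh(softplus(x)). The abstract does not state which likelihood its loss uses, so the Gaussian form shown here is an assumption for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under N(mean, var).

    Assumption: a Gaussian likelihood; the abstract does not specify one.
    """
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


for value in (-2.0, 0.0, 1.0, 3.0):
    print(f"mish({value:+.1f}) = {mish(value):+.4f}")

# A point far from the predicted mean has a larger NLL, which is why the
# NLL can double as an anomaly score.
typical = gaussian_nll(0.1, mean=0.0, var=1.0)
outlier = gaussian_nll(4.0, mean=0.0, var=1.0)
print(f"typical NLL: {typical:.4f}, outlier NLL: {outlier:.4f}")
```

Unlike ReLU, Mish is smooth and lets small negative values pass through, approaching zero for large negative inputs and the identity for large positive ones.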