ISBN: 9789082797091 (print)
Generative models, such as variational autoencoders, are increasingly used for acoustic modeling tasks such as anomaly detection from audio signals. Motivated by this, we propose a convolutional variational autoencoder (CVAE) to detect and predict relapses in patients with psychotic disorders, such as schizophrenia and bipolar disorder. The proposed system uses speech segments of patients, isolated from interviews with their clinicians that contain spontaneous speech, and represented as log-mel spectrograms. The results from the analysis of each segment are then aggregated on a per-interview basis. We explore the performance of our system in both a personalized and a universal (patient-independent) setup. Evaluation of our method on data from 13 patients and 375 interviews, with a total duration of 30,509 s of isolated speech, indicates that the CVAE achieves results similar to a convolutional autoencoder (CAE) baseline in a personalized setup. Furthermore, the proposed model significantly outperforms the CAE baseline in a universal relapse detection setup.
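The per-interview aggregation step described above can be sketched as follows. This is a minimal illustration, assuming each speech segment already has a scalar anomaly score (for example, a reconstruction error). The abstract does not state the aggregation rule or decision threshold, so the mean, the threshold value, and every function name here are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_interview(segment_scores, interview_ids):
    """Average per-segment anomaly scores within each interview (assumed rule)."""
    if len(segment_scores) != len(interview_ids):
        raise ValueError("one interview id is required per segment score")
    grouped = defaultdict(list)
    for score, interview in zip(segment_scores, interview_ids):
        grouped[interview].append(score)
    return {interview: mean(scores) for interview, scores in grouped.items()}


def flag_relapse(interview_scores, threshold):
    """Mark an interview as anomalous when its mean score exceeds the threshold."""
    return {interview: score > threshold
            for interview, score in interview_scores.items()}


# Hypothetical scores for five segments drawn from two interviews.
scores = [0.10, 0.20, 0.90, 0.80, 0.70]
ids = ["a", "a", "b", "b", "b"]
per_interview = aggregate_by_interview(scores, ids)
flags = flag_relapse(per_interview, threshold=0.5)
```

In a personalized setup the threshold would presumably be chosen per patient, while a universal setup would share one threshold across patients; either choice is outside what the abstract specifies.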
ISBN: 9783031147142; 9783031147135 (print)
Nature has spent billions of years perfecting our genetic representations, making them evolvable and expressive. Generative machine learning offers a shortcut: learn an evolvable latent space with implicit biases towards better solutions. We present SOLVE: Search space Optimization with Latent Variable Evolution, which creates a dataset of solutions that satisfy extra problem criteria or heuristics, generates a new latent search space, and uses a genetic algorithm to search within this new space to find solutions that meet the overall objective. We investigate SOLVE on five sets of criteria designed to detrimentally affect the search space and explain how this approach can be easily extended as the problems become more complex. We show that, compared to an identical GA using a standard representation, SOLVE with its learned latent representation can meet extra criteria and find solutions up to two orders of magnitude closer to the optimum. We demonstrate that SOLVE achieves its results by creating better search spaces that focus on desirable regions, reduce discontinuities, and enable improved search by the genetic algorithm.
Variational autoencoders have recently been proposed for the problem of process monitoring. While these works show impressive results over classical methods, the proposed monitoring statistics often ignore the inconsistencies in learned lower-dimensional representations and computational limitations in high-dimensional approximations. In this work, we first demonstrate these issues and then overcome them with a novel statistic formulation that increases out-of-control detection accuracy without compromising computational efficiency. We demonstrate our results on a simulation study with explicit control over latent variations, and a real-life example of image profiles obtained from a hot steel rolling process.
We present a machine learning approach that expedites structure-property analysis in materials, bypassing traditional feature extraction and exploratory data analysis techniques. This objective is accomplished by employing a variational autoencoder (VAE) structure that is modified to include a regressor network for property prediction (VAE-Regression). This modification allows for direct linkage of imaged features and quantitative part properties within the VAE latent space. We first demonstrate our approach using 2D optical micrographs and corresponding four-point bend fatigue life data from laser beam powder bed fusion additively manufactured Ti-6Al-4V coupons. The VAE-Regression model extracts spatial features, predicts fatigue life, and identifies porosity defect features governing fatigue behavior, such as pore clusters, pores near sample edges, and jagged pore morphologies. These features corroborate fatigue literature on physics-based modeling and experimentation. We then demonstrate the versatility of our methodology using binder jet additively manufactured WC-Co coupons, where porosity and microstructural discontinuities are known to lower the three-point bend transverse rupture strength, but the interactions between the WC and Co are yet to be completely understood. We attempted to understand these interactions using our VAE-Regression architecture. Within our dataset, we show that coarser WC grains surrounded by larger Co pools indicate lower strength, while finer WC grains with smaller Co pools indicate higher strength. This machine learning approach using image-based data will likely prove to be critical in understanding and identifying structure-property relationships in new materials and manufacturing processes.
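One way a regressor can be tied into the VAE latent space is through a joint objective. The sketch below assumes a loss of the form reconstruction + beta * KL + gamma * regression error. The abstract does not give the actual loss, so the mean-squared-error terms, the weights `beta` and `gamma`, and all function names are assumptions. Only the closed-form KL divergence between a diagonal Gaussian and the standard normal is a standard result.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def vae_regression_loss(x, x_recon, mu, log_var, y_pred, y_true,
                        beta=1.0, gamma=1.0):
    """Assumed joint objective: reconstruction + beta * KL + gamma * regression."""
    recon = mse(x_recon, x)                  # image reconstruction term (assumed MSE)
    kl = kl_to_standard_normal(mu, log_var)  # latent regularization term
    reg = mse(y_pred, y_true)                # property prediction term (assumed MSE)
    return recon + beta * kl + gamma * reg
```

Because the regression term back-propagates through the encoder, latent dimensions are encouraged to organize around the predicted property, which is the kind of linkage between imaged features and part properties the abstract describes.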
Agricultural image recognition tasks are becoming increasingly dependent on systems based on deep learning (DL); however, despite the excellent performance of DL, it is difficult to comprehend the logic or input-image features it uses during decision making. Knowing the logic or features is crucial for result verification, algorithm improvement, training data improvement, and knowledge extraction. However, the explanations from current heatmap-based algorithms are insufficient for these requirements. To address this, this paper details the development of a classification and explanation method based on a variational autoencoder (VAE) architecture, which visualizes variations of the most important features through the generated images that correspond to those variations. Using the PlantVillage dataset, an acceptable level of explainability was achieved without sacrificing classification accuracy. The proposed method can also be extended to other crops as well as other image classification tasks. Further, application systems using this method for disease identification tasks, such as the identification of potato blackleg disease, potato virus Y, and other image classification tasks, are currently being developed.
ISBN: 9781728193878 (digital); 9781728193878 (print)
This paper proposes an anomaly detection scheme for multilevel converters based on a wavelet packet transform and variational autoencoder (WPT-VAE). The wavelet packet transform is used for dimensionality reduction and feature extraction of raw signals. The extracted features are normalized and then sent to the VAE to perform further feature extraction and waveform regeneration. The effectiveness of the proposed method is verified by experiments on a five-level nested neutral-point-piloted (NNPP) converter. The normal dataset is used for model training, while a mixed dataset composed of normal and abnormal data is used for testing. The results show that the proposed WPT-VAE exhibits superior performance in anomaly detection compared with a widely used classification algorithm. Abnormal data can be quickly and accurately distinguished from normal data, allowing early intervention to prevent serious faults, which gives the method good practical value.
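The feature-extraction and normalization stages can be illustrated with a small wavelet packet decomposition. This is a sketch under stated assumptions: the abstract names neither the wavelet nor the features, so the Haar wavelet, the use of sub-band energies as features, and the normalization to unit sum are all illustrative choices, and the function names are hypothetical.

```python
import math


def haar_split(signal):
    """One orthonormal Haar step: return (approximation, detail) at half length."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_energies(signal, levels):
    """Split every node at every level and return the energy of each leaf node.

    Leaves are returned in natural (filter-bank) order, not frequency order.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            # Unlike the plain wavelet transform, both branches are split again.
            next_nodes.extend(haar_split(node))
        nodes = next_nodes
    return [sum(v * v for v in node) for node in nodes]


def normalize(features):
    """Scale non-negative features so they sum to one (assumed normalization)."""
    total = sum(features)
    return [f / total for f in features] if total > 0 else list(features)
```

A raw window of N samples becomes 2**levels energy features, which is the dimensionality reduction the abstract refers to; the normalized vector would then be the VAE input.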
ISBN: 9781450387323 (print)
Zero-Shot Cross-Modal Retrieval (ZS-CMR) has recently drawn increasing attention as it focuses on a practical retrieval scenario, i.e., the multimodal test set consists of unseen classes that are disjoint from the seen classes in the training set. Recently proposed methods typically adopt a generative model as the main framework to learn a joint latent embedding space that alleviates the modality gap. Generally, these methods rely heavily on auxiliary semantic embeddings for knowledge transfer across classes and neglect the effect of the data reconstruction manner in the adopted generative model. To address this issue, we propose a novel ZS-CMR model termed Multimodal Disentanglement Variational Autoencoders (MD-VAE), which consists of two coupled disentanglement variational autoencoders (DVAEs) and a fusion-exchange VAE (FVAE). Specifically, DVAE is developed to disentangle the original representations of each modality into modality-invariant and modality-specific features. FVAE is designed to fuse and exchange information of multimodal data through the reconstruction and alignment process without pre-extracted semantic embeddings. Moreover, an advanced counter-intuitive cross-reconstruction scheme is further proposed to enhance the informativeness and generalizability of the modality-invariant features for more effective knowledge transfer. Comprehensive experiments on four image-text retrieval and two image-sketch retrieval datasets consistently demonstrate that our method establishes new state-of-the-art performance.
ISBN: 9781728153681 (print)
The rapid synthesis of radar waveform modulations is key to enabling a radar to react to the environment in order to optimize performance. This paper proposes the use of generative models for radar waveform generation. Specifically, variational autoencoders (VAEs) comprising neural networks that are trained with a novel reconstruction loss are proposed. It is shown for simple classes of non-linear FM waveforms that the decoder from the proposed VAE can generate new radar waveform modulations that possess required ambiguity function characteristics, even though they were not represented in the training data.
To enhance flexibility and facilitate resource cooperation, a novel fully-decoupled radio access network (FD-RAN) architecture is proposed for 6G. However, the decoupling of uplink (UL) and downlink (DL) in FD-RAN makes the existing feedback mechanism ineffective. To this end, we propose an end-to-end data-driven MIMO solution without the conventional channel feedback procedure. Data-driven MIMO can alleviate the drawbacks of feedback, including overhead and delay, and can provide customized precoding design for different BSs based on their historical channel data. It essentially learns a mapping from geolocation to MIMO transmission parameters. We first present a codebook-based approach, which selects transmission parameters from the statistics of discrete channel state information (CSI) values and utilizes nearest neighbor interpolation for spatial inference. We further present a non-codebook-based approach, which 1) derives the optimal precoder from the singular value decomposition (SVD) of the channel; 2) utilizes a variational autoencoder (VAE) to select the representative precoder from the latent Gaussian representations; and 3) exploits Gaussian process regression (GPR) to predict unknown precoders in the space domain. Extensive simulations are performed on a link-level 5G simulator using realistic ray-tracing channel data. The results demonstrate the effectiveness of data-driven MIMO, showcasing its potential for application in FD-RAN and 6G.
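The spatial-inference step of the codebook-based approach, nearest neighbor interpolation from geolocation to a transmission parameter, can be sketched as below. This is a minimal illustration only: the coordinate format, the use of Euclidean distance, the example codeword indices, and the function name are assumptions not taken from the abstract.

```python
import math


def nearest_neighbor_codeword(query, known_locations, codeword_indices):
    """Return the codeword index stored at the known location closest to `query`."""
    if not known_locations or len(known_locations) != len(codeword_indices):
        raise ValueError("need one codeword index per known location")
    best = min(range(len(known_locations)),
               key=lambda i: math.dist(query, known_locations[i]))
    return codeword_indices[best]


# Hypothetical survey points (x, y in meters) and the codeword chosen at each
# from historical CSI statistics.
locations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
codewords = [3, 7, 1]
selected = nearest_neighbor_codeword((9.0, 1.0), locations, codewords)
```

Nearest neighbor lookup suits the codebook case because codeword indices are discrete and cannot be averaged; the non-codebook approach instead predicts continuous precoders, which is where the abstract brings in Gaussian process regression.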
ISBN: 9781450390965 (print)
While vast amounts of personal data are shared daily on public online platforms and used by companies and analysts to gain valuable insights, privacy concerns are also on the rise: Modern authorship attribution techniques have proven effective at identifying individuals from their data, such as their writing style or behavior of picking and judging movies. It is hence crucial to develop data sanitization methods that allow sharing of users' data while protecting their privacy and preserving quality and content of the original data. In this paper, we tackle anonymization of textual data and propose an end-to-end differentially private variational autoencoder architecture. Unlike previous approaches that achieve differential privacy on a per-word level through individual perturbations, our solution works at an abstract level by perturbing the latent vectors that provide a global summary of the input texts. Decoding an obfuscated latent vector thus not only allows our model to produce coherent, high-quality output text that is human-readable, but also results in strong anonymization due to the diversity of the produced data. We evaluate our approach on IMDb movie and Yelp business reviews, confirming its anonymization capabilities and preservation of the semantics and utility of the original sentences.
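Perturbing a latent vector under differential privacy can be illustrated with the standard Laplace mechanism. The abstract does not say which mechanism or clipping rule the authors use, so the L1 clipping, the noise scale, and the function names below are assumptions. What is standard is that clipping to L1 norm at most `bound` limits the L1 distance between any two latent vectors to `2 * bound`, so Laplace noise of scale `2 * bound / epsilon` per coordinate gives epsilon-differential privacy for the released vector.

```python
import random


def clip_l1(vector, bound):
    """Rescale the vector so its L1 norm is at most `bound`."""
    norm = sum(abs(v) for v in vector)
    if norm <= bound or norm == 0:
        return list(vector)
    scale = bound / norm
    return [v * scale for v in vector]


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials of mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_latent(vector, bound, epsilon, rng):
    """Clip the latent vector, then add Laplace noise calibrated to sensitivity 2 * bound."""
    clipped = clip_l1(vector, bound)
    scale = 2.0 * bound / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]


# Hypothetical 4-dimensional latent summary of one input text.
rng = random.Random(0)
noisy = privatize_latent([0.5, -1.5, 2.0, 0.0], bound=1.0, epsilon=5.0, rng=rng)
```

The noisy vector would then be passed to the decoder; since noise is added once per text rather than once per word, the output can stay coherent, which is the contrast with word-level perturbation that the abstract draws.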