ISBN: 9783030880521; 9783030880514 (print)
Automatic speech recognition (ASR) applications are ubiquitous these days. A variety of commercial products utilize powerful ASR capabilities to transcribe user speech. However, as with other deep learning models, the techniques underlying ASR models are vulnerable to adversarial example (AE) attacks. Audio AEs sound unsuspicious to the casual listener but are incorrectly transcribed by an ASR system. Existing black-box AE techniques require an excessive number of requests to be sent to the targeted system. Such suspicious behavior can potentially trigger a threat alert on the system. This paper proposes a method of generating black-box AEs that significantly reduces the required number of requests. We describe our proposed method and present experimental results demonstrating its effectiveness in generating word-level and sentence-level AEs that are incorrectly transcribed by an ASR system.
ISBN: 9781728170664 (print)
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model has been shown in several works. As the spectral model, previous works used either factorial generative models of the mixed speech or discriminative neural networks. In our work, we combine the strengths of both approaches by building a factorial model based on a generative neural network, a variational autoencoder. By doing so, we can exploit the modeling power of neural networks while keeping a structured model. Such a model can be advantageous when adapting to new noise conditions, as only the noise part of the model needs to be modified. We show experimentally that our model significantly outperforms the previous factorial model based on a Gaussian mixture model (DOLPHIN), performs comparably to the integration of permutation invariant training with spatial clustering, and enables us to easily adapt to new noise conditions.
ISBN: 9781665408981 (print)
The variational autoencoder (VAE) is considered an emerging model for achieving competitive performance in recommender systems. However, its performance is severely limited by the number of training examples and, as a result, existing VAE models may fail to provide satisfactory recommendation results in the presence of highly sparse user-item interactions. In this paper, we propose a self-supervised VAE model, SSVAE for short, to improve the generalization ability of the VAE model on sparse interaction datasets. Concretely, we first build multiple views for each user by data augmentation, and then design a pretext task to align the representations learned from the different views of each user. In particular, SSVAE optimizes a combined objective of the recommendation task and the pretext task, so that they reinforce each other during the learning process. Our encouraging experimental results on three real-world benchmarks validate the superiority of our SSVAE model over state-of-the-art VAE-style recommendation techniques.
ISBN: 9781728190778 (print)
Grasp planning, and more specifically grasp space exploration, is still an open issue in robotics. This article presents an efficient procedure for exploring the grasp space of a multifingered adaptive gripper in order to generate reliable grasps given a known object pose. This procedure relies on a limited dataset of manually specified expert grasps and uses a mixed analytic and data-driven approach based on a grasp quality metric and variational autoencoders. The performance of this method is assessed by generating grasps in simulation for three different objects. On this grasp planning task, the method reaches a grasp success rate of 99.91% on 7000 trials.
ISBN: 9783030863623; 9783030863616 (print)
Real-world data are typically described using multiple modalities or multiple types of descriptors, which are considered as multiple views. The data from different modalities lie in different subspaces, so the representations associated with similar semantics differ. To solve this problem, many approaches have been proposed for learning a fusion representation from multi-view data. Although they achieve some effectiveness, most existing models lack precision for gradient diffusion. We propose the Asymmetric Multimodal Variational Autoencoder (AMVAE) to reduce this effect. The proposed model has two key components: multiple autoencoders and a multimodal variational autoencoder. The multiple autoencoders are responsible for encoding view-specific data, while the multimodal variational autoencoder guides the generation of the fusion representation. The proposed model effectively solves the problem of low precision. The experimental results show that our method achieves state-of-the-art results on several benchmark datasets for both clustering and classification tasks.
ISBN: 9781665441155 (print)
In this paper, we present an end-to-end unsupervised anomaly detection framework for 3D point clouds. To the best of our knowledge, this is the first work to tackle the anomaly detection task on a general object represented by a 3D point cloud. We propose a deep variational autoencoder-based unsupervised anomaly detection network adapted to 3D point clouds, together with an anomaly score designed specifically for 3D point clouds. To verify the effectiveness of the model, we conducted extensive experiments on the ShapeNet dataset. Through quantitative and qualitative evaluation, we demonstrate that the proposed method outperforms the baseline method.
ISBN: 9780738133669 (print)
Recommender Systems (RSs) are valuable technologies that help users in their decision-making process. Generally, RSs are designed with the assumption that a central server stores and manages users' historical behaviors. However, users are nowadays more aware of privacy issues, leading to a higher demand for privacy-preserving technologies. To cope with this issue, the Federated Learning (FL) paradigm can provide good performance without harming the users' privacy. In recent years, some efforts have been devoted to adapting standard collaborative filtering methods (e.g., matrix factorization) to the FL framework. In this paper, we present a Federated Variational Autoencoder for Collaborative Filtering (FedVAE), which extends the state-of-the-art MultVAE model. Additionally, we propose an adaptive learning rate schedule to accelerate learning. We also discuss the potential privacy-preserving capabilities of FedVAE. An extensive experimental evaluation on five benchmark data sets shows that our proposal can achieve performance close to MultVAE in a reasonable number of iterations. We also empirically demonstrate that the adaptive learning rate provides both accelerated learning and good stability.
This work considers industrial process monitoring using a variational autoencoder (VAE). As a powerful deep generative model, the variational autoencoder and its variants have become popular for process monitoring. However, its monitoring ability, especially its fault diagnosis ability, has not been well investigated. In this paper, the process modeling and monitoring capabilities of several VAE variants are comprehensively studied. First, fault detection schemes are defined in three distinct ways, considering latent, residual, and the combined domains. Afterwards, to conduct the fault diagnosis, we first define the deep contribution plot, and then a deep reconstruction-based contribution diagram is proposed for deep domains under the fault propagation mechanism. In a case study, the performance of the process monitoring capability of four deep VAE models, namely, the static VAE model, the dynamic VAE model, and the recurrent VAE models (LSTM-VAE and GRU-VAE), has been comparatively evaluated on the industrial benchmark Tennessee Eastman process. Results show that recurrent VAEs with a deep reconstruction-based diagnosis mechanism are recommended for industrial process monitoring tasks.
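The three detection domains named in this abstract (latent, residual, combined) can be illustrated with a minimal Python sketch. It is not the authors' scheme: the statistic definitions (squared latent norm, squared prediction error), the empirical-quantile control limits, the "either index exceeds its limit" rule for the combined domain, and the toy `encode`/`decode` functions are all assumptions made for illustration.

```python
def latent_statistic(z):
    """Squared norm of the latent code (a T^2-like index under a standard normal prior)."""
    return sum(v * v for v in z)


def residual_statistic(x, x_hat):
    """Squared prediction error (SPE) between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def empirical_limit(values, quantile=0.99):
    """Control limit taken as an empirical quantile of normal-operation statistics."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def detect(x, encode, decode, latent_limit, residual_limit):
    """Return fault flags for the latent, residual, and combined domains."""
    z = encode(x)
    x_hat = decode(z)
    latent_fault = latent_statistic(z) > latent_limit
    residual_fault = residual_statistic(x, x_hat) > residual_limit
    # Assumption: the combined domain raises an alarm if either index does.
    return {
        "latent": latent_fault,
        "residual": residual_fault,
        "combined": latent_fault or residual_fault,
    }


# Hypothetical stand-ins for a trained VAE's encoder mean and decoder.
def encode(x):
    return [(x[0] + x[1]) / 2.0]


def decode(z):
    return [z[0], z[0]]


# Hypothetical normal-operation data used to set the control limits.
normal = [[0.1, 0.1], [-0.2, -0.2], [0.3, 0.3], [0.0, 0.05]]
latent_limit = empirical_limit([latent_statistic(encode(x)) for x in normal])
residual_limit = empirical_limit(
    [residual_statistic(x, decode(encode(x))) for x in normal]
)

# A sample whose two variables break their usual correlation.
flags = detect([1.0, -1.0], encode, decode, latent_limit, residual_limit)
```

In this toy setup the faulty sample maps to a latent code of zero, so only the residual index reacts, which is one reason a combined scheme can be useful.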
ISBN: 9781728195438 (print)
Proper hygienic behaviour is essential in today's world, especially during an ongoing pandemic. Wearing a mask became mandatory in many countries during the COVID-19 pandemic. Recognizing whether people are wearing masks is a complicated image recognition task that can be facilitated and automated with machine learning techniques. Camera streams are widely available in indoor environments and can be used for object detection and image processing. Convolutional neural networks have been successfully applied to image classification and object recognition tasks in various application areas. Pre-trained, openly available general-purpose convolutional neural networks can be used as a starting point for specific applications. A number of different image datasets are also available for research and industrial purposes. The InceptionV3 neural network architecture was tailored, using transfer learning techniques, to determine whether a mask is being worn. A variational autoencoder was also trained to normalize the dataset with respect to skin colour, head angle, and other parameters. This paper describes the implementation of mask recognition software using transfer learning, a convolutional neural network, and a variational autoencoder.
ISBN: 9780738133669 (print)
Generative models have always attracted the attention of the machine learning research community; they are useful and also generally harder than their discriminative counterparts. In these models, we learn the probability distribution of the input and sample from it to generate new data samples. Since quantum computing and quantum algorithms are inherently random, they can provide a natural framework for this setting. However, finding a suitable gate circuit that achieves the requisite quantum state, which by repeated preparation and measurement yields the sought-after data samples, is not trivial. In this paper, we propose a quantum circuit that has the flavor of a variational autoencoder, with the usual visible and hidden nodes for the input data and the latent distribution. The encoder portion consists of a suitably chosen parameterized phase ansatz and Inverse Quantum Fourier Transform blocks. The decoder circuit, which in our case is not simply the inverse of the encoder, is configured depending on whether or not the measurement is carried out on the hidden nodes. The Kullback-Leibler divergence is used to train the circuit towards the required input distribution. The numerical results presented demonstrate the correct functionality of the approach.
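As a rough illustration of using the Kullback-Leibler divergence as a training signal, the Python sketch below compares a target distribution with the empirical distribution of measurement outcomes. It does not reproduce the paper's circuit or optimizer: the target probabilities, the list of shots, and the helper names are hypothetical, and the direction KL(target || measured) is an assumption.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid log(0).
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0
    )


def empirical_distribution(samples, num_outcomes):
    """Relative frequency of each outcome 0..num_outcomes-1 in the measured samples."""
    counts = Counter(samples)
    total = len(samples)
    return [counts.get(k, 0) / total for k in range(num_outcomes)]


# Hypothetical target distribution over the four outcomes of two measured qubits.
target = [0.5, 0.25, 0.125, 0.125]

# Hypothetical measurement record standing in for repeated circuit runs.
shots = [0] * 4 + [1] * 2 + [2] * 1 + [3] * 1

measured = empirical_distribution(shots, 4)
loss = kl_divergence(target, measured)
```

A training loop would adjust the circuit parameters to reduce `loss`; here the measured frequencies already match the target, so the divergence is zero.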