ISBN: (print) 9783030880521; 9783030880514
Automatic speech recognition (ASR) applications are ubiquitous these days. A variety of commercial products utilize powerful ASR capabilities to transcribe user speech. However, as with other deep learning models, the techniques underlying ASR models are vulnerable to adversarial example (AE) attacks. Audio AEs resemble non-suspicious audio to the casual listener, but will be incorrectly transcribed by an ASR system. Existing black-box AE techniques require an excessive number of requests to be sent to the targeted system. Such suspicious behavior can potentially trigger a threat alert on the system. This paper proposes a method of generating black-box AEs in a way that significantly reduces the required number of requests. We describe our proposed method and present experimental results demonstrating its effectiveness in generating word-level and sentence-level AEs that are incorrectly transcribed by an ASR system.
ISBN: (print) 9781728170664
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model has been shown in several works. As the spectral model, previous works used either factorial generative models of the mixed speech or discriminative neural networks. In our work, we combine the strengths of both approaches by building a factorial model based on a generative neural network, a variational autoencoder. By doing so, we can exploit the modeling power of neural networks while keeping a structured model. Such a model can be advantageous when adapting to new noise conditions, as only the noise part of the model needs to be modified. We show experimentally that our model significantly outperforms a previous factorial model based on a Gaussian mixture model (DOLPHIN), performs comparably to the integration of permutation invariant training with spatial clustering, and enables us to easily adapt to new noise conditions.
ISBN: (print) 9781665408981
The variational autoencoder (VAE) is considered an emerging model for achieving competitive performance in recommender systems. However, its performance is severely limited by the number of training examples and, as a result, existing VAE models may fail to provide satisfactory recommendation results in the presence of highly sparse user-item interactions. In this paper, we propose a self-supervised VAE model, SSVAE for short, to improve the generalization ability of the VAE model on sparse interaction datasets. Concretely, we first build multiple views for each user by data augmentation, and then design a pretext task to align the representations learned from different views of each user. In particular, SSVAE optimizes a combined objective of the recommendation task and the pretext task, so that they reinforce each other during the learning process. Our encouraging experimental results on three real-world benchmarks validate the superiority of our SSVAE model over state-of-the-art VAE-style recommendation techniques.
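One way to read the combined objective described in the abstract above is as a standard VAE loss plus a weighted alignment term. The form below is only an illustrative sketch: the abstract does not give the exact loss, so the weights $\beta$ and $\lambda$, the view notation $x^{(1)}, x^{(2)}$, and the generic pretext loss $\mathcal{L}_{\text{pretext}}$ are all assumptions introduced here.

```latex
% Illustrative only: symbols beta, lambda, x^{(1)}, x^{(2)} are assumptions, not taken from the paper.
\mathcal{L}(x)
  = \underbrace{-\,\mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
      + \beta\,\mathrm{KL}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr)}_{\text{recommendation (VAE) loss}}
  \;+\; \lambda\,\mathcal{L}_{\text{pretext}}\bigl(x^{(1)}, x^{(2)}\bigr)
```

Here $x$ is a user's interaction vector, $x^{(1)}$ and $x^{(2)}$ are two augmented views of that user, and $\lambda$ trades off the two tasks.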
ISBN: (print) 9781728190778
Grasp planning, and more specifically grasp space exploration, is still an open issue in robotics. This article presents an efficient procedure for exploring the grasp space of a multifingered adaptive gripper for generating reliable grasps given a known object pose. This procedure relies on a limited dataset of manually specified expert grasps and uses a mixed analytic and data-driven approach based on a grasp quality metric and variational autoencoders. The performance of this method is assessed by generating grasps in simulation for three different objects. On this grasp planning task, the method reaches a grasp success rate of 99.91% over 7000 trials.
ISBN: (print) 9783030863623; 9783030863616
Real-world data are typically described using multiple modalities or multiple types of descriptors that are considered as multiple views. The data from different modalities lie in different subspaces, so the representations associated with similar semantics differ. To solve this problem, many approaches have been proposed for learning a fusion representation from multi-view data. Although effective, most existing models lack precision for gradient diffusion. We propose the Asymmetric Multimodal Variational Autoencoder (AMVAE) to reduce this effect. The proposed model has two key components: multiple autoencoders and a multimodal variational autoencoder. The multiple autoencoders are responsible for encoding view-specific data, while the multimodal variational autoencoder guides the generation of the fusion representation. The proposed model effectively solves the problem of low precision. The experimental results show that our method achieves state-of-the-art results on several benchmark datasets for both clustering and classification tasks.
ISBN: (print) 9781665441155
In this paper, we present an end-to-end unsupervised anomaly detection framework for 3D point clouds. To the best of our knowledge, this is the first work to tackle the anomaly detection task on a general object represented by a 3D point cloud. We propose a deep variational autoencoder-based unsupervised anomaly detection network adapted to 3D point clouds, together with an anomaly score designed specifically for 3D point clouds. To verify the effectiveness of the model, we conducted extensive experiments on the ShapeNet dataset. Through quantitative and qualitative evaluation, we demonstrate that the proposed method outperforms the baseline method.
ISBN: (print) 9780738133669
Recommender Systems (RSs) are valuable technologies that help users in their decision-making process. Generally, RSs are designed with the assumption that a central server stores and manages users' historical behaviors. However, users are nowadays more aware of privacy issues, leading to a higher demand for privacy-preserving technologies. To cope with this issue, the Federated Learning (FL) paradigm can provide good performance without harming users' privacy. In recent years, some efforts have been devoted to adapting standard collaborative filtering methods (e.g., matrix factorization) to the FL framework. In this paper, we present a Federated Variational Autoencoder for Collaborative Filtering (FedVAE), which extends the state-of-the-art MultVAE model. Additionally, we propose an adaptive learning rate schedule to accelerate learning. We also discuss the potential privacy-preserving capabilities of FedVAE. An extensive experimental evaluation on five benchmark data sets shows that our proposal can achieve performance close to MultVAE in a reasonable number of iterations. We also empirically demonstrate that the adaptive learning rate guarantees both accelerated learning and good stability.
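To make the federated setting in the abstract above concrete, here is a minimal sketch of one server-side aggregation step. It assumes plain federated averaging (weighting each client by its number of local examples); the abstract does not state which aggregation rule FedVAE uses, and the function name `federated_average` and the flat-list parameter representation are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Assumption: each client sends a flat list of model parameters and the
    number of local examples it trained on. Raw interaction data never
    leaves the client; only parameters are shared.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one size per non-empty list of client weights")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of equal length")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


if __name__ == "__main__":
    # Two clients; the second holds three times as much data as the first.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In a full system the server would broadcast the averaged vector back to the clients for the next round of local training.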
Purpose: Prior studies on the application of deep-learning techniques have focused on enhancing computation algorithms. However, the amount of data is also a key element when attempting to achieve a goal using a quantitative approach, and it is often underestimated in practice. The problem of sparse sales data is well known in the valuation of commercial properties. This study aims to expand the limited data available to exploit the capability inherent in deep learning techniques. Design/methodology/approach: A deep learning approach is used. First, Seoul, the capital of South Korea, is selected as the case study area. Second, data augmentation is performed for properties with low trade volume in the market using a variational autoencoder (VAE), which is a generative deep learning technique. Third, the generated samples are added to the original dataset of commercial properties to alleviate data insufficiency. Finally, the accuracy of the price estimation is analyzed for the original and augmented datasets to assess the model performance. Findings: The results, using the sales datasets of commercial properties in Seoul, South Korea, as a case study, show that the dataset augmented by a VAE consistently yields higher price-estimation accuracy in all 30 trials, and that the capabilities inherent in deep learning techniques can be fully exploited, promoting the rapid adoption of artificial intelligence skills in the real estate industry. Originality/value: Although deep learning-based algorithms are gaining popularity, they are likely to show limited performance when data are insufficient. This study suggests an alternative approach to overcome the lack of data in property valuation.
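The augmentation step described in the abstract above (generate samples with a VAE, then add them to the original dataset) can be sketched as follows. This is an assumption-laden illustration, not the study's implementation: `augment_with_decoder` and `toy_decoder` are hypothetical names, latent codes are drawn from a standard normal prior, and the toy decoder merely stands in for a trained VAE decoder.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def augment_with_decoder(original: Sequence[Row],
                         decode: Callable[[List[float]], Row],
                         n_new: int,
                         latent_dim: int,
                         seed: int = 0) -> List[Row]:
    """Return the original rows followed by n_new generated rows.

    Each generated row is obtained by sampling z ~ N(0, I) and passing it
    through `decode`, which is assumed to be a trained VAE decoder.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    generated = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        generated.append(decode(z))
    return list(original) + generated


def toy_decoder(z: List[float]) -> Row:
    """Hypothetical stand-in for a trained decoder (a fixed affine map)."""
    return [2.0 * z[0] + 1.0, -z[1]]


if __name__ == "__main__":
    sales = [[0.0, 0.0]]  # placeholder feature rows, not real sales data
    augmented = augment_with_decoder(sales, toy_decoder, n_new=3, latent_dim=2)
    print(len(augmented))
```

A price-estimation model would then be trained once on the original rows and once on the augmented rows, and the two accuracies compared.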
This work considers industrial process monitoring using a variational autoencoder (VAE). As a powerful deep generative model, the variational autoencoder and its variants have become popular for process monitoring. However, its monitoring ability, especially its fault diagnosis ability, has not been well investigated. In this paper, the process modeling and monitoring capabilities of several VAE variants are comprehensively studied. First, fault detection schemes are defined in three distinct ways, considering the latent, residual, and combined domains. Afterwards, to conduct the fault diagnosis, we first define the deep contribution plot, and then propose a deep reconstruction-based contribution diagram for deep domains under the fault propagation mechanism. In a case study, the process monitoring capability of four deep VAE models, namely, the static VAE model, the dynamic VAE model, and the recurrent VAE models (LSTM-VAE and GRU-VAE), is comparatively evaluated on the industrial benchmark Tennessee Eastman process. Results show that recurrent VAEs with a deep reconstruction-based diagnosis mechanism are recommended for industrial process monitoring tasks.
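The three detection domains named in the abstract above can be illustrated with conventional monitoring statistics. The sketch below is an assumption, not the paper's definitions: it uses a squared-norm statistic on the latent code (as if the latent prior were a standard normal), a squared prediction error on the reconstruction residual, and a limit-normalized sum as the combined index. All function names and control limits are hypothetical.

```python
from typing import Sequence


def latent_statistic(z: Sequence[float]) -> float:
    """Squared norm of the latent code (T^2-like, identity covariance assumed)."""
    return sum(v * v for v in z)


def residual_statistic(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Squared prediction error between a sample and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("sample and reconstruction must have equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def combined_statistic(t2: float, spe: float,
                       t2_limit: float, spe_limit: float) -> float:
    """Sum of both statistics, each scaled by its own control limit."""
    return t2 / t2_limit + spe / spe_limit


def is_fault(statistic: float, limit: float) -> bool:
    """Flag a fault when the statistic exceeds its control limit."""
    return statistic > limit


if __name__ == "__main__":
    z = [3.0, 4.0]                           # latent code from an encoder
    x, x_hat = [1.0, 2.0], [1.0, 0.0]        # sample and its reconstruction
    t2 = latent_statistic(z)
    spe = residual_statistic(x, x_hat)
    print(t2, spe, combined_statistic(t2, spe, t2_limit=50.0, spe_limit=8.0))
```

In practice the control limits would be estimated from fault-free training data, for example as a high quantile of each statistic.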
ISBN: (print) 9781728195438
Proper hygienic behaviour is essential in today's world, especially during an ongoing pandemic. Wearing a mask became mandatory in many countries during the COVID-19 pandemic. Recognizing whether people are wearing masks is a complicated image recognition task that can be facilitated and automated with machine learning techniques. Camera streams, which can be used for object detection and image processing, are widely available in indoor environments. Convolutional neural networks have been successfully applied to image classification and object recognition tasks in various application areas. Pretrained, openly available general-purpose convolutional neural networks can be used as a starting point for specific applications. A number of different image datasets are also available for research and industrial purposes. The InceptionV3 neural network architecture was tailored, using transfer learning techniques, to determine whether a mask is being worn. A variational autoencoder was also trained to normalize the dataset with respect to skin colour and the angle of the head, among other parameters. This paper describes the implementation of mask recognition software using transfer learning, a convolutional neural network and a variational autoencoder.