ISBN (print): 9783030880521; 9783030880514
Automatic speech recognition (ASR) applications are ubiquitous these days. A variety of commercial products use powerful ASR capabilities to transcribe user speech. However, as with other deep learning models, the techniques underlying ASR models are vulnerable to adversarial example (AE) attacks. Audio AEs sound unsuspicious to a casual listener but are transcribed incorrectly by an ASR system. Existing black-box AE techniques require sending an excessive number of requests to the targeted system, and such suspicious behavior can potentially trigger a threat alert on that system. This paper proposes a method of generating black-box AEs that significantly reduces the number of requests required. We describe the proposed method and present experimental results demonstrating its effectiveness in generating word-level and sentence-level AEs that are incorrectly transcribed by an ASR system.
ISBN (print): 9781538685495
In this paper, we propose a method for electricity price execution inspection that uses variational autoencoder technology from deep learning. The variational autoencoder based anomaly detection algorithm (VABAD) can serve both as a discriminative model and as a generative model, which effectively addresses the difficulty of computing with the multiple heterogeneous parameters involved in current electricity price execution inspection. The reconstruction probability, which is used by autoencoder-based anomaly detection methods, is a probabilistic measure that takes into account the variability of the distribution of variables. Experimental results validate the proposed method and compare it with existing approaches. The data used in this paper come from the Power Marketing System in Liaoning, China, in 2015.
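The reconstruction probability can be estimated by Monte Carlo sampling. The encoder maps x to a Gaussian over the latent z, several z samples are drawn, each is decoded into a Gaussian over x, and the likelihood of x under those Gaussians is averaged. The sketch below averages log-likelihoods rather than raw probabilities for numerical stability, which is a common variant. It assumes a 2-D toy problem whose normal data lie on the line x2 = x1. The functions toy_encoder and toy_decoder are hypothetical stand-ins for trained networks, not VABAD itself. Lower scores indicate more anomalous inputs.

```python
import math
import random

def gaussian_log_pdf(x, mu, sigma):
    """Log-density of x under a diagonal Gaussian N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mu, sigma)
    )

def toy_encoder(x):
    # Hypothetical "trained" encoder: 1-D latent = position along the line x2 = x1.
    return [(x[0] + x[1]) / 2], [0.1]

def toy_decoder(z):
    # Hypothetical "trained" decoder: maps the latent back onto the line x2 = x1.
    return [z[0], z[0]], [0.1, 0.1]

def reconstruction_log_prob(x, encoder, decoder, n_samples=50, seed=0):
    """Monte Carlo estimate of the average reconstruction log-likelihood of x."""
    rng = random.Random(seed)
    mu_z, sigma_z = encoder(x)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(m, s) for m, s in zip(mu_z, sigma_z)]  # sample z ~ q(z | x)
        mu_x, sigma_x = decoder(z)                            # p(x | z) parameters
        total += gaussian_log_pdf(x, mu_x, sigma_x)
    return total / n_samples

normal = reconstruction_log_prob([1.0, 1.0], toy_encoder, toy_decoder)    # on the manifold
anomaly = reconstruction_log_prob([1.0, -1.0], toy_encoder, toy_decoder)  # off the manifold
```

The anomalous point is projected onto the normal-data line and reconstructed far from where it actually lies, so its score is much lower. A threshold on this score then separates normal records from anomalous ones.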
ISBN (print): 9781665405409
Recently, the variational autoencoder (VAE), a deep representation learning (DRL) model, has been used to perform speech enhancement (SE). However, to the best of our knowledge, current VAE-based SE methods apply the VAE only to model the speech signal, while noise is modeled with the traditional non-negative matrix factorization (NMF) model. One of the main reasons for using NMF is that these VAE-based methods cannot disentangle the speech and noise latent variables from the observed signal. Based on Bayesian theory, this paper derives a novel variational lower bound for the VAE, which allows the VAE to be trained in a supervised manner and to disentangle the speech and noise latent variables from the observed signal. As a result, the proposed method can use the VAE to model both speech and noise signals, which is fundamentally different from previous VAE-based SE work. More specifically, the proposed DRL method learns to impose speech and noise signal priors on different sets of latent variables for SE. The experimental results show that the proposed method not only disentangles the speech and noise latent variables from the observed signal but also obtains a higher scale-invariant signal-to-distortion ratio (SI-SDR) and speech quality score than a comparable deep neural network (DNN)-based SE method.
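The scale-invariant signal-to-distortion ratio used in the evaluation projects the enhanced signal onto the clean reference. It then compares the energy of that projection with the energy of the remaining distortion. The minimal sketch below follows that standard definition. As a simplification, it skips the mean removal that many implementations apply to both signals first. The example signals are toy values.

```python
import math

def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB (no mean removal)."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                          # optimal gain applied to the reference
    target = [alpha * r for r in reference]           # part of the estimate explained by the reference
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    return 10 * math.log10(target_energy / distortion_energy)

clean = [1.0, 2.0, -1.0, 0.5]
enhanced = [0.9, 2.1, -1.2, 0.4]
score = si_sdr(clean, enhanced)
```

Because the reference is rescaled by its optimal gain, multiplying the estimate by a constant leaves the score unchanged. This is why SI-SDR is preferred over plain SNR when an enhancement model may alter the output level.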
ISBN (print): 9781665488679
Music accompaniment generation is a crucial aspect of the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to incorporate human emotions effectively when creating beautiful accompaniments. Existing models struggle to characterize human emotions within neural network models while composing music. To address this issue, we propose using an easy-to-represent emotion flow model, the Valence/Arousal Curve, which makes emotional information compatible with the model through data transformation, and we enhance the interpretability of emotional factors by using a variational autoencoder as the model structure. Furthermore, we use relative self-attention to maintain the structure of the music at the phrase level and, combined with rules of music theory, to generate a richer accompaniment. Our experimental results indicate that the emotional flow of the music generated by our model correlates strongly with the input emotion, demonstrating the model's strong interpretability and control over emotional flow. The generated music is also well structured, diverse, and dynamic, outperforming the baseline models.
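Relative self-attention, used for music by the Music Transformer, adds a learned term that depends only on the distance j - i between positions. This helps a model reproduce phrase-level repetition regardless of absolute position. The naive O(T^2) single-head sketch below adds a relative-position logit q_i · r(j - i) to the usual content logit q_i · k_j, with distances clipped to ±max_dist. The rel_emb table and the toy inputs are hypothetical. The sketch omits the multi-head projections and the memory-efficient "skewing" trick a real model would use, and it is not the paper's architecture.

```python
import math

def relative_self_attention(q, k, v, rel_emb, max_dist):
    """Single-head attention with relative-position logits.

    q, k, v: lists of T vectors (lists of floats); q and k have dimension d.
    rel_emb: dict mapping each clipped distance in [-max_dist, max_dist] to a d-vector.
    """
    t, d = len(q), len(q[0])
    out = []
    for i in range(t):
        logits = []
        for j in range(t):
            dist = max(-max_dist, min(max_dist, j - i))        # clipped relative distance
            content = sum(a * b for a, b in zip(q[i], k[j]))   # what is at position j
            position = sum(a * b for a, b in zip(q[i], rel_emb[dist]))  # how far away it is
            logits.append((content + position) / math.sqrt(d))
        m = max(logits)                                        # stable softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out

# Toy setup: content logits are zero, and distance -1 ("previous step") is strongly favoured.
rel_emb = {-2: [0.0, 0.0], -1: [50.0, 0.0], 0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}
q = [[1.0, 0.0]] * 3
k = [[0.0, 0.0]] * 3
v = [[0.0], [1.0], [2.0]]
result = relative_self_attention(q, k, v, rel_emb, max_dist=2)
```

In this toy setup, positions 1 and 2 copy the value from the preceding step. Position 0 has no predecessor, so it averages all three values.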
ISBN (print): 9781479975143
A linear motion (LM) guide is a component that guides the directional movement of a machine. Judging the anomaly state of LM guides is important because they are used in many industries to support a variety of tasks. In this paper, we propose a machine learning algorithm for determining the anomaly state of an LM guide. Because anomaly signals are difficult to generate in practice, we train the model using only healthy-state data. A variational autoencoder, a type of generative model, is trained on the healthy-state data to learn their distribution. The trained model then determines whether an anomaly state has occurred based on the reconstruction error of the trained network.
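Because only healthy-state data are available for training, the decision threshold on reconstruction error must also come from healthy data. One common heuristic, assumed here rather than taken from the paper, sets the threshold at the mean plus k standard deviations of the healthy reconstruction errors. The error values below are made-up placeholders standing in for a trained VAE's outputs.

```python
import statistics

def reconstruction_error(x, x_hat):
    """Mean squared error between a signal window and its VAE reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def fit_threshold(healthy_errors, k=3.0):
    """Threshold = mean + k * sample standard deviation of healthy-data errors."""
    return statistics.mean(healthy_errors) + k * statistics.stdev(healthy_errors)

def is_anomalous(error, threshold):
    """Flag a window whose reconstruction error exceeds the healthy-data threshold."""
    return error > threshold

# Hypothetical reconstruction errors of a trained VAE on healthy LM-guide signal windows.
healthy = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.12, 0.11]
threshold = fit_threshold(healthy)  # roughly 0.149 for these values
```

The factor k trades false alarms against missed anomalies. Since no labelled anomalies exist at training time, it is usually fixed a priori or tuned on the few anomalies collected later.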
ISBN (print): 9783030630065; 9783030630072
Speech separation plays an important role in speech-related systems because it can denoise, extract, and enhance speech signals. In recent years, many methods have been proposed to separate the human voice from noise and other sounds. To separate speech from a complicated signal, we propose a more powerful method that uses a VAE model followed by postprocessing with a bandpass filter. This combination can be used to extract the original human speech from mixtures containing not only high-frequency noise but also many other kinds of sounds. Our approach can be flexibly applied to new background sounds.
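The abstract does not state the filter design or band edges. As an illustration only, the sketch below assumes a second-order IIR band-pass filter from the RBJ Audio EQ Cookbook ("constant 0 dB peak gain" form). It uses a 300-3400 Hz pass band (the classic telephone speech band) at a 16 kHz sampling rate, applied to the VAE output. A single biquad has gentle skirts, and its Q is derived from an analog approximation, so the band edges are only approximate. A higher-order design may be preferable in practice.

```python
import math

def bandpass_biquad(signal, fs, f_low, f_high):
    """Second-order band-pass IIR filter (RBJ cookbook, 0 dB peak gain)."""
    f0 = math.sqrt(f_low * f_high)      # geometric centre of the pass band
    q = f0 / (f_high - f_low)           # quality factor from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b1, b2 = alpha / a0, 0.0, -alpha / a0          # normalised feed-forward taps
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0  # normalised feedback taps
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        # Direct Form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def power_ratio(freq, fs=16000, n=16000, skip=2000):
    """Steady-state output/input power for a pure tone (skips the start-up transient)."""
    x = [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
    y = bandpass_biquad(x, fs, 300.0, 3400.0)
    p_in = sum(s * s for s in x[skip:])
    p_out = sum(s * s for s in y[skip:])
    return p_out / p_in
```

A 1 kHz tone passes almost unchanged, while 50 Hz hum and a 7.5 kHz tone are strongly attenuated.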
ISBN (print): 9798350317107; 9798350317114
Ultrasound imaging has become a preferred medical diagnostic tool for many applications due to its cost-effectiveness, non-ionizing nature, and real-time capabilities. There has been significant progress in the development of new ultrasound probes and systems, particularly portable and wearable devices that incorporate new transducer technologies, sophisticated electronics integration, artificial intelligence, and advanced beamforming strategies. Wearable ultrasound systems equipped with wireless data transfer interfaces offer unique advantages for continuous monitoring of patients' critical conditions both in and out of hospital settings. However, many challenges, specifically in data rate reduction for wireless real-time systems, remain to be explored. To address this issue, in this paper we present a vector quantized variational autoencoder (VQ-VAE) model that effectively compresses ultrasound RF signals without compromising image quality. We tested and evaluated the performance of the model on real ultrasound datasets. The experimental results demonstrate a 92% data reduction, enabling real-time imaging speeds over wireless channels.
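In a VQ-VAE, each latent vector from the encoder is replaced by its nearest codebook entry, so only the integer index needs to be transmitted. The receiver looks the vector up in the same codebook and decodes it. The sketch below shows the nearest-neighbour lookup and the bit-count arithmetic behind a data-reduction figure. The frame size, latent count, and codebook size are hypothetical and are not the paper's configuration.

```python
import math

def quantize(latents, codebook):
    """Index of the nearest codebook entry (squared Euclidean distance) for each latent."""
    indices = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices

def data_reduction(n_samples, bits_per_sample, n_latents, codebook_size):
    """Fraction of bits saved by sending codebook indices instead of raw samples."""
    raw_bits = n_samples * bits_per_sample
    compressed_bits = n_latents * math.ceil(math.log2(codebook_size))
    return 1 - compressed_bits / raw_bits

# Hypothetical frame: 1024 16-bit RF samples -> 128 latents indexed into a 512-entry codebook.
reduction = data_reduction(1024, 16, 128, 512)  # 1 - 1152/16384 = 0.9296875
```

For reference, a 92% reduction means transmitting 8% of the original bits, which is a compression ratio of 1/0.08 = 12.5:1.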
ISBN (print): 9781728170664
In this paper, we propose a method that combines a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model has been shown in several works. As the spectral model, previous works used either factorial generative models of the mixed speech or discriminative neural networks. In our work, we combine the strengths of both approaches by building a factorial model based on a generative neural network, a variational autoencoder. By doing so, we can exploit the modeling power of neural networks while keeping a structured model. Such a model can be advantageous when adapting to new noise conditions, as only the noise part of the model needs to be modified. We show experimentally that our model significantly outperforms a previous factorial model based on a Gaussian mixture model (DOLPHIN), performs comparably to the integration of permutation invariant training with spatial clustering, and enables easy adaptation to new noise conditions.
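In integrated spatial-spectral models, each time-frequency bin is typically assigned to a source through a posterior. That posterior multiplies a prior, a spatial likelihood, and a spectral likelihood, then normalizes across sources. The sketch below shows only this generic E-step-style combination for a single bin. The numeric log-likelihoods are placeholders. In the paper, the spectral term would come from the VAE-based factorial model, which is not reproduced here.

```python
import math

def source_posteriors(priors, spatial_loglik, spectral_loglik):
    """Posterior probability of each source for one time-frequency bin.

    Adds log prior, spatial log-likelihood and spectral log-likelihood per source,
    then normalizes with a max-shifted softmax for numerical stability.
    """
    scores = [math.log(p) + a + b for p, a, b in zip(priors, spatial_loglik, spectral_loglik)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical bin with two sources: index 0 = speech, index 1 = noise.
post = source_posteriors([0.5, 0.5], [-1.0, -3.0], [-2.0, -2.5])
```

Because the noise source enters only through its own likelihood terms, swapping in a new noise model leaves the speech terms untouched. This mirrors the adaptation advantage the abstract describes.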
ISBN (print): 9798331505356; 9798331505349
Dimensionality reduction using a variational autoencoder (VAE) is widely employed to learn diverse state representations, for example in autonomous driving tasks. Conventional VAE-based dimensionality reduction is commonly used to reduce the computational cost of learning from high-dimensional data, particularly image data, while maintaining high performance. In this paper, we investigate how integrating the VAE with Squeeze-and-Excitation Networks (SENet), referred to as SENet-VAE, affects the accuracy of learning driving behaviors in deep reinforcement learning. We conduct a series of experiments comparing three setups: raw image data, a conventional VAE, and SENet-VAE. Additionally, we explore the effect of weighting the Kullback-Leibler (KL) divergence term in the SENet-VAE objective function with hyperparameters to further optimize performance. Our results demonstrate that SENet-VAE outperforms the conventional VAE in learning accuracy, and that hyperparameter tuning yields further performance gains.
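Scaling the KL divergence term with a weight, as in the beta-VAE formulation, is one standard way to realize the hyperparameter tuning described above. The paper's exact objective and values are not stated here, so beta below is a free assumption. The sketch uses the closed-form KL divergence between a diagonal Gaussian posterior, parameterized by mean and log-variance, and a standard normal prior.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def weighted_vae_loss(recon_loss, mu, logvar, beta=1.0):
    """Negative ELBO with the KL term scaled by beta (beta = 1 gives the standard VAE)."""
    return recon_loss + beta * gaussian_kl(mu, logvar)
```

With beta greater than 1, the latent code is pushed closer to the prior, giving a more compact but blurrier representation. Smaller beta favors reconstruction detail, which is why the value is worth tuning for a downstream reinforcement learning task.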
ISBN (print): 9781728100647
In data mining research and development, one of the defining challenges is performing classification or clustering on high-dimensional data with relatively few samples, known as the high-dimensional limited-sample size (HDLSS) problem. Because of the limited sample size, there is not enough training data to train classification models. In addition, the "curse of dimensionality" often limits the effectiveness of many methods for solving HDLSS problems. Classification models trained on limited-sample datasets tend to overfit and cannot achieve satisfactory results. Thus, unsupervised methods are a better choice for such problems. Given the emergence of deep learning, its many applications, and its promising results, an extensive analysis of deep learning techniques on HDLSS datasets is needed. This paper evaluates the performance of variational autoencoder (VAE)-based dimensionality reduction and unsupervised classification on HDLSS datasets. The performance of the VAE is compared with two existing techniques, PCA and NMF, on fourteen datasets in terms of three evaluation metrics: purity, Rand index, and NMI. The experimental results show the superiority of the VAE over the traditional methods on HDLSS datasets.
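Purity and the Rand index can be computed directly from the true class labels and the cluster assignments produced after dimensionality reduction. The standard-library sketch below follows the standard definitions. NMI is omitted for brevity, and the example labels are toy values.

```python
from collections import Counter
from itertools import combinations

def purity(labels_true, labels_pred):
    """Fraction of samples belonging to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(labels_true)

def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which both labelings agree (same group vs different)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

truth = [0, 0, 1, 1]
clusters = [0, 0, 0, 1]
```

For these toy labels, purity is 0.75 and the Rand index is 0.5. Purity alone can be inflated by producing many small clusters, which is one reason it is usually reported alongside pair-counting and information-theoretic metrics.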