The automated generation of speech audio that closely resembles human emotional speech has garnered significant attention from society and the engineering research community. This attention is due to its diverse applications, including audiobooks, podcasts, and the development of empathetic home assistants. In this study, a novel approach to emotional speech transfer is introduced, utilizing generative models and a selected emotional target desired for the output speech. The natural speech is extended with contextual information related to emotional speech cues. The generative models used for this task are a variational autoencoder (VAE) and a conditional generative adversarial network (CGAN). In this case study, an input voice audio, a desired utterance, and user-selected emotional cues are used to produce emotionally expressive speech audio, transforming ordinary speech audio with added contextual cues into happy emotional speech audio via the variational autoencoder. The model attempts to reproduce in the ordinary speech the emotion present in the emotional contextual cues used for training. The results show that the proposed unsupervised VAE model, trained on a custom dataset for generating emotional data, reaches an MSE lower than 0.010 and an SSIM approaching 0.70, with most values greater than 0.60, with respect to the input data and the generated data. When generating new emotional data on demand, the CGAN and VAE models show a certain degree of success under the evaluation of an emotion classifier that measures their similarity to real emotional audio.
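As a rough sketch of this kind of emotion-conditioned transfer (not the authors' implementation), the following code builds a small conditional VAE over mel-spectrogram frames; the mel dimension, emotion set, layer sizes, and the index of the "happy" label are illustrative assumptions.

# Hypothetical sketch: conditional VAE that maps a neutral mel-spectrogram frame
# plus a target-emotion code to an emotionally coloured reconstruction.
# Shapes and layer sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_MELS, N_EMOTIONS, LATENT = 80, 4, 16  # assumed dimensions

class EmotionCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(N_MELS + N_EMOTIONS, 256), nn.ReLU())
        self.mu = nn.Linear(256, LATENT)
        self.logvar = nn.Linear(256, LATENT)
        self.dec = nn.Sequential(
            nn.Linear(LATENT + N_EMOTIONS, 256), nn.ReLU(), nn.Linear(256, N_MELS))

    def forward(self, x, emo):                      # x: (B, N_MELS), emo: (B,) int labels
        c = F.one_hot(emo, N_EMOTIONS).float()      # emotion condition
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterisation
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

def vae_loss(recon, x, mu, logvar):                 # training loss: reconstruction + KL
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Emotion transfer: encode neutral speech, decode with the "happy" condition.
model = EmotionCVAE()
neutral = torch.randn(8, N_MELS)                    # stand-in for real mel frames
happy = torch.full((8,), 1, dtype=torch.long)       # assumed index of "happy"
transferred, mu, logvar = model(neutral, happy)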
We present a multi-module framework based on a conditional variational autoencoder (CVAE) to detect anomalies in the power signals coming from multiple High Voltage Converter Modulators (HVCMs). We condition the model on the specific modulator type to capture different representations of the normal waveforms and to improve the model's sensitivity in identifying a specific fault type when only limited samples are available for a given module type. We studied several artificial neural network (ANN) architectures for our CVAE model and evaluated model performance by examining their loss landscapes for stability and generalization. Our results on Spallation Neutron Source (SNS) experimental data show that the trained model generalizes well to detecting multiple fault types across several HVCM module types. The results of this study can be used to improve HVCM reliability and overall SNS uptime.
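To illustrate the conditioning idea only (not the authors' architecture), the sketch below feeds a one-hot module-type code to both the encoder and decoder of a small VAE and flags waveforms whose reconstruction error exceeds a threshold; the waveform length, number of module types, and threshold value are assumptions.

# Illustrative anomaly scoring with a module-type-conditioned VAE.
import torch
import torch.nn as nn
import torch.nn.functional as F

WAVE_LEN, N_TYPES, LATENT = 512, 4, 32              # assumed sizes

class ModulatorCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(WAVE_LEN + N_TYPES, 128)
        self.mu, self.logvar = nn.Linear(128, LATENT), nn.Linear(128, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT + N_TYPES, 128), nn.ReLU(),
                                 nn.Linear(128, WAVE_LEN))

    def forward(self, wave, mod_type):
        c = F.one_hot(mod_type, N_TYPES).float()     # module-type condition
        h = torch.relu(self.enc(torch.cat([wave, c], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

@torch.no_grad()
def anomaly_score(model, wave, mod_type):
    recon, _, _ = model(wave, mod_type)
    return F.mse_loss(recon, wave, reduction="none").mean(dim=-1)  # per-sample error

model = ModulatorCVAE()
waves = torch.randn(16, WAVE_LEN)                    # stand-in for HVCM power signals
types = torch.randint(0, N_TYPES, (16,))
is_fault = anomaly_score(model, waves, types) > 0.5  # assumed threshold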
Just-in-time learning (JITL) is a widely used online soft sensing method for time-varying processes. With the increase in the dimensionality of industrial datasets, both the reliability and the usability of JITL can be seriously degraded due to the sparseness caused in the high-dimensional data space. In this paper, the variational autoencoder is introduced as a generative model to provide virtual samples for augmenting high-dimensional sparse datasets. Based on this, a data augmentation just-in-time learning framework is formulated and implemented with two different strategies. We then carefully discuss the effect of virtual data on just-in-time learning, as well as the influence of virtual data volume, virtual data ratio, and other factors on the proposed framework. Finally, a real industrial example is used to verify the effectiveness of the proposed method, in which two indicators (RMSE and R2) are improved by an average of 22% and 15%, respectively, compared to the traditional JITL approach.
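The sketch below shows one possible way such a framework could be wired together (a simplified stand-in, not the paper's implementation): a trained VAE decoder supplies virtual (input, output) samples that densify the neighbourhood of each query before the local model is fitted; the decoder here is a placeholder, and all dimensions, the neighbour count k, and the virtual-sample count are assumptions.

# VAE-augmented just-in-time learning, illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
D = 30                                               # process variables (assumed)
X_hist = rng.normal(size=(200, D))                   # sparse high-dimensional history
y_hist = X_hist[:, 0] + 0.1 * rng.normal(size=200)

W_dec = rng.normal(size=(8, D + 1))
def decoder(z):                                      # placeholder for a trained VAE decoder
    return np.tanh(z @ W_dec)                        # generates joint (x, y) virtual samples

def jitl_predict(x_query, k=15, n_virtual=30):
    nn = NearestNeighbors(n_neighbors=k).fit(X_hist)          # 1) retrieve relevant history
    idx = nn.kneighbors(x_query[None], return_distance=False)[0]
    virt = decoder(rng.normal(size=(n_virtual, 8)))           # 2) VAE virtual samples
    X_loc = np.vstack([X_hist[idx], virt[:, :D]])
    y_loc = np.concatenate([y_hist[idx], virt[:, D]])
    return Ridge().fit(X_loc, y_loc).predict(x_query[None])[0]  # 3) local soft sensor

print(jitl_predict(X_hist[0]))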
Xi'an Drum Music is a traditional form of Chinese music whose notes are recorded in Chinese characters. Because Xi'an Drum Music is composed and translated by the elder musicians, it has become difficult to pass down. In this article, we use sparse coding and compressed coding to transfer the Chinese-character notation into the genre and lyrics of Xi'an Drum Music. Based on our dataset of Xi'an Drum Music, we set up a generation method similar to Huffman coding, a model named Xi'an Drum Music Generation via variational autoencoder (DMGVAE), and the accuracy of Xi'an Drum Music generation increases to ***. This coding of Xi'an Drum Music offers a novel way to generate Xi'an Drum Music from a compressed format, with potential application to generating sparse traditional Chinese music such as Xi'an Drum Music.
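Purely as an illustration of the "Huffman-like" coding step (the paper's actual notation symbols and coding scheme are not reproduced here), the toy code below builds a Huffman code over a hand-made sequence of note characters; a VAE such as those sketched earlier would then model the coded sequences.

# Toy Huffman coder over stand-in note symbols.
import heapq
from collections import Counter

def huffman_code(symbols):
    # heap items: (frequency, tiebreak id, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()} | {s: "1" + c for s, c in c2.items()}
        heapq.heappush(heap, (f1 + f2, i, merged))
    return heap[0][2]

score = list("合四一上尺工凡六五合合上上尺工")       # invented stand-in note characters
codes = huffman_code(score)
encoded = "".join(codes[s] for s in score)           # frequent symbols get short codes
print(codes, encoded)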
ISBN (print): 9781538646595
Modeling the speech generation process can provide flexible and interpretable ways to generate intended synthetic speech. In this paper, we present a deep generative model of fundamental frequency (F_0) contours of normal speech and singing voices. The generative model we propose in this paper 1) is able to accurately decompose an F_0 contour into the sum of phrase and accent components of the Fujisaki model, a mathematical model describing the control mechanism of vocal fold vibration, without an iterative algorithm, and 2) can represent/generate F_0 contours of both normal speech and singing voices reasonably well.
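For reference, the Fujisaki control model that such a generative model learns to invert can be written out directly; the NumPy sketch below renders the standard forward model with the commonly used default constants (alpha ≈ 3 s^-1, beta ≈ 20 s^-1, gamma ≈ 0.9), which are not necessarily those used in the paper, and the command timings are invented.

# Forward Fujisaki model: log F0(t) = log Fb + phrase component + accent component.
import numpy as np

def phrase_response(t, alpha=3.0):
    return np.where(t >= 0, alpha**2 * t * np.exp(-alpha * t), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    return np.where(t >= 0, np.minimum(1 - (1 + beta * t) * np.exp(-beta * t), gamma), 0.0)

def fujisaki_logf0(t, fb, phrase_cmds, accent_cmds):
    """phrase_cmds: list of (onset T0, magnitude Ap);
    accent_cmds: list of (onset T1, offset T2, amplitude Aa)."""
    logf0 = np.full_like(t, np.log(fb))
    for t0, ap in phrase_cmds:
        logf0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        logf0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return logf0

t = np.linspace(0, 2.0, 400)                          # 2 s utterance, arbitrary
logf0 = fujisaki_logf0(t, fb=120.0,
                       phrase_cmds=[(0.0, 0.4)],
                       accent_cmds=[(0.3, 0.7, 0.3), (1.1, 1.5, 0.25)])
f0 = np.exp(logf0)                                    # synthesised F0 contour in Hz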
Modeling musical timbre is critical for various music information retrieval (MIR) tasks. This work addresses the task of classifying playing techniques, which involves extremely subtle variations of timbre among different categories. A deep collaborative learning framework is proposed to represent music with greater discriminative power than previously achieved. Firstly, a novel variational autoencoder (VAE) is developed to eliminate the variation of acoustic features within a class. Secondly, a Gaussian process classifier is jointly learned to distinguish the variations of timbre between classes, which increases the discriminative power of the learned representations. We derive a new lower bound that guides the VAE-based representation. Experiments were conducted on a database of seven classes of guitar playing techniques. The experimental results demonstrate that the proposed method outperforms baselines in terms of F1-score and accuracy.
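The paper trains the two components jointly under a derived lower bound; the sketch below is only a simplified two-stage stand-in (pre-computed VAE latent means followed by a scikit-learn Gaussian process classifier) to show how the pieces fit together, with the encoder, feature dimension, and data all being placeholders.

# Two-stage stand-in: VAE latents -> Gaussian process classifier.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
N_FEAT, LATENT, N_CLASSES = 128, 16, 7                # 7 guitar playing techniques

W_enc = rng.normal(size=(N_FEAT, LATENT)) / np.sqrt(N_FEAT)
def encode_mu(x):                                     # placeholder for a trained VAE encoder mean
    return np.tanh(x @ W_enc)

X = rng.normal(size=(140, N_FEAT))                    # stand-in acoustic features
y = rng.integers(0, N_CLASSES, size=140)              # stand-in technique labels

Z = encode_mu(X)                                      # within-class variation reduced by the VAE
clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0)).fit(Z, y)
print(clf.predict(encode_mu(rng.normal(size=(5, N_FEAT)))))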
To detect the inner- and outer-ring faults of a tractor motor, a feature learning method, the variational autoencoder, implemented in TensorFlow, was applied to process the motor vibration data. The method first normalizes all data sets. Next, these data sets are input into the built variational autoencoder model to train the weights and biases as feature learning proceeds. Then, a Softmax Regression model is used for multi-fault detection. The final results showed that this method can accomplish multi-fault detection tasks excellently, and on every metric the results are better than the traditional Back Propagation Neural Network, improving from 87.51% to 93.61%. Hence, this unsupervised feature learning method greatly reduces the machine learning model's dependency on feature engineering and can serve as useful guidance for practical projects.
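A minimal sketch of the three-step pipeline described above (normalisation, VAE feature learning, softmax regression) is given below; the encoder is an untrained placeholder standing in for the trained VAE, and the segment length, latent size, and three fault classes are assumptions for illustration.

# Pipeline sketch: normalise -> latent features -> softmax regression.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
SEG_LEN, LATENT = 256, 20
X = rng.normal(size=(300, SEG_LEN))                   # stand-in vibration segments
y = rng.integers(0, 3, size=300)                      # 0 = normal, 1 = inner ring, 2 = outer ring

X = StandardScaler().fit_transform(X)                 # step 1: normalisation

W = rng.normal(size=(SEG_LEN, LATENT)) / np.sqrt(SEG_LEN)
def vae_features(x):                                  # step 2: placeholder VAE encoder mean
    return np.tanh(x @ W)

clf = LogisticRegression(max_iter=1000)               # step 3: softmax (multinomial logistic) regression
clf.fit(vae_features(X), y)
print(clf.score(vae_features(X), y))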
ISBN (print): 9781450394321
Human-robot interaction (HRI) is progressively addressing multi-party scenarios, where a robot interacts with more than one human user at the same time. In contrast, research in this area is still at an early stage for human-robot collaboration (HRC). The intervention of a robot in human collaboration could help handle mutual disturbances between workers operating at the same time on the same target object. Therefore, this work outlines design methodologies for non-dyadic human-robot collaborations addressing concurrent human-human tasks in manufacturing applications. Preliminary results are then presented regarding a robotic agent's high-level understanding of such scenarios, realised through a variational autoencoder trained by means of transfer learning.
ISBN (print): 9781450392686
Constrained optimization problems can be difficult because their search spaces have properties not conducive to search, e.g., multimodality, discontinuities, or deception. To address such difficulties, considerable research has been performed on creating novel evolutionary algorithms or specialized genetic operators. However, if the representation that defines the search space could be altered so that it only permitted valid solutions satisfying the constraints, the task of finding the optimum would become more feasible without any need for specialized optimization algorithms. We propose Constrained Optimization in Latent Space (COIL), which uses a VAE to learn a latent representation from a dataset of samples drawn from the valid region of the search space according to a constraint, thus enabling the optimizer to pursue the objective in the new space defined by the learned representation. Preliminary experiments show promise: compared to an identical GA using a standard representation that cannot meet the constraints or find fit solutions, COIL with its learned latent representation can perfectly satisfy different types of constraints while finding high-fitness solutions.
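The toy rendering below conveys the COIL idea only: the search runs over latent vectors, and candidate solutions are obtained by decoding, so every candidate inherits the constraint the (stand-in) decoder enforces. The decoder is hand-made rather than a trained VAE, the objective is invented, and a plain (mu, lambda) evolution strategy replaces the paper's GA.

# Latent-space search with a constraint-respecting stand-in decoder.
import numpy as np

rng = np.random.default_rng(0)
LATENT, DIM = 4, 10
W = rng.normal(size=(LATENT, DIM))

def decode(z):
    # Outputs always satisfy a sum-to-one constraint, mimicking a VAE trained
    # only on valid (constraint-satisfying) samples.
    x = np.exp(z @ W)
    return x / x.sum(axis=-1, keepdims=True)

def fitness(x):                                       # toy objective in solution space
    return -np.sum((x - 1.0 / DIM) ** 2, axis=-1)

pop = rng.normal(size=(40, LATENT))                   # population of latent vectors
for _ in range(100):
    scores = fitness(decode(pop))
    parents = pop[np.argsort(scores)[-10:]]           # keep the best 10
    pop = np.repeat(parents, 4, axis=0) + 0.1 * rng.normal(size=(40, LATENT))

best = decode(pop[np.argmax(fitness(decode(pop)))][None])
print(best, best.sum())                               # decoded solution still satisfies the constraint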