ISBN (print): 9781665488679
Music accompaniment generation is a crucial part of the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to incorporate human emotions effectively and create beautiful accompaniments: existing models struggle to characterize human emotions within the network while composing music. To address this issue, we propose an easy-to-represent emotion flow model, the Valence/Arousal Curve, which makes emotional information compatible with the model through data transformation, and we enhance the interpretability of emotional factors by using a variational autoencoder as the model structure. Further, we use relative self-attention to maintain the structure of the music at the phrase level and, combined with the rules of music theory, to generate a richer accompaniment. Our experimental results indicate that the emotional flow of the music generated by our model correlates strongly with the input emotion, demonstrating the model's strong interpretability and control of emotional flow. The generated music is also well-structured, diverse, and dynamic, outperforming the baseline models.
ISBN (print): 9781479975143
A linear motion (LM) guide is a component that supports the directional movement of a machine. Because LM guides are used across many industries to support a variety of tasks, it is important to detect their anomalous states. In this paper, we propose a machine learning algorithm for determining the anomaly state of an LM guide. Because anomaly signals are difficult to generate in practice, we train the model with healthy-state data only. A generative model, the variational autoencoder, is trained on the healthy-state data to learn its distribution. The trained model then determines whether an anomaly has occurred based on the reconstruction error of the network.
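As a minimal sketch of the decision rule described above, the snippet below flags a sample as anomalous when its reconstruction error exceeds a threshold derived from healthy-state errors. The mean-plus-k-standard-deviations rule, the function names, and the numbers are illustrative assumptions: the abstract does not say how the threshold is chosen, and in practice the errors would come from the trained variational autoencoder.

```python
from statistics import mean, pstdev


def fit_threshold(healthy_errors, k=3.0):
    """Assumed rule: threshold = mean + k * std of healthy reconstruction errors."""
    return mean(healthy_errors) + k * pstdev(healthy_errors)


def is_anomaly(error, threshold):
    """A sample is flagged when its reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up reconstruction errors measured on healthy-state signals.
healthy_errors = [0.10, 0.12, 0.11, 0.09, 0.13]
threshold = fit_threshold(healthy_errors)

# One error close to the healthy range, one far outside it.
flags = [is_anomaly(e, threshold) for e in (0.11, 0.50)]
```

The value of k trades false alarms against missed anomalies and would need to be tuned on held-out healthy data.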
ISBN (print): 9798350317107; 9798350317114
Ultrasound imaging has become a preferred medical diagnostics tool for many applications due to its cost-effectiveness, non-ionizing nature, and real-time capabilities. There has been significant progress in the development of new ultrasound probes and systems, particularly portable and wearable devices, incorporating new transducer technologies, sophisticated electronics integration, artificial intelligence, and advanced beamforming strategies. Wearable ultrasound systems, equipped with wireless data transfer interfaces, offer unique advantages for continuous monitoring of patients with critical conditions in both in-hospital and out-of-hospital settings. Many challenges, specifically in data rate reduction for wireless real-time systems, still need to be explored. To address this issue, we present a vector quantized variational autoencoder model that effectively compresses ultrasound RF signals without compromising image quality. We tested and evaluated the performance of the model on real ultrasound datasets. The experimental results demonstrate a 92% data reduction, enabling real-time imaging speeds over wireless channels.
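The compression in a vector quantized variational autoencoder comes from replacing each encoder output vector with the index of its nearest codebook entry, so only small integers need to be transmitted. The sketch below illustrates that lookup step alone. The codebook, the latent vectors, and the function names are hypothetical, and the encoder and decoder networks of the paper are not reproduced here.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize(vectors, codebook):
    """Map each latent vector to a codebook index (this is what gets transmitted)."""
    return [nearest_code(v, codebook) for v in vectors]


def dequantize(indices, codebook):
    """Recover the quantized latent vectors on the receiving side."""
    return [codebook[i] for i in indices]


# Hypothetical 3-entry codebook and three 2-D encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
latents = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]

indices = quantize(latents, codebook)
restored = dequantize(indices, codebook)
```

With a codebook of K entries, each latent vector costs about log2(K) bits on the wire, which is where the data-rate saving comes from.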
ISBN (print): 9783030630065; 9783030630072
Speech separation plays an important role in speech-related systems since it can denoise, extract, and enhance speech signals. In recent years, many methods have been proposed to separate the human voice from noise and other sounds. To separate speech from a complicated signal, we propose a more powerful method that uses a VAE model followed by postprocessing with a bandpass filter. This combination can be used to extract the original human speech from a mixture containing not only high-frequency noise but also many other sounds. Our approach can be flexibly applied to new background sounds.
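To illustrate the bandpass postprocessing step only, the sketch below applies an ideal frequency-domain mask using a plain discrete Fourier transform. This is an assumption made for illustration: the abstract does not describe the filter design or the pass band, and a practical system would more likely use a designed IIR or FIR filter. The bin limits and the test signal are made up.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def bandpass(x, low_bin, high_bin):
    """Keep only frequency bins in [low_bin, high_bin], with their mirrored negatives."""
    n = len(x)
    X = dft(x)
    kept = [X[k] if low_bin <= min(k, n - k) <= high_bin else 0 for k in range(n)]
    return idft(kept)


n = 64
tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]          # in-band component
noise = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]  # high-frequency noise
mixture = [a + b for a, b in zip(tone, noise)]

filtered = bandpass(mixture, low_bin=1, high_bin=8)
```

In this toy case the filtered signal matches the in-band tone up to floating-point error, because the noise sits entirely outside the pass band.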
ISBN (print): 9781728170664
In this paper, we propose a method combining a variational autoencoder model of speech with a spatial clustering approach for multi-channel speech separation. The advantage of integrating spatial clustering with a spectral model has been shown in several works. As the spectral model, previous works used either factorial generative models of the mixed speech or discriminative neural networks. In our work, we combine the strengths of both approaches by building a factorial model based on a generative neural network, a variational autoencoder. By doing so, we can exploit the modeling power of neural networks while keeping a structured model. Such a model can be advantageous when adapting to new noise conditions, as only the noise part of the model needs to be modified. We show experimentally that our model significantly outperforms a previous factorial model based on a Gaussian mixture model (DOLPHIN), performs comparably to the integration of permutation invariant training with spatial clustering, and enables us to easily adapt to new noise conditions.
ISBN (print): 9781728100647
In data mining research and development, one of the defining challenges is to perform classification or clustering on high-dimensional data with relatively few samples, also known as the high-dimensional limited-sample size (HDLSS) problem. Because of the limited sample size, there is not enough training data to train classification models. The 'curse of dimensionality' also often restricts the effectiveness of many methods for solving the HDLSS problem. Classification models trained on limited-sample datasets tend to overfit and cannot achieve satisfactory results. Thus, unsupervised methods are a better choice for such problems. Given the emergence of deep learning, its many applications, and its promising outcomes, an extensive analysis of deep learning techniques on HDLSS datasets is required. This paper evaluates the performance of variational autoencoder (VAE) based dimensionality reduction and unsupervised classification on HDLSS datasets. The performance of the VAE is compared with two existing techniques, PCA and NMF, on fourteen datasets in terms of three evaluation metrics: purity, Rand index, and NMI. The experimental results show the superiority of the VAE over the traditional methods on HDLSS datasets.
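Two of the three evaluation metrics named above, purity and the Rand index, can be sketched in a few lines using their standard definitions. The label vectors below are made up for illustration, and NMI is omitted to keep the example short.

```python
from collections import Counter
from itertools import combinations


def purity(labels_true, labels_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, []).append(t)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(labels_true)


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up ground-truth classes and cluster assignments for six samples.
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 1, 1]

p = purity(labels_true, labels_pred)
ri = rand_index(labels_true, labels_pred)
```

Both scores lie in [0, 1], with 1 meaning the clustering reproduces the true classes exactly.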
ISBN (print): 9798331505356; 9798331505349
Dimensionality reduction using a variational autoencoder (VAE) is widely employed in learning diverse state representations, such as in autonomous driving tasks. Conventional VAE-based dimensionality reduction is a commonly used method for reducing the computational cost of learning from high-dimensional data, particularly image data, while achieving high performance. In this paper, we investigate the impact of integrating a VAE with Squeeze-and-Excitation Networks (SENet), referred to as SENet-VAE, on the accuracy of learning driving behaviors in deep reinforcement learning. We conduct a series of experiments comparing three setups: raw image data, conventional VAE, and SENet-VAE. Additionally, we explore the effect of applying hyperparameters to the Kullback-Leibler (KL) divergence term in the objective function of the SENet-VAE to further optimize performance. Our results demonstrate that SENet-VAE outperforms the conventional VAE in terms of learning accuracy, with hyperparameter tuning leading to performance gains.
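One common way to apply a hyperparameter to the KL divergence term is a scalar weight, often written as beta, that multiplies it in the objective. The sketch below assumes that form, a diagonal-Gaussian encoder, and a standard normal prior. None of these details are stated in the abstract, and the function names are hypothetical.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def weighted_vae_loss(recon_loss, mu, log_var, beta=1.0):
    """Reconstruction loss plus a beta-weighted KL term (assumed objective form)."""
    return recon_loss + beta * kl_diag_gaussian(mu, log_var)


# Made-up encoder outputs for a 1-D latent and a made-up reconstruction loss.
loss = weighted_vae_loss(recon_loss=2.0, mu=[1.0], log_var=[0.0], beta=4.0)
```

A larger beta pulls the latent distribution closer to the prior at the cost of reconstruction quality, so it is typically tuned per task.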
ISBN (print): 9781665405409
To address the issue of one-to-many mapping from phoneme sequences to acoustic features in expressive speech synthesis, this paper proposes a method of discourse-level prosody modeling with a variational autoencoder (VAE) based on the non-autoregressive architecture of FastSpeech. In this method, phone-level prosody codes are extracted from prosody features by combining a VAE with FastSpeech, and are predicted using discourse-level text features together with BERT embeddings. The continuous wavelet transform (CWT) used in FastSpeech2 for F0 representation is no longer necessary. Experimental results on a Chinese audiobook dataset show that our proposed method can effectively take advantage of discourse-level linguistic information and outperforms FastSpeech2 on the naturalness and expressiveness of synthetic speech.
ISBN (print): 9781728156750
Deep learning (DL) has recently been used in several applications of machine health monitoring systems. Unfortunately, most of these DL models are considered black boxes with low interpretability. In this research, we propose an original prognostics and health management (PHM) framework based on visual data analysis. The most suitable space for data visualization is 2D, which necessarily involves a significant reduction from a high-dimensional to a low-dimensional data space. To perform the data analysis and the diagnostic interpretation in a PHM framework, a variational autoencoder (VAE) is used jointly with a classifier. The proposed model was evaluated on the automatic recognition of individual Partial Discharge (PD) sources for hydro-generator monitoring.
ISBN (print): 9798350359374
In conventional deep learning fine-tuning or transfer learning, the synaptic weights of a neural network that were learned while solving a particular task are transferred to the learning of a similar task so that the new task can be learned efficiently. In this case, both tasks are assumed to be similar, and only a single function is transferred. Organisms, on the other hand, can abstract multiple past experiences and adapt them as knowledge to new tasks. This requires a new neural network architecture that fuses neural networks that have learned individual problem-solving functions and is capable of transfer learning of multiple functions. To acquire such an architecture, we aim to develop a framework that fuses neural networks with multiple individual functions to obtain a new network topology. Based on this, in this study, we develop Developmental Artificial Neural Networks (DANN), capable of simultaneously transferring multiple functions, which generate a neural network with new functions that integrate them. We use the framework of Weight Agnostic Neural Networks (WANN), in which the topology is acquired evolutionarily rather than by the synaptic weights of the networks; the obtained topology is transformed into a matrix using an embedded representation of the generation rules, which is then used as input to an encoder-decoder model based on the Grammar variational autoencoder (GVAE). The experimental results confirmed that multiple neural network models that solve individual tasks can be represented in the latent space and that the relationships among tasks can be captured.