ISBN: (print) 9798350344868; 9798350344851
The class of disentangled sequential auto-encoders factorises speech into time-invariant (global) and time-variant (local) representations for speaker identity and linguistic content, respectively. Many existing models employ this assumption to tackle zero-shot voice conversion (VC), which converts the speaker characteristics of any given utterance to those of any novel speaker while preserving the linguistic content. However, balancing capacity between the two representations is intricate, as the global representation tends to collapse because its information capacity along the time axis is lower than that of the local representation. We propose a simple and effective dropout technique that applies an information bottleneck to the local representation via multiplicative Gaussian noise in order to encourage the use of the global one. We endow existing zero-shot VC models with the proposed method and show significant improvements in speaker conversion in terms of speaker verification acceptance rate, along with comparable or better intelligibility measured in character error rate.
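The multiplicative Gaussian noise described above can be illustrated with a minimal sketch. The abstract does not give the noise parameterisation, so the noise distribution N(1, alpha^2), the `alpha` parameter, and the function name `gaussian_dropout` are assumptions for illustration, not the authors' implementation. The local representation is modelled as a list of per-frame feature vectors.

```python
import random


def gaussian_dropout(local_repr, alpha=0.5, training=True, rng=None):
    """Multiply each element of a (frames x dims) representation by noise.

    Assumption: noise is drawn from N(1, alpha^2), so the expected value of
    every element is unchanged. `alpha` is a hypothetical hyperparameter.
    """
    if not training or alpha == 0.0:
        # At inference time the representation passes through unchanged.
        return [list(frame) for frame in local_repr]
    rng = rng or random.Random()
    return [[x * rng.gauss(1.0, alpha) for x in frame] for frame in local_repr]


if __name__ == "__main__":
    local = [[0.2, -1.0, 0.5], [0.1, 0.3, -0.7]]  # two frames, three dims
    noisy = gaussian_dropout(local, alpha=0.5, rng=random.Random(0))
    print(noisy)
```

A larger `alpha` corrupts the local (time-variant) path more strongly, which in principle pushes a model to rely on the global representation; how the noise level is chosen in the paper is not stated in the abstract.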
Globally, forests are net carbon sinks that partly mitigate anthropogenic climate change. However, there is evidence of increasing weather-induced tree mortality, which needs to be better understood to improve forest management under future climate conditions. Disentangling drivers of tree mortality is challenging because of their interacting behavior over multiple temporal scales. In this study, we take a data-driven approach to the problem. We generate hourly temperate weather data using a stochastic weather generator to simulate 160,000 years of beech, pine, and spruce forest dynamics with a forest gap model. These data are used to train a generative deep learning model (a modified variational autoencoder) to learn representations of three-year-long monthly weather conditions (precipitation, temperature, and solar radiation) in an unsupervised way. We then associate these weather representations with years of high biomass loss in the forests and derive weather prototypes associated with such years. The identified prototype weather conditions are associated with 5-22% higher median biomass loss compared to the median of all samples, depending on the forest type and the prototype. When prototype weather conditions co-occur, these numbers increase to 10-25%. Our research illustrates how generative deep learning can discover compounding weather patterns associated with extreme impacts.
Impact Statement: Tree mortality is a complex phenomenon involving multiple processes at different temporal scales. Here, we rely on very long simulations of a forest model and develop a method based on generative deep learning to find the relationship between complex weather patterns and tree mortality. The generative nature of the method allows for the generation of new realistic weather conditions outside of the provided samples, which are associated with high biomass loss in a forest. Furthermore, the method can be applied to different weather-driven impacts, adding to the growin
ISBN: (print) 9798350368529; 9798350368512
Unknown smart contract vulnerabilities exist in addition to common vulnerabilities, and their potential risks cannot be ignored. It is therefore crucial to enhance a model's ability to detect these unknown threats. Detection models are typically trained on data for specific vulnerability types, so their effectiveness in detecting new vulnerabilities can be unsatisfactory. A vulnerability that is not defined in the model is considered an unknown vulnerability. Detecting unknown threats typically requires simulating attack scenarios or manual auditing. As a result, few unknown threat samples are available in smart contract datasets, and the resulting dataset imbalance greatly hinders unknown threat detection. In this paper, to improve the detection of unknown vulnerabilities in smart contracts, we propose a new unknown threat detection method. It first expands the unknown threat samples by equalizing the data features of opcodes and source code through a variational autoencoder. Then, a domain-adaptive training model, DSN, is used to learn the private and public features of the source and target domains of smart contracts, respectively. Feature transfer and adversarial learning between the source and target domains are achieved through domain classification and reconfiguration tasks. Experiments demonstrate that this method can significantly enhance the ability to identify unknown threats in smart contracts.
ISBN: (print) 9780791887929
Current data-driven methods for gas path fault diagnosis in aero-engines often require extensive fault sample sets, which are challenging and expensive to obtain in practice. Even with limited samples, these methods face issues of interclass imbalance, including imbalances in normal and fault data quantities, imbalances in the quantities of different fault classes, and imbalances in fault severity. Moreover, current methods for simulating gas path faults focus solely on the degradation of gas path component efficiency, neglecting the non-linear alterations in component characteristics caused by faults and the interconnected effects between components. Therefore, we propose a transfer learning-based variational autoencoder (TL-VAE) approach to generate fault samples and improve the accuracy of gas path fault diagnosis. First, we train the VAE using normal engine operation samples. Then, by incorporating transfer learning, we retrain the VAE on a small number of fault samples by fine-tuning certain weights of the VAE. This allows us to combine the operational state features from the source domain with the fault features from the target domain, fitting the distribution of the measured parameters of the faulty engine. This enables the TL-VAE to function as a generator of fault samples. Subsequent fault diagnosis strategies rely on the generated samples and mature classification methods, including Softmax and SVM classifiers. We validate the effectiveness and superiority of the proposed method through simulation. The experimental results demonstrate that the proposed method significantly improves fault diagnosis results with limited samples. In particular, within the coverage of the generated samples, the fault diagnosis accuracy (FDA) of the Softmax and SVM classifiers improves from 73.3% and 66.5%, respectively, to 100% after employing the proposed approach. The FDA of Softmax and SVM classifiers with the proposed method is impr
ISBN: (print) 9798400711442
Trajectory anomaly detection is crucial for effective decision-making in urban and human mobility management. Existing methods of trajectory anomaly detection generally focus on training a trajectory generative model and evaluating the likelihood of reconstructing a given trajectory. However, previous work often lacks important contextual information on the trajectory, such as the agent's information (e.g., agent ID) or geographic information (e.g., Points of Interest (POI)), which could help capture anomalous behaviors more accurately. To fill this gap, we propose a context-aware anomaly detection approach that models contextual information related to trajectories. The proposed method is based on a trajectory reconstruction framework guided by contextual factors such as agent ID and contextual POI embedding. The injection of contextual information aims to improve the performance of anomaly detection. We conducted experiments in two cities and demonstrated that the proposed approach significantly outperformed existing methods by effectively modeling contextual information. Overall, this paper opens a new direction for advancing trajectory anomaly detection.
ISBN: (print) 9798350374520; 9798350374513
Understanding the latent representation of speech obtained by a deep unsupervised model is key to powerful signal analysis, transformation, and generation. A number of studies have identified the directions of variation of individual speech acoustic features, such as fundamental frequency or formant frequency, in a deep latent space, but it is not well understood why the variation of such a one-dimensional feature is often explained by multiple latent dimensions. This paper proposes a methodology for interpreting these dimensions in the latent space of variational autoencoders trained on multi-speaker datasets. We show that for each acoustic feature, its distribution in the training set is encoded by one dedicated latent space direction. When the distribution is multimodal, different modes of the acoustic feature are encoded in separate dimensions. In that case, we also identify the directions that explain the variation of the feature within and across modes, which paves the way to finer control of such models.
ISBN: (print) 9798400712470
With the rapid development of technologies such as autonomous driving, vehicle-to-everything communication, and edge computing, an increasing number of vehicles are equipped with multiple sensors to perceive their surroundings. As a result, the amount of sensing data has exploded, and the communication pressure on the in-vehicle network has become severe. In-sensor or near-sensor computation is considered an effective way to address these issues. However, current multi-modal fusion frameworks are difficult to modularise and to train in a distributed manner across multiple devices. In this paper, we propose a variational autoencoder (VAE) based multi-modal fusion solution together with a theoretical analysis framework. Notably, we design two auxiliary tasks that use data from a single modality to discover the joint distribution of multiple modalities. Compared to traditional algorithms, the proposed solution can use unlabeled data for self-supervised learning and has the added advantage of modularity, which helps to reduce the communication overhead in in-vehicle networks. Experiments show that, compared to single-modality algorithms, our multi-modal fusion framework increases average precision by over 10% on the KITTI dataset.
ISBN: (print) 9789819751273; 9789819751280
Unraveling the intricacies of Quadruple-Negative Breast Cancer (QNBC), this study leverages advanced analytics on RNAseq gene expression data. Employing unsupervised clustering techniques, our robust methodology encompasses data preprocessing for interpretability, dimensionality reduction via variational autoencoders and Principal Component Analysis (PCA), and optimization of k-means clustering using internal validation indices. The analysis unveils two distinct QNBC subtypes, substantiated by high Silhouette (0.24) and Calinski-Harabasz (28.81) scores. Statistical profiling elucidates the genetic signatures characterizing these clusters, with Cluster 1 exhibiting genes like OR6P1 and TMEM247, while Cluster 2 displays distinct markers such as RNF17 and PRAC1. These data-driven patient stratifications hold promise for personalized assessments and targeted interventions, contingent upon clinical validation. This research highlights the synergy of machine learning and statistical analysis in charting a course toward more effective QNBC management strategies.
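For readers unfamiliar with the Silhouette index quoted above, the following is a minimal standard-library sketch of how the score is computed for a given clustering. It follows the standard definition (mean intra-cluster distance `a` versus mean distance to the nearest other cluster `b`), uses Euclidean distance as an assumption, and runs on made-up toy points rather than the study's data.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


if __name__ == "__main__":
    toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(toy_points, [0, 0, 1, 1]))
```

Scores range from -1 to 1; values near 1 indicate compact, well-separated clusters, and values near 0 indicate overlapping clusters.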
Automated generation of speech audio that closely resembles human emotional speech has garnered significant attention from society and the engineering community. This attention is due to its diverse applications, including audiobooks, podcasts, and the development of empathetic home assistants. This study introduces a novel approach to emotional speech transfer that uses generative models and a selected target emotion for the output speech. The natural speech is extended with contextual information related to emotional speech cues. The generative models used for this task are a variational autoencoder (VAE) model and a conditional generative adversarial network (CGAN) model. In this case study, an input voice audio, a desired utterance, and user-selected emotional cues are used to produce emotionally expressive speech audio: a variational autoencoder model transfers ordinary speech audio with added contextual cues into happy emotional speech audio. The model tries to reproduce, in the ordinary speech, the emotion present in the emotional contextual cues used for training. The results show that the proposed unsupervised VAE model, trained on a custom dataset for generating emotional data, reaches an MSE lower than 0.010 and an SSIM of almost 0.70 between the input data and the generated data, with most SSIM values greater than 0.60. When generating new emotional data on demand, the CGAN and VAE models show a certain degree of success when evaluated with an emotion classifier that determines the similarity to real emotional audio.
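As a point of reference for the MSE figure above, here is a minimal sketch of a mean squared error between an input and a generated 2-D array. The abstract does not say which representation is compared, so treating the data as equally sized 2-D arrays (for example, spectrogram-like matrices) is an assumption, and the values below are invented for illustration.

```python
def mean_squared_error(reference, generated):
    """Average squared difference over all elements of two 2-D arrays."""
    if len(reference) != len(generated) or any(
        len(r) != len(g) for r, g in zip(reference, generated)
    ):
        raise ValueError("inputs must have the same shape")
    count = sum(len(row) for row in reference)
    squared = sum(
        (r - g) ** 2
        for ref_row, gen_row in zip(reference, generated)
        for r, g in zip(ref_row, gen_row)
    )
    return squared / count


if __name__ == "__main__":
    # Hypothetical normalised values in [0, 1]; not data from the study.
    source = [[0.0, 0.5], [1.0, 0.25]]
    output = [[0.1, 0.5], [1.0, 0.25]]
    print(mean_squared_error(source, output))
```

Because MSE depends on the scale of the data, a threshold such as 0.010 is only meaningful relative to how the inputs were normalised.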
ISBN: (print) 9783031708893; 9783031708909
Vertical Federated Learning (VFL) is becoming a standard collaborative learning paradigm with various practical applications. Randomness is essential to enhancing privacy in VFL, but introducing too much external randomness often leads to an intolerable performance loss. Instead, as has been demonstrated for other federated learning settings, leveraging internal randomness, as provided by variational autoencoders (VAEs), can be beneficial. However, the resulting privacy has never been quantified, nor has the approach been investigated for VFL. We therefore propose a novel differential privacy (DP) estimate, denoted distance-based empirical local differential privacy (dELDP). It allows us to empirically bound the DP parameters of models or model components, quantifying the internal randomness with appropriate distance and sensitivity metrics. We apply dELDP to investigate the DP of VAEs and observe values up to $\epsilon \approx 6.4$ and $\delta = 2^{-32}$. Based on this, to link the dELDP parameters to the practical privacy of VFL systems that include VAEs, we conduct comprehensive experiments on robustness against state-of-the-art privacy attacks. The results illustrate that the VAE system is robust against feature reconstruction attacks and outperforms other privacy-enhancing methods for VFL, especially when the adversary holds 75% of the features during label inference attacks.