Effective multi-modal integration of single-cell datasets is critical for uncovering the biological properties of cells from different molecular perspectives. However, it poses significant challenges: how to preserve shared information while accounting for differences between differently distributed datasets, how to integrate datasets linked by different anchors (cells or features), and how to improve the quality of datasets for integration. In this dissertation, we introduce two novel models that address these challenges. First, we present scDMVAE, a neural network model that captures both shared and data-specific aspects of datasets in a latent space. scDMVAE handles cell-linked datasets through its embedding learning component and feature-linked datasets through its attention-based matching component. We demonstrate the effectiveness of scDMVAE on a cell-linked CITE-seq dataset, where it reveals different cell type relations between mRNA and protein, and on feature-linked SCoPE2 proteomics and scRNA-Seq mRNA human testis datasets, where it transfers labels from mRNA to protein. Additionally, we present PCRID, a principal curve based model that aligns the retention time of peptides to improve confidence estimates of peptide-spectrum-matches (PSMs) in SCoPE2 technology. PCRID outperforms existing models such as DART-ID by handling non-linearities in retention time more effectively, increasing the identification rate of peptides by 154.53% at a PEP threshold of 0.01 while controlling false discoveries. Together, these models represent significant advances in single-cell data analysis and have broad applications across related fields.
ISBN: 9781713836902 (print)
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but because normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient. In light of this, we suggest a novel two-stage approach for DVC that is highly flexible in that no normal speech of the patient is required. First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into normal speech of a reference speaker as an intermediate product. Then, a nonparallel, frame-wise VC model realized with a variational autoencoder converts the speaker identity of the reference speech back to that of the patient, under the assumption that it preserves the enhanced quality. We investigate several design options. Experimental evaluation results demonstrate the potential of our approach to improve the quality of dysarthric speech while maintaining speaker identity.
ISBN: 9781450392037 (print)
Predicting future scenes based on historical frames is challenging, especially given the complex uncertainty found in nature. We observe a divergence between the spatial-temporal variations of active patterns and non-active patterns in a video, where these patterns together constitute the visual content and the active ones involve more violent movement. This divergence gives active patterns a higher potential for severe future uncertainty. Meanwhile, the existence of non-active patterns provides an opportunity for machines to examine underlying rules through a mutual constraint between non-active and active patterns. To address this divergence, we propose a method called active patterns-perceived stochastic video prediction (ASVP), which allows active patterns to be perceived by neural networks during training. Our method starts by separating active patterns from non-active ones in a video. Then, scene-based prediction and active pattern-perceived prediction are conducted to capture the variations within the whole scene and within the active patterns, respectively. Specifically, for active pattern-perceived prediction, a conditional generative adversarial network (CGAN) is used to model active patterns as conditions, with a variational autoencoder (VAE) predicting the complex dynamics of active patterns. Additionally, a mutual constraint is designed to improve the learning procedure so that the network better understands the underlying interaction rules among these patterns. Extensive experiments are conducted on the KTH human action and BAIR action-free robot pushing datasets, with comparison to state-of-the-art works. Experimental results demonstrate the competitive performance of the proposed method. The released code and models are at https://***/tolearnmuch/ASVP.
ISBN: 9781665424769 (print)
Data linkage plays a crucial role in realizing the value of big data but is often regarded as a threat to personal privacy. Regulations like GDPR require users' consent for each specific use of data, which is not practical for data analyzers. In this study, we propose to address the problem by having a trustworthy third party collect data from two or more parties, use the data to train one or more variational autoencoder (VAE) models that strip out private information, and send the models to the data providers. Under this model, users express their consent to share data with the trustworthy party. The third party links data from the various datasets to build a variational autoencoder model that allows all parties to generate datasets with full attributes without revealing sensitive personal data. System architectures and the machine learning accuracy of the generated datasets are evaluated in this study.
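As a rough illustration of the VAE machinery that this and the neighboring abstracts rely on, the sketch below shows the reparameterized sampling step and the closed-form KL term against a standard normal prior. It is a generic NumPy sketch, not the system described in the study; the function names `reparameterize` and `kl_to_standard_normal` and the toy shapes are assumptions made for illustration.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)

# Toy encoder outputs for 4 records with a 2-dimensional latent space (assumed sizes).
rng = np.random.default_rng(0)
mu = np.zeros((4, 2))
logvar = np.zeros((4, 2))

z = reparameterize(mu, logvar, rng)      # latent samples, one row per record
kl = kl_to_standard_normal(mu, logvar)   # one KL value per record
```

In a full model, a decoder network would map `z` back to attribute space, which is the step that would let each party generate synthetic records instead of exchanging raw ones.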
ISBN: 9781665475921 (print)
In this paper, we propose a rate controllable image compression framework, Rate Controllable variational autoencoder (RC-VAE), based on the Rate-Feature-Level (RFL) model established through our exploration of the correlation among target rates, image features and quantization levels. Considering that different images should be quantized at different levels to meet the same target rate, we jointly use the target rate and the extracted features of the image to predict the corresponding quantization level, and propose the RFL model accordingly. Combining the proposed RFL model with a Hyperprior Continuously Variable Rate (HCVR) image compression network, we further propose the RC-VAE. By controlling the information loss in the quantization process, the RC-VAE can work at the target rate. Experimental results demonstrate that a single RC-VAE model can adapt to multiple target rates with higher rate control accuracy and better R-D performance than state-of-the-art rate controllable image compression networks.
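The link between quantization level and rate can be illustrated with a toy experiment: quantizing the same latent values with a coarser step leaves fewer distinct symbols and a lower empirical entropy, hence fewer bits. The sketch below is only that toy, not the RFL model or the HCVR network; the helper `empirical_rate_bits`, the Gaussian stand-in for a latent, and the step sizes are all assumptions.

```python
import numpy as np

def empirical_rate_bits(latent, step):
    """Empirical entropy (bits per symbol) of the latent after uniform quantization."""
    q = np.round(latent / step).astype(np.int64)   # quantization index per value
    _, counts = np.unique(q, return_counts=True)   # histogram of the indices
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Stand-in for an image latent: 10,000 standard normal values (an assumption).
rng = np.random.default_rng(0)
latent = rng.standard_normal(10_000)

# A larger step discards more information and yields a lower rate.
rates = {step: empirical_rate_bits(latent, step) for step in (0.5, 1.0, 2.0)}
```

A rate controller in this spirit would pick the step (or level) whose predicted rate is closest to the target, which is the role the abstract assigns to the RFL model.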
Deep latent variable generative models based on variational autoencoder (VAE) have shown promising performance for audio-visual speech enhancement (AVSE). The underlying idea is to learn a VAE-based audio-visual prior...
dB is a web-based interface that serves as a "drummer bot" for exploring interactive groove-making experiences with an AI percussion system. This system, leveraging variational autoencoders (VAEs), transform...
In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion. We propose to first encode speech signals into a phoneme-rate latent representation with a variational autoencoder enhanced b...
Recently, the real-time audio variational autoencoder (RAVE) method was developed for high-quality audio waveform synthesis. The RAVE method is based on a variational autoencoder and employs a two-stage training strat...
Ensuring the safety and reliability of complex industrial processes is very important. Therefore, extracting multiple features of data effectively is of great significance for improving the accuracy of modeling for faul...