ISBN: (print) 9781510848764
Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) that explicitly considers a VC objective when building the speech model. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality.
Collecting high-quality medical image data for machine learning applications remains a significant challenge due to data scarcity, privacy concerns, and high annotation costs. To address these issues, vision generative models, particularly Latent Diffusion Models (LDMs), have emerged as state-of-the-art solutions that reduce computational demands while maintaining superior performance in data generation tasks. In this study, we propose an enhanced LDM-based approach that integrates separable self-attention mechanisms within the diffusion process, positioned after residual blocks, to improve the capture of detailed features and maintain spatial consistency. This modification reduces memory usage by 82.94% and decreases the Fréchet Inception Distance (FID) by 25.01% compared to traditional self-attention models, all while preserving image quality. Our method addresses critical challenges such as data scarcity and computational efficiency in medical imaging by combining variational autoencoders (VAEs) for latent space mapping with U-Net for noise prediction. Evaluations on five datasets — PneumoniaMNIST, BloodMNIST, ChestMNIST, Dental4k, and HandMNIST — demonstrate significant improvements in computational efficiency, memory usage, and the quality of generated images, showcasing the potential of our approach for scalable and effective medical image synthesis.
Following the widespread use of deep learning for genomics, deep generative modeling is also becoming a viable methodology for the broad field. Deep generative models (DGMs) can learn the complex structure of genomic data and allow researchers to generate novel genomic instances that retain the real characteristics of the original dataset. Aside from data generation, DGMs can also be used for dimensionality reduction by mapping the data space to a latent space, as well as for prediction tasks via exploitation of this learned mapping or supervised/semi-supervised DGM designs. In this review, we briefly introduce generative modeling and two currently prevailing architectures, we present conceptual applications along with notable examples in functional and evolutionary genomics, and we provide our perspective on potential challenges and future directions.
Text processing techniques from Natural Language Processing (NLP) find applications in many industries, such as pharmaceuticals, automation, and automotive. Drug design using variational autoencoders is a popular data-assisted technique for designing drug molecules with control over molecular properties. It produces a continuous latent space that can be optimized. This paper introduces a constrained variational autoencoder-based molecular generation architecture using the SMILES format. The proposal covers generating molecules, filtering them based on scores, and subsequently determining the optimal molecules using mature NLP techniques. To generate a more meaningful latent space, a condition vector of molecular properties is combined with the SMILES representation of molecules. A tunable parameter (diversity, D) is also used to control the diversity of the generated molecules. The proposed architecture is evaluated using standard datasets. Validity, uniqueness, and FCD are the evaluation metrics used to assess the performance of the model. The validity of the proposed model is highest (92.11%) at diversity level 1. As the diversity level increases, the validity of the generated molecules decreases. This is intuitively consistent because increased diversity reduces replicas and improves variety in the generated molecules. Thus, the proposed model provides control over the diversity of generated molecules. The results indicate that the proposed method outperforms other SMILES-based methods and gives a new direction for the generation of desired molecules.
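The validity and uniqueness metrics named in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the version below assumes a common convention: validity is the fraction of generated strings that parse as molecules, and uniqueness is the fraction of valid strings that are distinct. The function names and the `toy_is_valid` checker are hypothetical. The checker only tests for balanced parentheses and stands in for a real SMILES parser.

```python
from typing import Callable, Sequence


def toy_is_valid(smiles: str) -> bool:
    """Placeholder validity check (an assumption, not real chemistry).

    It only verifies that the string is non-empty and that its
    parentheses balance. A real pipeline would use a SMILES parser.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def validity(samples: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated strings that pass the validity check."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if is_valid(s)) / len(samples)


def uniqueness(samples: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of valid strings that are distinct (assumed definition)."""
    valid = [s for s in samples if is_valid(s)]
    if not valid:
        return 0.0
    return len(set(valid)) / len(valid)


# Hypothetical generated samples: one duplicate, one unbalanced string.
generated = ["CCO", "CCO", "C(C", "c1ccccc1"]
print(validity(generated, toy_is_valid))
print(uniqueness(generated, toy_is_valid))
```

In practice the checker would be replaced by a real parser, for example RDKit's `Chem.MolFromSmiles`, which returns `None` for strings it cannot parse, and the strings would be canonicalized before counting distinct molecules, since one molecule can have several SMILES spellings.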