Point clouds have become the mainstream representation for advanced 3D applications such as virtual reality and augmented reality. However, the massive data volume of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) method for 3D broadcasting. First, we present the TSC-PCAC framework, which includes a variational autoencoder based on the Transformer and Sparse Convolutional Module (TSCM) and a channel context module. Second, we propose a two-stage TSCM, where the first stage models local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling with larger receptive fields. This module effectively extracts global and local inter-point relevance to reduce informational redundancy. Third, we design a TSCM-based channel context module to exploit inter-channel correlations, which improves the predicted probability distribution of the quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves average bitrate reductions of 38.53%, 21.30%, and 11.19% on the 8iVFB, Owlii, 8iVSLF, Volograms, and MVUB datasets compared to the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. The encoding/decoding time costs are reduced by 97.68%/98.78% on average compared to Sparse-PCAC. The source code and the trained TSC-PCAC models are available at https://***/igizuxo/TSC-PCAC.
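The link between a better predicted probability distribution and a lower bitrate can be illustrated with a minimal sketch. It is not the TSC-PCAC entropy model: the latent values and both probability tables below are hypothetical, and the function only computes the ideal code length (the sum of -log2 p over the symbols) that an entropy coder approaches.

```python
import math

def ideal_code_length_bits(symbols, pmf):
    """Ideal code length in bits: sum of -log2 p(s) over all symbols."""
    return sum(-math.log2(pmf[s]) for s in symbols)

# Hypothetical quantized latent values (illustrative only).
latents = [0, 0, 1, 0, -1, 0, 0, 1]

# A flat prediction over the three possible values.
flat_pmf = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
# A sharper prediction that matches the empirical frequencies (1/8, 5/8, 2/8).
sharp_pmf = {-1: 0.125, 0: 0.625, 1: 0.25}

flat_bits = ideal_code_length_bits(latents, flat_pmf)
sharp_bits = ideal_code_length_bits(latents, sharp_pmf)

print(f"flat prediction:  {flat_bits:.2f} bits")
print(f"sharp prediction: {sharp_bits:.2f} bits")
```

The better the predicted distribution matches the actual latent statistics, the fewer bits the same symbols cost, which is the effect a context module aims for.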
In industrial monitoring, zero-shot learning can diagnose unseen faults, but it struggles to diagnose unseen and seen faults together. Motivated by this, we propose a generalized zero-shot semantic learning fault diagnosis model for batch processes, called joint low-rank manifold distributional semantic embedding and multimodal variational autoencoder (mVAE). First, joint low-rank representation and manifold learning map the training samples to a low-rank space, which captures the global-local features of the samples while reducing redundancy in the inputs to the training model. Second, the bias of human-defined semantic attributes is corrected by predicting the attribute error rate. Then, fault samples and corrected semantic vectors are embedded into a consistency space, in which the samples are reconstructed using the mVAE to fully integrate the cross-modal information. Meanwhile, a Barlow matrix is designed to measure the consistency between the fault samples and the attribute vectors: the higher the consistency, the higher the learning efficiency of the attribute classifiers. Finally, generalized zero-shot fault diagnosis experiments are designed and conducted on the penicillin fermentation process and the semiconductor etching process to validate the model's effectiveness. The results show that the proposed model can diagnose target faults without samples of those faults.
Experimental drug development is costly, complex, and time-consuming, and the number of drugs that have entered clinical use is small. The identification of drug-disease correlations can provide important information for drug discovery and drug repurposing. Computational drug repurposing is an important and effective method that can be used to determine novel treatments for diseases. In recent years, an increasing number of large databases have been utilized for biological data research, particularly in the fields of drugs and diseases. Consequently, researchers have begun to explore the application of deep neural networks to biological data. One particularly promising method for unsupervised learning is the deep generative model, with the variational autoencoder (VAE) being among the mainstream models. Here, we propose a drug indication prediction algorithm called DIDVAE (predicting new drug indications based on double variational autoencoders), which generates new data by learning the latent variable distribution of known data in order to predict drug-disease associations. In our experiments, we compared the DIDVAE algorithm with the BBNR, DrugNet, MBiRW, and DRRS algorithms on a unified dataset. The comprehensive experimental results show that, compared with these prediction algorithms, the DIDVAE algorithm provides better overall prediction performance. In addition, further analysis and verification of the predicted unknown drug-disease associations also confirmed the practicality of the method.
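As background on how a VAE learns a latent variable distribution, the two standard ingredients of a Gaussian VAE can be sketched as follows. This is a generic illustration, not the DIDVAE implementation: the function names are hypothetical, and the encoder outputs (`mu`, `log_var`) are assumed to be given as plain lists.

```python
import math
import random

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), so sampling stays differentiable in mu and sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]   # hypothetical encoder outputs
z = reparameterize(mu, log_var, rng)   # latent sample passed to a decoder
kl = kl_to_standard_normal(mu, log_var)
print(z, kl)
```

In training, the KL term is added to a reconstruction loss computed from the decoder output; new data are generated by decoding samples of `z`.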
Research on variational autoencoders for collaborative filtering is gradually focusing on implicit feedback. However, most existing studies have two limitations: (1) they overlook the impact of user-item interaction data in implicit feedback on the representations of both users and items, which can affect the latent representations; (2) their attention is mainly focused on the immediate feedback of recommended items, ignoring interactions between feedback and ground-truth values and neglecting the differences in loss functions between different training processes. To address these limitations, we first propose a condition for variational autoencoders that controls user and item representations so that more useful information is learned from the latent representations. Then, we train an adaptive loss critic ranking to directly provide ranking scores in collaborative filtering recommendations, which aims to minimize loss and improve interactions during different critic training processes. Extensive experiments on three large real-world social media datasets demonstrate that this approach outperforms twelve existing models under the NDCG and Recall metrics and significantly improves the performance of a variety of prediction models. © 2023 Elsevier B.V. All rights reserved.
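For readers unfamiliar with the two metrics named above, here is a minimal sketch of Recall@k and NDCG@k with binary relevance. It follows one common definition (some papers normalize Recall@k by min(k, number of relevant items) instead), and the item IDs are hypothetical.

```python
import math

def recall_at_k(ranked_items, relevant, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """DCG of the top-k (binary gains, log2 discount) divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking for one user and that user's held-out items.
ranking = ["a", "b", "c", "d", "e"]
held_out = {"a", "c", "f"}

recall = recall_at_k(ranking, held_out, k=3)
ndcg = ndcg_at_k(ranking, held_out, k=3)
print(f"Recall@3 = {recall:.4f}, NDCG@3 = {ndcg:.4f}")
```

Recall@k only counts how many held-out items were retrieved, while NDCG@k also rewards placing them near the top of the list.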
Fourier transform infrared spectroscopy (FTIR) is an emerging, cost-effective, and rapid mineralogical characterization technique being applied in the geosciences. Detecting anomalous FTIR spectra is especially relevant to the geoscience domain, as it may indicate abrupt changes in the geology or mineralogical composition of the rock sample being examined. Given a large volume of data, detecting anomalies that exhibit significant and abrupt spatial and compositional variability is a time-consuming and challenging task. This paper explores the use of an unsupervised variational autoencoder (VAE) for determining anomalies that may exist within a set of FTIR spectra collected from reverse circulation (RC) drill chip samples spanning several iron ore deposits in the Pilbara region of Western Australia. Diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) spectra were measured from 1,579 two-metre composite samples. Our results showed that the VAE was effective in separating anomalous spectra from spectra typical of unmineralized banded iron formation by leveraging the probabilistic latent representation of the spectra in as few as two latent dimensions. To validate our results, detected anomalous samples were compared with their respective geochemical assays to analyse the mineralogical differences that may have led to the anomalous spectra. In the iron ore sample data used in this study, the observed spectral anomalies were shown to have elevated concentrations of Al2O3 and TiO2 (wt.%) while being several standard deviations below the mean Fe2O3 wt.%, indicating shale-rich rather than iron-oxide-rich mineralogies. While the paper demonstrates the efficacy of the VAE in anomaly detection, the VAE can also be effective in assuring the quality of FTIR data as a pre-processing step, which is critically important for machine learning applications.
The application of machine learning is demonstrated for rapid and accurate extraction of plasmonic particle cluster geometries from hyperspectral image data via a dual variational autoencoder (dual-VAE). In this approach, information is shared between the latent spaces of two VAEs acting on the particle shape data and the spectral data, respectively, while a common encoding is enforced on the shape-spectra pairs. It is shown that this approach can establish the relationship between the geometric characteristics of nanoparticles and their far-field photonic responses, demonstrating that hyperspectral darkfield microscopy can be used to accurately predict the geometry (number of particles, arrangement) of multiparticle assemblies below the diffraction limit in an automated fashion with high fidelity (monomers (0.96), dimers (0.86), and trimers (0.58)). This approach of building structure-property relationships via shared encoding is universal and should have applications to a broader range of materials science and physics problems in the imaging of both molecular and nanomaterial systems.
Recent advances in scanning tunneling and transmission electron microscopies (STM and STEM) have allowed routine generation of large volumes of imaging data containing information on the structure and functionality of materials. The experimental data sets contain signatures of long-range phenomena such as physical order parameter fields, polarization, and strain gradients in STEM, or standing electronic waves and carrier-mediated exchange interactions in STM, all superimposed onto scanning system distortions and gradual changes of contrast due to drift and/or mis-tilt effects. Correspondingly, while the human eye can readily identify certain patterns in the images such as lattice periodicities, repeating structural elements, or microstructures, their automatic extraction and classification are highly non-trivial and universal pathways to accomplish such analyses are absent. We pose that the most distinctive elements of the patterns observed in STM and (S)TEM images are similarity and (almost-) periodicity, behaviors stemming directly from the parsimony of elementary atomic structures, superimposed on the gradual changes reflective of order parameter distributions. However, the discovery of these elements via global Fourier methods is non-trivial due to variability and lack of ideal discrete translation symmetry. To address this problem, we explore the shift-invariant variational autoencoders (shift-VAEs) that allow disentangling characteristic repeating features in the images, their variations, and shifts that inevitably occur when randomly sampling the image space. Shift-VAEs balance the uncertainty in the position of the object of interest with the uncertainty in shape reconstruction. This approach is illustrated for model 1D data, and further extended to synthetic and experimental STM and STEM 2D data. We further introduce an approach for training shift-VAEs that allows finding the latent variables that comport to known physical behavior. In this specific case, t
The ability to translate Generative Adversarial Networks (GANs) and variational autoencoders (VAEs) into different modalities and data types is essential to improve Deep Learning (DL) for predictive medicine. This work presents DACMVA, a novel framework to conduct data augmentation in a cross-modal dataset by translating between modalities and oversampling imputations of missing data. DACMVA was inspired by previous work on the alignment of latent spaces in autoencoders. DACMVA is a DL data augmentation pipeline that improves the performance in a downstream prediction task. The unique DACMVA framework leverages a cross-modal loss to improve the imputation quality and employs training strategies to enable regularized latent spaces. Oversampling of augmented data is integrated into the prediction training. It is empirically demonstrated that the new DACMVA framework is effective in the often-neglected scenario of DL training on tabular data with continuous labels. Specifically, DACMVA is applied towards cancer survival prediction on tabular gene expression data where there is a portion of missing data in a given modality. DACMVA significantly (p << 0.001, one-sided Wilcoxon signed-rank test) outperformed the non-augmented baseline and competing augmentation methods with varying percentages of missing data (4%, 90%, 95% missing). As such, DACMVA provides significant performance improvements, even in very-low-data regimes, over existing state-of-the-art methods, including TDImpute and oversampling alone.
Knowledge Graph (KG) is an essential research field in graph theory, but its inherent incompleteness and sparsity limit its performance in several fields. Knowledge Graph Reasoning (KGR) aims to ameliorate those problems by mining new knowledge from existing knowledge. As one of the downstream tasks of KGR, link prediction is of great significance for improving the quality of KGs. Recently, Graph Neural Network (GNN)-based methods have become the most effective way to perform link prediction. However, they still suffer from problems such as incomplete aggregation of neighbor-level and relation-level information and unstable learning of entity features. To address these issues, a Hierarchical and Interlamination Graph Self-attention Mechanism-based (HIGSM) plug-and-play architecture is proposed for KGR in this paper. It is composed of three levels: a feature extractor, an encoder, and a decoder. The feature extractor makes our architecture more effective and stable for the retrieval of new features. The encoder is equipped with a two-stage encoding mechanism accompanied by two mixture-of-expert strategies, which enables our architecture to capture more practical reasoning information and thereby improve the prediction accuracy and generalization of the model. The decoder can use existing KGR models to compute the scores of triples in the KG. Extensive experimental results and ablation studies on four KGs demonstrate the state-of-the-art prediction performance of the proposed HIGSM architecture compared to current GNN-based methods.
With the continuous advancement of artificial intelligence (AI) and deep learning technologies, virtual image generation exhibits significant potential for application in photographic art creation. The primary objective of this study is to investigate the use of AI virtual image technology in photography, particularly focusing on achieving creative expression and artistic style transfer through deep learning models. To this end, this study proposes a novel model that integrates conditional generative adversarial networks (cGANs) with variational autoencoders (VAEs). This model aims to address the challenges associated with image generation and style conversion in photographic art by leveraging the realistic generation capabilities of cGANs alongside the diversity maintenance features of VAEs. In the experimental section, the proposed cGANs + VAEs model is systematically compared with traditional Deep Convolutional GAN (DCGAN) and Pix2Pix models through empirical analysis. The experimental results indicate that the cGANs + VAEs model significantly outperforms traditional models in terms of image quality, artistic expression, and user satisfaction. Expert reviews further confirm the model's superiority in artistic style imitation and creative generation. Additionally, user surveys reveal that most participants are highly satisfied with the images generated by the model, particularly regarding artistic perception and visual effects. Moreover, the cGANs + VAEs model demonstrates strong performance in Fréchet Inception Distance (FID) and Inception Score (IS) across multiple datasets, yielding FID values of 13.67, 9.45, and 11.90 on the COCO, CelebA, and WikiArt datasets, respectively. In summary, the proposed cGANs + VAEs model not only achieves remarkable advancements in the technical performance of image generation but also exhibits considerable potential for practical applications in photographic art creation.
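To give a sense of what the FID values above measure, the sketch below computes the Fréchet distance between two Gaussians in the simplified case of diagonal covariances. This is an assumption made for brevity: the actual FID fits full-covariance Gaussians to Inception network features of real and generated images and requires a matrix square root. The function name and inputs are hypothetical.

```python
import math

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    With diagonal covariances the general formula
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))
    reduces to a sum over dimensions.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Identical distributions give a distance of zero.
same = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
# Shifting the mean and widening the variance both increase the distance.
different = frechet_distance_diag([0.0], [1.0], [3.0], [4.0])
print(same, different)
```

Lower values mean the generated feature statistics are closer to those of the real images, which is why lower FID is read as better image quality.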