ISBN (print): 9783030975463; 9783030975456
Generating novel drug molecules with desired biological properties is a time-consuming and complex task. Conditional generative adversarial models have recently been proposed as promising approaches for de novo drug design. In this paper, we propose a new generative model that extends an existing adversarial autoencoder (AAE) based model by stacking two models together. Our stacked approach generates more valid molecules, as well as molecules that are more similar to known drugs. We break this challenging task down into two sub-problems: a first-stage model learns primitive features from the molecules and gene expression data, and a second-stage model then uses these features to learn properties of the molecules and refine them into more valid molecules. Experiments and comparisons with baseline methods on the LINCS L1000 dataset demonstrate that our proposed model has promising performance for molecular generation.
ISBN (print): 9781450390965
While vast amounts of personal data are shared daily on public online platforms and used by companies and analysts to gain valuable insights, privacy concerns are also on the rise: modern authorship attribution techniques have proven effective at identifying individuals from their data, such as their writing style or the way they pick and judge movies. It is hence crucial to develop data sanitization methods that allow users' data to be shared while protecting their privacy and preserving the quality and content of the original data. In this paper, we tackle anonymization of textual data and propose an end-to-end differentially private variational autoencoder architecture. Unlike previous approaches that achieve differential privacy on a per-word level through individual perturbations, our solution works at an abstract level by perturbing the latent vectors that provide a global summary of the input texts. Decoding an obfuscated latent vector thus not only allows our model to produce coherent, high-quality output text that is human-readable, but also results in strong anonymization due to the diversity of the produced data. We evaluate our approach on IMDb movie and Yelp business reviews, confirming its anonymization capabilities and its preservation of the semantics and utility of the original sentences.
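The abstract does not say which noise mechanism is applied to the latent vectors. As a minimal sketch of the general idea, the snippet below assumes L1-norm clipping followed by the standard Laplace mechanism; the function name `perturb_latent` and the parameters `clip_norm` and `epsilon` are hypothetical and not taken from the paper.

```python
import numpy as np


def perturb_latent(z, epsilon, clip_norm=1.0, rng=None):
    """Clip a latent vector and add Laplace noise (illustrative assumption,
    not the paper's documented mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)

    # Bound the L1 norm so that any two clipped vectors differ by at most
    # 2 * clip_norm in L1 distance (the sensitivity of the release).
    l1 = np.abs(z).sum()
    if l1 > clip_norm:
        z = z * (clip_norm / l1)

    # Laplace mechanism: per-coordinate noise scale = sensitivity / epsilon.
    scale = 2.0 * clip_norm / epsilon
    return z + rng.laplace(loc=0.0, scale=scale, size=z.shape)
```

A smaller `epsilon` gives stronger privacy but a noisier vector; in a full pipeline the decoder would then generate text from the perturbed vector.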
ISBN (print): 9783031087578; 9783031087561
The production of numerous high-fidelity simulations has been a key aspect of research for many-query problems in fluid dynamics. The computational resources and time required to generate these simulations can be impractically large. Motivated by several successes of generative models, we explore the performance and generative capabilities of both a generative adversarial network (GAN) and an adversarial autoencoder (AAE) in predicting the time evolution of a highly nonlinear fluid flow. These generative models are incorporated within a reduced-order model framework. The test case comprises two-dimensional Gaussian vortices governed by the time-dependent Navier-Stokes equation. We show that both the GAN and the AAE are able to predict the evolution of the positions of the vortices forward in time, generating new samples that have never before been seen by the neural networks.
Fatigue plays a critical role in sports science, significantly affecting recovery, training effectiveness, and overall athletic performance. Understanding and predicting fatigue is essential to optimize training, prevent overtraining, and minimize the risk of injuries. The aim of this study is to leverage Human Activity Recognition (HAR) through deep learning methods for dimensionality reduction. The use of adversarial autoencoders (AAEs) is explored to assess and visualize fatigue in a two-dimensional latent space, focusing on both semi-supervised and conditional approaches. By transforming complex time-series data into this latent space, the objective is to evaluate motor changes associated with fatigue within the participants' motor control by analyzing shifts in the distribution of data points and providing a visual representation of these effects. It is hypothesized that increased fatigue will cause significant changes in point distribution, which will be analyzed using clustering techniques to identify fatigue-related patterns. The data were collected using a Wii Balance Board and three Inertial Measurement Units, which were placed on the hip and both forearms (distal part, close to the wrist) to capture dynamic and kinematic information. The participants followed a fatigue-inducing protocol that involved repeating sets of 10 repetitions of four different exercises (Squat, Right Lunge, Left Lunge, and Plank Jump) until exhaustion. Our findings indicate that the AAE models are effective in reducing data dimensionality, allowing for the visualization of fatigue's impact within a 2D latent space. The latent space representation provides insights into motor control variations, revealing patterns that can be used to monitor fatigue levels and optimize training or rehabilitation programs.
ISBN (print): 9781665473507
This work takes an unsupervised approach to abstractive text summarization, in which a large set of sentences is converted into a concise summary highlighting the essential details. This is achieved with an adversarial autoencoder (AAE) model. The encoder maps the input to a smaller latent vector, and the decoder decodes this latent code to generate the higher-dimensional output with some loss. Unlike variational autoencoders, AAEs use discriminators to learn via an adversarial loss. K-Means clustering and language models are used to produce the final summary. The model has been tested on different datasets, such as the Amazon, Rotten Tomatoes and Yelp review datasets, essentially as an opinion summarization task, and is evaluated using ROUGE-1, ROUGE-2, ROUGE-L and BLEU scores. The same task is also conducted on a dataset in Hindi. We obtain a ROUGE-1 score of around 24% for the Amazon, Yelp and CNN/Daily Mail datasets and a score of 12% for Rotten Tomatoes, while the score obtained for the Hindi news articles dataset is only 8%.
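For readers unfamiliar with the unigram-overlap metric reported above, the following is a simplified sketch of ROUGE-1. It assumes lowercased whitespace tokenization with no stemming or stopword handling, so its values will not exactly match those of a standard ROUGE toolkit; the function name `rouge_1` is illustrative.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())

    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, scoring the candidate "the cat sat" against the reference "the cat sat on the mat" gives a precision of 1.0 and a recall of 0.5, since all three candidate tokens appear among the six reference tokens.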
The accuracy of sensor measurements is critical for the normal operation of nuclear power plants. However, the harsh operating environment of nuclear power plants increases the probability of sensor failure. Therefore, it is necessary to detect and identify sensor faults in time. This paper proposes a fault detection model based on a graph attention network (GAT), called ALPHA-GAT, to detect and identify multiple sensor faults in the primary coolant loop of a nuclear power plant. ALPHA-GAT first uses the GAT to obtain the correlation matrices of the sensors, which are fed into an adversarial autoencoder for reconstruction. In parallel, the node features of the GAT are fed into a long short-term memory network to predict the ground-truth sensor values. Subsequently, the anomaly score used for fault detection and identification is calculated from the reconstruction error and the prediction error. A strategy based on fault decoupling is also proposed to identify multi-sensor faults; it replaces the data of a faulty sensor with the model's prediction. This strategy effectively alleviates the negative impact of fault data on subsequent diagnosis. Finally, data from a simulation model of a nuclear power plant are used to verify the efficiency of the proposed model. The experimental results indicate that ALPHA-GAT effectively improves the accuracy and timeliness of multi-sensor fault detection and identification.
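The abstract does not give the formula that combines the two errors, nor the threshold rule. As a rough sketch of the scoring and fault-decoupling steps, the code below assumes a weighted sum of per-sensor mean absolute errors and a fixed threshold; `alpha`, `threshold`, and both function names are hypothetical.

```python
import numpy as np


def anomaly_scores(corr, corr_recon, values, values_pred, alpha=0.5):
    """Per-sensor score from reconstruction and prediction errors.

    The weighted sum is an illustrative assumption, not the paper's formula.
    """
    corr = np.asarray(corr, dtype=float)
    corr_recon = np.asarray(corr_recon, dtype=float)
    values = np.asarray(values, dtype=float)
    values_pred = np.asarray(values_pred, dtype=float)

    # Row i of the correlation matrix belongs to sensor i.
    recon_err = np.abs(corr - corr_recon).mean(axis=1)
    pred_err = np.abs(values - values_pred)
    return alpha * recon_err + (1.0 - alpha) * pred_err


def decouple_faults(values, values_pred, scores, threshold):
    """Flag sensors above the threshold and replace their readings with the
    model's predictions, so faulty data does not affect later diagnosis."""
    values = np.asarray(values, dtype=float)
    values_pred = np.asarray(values_pred, dtype=float)
    faulty = np.asarray(scores) > threshold
    repaired = np.where(faulty, values_pred, values)
    return np.flatnonzero(faulty), repaired
```

With three sensors, a perfectly reconstructed correlation matrix, readings `[1, 2, 3]` and predictions `[1, 2, 5]`, only the third sensor is flagged at a threshold of 0.5, and its reading is replaced by the predicted value.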
Background: Single-cell RNA sequencing (scRNA-seq) strives to capture cellular diversity with higher resolution than bulk RNA sequencing. Clustering analysis is critical to transcriptome research as it allows for further identification and discovery of new cell types. Unsupervised clustering cannot integrate prior knowledge where relevant information is widely available. Purely unsupervised clustering algorithms may not yield biologically interpretable clusters when confronted with the high dimensionality of scRNA-seq data and frequent dropout events, which makes identification of cell types more ***: We propose scSemiAAE, a semi-supervised clustering model for scRNA sequence analysis using deep generative neural networks. Specifically, scSemiAAE carefully designs a ZINB adversarial autoencoder-based architecture that inherently integrates adversarial training and semi-supervised modules in the latent space. In a series of experiments on scRNA-seq datasets spanning thousands to tens of thousands of cells, scSemiAAE can significantly improve clustering performance compared to dozens of unsupervised and semi-supervised algorithms, promoting clustering and interpretability of downstream ***: scSemiAAE is a Python-based algorithm implemented on the VSCode platform that provides efficient visualization, clustering, and cell type assignment for scRNA-seq data. The tool is available from https://***/WHang98/scSemiAAE.
Limited-view Computed Tomography (CT) can be used to effectively reduce radiation dose in clinical diagnosis; it is also adopted when unavoidable mechanical and physical limitations are encountered in industrial inspection. Nevertheless, limited-view CT leads to severe imaging artifacts, which are a major issue in low-dose protocols. How to exploit the limited prior information to obtain high-quality CT images thus becomes a crucial question. We notice that almost all existing methods focus solely on a single CT image while neglecting the fact that scanned objects are always highly spatially correlated. Consequently, there is abundant spatial information between consecutively acquired CT images that remains largely unexploited. In this paper, we propose a novel hybrid-domain structure composed of fully convolutional networks that explores the three-dimensional neighborhood and works in a "coarse-to-fine" manner. We first conduct data completion in the Radon domain and transform the resulting full-view Radon data into images through FBP. Subsequently, we use the spatial correlation between consecutive CT images to restore them and then refine the image texture to obtain the final high-quality CT images, achieving a PSNR of 40.209 and an SSIM of 0.943. In addition, unlike other current limited-view CT reconstruction methods, we adopt FBP (implemented on GPUs) instead of SART-TV, which significantly accelerates the overall procedure and allows it to run in an end-to-end manner.
The effective identification of geochemical anomalies is essential in mineral exploration. Recently, data-driven deep learning algorithms have gained popularity for recognizing the geochemical patterns linked to mineralization. While purely data-driven deep learning algorithms can exploit geochemical patterns well, the predicted and extracted results may be inconsistent with geologic knowledge. In this study, a geologically-constrained deep learning algorithm was proposed to extract multivariate geochemical anomalies associated with W polymetallic mineralization in southern Jiangxi Province, China. The construction of the proposed algorithm involved two steps: (1) quantifying the spatial distribution of the known mineral deposits via fractal analysis, and (2) using the prior knowledge obtained from the fractal analysis as a geological constraint on an adversarial autoencoder network for delineating geochemical anomalies associated with mineralization. We conducted a comparative study of geologically-constrained and purely data-driven deep learning algorithms and found that the former obtained more reasonable and interpretable geochemical anomalies linked to W mineralization. The results obtained by the geologically-constrained deep learning algorithm were more consistent with the regional metallogenic law. Therefore, this geological constraint can improve the generalization ability of the deep learning algorithm and enhance the interpretation of the obtained results in geosciences.
Authors: Guo, Qian; Xu, Feng
Affiliation: Fudan Univ, Key Lab Informat Sci Electromagnet Waves MoE, Shanghai 200433, Peoples R China
ISBN (print): 9781665403696
Due to the lack of raw data, the difficulty of labeling, and the limitations of sensor parameters, few-shot learning in SAR images has become an important research direction. This paper proposes a deep feature transformation method based on a differential vector to alleviate the feature drift problem in few-shot learning. First, samples generated by a modified adversarial autoencoder (AAE) are introduced as an auxiliary dataset for the few-shot training dataset. Second, a differential vector is proposed to alleviate the cross-class bias between the generated and real data in deep feature space. It is defined as the mean difference of the low-dimensional feature vectors of the real and generated samples, which is considered able to describe the transformation direction from generated to real data. Experiments conducted on the MSTAR dataset demonstrate the feasibility, effectiveness and superiority of the proposed method.
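One way to read the definition of the differential vector is as the difference between the mean feature vector of the real samples and that of the generated samples. The sketch below follows that reading; it is an interpretation of the abstract, and the function names and the simple additive shift are assumptions rather than the authors' implementation.

```python
import numpy as np


def differential_vector(real_feats, gen_feats):
    """Mean real feature minus mean generated feature.

    Inputs are arrays of shape (n_samples, n_features); the result points
    from the generated data toward the real data in feature space.
    """
    real_feats = np.asarray(real_feats, dtype=float)
    gen_feats = np.asarray(gen_feats, dtype=float)
    return real_feats.mean(axis=0) - gen_feats.mean(axis=0)


def transform_generated(gen_feats, diff):
    """Shift every generated feature vector along the differential vector."""
    return np.asarray(gen_feats, dtype=float) + diff
```

After the shift, the mean of the generated features coincides with the mean of the real features, which is the sense in which the vector describes a transformation direction from generated to real data.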