Identifying cell-type-specific enhancers is critical for developing genetic tools to study the mammalian brain. We organized the "Brain Initiative Cell Census Network (BICCN) Challenge: Predicting Functional Cell Type-Specific Enhancers from Cross-Species Multi-Omics" to evaluate machine learning and feature-based methods for nominating enhancer sequences targeting mouse cortical cell types. Methods were assessed using in vivo data from hundreds of adeno-associated virus (AAV)-packaged, retro-orbitally delivered enhancers. Open chromatin was the strongest predictor of functional enhancers, while sequence models improved prediction of non-functional enhancers and identified cell-type-specific transcription factor codes to inform in silico enhancer design. This challenge establishes a benchmark for enhancer prioritization and highlights computational and molecular features critical for identifying functional cortical enhancers, advancing efforts to map and manipulate gene regulation in the mammalian cortex.
ISBN (digital): 9781728137414
ISBN (print): 9781728137421
Complex high-dimensional stochastic dynamic systems arise in many applications in the natural sciences, especially biology. Although these systems are difficult to describe analytically, "snapshot" measurements that sample the output of the system are often available. To model the dynamics of such systems from snapshot data, or local transitions, we present a deep neural network framework that we call the Dynamics Modeling Network, or DyMoN. DyMoN is trained as a deep generative Markov model whose next state is a probability distribution conditioned on the current state. Because it is trained on pairs of current and next states, it does not require longitudinal measurements. We show the advantage of DyMoN over shallow models such as Kalman filters and hidden Markov models, and over other deep models such as recurrent neural networks, in its ability to embody the dynamics (which can be studied by perturbing the neural network) and to generate hypothetical longitudinal trajectories. We perform three case studies in which we apply DyMoN to different types of biological systems and extract features of the dynamics in each case by examining the learned model.
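The idea of learning a stochastic transition from current/next-state pairs and then rolling it forward can be illustrated with a deliberately small stand-in. The sketch below is not DyMoN: it replaces the deep network with a one-dimensional linear-Gaussian transition fitted by least squares, and the synthetic process, the function names `fit_transition` and `generate_trajectory`, and all parameter values are assumptions made for illustration only.

```python
import random

def fit_transition(pairs):
    """Fit x_next ~ Normal(a * x + b, sigma) from (current, next) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    # Residual spread becomes the noise scale of the learned transition.
    sigma = (sum((y - (a * x + b)) ** 2 for x, y in pairs) / n) ** 0.5
    return a, b, sigma

def generate_trajectory(x0, steps, params, rng):
    """Roll the learned Markov transition forward to get a hypothetical trajectory."""
    a, b, sigma = params
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(rng.gauss(a * trajectory[-1] + b, sigma))
    return trajectory

rng = random.Random(0)

# Synthetic snapshot pairs from a hypothetical process (assumption, not real data):
# each pair is sampled independently, so no longitudinal series is needed.
pairs = []
for _ in range(500):
    x = rng.uniform(-5.0, 15.0)
    pairs.append((x, 0.9 * x + 1.0 + rng.gauss(0.0, 0.2)))

params = fit_transition(pairs)
trajectory = generate_trajectory(0.0, 20, params, rng)
```

The point of the sketch is only the data flow: independent transition pairs go in, and a sampled multi-step trajectory comes out. A neural network would take the place of the linear map when the dynamics are nonlinear or high-dimensional.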
Side effects from targeted drugs remain a serious concern. One reason is the nonselective binding of a drug to unintended proteins such as its paralogs, which are highly homologous in sequence and have similar structures and drug-binding pockets. To identify targetable differences between paralogs, we analyzed two types (type-I and type-II) of functional divergence between two paralogs in the known target protein receptor family G-protein coupled receptors (GPCRs) at the amino acid level. Paralogous protein receptors in the glucagon-like subfamily, glucagon receptor (GCGR) and glucagon-like peptide-1 receptor (GLP-1R), exhibit divergence in ligands and are clinically validated drug targets for type 2 diabetes. Our data showed that type-II amino acids were significantly enriched in the binding sites of antagonist MK-0893 to GCGR, which had a radical shift in physicochemical properties between GCGR and GLP-1R. We also examined the role of type-I amino acids between GCGR and GLP-1R. The divergent features between the GCGR and GLP-1R paralogs may be helpful in their discrimination, thus enabling the identification of binding sites to reduce undesirable side effects and increase the target specificity of drugs.
We propose a new type of generative model for high-dimensional data that learns the manifold geometry of the data, rather than its density, and can generate points evenly along this manifold. This is in contrast to existing generative models, which represent data density and are strongly affected by noise and other artifacts of data collection. We demonstrate how this approach corrects sampling biases and artifacts and thus improves several downstream data analysis tasks, such as clustering and classification. Finally, we demonstrate that this approach is especially useful in biology where, despite the advent of single-cell technologies, rare subpopulations and gene-interaction relationships are affected by biased sampling. We show that SUGAR can generate hypothetical populations and that it can reveal intrinsic patterns and mutual-information relationships between genes in a single-cell RNA sequencing dataset of hematopoiesis.
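To make "generating points evenly rather than by density" concrete, here is a rough one-dimensional sketch. It is not the SUGAR algorithm: it only estimates a Gaussian-kernel density at each point and adds more synthetic points where the data are sparse. The function names `local_density` and `equalize`, the bandwidth, and the toy data are all assumptions chosen for illustration.

```python
import math
import random

def local_density(points, bandwidth):
    """Gaussian-kernel degree of each point (higher means a denser neighborhood)."""
    return [
        sum(math.exp(-((p - q) / bandwidth) ** 2) for q in points)
        for p in points
    ]

def equalize(points, bandwidth, rng):
    """Generate extra points around sparse regions so coverage becomes more even."""
    densities = local_density(points, bandwidth)
    target = max(densities)
    new_points = []
    for p, d in zip(points, densities):
        # Sparse points (small d) receive more synthetic neighbors than dense ones.
        n_new = int(round(target / d)) - 1
        new_points.extend(rng.gauss(p, bandwidth / 2) for _ in range(n_new))
    return new_points

rng = random.Random(0)

# Toy biased sample (assumption): 200 points in [0, 1] but only 20 in [1, 2].
dense = [rng.uniform(0.0, 1.0) for _ in range(200)]
sparse = [rng.uniform(1.0, 2.0) for _ in range(20)]
points = dense + sparse

new_points = equalize(points, 0.1, rng)
combined = points + new_points
```

In this toy setting the under-sampled interval receives most of the generated points, which is the behavior the abstract describes for rare subpopulations. A manifold-aware method would additionally keep the new points on the data geometry, which this sketch does not attempt.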
Background: IFITM3, an innate immune response protein and inhibitor of viral infection, was reported to modulate amyloid-β production in Alzheimer’s disease (AD). We aimed to identify single-nucleotide polymorphisms (SNPs) in IFITM3 associated with cognition and AD biomarkers. Method: We used genetic, longitudinal cognition, and AD biomarker data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI; N = 1,565) and AddNeuroMed (N = 633) as discovery and replication samples, respectively. First, we performed gene-based association analysis of SNPs in IFITM3 with cognitive performance. Second, we performed SNP-based association analysis in IFITM3 with cognitive decline and AD biomarkers from amyloid positron emission tomography (PET), cerebrospinal fluid (CSF), and magnetic resonance imaging (MRI). Result: Gene-based association analysis showed that IFITM3 was significantly associated with cognitive performance (permutation-corrected p = 1.25 × 10⁻³). In particular, of the two SNPs (rs10751647, rs2091850) in IFITM3 significantly associated with cognitive performance, rs10751647 was associated with cognitive decline in ADNI, a finding that was replicated in AddNeuroMed. In addition, rs10751647 was significantly associated with amyloid-β deposition measured by amyloid PET scan, CSF phosphorylated tau levels, and entorhinal cortical thickness measured by MRI scan in ADNI. The association of rs10751647 with entorhinal cortical thickness was replicated in AddNeuroMed. Participants carrying the minor allele (C) of rs10751647 had less cognitive decline, less amyloid and tau burden, and less brain atrophy. eQTL analysis showed that rs10751647 is associated with IFITM3 expression levels in blood and brain. Conclusion: These results suggest that rs10751647 in IFITM3 is associated with lower vulnerability to cognitive decline and AD biomarker changes, providing mechanistic insight into the involvement of immune activity and infection in AD.
Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates as the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which uses only the features with the largest variable importance scores. Yet the performance of this method is not satisfactory, possibly because of its rigid feature selection and the increased correlation between the trees of the forest. Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node when building trees, samples features according to their variable importance scores and then selects the best split from the randomly selected features. Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared with the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases. Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noise. We have implemented an R package "viRandomForests" based on the original R package "randomForest", and it can be freely downloaded from http:// ***/software.
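The node-level sampling step described in the Methods can be sketched as follows. This is a minimal illustration, not the viRandomForests implementation: the function name `sample_features_by_importance`, the sequential draw without replacement, and the small floor added to zero scores are assumptions, and the importance values are made up.

```python
import random

def sample_features_by_importance(importances, mtry, rng):
    """Draw `mtry` distinct feature indices, each draw proportional to importance."""
    if mtry > len(importances):
        raise ValueError("mtry cannot exceed the number of features")
    # Assumption: a tiny floor keeps zero-importance features selectable,
    # so weak features are down-weighted rather than eliminated.
    weights = {i: max(w, 0.0) + 1e-12 for i, w in enumerate(importances)}
    chosen = []
    for _ in range(mtry):
        candidates = list(weights)
        pick = rng.choices(
            candidates, weights=[weights[i] for i in candidates], k=1
        )[0]
        chosen.append(pick)
        del weights[pick]  # without replacement: a feature appears once per node
    return chosen

rng = random.Random(0)

# Hypothetical importance scores for five features.
importances = [0.5, 0.3, 0.1, 0.1, 0.0]
chosen = sample_features_by_importance(importances, 3, rng)
```

In a full forest, the best split at the node would then be searched only among the `chosen` features. Setting all importances equal recovers the uniform sampling of standard Random Forests, which is one way to see the proposal as a soft alternative to hard feature elimination.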
The analysis of cancer genomic data has long suffered "the curse of dimensionality". Sample sizes for most cancer genomic studies are a few hundreds at most while there are tens of thousands of genomic featu...