The recombination-activating genes (RAGs) encode V(D)J recombinases responsible for rearranging antigen-receptor genes during T and B cell development, and RAG expression is known to correlate strictly with the process of rearrangement. Several studies of RAG1 have illustrated its biochemical, physiological, and immunological properties. To date, however, few studies of RAG1 have focused on molecular phylogenetic analyses, evolutionary traits, and genetic variants in human populations, so a comprehensive study of this topic is needed. In the current report, we shed light on the evolutionary traits and genetic variants of the human RAG1 gene using 1092 genomes from human populations. Syntenic analyses revealed that the two RAG genes are physically linked and conserved at the same locus in head-to-head orientation from sea urchin to human, for about 550 MY. Spliceosomal introns have invaded the RAG1 gene in fishes and sea urchin, whereas the RAG1 gene of tetrapods has retained a single-exon architecture. We compiled 751 genetic variants in the human RAG1 gene from 1092 human genomes; the major variant classes are single nucleotide polymorphisms (SNPs, 79%), somatic single nucleotide variants (somatic SNVs, 12.2%), and deletions (6.8%). Of 267 missense variants, 140 are deleterious mutations. We identified 284 non-coding variants, 94% of which are regulatory in nature. (C) 2015 Elsevier Inc. All rights reserved.
Purpose: Identifying pathogenic noncoding variants is challenging. A single protein-altering variant is often identified in a recessive gene in individuals with developmental disorders (DD), but the prevalence of pathogenic noncoding "second hits" in trans with these is unknown. Methods: In 4073 genetically undiagnosed rare-disease trio probands from the 100,000 Genomes project, we identified rare heterozygous protein-altering variants in recessive DD-associated genes. We identified rare noncoding variants on the other haplotype in introns, untranslated regions, promoters, and candidate enhancer regions. We clinically evaluated the top candidates for phenotypic fit and performed functional testing where possible. Results: We identified 3761 rare heterozygous loss-of-function or ClinVar pathogenic variants in recessive DD-associated genes in 2430 probands. For 1366 (36.3%) of these, we identified at least 1 rare noncoding variant in trans. Bioinformatic filtering and clinical review revealed 7 to be a good clinical fit. After detailed characterization, we identified likely diagnoses for 3 probands (in GAA, NPHP3, and PKHD1) and candidate diagnoses in a further 3 (PAH, LAMA2, and IGHMBP2). Conclusion: We developed a systematic approach to uncover new diagnoses involving compound heterozygous coding/noncoding variants and conclude that this mechanism is likely to be a rare cause of DDs. (c) 2024 The Authors. Published by Elsevier Inc. on behalf of American College of Medical Genetics and Genomics. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
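The "in trans" requirement in the abstract above can be illustrated with trio genotypes. The following is a minimal sketch, not the authors' pipeline: it assumes the proband is heterozygous at both sites, that genotypes are error-free, and that each variant is represented only by the ALT-allele counts of the two parents. The function names `parent_of_origin` and `in_trans` are hypothetical.

```python
from typing import Optional, Tuple

# A variant is summarized here as (mother_alt_count, father_alt_count),
# i.e. how many ALT alleles (0, 1, or 2) each parent carries.
ParentCounts = Tuple[int, int]


def parent_of_origin(mother_alt: int, father_alt: int) -> Optional[str]:
    """Return which parent transmitted a heterozygous proband ALT allele.

    Returns None when the origin cannot be resolved: both parents carry
    the allele, or neither does (apparent de novo variant).
    """
    if mother_alt > 0 and father_alt == 0:
        return "mother"
    if father_alt > 0 and mother_alt == 0:
        return "father"
    return None


def in_trans(variant_a: ParentCounts, variant_b: ParentCounts) -> Optional[bool]:
    """True if the two variants sit on different parental haplotypes.

    Returns None when either variant's parent of origin is ambiguous.
    """
    origin_a = parent_of_origin(*variant_a)
    origin_b = parent_of_origin(*variant_b)
    if origin_a is None or origin_b is None:
        return None
    return origin_a != origin_b


# Coding variant inherited from the mother, noncoding variant from the father.
print(in_trans((1, 0), (0, 1)))
```

Pairs for which the function returns None would need read-backed or statistical phasing, which this sketch does not attempt.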
Genome-wide association studies have shown that common genetic variants associated with complex diseases are mostly located in non-coding regions and may not be causal. In addition, the limited number of validated non-coding functional variants makes it difficult to develop an effective supervised learning model. Therefore, improving the accuracy of predicting non-coding causal variants has become critical. This study aims to build a transfer learning-based machine learning method for predicting regulatory variants to overcome the problem of limited sample size. This paper presents a supervised learning method, the transfer support vector machine (TSVM), for predicting regulatory variants validated by massively parallel reporter assays (MPRA). First, it uses a convolutional neural network to extract features with transfer learning. Second, the extracted features are selected by the random forest method. Third, the selected features are used to train a support vector machine for classification. We performed scale sensitivity experiments on the MPRA dataset and validated the effectiveness of transfer learning. The model achieves an MCC of 0.326 and an AUC of 0.720, which are higher than those of the state-of-the-art method.
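The two metrics reported above, the Matthews correlation coefficient (MCC) and the area under the ROC curve (AUC), can be computed from labels and scores as follows. This is a minimal standard-library sketch on toy data; the function names and the example labels are illustrative and are not taken from the paper.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four variants, two of them truly regulatory.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
calls = [1 if s >= 0.6 else 0 for s in scores]  # threshold is arbitrary
print(mcc(labels, calls), auc(labels, scores))
```

MCC depends on the chosen decision threshold, whereas AUC summarizes ranking quality over all thresholds, which is why the two are often reported together on imbalanced variant sets.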
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://***/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Background Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. Results We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. Conclusion Our work provides new strategies for predicting the functional impact of non-coding variants.
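The idea of aggregating allelic evidence with an over-dispersed binomial likelihood can be sketched with a beta-binomial distribution. This is an illustration under stated assumptions, not the PyProBound implementation: the parameterization by a mean reference-allele fraction and an overdispersion `rho`, the default `rho=0.05`, and all function names are hypothetical choices made for the example.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, mean, rho):
    """Log-probability of k reference reads out of n under a beta-binomial.

    mean is the expected reference-allele fraction and rho in (0, 1) is
    the overdispersion; as rho approaches 0 this approaches a binomial.
    """
    if not (0.0 < mean < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("mean and rho must lie strictly between 0 and 1")
    scale = (1.0 - rho) / rho
    a, b = mean * scale, (1.0 - mean) * scale
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def total_log_likelihood(counts, predicted_ref_fraction, rho=0.05):
    """Sum log-likelihoods over variants; no per-variant test is applied."""
    return sum(beta_binomial_logpmf(ref, ref + alt, p, rho)
               for (ref, alt), p in zip(counts, predicted_ref_fraction))


# Toy (ref, alt) read counts at two heterozygous sites.
counts = [(30, 10), (8, 24)]
model_a = [0.75, 0.25]   # predicts the observed direction of imbalance
model_null = [0.5, 0.5]  # predicts no allelic preference
print(total_log_likelihood(counts, model_a)
      > total_log_likelihood(counts, model_null))
```

Because the likelihood is summed over all variants, weakly imbalanced sites still contribute evidence, which is the sense in which such a comparison is threshold-free. How a sequence-to-affinity model's output is mapped to a predicted allelic fraction is left open here.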
Background Visceral artery aneurysms (VAAs) can be fatal if ruptured. Although a relatively rare incident, it holds a contemporary mortality rate of approximately 12%. VAAs have multiple possible causes, one of which is genetic predisposition. Here, we present a striking family with seven individuals affected by VAAs, and one individual affected by a visceral artery pseudoaneurysm. Methods We exome sequenced the affected family members and the parents of the proband to find a possible underlying genetic defect. As exome sequencing did not reveal any feasible protein-coding variants, we combined whole-genome sequencing of two individuals with linkage analysis to find a plausible non-coding culprit variant. Variants were ranked by the deep learning framework DeepSEA. Results Two of seven top-ranking variants, NC_000013.11:g.108154659C>T and NC_000013.11:g.110409638C>T, were found in all VAA-affected individuals, but not in the individual affected by the pseudoaneurysm. The second variant is in a candidate cis-regulatory element in the fourth intron of COL4A2, proximal to COL4A1. Conclusions As type IV collagens are essential for the stability and integrity of the vascular basement membrane and involved in vascular disease, we conclude that COL4A1 and COL4A2 are strong candidates for VAA susceptibility genes.
Transcription factor (TF) proteins bind to DNA in a sequence-specific manner to regulate gene expression. The binding affinity of TFs for individual sites is well characterized and can be represented using DNA motif models such as position weight matrices. However, there are many factors influencing TF-DNA recognition in the cell, leading to complexities that cannot be captured by motif models alone. Here, we present our studies on two factors: cooperative TF binding and alterations in TF binding due to DNA mutations. Both factors require quantitative and rigorous approaches to distinguish real effects from random noise. First, we present a new method for characterizing cooperative binding of TFs to DNA. This method addresses the issue that TF binding sites located in close proximity, which occur frequently across the human genome, are not necessarily bound cooperatively. To distinguish between cooperative and independent binding, we developed a high-throughput on-chip binding assay designed specifically to measure TF binding to neighboring sites. Using the experimental data from our assay, we trained machine learning models to differentiate between cooperative and independent binding of TFs. This method enabled us to reveal molecular mechanisms used by TFs to bind DNA cooperatively. Second, we introduce QBiC-Pred (Quantitative Predictions of TF Binding Changes Due to Sequence variants), an ordinary least squares-based method to predict the magnitude of the effect of DNA mutations on TF-DNA recognition. We implemented QBiC-Pred as a web service: ***, which allows users to run our models through a user-friendly web interface. We used this method to identify non-recurring putative regulatory driver mutations in cancer. Our approach is novel because we prioritize mutations based on their effects on transcription factor (TF) binding, instead of relying on the recurrence of the mutations among tumor samples, which is often difficult to perform as indiv
INTRODUCTION: Sensorineural hearing impairment (SNHI), a common childhood disorder with heterogeneous genetic causes, can lead to delayed language development and psychosocial problems. Next-generation sequencing (NGS) offers high-throughput screening and high-sensitivity detection of genetic etiologies of SNHI, enabling clinicians to make informed medical decisions, provide tailored treatments, and improve prognostic outcomes.
AREAS COVERED: This review covers the diverse etiologies of HHI and the utility of different NGS modalities (targeted sequencing and whole exome/genome sequencing), and includes HHI-related studies on newborn screening, genetic counseling, prognostic prediction, and personalized treatment. Challenges such as the trade-off between cost and diagnostic yield, detection of structural variants, and exploration of the non-coding genome are also highlighted.
EXPERT OPINION: In the current landscape of NGS-based diagnostics for HHI, there are both challenges (e.g. detection of structural variants and non-coding genome variants) and opportunities (e.g. the emergence of medical artificial intelligence tools). The authors advocate the use of technological advances such as long-read sequencing for structural variant detection, multi-omics analysis for non-coding variant exploration, and medical artificial intelligence for pathogenicity assessment and outcome prediction. By integrating these innovations into clinical practice, precision medicine in the diagnosis and management of HHI can be further improved.
Non-coding variants associated with complex traits can alter the motifs of transcription factor (TF)-deoxyribonucleic acid binding. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power lacks systematic evaluation. Here we have evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors could be predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, aligning with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing the in vivo predictive performance of TFs. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models.
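As a concrete picture of the simplest model family evaluated above, a PWM-based variant effect can be computed as the change in the best log-odds motif score between the reference and alternate sequences. The sketch below is a generic illustration and does not reproduce tRap or any other benchmarked tool; the toy count matrix, the pseudocount of 1, the uniform background, and the forward-strand-only scan are all simplifying assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def best_window_score(seq, pwm):
    """Highest PWM score over all forward-strand windows of the sequence."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def snp_delta(ref_seq, alt_seq, pwm):
    """Alternate-minus-reference score; negative means weaker predicted binding."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)


# Toy 3-bp motif with consensus TGA, built from 8 hypothetical sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_matrix(toy_counts)
# A G>C substitution disrupts the central motif position.
print(snp_delta("CCTGACC", "CCTCACC", pwm))
```

In practice the input sequences would be short flanks centered on the variant so that only windows overlapping the SNP are compared, and both strands would be scanned.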
Regulatory DNA provides a platform for transcription factor binding to encode cell-type-specific patterns of gene expression. However, the effects and programmability of regulatory DNA sequences remain difficult to ma...