Background Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species in the state of identity-by-descent, hereafter "trans-species polymorphisms" (TSPs), can result from LTBS, often due to host-pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred candidate LTBS regions have been identified in humans and chimpanzees; however, because many are in non-protein-coding regions of the genome, the functions and potential adaptive roles for most remain unknown. Results We integrated diverse genomic annotations to explore the functions of 60 previously identified regions with multiple shared polymorphisms (SPs) between humans and chimpanzees, including 19 with strong evidence of LTBS. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS) for all the regions. We identify functional annotations for 59 regions, including 58 with evidence of gene regulatory function from GTEx or functional genomics data and 19 with evidence of trait association from GWAS or PheWAS. As expected, the SPs associate in humans with many immune system phenotypes, including response to pathogens, but we also find associations with a range of other phenotypes, including body size, alcohol intake, cognitive performance, risk-taking behavior, and urate levels. Conclusions The diversity of traits associated with non-coding regions with multiple SPs supports previous hypotheses that functions beyond the immune system are likely subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypotheses about behavioral diversity in human and chimpanzee populations, such as the importance of variation in risk sensitivity.
Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit expression quantitative trait loci (eQTL) signatures, suggesting that MPRA could identify as-yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.
Pathogenic variants in the coding regions of BRCA1/2 lead to dysfunctional or nonfunctional BRCA proteins; however, the contribution of non-coding BRCA1/2 variants to BRCA-related disease risk has not been fully elucidated. Thus, we characterized the functional impact of both coding and non-coding BRCA1/2 variants identified in individuals with a personal and/or family history of BRCA-related cancers. The data were produced by resequencing the exons and exon-intron junctions of BRCA1/2 in 125 individuals and were comprehensively analyzed using bioinformatics tools and databases. A total of 96 variants (59 coding and 37 non-coding), including 7 novel variants, were identified and analyzed for their functional importance. We identified 11 missense variants that potentially affect protein function; 22 variants were likely to alter different types of posttranslational modifications. Also, multiple non-coding BRCA1/2 variants were found to reside in critical regulatory regions and have the potential to act as eQTLs and affect alternative splicing. The results of our study shed light on the possible contributions of not only coding variants but also non-coding BRCA1/2 variants in BRCA-related cancers. Further investigation is required to fully understand their potential associations with phenotypes, which may ultimately lead to their use as biomarkers in cancer management. (c) 2020 Elsevier B.V. All rights reserved.
Background: Common and rare variants of the guanosine triphosphate cyclohydrolase 1 (GCH1) gene may play important roles in Parkinson's disease (PD). However, there is a lack of comprehensive analysis of GCH1 genotypes, especially in non-coding *** aim of this study was to explore the genetic characteristics of GCH1, including rare and common variants in coding and non-coding regions, in a large population of PD patients in mainland China, as well as the phenotypic characteristics of GCH1 variant ***: In the first cohort of this case-control study, we performed whole-exome sequencing in 1555 patients with early-onset or familial PD and 2234 healthy controls; then in the second cohort, whole-genome sequencing was performed in sporadic late-onset PD samples (1962 patients), as well as 1279 *** at target GCH1 regions were extracted, and then genetic and detailed phenotypic data were analyzed using regression models and the sequence kernel association *** also performed a meta-analysis to correlate deleterious GCH1 variants with age at onset (AAO) in PD ***: For coding variants, we identified a significant burden of GCH1 deleterious variants in early-onset or familial PD cases compared to controls (1.2% vs. 0.1%, P<0.0001). In the analysis of possible regulatory variants in GCH1 non-coding regions, rs12323905 (P=0.001, odds ratio=1.19, 95% CI 1.07-1.32) was significantly associated with PD, and variant sets in untranslated regions and intron regions, GCH1 brain-specific expression quantitative trait loci, and two possible promoters/enhancers (GH14J054857 and GH14J054880) were suggestively associated with *** phenotype correlation analysis revealed that the carriers of GCH1 deleterious variants manifested younger AAO (P<0.0001), and had milder motor symptoms, milder fatigue symptoms and more autonomic nervous ***-analysis of six studies demonstrated 6.4-year earlier onset in GCH1 deleterious variant carriers (P=0.0009). Conclusions: The results
Background Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic difficulty of dealing with scarce data means the algorithms still need improvement. In this work, we propose a novel method, employing a semi-supervised deep-learning model with pseudo labels, which takes advantage of learning from both experimentally annotated and unannotated data. Results We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in GM12878, HepG2, and K562 cell lines. Applying our method to the dataset demonstrated its outstanding performance compared with that of existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained with the data from one cell line is unlikely to succeed in other cell lines, which implies the cell-type-specific nature of the non-coding variants. Remarkably, we found that DNA accessibility significantly contributes to the functional consequence of variants, which suggests the importance of open chromatin conformation prior to establishing the interaction of non-coding variants with gene regulation. Conclusions The semi-supervised deep learning model coupled with pseudo labeling has advantages when working with limited datasets, which are not unusual in biology. Our study provides an effective approach for finding non-coding mutations potentially associated with various biological phenomena, including human diseases.
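The pseudo-labeling idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' deep-learning model: a nearest-centroid classifier is swapped in so the example stays self-contained, and every name here (`centroids`, `predict`, `pseudo_label_fit`, the `threshold` and `rounds` parameters, the toy feature vectors) is a hypothetical choice made for illustration. The sketch assumes at least two classes in the labeled set.

```python
from math import dist


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, confidence); confidence is a distance margin in [0, 1]."""
    ranked = sorted(cents, key=lambda y: dist(cents[y], p))
    d0 = dist(cents[ranked[0]], p)  # distance to the nearest centroid
    d1 = dist(cents[ranked[1]], p)  # distance to the runner-up centroid
    conf = 1.0 - d0 / d1 if d1 > 0 else 0.0
    return ranked[0], conf


def pseudo_label_fit(x_lab, y_lab, x_unlab, threshold=0.5, rounds=3):
    """Grow the training set with confidently predicted unlabeled points."""
    x_train, y_train = list(x_lab), list(y_lab)
    pool = list(x_unlab)
    for _ in range(rounds):
        cents = centroids(x_train, y_train)
        remaining = []
        for p in pool:
            label, conf = predict(cents, p)
            if conf >= threshold:
                # Treat the confident prediction as a (pseudo) label.
                x_train.append(p)
                y_train.append(label)
            else:
                remaining.append(p)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled; stop early
        pool = remaining
    return centroids(x_train, y_train), pool


# Toy, made-up data: two labeled points and three unlabeled ones.
model, unresolved = pseudo_label_fit(
    [(0.0, 0.0), (1.0, 1.0)], [0, 1],
    [(0.1, 0.0), (0.9, 1.0), (0.5, 0.5)],
)
```

In this toy run the two unlabeled points near a class centroid are absorbed as pseudo labels, while the ambiguous midpoint stays in the unresolved pool. The confidence threshold is the main design choice: too low and wrong pseudo labels reinforce themselves, too high and the unlabeled data contributes nothing.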
Alzheimer's disease (AD) is the most common type of dementia, affecting millions of people worldwide; however, no disease-modifying treatments are currently available. Genome-wide association studies (GWASs) have identified more than 40 loci associated with AD risk. However, most of the disease-associated variants reside in non-coding regions of the genome, making it difficult to elucidate how they affect disease susceptibility. Nonetheless, identification of the regulatory elements, genes, pathways and cell type/tissue(s) impacted by these variants to modulate AD risk is critical to our understanding of disease pathogenesis and ability to develop effective therapeutics. In this review, we provide an overview of the methods and approaches used in the field to identify the functional effects of AD risk variants in the causal path to disease risk modification as well as describe the most recent findings. We first discuss efforts in cell type/tissue prioritization followed by recent progress in candidate causal variant and gene nomination. We discuss statistical methods for fine-mapping as well as approaches that integrate multiple levels of evidence, such as epigenomic and transcriptomic data, to identify causal variants and risk mechanisms of AD-associated loci. Additionally, we discuss experimental approaches and data resources that will be needed to validate and further elucidate the effects of these variants and genes on biological pathways, cellular phenotypes and disease risk. Finally, we discuss future steps that need to be taken to ensure that AD GWAS functional mapping efforts lead to novel findings and bring us closer to finding effective treatments for this devastating disease.
Background: The clinical genetics revolution ushers in great opportunities, accompanied by significant challenges. The fundamental mission in clinical genetics is to analyze genomes and to identify the most relevant genetic variations underlying a patient's phenotypes and symptoms. The adoption of Whole Genome Sequencing requires novel capacities for interpretation of non-coding variants. Results: We present TGex, the Translational Genomics expert, a novel genome variation analysis and interpretation platform, with remarkable exome analysis capacities and a pioneering approach to non-coding variant interpretation. TGex's main strength is combining state-of-the-art variant filtering with knowledge-driven analysis made possible by VarElect, our highly effective gene-phenotype interpretation tool. VarElect leverages the widely used GeneCards knowledgebase, which integrates information from > 150 automatically-mined data sources. Access to such a comprehensive data compendium also facilitates TGex's broad variant annotation, supporting evidence exploration, and decision making. TGex has an interactive, user-friendly, and easily adaptable interface, ACMG compliance, and an automated reporting system. Beyond comprehensive whole exome sequence capabilities, TGex encompasses innovative non-coding variant interpretation, towards the goal of maximal exploitation of whole genome sequence analyses in clinical genetics practice. This is enabled by GeneCards' recently developed GeneHancer, a novel integrative and fully annotated database of human enhancers and promoters. Examining use-cases from a variety of TGex users worldwide, we demonstrate its high diagnostic yields (42% for single exome and 50% for trios in 1500 rare genetic disease cases) and critical actionable genetic findings. The platform's support for integration with EHR and LIMS through dedicated APIs facilitates automated retrieval of patient data for TGex's customizable reporting engine, establishing a rapid
Background Genome-wide association studies (GWASs) have identified single-nucleotide polymorphisms (SNPs) that may be genetic factors underlying Alzheimer's disease (AD). However, how these AD-associated SNPs (AD SNPs) contribute to the pathogenesis of this disease is poorly understood because most of them are located in non-coding regions, such as introns and intergenic regions. Previous studies reported that some disease-associated SNPs affect regulatory elements including enhancers. We hypothesized that non-coding AD SNPs are located in enhancers and affect gene expression levels via chromatin loops. Methods To characterize AD SNPs within non-coding regions, we extracted 406 AD SNPs with GWAS p-values of less than 1.00 x 10^-6 from the GWAS catalog database. Of these, we selected 392 SNPs within non-coding regions. Next, we checked whether those non-coding AD SNPs were located in enhancers that typically regulate gene expression levels using publicly available data for enhancers that were predicted in 127 human tissues or cell types. We sought expression quantitative trait locus (eQTL) genes affected by non-coding AD SNPs within enhancers because enhancers are regulatory elements that influence the gene expression levels. To elucidate how the non-coding AD SNPs within enhancers affect the gene expression levels, we identified chromatin-chromatin interactions by Hi-C experiments. Results We report the following findings: (1) nearly 30% of non-coding AD SNPs are located in enhancers; (2) eQTL genes affected by non-coding AD SNPs within enhancers are associated with amyloid beta clearance, synaptic transmission, and immune responses; (3) 95% of the AD SNPs located in enhancers co-localize with their eQTL genes in topologically associating domains, suggesting that regulation may occur through chromatin higher-order structures; (4) rs1476679 spatially contacts the promoters of eQTL genes via CTCF-CTCF interactions; (5) the effect of other AD SNPs such as rs7364180 is lik
Background Forty-two percent of patients experience disease comorbidity, contributing substantially to mortality rates and increased healthcare costs. Yet, the possibility of underlying shared mechanisms for diseases is not well established, and few studies have confirmed their molecular predictions with clinical *** this work, we integrated genome-wide association study (GWAS) associating diseases and single nucleotide polymorphisms (SNPs) with transcript regulatory activity from expression quantitative trait loci (eQTL). This allowed novel mechanistic insights for noncoding and intergenic regions. We then analyzed pairs of SNPs across diseases to identify shared molecular effectors robust to multiple-testing correction (False Discovery Rate FDR_eRNA < 0.05). We hypothesized that disease pairs found to be molecularly convergent would also be significantly overrepresented among comorbidities in clinical datasets. To assess our hypothesis, we used clinical claims datasets from the Healthcare Cost and Utilization Project (HCUP) and calculated significant disease comorbidities (FDR_comorbidity < 0.05). We finally verified whether disease pairs found to be molecularly convergent were also statistically comorbid more often than expected by chance using the Fisher's Exact *** approach integrates: (i) 6175 SNPs associated with 238 diseases from ~1000 GWAS, (ii) eQTL associations from 19 tissues, and (iii) claims data for 35 million patients from HCUP. Logistic regression (controlled for age, gender, and race) identified comorbidities in HCUP, while enrichment analyses identified cis- and trans-eQTL downstream effectors of GWAS-identified variants. Among ~16,000 combinations of diseases, 398 disease-pairs were prioritized by both convergent eQTL-genetics (RNA overlap enrichment, FDR_eRNA < 0.05) and clinical comorbidities (OR > 1.5, FDR_comorbidity < 0.05). Case studies of comorbidities illustrate specific convergent noncoding regulatory elements. An intergenic
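The overrepresentation check described in the abstract above (are molecularly convergent disease pairs comorbid more often than chance?) can be sketched as a one-sided Fisher's exact test on a 2x2 table. This is a minimal pure-Python sketch, not the authors' pipeline: the function name `fisher_exact_greater`, the table layout, and the toy counts are all assumptions made for illustration, and the counts are not taken from the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
        rows    = disease pair is molecularly convergent (yes / no)
        columns = disease pair is clinically comorbid    (yes / no)
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # convergent pairs
    col1 = a + c          # comorbid pairs
    n = a + b + c + d     # all disease pairs considered
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy counts, purely illustrative: 3 of 4 convergent pairs are comorbid,
# versus 1 of 4 non-convergent pairs.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided p = {p_value:.4f}")  # 17/70, about 0.2429
```

The one-sided alternative matches the stated hypothesis of overrepresentation. In a study with many disease pairs, the resulting p-values would still need multiple-testing correction, as the abstract notes.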
Background. Alzheimer's disease (AD), the most common form of dementia, affects 24.3 million people worldwide. More than twenty genetic loci have been associated with AD, and a significant number of genetic variants were mapped within these loci. A large proportion of genome-wide significant variants lie outside the coding region. However, the plausible function of these variants is still unexplored. Objective: The present study aimed to unravel the regulatory role of proxy single nucleotide polymorphisms (SNPs) and their contribution to the risk of developing AD. Methods: RegulomeDB was employed to predict the regulatory role of proxy SNPs. Protein association network and functional enrichment analyses were performed using STRING 10.5 and gene ontology, respectively. Results: A total of 451 SNPs were examined through the SNAP web portal (r^2 <= 0.80), which returned 2186 proxy SNPs in linkage disequilibrium (LD) with genome-wide significant SNPs for AD. Out of 2186 SNPs analyzed in RegulomeDB, 151 had scores < 3, indicating a high degree of potential regulatory function. Further analysis revealed that out of these 151 SNPs, 37 were genome-wide significant for AD, 17 were significantly associated with diseases other than AD, 89 were proxy SNPs (not genome-wide significant) for various diseases including AD, and 8 were novel proxy SNPs for AD. Conclusion: These findings support the notion that non-coding variants can be strongly associated with disease risk. Further validation through genome-wide association studies will help elucidate their regulatory potential.