Background: Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species in the state of identity-by-descent, hereafter "trans-species polymorphisms" (TSPs), can result from LTBS, often due to host-pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred candidate LTBS regions have been identified in humans and chimpanzees; however, because many are in non-protein-coding regions of the genome, the functions and potential adaptive roles of most remain unknown. Results: We integrated diverse genomic annotations to explore the functions of 60 previously identified regions with multiple shared polymorphisms (SPs) between humans and chimpanzees, including 19 with strong evidence of LTBS. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS) for all the regions. We identified functional annotations for 59 regions, including 58 with evidence of gene regulatory function from GTEx or functional genomics data and 19 with evidence of trait association from GWAS or PheWAS. As expected, the SPs associate in humans with many immune system phenotypes, including response to pathogens, but we also found associations with a range of other phenotypes, including body size, alcohol intake, cognitive performance, risk-taking behavior, and urate levels. Conclusions: The diversity of traits associated with non-coding regions with multiple SPs supports previous hypotheses that functions beyond the immune system are likely subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypotheses about behavioral diversity in human and chimpanzee populations, such as the importance of variation in risk sensitivity.
Non-coding variants associated with complex traits can alter transcription factor (TF)-DNA binding motifs. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power has not been systematically evaluated. Here we evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors are predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, in line with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing in vivo predictive performance. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models.
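To make the PWM-based approach mentioned in this abstract concrete, here is a minimal sketch of how a variant's effect on TF binding can be scored as the change in the best log-odds PWM match between the reference and alternate alleles. The motif counts, sequences, pseudocount, uniform background and function names are all hypothetical and chosen for illustration; this is not the scoring scheme of tRap or of any other model evaluated in the study.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds PWM.

    Assumes a uniform background frequency (an illustrative simplification).
    """
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def best_score(pwm, seq):
    """Highest PWM score over all windows of seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def delta_score(pwm, ref_seq, alt_seq):
    """Alt-minus-ref change in best match; negative suggests weaker binding."""
    return best_score(pwm, alt_seq) - best_score(pwm, ref_seq)

# Hypothetical counts for a toy 4-bp motif with consensus "TGAC".
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 0, "C": 0, "G": 20, "T": 0},
    {"A": 20, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 20, "G": 0, "T": 0},
]

pwm = log_odds_pwm(TOY_COUNTS)
ref = "CCTGACGG"  # contains the consensus motif
alt = "CCTGCCGG"  # hypothetical SNP: A>C at the third motif position
delta = delta_score(pwm, ref, alt)
print(f"delta score (alt - ref): {delta:.2f}")
```

A real analysis would also scan the reverse strand and use an empirically estimated background; both are omitted here to keep the sketch short.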
Alzheimer's disease (AD) is the most common type of dementia, affecting millions of people worldwide; however, no disease-modifying treatments are currently available. Genome-wide association studies (GWASs) have identified more than 40 loci associated with AD risk. However, most of the disease-associated variants reside in non-coding regions of the genome, making it difficult to elucidate how they affect disease susceptibility. Nonetheless, identifying the regulatory elements, genes, pathways and cell types/tissues through which these variants modulate AD risk is critical to our understanding of disease pathogenesis and our ability to develop effective therapeutics. In this review, we provide an overview of the methods and approaches used in the field to identify the functional effects of AD risk variants in the causal path to disease risk modification, and we describe the most recent findings. We first discuss efforts in cell type/tissue prioritization, followed by recent progress in candidate causal variant and gene nomination. We discuss statistical methods for fine-mapping as well as approaches that integrate multiple levels of evidence, such as epigenomic and transcriptomic data, to identify causal variants and risk mechanisms of AD-associated loci. Additionally, we discuss experimental approaches and data resources that will be needed to validate and further elucidate the effects of these variants and genes on biological pathways, cellular phenotypes and disease risk. Finally, we discuss future steps that need to be taken to ensure that AD GWAS functional mapping efforts lead to novel findings and bring us closer to finding effective treatments for this devastating disease.
Many common diseases are characterized by polygenic architectures in which a single variant has only a small effect on ***-wide association studies and next-generation sequencing have identified thousands of genetic variants of disease ***, non-coding variants identified by genome-wide association studies have been systematically ***, we review disease-causing coding variants and their relevance to clinical medicine.
Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit expression quantitative trait loci (eQTL) signatures, suggesting that MPRA can identify as yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.
The identification of non-coding drivers remains a challenge and bottleneck for the use of whole-genome sequencing in the clinic. FunSeq2 is a computational tool for annotation and prioritization of somatic mutations ...
Author:
Stranger, B. E. Section of Genetic Medicine,
Department of Medicine, Institute of Genomics and Systems Biology, Center for Data Intensive Sciences, University of Chicago, IL
Complex trait association mapping in humans has successfully identified genetic loci influencing trait variation for hundreds of different phenotypes, including disease. The vast majority of associated loci localize to non-coding regions of the genome, suggesting possible effects on gene regulatory mechanisms. Without a clear understanding of the regulatory code of the human genome, deep characterization of the molecular function(s) of genetic variants in the human genome has become increasingly important for defining that code and for understanding genetic associations to complex traits. Studies of the human transcriptome, its complexity, and its relation to genetic variation in a variety of contexts have proven highly informative for understanding genome function and for suggesting testable hypotheses involving candidate genes for complex traits and the functional mechanisms through which they may act. These approaches are increasingly leading to successful functional characterization of trait-associated variants, in some cases suggesting possible targets for trait manipulation. Finally, these characterizations are being used to build models predicting variant function, further extending possible applications.