At the end of 1996 we estimated the total number of protein-coding ORFs in the Saccharomyces cerevisiae genome, based on their properties, as 4700-4800. This number is much smaller than the widely accepted figure of 5800. According to our calculations, there remain about 200-300 orphans (ORFs without known function or homology to already discovered genes), which is only about 5% of the total number of genes. Our results would be questionable if the analysed set of known genes were not a statistically representative sample of the whole set of protein-coding genes in the S. cerevisiae genome. Therefore, we repeated our estimation using recently updated databases. In the course of the last 18 months, previously unknown functions of about 500 genes have been found. We have used these to check our method, former results and conclusions. Our previous estimation of the total number of coding ORFs was confirmed. Copyright (C) 1999 John Wiley & Sons, Ltd.
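The "about 5%" figure can be checked from the ranges quoted in the abstract. The following is a minimal sketch of that arithmetic only; the helper name `fraction_range` is hypothetical and is not part of the authors' method.

```python
def fraction_range(part_low, part_high, total_low, total_high):
    """Return the (smallest, largest) possible value of part/total
    when both quantities are only known as ranges."""
    return part_low / total_high, part_high / total_low


# Ranges quoted in the abstract: 200-300 orphans out of 4700-4800 coding ORFs.
orphan_low, orphan_high = fraction_range(200, 300, 4700, 4800)
print(f"orphan fraction: {orphan_low:.1%} to {orphan_high:.1%}")
```

The resulting interval brackets 5%, which is consistent with the rounded value given in the text.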
Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results. Results: The GENCODE gene features are divided into eight different categories of which only the first two (known and novel coding sequence) are confidently predicted to be protein-coding genes. 5' rapid amplification of cDNA ends (RACE) and RT-PCR were used to experimentally verify the initial annotation. Of the 420 coding loci tested, 229 RACE products have been sequenced. They supported 5' extensions of 30 loci and new splice variants in 50 loci. In addition, 46 loci without evidence for a coding sequence were validated, consisting of 31 novel and 15 putative transcripts. We assessed the comprehensiveness of the GENCODE annotation by attempting to validate all the predicted exon boundaries outside the GENCODE annotation. Out of 1,215 tested in a subset of the ENCODE regions, 14 novel exon pairs were validated, only two of them in intergenic regions. Conclusions: In total, 487 loci, of which 434 are coding, have been annotated as part of the GENCODE reference set available from the UCSC browser. Comparison of the GENCODE annotation with RefSeq and ENSEMBL shows that only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated. Over 50% of coding loci have been experimentally verified by 5' RACE for EGASP, and the GENCODE collaboration is continuing to refine its annotation of 1% of the human genome with the aid of experimental validation.
In the human genome, it has been estimated that considerably more sequence is under natural selection in non-coding regions [such as transcription-factor binding sites (TF-binding sites) and non-coding RNAs (ncRNAs)] compared to protein-coding ones. However, less attention has been paid to them. To study selective pressure on non-coding elements, we use next-generation sequencing data from the recently completed pilot phase of the 1000 Genomes Project, which, compared to traditional methods, allows for the characterization of a full spectrum of genomic variations, including single-nucleotide polymorphisms (SNPs), short insertions and deletions (indels) and structural variations (SVs). We develop a framework for combining these variation data with non-coding elements, calculating various population-based metrics to compare classes and subclasses of elements, and developing element-aware aggregation procedures to probe the internal structure of an element. Overall, we find that TF-binding sites and ncRNAs are less selectively constrained for SNPs than coding sequences (CDSs), but more constrained than a neutral reference. We also determine that the relative amounts of constraint for the three types of variations are, in general, correlated, but there are some differences: counter-intuitively, TF-binding sites and ncRNAs are more selectively constrained for indels than for SNPs, compared to CDSs. After inspecting the overall properties of a class of elements, we analyze selective pressure on subclasses within an element class, and show that the extent of selection is associated with the genomic properties of each subclass. We find, for instance, that ncRNAs with higher expression levels tend to be under stronger purifying selection, and the actual regions of TF-binding motifs are under stronger selective pressure than the corresponding peak regions. Further, we develop element-aware aggregation plots to analyze selective pressure across the linear structure of an eleme
The complete coding sequences were determined for RNA-1 and RNA-2 of five raspberry isolates of Raspberry bushy dwarf virus (RBDV) from Belarus (BY1, BY3, BY8, BY22) and Sweden (SE3). The analysed sequences for both RNA-1 and RNA-2 were highly conserved among these isolates. Phylogenetic analyses including available sequences for the CP gene and the MP gene showed that all analysed RBDV isolates from raspberry were closely related. However, there was no strong correlation between the grouping of raspberry isolates in the phylogenetic analyses and their geographical location. In contrast, RBDV isolates showed a host-dependent relationship with isolates from raspberry and grapevine, forming two distinct clades.
We recently reported that a sequence variant in the cell-cycle-checkpoint kinase CHEK2 (CHEK2 1100delC) is a low-penetrance breast cancer-susceptibility allele in noncarriers of BRCA1 or BRCA2 mutations. To investigate whether other CHEK2 variants confer susceptibility to breast cancer, we screened the full CHEK2 coding sequence in BRCA1/2-negative breast cancer cases from 89 pedigrees with three or more cases of breast cancer. We identified one novel germline variant, R117G, in two separate families. To evaluate the possible association with breast cancer of R117G and of two germline variants reported elsewhere, R145W and I157T, we screened 737 BRCA1/2-negative familial breast cancer cases from 605 families, 459 BRCA1/2-positive cases from 335 families, and 723 controls from the United Kingdom, the Netherlands, and North America. All three variants were rare in all groups, and none occurred at significantly elevated frequency in familial breast cancer cases compared with controls. These results indicate that 1100delC may be the only CHEK2 allele that makes an appreciable contribution to breast cancer susceptibility.
Genes of the vertebrate major histocompatibility complex (MHC) are crucial to defense against infectious disease, provide an important measure of functional genetic diversity, and have been implicated in mate choice and kin recognition. As a result, MHC loci have been characterized for a number of vertebrate species, especially mammals; however, elephants are a notable exception. Our study is the first to characterize patterns of genetic diversity and natural selection in the elephant MHC. We did so using DNA sequences from a single, expressed DQA locus in elephants. We characterized six alleles in 30 African elephants (Loxodonta africana) and four alleles in three Asian elephants (Elephas maximus). In addition, for two of the African alleles and three of the Asian alleles, we characterized complete coding sequences (exons 1-5) and nearly complete non-coding sequences (introns 2-4) for the class II DQA loci. Compared to DQA in other wild mammals, we found moderate polymorphism and allelic diversity and similar patterns of selection; patterns of non-synonymous and synonymous substitutions were consistent with balancing selection acting on the peptides involved in antigen binding in the second exon. In addition, balancing selection has led to strong trans-species allelism that has maintained multiple allelic lineages across both genera of extant elephants for at least 6 million years. We discuss our results in the context of MHC diversity in other mammals and patterns of evolution in elephants.
The fragile X syndrome is due to a CGG triplet expansion in the first exon of FMR1, resulting in hypermethylation and extinction of gene expression. To further our understanding of the gene's involvement in the syndrome, we report the physical structure of this locus. A high resolution restriction map of the FRAX(A) locus has been prepared encompassing approximately 50 kb. Using exon-exon PCR and restriction analysis, the FMR1 gene has been determined to consist of 17 exons spanning 38 kb of Xq27.3. Each intron-exon boundary has been sequenced. In general, the splice donors and acceptors located in the 5' portion of the gene demonstrate greater adherence to consensus than those in the 3' end, providing a possible explanation for the finding of alternative splicing in FMR1. The elucidation of the exon composition of the FMR1 gene and its flanking region will enhance detection of coding sequence mutations possible in fragile X phenocopy individuals.
Eighteen cytidines are changed to uridines in the coding sequence of transcripts for cytochrome c oxidase subunit 2 (cox2) in maize mitochondria. The temporal relationship of editing and splicing was examined in cox2 transcripts by sequence analysis of spliced and unspliced cDNAs. Cloned cDNAs of unspliced cox2 transcripts ranged from clones with no edited nucleotides to completely edited forms, while spliced cDNAs were nearly completely edited. Incompletely edited transcripts in the nascent pool of unspliced transcripts represent intermediates of the editing process. These results indicate that editing proceeds without a strong directional bias and suggest that RNA editing is a posttranscriptional process.
A candidate gene approach to identifying novel causes of disease is concept-limiting, and in the new era of high-throughput sequencing there is no longer any need to restrict the experiment to a few interesting genes. We have recently completed a large-scale exon re-sequencing project using Sanger sequencing technology to analyse approximately 1 Mb of coding sequence of the X chromosome in probands from > 200 families with various forms of intellectual disability. We review the lessons learnt from this experience. Comparing large data sets will certainly reveal pathogenic mutations in genes that were not possible to identify previously. However, the task of distinguishing pathogenic mutations from rare sequence variants is not easy and is the most substantial challenge for the next decade. High-throughput technology has the attraction of being cheap, fast and comprehensive, but projects that require exhaustive, detailed coverage of a genomic region may need to combine a large-scale approach with small-scale follow-up of regions that are difficult to sequence. The number of rare truncating variants present in coding regions of the X chromosome that are not pathogenic was 1%. The importance of the quality of the starting material, both clinically and molecularly, and the number of sequence variants, both rare and common, that any one individual has across their coding sequence are discussed.
Loss of elastin due to aging, disease, or injury can lead to impaired tissue function. In this study, de novo tropoelastin (TE) synthesis is investigated in vitro and in vivo using different TE-encoding synthetic mRNA variants after codon optimization and nucleotide modification. Codon optimization shows a strong effect on protein synthesis without affecting cell viability in vitro, whereas nucleotide modifications strongly modulate translation and reduce cell toxicity. Selected TE mRNA variants (3, 10, and 30 µg) are then analyzed in vivo in porcine skin after intradermal application. Administration of 30 µg of native TE mRNA with a me1Ψ modification or 10 and 30 µg of unmodified codon-optimized TE mRNA is required to increase TE protein expression in vivo. In contrast, just 3 µg of a codon-optimized TE mRNA variant with the me1Ψ modification is able to increase protein expression. Furthermore, skin toxicity is investigated in vitro by injecting 30 µg of mRNA of selected TE mRNA variants into a human full-thickness skin model, and no toxic effects are observed. Thereby, for the first time, an increased dermal TE synthesis by exogenous administration of synthetic mRNA is demonstrated in vivo. Codon optimization of a synthetic mRNA can significantly increase protein expression and therapeutic outcome.