To understand whether any human-specific new genes may be associated with human brain functions, we computationally screened the genetic vulnerability factors identified through genome-wide association studies and linkage analyses of nicotine addiction and found one human-specific de novo protein-coding gene, FLJ33706 (alternative gene symbol C20orf203). Cross-species analysis revealed how this gene originated from noncoding DNA sequences: insertion of repeat elements, especially Alu, contributed to the formation of the first coding exon and six standard splice junctions on the branch leading to humans and chimpanzees, and two subsequent substitutions in the human lineage removed two stop codons and created an open reading frame of 194 amino acids. We experimentally verified FLJ33706's mRNA and protein expression in the brain. Real-time PCR in multiple tissues demonstrated that FLJ33706 was most abundantly expressed in brain. Human polymorphism data suggested that FLJ33706 encodes a protein under purifying selection. A specifically designed antibody detected its protein expression across human cortex, cerebellum and midbrain. Immunohistochemistry of normal human brain cortex revealed the localization of FLJ33706 protein in neurons. Elevated expression of FLJ33706 was detected in Alzheimer's brain samples, suggesting a role for this novel gene in the human-specific pathogenesis of Alzheimer's disease. FLJ33706 provides the strongest evidence so far that human-specific de novo genes can have protein-coding potential and differential protein expression, and can be involved in human brain functions.
Exonic splicing enhancers (ESEs) are pre-mRNA cis-acting elements required for splice-site recognition. We previously developed a web-based program called ESEfinder that scores any sequence for the presence of ESE motifs recognized by the human SR proteins SF2/ASF, SRp40, SRp55 and SC35 (http://***/tools/ESE/). Using ESEfinder, we have undertaken a large-scale analysis of ESE motif distribution in human protein-coding genes. Significantly higher frequencies of ESE motifs were observed in constitutive internal protein-coding exons, compared with both their flanking intronic regions and with pseudo exons. Statistical analysis of ESE motif frequency distributions revealed a complex relationship between splice-site strength and increased or decreased frequencies of particular SR protein motifs. Comparison of constitutively and alternatively spliced exons demonstrated slightly weaker splice-site scores, as well as significantly fewer ESE motifs, in the alternatively spliced group. Our results underline the importance of ESE-mediated SR protein function in the process of exon definition, in the context of both constitutive splicing and regulated alternative splicing.
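The abstract describes scoring a sequence for motif matches, which can be sketched as a sliding-window scan with a position score matrix. The sketch below is only an illustration of that general idea: the matrix `TOY_MATRIX`, the threshold, and the function name `scan_motif` are hypothetical and are not ESEfinder's actual matrices, thresholds, or code.

```python
# Hypothetical 3-position score matrix (one dict per motif position).
# These values are invented for illustration; they are NOT ESEfinder matrices.
TOY_MATRIX = [
    {"A": 2, "C": -1, "G": 0, "T": -2},
    {"A": -1, "C": 2, "G": 0, "T": -1},
    {"A": -2, "C": 0, "G": 2, "T": -1},
]


def scan_motif(seq, matrix, threshold):
    """Return (1-based position, window, score) for each window scoring >= threshold."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        # Sum the per-position scores of the bases in this window.
        score = sum(matrix[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i + 1, window, score))
    return hits


hits = scan_motif("TTACGTACG", TOY_MATRIX, threshold=5)
print(hits)  # [(3, 'ACG', 6), (7, 'ACG', 6)]
```

Dividing the number of hits by the sequence length gives a simple motif frequency that could be compared between exons and flanking introns, under the assumption that one matrix and one threshold are used per SR protein.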
The Antennapedia (Antp) homeotic gene of Drosophila melanogaster regulates segmental identity in the thorax. Loss of Antp function results in altered development of the embryonic thoracic segments or can cause legs to be transformed into antennae. Certain combinations of Antp recessive lethal alleles complement to permit normal development. The structure of the Antp gene, analyzed by sequencing cDNA clones and exons and by transcript mapping, revealed some of the basis for its genetic complexity. It has two promoters governing two nested transcription units, one unit 36 and one 103 kilobase pairs (kb) long. Both units incorporated the same protein-coding exons, all of which are located in the 3'-most 13 kb of the gene. The two promoters resulted in the attachment of either of two long noncoding leader sequences (1.5 and 1.7 kb) to a 1.1-kb open reading frame. Both transcription units used the same pair of alternative polyadenylate sites 1.4 kb apart; the choice of sites was developmentally regulated. Some of the mutations that disrupt the larger transcription unit complemented a mutation affecting the smaller one. Dominant mutations that transform antennae into legs split the gene but left the coding exons intact. The encoded protein has unusually long runs of glutamine and a homeodomain near the C terminus.
Intron-encoded U17a and U17b RNAs are members of the H/ACA-box class of small nucleolar RNAs (snoRNAs) participating in rRNA processing and modification. We have investigated the organization and expression of the U17 locus in human cells and found that intronic U17a and U17b sequences are transcribed as part of the three-exon transcription unit, named U17HG, positioned approximately 9 kb upstream of the RCC1 locus. Comparison of the human and mouse U17HG genes has revealed that snoRNA-encoding intron sequences but not exon sequences are conserved between the two species and that neither human nor mouse spliced U17HG poly(A)(+) RNAs have the potential to code for proteins. Analyses of polysome profiles and effects of translation inhibitors on the abundance of U17HG RNA in HeLa cells indicated that despite its cytoplasmic localization, little if any U17HG RNA is associated with polysomes. This distinguishes U17HG RNA from another non-protein-coding snoRNA host gene product, UHG RNA, described previously (K. T. Tycowski, M. D. Shu, and J. A. Steitz, Nature 379:464-466, 1996). Determination of the 5' terminus of the U17HG RNA revealed that transcription of the U17HG gene starts with a C residue followed by a polypyrimidine tract, making this gene a member of the 5'-terminal oligopyrimidine (5'TOP) family, which includes genes encoding ribosomal proteins and some translation factors. Interestingly, other known snoRNA host genes, including the UHG gene (Tycowski et al., op. cit.), have features of the 5'TOP genes. Similar characteristics of the transcription start site regions in snoRNA host and ribosomal protein genes raise the possibility that expression of components of ribosome biogenesis and translational machineries is coregulated.
Concerted evolution of multicopy gene families in vertebrates is recognized as an important force in the generation of biological novelty but has not been documented for the multicopy genes of protozoa. A multicopy locus, Tpr, which consists of tandemly arrayed open reading frames (ORFs) containing several repeated elements, has been described for Theileria parva. Herein we show that probes derived from the 5'/N-terminal ends of ORFs in the genomic DNAs of T. parva Uganda (1,108 codons) and Boleni (699 codons) hybridized with multicopy sequences in homologous DNA but did not detect similar sequences in the DNA of 14 heterologous T. parva stocks and clones. The probe sequences were, however, protein coding according to predictive algorithms and codon usage. The 3'/C-terminal ends of the Uganda and Boleni ORFs exhibited 75% similarity and identity, respectively, to the previously identified Tpr1 and Tpr2 repetitive elements of T. parva Muguga. Tpr1-homologous sequences were detected in two additional species of Theileria. Eight different Tpr1-homologous transcripts were present in piroplasm mRNA from a single T. parva Muguga-infected animal. The Tpr1 and Tpr2 amino acid sequences contained six predicted membrane-associated segments. The ratio of synonymous to nonsynonymous substitutions indicates that Tpr1 evolves like protein-encoding DNA. The previously determined nucleotide sequence of the gene encoding the p67 antigen is completely identical in T. parva Muguga, Boleni, and Uganda, including the third base in codons. The data suggest that concerted evolution can lead to the radical divergence of coding sequences and that this can be a mechanism for the generation of novel genes.
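The comparison of synonymous and nonsynonymous substitutions mentioned in the abstract can be illustrated with a naive count over two aligned coding sequences. This is a minimal sketch, not the method used in the study: it assumes the standard genetic code, ignores codons that differ at more than one site, and does not normalize by the number of synonymous and nonsynonymous sites as proper estimators do. The function name `count_substitutions` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(cds_a, cds_b):
    """Count single-site codon differences as (synonymous, nonsynonymous).

    Both inputs must be in-frame, aligned coding sequences of equal length.
    Codons with gaps, ambiguous bases, or more than one difference are skipped.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        a, b = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(a, b)) != 1:
            continue  # multi-site changes need pathway averaging; skipped here
        if CODON_TABLE[a] == CODON_TABLE[b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# GCT->GCC (Ala, synonymous), AAA->AGA (Lys->Arg), TTT->TTC (Phe, synonymous)
print(count_substitutions("ATGGCTAAATTT", "ATGGCCAGATTC"))  # (2, 1)
```

An excess of synonymous over nonsynonymous changes is the pattern usually read as a sequence evolving under protein-coding constraint.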
Among progeny of a hybrid (Rana shqiperica × R. lessonae) × R. lessonae, 14 of 22 loci form four linkage groups (LGs): (1) mitochondrial aspartate aminotransferase, carbonate dehydratase-2, esterase 4, peptidase D; (2) mannosephosphate isomerase, lactate dehydrogenase-B, sex, hexokinase-1, peptidase B; (3) albumin, fructose-biphosphatase-1, guanine deaminase; (4) mitochondrial superoxide dismutase, cytosolic malic enzyme, xanthine oxidase. Fructose-biphosphate aldolase-2 and cytosolic aspartate aminotransferase possibly form a fifth LG. Mitochondrial aconitate hydratase, alpha-glucosidase, glyceraldehyde-3-phosphate dehydrogenase, phosphogluconate dehydrogenase, and phosphoglucomutase-2 are unlinked to other loci. All testable linkages (among eight loci of LGs 1, 2, 3, and 4) are shared with eastern Palearctic water frogs. Including published data, 44 protein loci can be assigned to 10 of the 13 chromosomes in Holarctic Rana. Of testable pairs among 18 protein loci, agreement between Palearctic and Nearctic Rana is complete (125 unlinked, 14 linked pairs among 14 loci of five syntenies), and Holarctic Rana and Xenopus laevis are highly concordant (125 shared nonlinkages, 13 shared linkages, three differences). Several Rana syntenies occur in mammals and fish. Many syntenies apparently have persisted for 60-140 × 10^6 years (frogs), some even for 350-400 × 10^6 years (mammals and teleosts).
OrfPredictor is a web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in BLASTX alignments; otherwise, it predicts the most probable coding region based on the intrinsic signals of the query sequences. The output is the predicted peptide sequences in FASTA format, with a definition line that includes the query ID, the translation reading frame and the nucleotide positions where the coding region begins and ends. OrfPredictor facilitates the annotation of EST-derived sequences, particularly for large-scale EST projects. OrfPredictor is available at https://***/tools/***.
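To make the idea of predicting a coding region without a BLASTX hit concrete, the sketch below swaps in a much simpler heuristic: it reports the longest ATG-to-stop open reading frame across all six frames. This is an assumption for illustration only and is not OrfPredictor's actual scoring of intrinsic signals. The function names and the definition-line layout (`frame=`, `begin=`, `end=`) are hypothetical, and positions on minus frames are counted along the reverse-complement strand.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf(seq):
    """Return (frame, begin, end, peptide) for the longest ATG..stop ORF, or None.

    begin/end are 1-based positions on the scanned strand; end includes the stop codon.
    """
    seq = seq.upper()
    best = None  # (length, frame, begin, end, coding nucleotides without stop)
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for offset in range(3):
            start = None
            for i in range(offset, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if best is None or i + 3 - start > best[0]:
                        best = (i + 3 - start, f"{strand}{offset + 1}",
                                start + 1, i + 3, s[start:i])
                    start = None
    if best is None:
        return None
    _, frame, begin, end, nt = best
    # Codons with ambiguous bases are translated as X.
    peptide = "".join(CODON_TABLE.get(nt[j:j + 3], "X") for j in range(0, len(nt), 3))
    return frame, begin, end, peptide


def to_fasta(query_id, orf):
    """Format a prediction as a FASTA record (hypothetical definition-line layout)."""
    frame, begin, end, peptide = orf
    return f">{query_id} frame={frame} begin={begin} end={end}\n{peptide}"


print(to_fasta("query1", longest_orf("CCATGAAATTTGGGTAACC")))
# >query1 frame=+3 begin=3 end=17
# MKFG
```

A longest-ORF rule is a common baseline, but it misses partial ORFs that lack a start or stop codon, which are frequent in EST data.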
ISBN: (print) 9781538660164
In ribonucleic acid (RNA) biology, cataloging the diversity of RNA transcript species in human cells is understood to be an important step before the functions of these RNA molecules in the human body can be determined. While some RNA transcripts are protein-coding, others are non-coding. All of these transcripts perform functions in human cells and therefore contribute to a number of biological processes in the body. Discriminating protein-coding RNA is fundamental to determining the genes encoded by these transcripts, but existing computational tools for this task do not generalize well. We therefore use a computational approach that predicts protein-coding regions in these transcripts more accurately and with good generalization performance. In this work, we propose a new application of our prediction technique that allows alignment-independent discrimination of the protein-coding regions of RNA transcripts for an elderly population. Following accurate computational discrimination of these transcriptome regions, laboratory experiments can then be performed to find out precisely which regions encode specific genes responsible for phenotypes such as ageing in elderly people. Insights obtained from this work could open the way to identifying novel biological roles from RNA-Seq data, so that this computational approach leads to an improvement in transcriptome analysis.
Drosophila 3′ UTRs Are More Complex Than Protein-Coding Sequences. By Algama, Manjula; Oldmeadow, Christopher; Tasker, Edward; Mengersen, Kerrie; Keith, Jonathan M.; published by
Background: Identifying protein-coding genes from species without a reference genome sequence can be complicated by the presence of sequencing errors, particularly insertions and deletions. A number of tools capable of correcting erroneous frame-shifts within assembled transcripts are available but often do not report back DNA sequences required for subsequent phylogenetic analysis. Amongst those that do, the Genewise algorithm is the most effective. However, it requires a homology wrapper to be used in this way, and here we demonstrate it perfectly corrects frame-shifts only 60 % of the time. Results: We therefore created AlignWise, a tool that combines Genewise with our own homology-based method, AlignFS, to identify protein-coding regions and correct erroneous frame-shifts, suitable for subsequent phylogenetic analysis. We compared AlignWise against other open reading frame finding software and demonstrate that the AlignFS algorithm is more accurate than Genewise at correcting frame-shifts within an order. We show that AlignWise provides the greatest accuracy at higher evolutionary distances, out-performing both AlignFS and Genewise individually. Conclusions: AlignWise produces a single ORF per transcript and identifies and corrects frame-shifts with high accuracy. It is therefore well suited for analysing novel transcriptome assemblies and EST sequences in the absence of a reference genome.