We have developed a new method for predicting protein secondary structure from the amino acid sequence. The method is based on the most recent version (IV) of the standard GOR algorithm (J Mol Biol 120 (1978) 97). A significant improvement is obtained by combining multiple sequence alignments with the GOR method. Additional improvement is obtained by a simple correction of the results when helices or sheets are too short, or when helices and sheets are direct neighbors along the sequence (we require at least one residue of coil state between them). Requiring that the prediction be strong enough, i.e. that the difference between the probability of the predicted (most probable) state and the probability of the second most probable state exceed a certain minimum value, also significantly improves the secondary structure predictions. We have tested our method on 12 different proteins from the Protein Data Bank with known secondary structures. The average accuracy of the GOR prediction of the secondary structure for these 12 proteins without multiple sequence alignment was 63.4%. Multiple sequence alignments improve the average accuracy to 71.9%. The correction for short helices and sheets, and for coil states separating sheets and helices, further improves the average accuracy to 74.4%. Setting a 10% minimum difference between the most probable and the second most probable conformation leads to 77.0% accuracy, while raising this limit to 20% increases the average accuracy of the secondary structure prediction to 81.2%. © 2001 Elsevier Science Ltd. All rights reserved.
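The two post-processing steps described above (correcting short or adjacent helices and sheets, and requiring a minimum probability margin) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the state letters `H`/`E`/`C`, and the minimum segment lengths `min_helix=4` and `min_sheet=2` are assumptions chosen for illustration, since the abstract does not state the actual cutoffs.

```python
from typing import List, Optional, Sequence, Tuple

STATES = ("H", "E", "C")  # helix, sheet, coil (letters are an assumption)


def segments(states: str) -> List[Tuple[str, int, int]]:
    """Split a state string into runs of (state, start, end_exclusive)."""
    runs = []
    start = 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            runs.append((states[start], start, i))
            start = i
    return runs


def correct_prediction(states: str, min_helix: int = 4, min_sheet: int = 2) -> str:
    """Apply the two corrections; the minimum lengths are hypothetical."""
    out = list(states)
    min_len = {"H": min_helix, "E": min_sheet}
    # Step 1: helices or sheets that are too short become coil.
    for state, start, end in segments(states):
        if state in min_len and end - start < min_len[state]:
            out[start:end] = "C" * (end - start)
    # Step 2: a helix directly next to a sheet gets one coil residue
    # between them. This can shorten the second segment by one residue;
    # the sketch does not re-check its length afterwards.
    for i in range(1, len(out)):
        if {out[i - 1], out[i]} == {"H", "E"}:
            out[i] = "C"
    return "".join(out)


def confident_prediction(probs: Sequence[float], min_margin: float) -> Optional[str]:
    """Return the most probable state, or None when its margin over the
    second most probable state does not exceed min_margin."""
    ranked = sorted(zip(probs, STATES), reverse=True)
    (p_best, s_best), (p_second, _) = ranked[0], ranked[1]
    return s_best if p_best - p_second > min_margin else None
```

For example, `correct_prediction("HHEEEECCHHHHHEEE")` turns the two-residue helix into coil and inserts a coil residue where the later helix meets a sheet. With `min_margin=0.10`, `confident_prediction((0.40, 0.35, 0.25), 0.10)` returns `None`; this sketch assumes such residues are simply left unpredicted.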
We have modified and improved the GOR algorithm for protein secondary structure prediction by using the evolutionary information provided by multiple sequence alignments, adding triplet statistics, and optimizing various parameters. We have expanded the database used to include the 513 non-redundant domains collected recently by Cuff and Barton (Proteins 1999;34:508-519; Proteins 2000;40:502-511). We have introduced a variable-size window that allowed us to include sequences as short as 20-30 residues. A significant improvement over the previous versions of the GOR algorithm was obtained by combining PSI-BLAST multiple sequence alignments with the GOR method. The new algorithm will form the basis for the future GOR V release on an online prediction server. The average accuracy of the secondary structure prediction with multiple sequence alignment and a full jack-knife procedure was 73.5%. The accuracy of the prediction increases to 74.2% when the prediction is limited to the 375 (of 513) sequences having at least 50 PSI-BLAST alignments. The average accuracy of the new improved program without multiple sequence alignments was 67.5%. This is approximately a 3% improvement over the preceding GOR IV algorithm (Garnier J, Gibrat JF, Robson B. Methods Enzymol 1996;266:540-553; Kloczkowski A, Ting K-L, Jernigan RL, Garnier J. Polymer 2002;43:441-449). We have discussed alternatives to the segment overlap (Sov) coefficient proposed by Zemla et al. (Proteins 1999;34:220-223). © 2002 Wiley-Liss, Inc.
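The full jack-knife evaluation mentioned above can be sketched generically in Python: each protein is held out in turn, the predictor is trained on the rest, and the held-out protein is scored. The `train` and `predict` callables and the toy majority-state predictor are hypothetical stand-ins, not the GOR method, and averaging the per-protein scores (rather than pooling residues) is an assumption, since the abstract does not say which was used.

```python
from collections import Counter
from typing import Any, Callable, List, Tuple

Protein = Tuple[str, str]  # (amino acid sequence, observed secondary structure)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed one."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def jackknife_q3(
    dataset: List[Protein],
    train: Callable[[List[Protein]], Any],
    predict: Callable[[Any, str], str],
) -> float:
    """Leave-one-out evaluation: train on all proteins but one, test on it."""
    scores = []
    for i, (sequence, observed) in enumerate(dataset):
        training_set = dataset[:i] + dataset[i + 1:]  # exclude the test protein
        model = train(training_set)
        scores.append(q3(predict(model, sequence), observed))
    return sum(scores) / len(scores)  # per-protein average (an assumption)


# Toy stand-in predictor (NOT GOR): always predict the most common state
# seen in the training set.
def train_majority(training_set: List[Protein]) -> str:
    counts = Counter("".join(obs for _, obs in training_set))
    return counts.most_common(1)[0][0]


def predict_majority(model: str, sequence: str) -> str:
    return model * len(sequence)
```

A call such as `jackknife_q3(dataset, train_majority, predict_majority)` then gives an accuracy estimate in which no protein contributes to the statistics used to predict its own structure.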