Protein complexes are involved in most, if not all, essential biological processes in a living cell. Many efforts have been devoted to identifying protein complexes computationally, and most of these methods search protein-protein interaction networks for groups of densely interacting proteins. Beyond identifying protein complexes, knowing their biological functions may help reveal their molecular mechanisms and their roles in related biological processes. It is therefore also desirable to predict the functions of protein complexes computationally. However, we found no previous study that addresses this problem. This paper addresses it using yeast as the model organism, for which a total of 50 protein complexes with experimentally validated functions were collected. Each complex was encoded as a numeric vector based on its graph (network) properties and its functional properties. Feature selection techniques, namely Minimum Redundancy Maximum Relevance (mRMR) and Incremental Feature Selection (IFS), were adopted to extract the core features for prediction. Three prediction methods, the Nearest Neighbor Algorithm, a Bayesian network, and Sequential Minimal Optimization, were applied and evaluated by the jackknife cross-validation test. As a result, 22 core features combined with the Nearest Neighbor Algorithm achieved the highest accuracy. These core features are regarded as the most important features for determining the biological functions of protein complexes. Of the 22 core features, 19 were derived from functional properties, indicating that the functions of the individual protein components probably constrain the overall functions of a protein complex.
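The abstract names its pipeline (mRMR ranking, IFS, and a nearest neighbor classifier evaluated by jackknife) without implementation details. As a rough illustration only, the sketch below shows how these pieces can fit together; it is not the paper's implementation. Everything concrete in it is an assumption: the function names (`discretize`, `mrmr_rank`, `jackknife_nn_accuracy`, `incremental_feature_selection`) are hypothetical, features are binned into three levels at mean ± 0.5 standard deviations before computing mutual information, mRMR uses the MID criterion (relevance minus mean redundancy), each complex carries a single function label, and the 1-nearest-neighbor rule uses cosine distance.

```python
"""Hypothetical sketch: mRMR ranking, then Incremental Feature Selection (IFS)
scored by jackknife (leave-one-out) accuracy of a 1-nearest-neighbor classifier.
Assumptions, not taken from the paper: 3-level discretization for mutual
information, the MID form of mRMR, single-label data, and cosine distance."""
import math
from collections import Counter


def discretize(col):
    """Bin a numeric column into -1/0/1 at mean -/+ 0.5 std (assumed scheme)."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
    return [-1 if v < m - 0.5 * sd else 1 if v > m + 0.5 * sd else 0 for v in col]


def mutual_info(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(X, y):
    """Greedily order all feature indices by the mRMR (MID) criterion."""
    cols = [discretize([row[j] for row in X]) for j in range(len(X[0]))]
    relevance = [mutual_info(c, y) for c in cols]
    selected, remaining = [], list(range(len(cols)))

    def score(j):
        # Relevance to the label minus mean redundancy with features already chosen.
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def cosine_distance(a, b):
    """1 - cosine similarity; a zero vector is assigned distance 1.0."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def jackknife_nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to the given feature indices."""
    proj = [[row[j] for j in features] for row in X]
    correct = 0
    for i, query in enumerate(proj):
        # Hold out sample i and predict its label from its nearest other sample.
        nearest = min(
            (k for k in range(len(proj)) if k != i),
            key=lambda k: cosine_distance(query, proj[k]),
        )
        correct += y[nearest] == y[i]
    return correct / len(proj)


def incremental_feature_selection(X, y):
    """Score the top-1, top-2, ... mRMR features and keep the best prefix."""
    ranking = mrmr_rank(X, y)
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranking) + 1):
        acc = jackknife_nn_accuracy(X, y, ranking[:k])
        if acc > best_acc:  # strict '>' keeps the smaller set on ties
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc
```

Calling `incremental_feature_selection(X, y)` on a list of feature rows `X` and a parallel list of labels `y` returns the selected feature indices and their jackknife accuracy. Because ties are resolved in favor of the shorter mRMR prefix, the sketch favors a compact core feature set, in the spirit of the 22 core features reported above.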