Search Results - Inner Mongolia University Library

Genome-wide sequence-based prediction of peripheral proteins using a novel semi-supervised learning technique

BMC Bioinformatics, 2010, Vol. 11, Issue 1, pp. 1-8

Authors: Bhardwaj, Nitin; Gerstein, Mark; Lu, Hui
Affiliations: Univ Illinois, Bioinformat Program, Dept Bioengn, Chicago, IL 60607, USA; Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520, USA; Yale Univ, Dept Mol Biophys & Biochem, New Haven, CT 06520, USA; Yale Univ, Dept Comp Sci, New Haven, CT 06520, USA

Background: In supervised learning, traditional approaches to building a classifier use two sets of examples with pre-defined classes along with a learning algorithm. The main limitation of this approach is that examples from both classes are required, which may be infeasible in certain cases, especially those dealing with biological data. Such is the case for membrane-binding peripheral domains, which play important roles in many biological processes, including cell signaling and membrane trafficking, by reversibly binding to membranes. For these domains, a well-defined positive set of domains known to bind membranes is available, along with a large unlabeled set of domains whose membrane-binding affinities have not been measured. This limitation can be addressed by a special class of semi-supervised machine learning called positive-unlabeled (PU) learning, which uses a positive set together with a large unlabeled set.

Methods: In this study, we implement the first application of PU-learning to a protein function prediction problem: identification of peripheral domains. PU-learning starts by iteratively identifying reliable negative (RN) examples from the unlabeled set until convergence, and then builds a classifier using the positive set and the final RN set. A data set of 232 positive cases and ~3750 unlabeled ones was used to construct and validate the protocol.

Results: Holdout evaluation of the protocol on a left-out positive set showed that prediction accuracy reached up to 95% in two independent implementations.

Conclusion: These results suggest that our protocol can be used for predicting membrane-binding properties of a wide variety of modular domains. Protocols like the one presented here are particularly useful when information is available from one class only.
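The two-stage PU-learning loop described in the abstract (iteratively extract reliable negatives, then train on positives plus the final RN set) can be illustrated with a small sketch. This is not the authors' protocol: the nearest-centroid classifier, the function names (`pu_learn`, `predicts_negative`), the `max_iter` cap, and the toy feature vectors are all assumptions chosen to keep the example self-contained.

```python
def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predicts_negative(x, pos_c, neg_c):
    """Nearest-centroid rule (an assumed stand-in classifier)."""
    return sq_dist(x, neg_c) < sq_dist(x, pos_c)


def pu_learn(positives, unlabeled, max_iter=50):
    """Return (pos_centroid, neg_centroid, reliable_negatives).

    Step 1: start by treating every unlabeled example as negative.
    Step 2: repeatedly keep only the examples the current classifier
            still calls negative, until the RN set stops shrinking.
    Step 3: the final classifier uses the positives and the final RN set.
    """
    pos_c = centroid(positives)
    reliable_neg = list(unlabeled)
    for _ in range(max_iter):
        neg_c = centroid(reliable_neg)
        kept = [x for x in reliable_neg if predicts_negative(x, pos_c, neg_c)]
        # Converged (nothing removed) or degenerate (everything removed).
        if not kept or len(kept) == len(reliable_neg):
            break
        reliable_neg = kept
    return pos_c, centroid(reliable_neg), reliable_neg


# Toy data: two unlabeled points sit inside the positive cluster.
positives = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
unlabeled = [[1.1, 1.0], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [1.0, 1.2]]
pos_c, neg_c, rn = pu_learn(positives, unlabeled)
print(len(rn), "reliable negatives kept out of", len(unlabeled))
```

In this toy run the unlabeled points lying near the positive cluster are dropped from the RN set, which is the behavior the iterative step is meant to achieve; a real application would substitute sequence-derived features and a stronger classifier.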

Keywords: Decision Tree Algorithm; Peripheral Protein; Modular Domain; bioinformatics problem; Peripheral Domain


A space-efficient solution to find the maximum overlap using a compressed suffix array


2nd Middle East Conference on Biomedical Engineering (MECBME)

Authors: Rachid, Maan Haj; Malluhi, Qutaibah; Abouelhoda, Mohamed

ISBN: (print) 9781479947997

Compressed indices are important data structures in stringology. Compressed versions of many well-known data structures used in string matching problems, such as the suffix tree and the suffix array, have been studied and proposed. This paper takes advantage of a very recent compressed suffix array to build a space-economic solution for an important bioinformatics problem, namely the all-pairs suffix-prefix problem. The paper also presents a simple technique for parallelizing the solution. Our results show that the proposed solution consumes less than one fifth of the space required by other solutions based on standard data structures. In addition, our results demonstrate that good performance scalability can be achieved by employing the proposed parallel algorithm.
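For readers unfamiliar with the all-pairs suffix-prefix problem named in the abstract, the following sketch states it in code: for every ordered pair of strings, find the longest suffix of the first that is also a prefix of the second. This is a naive baseline for illustration only, not the compressed-suffix-array method of the paper; the function names and the sample reads are assumptions.

```python
def max_overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    # Try the longest candidate first and stop at the first match.
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def all_pairs_suffix_prefix(strings):
    """Map each ordered pair (i, j), i != j, to its maximum overlap length."""
    return {
        (i, j): max_overlap(a, b)
        for i, a in enumerate(strings)
        for j, b in enumerate(strings)
        if i != j
    }


# Hypothetical short reads, as might appear in a genome assembly setting.
reads = ["ACGTAC", "TACGGA", "GGACGT"]
for pair, length in sorted(all_pairs_suffix_prefix(reads).items()):
    print(pair, length)
```

The brute-force version compares every pair directly and keeps all strings in memory uncompressed, which is the kind of cost that index-based solutions such as the one described above aim to reduce.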

Keywords: bioinformatics; data structures; parallel algorithms; string matching; all-pair suffix prefix problem; bioinformatics problem; compressed indices; compressed suffix array; maximum overlap; parallel algorithm; performance scalability; space-economic solution; space-efficient solution; string matching problems; stringology; suffix tree; Algorithm design and analysis; Arrays; Assembly; Genomics; Indexes
