Search Results - Inner Mongolia University Library

A random forest based computational model for predicting novel lncRNA-disease associations

BMC Bioinformatics, 2020, vol. 21, no. 1, p. 126

Authors: Yao, Dengju; Zhan, Xiaojuan; Zhan, Xiaorong; Kwoh, Chee Keong; Li, Peng; Wang, Jinke

Affiliations: Harbin Univ Sci & Technol Sch Software & Microelect Harbin 150080 Peoples R China; Heilongjiang Inst Technol Coll Comp Sci & Technol Harbin 150050 Peoples R China; Harbin Med Univ Dept Endocrinol & Metab Affiliated Hosp 1 Harbin Heilongjiang Peoples R China; Nanyang Technol Univ Sch Comp Sci & Engn Singapore 639798 Singapore; Harbin Univ Sci & Technol Dept Software Engn Rongcheng 264300 Peoples R China

Background: Accumulating evidence shows that abnormal regulation of long non-coding RNAs (lncRNAs) is associated with various human diseases. Accurately identifying disease-associated lncRNAs helps in studying the mechanisms of lncRNAs in disease and in exploring new therapies. Many lncRNA-disease association (LDA) prediction models have been implemented by integrating multiple kinds of data resources. However, most existing models ignore the interference of noisy and redundant information among these data resources.

Results: To improve the ability of LDA prediction models, we implemented an LDA prediction model based on random forest and feature selection (RFLDA for short). First, RFLDA integrates experimentally supported miRNA-disease associations (MDAs) and LDAs, disease semantic similarity (DSS), lncRNA functional similarity (LFS), and lncRNA-miRNA interactions (LMI) as input features. Then, RFLDA selects the most useful features for training the prediction model, using feature selection based on the random forest variable importance score, which takes into account not only the effect of each individual feature on the prediction results but also the joint effects of multiple features. Finally, a random forest regression model is trained to score potential lncRNA-disease associations. With an area under the receiver operating characteristic curve (AUC) of 0.976 and an area under the precision-recall curve (AUPR) of 0.779 under 5-fold cross-validation, RFLDA performs better than several state-of-the-art LDA prediction models. Moreover, case studies on three cancers show that 43 of the 45 lncRNAs predicted by RFLDA are validated by experimental data, and the other two predicted lncRNAs are supported by other LDA prediction models.

Conclusions: Cross-validation and case studies indicate that RFLDA has an excellent ability to identify potential disease-associated lncRNAs.

Keywords: Random forest; Variable importance; Feature selection; lncRNA-disease association prediction; bioinformatics algorithm
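The AUC and AUPR figures quoted in this record are ranking metrics computed from predicted association scores. The sketch below shows how such numbers are typically obtained from a list of labels and scores. It is only an illustration: the function names and toy data are hypothetical, and AUPR is approximated here by average precision, which may differ from the interpolation the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each rank where a positive appears.
    Used here as an assumed stand-in for the area under the PR curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Toy data (hypothetical): 1 = known association, 0 = unlabeled pair.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(roc_auc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

In a 5-fold cross-validation setting, these metrics would be computed on the held-out fold of each split and then averaged or pooled.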


A novel pattern matching algorithm for genomic patterns related to protein motifs


Journal of Bioinformatics and Computational Biology, 2020, vol. 18, no. 1, article 2050011

Authors: Foroughmand-Araabi, Mohammad-Hadi; Goliaei, Sama; Goliaei, Bahram

Affiliations: Sharif Univ Technol Dept Math Sci Tehran Iran; Univ Tehran Fac New Sci & Technol Tehran Iran; Univ Tehran Inst Biochem & Biophys Tehran Iran

Patterns in protein and genomic sequences have been extensively analyzed, extracted, and collected in databases. Although protein patterns originate from genomic coding regions, very few works have dealt, directly or indirectly, with coding region patterns induced from protein patterns.

Results: In this paper, we define a new genomic pattern structure suitable for representing patterns induced from proteins. This pattern structure, called "Consecutive Positions Scoring Matrix (CPSSM)", is a replacement for protein patterns and profiles in the genomic context. CPSSMs can be identified, discovered, and searched in genomes. We then present a novel pattern matching algorithm, based on dynamic programming, between the defined genomic pattern and genomic sequences. In addition, we modify the algorithm to support intronic gaps and very large sequences. We have implemented the algorithm and tested it on real data. The results on the Saccharomyces cerevisiae genome show 132% more true positives and no false negatives, and the results on the human genome show no false negatives and 10 times as many true positives as previous works.

Conclusion: CPSSM and the provided methods could be used for open reading frame detection and gene finding. The application is available, with source code, to run and download at http://***. ir/cpssm/.

Keywords: Genomic sequence; pattern matching; bioinformatics service; dynamic programming; bioinformatics algorithm
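The record above describes matching a scoring-matrix pattern against a genomic sequence by dynamic programming, with support for gaps. The exact CPSSM definition is not given here, so the sketch below substitutes a plain per-column scoring matrix and a linear penalty for skipped bases. The function name `best_gapped_match`, the scoring values, and the default score of 0.0 for unknown bases are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def best_gapped_match(
    columns: List[Dict[str, float]],
    seq: str,
    gap_penalty: float = 1.0,
) -> Tuple[float, int]:
    """Best score of placing all pattern columns, in order, on `seq`.

    Bases skipped between two matched columns cost `gap_penalty` each;
    bases before the first and after the last column are free.
    Returns (score, end), where `end` is the 1-based position of the
    base matched to the last column.
    """
    if not columns or not seq:
        raise ValueError("pattern and sequence must be non-empty")
    n = len(seq)
    # prev_gap[j]: best score with the previous columns placed within seq[:j].
    prev_gap = [0.0] * (n + 1)  # zero columns placed: free start anywhere
    last_match = [NEG_INF] * (n + 1)
    for col in columns:
        match = [NEG_INF] * (n + 1)  # this column matched exactly at position j
        gap = [NEG_INF] * (n + 1)    # this column matched at or before position j
        for j in range(1, n + 1):
            # Unknown bases score 0.0 (an assumption of this sketch).
            match[j] = prev_gap[j - 1] + col.get(seq[j - 1], 0.0)
            gap[j] = max(match[j], gap[j - 1] - gap_penalty)
        prev_gap = gap
        last_match = match
    best_end = max(range(1, n + 1), key=lambda j: last_match[j])
    return last_match[best_end], best_end


def column_for(base: str) -> Dict[str, float]:
    """Toy column: +2 for the expected base, -1 for any other base."""
    return {b: (2.0 if b == base else -1.0) for b in "ACGT"}


pattern = [column_for(b) for b in "ATG"]
print(best_gapped_match(pattern, "CCATGCC"))
print(best_gapped_match(pattern, "CCATTTGCC"))
```

The table is filled one pattern column at a time, so the running time is proportional to the pattern length times the sequence length. A real intron-aware matcher would likely use a different gap model than the linear penalty assumed here.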


A fast exact sequential algorithm for the partial digest problem


BMC Bioinformatics, 2016, vol. 17, suppl. 19, pp. 139-148

Authors: Abbas, Mostafa M.; Bahig, Hazem M.

Affiliations: Hamad Bin Khalifa Univ Qatar Comp Res Inst Doha Qatar; Ain Shams Univ Fac Sci Dept Math Div Comp Sci Cairo 11566 Egypt

Background: Restriction site analysis determines the locations of restriction sites after digestion by reconstructing their positions from the lengths of the cut DNA fragments. Using different reaction times with a single enzyme to cut DNA is a technique known as partial digestion. Determining the exact locations of restriction sites following a partial digestion is challenging because of the computational time required, even with the best known practical algorithm.

Results: In this paper, we introduce an efficient algorithm that finds the exact solution to the partial digest problem. The algorithm finds all possible solutions for the input. It works by traversing the solution tree with a breadth-first search in two stages and deleting all repeated subproblems. Two types of simulated data, random and Zhang, are used to measure the efficiency of the algorithm. We also apply the algorithm to real data for the Luciferase gene and the E. coli K12 genome.

Conclusion: Our algorithm is a fast tool for finding the exact solution to the partial digest problem. In the worst case, it improves on the best known practical algorithm by more than 75%. For large inputs, our algorithm solves the problem in a suitable time, whereas the best known practical algorithm is unable to do so.

Keywords: Restriction site analysis; Digestion process; Partial digest problem; DNA; bioinformatics algorithm; Breadth first search
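The partial digest problem in this record asks for a set of points on a line whose pairwise distances reproduce a given multiset. The sketch below explores candidate placements level by level and merges identical partial solutions, in the spirit of the breadth-first search with removal of repeated subproblems that the abstract mentions. It is not the authors' two-stage algorithm: the function name `partial_digest` is hypothetical, and the placement rule is the classic one of trying the largest remaining distance from either end. Distances are assumed to be positive integers.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple


def partial_digest(distances: List[int]) -> List[Tuple[int, ...]]:
    """Return every point set (starting at 0) whose pairwise distances
    equal the multiset `distances`."""
    if not distances:
        raise ValueError("distances must be non-empty")
    remaining = Counter(distances)
    width = max(remaining)
    remaining = remaining - Counter([width])  # the two end points are fixed

    # One BFS level: placed points -> distances not yet explained.
    # Keying by the point set merges repeated subproblems.
    frontier: Dict[FrozenSet[int], Counter] = {frozenset((0, width)): remaining}
    solutions: List[Tuple[int, ...]] = []
    while frontier:
        next_frontier: Dict[FrozenSet[int], Counter] = {}
        for points, rest in frontier.items():
            if not rest:
                solutions.append(tuple(sorted(points)))
                continue
            d = max(rest)
            # The largest remaining distance must reach one of the two ends.
            for y in {d, width - d}:
                needed = Counter(abs(y - p) for p in points)
                if all(rest[k] >= c for k, c in needed.items()):
                    next_frontier[points | {y}] = rest - needed
        frontier = next_frontier
    return sorted(solutions)


print(partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10]))
```

Each level places exactly one more point, so all complete solutions appear on the same level. A solution and its mirror image are both reported.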
