Advanced satellite tracking technologies have collected huge amounts of wild bird migration data. Biologists use these data to understand dynamic migration patterns, study correlations between habitats, and predict global spreading trends of avian influenza. The research discussed here transforms the biological problem into a machine learning problem by converting wild bird migratory paths into graphs. H5N1 outbreak prediction is achieved by discovering weighted closed cliques from the graphs using the mining algorithm High-wEight cLosed cliquE miNing (HELEN). The learning algorithm HELEN-p then predicts potential H5N1 outbreaks at habitats. On a migration dataset obtained through a real satellite bird-tracking system, this prediction method is more accurate than traditional methods. Empirical analysis shows that H5N1 spreads along high-weight closed cliques and frequent cliques.
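To make the idea of converting migratory paths into graphs more concrete, here is a minimal sketch. It is not HELEN or HELEN-p, whose details the abstract does not give. As assumptions for illustration only: each path is a list of habitat names, an edge weight counts how often birds move directly between two habitats, a clique's weight is the sum of its edge weights, and maximal cliques in a single graph stand in for closed cliques. The names `build_graph`, `maximal_cliques`, `high_weight_cliques`, and `min_weight` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(paths):
    """Count moves between consecutive habitats; returns {frozenset({a, b}): weight}."""
    weights = defaultdict(int)
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore a bird staying at the same habitat
                weights[frozenset((a, b))] += 1
    return dict(weights)


def maximal_cliques(adj):
    """Basic Bron-Kerbosch (no pivoting); yields each maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def high_weight_cliques(paths, min_weight):
    """Return (sorted habitats, weight) for maximal cliques with weight >= min_weight."""
    weights = build_graph(paths)
    adj = defaultdict(set)
    for edge in weights:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    result = []
    for clique in maximal_cliques(adj):
        # Assumed weight definition: sum of edge weights inside the clique.
        w = sum(weights[frozenset(pair)] for pair in combinations(clique, 2))
        if len(clique) >= 2 and w >= min_weight:
            result.append((sorted(clique), w))
    return sorted(result, key=lambda item: -item[1])


# Toy paths with made-up habitat labels.
paths = [["A", "B", "C"], ["A", "C"], ["B", "C", "D"], ["A", "B"]]
print(high_weight_cliques(paths, min_weight=3))
```

In this toy input, habitats A, B, and C are mutually connected by repeated movements, so they form the only clique that passes the threshold. A real pipeline would also need the outbreak labels and the prediction step, which this sketch leaves out.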
Background: With the rapid accumulation of phosphoproteomics data, phosphorylation-site prediction is becoming an increasingly active research area. More than a dozen phosphorylation-site prediction tools have been released in the past decade. However, apart from Musite, there is currently no open-source framework specifically designed for phosphorylation-site prediction. Results: Here we present the Musite open-source framework for building applications that perform machine-learning-based phosphorylation-site prediction. Musite was implemented as six loosely coupled modules. With its well-designed Java application programming interface (API), Musite can be easily extended to integrate various sources of biological evidence for phosphorylation-site prediction. Conclusions: Released under the GNU GPL open-source license, Musite provides an open and extensible framework for phosphorylation-site prediction. The software with its source code is available at http://***.
ISBN: 9781728187082 (print)
Machine learning (ML) algorithms are data-driven: given a goal task and a dataset of prior experience relevant to that task, one can attempt to solve the task with ML while seeking high accuracy. There is usually a big gap in understanding between ML experts and dataset providers because each side has limited expertise in the other's discipline. Narrowing down a suitable set of problems to solve using ML is possibly the most ambiguous yet important item for data providers to consider before initiating collaborations with ML experts. We propose an ML-fueled pipeline to identify potential problems (i.e., the tasks) so that data providers can easily explore potential problem areas to investigate with ML. The autonomous pipeline integrates information theory and graph-based unsupervised learning paradigms to generate a ranked retrieval of the top-k problems for a given dataset, supporting a successful ML-based collaboration. We conducted experiments on diverse, well-known real-world datasets. From a supervised learning standpoint, the proposed pipeline achieved 72% top-5 task retrieval accuracy on average, which surpasses the retrieval performance of popular exploratory data analysis tools under the same paradigm. Detailed experiment results and our source code are available at: https://***/jpastorino/heyml.
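As a rough illustration of how information theory can rank candidate tasks, the sketch below scores each column of a small discrete table as a possible prediction target by its mean mutual information with the other columns, then returns the top-k. This is an assumption made for illustration, not the pipeline described in the abstract, which also uses graph-based unsupervised learning and is not specified here. The function names and the scoring rule are hypothetical, and the table is assumed to hold at least two columns of discrete values.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equally long discrete lists."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def rank_targets(table, k):
    """Rank columns as candidate targets by mean mutual information with the rest.

    table maps column name -> list of discrete values (at least two columns).
    """
    scores = {}
    for target, ys in table.items():
        others = [xs for name, xs in table.items() if name != target]
        scores[target] = sum(mutual_information(xs, ys) for xs in others) / len(others)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy table with made-up column names: "a" and "b" determine each other,
# while "c" is independent of both.
table = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
print(rank_targets(table, k=2))
```

A column that other columns say a lot about is, under this assumed rule, a more promising supervised target than one that is independent of everything else.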