Searching and mining biomedical literature databases, such as MEDLINE, is the main means by which biomedical researchers generate scientific hypotheses. By grouping similar documents together, clustering techniques can help users find documents of interest more effectively. Because non-negative matrix factorization (NMF) can effectively capture the latent semantic space, with non-negativity constraints on both the basis and the weights, it has been used to cluster general text documents. Considering the stochastic nature of NMF with respect to initialization, we propose to use ensemble NMF for biomedical document clustering. The performance of ensemble NMF was evaluated by clustering a large number of datasets generated from the TREC Genomics track dataset. The experimental results show that our method significantly outperforms the classical clustering algorithms bisecting k-means, k-means, and hierarchical clustering on most of the datasets.
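The abstract does not say how the individual NMF runs are combined, so the following is only a minimal sketch of one common way to build an NMF ensemble, not the authors' method. It assumes a document-by-term matrix with documents as rows, plain multiplicative-update NMF, and a co-association (consensus) matrix that is itself factorized to obtain the final labels. The names `nmf`, `ensemble_nmf`, and `toy` are hypothetical and chosen for illustration.

```python
import numpy as np


def nmf(V, k, rng, n_iter=200, eps=1e-9):
    """Factor V (docs x terms) as W @ H with Lee-Seung multiplicative updates."""
    n_docs, n_terms = V.shape
    W = rng.random((n_docs, k)) + eps   # document-by-cluster weights
    H = rng.random((k, n_terms)) + eps  # cluster-by-term basis
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ensemble_nmf(V, k, n_runs=10, seed=0):
    """Cluster rows of V by consensus over several randomly initialized NMF runs.

    Assumption: runs are combined through a co-association matrix; the
    source abstract does not specify the aggregation scheme.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs = V.shape[0]
    co = np.zeros((n_docs, n_docs))
    for _ in range(n_runs):
        W, _H = nmf(V, k, rng)
        labels = W.argmax(axis=1)  # assign each document to its largest weight
        # Count how often each pair of documents lands in the same cluster.
        co += labels[:, None] == labels[None, :]
    co /= n_runs
    # The co-association matrix is non-negative, so factorize it once more
    # and read the consensus labels off the resulting weights.
    W_final, _H = nmf(co, k, rng)
    return W_final.argmax(axis=1)


# Toy document-by-term counts: two groups of documents with disjoint vocabularies.
toy = np.array([
    [4, 2, 1, 0, 0, 0],
    [8, 4, 2, 0, 0, 0],
    [2, 1, 1, 0, 0, 0],
    [0, 0, 0, 3, 1, 2],
    [0, 0, 0, 6, 2, 4],
    [0, 0, 0, 3, 2, 2],
], dtype=float)

print(ensemble_nmf(toy, k=2))
```

Because each run starts from a different random initialization, a single run may land in a different local optimum; averaging pairwise co-assignments over runs is one way to dampen that variability. The cluster ids returned are arbitrary, so only the grouping of documents is meaningful.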