DBCURE-MR: An efficient density-based clustering algorithm for large data using MapReduce


Authors: Kim, Younghoon; Shim, Kyuseok; Kim, Min-Soeng; Lee, June Sup

Affiliations: Seoul Natl Univ, Sch Elect & Comp Engn, Seoul 151600, South Korea; SK Planet, Seoul, South Korea

Publication: Information Systems

Year, volume, issue: 2014, Vol. 42, Jun.

Pages: 15-35


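Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degree may be awarded)]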

Funding: Basic Science Research Program [NRF-2009-0078828]; Next-Generation Information Computing Development Program [NRF-2012M3C4A7033342]; National Research Foundation of Korea (NRF) of the Ministry of Science, ICT & Future Planning (MSIP); Information Technology Research Center (ITRC); National IT Industry Promotion Agency (NIPA) of MSIP [NIPA-2013-H0301-13-4009]; SK Planet Cooperation

Keywords: Clustering algorithm; Density-based clustering; Parallel algorithm; MapReduce

Abstract: Clustering is a useful data mining technique that groups data points so that the points within a single group have similar characteristics, while the points in different groups are dissimilar. Density-based clustering algorithms such as DBSCAN and OPTICS are one widely used family of clustering algorithms. As a growing number of applications deal with vast amounts of data, clustering such big data is a challenging problem. Recently, parallelizing clustering algorithms on a large cluster of commodity machines using the MapReduce framework has received a lot of attention. In this paper, we first propose a new density-based clustering algorithm, called DBCURE, which is robust in finding clusters with varying densities and is suitable for parallelization with MapReduce. We next develop DBCURE-MR, a parallelized version of DBCURE using MapReduce. While traditional density-based algorithms find each cluster one by one, DBCURE-MR finds several clusters together in parallel. We prove that both DBCURE and DBCURE-MR find the clusters correctly based on the definition of density-based clusters. Our experimental results with various data sets confirm that DBCURE-MR finds clusters efficiently, is not sensitive to clusters with varying densities, and scales up well with the MapReduce framework. (C) 2013 Published by Elsevier Ltd.
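
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of classic DBSCAN, not of DBCURE or DBCURE-MR, whose details this record does not give. It illustrates how a traditional density-based algorithm grows each cluster one by one from a seed point. The function names, the `eps` and `min_pts` parameters, and the sample points are illustrative assumptions, and the brute-force neighbor search is chosen for brevity rather than efficiency.

```python
from math import dist  # Euclidean distance, Python 3.8+


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN sketch: one label per point, -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it fully
        # before moving on, which is the "one by one" behavior.
        cluster_id += 1
        labels[i] = cluster_id
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point of this cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                frontier.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical sample data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The outer loop cannot begin a second cluster until the first has been fully expanded, which is the sequential pattern that, according to the abstract, DBCURE-MR replaces by finding several clusters together in parallel.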
