Massive amounts of entertainment information are generated on the web every day, which is why we chose this field as the subject of our data mining research. We believe that entertainment-industry data is both very large in scale and highly valuable. Aware of the capabilities of big data processing, we use mature distributed computing frameworks, such as Hadoop and HBase, to handle the problems that may arise. In this paper, we design a data mining platform comprising a topic-sensitive crawler subsystem and a Hadoop-based data analysis subsystem. Built on distributed computing, the platform greatly improves capacity compared with a centralized system.