ISBN: (print) 9781538632437
The data generated by real-world applications today are very large, and processing and analyzing them has become a challenging task. For any analytical and inference engine, finding frequent itemsets is a major function. Frequent itemset mining is usually performed with association rule mining techniques from data mining. The results obtained from these techniques are generally large and diverse, which makes it difficult for an inference engine to reach a conclusion. One suitable solution for handling such datasets is to devise a parallel processing system that runs the data mining approaches efficiently and produces more accurate and more easily analyzable output. The research carried out here aims at designing and developing a parallel analytical model for frequent itemset mining in big data by integrating R on Hadoop.
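As a rough illustration of the idea of counting itemsets in parallel over partitions of the data, the sketch below uses plain Python rather than the R-on-Hadoop integration the abstract describes. It is not the authors' model: the function names `count_itemsets` and `frequent_itemsets`, the toy transactions, and the choice of brute-force enumeration (instead of Apriori-style candidate pruning) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Map step (hypothetical): count every itemset of up to max_size items
    that occurs in the transactions of one partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, min_support, max_size=2):
    """Reduce step (hypothetical): merge the per-partition counts and keep
    itemsets whose relative support is at least min_support."""
    total = Counter()
    n_transactions = 0
    for partition in partitions:
        # Each partition could be counted by a separate worker;
        # here they are processed one after another for simplicity.
        total.update(count_itemsets(partition, max_size))
        n_transactions += len(partition)
    threshold = min_support * n_transactions
    return {itemset: c for itemset, c in total.items() if c >= threshold}


# Toy data, split into two partitions (assumed, not from the paper).
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["butter", "milk"], ["bread", "milk"]],
]
result = frequent_itemsets(partitions, min_support=0.5)
```

Because the per-partition counts are simply summed, the map step can run independently on each partition, which is the property a Hadoop-style system would exploit. Enumerating all itemsets grows quickly with `max_size`, so a practical system would prune candidates instead.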
The problem of mining association rules and its related parallel mining algorithms are presented. It is pointed out that existing algorithms can cause low efficiency and information loss in heterogeneous distributed databases. After a basic theory is proved, an asynchronous parallel algorithm based on our HDDMiner system for mining association rules in heterogeneous distributed databases is given. Some of the problems involved are discussed in detail as well. At the end of this paper, several key issues in the research of parallel algorithms in heterogeneous distributed databases are introduced.