Searching for frequent patterns in transactional databases is considered one of the most important data mining problems, and Apriori is one of the typical algorithms for this task. Because these databases are so large, developing fast and efficient algorithms that can handle large volumes of data is a challenging task. In this paper, we implement a parallel Apriori algorithm based on MapReduce, a framework for processing huge datasets for certain kinds of distributable problems using a large number of computers (nodes). The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
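The abstract does not give the paper's mapper and reducer logic. As a rough illustration only, the single-process Python sketch below simulates one common way to parallelize Apriori with MapReduce: each Apriori level is one job in which mappers emit (candidate, 1) for every candidate contained in a transaction, and reducers sum the counts and keep the candidates that reach an absolute minimum count. The function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_apriori`) are hypothetical, and no real Hadoop API is used.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, candidates):
    """Mapper: emit (candidate, 1) for each candidate contained in the transaction."""
    items = frozenset(transaction)
    for candidate in candidates:
        if candidate <= items:
            yield candidate, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework's shuffle/sort step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values, min_count):
    """Reducer: sum partial counts; return (itemset, count) only if it is frequent."""
    total = sum(values)
    return (key, total) if total >= min_count else None


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets and prune any with an infrequent subset."""
    candidates = set()
    for a, b in combinations(frequent_prev, 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in frequent_prev for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def mapreduce_apriori(transactions, min_count):
    """Run one simulated MapReduce job per Apriori level until no candidates remain."""
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while candidates:
        pairs = (pair for t in transactions for pair in map_phase(t, candidates))
        level = {}
        for key, values in shuffle(pairs).items():
            result = reduce_phase(key, values, min_count)
            if result is not None:
                level[result[0]] = result[1]
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent
```

In a real cluster, the transactions would be split across mappers and the shuffle would be performed by the framework; only the per-level job structure is shown here.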
ISBN (print): 9781467390880
Extracting packet signatures automatically and accurately is the foundation of traffic identification for most network monitoring and forensics applications. The Apriori algorithm is a common and useful method for this task. For large volumes of Internet traffic, however, the traditional Apriori algorithm produces huge numbers of candidate itemsets and incurs high I/O costs when scanning the database. This paper proposes an improved method. By pruning the candidates and using a public signature database, it dynamically reduces the number of itemsets to be scanned, making scanning more efficient. Experiments show that the proposed algorithm also effectively improves the mining rate.
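The abstract does not specify how the public signature database is used, so the sketch below does not model it. Instead it shows a well-known, related Apriori optimization, transaction reduction, as a stand-in for "reducing the number of scanned itemsets": after each scan, a transaction that contains no frequent k-itemset cannot contain a frequent (k+1)-itemset, so it is dropped from later scans. The packet tokens and the function names `scan_level` and `mine_with_reduction` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def scan_level(db, candidates, min_count):
    """One database scan: count candidates, then drop transactions with no frequent k-itemset."""
    counts = Counter(c for t in db for c in candidates if c <= t)
    frequent = {c: n for c, n in counts.items() if n >= min_count}
    # A transaction with no frequent k-itemset cannot hold a frequent (k+1)-itemset.
    reduced_db = [t for t in db if any(f <= t for f in frequent)]
    return frequent, reduced_db


def mine_with_reduction(transactions, min_count):
    """Level-wise mining that shrinks the database after every scan."""
    db = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in db for item in t}
    result, k = {}, 1
    while candidates and db:
        frequent, db = scan_level(db, candidates, min_count)
        result.update(frequent)
        k += 1
        # Simple join step; infrequent candidates are filtered out by the next scan.
        candidates = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
    return result
```

Every (k+1)-candidate is the union of two frequent k-itemsets, so any transaction containing it survives the reduction and the counts stay exact.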
ISBN (print): 9781467358453
Data mining is the process of analyzing large amounts of data from different perspectives and summarizing them into useful information. This information can be converted into knowledge about historical patterns and future trends. Data mining plays a significant role in the field of information technology. The healthcare industry today generates large amounts of complex data about patients, hospital resources, diseases, diagnosis methods, electronic patient records, etc. Data mining techniques are very useful for making medical decisions in curing diseases. Unfortunately, the huge amount of data the healthcare industry collects is not "mined" to discover hidden information for effective decision making. Once discovered, this knowledge can be used by healthcare administrators to improve the quality of service. In this paper, the authors develop a method that uses the association-rule-based Apriori data mining technique to identify the frequency of diseases in a particular geographical area over a given time period.
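A minimal sketch of how support and confidence could express disease frequency by area and period, assuming hypothetical (area, period, disease) records. Because the rule shape is fixed to {area, period} -> {disease}, a single counting pass suffices here instead of the full level-wise Apriori search the paper uses; the function name `area_period_disease_rules` and the sample values are illustrative only.

```python
from collections import Counter


def area_period_disease_rules(records, min_support, min_confidence):
    """Rules {area, period} -> {disease} from hypothetical (area, period, disease) records.

    support    = count(area, period, disease) / total records
    confidence = count(area, period, disease) / count(area, period)
    """
    total = len(records)
    antecedent_counts = Counter((area, period) for area, period, _ in records)
    rule_counts = Counter(records)
    rules = []
    for (area, period, disease), count in rule_counts.items():
        support = count / total
        confidence = count / antecedent_counts[(area, period)]
        if support >= min_support and confidence >= min_confidence:
            rules.append(((area, period), disease, support, confidence))
    # Strongest rules first; ties broken by area/period and disease name.
    rules.sort(key=lambda r: (-r[3], r[0], r[1]))
    return rules
```

For example, with three "flu" and one "dengue" record for ("North", "2020-Q1") and two "dengue" records for ("South", "2020-Q1"), thresholds of 0.3 support and 0.7 confidence keep South -> dengue (confidence 1.0) and North -> flu (confidence 0.75).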
Aiming at the performance bottleneck of the traditional Apriori algorithm on relatively large datasets, this paper adopts the idea of parallelization and improves the Apriori algorithm based on the MapReduce model. First, the local frequent itemsets on each subnode in the cluster are calculated; then, all local frequent itemsets are merged into the global candidate itemsets; finally, the candidates that meet the minimum support threshold are kept as the frequent itemsets. The advantage of the improved algorithm is that it only needs to scan the transaction database twice and computes the frequent itemsets in parallel, which improves the algorithm's efficiency. (C) 2021 The Authors. Published by Elsevier B.V.
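The three steps described above resemble the well-known two-pass partition (SON) approach. The in-memory Python sketch below simulates it under that assumption; list slices stand in for cluster subnodes, and the names `local_apriori` and `two_pass_partitioned` are hypothetical. Using the same fractional minimum support in every partition is safe: an itemset that reaches the threshold globally must reach it in at least one partition, so the merged local results contain every globally frequent itemset.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations


def local_apriori(partition, min_support):
    """Level-wise Apriori on one partition (one 'subnode'); returns locally frequent itemsets."""
    min_count = max(1, math.ceil(min_support * len(partition)))
    candidates = {frozenset([item]) for t in partition for item in t}
    found, k = set(), 1
    while candidates:
        counts = Counter(c for t in partition for c in candidates if c <= t)
        level = {c for c, n in counts.items() if n >= min_count}
        found |= level
        k += 1
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
    return found


def two_pass_partitioned(transactions, num_partitions, min_support):
    """Pass 1 mines each partition locally; pass 2 counts the merged candidates globally.

    min_support is a fraction of transactions, given as a Fraction or a string such as
    "0.6", so the local thresholds are computed exactly.
    """
    min_support = Fraction(min_support)
    db = [frozenset(t) for t in transactions]
    size = max(1, math.ceil(len(db) / num_partitions))
    partitions = [db[i:i + size] for i in range(0, len(db), size)]
    # Pass 1: every local frequent itemset becomes a global candidate.
    candidates = set().union(*(local_apriori(p, min_support) for p in partitions))
    # Pass 2: a single scan of the whole database counts every global candidate.
    counts = Counter(c for t in db for c in candidates if c <= t)
    min_count = math.ceil(min_support * len(db))
    return {c: n for c, n in counts.items() if n >= min_count}
```

Exact fractions are used because a floating-point threshold such as 0.6 * 3 can round up past an integer and make a partition's local threshold stricter than intended.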
ISBN (print): 9781510630666
This paper presents an actor-based Apriori algorithm enhanced with a fault-tolerance mechanism. All phases of the algorithm, including candidate generation and support counting, are performed by asynchronous actors. When an error occurs during execution, computation is interrupted locally, only for the affected actors: the actor's state is restored from a snapshot, and the operations that caused the failure are either repeated or skipped, while the other actors continue with their current tasks. The algorithm can be executed in parallel and distributed environments. The proposed enhancements have been successfully implemented in Java using the Akka library. This paper discusses the performance of the actor-based Apriori algorithm on different datasets. The approach is illustrated with many experiments and measurements performed on a multiprocessor, multithreaded computer.
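The paper's implementation uses Java and Akka; the sketch below is a plain-Python stand-in, not Akka's API. It models one hypothetical support-counting actor (`CountingActor`) that processes its mailbox one message at a time, snapshots its counts before each attempt, restores the snapshot when a message fails, and either retries the message or, once retries are exhausted, skips it. Actors share no state, so a failure in one does not disturb the others. Real concurrency and inter-actor messaging are omitted.

```python
from collections import Counter, deque


class CountingActor:
    """Hypothetical support-counting actor with snapshot-based recovery (not Akka's API)."""

    def __init__(self, name, candidates, max_retries=1):
        self.name = name
        self.candidates = candidates
        self.counts = Counter()
        self.mailbox = deque()
        self.max_retries = max_retries
        self.skipped = []

    def tell(self, batch):
        """Enqueue a message (a batch of transactions) without blocking the sender."""
        self.mailbox.append(batch)

    def _handle(self, batch):
        for transaction in batch:
            if transaction is None:  # stand-in for a corrupted record
                raise ValueError(f"{self.name}: corrupted transaction")
            for candidate in self.candidates:
                if candidate <= transaction:
                    self.counts[candidate] += 1

    def run(self):
        """Process the mailbox one message at a time, as an actor does."""
        while self.mailbox:
            batch = self.mailbox.popleft()
            for _ in range(self.max_retries + 1):
                snapshot = Counter(self.counts)  # state before this attempt
                try:
                    self._handle(batch)
                    break
                except Exception:
                    self.counts = snapshot  # discard partial updates from the failed attempt
            else:
                self.skipped.append(batch)  # retries exhausted: skip the message
```

With a deterministic fault, as in this sketch, the retry fails again and the message is skipped; a transient fault would succeed on retry.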
ISBN (print): 9781424447053
Data mining is the analysis of (often large) observational datasets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Association rules are a highly popular data mining method. They show attribute-value conditions that frequently occur together in a given dataset, and Apriori is an efficient association rule mining algorithm.
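To make "attribute-value conditions that frequently occur together" concrete: a rule X -> Y has support count(X ∪ Y) / N and confidence count(X ∪ Y) / count(X). The sketch below, with a hypothetical `generate_rules` function, derives rules from an already computed dictionary of frequent itemsets and their counts, such as Apriori would produce. It assumes the dictionary contains every frequent itemset, so by the downward-closure property each antecedent's count is available.

```python
from itertools import combinations


def generate_rules(frequent, n_transactions, min_confidence):
    """Derive rules X -> Y from frequent itemsets.

    frequent maps each frequent itemset (frozenset) to its support count and must
    contain every frequent itemset, so every antecedent X has a known count.
    """
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]  # count(X u Y) / count(X)
                if confidence >= min_confidence:
                    support = count / n_transactions  # count(X u Y) / N
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules
```

For instance, if {beer} occurs in 3 of 5 transactions and {beer, diapers} also occurs in 3, the rule {beer} -> {diapers} has support 0.6 and confidence 1.0.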
ISBN (print): 9780769538594
Association rule mining is an important topic in the data mining field. Building on association rule mining and the Apriori algorithm, this paper proposes an improved algorithm based on a directed network. It reduces resource consumption and improves efficiency by reducing the number of dataset scans and making the pruning step more efficient. Finally, an experiment analyzes and compares the two algorithms, and the results show that the improved algorithm is more computationally efficient.
ISBN (print): 9781424455379
The association rules model is widely used in data mining, and Apriori is the most famous association rule mining algorithm. Based on the classic Apriori algorithm, this paper presents a UML class design diagram that follows agile design principles and implements it in Java, a popular OOP language. In practice, the implementation can be used in a variety of data applications.
ISBN (print): 9781509004966
Association rules from data mining were applied to transformer defect analysis so that frequent patterns, dependencies, and causal relationships between classification and decision attributes could be found in defect data. As a result, correlations among grid fault elements were captured at a macroscopic level. This paper, which focuses on frequent itemset mining algorithms for transformer defect correlation analysis, introduces the definitions related to association rules. To address the weaknesses of the traditional Apriori algorithm, an efficient analogous frequent itemset mining algorithm is presented. As a case study, association rule analysis was carried out on transformer defect data from Shandong. The results indicate that the various attribute items are associated with each other to different degrees; in addition, the correlations obtained were used to guide operational maintenance of auxiliary equipment, parts, etc. that are vulnerable to defects.
ISBN (print): 9780769540207
Among the many association rule mining algorithms, Apriori is the most classic, but it has three deficiencies: it scans the database many times, it generates a large number of candidate itemsets, and it generates frequent itemsets iteratively. This paper presents a method that finds the maximal frequent itemsets through a single intersection operation. Support is obtained from the intersection results without scanning the transaction database, and some of the attributes are numbered to reduce memory usage and make the candidate set list easy to search, thereby improving the algorithm's efficiency. Finally, the method can generate association rules for an Intrusion Detection System. Experimental results show that the optimized algorithm effectively improves the efficiency of mining association rules.
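The abstract does not detail its intersection procedure. A closely related, well-known technique is the vertical (tidset) representation used by Eclat: each item maps to the set of IDs of the transactions that contain it, and the support of an itemset is the size of the intersection of its items' tidsets, so no further database scans are needed. The sketch below assumes that representation, which may differ from the paper's method; it enumerates frequent itemsets depth-first and then keeps only the maximal ones (those with no frequent proper superset). The function names are hypothetical.

```python
def vertical_index(transactions):
    """Map each item to its tidset: the IDs of the transactions that contain it."""
    index = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            index.setdefault(item, set()).add(tid)
    return index


def frequent_by_intersection(transactions, min_count):
    """Depth-first search in which support = size of the tidset intersection."""
    index = vertical_index(transactions)
    items = sorted(item for item, tids in index.items() if len(tids) >= min_count)
    frequent = {}

    def extend(prefix, prefix_tids, start):
        for j in range(start, len(items)):
            tids = prefix_tids & index[items[j]] if prefix else index[items[j]]
            if len(tids) >= min_count:
                itemset = prefix | {items[j]}
                frequent[itemset] = len(tids)
                extend(itemset, tids, j + 1)

    extend(frozenset(), set(), 0)
    return frequent


def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}
```

The database is read once to build the index; every later support value comes from set intersections alone.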