This paper proposes FTTlist, a fault-tolerant frequent itemset mining algorithm based on a linear table, for the case where the fault tolerance is 1. The algorithm generates candidate fault-tolerant patterns, called FTandidate, by concatenating a 1 onto the highest bit of the binary representation of known fault-tolerant frequent patterns. Because mining operates on a linear table and needs no recursion, it reduces the space consumed during mining. The paper also proposes a deduplication algorithm that removes repeated support calculations, which gives the algorithm a strong advantage in space performance. In addition, the algorithm only needs to mine two horizontal chains of each FTandidate, which reduces mining time. Finally, the paper reports the time and space performance of the proposed algorithm on sparse and dense datasets. The results show that the algorithm achieves shorter mining times than the compared algorithms and that the horizontal chain reduces its memory footprint.
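As a rough illustration of the ideas in this abstract (not the FTTlist implementation itself), the following Python sketch encodes itemsets and transactions as bitmasks, counts support under a fault tolerance of 1, and extends a pattern by setting a 1 above its highest bit. The function names, the bitmask encoding, and the simplified definition of fault-tolerant support (a transaction counts if it misses at most one item of the pattern) are assumptions made for illustration; the abstract does not spell them out.

```python
def ft_support(pattern, transactions, tolerance=1):
    """Count transactions that miss at most `tolerance` items of `pattern`.

    `pattern` and each transaction are integers used as bitmasks:
    bit i is 1 when item i is present (an assumed encoding).
    """
    count = 0
    for t in transactions:
        # Bits set in the pattern but not in the transaction are missing items.
        missing = bin(pattern & ~t).count("1")
        if missing <= tolerance:
            count += 1
    return count


def extend_high_bit(pattern, n_items):
    """Candidates formed by setting one bit above the pattern's highest set bit."""
    first_free = pattern.bit_length()
    return [pattern | (1 << i) for i in range(first_free, n_items)]


# Hypothetical toy data: items a, b, c map to bits 0, 1, 2.
transactions = [0b111, 0b011, 0b001, 0b100]
print(ft_support(0b111, transactions, tolerance=1))  # 2
print(ft_support(0b111, transactions, tolerance=0))  # 1
print(extend_high_bit(0b011, n_items=4))             # [7, 11]
```

Only adding bits above the current highest bit means each candidate is reached from exactly one parent pattern, which is one plausible reason such a scheme can run iteratively over a flat list instead of recursively.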
ISBN: 9781450398640 (print)
A multi-dimensional association rule algorithm from data mining was used to quantitatively analyze the relationship between the carbon emission futures price (CEFP) and ten European economic factors in four areas: population and employment, consumption, domestic product, and foreign trade. The results showed that, among the population and employment factors, unemployment was negatively correlated with CEFP. In terms of consumption, CEFP was positively correlated with the consumption level. In terms of domestic product, CEFP was positively correlated with gross domestic product, gross added value, and total economy. The quantified degree of correlation between CEFP and its influencing factors is intended to provide a scientific basis for the relevant departments investing in CEFP.
Because microblog public opinion data are randomly distributed and therefore difficult to mine, this paper proposes a microblog public opinion data mining algorithm based on a multi-vision clustering model. It constructs a phase space distribution structure model and a fuzzy association rule distribution set for the microblog public opinion data, analyses the high-order statistical characteristics of the data, and performs partition block scheduling of the data at the fuzzy clustering centres according to the different statistical characteristics. The binary structure of the data is reconstructed in a virtual database, and multi-angle fuzzy clustering is carried out on the reconstruction results to realise optimised mining of microblog public opinion data. Simulation results show that the mining time of the method reaches 4.13 ms and the mining accuracy reaches 100%.
Risk data currently generated from network attack data lack predictability, so this paper proposes a fast association rule mining algorithm for network attack data. Based on the related data, fuzzy theory is used to introduce the frequency of network attack events into the association rules. Then, based on a genetic algorithm, the concepts of interest degree and approximation are introduced to improve the membership function, which establishes the association rules for network attack data and enables rapid data mining. The experimental results show that the proposed algorithm has certain advantages in accuracy and efficiency.
This paper introduces the basic principles of association rule mining algorithms and then surveys them according to the number of variables (dimensions) involved in mining, the level of abstraction of the data, and the categories of variables processed (Boolean and numeric). It summarizes, analyzes, and compares some typical algorithms. Finally, it discusses future research directions for association rule mining algorithms.
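The basic principles mentioned in this abstract usually start from two measures, support and confidence. The following is a minimal Python sketch of both for single-dimensional Boolean rules; the function names and the toy transactions are hypothetical and are not taken from the paper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as sup(A union C) / sup(A)."""
    sup_a = support(antecedent, transactions)
    if sup_a == 0:
        return 0.0  # rule never applies; report 0 rather than divide by zero
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / sup_a


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667
```

A rule such as bread => milk is typically reported only when both values clear user-chosen thresholds (minimum support and minimum confidence).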
As new books are purchased every year, a library's collection grows continuously. Because circulation data is updated every day, the library's database also grows, while the minimum support changes with the required degree of correlation. To meet readers' needs promptly through book allocation, the mining results must be updated continually. This thesis therefore proposes the IM-Miner algorithm, which updates and mines maximal frequent item sets when the database and the minimum support change at the same time. IM-Miner makes full use of FP-Tree features and does not need to generate candidate maximal frequent item sets during mining. Because maximal frequent item sets are generated entirely within the FP-Tree, no scan of the transactional database is required. Experimental results show that IM-Miner is more efficient than other algorithms.
This paper examines class schedules, precautions, and association rule algorithms in order to build a more scientific class scheduling system. Association rules from data mining are used to handle scheduling conflicts. The method extracts efficient negative sequence rules from patterns. Using the local utility value and utility confidence, e-HUNSR formalises the problem of efficient negative sequence rules, quickly generates candidate rules together with a pruning strategy, designs a data structure to store the necessary information, and proposes an efficient way to compute the antecedent local utility value along with a simplified utility value calculation. Association rule mining is then applied to the scheduling problem. The system can conduct course queries and OSes, and it performs well in data mining. In experimental verification, the hybrid method with different scheduling condition criteria achieves 98.12% course selection satisfaction, rule satisfaction averages 94.98%, and the scheduling efficiency of the intelligent scheduling system is 91.91%. Adding fresh ideas and methods to the intelligent scheduling system benefits instructional resources and university scheduling. Smarter university timetable management allocates teaching resources and helps complete education and teaching plans.
ISBN: 9781467393232 (print)
Natural language processing is an important research area in artificial intelligence, and tacit knowledge expressed in natural language is a research hotspot. To provide a mathematical tool for mining tacit knowledge, we establish a concrete model of the 6-ary linguistic truth-valued concept lattice to deal with natural language and introduce an algorithm that mines tacit knowledge through structure consistency. Specifically, we use attributes to describe knowledge, propose the 6-ary linguistic truth-valued object tacit context and homotype context to characterize tacit knowledge, and study the necessary and sufficient conditions for forming tacit knowledge. We also give an algorithm for generating the linguistic truth-valued homotype context and an algorithm for constructing the 6-ary linguistic truth-valued concept lattice.
High utility itemsets can reveal combinations of items that have a high profit, expense, or importance. Mining high utility itemsets in a database with n items generally involves a huge search space of 2^n itemsets and heavy utility calculations for the explored itemsets. Previous algorithms using prefix tree structures perform two phases, namely candidate generation and testing. To avoid generating candidate itemsets, one-phase algorithms use list or hyper-link structures and have been shown to be superior to two-phase algorithms. However, a prefix tree is still an efficient structure for itemset mining problems, and algorithms that use prefix trees, such as FP-Growth, have shown excellent performance for mining frequent itemsets. This paper proposes Hamm, a High-performance algorithm for mining high utility itemsets. Hamm employs a novel TV (prefix Tree and utility Vector) structure and mines high utility itemsets in one phase without candidate generation. We also develop an efficient optimization that is incorporated into Hamm as a component. Using prefix trees and utility vectors, Hamm outperforms state-of-the-art algorithms on various databases in experiments. Experimental results also show that the proposed optimization remarkably reduces the search space and speeds up Hamm.
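To make the problem statement in this abstract concrete, here is a naive Python baseline that enumerates every non-empty itemset and computes its utility. It is deliberately the brute-force approach that algorithms like Hamm are designed to avoid, not Hamm or its TV structure. The data layout (each transaction as an item-to-quantity dict, plus a unit-profit table) and all names are assumptions for illustration.

```python
from itertools import combinations


def itemset_utility(itemset, database, profit):
    """Sum quantity * unit profit over transactions containing the whole itemset."""
    total = 0
    for t in database:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def brute_force_high_utility(database, profit, min_util):
    """Check all 2^n - 1 non-empty itemsets; feasible only for tiny n."""
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = itemset_utility(combo, database, profit)
            if u >= min_util:
                result[combo] = u
    return result


# Hypothetical toy data: item -> purchased quantity per transaction.
profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"a": 2, "c": 3}, {"b": 1, "c": 1}]
print(brute_force_high_utility(database, profit, min_util=9))
# {('a',): 15, ('a', 'b'): 9, ('a', 'c'): 13}
```

In this toy data the itemset (b) has utility 6 and falls below the threshold, while its superset (a, b) reaches 9. Utility is therefore not anti-monotone the way support is, which is why high utility mining cannot simply reuse frequency-based pruning.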
In real life, there are many attributed graphs, each of which contains attribute information as well as structural information. Over time, a group of attributed graphs forms an attributed graph sequence. As a generalization of single-attributed graph sequences, multi-attributed graph sequences are emerging rapidly and in large numbers. Mining the temporal associations hidden in a multi-attributed graph sequence is urgently needed by data owners. To meet this need and fill the research gap on mining such temporal associations, we first define temporal association rules for describing temporal associations in a multi-attributed graph sequence, and then propose a fast algorithm for mining these rules that is based on the anti-monotonicity of support. The proposed algorithm works in two steps, namely finding frequent temporal association rules and verifying the credibility of these rules. Equipped with two novel joining and pruning strategies, the proposed algorithm exhibits much higher efficiency, which is especially sought after in rule mining. Experiments on synthetic and real datasets show that the proposed algorithm is effective and more efficient than other existing algorithms.
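For reference, the anti-monotonicity property that this kind of pruning relies on can be written as follows. This is the standard itemset form of the property. The abstract does not give the paper's definition of support for temporal association rules in a graph sequence, so the symbols used here (a database $D$, patterns $X$ and $Y$, a minimum support threshold $\sigma$) are assumptions.

```latex
\[
X \subseteq Y \;\Longrightarrow\; \mathrm{sup}(Y) \le \mathrm{sup}(X),
\qquad
\mathrm{sup}(X) = \frac{\lvert \{\, T \in D : X \subseteq T \,\} \rvert}{\lvert D \rvert}.
\]
\[
\mathrm{sup}(X) < \sigma \;\Longrightarrow\; \mathrm{sup}(Y) < \sigma
\quad \text{for every } Y \supseteq X .
\]
```

The second line is what makes pruning safe: once a pattern is found to be infrequent, none of its extensions needs to be generated or counted.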