Closed frequent itemset mining has become an active research topic in recent years. In this paper, we present BCTCF, an algorithm based on the bit complementary tree (BCTree) that mines closed frequent itemsets efficiently. We first compress the database into bit vectors and define a novel structure, the BCTree, in which each node stores two complementary bit vectors and each path is assigned a prime value. Using the left-most bit of the bit vectors, we adopt a divide-and-conquer strategy that handles the itemsets separately. The uniqueness of the prime values then lets us identify the closed frequent itemsets directly, without first mining all frequent itemsets. Both the divide-and-conquer strategy and the prime-value uniqueness reduce the runtime. Experimental results show that BCTCF is effective and scalable.
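The bit-vector compression and the notion of a closed itemset can be illustrated with a minimal sketch. It does not implement the BCTree, the complementary vectors, or the prime-value labelling; the function names and the toy database are assumptions made for illustration. Each item maps to an integer whose bit i is set when transaction i contains the item, so the support of an itemset is the number of set bits in the AND of its vectors.

```python
def build_bit_vectors(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors, n_transactions):
    """Support = number of set bits in the AND of the items' bit vectors."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= vectors.get(item, 0)
    return bin(mask).count("1")


def is_closed(itemset, vectors, n_transactions):
    """An itemset is closed if adding any other item strictly lowers its support."""
    base = support(itemset, vectors, n_transactions)
    return all(
        support(set(itemset) | {other}, vectors, n_transactions) < base
        for other in vectors
        if other not in itemset
    )


# Hypothetical toy database with four transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b"}, {"c"}]
vectors = build_bit_vectors(transactions)
n = len(transactions)

print(support({"a", "b"}, vectors, n))    # 3
print(is_closed({"a"}, vectors, n))       # False: {a, b} has the same support
print(is_closed({"a", "b"}, vectors, n))  # True
```

This brute-force closedness check tests every one-item extension, which is exactly the kind of work the abstract says BCTCF avoids; it is shown only to make the definitions concrete.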
Because grid partitioning is arbitrary, the edge points of a cluster may fall into sparse grids. When a data stream is clustered with a grid-density based algorithm, these points are then treated as noise outside the clusters. We propose SDGCStream, a data stream clustering algorithm based on a spatial directed graph with core, which uses the spatial directed graph and the orthocenter of the sparse grids to handle the edge points of clusters. First, the algorithm defines a structure, SDGC (Spatial Directed Graph with Core), to store the summary statistics of the data stream. The vertices of SDGC are maintained as the stream arrives, and the edge information is generated when a clustering request comes in. Initial clustering results are obtained by clustering on SDGC. The algorithm then uses the orthocenter information and the border vertices of SDGC to decide whether the points of sparse grids adjacent to a cluster border belong to that cluster. Finally, a strategy based on the distance between clusters adjusts the clustering results after the cluster borders have been handled. Experimental results on synthetic and real datasets show that SDGCStream handles the edge points of clusters more effectively and scales with the length and dimensionality of the data stream.
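The edge-point problem described above can be reproduced with a small sketch of plain grid-density bookkeeping. This is not SDGCStream: it builds no spatial directed graph and computes no orthocenter. The names `cell_size` and `density_threshold` and the sample points are assumptions chosen for illustration.

```python
import math
from collections import Counter


def grid_cell(point, cell_size):
    """Return the integer grid coordinates of the cell containing the point."""
    return tuple(math.floor(x / cell_size) for x in point)


def grid_densities(points, cell_size):
    """Count how many points fall into each grid cell."""
    return Counter(grid_cell(p, cell_size) for p in points)


def split_dense_sparse(densities, density_threshold):
    """Cells with at least density_threshold points are dense; the rest are sparse."""
    dense = {cell for cell, count in densities.items() if count >= density_threshold}
    sparse = set(densities) - dense
    return dense, sparse


# Hypothetical 2-D points: the last one lies close to the cluster but
# just across a cell boundary.
points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.9, 0.8), (1.05, 0.9)]
densities = grid_densities(points, cell_size=1.0)
dense, sparse = split_dense_sparse(densities, density_threshold=3)

print(dense)   # {(0, 0)}
print(sparse)  # {(1, 0)}
```

The point (1.05, 0.9) sits next to (0.9, 0.8) yet lands alone in a sparse cell, so a purely grid-based pass would discard it as noise. That is the case the abstract's border-handling step is meant to recover.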
Most tree-based algorithms for mining frequent patterns on uncertain data streams store a large number of tree nodes and record the corresponding stream information, which leads to heavy storage costs. In this paper, we propose CTBVT, an algorithm based on a compressed tree and a bit vector table for mining frequent patterns on uncertain data streams. The uncertain data stream is first loaded into a probability-vector table, in which the items are represented by transactions and, unlike other bit vector tables, the occurrence probabilities of the items are stored. When the window slides, all columns of the probability-vector table are shifted left by m bits at the same time, where m is the number of transactions in the window. We also propose a compressed tree in which items with different probabilities are stored in the same tree node, which significantly reduces the number of tree nodes. The items and their probabilities in a tree node, which correspond to entries in the bit vector table, are then converted into a binary bit vector, and the number of 1s in that vector is the frequency of the tree node. Afterwards, each leaf node of the tree is linked to an array that stores all item combinations on the path together with their expected support, and the leaf nodes are stored in the LeafList. Finally, we scan the arrays linked to the leaf nodes in the LeafList and compare the stored expected support with a minimum support threshold minSup to obtain all frequent itemsets, which reduces mining time dramatically. Experimental results show that CTBVT is efficient and scalable.
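The comparison against minSup relies on expected support. A minimal sketch follows, using the common definition in uncertain-data mining: the expected support of an itemset is the sum, over the transactions in the window, of the product of its items' occurrence probabilities. It assumes item occurrences are independent, and it does not model the compressed tree, the probability-vector table, or the LeafList. The window contents and the threshold value are hypothetical.

```python
from math import prod


def expected_support(itemset, window):
    """Sum over transactions of the product of the items' occurrence probabilities.

    Each transaction is a dict {item: probability}; a missing item has probability 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in window)


def is_frequent(itemset, window, min_sup):
    """An itemset is frequent when its expected support reaches min_sup."""
    return expected_support(itemset, window) >= min_sup


# Hypothetical window of three uncertain transactions.
window = [
    {"a": 0.5, "b": 0.4},
    {"a": 1.0},
    {"a": 0.5, "b": 1.0},
]

es_ab = expected_support({"a", "b"}, window)  # 0.5*0.4 + 0 + 0.5*1.0
es_a = expected_support({"a"}, window)        # 0.5 + 1.0 + 0.5
print(round(es_ab, 6), es_a)
print(is_frequent({"a", "b"}, window, min_sup=0.6))
```

Recomputing the sum for every candidate is the naive approach; the abstract describes storing the expected support of each item combination in arrays attached to leaf nodes so that the final pass is a simple scan.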
Traditionally, analysis of dynamic network has been focused only on a single snapshot or integrated network obtained over a period of time. However, the temporal feature in dynamic network has been ignored. In this pa...
ISBN: (print) 9781538684986; 9781538684979
The form of books keeps developing as carrier media are upgraded, and the emergence of electronic books has greatly shaken traditional paper books. In recent years, with the combination of artificial intelligence, virtual reality, high-speed networks and digital reading, the concept of "VR" has been applied to more and more industries. The introduction of the VReading multi-sensory reading platform will bring new ideas to the digital reading industry.
In this letter, new methods of constructing quaternary sequence pairs are presented based on binary sequence pairs with two-level autocorrelation, almost perfect binary sequence pairs and cyclic shift sequences. The q...
ISBN: (print) 9781510803084
The algorithm in this paper inserts pseudo items, converted from item intervals, to obtain an equal extended sequence database; it defines item-interval constraints, which are related to item weight, to prune the mined patterns. In this way, the algorithm avoids mining patterns that users are not interested in and shortens the running time. It uses histogram statistics to produce a standardized description of the item intervals of the mined patterns, so that the mined sequences include item-interval information that is valuable for user decisions.
To maximize the influence of commodity profits on e-commerce platforms, this paper designs an improved K-shell algorithm that selects more influential seed node sets. The new algorithm increases the number of active nodes by setting node threshold and edge weight attributes. To obtain more commodity profit, a strategy, IRDSN (Strategy for Improving Repeat Degree of Seed Nodes), is proposed to select initial seed nodes and improve the repeat degree of seed nodes. Profit maximization based on the linear threshold model is realized by setting different propagation modes. The improved algorithm and the IRDSN strategy are analysed and verified on a real data set and an e-commerce platform. The results show that the algorithm effectively improves commodity profit.
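For context, the standard (unmodified) K-shell decomposition that the paper starts from can be sketched as below. This is the textbook peeling procedure, not the improved algorithm: it ignores node thresholds, edge weights, and the IRDSN strategy. The function name and the example graph are assumptions for illustration.

```python
def k_shell(adjacency):
    """Return {node: shell index} by repeatedly peeling the lowest-degree nodes.

    adjacency maps each node to the set of its neighbours (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # The shell index never decreases as peeling proceeds.
        k = max(k, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via a duplicate queue entry
            remaining.remove(v)
            shell[v] = k
            for u in adjacency[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


# Hypothetical graph: a triangle a-b-c with a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
shells = k_shell(graph)
print(sorted(shells.items()))  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
```

Nodes in higher shells sit deeper in the network core, which is why K-shell values are commonly used as a first filter for influential seed candidates.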
This paper discusses the simulated computation methods of remote sensing information model, and tries to put forward a more available solution. It presents our research works on the description and simulation methods ...
As research on influence maximization algorithms has progressed, many researchers have found that existing algorithms suffer from overlapping influence among seed nodes. To address this problem, this paper proposes IMCS, an algorithm based on community structure. First, communities are divided around central nodes, and the quality of the division is ensured by defining community fitness and node contribution. Then, based on the community division results, the node with the largest degree is selected as the seed node. Since most of the nodes activated by seed nodes of different communities also belong to different communities, this method solves the problem of overlapping influence to a certain extent. Experimental results on real networks, cooperative networks and artificial networks verify the effectiveness of IMCS and show that it outperforms the IEIR and Degree algorithms in most networks under the IC model.
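The seed-selection step and the IC model used for evaluation can be sketched as follows. The sketch assumes the communities are already given and that one seed, the highest-degree node, is taken from each community; the abstract does not state the number of seeds per community, and the community division by fitness and node contribution is not reproduced here. All names and the example graph are hypothetical.

```python
import random


def select_seeds(adjacency, communities):
    """Pick the highest-degree node in each community (ties broken by node order)."""
    return [max(sorted(c), key=lambda v: len(adjacency[v])) for c in communities]


def independent_cascade(adjacency, seeds, p, rng):
    """Simulate one run of the independent cascade (IC) model.

    Each newly activated node gets a single chance to activate each inactive
    neighbour, succeeding with probability p.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in adjacency[v]:
                if u not in active and rng.random() < p:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active


# Hypothetical network with two disconnected star-shaped communities.
graph = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},
    "x": {"y", "z"}, "y": {"x"}, "z": {"x"},
}
communities = [{"a", "b", "c"}, {"x", "y", "z"}]

seeds = select_seeds(graph, communities)
print(seeds)  # ['a', 'x']

rng = random.Random(0)
reached = independent_cascade(graph, seeds, p=1.0, rng=rng)
print(len(reached))  # 6
```

Because the two seeds come from different communities, the sets of nodes they activate do not overlap in this example, which is the effect the abstract attributes to community-based selection.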