General weighted sequential pattern mining algorithms ignore, or make poor use of, the time and time-interval information of data elements. In addition, some algorithms need to scan the database many times or build...
To extend the lifetime of a ZigBee network, a definition of a node's boundary is proposed. First, all the information about each node's boundary is stored when the ZigBee network is built. Then, th...
Current clustering algorithms for evolving uncertain data streams are sensitive to user-specified thresholds and unstable when processing noise. This paper presents DUStream, a density-based algorithm for dis...
The most widely used collaborative recommendation algorithms are vulnerable to shilling attacks. To address this, we propose a robust recommendation algorithm based on user rating matrix block and modified L...
High-dimensional data clustering is an important issue in data mining. First, the records in the dataset are mapped to the vertices of a hypergraph, and the hyperedges of the hypergraph are composed of the vertices which hav...
Many previous incremental methods for data streams delete old patterns and add new patterns directly, which may delete useful patterns too early. Both different real data and the data occurri...
To process software bug feature sequences, this paper presents a gap-constrained sequential pattern mining algorithm, MEMIGCSP. The length of the interval between items is limited in the origina...
Many researchers have focused on finding frequent patterns in static networks. These frequent patterns are usually defined as frequent existence patterns that satisfy a given support threshold in the network. However, the...
Traditional clustering algorithms often fail to detect meaningful clusters in high-dimensional data spaces. To address this shortcoming, we propose GDRH-Stream, a clustering method for high-dimensional data streams based on attribute relativity and grid density, which consists of an online component and an offline component. First, the algorithm filters out redundant attributes by computing the relative entropy. Then we define a weighted attribute relativity measure, estimate the relativity of the non-redundant attributes, and form the attribute triple. Finally, the most interesting subspaces are searched for using the attribute triple. In the online component, GDRH-Stream maps each data object into a grid and updates the characteristic vector of the grid. In the offline component, when a clustering request arrives, the most interesting subspaces are generated by attribute relativity. The original grid structure is then projected onto the subspace to form a new grid structure, and clustering is performed on the new grid structure using a density-grid-based approach. Experimental results show that the GDRH-Stream algorithm has better quality and scalability.
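The online grid mapping and the offline projection step described above can be illustrated with a minimal sketch. It is not the GDRH-Stream implementation: the class name `GridSummary`, the fixed cell width, and the contents of the characteristic vector (a point count and the last arrival time) are assumptions made for illustration, and the relative-entropy filtering and attribute-triple search are not shown.

```python
from collections import defaultdict


def cell_of(point, width):
    """Map a point to the index tuple of the grid cell that contains it."""
    return tuple(int(x // width) for x in point)


class GridSummary:
    """Hypothetical online summary: one characteristic vector per grid cell."""

    def __init__(self, width):
        self.width = width
        # Assumed characteristic vector: point count and last arrival time.
        self.cells = defaultdict(lambda: {"count": 0, "last_seen": None})

    def insert(self, point, timestamp):
        """Online step: map the data object to its cell and update the vector."""
        cv = self.cells[cell_of(point, self.width)]
        cv["count"] += 1
        cv["last_seen"] = timestamp

    def project(self, dims):
        """Offline step: project the grid onto a subspace given by `dims`.

        Cells that coincide on the selected dimensions are merged and
        their counts are summed, giving a new, lower-dimensional grid.
        """
        projected = defaultdict(int)
        for cell, cv in self.cells.items():
            key = tuple(cell[d] for d in dims)
            projected[key] += cv["count"]
        return dict(projected)


grid = GridSummary(width=1.0)
grid.insert((0.2, 0.7, 5.1), timestamp=1)
grid.insert((0.9, 0.1, 9.3), timestamp=2)
grid.insert((3.5, 0.2, 5.5), timestamp=3)
subspace_grid = grid.project([0, 1])
```

In this sketch the first two points fall into different cells of the full three-dimensional grid but into the same cell once the third attribute is dropped, which is the effect the projection step relies on when clustering dense cells in a subspace.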
In this paper, we present TKBT (top-k closed frequent mining based on TKTT), an algorithm for efficiently mining top-k closed frequent itemsets in data streams. First, according to the continuous and changing characteristics of data stream data in a sliding window, a novel structure, BWT (bit-vector window table), is defined. In the horizontal direction of BWT, we use bit vectors to represent the transactions and record the count of each item in the oldest window, the newest window, and all windows at the current time, which reduces the time needed to compute item counts when a new window slides in. In the vertical direction of BWT, we partition the table by window, so that only the oldest window's information needs to be replaced with that of the newest window when a new window arrives. TKTT (top-k temporary table) is constructed from BWT, and the itemsets in TKTT are ranked in descending order of count. TKBT obtains the top-k closed frequent itemsets by connecting the candidates in TKTT using a top-down strategy. The number of candidates is reduced by replacing subsets with their closed itemsets, and the resulting fewer connection operations lead to a shorter runtime. Experimental results show that TKBT is very effective and scalable.
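The bit-vector idea behind BWT can be sketched as follows. This is a simplified illustration under stated assumptions, not the structure from the paper: the class name `BitVectorWindow` and its methods are hypothetical, each window is stored as a dictionary from item to a Python integer used as a bit vector (bit i is set if transaction i of that window contains the item), and only a running total per item is kept rather than separate oldest-window and newest-window counts. TKTT construction and the closed-itemset check are not shown.

```python
from collections import deque


def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


class BitVectorWindow:
    """Hypothetical sliding window of per-item bit vectors."""

    def __init__(self, max_windows):
        self.max_windows = max_windows
        self.windows = deque()   # one {item: bit vector} dict per window
        self.totals = {}         # item -> count over all current windows

    def slide(self, transactions):
        """Add the newest window; drop the oldest one if the table is full."""
        vectors = {}
        for pos, items in enumerate(transactions):
            for item in set(items):
                vectors[item] = vectors.get(item, 0) | (1 << pos)
        if len(self.windows) == self.max_windows:
            # Only the oldest window's contribution has to be removed.
            oldest = self.windows.popleft()
            for item, bits in oldest.items():
                self.totals[item] -= popcount(bits)
                if self.totals[item] == 0:
                    del self.totals[item]
        self.windows.append(vectors)
        for item, bits in vectors.items():
            self.totals[item] = self.totals.get(item, 0) + popcount(bits)

    def support(self, itemset):
        """Count transactions in the current windows containing every item."""
        items = list(itemset)
        if not items:
            return 0
        total = 0
        for vectors in self.windows:
            common = vectors.get(items[0], 0)
            for item in items[1:]:
                common &= vectors.get(item, 0)
            total += popcount(common)
        return total


table = BitVectorWindow(max_windows=2)
table.slide([["a", "b"], ["a"], ["b", "c"]])
table.slide([["a", "b", "c"], ["c"]])
ranked = sorted(table.totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because item counts are updated incrementally, sliding in a new window touches only the oldest and newest windows, and the support of a candidate itemset reduces to a bitwise AND followed by a bit count per window.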