Identifying influential nodes is crucial for understanding and improving the stability and robustness of complex software networks. This paper presents a new method based on LeaderRank information to identify top-k inf...
Certain characteristics of software are often hidden in its structure and can only be discovered when the software is executed dynamically. Mining important patterns from the dynamic call graph of software therefore plays an imp...
The problems of AGMA (Automatic Graph Mining Algorithm) are addressed and a novel algorithm, namely CRMA (Clustering Re-clustering Merging Algorithm), is proposed, which can realize more reasonable community division for...
Point-of-interest (POI) recommendation has become an important research topic for location-based social networks (LBSNs), since it helps modern citizens explore new locations in unvisited cities effectively according to their preferences. However, current POI recommendation methods lack deep mining of the features of all time slots and of their effects on recommendation. To this end, in this paper we propose a POI recommendation method (called UPT) that combines time-slot features, user-based collaborative filtering, and spatial influence. First, we extract a time-interval feature and a time-slot-based popularity feature from historical check-in datasets on LBSNs using a probabilistic statistical analysis method. Then, we devise a POI recommendation method based on the proposed temporal features to achieve better performance. In UPT, user-based collaborative filtering and a smoothing technique are used by adding the influence of each time slot, and the overall popularity of a location is combined with each time-slot feature. Our experimental results on the Foursquare and Gowalla datasets show that UPT outperforms baseline POI recommendation methods in precision and recall.
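The abstract above does not give formulas, but the idea of combining a per-time-slot popularity feature with a user-based CF score can be sketched as follows. This is a minimal illustration under my own assumptions (3-hour slots, a simple linear blend with weight `alpha`, and hypothetical function names), not the UPT method itself.

```python
from collections import Counter, defaultdict

SLOT_HOURS = 3  # assumption: 8 equal time slots per day

def slot_of(hour):
    return hour // SLOT_HOURS

def slot_popularity(checkins):
    """P(poi | slot): per-slot popularity of each POI, normalised within the slot."""
    counts = defaultdict(Counter)          # slot -> Counter of POIs
    for _user, poi, hour in checkins:
        counts[slot_of(hour)][poi] += 1
    pop = {}
    for slot, ctr in counts.items():
        total = sum(ctr.values())
        pop[slot] = {poi: c / total for poi, c in ctr.items()}
    return pop

def recommend(slot, cf_scores, pop, alpha=0.5):
    """Blend a user-based CF score with time-slot popularity (assumed linear mix)."""
    slot_pop = pop.get(slot, {})
    candidates = set(cf_scores) | set(slot_pop)
    scored = {p: alpha * cf_scores.get(p, 0.0) + (1 - alpha) * slot_pop.get(p, 0.0)
              for p in candidates}
    return sorted(scored, key=scored.get, reverse=True)

if __name__ == "__main__":
    checkins = [("u1", "cafe", 9), ("u2", "cafe", 10), ("u2", "bar", 22)]
    pop = slot_popularity(checkins)
    print(recommend(slot_of(10), {"bar": 0.3, "museum": 0.6}, pop))
```

The spatial-influence term mentioned in the abstract is omitted here; it would enter the blend as a third score per candidate POI.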
Siamese network-based trackers have achieved competitive performance in single-target tracking because of their excellent tracking speed and accuracy. When faced with target deformations, most Si...
Traditional clustering algorithms often fail to detect meaningful clusters in high-dimensional data spaces. To address this shortcoming, we propose GDRH-Stream, a clustering method for high-dimensional data streams based on attribute relativity and grid density, which consists of an online component and an offline component. First, the algorithm filters out redundant attributes by computing the relative entropy. Then we define a weighted attribute relativity measure, estimate the relativity of the non-redundant attributes, and form the attribute triple. Finally, the best interesting subspaces are found by searching over the attribute triple. In the online component, GDRH-Stream maps each data object into a grid and updates the characteristic vector of that grid. In the offline component, when a clustering request arrives, the best interesting subspaces are generated by attribute relativity, the original grid structure is projected onto the subspace to form a new grid structure, and clustering is performed on the new grid structure using a density-grid-based approach. Experimental results show that the GDRH-Stream algorithm achieves better clustering quality and scalability.
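To make the online component concrete, here is a minimal sketch, under my own assumptions, of the step the abstract describes: each arriving object is mapped to a grid cell and that cell's characteristic vector (here just a decayed density and a timestamp) is updated. The cell width and decay factor are illustrative parameters, not values from the paper.

```python
from collections import defaultdict

CELL = 0.1      # assumed grid cell width
DECAY = 0.998   # assumed per-tick density decay

class GridCell:
    def __init__(self):
        self.density = 0.0
        self.last_update = 0

    def add(self, t):
        # Decay the old density to the current time, then count the new point.
        self.density = self.density * (DECAY ** (t - self.last_update)) + 1.0
        self.last_update = t

def grid_key(point, cell=CELL):
    """Discretise each coordinate by the cell width to get the grid index."""
    return tuple(int(x / cell) for x in point)

grids = defaultdict(GridCell)

def receive(point, t):
    grids[grid_key(point)].add(t)

if __name__ == "__main__":
    stream = [([0.12, 0.33], 1), ([0.14, 0.31], 2), ([0.80, 0.90], 3)]
    for p, t in stream:
        receive(p, t)
    for key, cell in grids.items():
        print(key, round(cell.density, 3))
```

The offline component (subspace projection and density-grid clustering) would operate on the `grids` summary rather than on the raw stream.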
In recent years, closed frequent itemset mining has become a hot topic. In this paper, we present BCTCF, an algorithm based on the bit complementary tree (BCTree), to mine closed frequent itemsets efficiently. First, we adopt bit vectors to compress the database and define a novel structure, the BCTree, in which each node stores two complementary bit vectors and each path is assigned a prime value. Based on the left-most bit in the bit vectors, we adopt a divide-and-conquer strategy that handles the itemsets separately; then, according to the uniqueness of the prime values, we can obtain the closed frequent itemsets quickly without first mining all the frequent itemsets. Both the divide-and-conquer strategy and the prime-uniqueness property decrease the runtime. Experimental results show that BCTCF is very effective and scalable.
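The bit-vector compression the abstract builds on can be illustrated with a small sketch; this is not the BCTree structure itself, and the toy database and function names are my own. Each item is encoded as an integer whose bits mark the transactions containing it, so the support of an itemset is the popcount of the AND of its items' vectors.

```python
def item_bitvectors(transactions):
    """item -> integer whose i-th bit is set iff transaction i contains the item."""
    vectors = {}
    for i, tx in enumerate(transactions):
        for item in tx:
            vectors[item] = vectors.get(item, 0) | (1 << i)
    return vectors

def support(itemset, vectors):
    """Support of an itemset via bitwise intersection of its item vectors."""
    if not itemset:
        return 0
    acc = None
    for item in itemset:
        v = vectors.get(item, 0)
        acc = v if acc is None else acc & v
    return bin(acc).count("1")

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
    vecs = item_bitvectors(db)
    print(support({"a", "b"}, vecs))   # -> 2
    print(support({"a", "c"}, vecs))   # -> 2
```

The complementary-vector and prime-per-path devices of the BCTree sit on top of this encoding and are not reproduced here.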
Due to the introduction of attribute weights, most existing dissimilarity measures cannot accurately reflect the difference between two heterogeneous objects, which decreases clustering quality. In this paper, we present HIDK-means, an approach for clustering heterogeneous data based on information dissimilarity. First, the algorithm defines the heterogeneous information dissimilarity between two heterogeneous objects based on Kolmogorov information theory and approximates it using the universal probability of an object. Then, in the clustering process, the algorithm selects the initial cluster centers by maximizing the sum of dissimilarities. After that, each remaining object is assigned to the cluster center with the smallest dissimilarity to it, and the criterion function is calculated. Cluster centers are updated iteratively until the criterion function converges or the number of iterations reaches a pre-set threshold. The experimental results show that the proposed HIDK-means algorithm is effective in clustering heterogeneous objects and scalable to large datasets.
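A schematic sketch of the clustering loop the abstract outlines follows: initial centers are chosen greedily to maximize the summed dissimilarity, then objects are assigned to the nearest center. The dissimilarity below is a simple mixed numeric/categorical placeholder, not the Kolmogorov-based measure of the paper, and the iterative center-update step is omitted for brevity.

```python
def dissimilarity(x, y):
    """Placeholder: mismatch count for categorical parts plus |diff| for numeric parts."""
    d = 0.0
    for a, b in zip(x, y):
        if isinstance(a, str) or isinstance(b, str):
            d += 0.0 if a == b else 1.0
        else:
            d += abs(a - b)
    return d

def pick_initial_centres(objects, k):
    """Greedy: each new center maximizes its summed dissimilarity to the chosen ones."""
    centres = [objects[0]]
    while len(centres) < k:
        best = max((o for o in objects if o not in centres),
                   key=lambda o: sum(dissimilarity(o, c) for c in centres))
        centres.append(best)
    return centres

def assign(objects, centres):
    """One assignment pass: each object joins the center with the smallest dissimilarity."""
    clusters = [[] for _ in centres]
    for o in objects:
        idx = min(range(len(centres)), key=lambda i: dissimilarity(o, centres[i]))
        clusters[idx].append(o)
    return clusters

if __name__ == "__main__":
    data = [(1.0, "red"), (1.2, "red"), (5.0, "blue"), (5.3, "blue")]
    centres = pick_initial_centres(data, 2)
    print(assign(data, centres))
```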
In this paper, we present TKBT (top-k closed frequent itemset mining based on TKTT), an algorithm to mine top-k closed frequent itemsets in data streams efficiently. First, according to the consecutive and changeable characteristics of the data in the sliding window of a data stream, a novel structure, the BWT (bit-vector window table), is defined. In the horizontal direction of the BWT, we use bit vectors to express the transactions and record the counts of items in the oldest window, the newest window, and all windows at the current time, which decreases the time needed to compute item counts when a new window slides in. In the vertical direction of the BWT, we partition by window, so that when a new window arrives we only need to replace the oldest window's information with that of the corresponding newest window. The TKTT (top-k temporary table) is constructed from the BWT, and the itemsets in the TKTT are ranked in descending order of count. TKBT obtains the top-k closed frequent itemsets by connecting the candidates in the TKTT using a top-down strategy. The number of candidates is reduced by letting a closed itemset displace its subsets, and the reduced number of connection operations contributes to a shorter runtime. Experimental results show that TKBT is very effective and scalable.
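The window-replacement idea behind the BWT can be sketched in a toy form, with my own simplifications: per item, one bit vector is kept per window, and when a new window slides in only the oldest window's vectors are dropped instead of rescanning all transactions. The class and method names below are illustrative, not the paper's.

```python
from collections import deque

class SlidingBitTable:
    def __init__(self, num_windows):
        self.windows = deque(maxlen=num_windows)   # each entry: {item: bitvector}

    def push_window(self, transactions):
        """Add the newest window; deque(maxlen=...) evicts the oldest automatically."""
        vecs = {}
        for i, tx in enumerate(transactions):
            for item in tx:
                vecs[item] = vecs.get(item, 0) | (1 << i)
        self.windows.append(vecs)

    def count(self, item):
        """Item count over all windows currently in the table."""
        return sum(bin(w.get(item, 0)).count("1") for w in self.windows)

if __name__ == "__main__":
    table = SlidingBitTable(num_windows=2)
    table.push_window([{"a", "b"}, {"a"}])
    table.push_window([{"b"}, {"a", "b"}])
    print(table.count("a"), table.count("b"))                     # -> 3 3
    table.push_window([{"c"}])                                    # oldest window evicted
    print(table.count("a"), table.count("b"), table.count("c"))   # -> 1 2 1
```

Building the TKTT and connecting candidates top-down would be layered on these per-window counts and is not shown.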
Due to the randomness of the grid partition, the edge points of clusters may be assigned to sparse grids. These points become noise outside the clusters when a data stream is clustered by a grid-density-based algorithm. We propose SDGCStream, a data stream clustering algorithm based on a spatial directed graph with core, which uses the spatial directed graph and the orthocenter of the sparse grids to handle the edge points of clusters. First, the algorithm defines a structure, SDGC (Spatial Directed Graph with Core), to store the summary statistics of the data stream; the vertices of the SDGC are maintained as the stream arrives. When a clustering request arrives, the edge information is generated. The initial clustering results are obtained by clustering on the SDGC; then, for the points in sparse grids adjacent to the border of a cluster, we judge whether they belong to the cluster according to the orthocenter information and the border vertices of the SDGC. Finally, a strategy based on the distance between clusters is presented to adjust the clustering results after the cluster borders have been handled. Experimental results on synthetic and real datasets show that SDGCStream handles the edge points of clusters more effectively and scales well as the length and dimensionality of the data stream increase.
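A loose sketch of the edge-point handling the abstract motivates is given below: points falling in a sparse grid adjacent to a cluster's border grid are re-examined instead of being discarded as noise. Here the "orthocenter information" is approximated by a crude centroid-distance test of my own; the paper's SDGC-based rule is more involved, and the grid size and reach threshold are illustrative.

```python
import math

CELL = 1.0  # assumed grid cell width

def grid_key(p):
    return tuple(int(c // CELL) for c in p)

def neighbours(key):
    (x, y) = key
    return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)} - {key}

def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def absorb_edge_points(sparse_grid_points, cluster_grids, reach=1.5):
    """Attach a sparse grid's points to a cluster if their centroid lies close enough
    to the centre of an adjacent dense grid belonging to that cluster."""
    key = grid_key(sparse_grid_points[0])
    c = centroid(sparse_grid_points)
    for nb in neighbours(key) & set(cluster_grids):
        centre = tuple((k + 0.5) * CELL for k in nb)
        if math.dist(c, centre) <= reach:
            return cluster_grids[nb]        # cluster id the points join
    return None                              # points remain noise

if __name__ == "__main__":
    dense = {(2, 2): "c1", (2, 3): "c1"}     # dense grids already clustered
    edge_points = [(1.9, 1.95), (1.8, 1.9)]  # points lying in sparse grid (1, 1)
    print(absorb_edge_points(edge_points, dense))
```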