In high-dimensional data spaces, the data is inherently sparse and clusters tend to exist in different subspaces, which makes traditional clustering methods unsuitable. In this paper, we present SCFES, a subspace clustering algorithm based on finding effective spaces. First, we define the effective dimension and, by calculating relative entropy, remove the redundant dimensions that degrade clustering accuracy. Second, according to the data distribution in the effective dimensions, we obtain the effective intervals by merging adjacent intervals; an effective space is composed of effective intervals. Third, we extend the density estimator based on an undirected acyclic connected graph with weights in order to estimate the expectation that clusters exist in a space, and we combine it with the monotonicity of the clustering criterion from the CLIQUE algorithm to prune candidates, which yields the effective spaces. Finally, we store all the effective spaces in a sibling tree and use the density-based DBSCAN algorithm to generate maximal subspace clusters in some effective spaces. Experimental results show that SCFES effectively finds arbitrarily shaped and positioned clusters in different subspaces, and that it achieves better clustering quality and scalability.
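The abstract does not say how relative entropy is computed or which reference distribution is used, so the following is only a minimal sketch under stated assumptions: each dimension is binned into an equal-width histogram, its relative entropy (KL divergence) is measured against a uniform histogram, and dimensions above a cutoff are kept as "effective". The function names, the bin count, and the threshold are hypothetical and are not taken from SCFES.

```python
import math

def relative_entropy_to_uniform(values, bins=10):
    """KL divergence D(p || u) of the histogram of `values` from a uniform histogram."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        counts[0] = len(values)  # degenerate dimension: every value falls in one bin
    else:
        width = (hi - lo) / bins
        for v in values:
            # Clamp the maximum value into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    uniform = 1.0 / bins
    # Empty bins contribute nothing to the sum.
    return sum((c / n) * math.log((c / n) / uniform) for c in counts if c > 0)

def effective_dimensions(rows, threshold=0.1, bins=10):
    """Indices of columns whose relative entropy exceeds `threshold` (assumed cutoff)."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns)
            if relative_entropy_to_uniform(col, bins) > threshold]
```

Under this assumption, a dimension whose values are spread evenly scores close to zero and is treated as redundant, while a dimension with concentrated values scores higher and is retained.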
Many previous data stream algorithms handle only a single stream and can process only single items. Data stream algorithms are usually extended from sequential pattern algorithms for static databases; they...
The existing collaborative recommendation algorithms have low robustness against shilling attacks. With this problem in mind, in this paper we propose a robust collaborative recommendation algorithm based on k-distance and the Tukey M-estimator. First, we propose a k-distance-based method to compute the user suspicion degree (USD). A reliable neighbor model can be constructed by incorporating the user suspicion degree into the user neighbor model. The influence of attack profiles on the recommendation results is reduced by adjusting the similarities among users. Then, the Tukey M-estimator is introduced to construct a robust matrix factorization model, which realizes robust estimation of the user feature matrix and the item feature matrix and reduces the influence of attack profiles on the item feature matrix. Finally, a robust collaborative recommendation algorithm is devised by combining the reliable neighbor model and the robust matrix factorization model. Experimental results show that the proposed algorithm outperforms the existing methods in terms of both recommendation accuracy and robustness.
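To illustrate why an M-estimator limits the effect of attack profiles, here is a minimal sketch of the standard Tukey biweight loss and its reweighting function. The abstract does not give the paper's exact formulation, so the function names and the default tuning constant `c = 4.685` (a common textbook choice) are assumptions rather than the authors' settings.

```python
def tukey_rho(r, c=4.685):
    """Tukey biweight loss: roughly quadratic near zero, constant for |r| > c."""
    if abs(r) <= c:
        u = 1.0 - (r / c) ** 2
        return (c * c / 6.0) * (1.0 - u ** 3)
    return c * c / 6.0

def tukey_weight(r, c=4.685):
    """Reweighting factor: 1 at r = 0, falling to 0 for residuals beyond c."""
    if abs(r) <= c:
        return (1.0 - (r / c) ** 2) ** 2
    return 0.0
```

Because the loss is flat beyond `c`, a rating with a very large residual contributes no gradient, which is the sense in which extreme ratings stop pulling on the estimated feature matrices in a factorization trained with this loss.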
General weighted sequential pattern mining algorithms ignore or make poor use of the time and time-interval information of data elements. In addition, some algorithms need to scan the database many times or build...
The existing grid-based uncertain data stream clustering algorithms are fast but have low accuracy and are sensitive to user-specified thresholds. To solve these problems, a density grid-based uncertain data strea...
Current clustering algorithms for evolving uncertain data streams are sensitive to user-specified thresholds and unstable in handling noise. In this paper we present DUStream, a density-based algorithm for dis...
To extend the lifetime of ZigBee networks, a definition of a node's boundary is proposed. First, all the information about each node's boundary is stored when the ZigBee network is built. Then, th...
Most existing vulnerability taxonomies classify vulnerabilities by their idiosyncrasies, weaknesses, flaws, faults, and so on. The disadvantage of these taxonomies is that the classification standard is not unified and the classifications overlap. To solve this problem, we propose VUNClique, an algorithm for virtual grid-based clustering of uncertain data on a vulnerability database. First, the vulnerability database is transformed into an uncertain dataset using an existing vulnerability database pretreatment model. Second, we define a virtual grid structure in which the cells are divided into real cells and virtual cells; only the real cells, which contain data objects, are stored in memory. The probability attribute value similarity is defined to handle the similarity of non-numeric attributes: it measures similarity by comparing the number of non-numeric attributes that have the same value between tuples. We provide a secondary partition algorithm to improve the similarity between the tuples in the same cell; the algorithm merges a tuple into the high-density neighbor cell that has the maximum probability attribute value similarity with it. Then, a novel cluster identification algorithm is provided to cluster the high-density real cells. It can identify clusters of arbitrary shapes by traversing the real cells twice. Finally, we conduct performance experiments on the uncertain dataset transformed from the NVD vulnerability database. The experimental results show that VUNClique can find clusters of arbitrary shapes and greatly improves the efficiency of clustering.
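The counting idea behind the similarity measure for non-numeric attributes can be sketched as follows. The abstract only says that the number of attributes with the same value is compared, so the normalization by attribute count, the function name, and the sample attribute values are assumptions, and the probabilistic part of the measure is not modeled here.

```python
def matching_attribute_similarity(a, b):
    """Fraction of positions at which two tuples hold the same non-numeric value."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same number of attributes")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    # Dividing by the attribute count is an assumed normalization.
    return matches / len(a)

# Hypothetical vulnerability records: (access vector, severity, flaw type)
first = ("remote", "high", "overflow")
second = ("remote", "low", "overflow")
similarity = matching_attribute_similarity(first, second)  # two of three attributes match
```

A secondary partition step in this spirit would compare a tuple against the tuples of each high-density neighbor cell and move it to the cell with the largest score.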
The most widely used collaborative recommendation algorithms are vulnerable to shilling attacks. To address this, in this paper we propose a robust recommendation algorithm based on user rating matrix block and modified L...
Many previous incremental methods for data streams delete old patterns and add new patterns directly, which may delete useful patterns too early. Both different real data and the data occurri...