To maximize the influence of commodity profits on e-commerce platforms, this paper designs an improved K-shell algorithm that selects more influential seed node sets. The new algorithm increases the number of active nodes by setting node threshold and edge weight attributes. To obtain more commodity profit, a strategy called IRDSN (Strategy for Improving Repeat Degree of Seed Nodes) is proposed to select the initial seed nodes and improve their repeat degree. Profit maximization based on the linear threshold model is realized by setting different propagation modes. The improved algorithm and the IRDSN strategy are analysed and verified on a real data set and an e-commerce platform. The results show that the algorithm effectively improves commodity profit.
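As background for the abstract above, here is a minimal sketch of the standard (unimproved) K-shell decomposition that such seed selection builds on. The paper's own modifications (node thresholds, edge weights, the IRDSN strategy) are not described in enough detail here to reproduce; the function name `k_shell` and the edge-list input are assumptions made for illustration.

```python
from collections import defaultdict

def k_shell(edges):
    """Return {node: shell index} for an undirected graph given as edge pairs.

    Plain K-shell peeling only; this is not the paper's improved algorithm.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next shell index is the smallest remaining degree (never decreasing).
        k = max(k, min(degree[n] for n in remaining))
        queue = [n for n in remaining if degree[n] <= k]
        while queue:
            n = queue.pop()
            if n not in remaining:
                continue  # already peeled via another queue entry
            remaining.discard(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        queue.append(m)
    return shell
```

Nodes with the highest shell index sit in the network core and are the usual starting candidates for seed selection.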
Recently, grid-density based clustering has become one of the major topics among clustering approaches. It has special advantages over other clustering algorithms, such as less computation and the ability to find clusters of arbitrary shape, which are particularly useful for data stream clustering. This paper defines a spatial directed graph named Grid-Based Graph (GBG) to store the non-empty grids in the data space, and proposes GBGSClu (Grid-Based Graph Stream Clustering), a data stream clustering algorithm based on this spatial directed graph. A GBG consists of vertices and directed edges: if a vertex A has a neighboring dense vertex B, then there is a directed edge from vertex B to A in the GBG. The algorithm maps the data stream onto the non-empty vertices online, updates the vertices' feature vectors as data arrive, deletes the sparse vertices at every gap time, generates the GBG when a clustering request arrives, and finally clusters on the current structure. The final clustering results can be obtained by checking only the vertices' in-degree, which reduces the computation needed for clustering. The validity and efficiency of GBGSClu have been tested and verified by clustering real and synthetic datasets.
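The edge rule stated in the abstract (a dense vertex B points to each neighboring non-empty vertex A) can be sketched as follows. This is an illustration under several assumptions that the abstract does not specify: the feature vector is reduced to a point count, "dense" means a count of at least `dense_min`, and neighbors are cells that differ by one step along a single dimension. The names `cell_of` and `build_gbg` are hypothetical.

```python
import math
from collections import Counter

def cell_of(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / width) for x in point)

def build_gbg(points, width, dense_min):
    """Return (cell counts, directed edges) for a batch of points.

    Assumed rule: an edge (b, a) exists when b is dense and a is a
    non-empty cell adjacent to b along exactly one dimension.
    """
    counts = Counter(cell_of(p, width) for p in points)
    dense = {c for c, n in counts.items() if n >= dense_min}
    edges = set()
    for b in dense:
        for dim in range(len(b)):
            for step in (-1, 1):
                a = b[:dim] + (b[dim] + step,) + b[dim + 1:]
                if a in counts:
                    edges.add((b, a))
    return counts, edges
```

With this structure, a vertex's in-degree is the number of edges ending at it, which is the only quantity the abstract says the final clustering step needs to inspect.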
In high-dimensional data space, the data is inherently sparse, so clusters tend to exist in different subspaces and traditional methods are no longer suitable. In this paper, we present SCFES, a subspace clustering algorithm based on finding effective spaces. First, we define the effective dimension: by calculating relative entropy, we remove redundant dimensions that harm clustering accuracy. Second, according to the data distribution in the effective dimensions, we obtain the effective intervals by merging adjacent intervals. The effective space is composed of effective intervals. Third, we extend the density estimator based on an undirected acyclic connected graph with weights to estimate the expectation of clusters existing in the space, and combine it with the monotonicity of the clustering criterion from the CLIQUE algorithm to prune candidates. This yields the effective spaces. Finally, we adopt a sibling tree structure to store all the effective spaces and use the density-based DBSCAN algorithm to generate maximal subspace clusters in some effective spaces. Experimental results show that SCFES effectively finds arbitrarily shaped and positioned clusters in different subspaces, with better clustering quality and scalability.
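The abstract does not give its exact relative entropy formula, so the following is only one plausible reading: measure how far a dimension's histogram is from the uniform distribution using the Kullback-Leibler divergence. A value near zero suggests a dimension with little cluster structure. The function name, the equal-width binning, and the uniform reference distribution are all assumptions.

```python
import math

def relative_entropy_to_uniform(values, bins, lo, hi):
    """KL divergence (in nats) of a dimension's histogram from the uniform one.

    Assumes every value lies in [lo, hi] and that hi > lo.
    """
    counts = [0] * bins
    for v in values:
        # Clamp the top edge (v == hi) into the last bin.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    q = 1.0 / bins  # uniform reference probability per bin
    return sum((c / n) * math.log((c / n) / q) for c in counts if c > 0)
```

Under this reading, dimensions whose score falls below some chosen cutoff would be treated as redundant; the cutoff itself is not specified in the abstract.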
Most tree-based algorithms for mining frequent patterns on uncertain data streams store a large number of tree nodes and record the corresponding stream information, which requires massive storage. This paper proposes CTBVT, an algorithm based on a compressed tree and a bit vector table for mining frequent patterns on uncertain data streams. The uncertain data stream is first initialized into a probability-vector table in which items are represented by transactions; unlike other bit vector tables, it stores the occurrence probabilities of the items. When the window slides, all columns of the probability-vector table are shifted left by m bits at the same time, where m is the number of transactions in the window. We also propose a compressed tree in which items with different probabilities are stored in the same tree node, which significantly reduces the number of tree nodes. The items and their probabilities in a tree node, which correspond to the bit vector table, are then converted into a binary bit vector, and the number of 1s in that vector is the frequency of the tree node. Afterwards, each leaf node of the tree is connected to an array that stores the combinations of all items on the path and their expected support. The leaf nodes are stored in the LeafList. Finally, we scan the arrays linked to the leaf nodes in the LeafList and compare the stored expected support with a minimum support threshold minSup to obtain all frequent itemsets, which dramatically reduces mining time. Experimental results show that CTBVT is efficient and scalable.
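To make the final comparison against minSup concrete, the sketch below computes expected support in the way it is commonly defined for uncertain data: the sum, over the transactions in the window, of the product of the item probabilities, assuming items occur independently. It does not implement the compressed tree or the probability-vector table; the dictionary representation of a transaction and the function names are assumptions for illustration.

```python
from collections import deque
from math import prod

def expected_support(window, itemset):
    """Sum over transactions of the product of the itemset's item probabilities.

    Each transaction is a dict {item: occurrence probability}; a missing
    item counts as probability 0.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in window)

def is_frequent(window, itemset, min_sup):
    """An itemset is frequent when its expected support reaches min_sup."""
    return expected_support(window, itemset) >= min_sup

# A sliding window that keeps only the three most recent transactions.
window = deque(maxlen=3)
for transaction in [{"a": 0.5, "b": 0.8}, {"a": 1.0}, {"a": 0.4, "b": 0.5}, {"b": 0.9}]:
    window.append(transaction)
```

Here the oldest transaction is dropped automatically when the fourth arrives, which plays the role of the window slide described in the abstract.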
There are two major problems in weighted closed sequential pattern mining. The first is that existing algorithms consider only the weights of items and ignore the time-interval information of data elements during the mining process; the second is that they need to scan the sequence database many times or construct numerous intermediate databases. To address these problems, we propose a memory-based algorithm, MIWCSpan (Memory Indexing for Weighted Closed Sequential pattern mining), for weighted closed sequential pattern mining. In the algorithm, we define a novel sequence weighting approach to find more interesting sequential patterns. This approach considers both the weights of sequence items and the time intervals of the data elements. Moreover, an improved index set based on time intervals, p-iidx, is defined. The structure is a set of triples, each storing a pointer to the sequence containing p, the time interval of p in the sequence, and the position where p occurs. In the mining process, MIWCSpan first scans the sequence database to read it into memory. It then applies the find-then-index technique recursively to find the items that can constitute a weighted closed sequential pattern and to construct p-iidx for each possible weighted closed sequential pattern. Finally, the algorithm uses closure detection to mine the weighted closed sequential patterns efficiently. The experimental results show that MIWCSpan has a shorter running time and good scalability.
While the large-scale deformations such as the Laplacian deformation method could not synthesize new expressional details, this paper proposes a method for simulating different subtle facial expressions based on the K...
Many previous data stream algorithms handle only a single stream and can only process single items. Data stream algorithms are usually extended from sequential pattern algorithms for static databases, they...
3-SPS+RRS+PS is a new type of mechanism with good application prospects in the field of aerospace. In particular, some key kinetic characteristic calculation algorithms are implemented, which makes its calculation mec...
In order to reduce the energy consumption and improve the response performance, by introducing the strategies of sleep-delay, pre-awake and split timing to IEEE 802.3az, an enhanced Ethernet energy saving mechanism is...
General weighted sequential pattern mining algorithms ignore or do not make good use of the time and time-interval information of data elements. In addition, some algorithms need to scan the database many times or build...