In recent years, a growing number of works have reported that modeling data as multi-graphs solves some practical problems more efficiently. In these contexts, data mining techniques can be useful for discovering patterns, which in turn help with more complex tasks such as classification. Additionally, in real-world applications it is very useful to mine patterns that allow approximate matching between graphs. However, to the best of our knowledge, only one method in the literature allows mining these types of patterns in multi-graph collections. This method does not compute the approximate patterns directly from the multi-graph collections, which makes it inefficient. In this paper, we propose an algorithm that mines patterns directly in multi-graph collections, more efficiently than the only alternative reported in the literature. Our algorithm introduces an extension of a canonical form based on depth-first search, which allows representing multi-graphs. Experiments on different public standard databases are carried out to demonstrate the performance of the proposed algorithm. The algorithm is compared with the only alternative reported in the literature for mining patterns in multi-graph collections. Note that the new algorithm and the referenced algorithm [N. Acosta-Mendoza, J.A. Carrasco-Ochoa, J.F. Martinez Trinidad, A. Gago-Alonso, and J.E. Medina-Pagola. A New Method Based on Graph Transformation for FAS Mining in Multi-graph Collections. In The 7th Mexican Conference on Pattern Recognition (MCPR'2015), Pattern Recognition, volume LNCS 9116, pages 13-22. Springer, 2015.] produce the same results, but the new algorithm is more efficient. (C) 2016 Elsevier B.V. All rights reserved.
Data mining in structured and semi-structured data focuses on frequent data values. In graph data mining, however, the focus is on common specific topologies. Graph mining, despite its ubiquity, is a difficult task since it requires subgraph isomorphism testing, which is known to be NP-complete. In order to prune the search space effectively and thereby save computational time, a graph mining algorithm requires that the support measure of a pattern be no greater than that of its subpatterns. This property of the support measure is referred to in the literature as down-closure, anti-monotonicity or admissibility. Unfortunately, when mining a single labeled graph, simply counting the occurrences of a graph pattern may not have the down-closure property. For this reason, most existing approaches mine frequent substructures in a set of labeled graphs (also called the transactional setting), and few efforts have been devoted to mining frequent, globally distributed substructures in a single labeled graph. In this paper, we propose a graph mining algorithm, called NODAR (Non-Overlapping embeDding based grAph mineR), for computing common and globally distributed substructures in a single labeled graph. NODAR adopts the Depth-First Search (DFS) strategy and uses the SMNOES (Size of Maximum Non Overlapping Embedding Set) as its support measure. The core idea of NODAR is to extract frequent subpatterns automatically, without frequency computation, thanks to the down-closure property of SMNOES. By adopting this strategy in the computation of frequent substructures, NODAR reduces the number of subgraph isomorphism tests needed to compute pattern frequencies. Experimental results on monograph (single-graph) and transactional graph databases, and comparison with well-known probabilistic and exact algorithms, demonstrate the efficacy of NODAR.
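The down-closure problem described above can be illustrated with a toy example. The following is a minimal sketch, not NODAR itself: it assumes that "non-overlapping" means pairwise vertex-disjoint embeddings, handles only single-edge patterns, and uses brute force. The function names `edge_embeddings` and `max_non_overlapping` are hypothetical.

```python
from itertools import combinations

def edge_embeddings(labels, edges, pattern):
    """Vertex sets of all edges whose endpoint labels match the 2-label pattern."""
    wanted = sorted(pattern)
    return [frozenset((u, v)) for u, v in edges
            if sorted((labels[u], labels[v])) == wanted]

def max_non_overlapping(embeddings):
    """Brute-force size of the largest set of pairwise vertex-disjoint embeddings.

    Assumption: overlap is defined on shared vertices.
    """
    for k in range(len(embeddings), 0, -1):
        for subset in combinations(embeddings, k):
            if all(a.isdisjoint(b) for a, b in combinations(subset, 2)):
                return k
    return 0

# A star graph: one vertex labeled A connected to three vertices labeled B.
labels = {0: "A", 1: "B", 2: "B", 3: "B"}
edges = [(0, 1), (0, 2), (0, 3)]

# Naive occurrence counting: the subpattern "A" occurs once, yet the larger
# pattern "A-B" occurs three times, so the count is not anti-monotone.
count_a = sum(1 for label in labels.values() if label == "A")
ab_embeddings = edge_embeddings(labels, edges, ("A", "B"))
count_ab = len(ab_embeddings)

# Counting only non-overlapping embeddings: all three share vertex 0.
support_ab = max_non_overlapping(ab_embeddings)
```

Here the occurrence count grows from 1 to 3 when the pattern is extended, while the non-overlapping support of the extended pattern stays at 1, which does not exceed the support of its subpattern.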
ISBN: (print) 9780769534893
The need for mining structured data has increased in the past few years. Graphs are among the best-studied data structures in computer science and discrete mathematics, and graph-based data mining has become quite popular in the last few years. In this paper, we present metagraph-based data mining as a new approach in the field of traditional graph-based mining. A metagraph is a new graph-theoretic construct that has a set-to-set mapping in place of the node-to-node mapping of a conventional graph structure. We investigate new approaches for frequent metagraph-based pattern mining in metagraph datasets. We propose an algorithm for metagraph-based substructure pattern mining which discovers frequent substructures without candidate generation. We apply a new lexicographic order for metagraphs, and map each submetagraph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, we develop an algorithm which adapts the depth-first search strategy to mine frequent connected submetagraphs efficiently.
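The set-to-set mapping can be made concrete with a small data-structure sketch. This is an illustration under assumptions, not the paper's algorithm: each metagraph edge is modeled as a pair of element sets, patterns are matched by exact element names, and the minimum DFS code and lexicographic order are not reproduced. The helpers `make_edge` and `edge_support` are hypothetical names.

```python
def make_edge(invertex, outvertex):
    """A metagraph edge maps a set of elements to a set of elements."""
    return (frozenset(invertex), frozenset(outvertex))

def edge_support(dataset, edge):
    """Number of metagraphs in the dataset that contain the given edge."""
    return sum(1 for metagraph in dataset if edge in metagraph)

# Three toy metagraphs, each a set of set-to-set edges.
m1 = {make_edge({"a", "b"}, {"c"}), make_edge({"c"}, {"d", "e"})}
m2 = {make_edge({"b", "a"}, {"c"})}
m3 = {make_edge({"a"}, {"c"})}
dataset = [m1, m2, m3]

# The edge {a, b} -> {c} appears in m1 and m2 but not in m3, where the
# source set is only {a}; a node-to-node graph could not distinguish these.
support = edge_support(dataset, make_edge({"a", "b"}, {"c"}))
```

Using `frozenset` makes edges hashable and independent of element order, so `{"a", "b"}` and `{"b", "a"}` denote the same source set.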
Of late there has been considerable interest in the efficient and effective storage of large-scale network graphs, such as those within the domains of social networks, the web and virtual communities. Representing these data graphs is a complex and challenging task because of the inherent structural and dynamic properties of a community network, whereby naturally occurring churn can severely affect the ability to optimize the network structure. Since the organization of the network will change over time, we consider how an established method for storing large data graphs (K-2 tree) can be augmented and then utilized as an indicator of the relative maturity of a community network. Within this context, we present an algorithm and a series of experimental results on both real and simulated networks, illustrating that compression effectiveness decreases as the community network structure becomes more dynamic. For this reason, we highlight a notable opportunity to explore the relationship between the K-2 tree optimization factor and the maturity level of the network community concerned. (C) 2011 Elsevier Ltd. All rights reserved.
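The link between structure and compression can be sketched with a basic tree of this kind. The sketch below assumes that "K-2 tree" refers to the k²-tree, which recursively splits the adjacency matrix into k × k blocks and stores one bit per block; it does not reproduce the augmentation or the optimization factor proposed in the paper. The function name `k2_tree_bits` is hypothetical, and the matrix side is assumed to be a power of k.

```python
from collections import deque

def k2_tree_bits(matrix, k=2):
    """Level-order bit sequence of a k^2-tree over a square 0/1 matrix.

    Assumption: len(matrix) is a power of k.
    """
    bits = []
    queue = deque([(0, 0, len(matrix))])  # (top row, left column, block size)
    while queue:
        row, col, size = queue.popleft()
        sub = size // k
        for i in range(k):
            for j in range(k):
                r0, c0 = row + i * sub, col + j * sub
                has_one = any(matrix[r][c]
                              for r in range(r0, r0 + sub)
                              for c in range(c0, c0 + sub))
                bits.append(1 if has_one else 0)
                # Only non-empty blocks larger than one cell are subdivided.
                if has_one and sub > 1:
                    queue.append((r0, c0, sub))
    return bits

# Four edges clustered in one quadrant of the adjacency matrix.
clustered = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]

# Four edges scattered across all quadrants.
scattered = [[1, 0, 1, 0],
             [0, 0, 0, 0],
             [1, 0, 1, 0],
             [0, 0, 0, 0]]

clustered_size = len(k2_tree_bits(clustered))
scattered_size = len(k2_tree_bits(scattered))
```

Both matrices hold four edges, but the clustered one needs 8 bits and the scattered one 20, because empty blocks are pruned at the top level only when the edges are concentrated.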
Support calculation and duplicate detection are the most challenging and unavoidable subtasks in frequent connected subgraph (FCS) mining. The most successful FCS mining algorithms have focused on optimizing these subtasks, since the existing solutions for both have high computational complexity. In this paper, we propose two novel properties that allow removing all duplicate candidates before support calculation. In addition, we introduce a fast support calculation strategy based on embedding structures. Both properties and the new embedding structure are used to design two new algorithms: gdFil for mining all FCSs, and gdClosed for mining all closed FCSs. The experimental results show that our proposed algorithms achieve the best performance in comparison with other well-known algorithms.