This work develops a probabilistic framework for modeling, querying, and analyzing large-scale structured and semi-structured data. The framework has three components: (1) mining non-redundant local patterns from data; (2) gluing these local patterns together by employing probabilistic models (e.g., Markov random fields (MRFs) or Bayesian networks); and (3) reasoning (making inferences) over the data to solve various data analysis tasks. In more detail, our contributions are as follows:

Mining non-redundant frequent itemset patterns on large transactional data. In many real-world problems, frequent pattern mining algorithms yield so many frequent patterns that the end user is swamped when interpreting the results. We present an approach that employs probabilistic models to identify non-redundant itemset patterns in a large collection of frequent itemsets on transactional data. We show that our approach can effectively eliminate a large amount of redundancy from such a collection.

Employing local probabilistic models to glue non-redundant itemset patterns on large transactional or network data. We propose a technique that employs local probabilistic models to glue non-redundant itemset patterns together for the link prediction task in co-authorship network analysis. The new technique effectively combines topology analysis on network structure data and frequency analysis on network event log data. The main idea is to consider the co-occurrence probability of the two end nodes associated with a candidate link. We propose a method of building MRFs over local data regions to compute this co-occurrence probability. Experimental results demonstrate that the co-occurrence probability inferred from the local probabilistic models is very useful for link prediction.

Employing global probabilistic models to glue non-redundant itemset patterns on large transactional data. We explore employing
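
To make the setting of the first contribution concrete, the following is a minimal sketch of frequent itemset mining on a toy transactional dataset. It is a brute-force illustration, not the method of this work: the function name `frequent_itemsets`, the `min_support` parameter, and the grocery transactions are all assumptions introduced here. It only shows how quickly the number of frequent patterns grows, which is the redundancy problem that motivates the probabilistic pruning described above.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets found in >= min_support transactions.

    Brute-force sketch (hypothetical helper, not the dissertation's algorithm).
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                result[frozenset(candidate)] = support
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return result


# Hypothetical toy data: each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
patterns = frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With only three distinct items and a support threshold of 2, every non-empty itemset is frequent, so the miner already reports seven patterns. Many of them carry overlapping information, which is the kind of redundancy a non-redundant pattern selection step aims to remove.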