Search results - Inner Mongolia University Library

Sequential pattern mining in multi-databases via multiple alignment

DATA MINING AND KNOWLEDGE DISCOVERY, 2006, vol. 12, no. 2-3, pp. 151-180

Authors: Kum, HC; Chang, JH; Wang, W (Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27514, USA; Yonsei Univ, Dept Comp Sci, Seoul 120749, South Korea)

To efficiently find global patterns from a multi-database, information in each local database must first be mined and summarized at the local level. Then only the summarized information is forwarded to the global mining process. However, conventional sequential pattern mining methods based on support cannot summarize the local information and are ineffective for global pattern mining from multiple data sources. In this paper, we present an alternative local mining approach for finding sequential patterns in the local databases of a multi-database. We propose the theme of approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. Approximate sequential patterns can effectively summarize and represent the local databases by identifying the underlying trends in the data. We present a novel algorithm, ApproxMAP, to mine approximate sequential patterns, called consensus patterns, from large sequence databases in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. We conduct an extensive and systematic performance study over synthetic and real data. The results demonstrate that ApproxMAP is effective and scalable in mining large sequence databases with long patterns. Hence, ApproxMAP can efficiently summarize a local database and reduce the cost of global mining. Furthermore, we present an elegant and uniform model to identify both high-vote sequential patterns and exceptional sequential patterns from the collection of these consensus patterns from each local database.

Keywords: data mining algorithm; sequential patterns; approximate sequential pattern mining; local pattern; global sequential pattern; multiple alignment

A novel data mining method based on ant colony algorithm

1st International Conference on Advanced Data Mining and Applications

Authors: Jiang, WJ; Xu, YS; Xu, YH (Hunan Univ Ind, Dept Comp, Zhuzhou 412008, Peoples R China; Beijing Univ Technol, Coll Mech Engn & Appl Elect, Beijing 100022, Peoples R China)

ISBN: (print) 354027894X

Data mining has become of great importance owing to the ever-increasing amounts of data collected by large organizations. This paper proposes a data mining algorithm called Ant-Miner(I), which is based on an improvement of the Ant Colony System (ACS) algorithm. Experimental results show that Ant-Miner(I) has a higher predictive accuracy and a much smaller rule list than the original Ant-Miner algorithm.

Keywords: data mining algorithm; Ant-Miner(I); Ant Colony System (ACS) algorithm

Data mining for inventory item selection with cross-selling considerations

DATA MINING AND KNOWLEDGE DISCOVERY, 2005, vol. 11, no. 1, pp. 81-112

Authors: Wong, RCW; Fu, AWC; Wang, K (Chinese Univ Hong Kong, Dept Comp Sci & Engn, Sha Tin 100083, Peoples R China; Simon Fraser Univ, Dept Comp Sci, Burnaby, BC V5A 1S6, Canada)

Association rule mining, studied for over ten years in the data mining literature, aims to help enterprises with sophisticated decision making, but the resulting rules typically cannot be applied directly and require further processing. In this paper, we propose a method for actionable recommendations from itemset analysis and investigate an application of the concepts of association rules: maximal-profit item selection with cross-selling effect (MPIS). This problem is about choosing a subset of items that gives the maximal profit when the cross-selling effect is taken into account. A simple approach to this problem is shown to be NP-hard. A new approach is proposed that uses the loss rule (a rule similar to the association rule) to model the cross-selling effect. We show that MPIS can be approximated by a quadratic programming problem. We also propose a greedy approach and a genetic algorithm to deal with this problem. Experiments are conducted, which show that our proposed approaches are highly effective and efficient.

Keywords: data mining algorithm; cross-selling; item selection; association rule; quadratic programming; genetic algorithm

Mining sequential patterns by pattern-growth: The PrefixSpan approach

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, vol. 16, no. 11, pp. 1424-1440

Authors: Pei, J; Han, JW; Mortazavi-Asl, B; Wang, JY; Pinto, H; Chen, QM; Dayal, U; Hsu, MC (Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada; Univ Illinois, Dept Comp Sci, Urbana, IL 61801, USA; Univ Minnesota, Minneapolis, MN 55455, USA; Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada; Packetmot Inc, San Mateo, CA 94403, USA; Hewlett Packard Labs, Palo Alto, CA 94303, USA; Commerce One Inc, San Francisco, CA 94105, USA)

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts a vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.

Keywords: data mining algorithm; sequential pattern; frequent pattern; transaction database; sequence database; scalability; performance analysis
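
The projection-based pattern growth described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes every sequence element is a single item (the paper handles itemsets), and the names `prefixspan`, `grow`, and `min_support` are hypothetical. Projected databases are stored as (sequence index, offset) pairs rather than copied suffixes, in the spirit of pseudoprojection.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern (tuple): support} for sequences of single items.

    Simplifying assumption: each sequence is a list of single items,
    not a list of itemsets.
    """
    results = {}

    def grow(prefix, projected):
        # projected: list of (sequence index, start offset) pairs, one per
        # sequence that contains the current prefix.
        counts = defaultdict(set)
        for sid, start in projected:
            for item in sequences[sid][start:]:
                counts[item].add(sid)  # count each sequence at most once

        for item, sids in sorted(counts.items()):
            if len(sids) < min_support:
                continue  # only locally frequent items extend the prefix
            pattern = prefix + (item,)
            results[pattern] = len(sids)

            # Project on the first occurrence of item after the offset.
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item does not occur in this suffix
                new_projected.append((sid, pos + 1))
            grow(pattern, new_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c", "b"], ["b", "c"]]
    for pattern, support in prefixspan(db, min_support=2).items():
        print(pattern, support)
```

Each recursive call scans only the suffixes that follow the current prefix, so no candidate that is absent from the projected database is ever generated or tested.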

Discovering local patterns from multiple temporal sequences

1st EurAsian Conference on Advances in Information and Communication Technology (EurAsia-ICT 2002)

Authors: Jin, XM; Lu, YC; Shi, CY (Tsing Hua Univ, Comp Sci & Technol Dept, Natl Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China)

ISBN: (print) 3540000283

In this paper, we address a data mining problem: the discovery of local sequential patterns from a set of long sequences. Each local sequential pattern is represented by a pattern A-->B and a time period in which A-->B is frequent. Such patterns are very common in practice and are potentially very useful. However, it is impractical to apply traditional methods to this problem directly. We propose a suffix-tree-like data structure for indexing the instances of the patterns. Based on this index, our mining method can discover all locally frequent patterns after one scan of the sequences. We have analyzed the behavior of the problem and evaluated the performance of our algorithm with both synthetic and real data. The results correspond with the definition of the problem and verify the superiority of our approach.

Keywords: local sequential pattern; temporal sequence; data mining algorithm
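
To make the notion of a pattern A-->B that is frequent only within a time period more concrete, here is a naive sliding-window sketch. It is a brute-force baseline, not the suffix-tree-like index the abstract proposes, and it rests on assumptions the abstract does not state: A-->B is taken to mean "A immediately followed by B", a period is a run of overlapping or adjacent windows, and `local_periods`, `window`, and `min_count` are hypothetical names.

```python
def local_periods(events, a, b, window, min_count):
    """Return merged (start, end) index ranges, end exclusive, in which
    `a` is immediately followed by `b` at least `min_count` times within
    some window of `window` consecutive events.

    Assumption: "A-->B" means B occurs directly after A.
    """
    # Positions i where events[i] == a and events[i + 1] == b.
    hits = [i for i in range(len(events) - 1)
            if events[i] == a and events[i + 1] == b]

    periods = []
    for start in range(len(events) - window + 1):
        end = start + window  # exclusive
        # Count occurrences that lie entirely inside the window.
        count = sum(1 for i in hits if start <= i and i + 1 < end)
        if count >= min_count:
            if periods and start <= periods[-1][1]:
                # Overlaps or touches the previous period: extend it.
                periods[-1] = (periods[-1][0], end)
            else:
                periods.append((start, end))
    return periods


if __name__ == "__main__":
    events = list("ababxxxxxxab")
    print(local_periods(events, "a", "b", window=4, min_count=2))
```

This baseline rescans the occurrence list for every window, which is the kind of cost an index over pattern instances is meant to avoid.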

A representing model of rule distribution in temporal sequence

4th World Congress on Intelligent Control and Automation

Authors: Jin, XM; Lu, YC; Shi, CY (Tsing Hua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China)

ISBN: (print) 0780372689

In recent years, there has been a lot of interest in using data mining techniques to extract rules from temporal sequences in various applications. Previous work on rule discovery mainly considered global pattern behaviors. In this paper, we consider rules whose frequency is large only in a subsequence of the original sequence. To facilitate the discovery of rule distribution, we present a representing model that segments the sequence into a set of contiguous subsequences, in each of which some rule set appears frequently. We present the definition of a local rule and our model in this paper, together with related methods. We have analyzed the behavior of the problem and our algorithms with both synthetic and real data. The results correspond with the definition of our problem and reveal a novel kind of knowledge.

Keywords: data mining algorithm; temporal sequence; local rule; representing model