ISBN: 9783030049218 (digital)
ISBN: 9783030049201 (print)
This book presents an overview of techniques for discovering high-utility patterns (patterns of high importance) in data. It introduces the main types of high-utility patterns, as well as the theory and core algorithms for high-utility pattern mining, and describes recent advances, applications, open-source software, and research opportunities. It also discusses several types of discrete data, including customer transaction data and sequential data. The book consists of twelve chapters, seven of which are surveys of the main subfields of high-utility pattern mining, including itemset mining, sequential pattern mining, big data pattern mining, metaheuristic-based approaches, privacy-preserving pattern mining, and pattern visualization. The remaining five chapters describe key techniques and applications, such as discovering concise representations and regular patterns.
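To make "high utility" concrete for customer transaction data, here is a minimal sketch assuming one common formulation: each item in a transaction has a purchased quantity (internal utility) and a unit profit (external utility), the utility of an itemset in a transaction is the sum of quantity × unit profit over its items, and its total utility is summed over all transactions that contain the whole itemset. The toy data and the names `itemset_utility` and `high_utility_itemsets` are hypothetical, and the brute-force enumeration is for illustration only; it is not one of the book's algorithms.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased
# quantity (internal utility); unit_profit is each item's external utility.
transactions = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"b": 2, "c": 2, "e": 1},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit of the itemset's items over every
    transaction that contains all of the itemset's items."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_util):
    """Naively enumerate every itemset (exponential in the number of items)
    and keep those whose utility is at least min_util."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


print(high_utility_itemsets(transactions, unit_profit, 20))
```

With `min_util = 20` this reports five itemsets, among them `("c", "e")` with utility 23, even though `("c",)` and `("e",)` each have utility 12 on their own. Utility is therefore not anti-monotone, so the support-based pruning of frequent itemset mining cannot be reused directly; practical high-utility miners rely on other upper bounds instead of exhaustive enumeration.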
ISBN: 9783030218515 (digital)
ISBN: 9783030218508 (print)
This book presents a unified framework, based on specialized evolutionary algorithms, for the global induction of various types of classification and regression trees from data. The resulting univariate or oblique trees are significantly smaller than those produced by standard top-down methods, which is critical when domain analysts need to interpret the mined patterns. The approach is highly flexible and can easily be adapted to specific data mining applications, e.g. cost-sensitive model trees for financial data or multi-test trees for gene expression data. Global induction can be applied efficiently to large-scale data without extraordinary computing resources: with simple GPU-based acceleration, datasets of millions of instances can be mined in minutes. When a dataset is too large even for fast in-memory computation, the Spark-based implementation for computer clusters, which offers impressive fault tolerance and scalability, can be used instead.
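"Global induction" means the whole tree is searched for at once rather than split by split, as in greedy top-down methods. The sketch below illustrates that idea for small univariate classification trees and is not the book's framework: the fitness (training errors plus a hypothetical penalty `alpha` per leaf, which favours smaller trees), the subtree-replacement mutation, tournament selection, and (mu + lambda) survival are all assumptions chosen for brevity. Oblique splits, regression or model trees, crossover, and GPU or Spark acceleration are not shown.

```python
import random

# A tree is a leaf ("leaf", label) or a univariate split
# ("split", feature, threshold, left, right); x[feature] <= threshold goes left.


def random_tree(X, labels, rng, depth):
    """Grow a random tree of depth <= depth, drawing thresholds from observed values."""
    if depth <= 0 or rng.random() < 0.3:
        return ("leaf", rng.choice(labels))
    f = rng.randrange(len(X[0]))
    t = rng.choice(X)[f]
    return ("split", f, t,
            random_tree(X, labels, rng, depth - 1),
            random_tree(X, labels, rng, depth - 1))


def predict(tree, x):
    while tree[0] == "split":
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]


def n_leaves(tree):
    if tree[0] == "leaf":
        return 1
    return n_leaves(tree[3]) + n_leaves(tree[4])


def fitness(tree, X, y, alpha):
    """Lower is better: misclassified training samples plus alpha per leaf (assumed)."""
    errors = sum(predict(tree, x) != label for x, label in zip(X, y))
    return errors + alpha * n_leaves(tree)


def mutate(tree, X, labels, rng, depth):
    """Replace one randomly chosen subtree with a freshly grown one, keeping depth <= depth."""
    if tree[0] == "leaf" or rng.random() < 0.3:
        return random_tree(X, labels, rng, depth)
    _, f, t, left, right = tree
    if rng.random() < 0.5:
        return ("split", f, t, mutate(left, X, labels, rng, depth - 1), right)
    return ("split", f, t, left, mutate(right, X, labels, rng, depth - 1))


def evolve(X, y, generations=50, pop_size=30, max_depth=3, alpha=0.5, seed=0):
    rng = random.Random(seed)
    labels = sorted(set(y))

    def score(tree):
        return fitness(tree, X, y, alpha)

    population = [random_tree(X, labels, rng, max_depth) for _ in range(pop_size)]
    history = [min(map(score, population))]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            # Binary tournament: the fitter of two random trees becomes the parent.
            a, b = rng.sample(population, 2)
            parent = a if score(a) <= score(b) else b
            offspring.append(mutate(parent, X, labels, rng, max_depth))
        # (mu + lambda) survival keeps the best trees seen so far (elitism).
        population = sorted(population + offspring, key=score)[:pop_size]
        history.append(score(population[0]))
    return population[0], history


# Hypothetical two-feature toy data.
X = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0), (5.0, 3.0), (6.0, 0.5)]
y = ["a", "a", "a", "b", "b", "b"]
best, history = evolve(X, y)
print(n_leaves(best), history[0], history[-1])
```

Because parents compete with their offspring for survival, the best fitness in `history` never gets worse from one generation to the next. The size penalty is what pushes the search toward compact trees, in the spirit of the smaller trees the description above attributes to global induction.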