ISBN: (electronic) 9781848002012
This volume was born from the experience of the authors as researchers and educators, which suggests that many students of data mining are handicapped in their research by the lack of a formal, systematic education in its mathematics. The data mining literature contains many excellent titles that address the needs of users with a variety of interests ranging from decision making to pattern investigation in biological data. However, these books do not deal with the mathematical tools that are currently needed by data mining researchers and doctoral students. We felt it timely to produce a book that integrates the mathematics of data mining with its applications. We emphasize that this book is about mathematical tools for data mining and not about data mining itself; despite this, a substantial number of applications of mathematical concepts in data mining are presented. The book is intended as a reference for the working data miner. In our opinion, three areas of mathematics are vital for data mining: set theory, including partially ordered sets and combinatorics; linear algebra, with its many applications in principal component analysis and neural networks; and probability theory, which plays a foundational role in statistics, machine learning and data mining. This volume is dedicated to the study of set-theoretical foundations of data mining. Two further volumes are contemplated that will cover linear algebra and probability theory. The first part of this book, dedicated to set theory, begins with a study of … issues as equivalences and partitions are discussed. Also, we prepare the ground for the following volumes by discussing indicator functions, fields and σ-fields, and other concepts.
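The link between equivalences, partitions and indicator functions mentioned above can be made concrete in a few lines. The following is a minimal sketch under our own assumptions: the function names are hypothetical, and it uses the standard fact that any function f induces an equivalence relation (its kernel), x ~ y if and only if f(x) = f(y), whose equivalence classes partition the domain. It also checks that the indicator functions of the blocks of a partition sum to 1 at every point.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def kernel_partition(items: Iterable[T], f: Callable[[T], Hashable]) -> List[List[T]]:
    """Blocks of the partition induced by the equivalence x ~ y iff f(x) == f(y)."""
    blocks: Dict[Hashable, List[T]] = defaultdict(list)
    for x in items:
        # x joins the class of every element that has the same image under f
        blocks[f(x)].append(x)
    return list(blocks.values())


def is_partition(blocks: List[List[T]], universe: Iterable[T]) -> bool:
    """Check that blocks are nonempty, pairwise disjoint, and cover the universe."""
    covered = set()
    for block in blocks:
        b = set(block)
        if not b or covered & b:
            return False
        covered |= b
    return covered == set(universe)


def indicator(block: Iterable[T]) -> Callable[[T], int]:
    """Indicator (characteristic) function of a subset: 1 inside, 0 outside."""
    members = set(block)
    return lambda x: 1 if x in members else 0
```

For example, `kernel_partition(range(10), lambda n: n % 3)` groups the integers 0 to 9 by remainder modulo 3 into `[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]`.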
ISBN: (electronic) 9781846282843
ISBN: (print) 9781852339890; 9781849969918
The growth in the amount of data collected and generated has exploded in recent times with the widespread automation of various day-to-day activities, advances in high-level scientific and engineering research and the development of efficient data collection tools. This has given rise to the need for automatically analyzing the data in order to extract knowledge from it, thereby making the data potentially more useful. Knowledge discovery and data mining (KDD) is the process of identifying valid, novel, potentially useful and ultimately understandable patterns from massive data repositories. It is a multi-disciplinary topic, drawing from several fields including expert systems, machine learning, intelligent databases, knowledge acquisition, case-based reasoning, pattern recognition and statistics. Many data mining systems have typically evolved around well-organized database systems (e.g., relational databases) containing relevant information. But, more and more, one finds relevant information hidden in unstructured text and in other complex forms. Mining in the domains of the world-wide web, bioinformatics, geoscientific data, and spatial and temporal applications provides some illustrative examples in this regard. Discovery of knowledge, or potentially useful patterns, from such complex data often requires the application of advanced techniques that are better able to exploit the nature and representation of the data. Such advanced methods include, among others, graph-based and tree-based approaches to relational learning, sequence mining, link-based classification, Bayesian networks, hidden Markov models, neural networks, kernel-based methods, evolutionary algorithms, rough sets and fuzzy logic, and hybrid systems. Many of these methods are developed in the following chapters.
ISBN: (electronic) 9781846281174
ISBN: (print) 9781852338664; 9781849969376
The last decade of the 20th century witnessed a surge of interest in numerical, computation-intensive approaches to information processing. The lines that draw the boundaries among statistics, optimization, artificial intelligence and information processing are disappearing, and it is not uncommon to find well-founded and sophisticated mathematical approaches in application domains traditionally associated with ad-hoc programming. Heuristics has become a branch of optimization and statistics. Clustering is applied to analyze soft data and to provide fast indexing in the World Wide Web. Non-trivial matrix algebra is at the heart of the latest advances in computer vision. The breakthrough impulse was, apparently, due to the rise of interest in artificial neural networks (ANN) after their rediscovery in the late 1980s. Disguised as ANN, numerical and statistical methods made an appearance in the information processing scene, and others followed. A key component in much intelligent computational processing is the search for an optimal value of some function. Sometimes this function is not evident, and it must be made explicit in order to formulate the problem as an optimization problem. The search often takes place in high-dimensional spaces that can be discrete, continuous or mixed. The shape of the high-dimensional surface that corresponds to the optimized function is usually very complex. Evolutionary algorithms are increasingly being applied to information processing applications that require any kind of optimization.
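To illustrate what "applying an evolutionary algorithm to an optimization problem" means in practice, here is a minimal sketch of one simple variant: an elitist (μ + λ) evolution strategy with Gaussian mutation that minimizes a function over a continuous space. It is our own illustration rather than a method taken from the book. The toy objective `sphere`, the function `evolve` and all parameter values are assumptions. The fixed geometric step-size decay is a simplification, since practical evolution strategies usually adapt the step size. Discrete and mixed search spaces would need different variation operators.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def sphere(x: Vector) -> float:
    """Toy objective (hypothetical example): minimum value 0 at the origin."""
    return sum(v * v for v in x)


def evolve(
    f: Callable[[Vector], float],
    dim: int,
    pop_size: int = 20,
    generations: int = 200,
    sigma: float = 0.5,
    decay: float = 0.98,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimise f with a simple elitist (mu + lambda) evolution strategy."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    lo, hi = bounds
    # Initial population: uniform random points in [lo, hi]^dim
    # (bounds are used for initialisation only; children are not clipped).
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Variation: each parent produces one child by Gaussian mutation.
        children = [[v + rng.gauss(0.0, sigma) for v in parent] for parent in pop]
        # Selection: keep the best pop_size of parents + children (elitist,
        # so the best fitness never gets worse from one generation to the next).
        pop = sorted(pop + children, key=f)[:pop_size]
        # Fixed geometric step-size schedule (a simplification).
        sigma *= decay
    best = pop[0]
    return best, f(best)
```

A typical call is `best, value = evolve(sphere, dim=3, seed=1)`. Because selection is elitist, `value` is the smallest objective value the search has seen. Replacing `sphere` with any other function of a list of floats is enough to reuse the sketch.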
1.1 Two Fundamental Questions
There are two fundamental questions that should be answered before buying, and even more before reading, a book:
• Why should one read the book?
• What is the book about?
This is the reason why this section, the first of the whole text, proposes some motivations for potential readers (Section 1.1.1) and an overall description of the content (Section 1.1.2). If the answers are convincing, further information can be found in the rest of this chapter: Section 1.2 shows in detail the structure of the book, Section 1.3 presents some features that can help the reader navigate the text, and Section 1.4 provides some reading tracks targeting specific topics.
1.1.1 Why Should One Read The Book?
One of the most interesting technological phenomena in recent years is the diffusion of consumer electronic products with constantly increasing acquisition, storage and processing power. As an example, consider the evolution of digital cameras: the first models available in the market in the early nineties produced images composed of 1.6 million pixels (this is the meaning of the expression 1.6 megapixels), carried an onboard memory of 16 megabytes, and had an average cost higher than 10,000 U.S. dollars. At the time this book is being written, the best models are close to or even above 8 megapixels, have internal memories of one gigabyte and cost around 1,000 U.S. dollars.
ISBN: (electronic) 9783030676810
ISBN: (print) 9783030676803; 9783030676834
RDF-based knowledge graphs require additional formalisms to be fully context-aware, and this book presents them. This book also provides a collection of provenance techniques and state-of-the-art metadata-enhanced, provenance-aware, knowledge graph-based representations across multiple application domains, in order to demonstrate how to combine graph-based data models and provenance representations. This is important to make statements authoritative, verifiable, and reproducible, as in biomedical, pharmaceutical, and cybersecurity applications, where the data source and generator can be just as important as the data itself.
Capturing provenance is critical to ensure sound experimental results and rigorously designed research studies for patient and drug safety, pathology reports, and medical evidence generation. Similarly, as this book demonstrates, provenance is needed for cyberthreat intelligence dashboards and attack maps that aggregate and/or fuse heterogeneous data from disparate data sources to differentiate between unimportant online events and dangerous cyberattacks. Without provenance, data reliability and trustworthiness might be limited, causing data reuse, trust, reproducibility and accountability issues.
This book primarily targets researchers who utilize knowledge graphs in their methods and approaches (this includes researchers from a variety of domains, such as cybersecurity, eHealth, data science, Semantic Web, etc.). This book collects core facts for the state of the art in provenance approaches and techniques, complemented by a critical review of existing approaches. New research directions that combine data science and knowledge graphs, an increasingly important research topic, are also provided.
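The idea of combining a graph data model with provenance can be sketched with one common approach, named graphs. Each statement is stored as a quad (subject, predicate, object, graph), and provenance is recorded once per graph instead of once per triple. The sketch below is a hypothetical in-memory illustration. It is not an API from the book or from any RDF library. The `ex:` identifiers are made up. The key `prov:wasAttributedTo` is borrowed from the W3C PROV-O vocabulary and is used here only as a plain dictionary key. The usage mirrors the cyberthreat case: two feeds assert the same fact, and the store reports both sources.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Quad:
    """An RDF-style statement plus the named graph it was asserted in."""
    subject: str
    predicate: str
    obj: str
    graph: str  # named graph: groups statements that share one provenance record


class ProvenanceStore:
    """Hypothetical in-memory store: quads plus per-graph provenance metadata."""

    def __init__(self) -> None:
        self.quads: Set[Quad] = set()
        self.graph_meta: Dict[str, Dict[str, str]] = {}

    def add_graph(self, graph: str, meta: Dict[str, str]) -> None:
        # Provenance is recorded once per named graph, not once per triple.
        self.graph_meta[graph] = dict(meta)

    def add(self, s: str, p: str, o: str, graph: str) -> None:
        # Refuse statements whose origin is unknown.
        if graph not in self.graph_meta:
            raise KeyError(f"no provenance recorded for graph {graph!r}")
        self.quads.add(Quad(s, p, o, graph))

    def provenance_of(self, s: str, p: str, o: str) -> List[Dict[str, str]]:
        """Provenance records of every graph asserting (s, p, o), sorted by graph name."""
        graphs = sorted(
            q.graph for q in self.quads if (q.subject, q.predicate, q.obj) == (s, p, o)
        )
        return [self.graph_meta[g] for g in graphs]


# Usage: two hypothetical threat feeds flag the same host.
store = ProvenanceStore()
store.add_graph("ex:feedA", {"prov:wasAttributedTo": "ex:ThreatFeedA"})
store.add_graph("ex:feedB", {"prov:wasAttributedTo": "ex:ThreatFeedB"})
store.add("ex:host1", "ex:flaggedAs", "ex:Botnet", "ex:feedA")
store.add("ex:host1", "ex:flaggedAs", "ex:Botnet", "ex:feedB")
sources = store.provenance_of("ex:host1", "ex:flaggedAs", "ex:Botnet")
```

A statement backed by two independent sources can then be weighted differently from one reported by a single feed. Storing provenance at graph level keeps that metadata from being duplicated on every triple.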
ISBN: (electronic) 9781447164708
ISBN: (print) 9781447164692; 9781447171355
Focuses on the process by which manually crafting interactive, hypertextual maps clarifies one’s own understanding, communicates it to others, and enables collective intelligence. The authors see mapping software as visual tools for reading and writing in a networked age. In an information ocean, the challenge is to find meaningful patterns around which we can weave plausible narratives. Maps of concepts, discussions and arguments make the connections between ideas tangible and, critically, disputable. With 22 chapters from leading researchers and practitioners (5 of them new for this edition), the reader will find the current state of the art in the field. Part 1 focuses on knowledge maps for learning and teaching in schools and universities, before Part 2 turns to knowledge maps for information analysis and knowledge management in professional communities, but with many cross-cutting themes:
· reflective practitioners documenting the most effective ways to map
· conceptual frameworks for evaluating representations
· real-world case studies showing added value for professionals
· more experimental case studies from research and education
· visual languages, many of which work both on paper and with software
· knowledge cartography software, much of it freely available and open source
· visit the companion website for extra resources: ***/knowledge-cartography
Knowledge Cartography will be of interest to learners, educators, and researchers in all disciplines, as well as policy analysts, scenario planners, knowledge managers and team facilitators. Practitioners will find new perspectives and tools to expand their repertoire, while researchers will find conceptual grounding rich enough for further scholarship.
ISBN: (electronic) 9783319979199
ISBN: (print) 9783319979182
This book is the first work that systematically describes the procedure of data mining and knowledge discovery on bioinformatics databases using state-of-the-art hierarchical feature selection algorithms. The novelties of this book are three-fold. To begin with, this book discusses hierarchical feature selection in depth, which is a relatively novel research area in data mining and machine learning. Seven different state-of-the-art hierarchical feature selection algorithms are discussed and evaluated by working with four types of interpretable classification algorithms (i.e. three types of Bayesian network classification algorithms and the k-nearest neighbours classification algorithm). Moreover, this book discusses the application of those hierarchical feature selection algorithms to the well-known Gene Ontology database, where the entries (terms) are hierarchically structured. The Gene Ontology database, which unifies the representation of gene and gene product annotations, provides the resource for mining valuable knowledge about certain biological research topics, such as the Biology of Ageing. Furthermore, this book discusses the biological patterns, mined by the hierarchical feature selection algorithms, that are relevant to ageing-associated genes. Those patterns reveal potential ageing-associated factors that inspire future research directions for Biology of Ageing research.
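Why a feature hierarchy such as the Gene Ontology invites dedicated feature selection can be shown with a small sketch. Under the ontology's true-path rule, a gene annotated with a term is also annotated with all of that term's ancestors. As a result, many binary features are implied by others. The code below is a generic illustration of this hierarchical redundancy. It is not one of the seven algorithms the book evaluates. The term IDs `T0` to `T4` and all function names are hypothetical. For a single instance, a 1-valued term is dropped if some descendant is also 1, and a 0-valued term is dropped if some ancestor is also 0.

```python
from typing import Dict, Iterable, Mapping, Set

Hierarchy = Mapping[str, Iterable[str]]  # term -> its direct parents (a DAG)


def ancestors(term: str, parents: Hierarchy) -> Set[str]:
    """All proper ancestors of term, following parent links transitively."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotated: Iterable[str], parents: Hierarchy) -> Set[str]:
    """True-path closure: annotating a term implies annotating all its ancestors."""
    closed = set(annotated)
    for t in list(closed):
        closed |= ancestors(t, parents)
    return closed


def non_redundant(instance: Dict[str, int], parents: Hierarchy) -> Set[str]:
    """Keep only the features of one binary instance not implied by another feature."""
    anc = {t: ancestors(t, parents) for t in instance}
    keep: Set[str] = set()
    for term, value in instance.items():
        if value == 1:
            # A positive term is implied by any positive descendant.
            implied = any(v == 1 and term in anc[t] for t, v in instance.items())
        else:
            # A negative term is implied by any negative ancestor.
            implied = any(instance.get(a) == 0 for a in anc[term])
        if not implied:
            keep.add(term)
    return keep


# Hypothetical toy hierarchy: T1 and T3 are children of T0, T2 of T1, T4 of T3.
parents = {"T1": ["T0"], "T2": ["T1"], "T3": ["T0"], "T4": ["T3"]}
closed = propagate({"T2"}, parents)  # a gene annotated with T2 only
instance = {t: int(t in closed) for t in ["T0", "T1", "T2", "T3", "T4"]}
```

Here the gene's propagated annotation is {T0, T1, T2}. Only T2 (the most specific positive term) and T3 (the most general negative term) carry non-redundant information, so five features shrink to two.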