ISBN (digital): 9783031018664
ISBN (print): 9783031007385
Data usually comes in a plethora of formats and dimensions, rendering the exploration and information extraction processes challenging. Thus, being able to perform exploratory analyses on the data, with the intent of getting an immediate glimpse of some of the data properties, is becoming crucial. Exploratory analyses should be simple enough to avoid complicated declarative languages (such as SQL) and mechanisms, and at the same time retain the flexibility and expressiveness of such languages. Recently, we have witnessed a rediscovery of the so-called example-based methods, in which the user, or the analyst, circumvents query languages by using examples as input. An example is a representative of the intended results, or in other words, an item from the result set. Example-based methods exploit inherent characteristics of the data to infer the results that the user has in mind but may not be able to (easily) express. They can be useful when a user is looking for information in an unfamiliar dataset, when the task is particularly challenging, such as finding duplicate items, or simply when the user is exploring the data. In this book, we present an excursus over the main methods for exploratory analysis, with a particular focus on example-based methods. We show that different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data. The book also presents the challenges and the new frontiers of machine learning in online settings, which have recently attracted the attention of the database community. The lecture concludes with a vision for further research and applications in this area.
ISBN (digital): 9783031018954
ISBN (print): 9783031007675
Cloud computing has emerged as a successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is used. This success has seen a proliferation in the number of applications that are being deployed in various cloud platforms. There has also been an increase in the scale of the data generated as well as consumed by such applications. Scalable database management systems form a critical part of the cloud infrastructure. The attempt to address the challenges posed by the management of big data has led to a plethora of systems. This book aims to clarify some of the important concepts in the design space of scalable data management in cloud computing infrastructures. Some of the questions that this book aims to answer are: what are the appropriate systems for a specific set of application requirements, what are the research challenges in data management for the cloud, and what is novel in the cloud for database researchers? We also aim to address one basic question: does cloud computing pose new challenges in scalable data management, or is it just a reincarnation of old problems? We provide a comprehensive background study of state-of-the-art systems for scalable data management and analysis. We also identify important aspects in the design of different systems and the applicability and scope of these systems. A thorough understanding of current solutions and a precise characterization of the design space are essential for clearing the "cloudy skies of data management" and ensuring the success of DBMSs in the cloud, thus emulating the success enjoyed by relational databases in traditional enterprise settings. Table of Contents: Introduction / Distributed Data Management / Cloud Data Management: Early Trends / Transactions on Co-located Data / Transactions on Distributed Data / Multi-tenant Database Systems / Concluding Remarks
ISBN (digital): 9783031018497
ISBN (print): 9783031007217
Roughly a decade ago, power consumption and heat dissipation concerns forced the semiconductor industry to radically change its course, shifting from sequential to parallel computing. Unfortunately, improving the performance of applications has now become much more difficult than in the good old days of frequency scaling. This is also affecting databases and data processing applications in general, and has led to the popularity of so-called data appliances: specialized data processing engines, where software and hardware are sold together in a closed box. Field-programmable gate arrays (FPGAs) increasingly play an important role in such systems. FPGAs are attractive because the performance gains of specialized hardware can be significant, while power consumption is much lower than that of commodity processors. On the other hand, FPGAs are far more flexible than hard-wired circuits (ASICs) and can be integrated into complex systems in many different ways, e.g., directly in the network for a high-frequency trading application. This book gives an introduction to FPGA technology targeted at a database audience. In the first few chapters, we explain in detail the inner workings of FPGAs. Then we discuss techniques and design patterns that help map algorithms to FPGA hardware so that the inherent parallelism of these devices can be leveraged in an optimal way. Finally, the book presents a number of concrete examples that exploit different advantages of FPGAs for data processing. Table of Contents: Preface / Introduction / A Primer in Hardware Design / FPGAs / FPGA Programming Models / Data Stream Processing / Accelerated DB Operators / Secure Data Processing / Conclusions / Bibliography / Authors' Biographies / Index
ISBN (digital): 9783031018343
ISBN (print): 9783031007064
Privacy preservation has become a major issue in many data analysis applications. When a data set is released to other parties for data analysis, privacy-preserving techniques are often required to reduce the possibility of identifying sensitive information about individuals. For example, in medical data, sensitive information can be the fact that a particular patient has HIV. In spatial data, sensitive information can be a specific location of an individual. In web surfing data, the information that a user browses certain websites may be considered sensitive. Consider a dataset containing some sensitive information that is to be released to the public. In order to protect sensitive information, the simplest solution is not to disclose the information. However, this would be overkill, since it would hinder the analysis of the data, from which we can find interesting patterns. Moreover, in some applications, the data must be disclosed under government regulations. Alternatively, the data owner can first modify the data such that the modified data guarantees privacy and, at the same time, retains sufficient utility and can be released to other parties safely. This process is usually called privacy-preserving data publishing. In this monograph, we study how the data owner can modify the data and how the modified data can preserve privacy and protect sensitive information. Table of Contents: Introduction / Fundamental Concepts / One-Time Data Publishing / Multiple-Time Data Publishing / Graph Data / Other Data Types / Future Research Directions
ISBN (digital): 9783031450433
ISBN (print): 9783031450426; 9783031450457
This book presents a comprehensive overview of Natural Language Interfaces to Databases (NLIDBs), an indispensable tool in the ever-expanding realm of data-driven exploration and decision making. After first demonstrating the importance of the field using an interactive ChatGPT session, the book explores the remarkable progress and general challenges faced in real-world deployment of NLIDBs. It goes on to provide readers with a holistic understanding of the anatomy, essential components, and mechanisms underlying NLIDBs and how to build them. Key concepts in representing, querying, and processing structured data, as well as approaches for optimizing user queries, are established for the reader before their application in NLIDBs is explored. The book discusses text to data through early relevant work on semantic parsing and meaning representation before turning to cutting-edge advancements in how NLIDBs are empowered to comprehend and interpret human languages. It then explores the evaluation methodologies, metrics, datasets, and benchmarks that play a pivotal role in assessing both the effectiveness of mapping natural language queries to formal queries in a database and the overall performance of a system. The book then covers data to text, where formal representations of structured data are transformed into coherent and contextually relevant human-readable narratives. It closes with an exploration of the challenges and opportunities related to interactivity and its corresponding techniques for each dimension, such as instances of conversational NLIDBs and multi-modal NLIDBs where user input goes beyond natural language. This book provides a balanced mixture of theoretical insights, practical knowledge, and real-world applications that will be an invaluable resource for researchers, practitioners, and students eager to explore the fundamental concepts of NLIDBs.
ISBN (digital): 9783031018541
ISBN (print): 9783031007262
The use of logic in databases started in the late 1960s. In the early 1970s Codd formalized databases in terms of the relational calculus and the relational algebra. A major influence on the use of logic in databases was the development of the field of logic programming. Logic provides a convenient formalism for studying classical database problems and has the important property of being declarative, that is, it allows one to express what one wants rather than how to get it. For a long time, relational calculus and algebra were considered the relational database languages. However, there are simple operations, such as computing the transitive closure of a graph, which cannot be expressed with these languages. Datalog is a declarative query language for relational databases based on the logic programming paradigm. One of the peculiarities that distinguishes Datalog from query languages like relational algebra and calculus is recursion, which gives Datalog the capability to express queries like computing the transitive closure of a graph. Recent years have witnessed a revival of interest in Datalog in a variety of emerging application domains such as data integration, information extraction, networking, program analysis, security, cloud computing, ontology reasoning, and many others. The aim of this book is to present the basics of Datalog, some of its extensions, and recent applications to different domains.
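As an illustration of the recursion mentioned in this abstract, the transitive closure of a graph is commonly written in Datalog as two rules, `path(X, Y) :- edge(X, Y).` and `path(X, Z) :- path(X, Y), edge(Y, Z).` The sketch below is not taken from the book; it evaluates those two rules by naive fixpoint iteration in Python, and the names `edge`, `path`, and `transitive_closure` are illustrative assumptions.

```python
def transitive_closure(edges):
    """Naive fixpoint evaluation of the two (assumed) Datalog rules:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).
    """
    edges = set(edges)
    # Base rule: every edge is a path.
    path = set(edges)
    while True:
        # Recursive rule: extend each known path by one edge.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            # No new facts can be derived: the fixpoint is reached.
            return path
        path |= new_facts


if __name__ == "__main__":
    edge = {("a", "b"), ("b", "c"), ("c", "d")}
    for fact in sorted(transitive_closure(edge)):
        print(fact)
```

The loop repeats the recursive rule until no new `path` facts appear, which is the step that plain relational algebra and calculus cannot express for graphs of arbitrary depth.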
ISBN (digital): 9783031018879
ISBN (print): 9783031007590
In data publishing, the owner delegates the role of satisfying user queries to a third-party publisher. As the servers of the publisher may be untrusted or susceptible to attacks, we cannot assume that they will always process queries correctly; hence there is a need for users to authenticate their query answers. This book introduces various notions that the research community has studied for defining the correctness of a query answer. In particular, it is important to guarantee the completeness, authenticity, and minimality of the answer, as well as its freshness. We present authentication mechanisms for a wide variety of queries in the context of relational and spatial databases, text retrieval, and data streams. We also explain the cryptographic protocols from which the authentication mechanisms derive their security properties. Table of Contents: Introduction / Cryptography Foundation / Relational Queries / Spatial Queries / Text Search Queries / Data Streams / Conclusion