ISBN (digital): 9783031271779
ISBN (print): 9783031271762; 9783031271793
This book examines the recent trend of extending data dependencies to adapt to rich data types in order to address variety and veracity issues in big data. Readers will be guided through the full range of rich data types where data dependencies have been successfully applied, including categorical data with equality relationships, heterogeneous data with similarity relationships, numerical data with order relationships, sequential data with timestamps, and graph data with complicated structures. The text also discusses the constraints on ordering or similarity relationships found in novel classes of data dependencies, in addition to the constraints on equality relationships such as those considered in functional dependencies (FDs). Beyond exploring the concepts behind these data dependency notations, the book investigates the extension relationships between data dependencies, such as conditional functional dependencies (CFDs), which extend conventional FDs. The result is a family tree of extensions, mostly rooted in FDs, that helps illuminate the expressive power of various data dependencies. Moreover, the book points to work on the discovery of dependencies from data, since data dependencies are unlikely to be specified manually in the traditional way, given the huge volume and high variety of big data. It further outlines the applications of the extended data dependencies, in particular in data quality practice. Altogether, this book provides a comprehensive guide for readers to select data dependencies for their applications that have sufficient expressive power and reasonable discovery cost. Finally, the book concludes with several directions for future studies on emerging data.
ISBN (digital): 9783031018428
ISBN (print): 9783031007149
One of the application areas of data mining is the World Wide Web (WWW or Web), which serves as a huge, widely distributed, global information service for every kind of information such as news, advertisements, consumer information, financial management, education, government, e-commerce, health services, and many other information services. The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing sources for data mining. The amount of information on the Web is growing rapidly, as are the number of Web sites and the number of Web pages per site. Consequently, it has become more difficult for Web users to find relevant and useful information. Web usage mining is concerned with guiding Web users to discover useful knowledge and supporting them in decision-making. In that context, predicting the needs of a Web user as she visits Web sites has gained importance. The requirement for predicting user needs in order to guide the user in a Web site and improve the usability of the Web site can be addressed by recommending pages to the user that are related to the interest of the user at that time. This monograph gives an overview of the research in the area of discovering and modeling the users' interest in order to recommend related Web pages. The Web page recommender systems studied in this monograph are categorized according to the data mining algorithms they use for recommendation. Table of Contents: Introduction to Web Page Recommender Systems / Preprocessing for Web Page Recommender Models / Pattern Extraction / Evaluation Metrics
ISBN (digital): 9783031018978
ISBN (print): 9783031007699
Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges in maintaining large data warehouses is ensuring that the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that can be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing, in which a basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts that leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
ISBN (digital): 9783031018558
ISBN (print): 9783031007279
On the Web, a massive amount of user-generated content is available through various channels (e.g., texts, tweets, Web tables, databases, multimedia-sharing platforms, etc.). Conflicting information, rumors, and erroneous or fake content can easily spread across multiple sources, making it hard to distinguish between what is true and what is not. This book gives an overview of fundamental issues and recent contributions for ascertaining the veracity of data in the era of Big Data. The text is organized into six chapters, focusing on structured data extracted from texts. Chapter 1 introduces the problem of ascertaining the veracity of data in a multi-source and evolving context. Issues related to information extraction are presented in Chapter 2. Current truth discovery computation algorithms are presented in detail in Chapter 3, followed by practical techniques for evaluating data source reputation and authoritativeness in Chapter 4. The theoretical foundations and various approaches for modeling the diffusion of misinformation in networked systems are studied in Chapter 5. Finally, truth discovery computation from extracted data in a dynamic context of misinformation propagation raises interesting challenges that are explored in Chapter 6. This text is intended for a seminar course at the graduate level. It is also intended to serve as a useful resource for researchers and practitioners who are interested in the study of fact-checking, truth discovery, or rumor spreading.
ISBN (digital): 9783031578779
ISBN (print): 9783031578762
This book offers a comprehensive resource on Solid-State Drives (SSDs) as the field undergoes a radical evolution characterized by the incredible variety of SSD forms and their rapid diversification. It proposes a new classification system to help readers navigate the SSD landscape. For years, the evolution of SSDs was obscured by the unchanging abstractions of block devices and POSIX I/O, but it is apparent that these abstractions have become a problematic hindrance to performance and also fail to reduce software complexity. The book explores how such a state of affairs impacts the database community in at least two ways. First, it considers how using SSDs through legacy interfaces that hide internal mechanisms invariably results in erratic performance. While the blame often goes to the notoriously expensive garbage collection of SSDs, the authors argue that, in truth, several other complex processes result in nonlinear effects on latency and bandwidth. The book describes these processes and how they are implemented in modern devices, knowledge that will help system designers better choose SSDs and shape database workloads to match their performance characteristics. Second, the book explores how the inadequacy of the traditional I/O abstractions opens up an entire research field focused on the co-design of database management systems and SSDs. Such research aims at devising mechanisms and policies that couple a database's storage manager with SSD internals, e.g., placing an SSD FTL under the control of the database, changing SSD sub-systems in response to the workload, or executing logic within an SSD on a database’s behalf. The book introduces these principles of DBMS/SSD co-design and argues that a more seamless integration of databases and storage solutions, as well as the study of SSD variations adapted to database computations, are central to the development of the next generation of database systems.
ISBN (digital): 9783031018367
ISBN (print): 9783031007088
Access control is one of the fundamental services that any data management system should provide. Its main goal is to protect data from unauthorized read and write operations. This is particularly crucial in today's open and interconnected world, where any kind of information can easily be made available to a huge user population, and where damage to or misuse of data may have unpredictable consequences that go beyond the boundaries where data reside or have been generated. This book provides an overview of the various developments in access control for data management systems. Discretionary, mandatory, and role-based access control are discussed by surveying the most relevant proposals and analyzing the benefits and drawbacks of each paradigm in view of the requirements of different application domains. Access control mechanisms provided by commercial data management systems are presented and discussed. Finally, the last part of the book is devoted to a discussion of some of the most challenging and innovative research trends in the area of access control, such as those related to the Web 2.0 revolution or to the Database as a Service paradigm. This book is a valuable reference for a heterogeneous audience. It can be used either as an extended survey for people who are interested in access control or as a reference book for senior undergraduate or graduate courses in data security with a special focus on access control. It is also useful for technologists, researchers, managers, and developers who want to know more about access control and related emerging trends. Table of Contents: Access Control: Basic Concepts / Discretionary Access Control for Relational Data Management Systems / Discretionary Access Control for Advanced Data Models / Mandatory Access Control / Role-based Access Control / Emerging Trends in Access Control
ISBN (digital): 9783031018411
ISBN (print): 9783031007132
The present book's subject is multidimensional data models and data modeling concepts as they are applied in real data warehouses. The book aims to present the most important concepts within this subject in a precise and understandable manner. The book's coverage of fundamental concepts includes data cubes and their elements, such as dimensions, facts, and measures and their representation in a relational setting; it includes architecture-related concepts; and it includes the querying of multidimensional databases. The book also covers advanced multidimensional concepts that are considered to be particularly important. This coverage includes advanced dimension-related concepts such as slowly changing dimensions, degenerate and junk dimensions, outriggers, parent-child hierarchies, and unbalanced, non-covering, and non-strict hierarchies. The book offers a principled overview of key implementation techniques that are particularly important to multidimensional databases, including materialized views, bitmap indices, join indices, and star join processing. The book ends with a chapter that presents the literature on which the book is based and offers further readings for those readers who wish to engage in more in-depth study of specific aspects of the book's subject. Table of Contents: Introduction / Fundamental Concepts / Advanced Concepts / Implementation Issues / Further Readings