A distributed database (DDB) consists of copies of data files (usually redundant) geographically distributed and managed on a computer network. One important problem in DDB research is that of concurrency control. This paper develops a performance model of timestamp-ordering concurrency control algorithms in a DDB. The performance model consists of five components: input data collection, transaction processing model, communication subnetwork model, conflict model, and performance measures estimation. In this paper we describe the conflict model in detail. We first determine the probability of transaction restarts, the probability of transaction blocking, and the delay due to blocking for the basic timestamp-ordering algorithm. We then develop conflict models for variations of the basic algorithm. These conflict models are illustrated by numerical examples.
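The restart rule this conflict model quantifies can be sketched minimally. The following toy implementation of the basic timestamp-ordering check (names such as `Item` are illustrative, not from the paper) shows exactly when a transaction is forced to restart:

```python
# A minimal sketch of the basic timestamp-ordering (T/O) rule the abstract
# models: each data item tracks the largest read/write timestamps seen so
# far, and an operation arriving "too late" forces a restart.

class Item:
    def __init__(self):
        self.read_ts = 0   # largest timestamp of any accepted read
        self.write_ts = 0  # largest timestamp of any accepted write

def read(item, ts):
    """Return True if the read is accepted, False if the transaction must restart."""
    if ts < item.write_ts:              # a younger transaction already wrote
        return False                    # -> restart (read-write conflict)
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """Return True if the write is accepted, False on restart."""
    if ts < item.read_ts or ts < item.write_ts:
        return False                    # a younger reader or writer got there first
    item.write_ts = ts
    return True
```

The probability that these `False` branches are taken, as a function of workload parameters, is what the paper's conflict model estimates.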
Medical sciences are rapidly emerging as a data-rich discipline in which the number of databases and their dimensionality increase exponentially with time. Data integration algorithms often rely upon discovering embedded, useful, and novel relationships between the feature attributes that describe the data. Such algorithms require data integration prior to knowledge discovery, which can compromise the timeliness, scalability, robustness, and reliability of the discovered knowledge. Knowledge integration algorithms offer pattern discovery on segmented and distributed databases but require sophisticated methods for merging patterns and evaluating integration quality. We propose a unique computational framework for discovering and integrating frequent sets of features from distributed databases and then exploiting them for unsupervised learning over the integrated space. Assorted indices of cluster quality are used to assess the accuracy of knowledge merging. The approach preserves significant cluster quality under various cluster distributions and noise conditions. Exhaustive experimentation is performed to further evaluate the scalability and robustness of the proposed methodology.
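One of the cluster-quality indices of the kind used to assess knowledge merging is the Dunn index: the smallest between-cluster gap divided by the largest cluster diameter, so higher values mean better-separated clusters. A hedged sketch for 1-D points (the paper does not specify this particular index; it is shown only as a representative example):

```python
# Dunn index for clusters of 1-D points: min inter-cluster distance
# divided by max intra-cluster diameter. Higher = better separation.

def dunn_index(clusters):
    diam = max(max(c) - min(c) for c in clusters)       # widest cluster
    gaps = [abs(a - b)                                  # all cross-cluster pairs
            for i, ca in enumerate(clusters)
            for cb in clusters[i + 1:]
            for a in ca for b in cb]
    return min(gaps) / diam
```

Comparing such an index before and after merging distributed patterns gives a scalar measure of how much cluster quality the integration preserved.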
This contribution deals with the systematic exploitation of logical reduction techniques for handling big distributed data. The particular applications are views and parallel updates over large-scale distributed databases, as well as the handling of queries over different generations of databases. Logical reduction techniques come in two flavors. The first is syntactically defined translation schemes, which describe transformations of database schemes. They give rise to two induced maps: translations and transductions. Transductions describe the induced transformation of database instances, and translations describe the induced transformations of queries. The second is Feferman-Vaught reductions, which apply to distributed databases. Such a reduction describes how queries over a distributed database can be computed from queries over its components and queries over the index set. Combining and developing these techniques allows us to introduce the notion of strongly distributed databases. For such databases, we extend and generalize the known propagation techniques. The method unifies distributed and parallel computation and communication, and significantly reduces the communication load. The proposed general approach can easily be adapted to other distributed objects and their integration into large-scale systems. Copyright (C) 2015 John Wiley & Sons, Ltd.
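The shape of a Feferman-Vaught style reduction can be illustrated with a toy example (this is only an informal sketch, not the paper's formalism): a Boolean query over a distributed database is answered by evaluating reduced queries at each site and combining the per-site answers with a formula over the index set, so no data ever leaves a site.

```python
# Toy Feferman-Vaught style evaluation: run a reduced query locally at
# each component, then combine the vector of answers over the index set.

def evaluate_distributed(components, local_query, combine):
    answers = [local_query(db) for db in components]  # per-site evaluation
    return combine(answers)                           # index-set evaluation

# Example: "some site stores a value above 100" decomposes into a local
# EXISTS plus a disjunction over the sites.
sites = [[40, 90], [120, 60], [80]]
result = evaluate_distributed(
    sites,
    local_query=lambda db: any(v > 100 for v in db),
    combine=any,
)
```

Only the one-bit answers cross the network, which is the communication-load reduction the abstract refers to.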
The Make 2D-DB tool has been previously developed to help build federated two-dimensional gel electrophoresis (2-DE) databases on one's own web site. The purpose of our work is to extend the strength of the first package and to build a more efficient environment. Such an environment should be able to fulfill the different needs and requirements arising from both the growing use of 2-DE techniques and the increasing amount of distributed experimental data.
ISBN (print): 9781467322850; 9781467322867
Alice and Bob are mutually untrusting curators who possess separate databases containing information about a set of respondents. This data is to be sanitized and published to enable accurate statistical analysis, while retaining the privacy of the individual respondents in the databases. Further, an adversary who looks at the published data must not even be able to compute statistical measures on it. Only an authorized researcher should be able to compute marginal and joint statistics. This work is an attempt toward providing a theoretical formulation of privacy and utility for problems of this type. Privacy of the individual respondents is formulated using epsilon-differential privacy. Privacy of the marginal and joint statistics on the distributed databases is formulated using a new model called delta-distributional epsilon-differential privacy. Finally, a constructive scheme based on randomized response is presented as an example mechanism that satisfies the formulated privacy requirements.
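The randomized-response mechanism the construction builds on is the textbook one: each respondent reports their true bit with probability e^ε/(1+e^ε), which satisfies ε-differential privacy, and an authorized analyst inverts the known flip probability to debias the aggregate. A minimal sketch (the paper's full two-curator scheme is richer than this):

```python
# Randomized response: flip the true bit with probability 1/(1+e^eps).
# The ratio of report probabilities is exactly e^eps, giving eps-DP.

import math
import random

def randomized_response(bit, eps, rng=random):
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if rng.random() < p_truth else 1 - bit

def debias_mean(noisy_bits, eps):
    """Invert the known flip probability to estimate the true proportion of 1s."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    observed = sum(noisy_bits) / len(noisy_bits)
    return (observed + p - 1.0) / (2.0 * p - 1.0)
```

Note the utility/privacy trade-off the abstract formalizes: smaller ε flips more bits, so the analyst needs more respondents for the same estimation accuracy.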
ISBN (print): 9781509055692
The Internet of Things (IoT) era envisions billions of interconnected devices capable of providing new interactions between the physical and digital worlds, offering a new range of content and services. At the fundamental level, IoT nodes are physical devices that exist in the real world, consisting of networking, sensor, and processing components. Some application examples include mobile and pervasive computing or sensor networks, which require distributed device deployments that feed information into databases for exploitation. While the data can be centralized, there are advantages, such as system resiliency and security, to adopting a decentralized architecture that pushes computation and storage to the network edge and onto the IoT devices themselves. However, these devices tend to be much more limited in computational power than traditional racked servers. This research explores running the Cassandra distributed database on IoT-representative device specifications. Experiments, conducted on both virtual machines and Raspberry Pis to simulate IoT devices, examined latency with network compression, processing workloads, and various memory and node configurations in laboratory settings. We demonstrate that distributed databases are feasible on Raspberry Pis as IoT-representative devices and report findings that may help in application design.
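One piece of sizing arithmetic relevant to such node-configuration experiments is Cassandra's quorum rule: with QUORUM reads and writes, a replication factor RF requires floor(RF/2)+1 live replicas, so it tolerates the loss of floor((RF-1)/2) of them. A small illustrative helper (not part of any Cassandra driver) makes the rule concrete:

```python
# Cassandra QUORUM arithmetic: quorum = floor(RF/2) + 1 replicas must
# respond, so RF - quorum replicas may be down and operations still succeed.

def quorum(rf):
    return rf // 2 + 1

def tolerated_failures(rf):
    """Replicas that may be unavailable while QUORUM operations still succeed."""
    return rf - quorum(rf)
```

On a cluster of resource-limited devices such as Raspberry Pis, this determines how many node failures a given replication factor survives, which bears directly on the resiliency advantage the abstract cites for decentralized deployment.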
ISBN (print): 0769517609
Integrity constraints represent knowledge about data with which a database must be consistent. Checking constraints to ensure that update operations or transactions which alter the database will preserve its consistency has proved extremely difficult to implement efficiently, particularly in a distributed environment. In the literature, most approaches proposed for deriving a good set of integrity constraints concentrate on deriving simplified forms of the constraints by analyzing both the syntax of the constraints and their associated update operations. These methods are based on syntactic criteria, are limited to simple types of integrity constraints, and are able to produce only one integrity test per integrity constraint. In [1], we introduced an integrity constraint subsystem for a relational distributed database. The subsystem consists of several techniques necessary for efficient constraint checking, particularly in a distributed environment where data distribution is transparent to the application domain. However, the technique proposed there for generating integrity tests is limited to several types of integrity constraints, namely domain, key, referential, and simple general semantic constraints, and produces only two integrity tests (global and local) for a given integrity constraint. In this paper, we present a technique for deriving several integrity tests for a given integrity constraint, covering both static and transition constraints.
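The core idea of deriving a simplified integrity test can be shown with the referential constraint "every employee's department appears in Dept" (the relation and attribute names here are invented for illustration): the full global test rescans the whole relation on every update, while the test derived for an INSERT only needs to check the new tuple.

```python
# Illustrative full vs. derived integrity tests for a referential constraint.

def global_test(employees, depts):
    """Full test: every stored employee references an existing department."""
    return all(e["dept"] in depts for e in employees)

def insert_test(new_tuple, depts):
    """Derived test for INSERT: if the database was consistent before,
    consistency is preserved iff the single new tuple passes."""
    return new_tuple["dept"] in depts
```

In a distributed setting the difference matters even more: the derived test may be evaluable entirely at the local site, avoiding the remote access the global test would require.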
ISBN (print): 9781479989348
Nowadays, distributed relational databases constitute a large part of the information storage handled by a variety of users. Knowledge extraction from these databases has been studied extensively over the last decade. However, a problem still present in the distributed data mining process is the communication cost between the different parts of a database naturally located at remote sites. We present in this paper a decision tree classification approach with a low-cost communication strategy that uses a set of the most useful inter-base links for the classification task. Experiments conducted on real datasets showed a significant reduction in communication costs and an accuracy almost identical to that of some traditional approaches.
ISBN (print): 9781467329255
In the emerging networked environment, we encounter situations in which databases residing at geographically distinct sites must collaborate to analyze their data together. But due to the large sizes of the datasets, it is neither feasible nor safe to transport them across the network to a common server. We need algorithms that can process the databases at their own locations, exchanging only the needed information among them, and obtain the same results that would have been obtained if the databases were merged. In this paper we present an algorithm for mining association rules from distributed databases by exchanging only the needed summaries among them.
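The summary-exchange idea rests on a simple observation: an itemset's global support count is the sum of its local counts, so sites need only exchange counts, never rows, to decide global frequency. A minimal sketch (the paper's algorithm additionally prunes candidates more carefully than this):

```python
# Decide global frequency from per-site support counts alone: summed local
# counts over the total row count give the same answer as mining the
# merged database, without moving any data.

def globally_frequent(local_counts, local_sizes, min_support):
    total = sum(local_sizes)               # total rows across all sites
    summed = {}
    for counts in local_counts:            # one {itemset: count} dict per site
        for itemset, c in counts.items():
            summed[itemset] = summed.get(itemset, 0) + c
    return {i for i, c in summed.items() if c / total >= min_support}
```

Each site ships a dictionary of counts, which is orders of magnitude smaller than its dataset, giving the same frequent-itemset result as the merged database.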
ISBN (print): 9781479984480
This contribution deals with the systematic exploitation of logical reduction techniques for databases. The particular applications are views and updates over distributed databases. Logical reduction techniques come in two flavors. The first is syntactically defined translation schemes, which describe transformations of database schemes. They give rise to two induced maps: translations and transductions. Transductions describe the induced transformation of database instances, and translations describe the induced transformations of queries. The second is Feferman-Vaught reductions, which apply in situations where a relational structure is pieced together from a set of substructures. The reduction describes how queries over the structure can be computed from queries over the components and queries over the index set. Combining and developing these techniques allows us to generalize the propagation technique for relational algebra and the incremental re-computation technique for some kinds of Datalog programs to cases of definable sets of tuples to be deleted or inserted.