ISBN (print): 9781467322850; 9781467322867
Alice and Bob are mutually untrusting curators who possess separate databases containing information about a set of respondents. This data is to be sanitized and published to enable accurate statistical analysis, while retaining the privacy of the individual respondents in the databases. Further, an adversary who looks at the published data must not even be able to compute statistical measures on it. Only an authorized researcher should be able to compute marginal and joint statistics. This work is an attempt toward providing a theoretical formulation of privacy and utility for problems of this type. Privacy of the individual respondents is formulated using epsilon-differential privacy. Privacy of the marginal and joint statistics on the distributed databases is formulated using a new model called delta-distributional epsilon-differential privacy. Finally, a constructive scheme based on randomized response is presented as an example mechanism that satisfies the formulated privacy requirements.
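The constructive scheme itself is not reproduced in the abstract, but the classical randomized-response mechanism it builds on is easy to sketch. Below is a minimal Python illustration for a single binary attribute per respondent; the function names and the debiasing estimator are our own, not the paper's.

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps) and the
    flipped bit otherwise; for one binary attribute this satisfies
    epsilon-differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else 1 - true_bit

def estimate_proportion(noisy_bits, epsilon: float) -> float:
    """Unbiased estimate of the true proportion of 1s from the noisy
    responses, inverting the known flipping probability."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(noisy_bits) / len(noisy_bits)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

Each response flips with a known probability, so the published bits are epsilon-differentially private, while a researcher who knows epsilon can still recover accurate population statistics.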
ISBN (print): 0769517609
Integrity constraints represent knowledge about data with which a database must be consistent. The process of checking constraints, to ensure that update operations or transactions which alter the database will preserve its consistency, has proved extremely difficult to implement efficiently, particularly in a distributed environment. In the literature, most of the approaches proposed for deriving a good set of integrity constraints concentrate on deriving simplified forms of the constraints by analyzing both the syntax of the constraints and their appropriate update operations. These methods are based on syntactic criteria and are limited to simple types of integrity constraints. Also, these methods are only able to produce one integrity test for each integrity constraint. In [1], we introduced an integrity constraint subsystem for a relational distributed database. The subsystem consists of several techniques necessary for efficient constraint checking, particularly in a distributed environment where data distribution is transparent to the application domain. However, the technique proposed for generating integrity tests is limited to several types of integrity constraints, namely domain, key, referential, and simple general semantic constraints, and produces only two integrity tests (global and local) for a given integrity constraint. In this paper, we present a technique for deriving several integrity tests for a given integrity constraint, considering both static and transition constraints.
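The abstract does not give the derivation technique itself; as a rough illustration of producing several tests for one constraint, the hypothetical sketch below emits a complete (global) test and a sufficient (local) test for a referential constraint triggered by an insertion. The SQL shapes and names are our assumptions, not the authors' method.

```python
def referential_tests(child, fk_col, parent, pk_col, new_value):
    """Derive two candidate integrity tests for the constraint
    child.fk_col REFERENCES parent.pk_col when new_value is inserted
    into child. If either test succeeds, the update preserves consistency."""
    # Complete (global) test: the value must exist in the parent relation,
    # which may require contacting a remote site.
    global_test = f"SELECT 1 FROM {parent} WHERE {pk_col} = {new_value!r}"
    # Sufficient (local) test: if the value already occurs in the local
    # fragment of child, a matching parent tuple must already exist, so
    # no remote access is needed.
    local_test = f"SELECT 1 FROM {child} WHERE {fk_col} = {new_value!r}"
    return {"global": global_test, "local": local_test}
```

The local test is cheaper but only sufficient; a checker would try it first and fall back to the global test when it fails.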
This paper presents the reliability mechanisms of SDD-1, a prototype distributed database system being developed by the Computer Corporation of America. Reliability algorithms in SDD-1 center around the concept of the...
ISBN (print): 9786197105100
In this paper we present a horizontal fragmentation algorithm for the design phase of a distributed database. Our results have been implemented on a university database application. We propose a matrix of attribute values that is used by the database administrator in the requirement analysis phase of the system development life cycle to decide how data is mapped to different locations. We call this matrix SIDU (S - Select, I - Insert, D - Delete, and U - Update). It is a table constructed by placing the predicates on the attributes of a relation as the rows and the applications at the sites of a DDBMS as the columns. We use SIDU to generate an ACF array with values for each relation. We treat cost as the effort of access and modification of a particular attribute of a relation by an application from a particular site.
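As a rough sketch of how such a matrix might drive the allocation decision (the operation weights and names below are our assumptions, not values from the paper):

```python
# Illustrative operation weights (assumed): modifications cost more than reads.
OP_COST = {"S": 1, "I": 2, "D": 2, "U": 3}

def acf_and_allocation(sidu):
    """sidu maps predicate -> {site: "SIDU ops string"}; returns an
    ACF-style cost array and a naive allocation of each predicate's
    fragment to the site that accesses it most heavily."""
    acf = {pred: {site: sum(OP_COST[op] for op in ops)
                  for site, ops in by_site.items()}
           for pred, by_site in sidu.items()}
    allocation = {pred: max(costs, key=costs.get)
                  for pred, costs in acf.items()}
    return acf, allocation

# Example: two predicates over three sites.
sidu = {"age < 30":  {"site1": "SSU", "site2": "S",   "site3": ""},
        "age >= 30": {"site1": "",    "site2": "SIU", "site3": "SS"}}
acf, alloc = acf_and_allocation(sidu)  # alloc: {'age < 30': 'site1', 'age >= 30': 'site2'}
```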
This paper focuses on the fragment allocation problem in distributed databases and proposes an approach that minimizes query splitting. Query splitting occurs when a query has to access multiple servers to retrieve the fragments it needs, resulting in reduced system performance. The objective of minimizing query splitting is important because it captures many factors that affect the performance of the database, such as reducing response time and cost. The paper presents a column generation-based algorithm to solve the fragment allocation problem, which requires less fine-tuning of its parameters and outperforms the IP approach implemented by CPLEX in terms of the number of queries split and execution time. The approach and algorithm offer practical solutions to optimize the design of a distributed database system. The paper's contribution is significant as it fills the gap in the literature by offering a novel approach that minimizes query splitting, which can serve as a proxy for achieving a combination of other objectives such as minimizing costs, reducing response time, and balancing server workloads.
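The column-generation algorithm cannot be reproduced from the abstract, but the objective it optimizes is easy to state in code. The greedy baseline below is our own illustrative construction, not the paper's method: it tries to land each query's fragments on a single server and then counts the queries that still split.

```python
def greedy_allocate(queries, n_servers, capacity):
    """queries: list of sets of fragment ids each query must read.
    Returns (fragment -> server map, number of queries left split)."""
    servers = [set() for _ in range(n_servers)]
    placed = {}

    def fits(i, frags):
        # Server i can host the whole query if none of its fragments is
        # already pinned elsewhere and it has room for the unplaced ones.
        pinned_elsewhere = any(placed.get(f, i) != i for f in frags)
        new = [f for f in frags if f not in placed]
        return not pinned_elsewhere and len(servers[i]) + len(new) <= capacity

    for frags in sorted(queries, key=len, reverse=True):
        ok = [i for i in range(n_servers) if fits(i, frags)]
        if ok:
            # Prefer the server already holding most of this query.
            target = max(ok, key=lambda i: len(servers[i] & frags))
            for f in frags:
                if f not in placed:
                    placed[f] = target
                    servers[target].add(f)
        else:
            # No server can host the whole query: spread the leftovers
            # over the least-loaded servers (capacity treated as soft here).
            for f in frags:
                if f not in placed:
                    target = min(range(n_servers), key=lambda i: len(servers[i]))
                    placed[f] = target
                    servers[target].add(f)

    split = sum(1 for frags in queries
                if len({placed[f] for f in frags}) > 1)
    return placed, split
```

An exact approach such as the paper's column-generation formulation would search over whole allocation patterns rather than placing queries one at a time.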
Author: Vlach, R., Charles Univ, Fac Math & Phys, Dept Software Engn, 118 00 Prague 1, Czech Republic
ISBN (print): 0769508197
Mobile agent technology raises a new dimension in distributed database processing. Of interest in this paper are mobile procedures querying multiple databases distributed over a network. In the execution model used, mobile execution can profit both from migration, reducing the amount of transmitted data, and from data prefetching at the most beneficial site. To achieve the lowest response time, an execution strategy should suggest an appropriate mix of agent migration, remote database access, and data prefetching. Since there is no universal strategy suggesting the optimal execution in all cases, we must resign ourselves to a possibly suboptimal solution. The main achievement of this paper consists in proposing four dynamic execution strategies of different implementation complexity and with different behavior under various conditions. The strategies were tested on the real Internet, and their performance is compared to each other and to a classical centralized stationary approach.
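The four strategies themselves are not detailed in the abstract, but the trade-off they navigate can be shown with a naive transfer-time model; the names and the cost model below are assumptions, not the paper's.

```python
def choose_execution(agent_kb, query_kb, result_kb, bandwidth_kbps):
    """Pick the cheaper of two options for one remote database under a
    naive transfer-time model: ship the query there and haul the result
    back ('remote'), or move the agent code and state to the data
    ('migrate'), after which results are consumed locally."""
    remote_cost = (query_kb + result_kb) / bandwidth_kbps
    migrate_cost = agent_kb / bandwidth_kbps
    return "migrate" if migrate_cost < remote_cost else "remote"

# A large intermediate result favors migrating the agent to the data:
choose_execution(agent_kb=200, query_kb=1, result_kb=5000,
                 bandwidth_kbps=512)   # -> 'migrate'
```

A dynamic strategy would re-evaluate such estimates at run time, per site, rather than fixing the plan up front.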
ISBN (print): 9781467300421
The standard way to scale a distributed OLTP DBMS is to horizontally partition data across several nodes. Ideally, this results in each query/transaction being executed at just one node, to avoid the overhead of distribution and allow the system to scale by adding nodes. For some applications, simple strategies such as hashing on the primary key provide this property. Unfortunately, for many applications, including social networking and order fulfillment, simple partitioning schemes applied to many-to-many relationships create a large fraction of distributed queries/transactions. What is needed is a fine-grained partitioning, where related individual tuples (e.g., cliques of friends) are co-located in the same partition. Maintaining a fine-grained partitioning requires storing the location of each tuple. We call this metadata a lookup table. We present a design that efficiently stores very large lookup tables and maintains them as the database is modified. We show they improve scalability for several difficult-to-partition database workloads, including Wikipedia, Twitter, and TPC-E. Our implementation provides 40% to 300% better throughput on these workloads than simple range or hash partitioning.
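A minimal in-memory sketch of the lookup-table idea follows (assumed names; the paper's design, which compresses and maintains very large tables, is far more involved): tuples default to hash partitioning, and only tuples that must be co-located get explicit entries.

```python
class LookupTable:
    """Maps tuple keys to partition ids; keys without an entry fall back
    to hash partitioning, so only co-located groups need metadata."""
    def __init__(self, n_partitions: int):
        self.n_partitions = n_partitions
        self.overrides = {}

    def locate(self, key) -> int:
        return self.overrides.get(key, hash(key) % self.n_partitions)

    def colocate(self, keys, partition: int) -> None:
        """Pin a related group (e.g., a clique of friends) to one partition."""
        for k in keys:
            self.overrides[k] = partition

router = LookupTable(n_partitions=16)
router.colocate(["alice", "bob", "carol"], partition=3)
router.locate("alice")   # -> 3; unknown keys hash to a default partition
```

Routing every query through such a table is what lets a friend clique's reads and writes stay single-partition even though hashing would scatter them.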
A locking protocol to coordinate access to a distributed database and to maintain system consistency throughout normal and abnormal conditions is presented. The proposed protocol is robust in the face of crashes of any participating site, as well as communication failures. Recovery from any number of failures during normal operation or any of the recovery stages is supported. Recovery is done in such a way that maximum forward progress is achieved by the recovery procedures. Integration of virtually any locking discipline including predicate lock methods is permitted by this protocol. The locking algorithm operates, and operates correctly, when the network is partitioned, either intentionally or by failure of communication lines. Each partition is able to continue with work local to it, and operation merges gracefully when the partitions are reconnected. A subroutine of the protocol, that assures reliable communication among sites, is shown to have better performance than two-phase commit methods. For many topologies of interest, the delay introduced by the overall protocol is not a direct function of the size of the network. The communications cost is shown to grow in a relatively slow, linear fashion with the number of sites participating in the transaction. An informal proof of the correctness of the algorithm is also presented in this paper. The algorithm has as its core a centralized locking protocol with distributed recovery procedures. A centralized controller with local appendages at each site coordinates all resource control, with requests initiated by application programs at any site. However, no site experiences undue load. Recovery is broken down into three disjoint mechanisms: for single node recovery, merge of partitions, and reconstruction of the centralized controller and tables. The disjointness of the mechanisms contributes to comprehensibility and ease of proof. The paper concludes with a proposal for an extension aimed at optimizing operation of
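The core of the protocol is a centralized locking discipline with distributed recovery. A toy single-process sketch of such a centralized controller is given below; recovery, partition merging, and controller reconstruction are omitted, and all names are ours.

```python
import threading
from collections import defaultdict, deque

class CentralLockController:
    """Toy centralized lock manager: shared ('S') and exclusive ('X')
    locks with FIFO waiters. Failures and recovery are not modeled."""
    def __init__(self):
        self._guard = threading.Lock()
        self.holders = defaultdict(set)    # resource -> holder txn ids
        self.mode = {}                     # resource -> 'S' or 'X'
        self.waiters = defaultdict(deque)  # resource -> queued (txn, mode)

    def acquire(self, txn, resource, mode) -> bool:
        with self._guard:
            free = not self.holders[resource]
            share = (mode == 'S' and self.mode.get(resource) == 'S'
                     and not self.waiters[resource])   # don't starve writers
            if free or share:
                self.holders[resource].add(txn)
                self.mode[resource] = mode
                return True
            self.waiters[resource].append((txn, mode))
            return False   # caller blocks and retries when notified

    def release(self, txn, resource) -> None:
        with self._guard:
            self.holders[resource].discard(txn)
            if not self.holders[resource] and self.waiters[resource]:
                nxt, nmode = self.waiters[resource].popleft()
                self.holders[resource].add(nxt)
                self.mode[resource] = nmode
```

In the paper's setting this controller has local appendages at each site and is itself reconstructible after failures; the sketch only shows the lock-granting core.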
ISBN (print): 9781728103396
There are several companies that provide their own hardware and software solutions for the collection and storage of telemetry data from transport. However, how can one be sure that the system stores correct data? There is always the possibility that employees of the telemetry data Operator are bribed. Here, the use of distributed databases is a reliable solution: telemetry data from devices goes straight into the database and can no longer be changed. When a private distributed database is used within the data Operator's company, however, the possibility of various manipulations of the data remains. To solve this problem, it is proposed to use the public distributed database Emercoin, one of whose functional features is NVS (Name-Value Storage): the ability to write NVS records of arbitrary content, up to 20 KB, into the database.
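As an illustration only: writing a telemetry digest into NVS typically goes through the node's JSON-RPC interface. The endpoint, credentials, record-naming scheme, and the exact name_new parameters below are assumptions to be checked against the Emercoin documentation.

```python
import json
import requests  # third-party HTTP client (pip install requests)

RPC_URL = "http://127.0.0.1:6662"        # assumed local emercoind RPC endpoint
RPC_AUTH = ("rpcuser", "rpcpassword")    # placeholder credentials

def publish_telemetry(vehicle_id: str, record: dict, days: int = 365):
    """Write one telemetry record into Emercoin NVS. The name scheme and
    the [name, value, days] argument order are assumptions; NVS values
    are limited to roughly 20 KB."""
    name = f"telemetry:{vehicle_id}:{record['ts']}"
    payload = {"method": "name_new",
               "params": [name, json.dumps(record), days],
               "id": 1}
    resp = requests.post(RPC_URL, json=payload, auth=RPC_AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()["result"]
```

Once the record is confirmed on the public chain, neither the device vendor nor the Operator's staff can silently rewrite it.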
ISBN (print): 9781467329255
In the emerging networked environment we encounter situations in which databases residing at geographically distinct sites must collaborate with each other to analyze their data together. Due to the large sizes of the datasets, it is neither feasible nor safe to transport them across the network to some common server. We need algorithms that can process the databases at their own locations, exchanging only the needed information among them, and obtain the same results that would have been obtained if the databases were merged. In this paper we present an algorithm for mining association rules from distributed databases by exchanging only the needed summaries among them.
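The abstract does not specify which summaries are exchanged; the sketch below illustrates the general count-distribution idea under assumed names: each site counts candidate itemsets over its own data, only those counts cross the network, and summing them yields the same frequent itemsets as mining the merged database.

```python
from collections import Counter
from itertools import combinations

def local_counts(transactions, k):
    """One site's summary: support counts of all k-itemsets in its data."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), k))
    return counts

def global_frequent(site_summaries, total_rows, min_support):
    """Merge the per-site summaries; an itemset is globally frequent
    exactly when its summed count clears the support threshold."""
    merged = Counter()
    for summary in site_summaries:
        merged.update(summary)
    return {iset: n for iset, n in merged.items()
            if n / total_rows >= min_support}

# Two sites, each mining locally and shipping only the counts:
s1 = local_counts([{"a", "b"}, {"a", "c"}], k=2)
s2 = local_counts([{"a", "b"}, {"b", "c"}], k=2)
global_frequent([s1, s2], total_rows=4, min_support=0.5)  # {('a', 'b'): 2}
```

Only the Counter objects leave each site, so the raw tuples never cross the network, yet the result matches mining the union of the databases.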