Consistency of replicated copies is difficult to maintain and recover during multiple failures of sites and network communication in a distributed database system. Transaction processing must continue as long as a single copy is available. But in a multiple failure environment, each operational site must make correct decisions about which copy to update and which one will be updated by the recovery system. This requires refreshing the copies on failed sites that missed the updates and doing this correctly while other transactions are updating and some more sites are either failing or recovering. This problem has been classified as the "replicated copy control problem." In this paper, we present several ideas that are necessary to attack and manage this problem. We introduce the ideas of session numbers, nominal session vectors, fail locks, and view serializability and discuss their role in transaction processing on operational, recovering, and partitioned sites. We have experimented with many of these ideas in a prototype system called RAID and we present the implementation issues. There is little overhead associated with our approach if no failures occur.
Two strategies for processing transactions during partition failures in distributed databases are reviewed: the optimistic protocol and conservative class conflict graph analysis. Both use graph techniques for detecting and resolving conflicts, although one is "optimistic," detecting and resolving conflict after the failure is repaired, while the other is "conservative," detecting and preventing potential conflicts when the failure occurs. A simulation comparing the two approaches with respect to the cost of missed opportunity, the cost of repair, and overhead cost is presented, along with sample results. The optimistic protocol generally minimizes missed opportunity, while conservative class conflict graph analysis requires less overhead and no repair. The applicability of these approaches to fractured networks involving more than two partitions is also discussed.
This paper describes the techniques used to optimize relational queries in the SDD-1 distributed database system. Queries are submitted to SDD-1 in a high-level procedural language called Datalanguage. Optimization begins by translating each Datalanguage query into a relational calculus form called an envelope, which is essentially an aggregate-free QUEL query. This paper is primarily concerned with the optimization of envelopes. Envelopes are processed in two phases. The first phase executes relational operations at various sites of the distributed database in order to delimit a subset of the database that contains all data relevant to the envelope. This subset is called a reduction of the database. The second phase transmits the reduction to one designated site, and the query is executed locally at that site. The critical optimization problem is to perform the reduction phase efficiently. Success depends on designing a good repertoire of operators to use during this phase, and an effective algorithm for deciding which of these operators to use in processing a given envelope against a given database. The principal reduction operator that we employ is called a semijoin. In this paper we define the semijoin operator, explain why semijoin is an effective reduction operator, and present an algorithm that constructs a cost-effective program of semijoins, given an envelope and a database.
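As a rough illustration of the reduction idea in this abstract, the following minimal Python sketch shows a semijoin over in-memory rows. It is not SDD-1's implementation: the function name `semijoin`, the relations `suppliers` and `shipments`, and the attribute `sno` are all hypothetical, and the "sites" are simulated by plain lists.

```python
def semijoin(r, s, attr):
    """Return the rows of r whose value for attr also appears in s.

    Only the projection of s onto attr is needed, so in a distributed
    setting just that set of values would be shipped to r's site.
    """
    keys = {row[attr] for row in s}  # projection of s onto the join attribute
    return [row for row in r if row[attr] in keys]


# Hypothetical relations, imagined to live at two different sites.
suppliers = [
    {"sno": 1, "city": "Boston"},
    {"sno": 2, "city": "Austin"},
    {"sno": 3, "city": "Denver"},
]
shipments = [
    {"sno": 1, "part": "bolt"},
    {"sno": 3, "part": "nut"},
    {"sno": 3, "part": "gear"},
]

# Reduce suppliers to the rows that can contribute to a join with shipments.
reduced = semijoin(suppliers, shipments, "sno")
```

The reduced relation is never larger than the original, which is why shipping it to the final site can cost less than shipping the whole relation.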
Data mining is the automated extraction of interesting patterns, used to represent knowledge, from large data sets; sometimes, however, these data sets are divided among various parties. Association rule mining is a popular mining technique that identifies interesting correlations between database attributes. This paper proposes a protocol, Privacy Preserving Fast Distributed Mining (PPFDM), for association rule mining in horizontally distributed databases, based on the Fast Distributed Mining (FDM) algorithm. FDM is an unsecured distributed version of the Apriori algorithm designed to generate a small number of candidate sets and considerably cut down the number of messages passed when mining association rules. PPFDM adopts two major ideas: one computes the union of private subsets that each of the interacting players holds, and the other evaluates the inclusion of an element held by one player in a subset held by another. An implementation of a PPDM algorithm is developed in a Java framework, and performance results are presented for synthetic data generation and association rules; indexing is also provided to the user. The protocol is simpler and significantly more efficient in terms of communication rounds, communication cost, and computational cost.
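To make the horizontally distributed setting concrete, here is a small Python sketch of plain, unsecured support counting across sites: each site counts an itemset locally and the counts are summed. It does not implement PPFDM or any of its privacy mechanisms, and it is not the FDM candidate-pruning procedure either; the names `local_support`, `globally_frequent`, `site_a`, and `site_b` are assumptions made for illustration.

```python
def local_support(transactions, itemset):
    """Count the transactions at one site that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def globally_frequent(sites, itemset, min_support):
    """Check whether itemset meets min_support over the union of all sites.

    In a horizontal partition each site holds different transactions with
    the same schema, so global support is the sum of the local counts
    divided by the total number of transactions.
    """
    total = sum(len(site) for site in sites)
    count = sum(local_support(site, itemset) for site in sites)
    return count / total >= min_support


# Hypothetical transaction sets held by two parties.
site_a = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}]
site_b = [{"milk", "bread"}, {"milk", "bread", "eggs"}]

frequent_pair = globally_frequent([site_a, site_b], {"milk", "bread"}, 0.5)
frequent_eggs = globally_frequent([site_a, site_b], {"eggs"}, 0.5)
```

Here the local counts are exchanged in the clear; a privacy-preserving protocol such as the one described above would replace that exchange with secure computations.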
A description is given of how to determine the optimal migration policy for data items in a distributed database. A migration policy determines whether remote execution or migration is used when a remote request on a data item is initiated. A migration policy is modeled using a discrete Markov chain. The states in the Markov chain encode the history of previous requests and determine the probability distribution of the location of future requests. A modified policy iteration procedure is used to determine the optimal policy.
A SOAP/XML method for data exchange between distributed databases is proposed as a solution for communication between client/server-based systems, giving an easy way to exchange data. It is based on the open standards XML and SOAP. An application based on this method is developed and explained.
The authors evaluate and compare the performance of two concurrency control protocols for distributed databases with multiversioned entities, assuming that each transaction incrementally declares its access set from the successive parts of the preordered entities. The first protocol is called protocol proposed (PP). The second is a variant of the protocol proposed by D.P. Reed (1978), here called RP1. Performance results for these protocols are collected using simulations. Key performance issues of PP are studied, and the relative performance of PP and RP1 is compared. Extra memory requirement is the most important cost for PP, while the cost associated with aborting transactions is the most important for RP1. For slow communication networks, at all workloads except some range of low workloads, PP performs better than RP1. For fast networks, from low to very high workloads, RP1 performs better for a range of parameters. At extremely high workloads, both perform poorly, but the higher memory requirement of PP is more tolerable than the high abortion rate of RP1. A protocol similar to PP is proposed that permits universioned entities and so does not have extra memory cost and has the advantages of PP.
A set of algorithms is described that can be used to reduce the complexity of evaluating multiple queries of a transaction in a distributed environment. With the consideration of conjunct sharing, it compiles a set of queries into a network based on the concept of semijoins. As some of the queries in a transaction may change the contents of a database, evaluation of the network corresponding to the transaction is synchronized into several phases so that the dependencies among the queries can be properly captured. It is shown how a transaction that includes database updates can be evaluated incrementally in multiple phases such that the states of the evaluation process can be saved and only part of the transaction which is affected by a change needs to be reevaluated. The algorithms described can be applied to relational databases with slight modifications.
In this paper we have introduced the concept of multi-relation semijoin and we have shown that it is possible to get a substantial reduction in the cost of data communication if a restricted form of this operation, called cyclic multi-relation semijoin, is used for query optimization in distributed databases. We have proposed a heuristic to identify situations where the cyclic multi-relation semijoin operation is likely to be useful. To study the applicability of this operation and to determine how much additional benefit we may expect, we have augmented a well-known semijoin-based algorithm with this operation. We have used simulation studies with a large number of queries, and our experiments indicate that, depending on the characteristics of the database, improvements ranging from 30% to 80% are possible.