A semijoin is a relational operator that reduces a relation by selecting the tuples that match one or more tuples of another relation in the joining domains. Most queries can be evaluated using semijoins. For the class of tree queries, there exist sequences of semijoins that "fully reduce" the database. These sequences delimit exactly the portions of the database needed to answer the query and are called full reducers. This paper extends the results of Bernstein and Goodman [P. A. Bernstein and N. Goodman. SIAM J. Comput. 10(4), 751-771 (1981)], Bernstein and Chiu [P. A. Bernstein and D. W. Chiu, J. ACM 28(1), 25-40 (1981)] and Ullman [J. D. Ullman. Principles of Relational databases (1988)] by constructing a parallel algorithm for a subset of tree queries called chain queries. An efficient parallel algorithm for constructing full reducers for chain queries is presented and analyzed. We claim that, using 4 processors, the full reduction of a chain query can be done in parallel by executing only 2n - 2 semijoins in the time required to evaluate n - 1 semijoins.
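To make the semijoin and full-reducer ideas concrete, here is a minimal sequential sketch for a chain query R1 - R2 - ... - Rn. It assumes relations are modelled as lists of Python dicts, and the names `semijoin` and `full_reduce_chain` are hypothetical. The sketch does a forward sweep of semijoins toward Rn, then a backward sweep toward R1. It is not the paper's 4-processor parallel algorithm.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]  # one tuple, keyed by attribute name (assumed representation)

def semijoin(left: List[Row], right: List[Row], attrs: List[str]) -> List[Row]:
    """Return the tuples of `left` that match at least one tuple of `right` on `attrs`."""
    keys = {tuple(r[a] for a in attrs) for r in right}
    return [r for r in left if tuple(r[a] for a in attrs) in keys]

def full_reduce_chain(rels: List[List[Row]], links: List[List[str]]) -> Tuple[List[List[Row]], int]:
    """Sequential two-pass full reducer for a chain query.

    links[i] lists the join attributes shared by rels[i] and rels[i + 1].
    Returns the reduced relations and the number of semijoins executed.
    """
    rels = [list(r) for r in rels]  # work on copies; inputs are left untouched
    count = 0
    # Forward sweep: R_{i+1} := R_{i+1} semijoin R_i
    for i in range(len(rels) - 1):
        rels[i + 1] = semijoin(rels[i + 1], rels[i], links[i])
        count += 1
    # Backward sweep: R_i := R_i semijoin R_{i+1}
    for i in range(len(rels) - 2, -1, -1):
        rels[i] = semijoin(rels[i], rels[i + 1], links[i])
        count += 1
    return rels, count
```

Each sweep performs n - 1 semijoins, so the sequential program executes 2n - 2 semijoins in total. The abstract claims that, with 4 processors, this work can be completed in the time needed for n - 1 semijoins.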
One feature of cloud storage systems is data fragmentation (or sharding), which allows data to be distributed over multiple servers and subqueries to be run in parallel on the fragments. Flexible query answering, on the other hand, enables a database system to find related information for a user whose original query cannot be answered exactly. Query generalization is a way to implement flexible query answering at the syntax level. In this paper we study a clustering-based fragmentation for the generalization operator Anti-Instantiation, with which related information can be found in distributed data. We use a standard clustering algorithm to derive a semantic fragmentation of the data in the database. The database system uses the derived fragments to support an intelligent flexible query answering mechanism that avoids overgeneralization while supporting data replication in a distributed database system. We show that the data replication problem can be expressed as a special Bin Packing Problem and can hence be solved by an off-the-shelf solver for integer linear programs. We present a prototype system that uses a medical taxonomy to determine similarities between medical expressions.
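The paper solves its replication problem as a special Bin Packing Problem using an ILP solver. As a rough illustration of plain bin packing, the sketch below substitutes the classic first-fit decreasing heuristic, which is not guaranteed to be optimal. It also ignores the paper's replication-specific constraints. Fragment names, sizes, and the uniform server capacity are hypothetical.

```python
from typing import Dict, List

def first_fit_decreasing(sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Pack named fragments into servers (bins) of equal capacity.

    Fragments are placed largest first, each into the first server with room;
    a new server is opened only when no existing one fits.
    """
    bins: List[List[str]] = []
    loads: List[int] = []
    for name in sorted(sizes, key=lambda n: sizes[n], reverse=True):
        size = sizes[name]
        if size > capacity:
            raise ValueError(f"fragment {name!r} exceeds server capacity")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                bins[i].append(name)
                loads[i] += size
                break
        else:  # no existing server had room
            bins.append([name])
            loads.append(size)
    return bins
```

For example, `first_fit_decreasing({"f1": 6, "f2": 5, "f3": 4, "f4": 3, "f5": 2}, 10)` uses two servers: `[["f1", "f3"], ["f2", "f4", "f5"]]`. An ILP formulation, as used in the paper, can prove optimality where this heuristic cannot.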
This study proposes a robust concurrency control scheme that is reliable and easy to implement and extend. It falls under the category of least-cost concurrency control techniques. A centralized certifier design is also proposed; it addresses the difficulties encountered in previously proposed certifier methods. Unlike a common centralized control scheme, the proposed scheme is shown to be immune to failure of the central node, which poses no threat to the system. The scheme is therefore able to reap the advantages of centralized control while incurring the least cost for maintaining high reliability.
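The abstract does not state the certification test its centralized certifier applies. For orientation, here is a generic sketch of what a centralized certifier does, using the common backward-validation rule: a transaction is rejected if any transaction that committed after it began wrote an item it read. The class and field names are hypothetical. The sketch does not model the paper's tolerance of central-node failure.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Txn:
    tid: int
    start: int  # certifier clock value when the transaction began
    read_set: Set[str] = field(default_factory=set)
    write_set: Set[str] = field(default_factory=set)

class Certifier:
    """Centralized certifier using backward validation (generic, not the paper's design)."""

    def __init__(self) -> None:
        self.clock = 0
        self.committed: List[Tuple[int, Set[str]]] = []  # (commit timestamp, write set)

    def begin(self, tid: int) -> Txn:
        return Txn(tid, self.clock)

    def certify(self, txn: Txn) -> bool:
        # Reject if a transaction that committed after txn started wrote something txn read.
        for commit_ts, writes in self.committed:
            if commit_ts > txn.start and writes & txn.read_set:
                return False
        self.clock += 1
        self.committed.append((self.clock, set(txn.write_set)))
        return True
```

Consider two concurrent transactions that both read `x`. If the first one commits a write to `x`, the second fails certification and would have to restart.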
In a distributed relational database, relations are divided into disjoint fragments. These fragments are allocated to different sites in the database using some allocation scheme to improve data retrieval time. Allocation schemes that are not constrained by other features of the DBMS are easier to implement and provide the desired performance (retrieval time). At present, however, allocation schemes are constrained by the assumptions made by existing query processing schemes. Most existing query processing schemes assume a restricted form of fragment allocation. Some assume that each fragment is allocated to only one site, while others assume that the sets of fragments allocated to two different sites are either disjoint or identical. This paper emphasizes the importance of nondisjoint data among sites in a distributed database environment and presents a query processing framework for such an allocation. A number of query processing schemes can be implemented using the framework. This paper also presents a heuristic query processing scheme built on this framework. The heuristics presented here attempt to use the redundant data to eliminate expensive join, I/O, and communication costs.
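The abstract does not give the paper's heuristics. As one hypothetical illustration of how nondisjoint allocation can save work, the sketch below first looks for a single site that holds every fragment a query needs, so the join can run locally with no data shipped. If no such site exists, it falls back to a greedy set cover over sites. The function name and the site/fragment labels are assumptions.

```python
from typing import Dict, List, Set

def choose_sites(needed: Set[str], allocation: Dict[str, Set[str]]) -> List[str]:
    """Pick sites whose (possibly overlapping) fragment sets cover `needed`.

    Prefers one site holding all needed fragments (local join, no shipping);
    otherwise greedily adds the site covering the most still-missing fragments.
    """
    for site in sorted(allocation):
        if needed <= allocation[site]:
            return [site]
    chosen: List[str] = []
    remaining = set(needed)
    while remaining:
        # max() returns the first best site in sorted order, so ties are deterministic
        best = max(sorted(allocation), key=lambda s: len(allocation[s] & remaining))
        gain = allocation[best] & remaining
        if not gain:
            raise ValueError(f"fragments {sorted(remaining)} are not allocated to any site")
        chosen.append(best)
        remaining -= gain
    return chosen
```

With `{"S1": {"F1", "F2"}, "S2": {"F2", "F3"}, "S3": {"F1", "F2", "F3"}}` and needed fragments `{"F1", "F3"}`, the sketch picks `["S3"]` alone. Without `S3`, it falls back to `["S1", "S2"]`.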
ISBN: (print) 9788132225171; 9788132225164
Entity Resolution (ER) is the task of identifying records that refer to the same real-world entity; it is also known as data object matching or deduplication. It has been a leading research topic in the field of structured databases. Because of its significance, entity resolution remains an important challenge for heterogeneous distributed databases. Several methods have been proposed for entity resolution, but they have yielded unsatisfactory results. In this paper, we propose an efficient integrated solution to the entity resolution problem in heterogeneous distributed databases that combines Markov logic with the Jaccard similarity coefficient. The implemented approach achieves an overall success rate of about 98%, outperforming the previously implemented algorithms.
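The Jaccard similarity coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. Here is a minimal sketch of Jaccard-based record matching, assuming records are compared as sets of lower-cased whitespace tokens with a hypothetical threshold of 0.5. The Markov logic component of the paper's approach is not shown.

```python
from typing import List, Set, Tuple

def tokens(record: str) -> Set[str]:
    """Lower-cased whitespace tokens of a record (assumed tokenization)."""
    return set(record.lower().split())

def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def match_pairs(left: List[str], right: List[str],
                threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Index pairs (i, j, score) whose token Jaccard similarity is at least `threshold`."""
    out: List[Tuple[int, int, float]] = []
    for i, l in enumerate(left):
        for j, r in enumerate(right):
            score = jaccard(tokens(l), tokens(r))
            if score >= threshold:
                out.append((i, j, score))
    return out
```

Take `"John Smith 42 Main St"` and `"john smith 42 main street"`. They share 4 of 6 distinct tokens, so their score is 2/3 and they are reported as a likely match.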
ISBN: (print) 9783319198576; 9783319198569
Nowadays, with the growth of data and its geographical distribution, Distributed Database Management Systems (DDBMS) have undoubtedly become a necessity for Information Systems (IS) users. Unfortunately, query optimization remains a handicap for existing DDBMS, given the high cost of the network traffic caused by accessing data distributed across geographically separate sites. To remedy this problem, we propose a new, effective approach to querying a distributed database (DDB), based on identifying the sites relevant to a query given the fragmentation and/or duplication of the distributed data. This approach minimizes the volume of data transferred over the network and consequently reduces the query execution cost. The approach has been validated by implementing an "effective-query" layer on the Oracle DDBMS.
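Here is a small sketch of relevant-site selection. It assumes horizontal range fragmentation on a single attribute plus replication, and it is not the paper's "effective-query" layer. Fragments whose range cannot overlap the query predicate are pruned. For each remaining fragment, one replica is chosen, preferring the local site. All class, function, fragment, and site names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Fragment:
    name: str
    lo: int  # inclusive lower bound of the fragmentation attribute
    hi: int  # exclusive upper bound

def relevant_sites(query: Tuple[int, int], fragments: List[Fragment],
                   replicas: Dict[str, List[str]], local: str) -> Dict[str, str]:
    """Map each fragment that may hold rows in [q_lo, q_hi) to the site it is read from.

    Non-overlapping fragments are skipped entirely; among a fragment's replicas the
    local site is preferred, otherwise the alphabetically first site is used.
    """
    q_lo, q_hi = query
    plan: Dict[str, str] = {}
    for f in fragments:
        if f.lo < q_hi and q_lo < f.hi:  # half-open ranges overlap
            sites = replicas[f.name]
            plan[f.name] = local if local in sites else sorted(sites)[0]
    return plan
```

Suppose the fragments are `F1 = [0, 100)`, `F2 = [100, 200)`, and `F3 = [200, 300)`, and a query asks for `[150, 250)`. Then only `F2` and `F3` are read. If `F2` is replicated at the local site, it is read there, so only `F3` crosses the network.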
ISBN: (print) 9781479930708
Much research has been done on distributed databases. The main issue in distributed databases is maintaining consistency, which requires that correctness criteria be met. Many concurrency control methods have been proposed, but they suffer from problems with delay, performance, waiting time, and the number of messages exchanged while maintaining correctness. This paper compares recent concurrency control methods with respect to these parameters.
ISBN: (print) 9781538655207
Efficient online transaction processing is key to many database applications, and existing concurrency control protocols perform remarkably well under the specific workloads or access patterns they were designed for. However, they often do not scale well when the workload is dynamic. To tackle the challenge of dynamic workloads, we propose an Adaptive and Speculative Optimistic Concurrency Control (ASOCC) protocol for effective transaction processing. Based on real-time monitoring of data access frequency, ASOCC adaptively embeds 2PL into the OCC scheme to resolve contention better and reduce transaction aborts. Further, ASOCC dynamically inspects the correlation of data accesses and exploits this information to restart transactions speculatively, saving CPU cycles that would otherwise be wasted processing transactions that are destined to abort.
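Here is a rough sketch of the adaptive part of the idea: per-record access counts decide whether a record is treated as "hot" and accessed under 2PL, or "cold" and handled optimistically. The threshold, the halving decay, and the class name are hypothetical choices, not ASOCC's actual policy. The speculative restart mechanism is not modelled.

```python
from collections import defaultdict
from typing import DefaultDict

class AdaptiveCCSelector:
    """Hypothetical sketch: choose 2PL for hot records and OCC for cold ones.

    Access counts are halved every `window` accesses so that the hot set
    tracks a shifting workload.
    """

    def __init__(self, hot_threshold: int = 8, window: int = 1000) -> None:
        self.hot_threshold = hot_threshold
        self.window = window
        self.counts: DefaultDict[str, float] = defaultdict(float)
        self.seen = 0

    def record_access(self, key: str) -> None:
        self.counts[key] += 1
        self.seen += 1
        if self.seen % self.window == 0:
            for k in list(self.counts):
                self.counts[k] /= 2  # exponential decay of older history

    def mode(self, key: str) -> str:
        """'2PL' (lock eagerly) for contended records, 'OCC' (validate at commit) otherwise."""
        return "2PL" if self.counts.get(key, 0.0) >= self.hot_threshold else "OCC"
```

Locking only the contended records avoids repeated OCC aborts on hot data, while cold records keep OCC's low overhead. The decay lets a record fall back to OCC once its contention subsides.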
ISBN: (digital) 9789811383007
ISBN: (print) 9789811383007; 9789811382994
With the growing geographical spread of data, in terms of both quality and quantity, the storage, retrieval, and modification of distributed data has become a prime area of research. The focus is on the efficient, accurate, and timely availability of information extracted from various underlying data centers. Processing queries in distributed database environments has become a challenging task for database researchers because the complexity of join ordering grows as the number of relations in the database increases. There are N! ways of ordering the joins of a query, where N is the number of relations in the join query. The success of query processing in a distributed database environment depends largely on the search strategy implemented by the query optimizer. Its task is to find, in minimum time, an optimal query evaluation plan among the candidate plans, namely one that minimizes the consumption of computer resources. Search strategies ranging from deterministic algorithms to the most recent evolutionary algorithms have contributed significantly to query optimization, but each has its own limitations and drawbacks. This paper focuses on the implementation of a hybrid evolutionary strategy for optimizing join queries in a DDBMS. The hybrid strategy integrates the Ant Colony Optimization algorithm and the Genetic Algorithm and is called GACO-D (Genetic Ant Colony Optimization Algorithm for Distributed Database). The paper uses GACO-D to search for an optimal join order in minimum response time and compares its performance with existing strategies.
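To show why the N! search space matters, here is a deterministic exhaustive search over all left-deep join orders. This is the kind of baseline that evolutionary strategies such as GACO-D aim to avoid; it is not GACO-D itself. The cost model is a hypothetical toy: the sum of intermediate result sizes, with pairwise join selectivities, and a missing selectivity is treated as a cross product.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Sequence, Tuple

def plan_cost(order: Sequence[str], card: Dict[str, int],
              sel: Dict[FrozenSet[str], float]) -> float:
    """Toy cost of a left-deep join order: the sum of intermediate result sizes."""
    size = float(card[order[0]])
    joined = [order[0]]
    cost = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            factor *= sel.get(frozenset((prev, rel)), 1.0)  # 1.0 means cross product
        size = size * card[rel] * factor
        cost += size
        joined.append(rel)
    return cost

def best_order(card: Dict[str, int],
               sel: Dict[FrozenSet[str], float]) -> Tuple[Tuple[str, ...], float, int]:
    """Evaluate all N! orders; return (cheapest order, its cost, number of orders examined)."""
    best: Tuple[str, ...] = ()
    best_cost = float("inf")
    examined = 0
    for order in permutations(sorted(card)):
        examined += 1
        c = plan_cost(order, card, sel)
        if c < best_cost:
            best, best_cost = order, c
    return best, best_cost, examined
```

With three relations the search examines 3! = 6 orders. With 10 relations it would already examine 3,628,800 orders, which is why heuristic and evolutionary search strategies are used instead.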
ISBN: (print) 9781479985531
Data mining is the automated extraction of interesting data patterns, used to represent knowledge, from large data sets; sometimes, however, these data sets are divided among various parties. Association rule mining is a popular mining technique that identifies interesting correlations between database attributes. This paper proposes a protocol, Privacy Preserving Fast Distributed Mining (PPFDM), for mining association rules in horizontally distributed databases, based on the Fast Distributed Mining (FDM) algorithm. FDM is an unsecured distributed version of the Apriori algorithm that generates a small number of candidate sets and considerably reduces the number of messages passed when mining association rules. PPFDM adopts two major ideas. One computes the union of the private subsets held by the interacting players. The other evaluates whether an element held by one player is included in a subset held by another. An implementation of a PPDM algorithm was developed in Java. Performance results are presented for synthetic data generation and association rule mining, and indexing is provided to the user. The protocol is simpler and significantly more efficient in terms of communication rounds, communication cost, and computational cost.
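The FDM step that PPFDM secures rests on one property: an itemset that is globally frequent must be locally frequent at one or more sites. The union of locally frequent itemsets is therefore a complete candidate set. Below is a plain, non-private sketch of that step. The secure union and inclusion sub-protocols are not implemented, and local candidates are enumerated by brute force rather than by Apriori candidate generation. Function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Transaction = FrozenSet[str]

def support(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions in `db` containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def locally_frequent(db: List[Transaction], k: int, min_frac: float) -> Set[FrozenSet[str]]:
    """All k-itemsets whose support at this site is at least min_frac * |db| (brute force)."""
    items = sorted({i for t in db for i in t})
    return {frozenset(c) for c in combinations(items, k)
            if support(frozenset(c), db) >= min_frac * len(db)}

def globally_frequent(sites: List[List[Transaction]], k: int,
                      min_frac: float) -> Set[FrozenSet[str]]:
    """FDM-style step without privacy: count global support only for locally frequent candidates."""
    candidates: Set[FrozenSet[str]] = set()
    for db in sites:
        candidates |= locally_frequent(db, k, min_frac)  # PPFDM computes this union securely
    total = sum(len(db) for db in sites)
    return {c for c in candidates
            if sum(support(c, db) for db in sites) >= min_frac * total}
```

Itemsets that are infrequent at every site are never sent for global counting, which is where FDM's message savings come from. PPFDM adds privacy to this union and the subsequent support checks.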