We consider the problem of searching a set of data repositories, each managed by an independent agent that can handle queries from users as well as queries from peer agents. Given a query, an agent's task is to find the closest match to the query over all the distributed databases. The agents search the distributed databases cooperatively by routing queries to one another. However, routing every query to every agent is inefficient. In this paper, we present strategies for routing queries efficiently between agents.
One feature of cloud storage systems is data fragmentation (or sharding), which lets data be distributed over multiple servers so that subqueries can run in parallel on the fragments. Flexible query answering, on the other hand, enables a database system to find related information for a user whose original query cannot be answered exactly. Query generalization is a way to implement flexible query answering at the syntactic level. In this paper we study a taxonomy-based fragmentation for the generalization operator Anti-Instantiation, with which related information can be found in distributed data.
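To make the generalization step concrete, the following is a minimal sketch of Anti-Instantiation restricted to constants: each generalized query replaces one constant occurrence with a fresh variable. The query encoding (a list of `(predicate, arguments)` atoms, variables written with a leading `?`) and the function name `anti_instantiate` are assumptions made for illustration and are not taken from the paper. In the flexible query answering literature the operator is usually also allowed to replace a repeated variable; that case is omitted here, as is the taxonomy-based fragmentation itself.

```python
def anti_instantiate(query):
    """Return all queries obtained by replacing ONE constant occurrence
    with a fresh variable (hypothetical encoding, constants only)."""
    # Collect the variables already in use so the new one is really fresh.
    used = {t for _, args in query for t in args if t.startswith("?")}
    n = 0
    while f"?g{n}" in used:
        n += 1
    fresh = f"?g{n}"

    generalized = []
    for i, (pred, args) in enumerate(query):
        for j, term in enumerate(args):
            if term.startswith("?"):
                continue  # only constants are generalized in this sketch
            new_args = args[:j] + (fresh,) + args[j + 1:]
            generalized.append(query[:i] + [(pred, new_args)] + query[i + 1:])
    return generalized


# Example query: ill(?x, flu) AND treat(?x, aspirin)
query = [("ill", ("?x", "flu")), ("treat", ("?x", "aspirin"))]
results = anti_instantiate(query)
for q in results:
    print(q)
```

Each result is strictly more general than the original query, so any answer to the original query is still an answer to the generalized one.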
An intelligent routing control system that uses distributed databases is discussed. Each database manages a switching system, and the databases are connected through high-speed signalling networks that are separated from the transport network. If a requested physical address cannot be found in one database, search request messages are distributed to all the other databases simultaneously. It is shown that for up to 100 million subscribers, this routing control system can find a physical address within one second if each database uses ten memories with access times of 200 ns and the interdatabase link speed is 14 Mb/s.
A version control mechanism is proposed that enhances the modularity and extensibility of multiversion concurrency control algorithms. We decouple the multiversion algorithms into two components: version control and concurrency control. This permits modular development of multiversion protocols, and simplifies the task of proving the correctness of these protocols. A set of procedures for version control is described that defines the interface to the version control component. We show that the same interface can be used by the database actions of both two-phase locking and time-stamp concurrency control protocols to access multiversion data. An interesting feature of our framework is that the execution of read-only transactions becomes completely independent of the underlying concurrency control implementation. Unlike other multiversion algorithms, read-only transactions in this scheme do not modify any version related information, and therefore, do not interfere with the execution of read-write transactions. Finally, the extension of the multiversion algorithms to a distributed environment becomes very simple.
There are many applications for the Reverse Nearest Neighbor (RNN) problem, including continuous referral systems, resource allocation, decision support, location-based services, bioinformatics, profile-based marketing, and many others. Although there are numerous studies on the RNN problem, most existing RNN algorithms work on a single database. In contrast to the existing approaches, we propose a Decomposable Reverse Nearest Neighbor Algorithm (DRNNA), which computes RNN in a high-dimensional space across distributed databases. DRNNA helps minimize the amount of data transferred between sites and hence provides data privacy and security for each site. Our approach aims to achieve valid results through minimum information disclosure. Data privacy at individual sites is preserved because our approach requires only minimal transmission of information between sites. Only a small number of higher-level summaries is exchanged for performing computations, so an intruder cannot obtain the actual data tuples even by capturing the exchanged summaries. The simulation results show that the algorithm correctly finds the RNN set for a given point.
Discovering functional dependencies (FDs) from existing databases is important to knowledge discovery, machine learning, and data quality assessment. A number of algorithms have been proposed in the literature for FD discovery. However, these algorithms are designed to work with centralized databases. When they are applied to distributed databases, the communication cost of transporting data between sites makes them inefficient. In this paper, we analyze the characteristics of mining functional dependencies from large distributed databases, and we propose a distributed mining framework for discovering FDs from large distributed databases. We develop a theorem that can prune candidate FDs effectively, and we extend the partition-based approach to distributed databases.
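As background for the partition-based approach mentioned above, here is a minimal single-site sketch of the standard partition test for an FD: X -> A holds exactly when grouping rows by X yields as many equivalence classes as grouping by X together with A. The helper names `partition` and `fd_holds` and the sample rows are assumptions for illustration; the paper's pruning theorem and its distributed extension are not shown.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].append(i)
    return list(groups.values())


def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds iff adding rhs to lhs does not split any class."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + [rhs]))


# Hypothetical sample relation.
rows = [
    {"zip": "10001", "city": "New York", "name": "Ann"},
    {"zip": "10001", "city": "New York", "name": "Bob"},
    {"zip": "94105", "city": "San Francisco", "name": "Ann"},
]

print(fd_holds(rows, ["zip"], "city"))  # True
print(fd_holds(rows, ["name"], "zip"))  # False
```

Comparing class counts rather than raw tuples is what makes the test attractive in a distributed setting, since sites could in principle exchange summaries instead of data; how the paper actually does this is not specified in the abstract.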
This study implements a semi-join approach with the SDD-1 algorithm to speed up query processing and to determine the benefits and costs of the semi-join method. The subject of the study is a distributed banking database with 8,910 data sets. Three query processes were optimized, obtaining a 600% speed increase: increases of 300.4, 144.9, and 154.9 for the first, second, and third query processes, respectively. The benefits and costs of the semi-join method were then computed for the three optimized query processes: the first obtained a benefit of 2,711,168 at a cost of 806, the second a benefit of 3,304 at a cost of 238, and the third a benefit of 3,069,888 at a cost of 615. These results show that applying the semi-join method with the SDD-1 algorithm to distributed database query optimization effectively increases query speed, with large benefits and low costs.
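The semi-join idea behind these benefit and cost figures can be sketched as follows: one site ships only its distinct join keys, the other site discards rows that cannot match, and only the reduced relation is then shipped for the final join. The relations, the column name `cust_id`, and the row-count measures of cost and benefit are simplifying assumptions for illustration; SDD-1 estimates cost and benefit from data sizes and selectivities, and the study's actual schema is not given.

```python
def semi_join(rows, key, shipped_keys):
    """Keep only the rows whose join key appears in the shipped key set."""
    return [r for r in rows if r[key] in shipped_keys]


# Site 1 holds customers, site 2 holds accounts (hypothetical data).
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cho"},
    {"cust_id": 4, "name": "Dev"},
    {"cust_id": 5, "name": "Eve"},
]
accounts = [
    {"cust_id": 2, "balance": 100},
    {"cust_id": 2, "balance": 250},
    {"cust_id": 4, "balance": 75},
]

# Step 1: site 2 ships its distinct join keys to site 1.
keys = {a["cust_id"] for a in accounts}
cost = len(keys)  # values transferred by the semi-join

# Step 2: site 1 drops customers that cannot join.
reduced = semi_join(customers, "cust_id", keys)
benefit = len(customers) - len(reduced)  # rows that no longer need shipping

# Step 3: the reduced relation is shipped to site 2 and joined there.
joined = [{**c, **a} for a in accounts for c in reduced
          if c["cust_id"] == a["cust_id"]]

print(cost, benefit, len(joined))  # 2 3 3
```

A semi-join is worth applying when its benefit exceeds its cost, which is the comparison the reported figures reflect.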
This article is devoted to the novel situation where a large distributed cloud database is a union of several separate databases belonging to individual database owners who are not allowed to transfer their data for storage in locations other than their already chosen cloud service providers. For example, a very large number of medical records may be stored in a distributed cloud database that is a union of several separate databases from different hospitals, or even from different countries. The owners of the databases may need to provide answers to certain common aggregated queries using all information available without sharing or transferring all data. It is necessary to minimize the communication costs, improve efficiency, and comply with the legal requirements protecting the privacy of confidential data. In this situation, it is impossible to aggregate the whole database in one location, but effective methods for answering the aggregated queries with privacy protection are required. To solve this important problem, the present article proposes a Multistage Separate Query Processing (MSQP) protocol employing homomorphic encryption with split keys. We show that our protocol can answer a large class of natural queries of practical significance. The running time of the MSQP protocol is O(d + m/d), where d is the number of database owners and m is the total number of records in the whole database. In practice, d is small and m can be very large, so the running time is O(m). This means that the protocol is very efficient for large databases. It dramatically reduces the communication costs of computation and completely eliminates the need for exchange of confidential data. We define a new generalized additive homomorphic property and introduce a Multipart ElGamal Cryptosystem (MEC) with split keys, which enjoys this property. MEC is a novel modification of the ElGamal cryptosystem with split keys. This paper presents the results of extensive experim
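The additive homomorphic property that such a protocol relies on can be illustrated with textbook "exponential" ElGamal, in which a value m is encoded as g^m so that multiplying ciphertexts adds the underlying values. This is a toy sketch and not the paper's MEC: it uses a single key rather than split keys, and the modulus `P`, base `G`, function names, and the per-site counts are all assumptions chosen for illustration, not secure parameters.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized, for illustration only
G = 3           # assumed base


def keygen():
    x = secrets.randbelow(P - 2) + 1          # secret key in [1, P-2]
    return x, pow(G, x, P)                    # (secret, public)


def encrypt(h, m):
    r = secrets.randbelow(P - 2) + 1
    return pow(G, r, P), (pow(G, m, P) * pow(h, r, P)) % P


def add(c, d):
    """Multiply ciphertexts componentwise; the plaintexts are added."""
    return (c[0] * d[0]) % P, (c[1] * d[1]) % P


def decrypt(x, c, max_total=80):
    # c[0]^(P-1-x) equals c[0]^(-x) mod P by Fermat's little theorem.
    gm = (c[1] * pow(c[0], P - 1 - x, P)) % P
    # Recover m from G^m by search; 3**m < P for m <= 80, so the
    # candidate powers are all distinct.
    for m in range(max_total + 1):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("total exceeds max_total")


# Each owner encrypts a local count; only the encrypted sum is decrypted.
secret, public = keygen()
local_counts = [12, 7, 23]                    # hypothetical per-site counts
ciphertexts = [encrypt(public, n) for n in local_counts]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add(total, c)

print(decrypt(secret, total))  # 42
```

The individual counts are never revealed to the party that combines the ciphertexts; only the aggregate is decrypted, which is the behavior an aggregated query over separately owned databases needs.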
Distributed databases on cluster computers are widely used in many applications. As data volume and velocity keep growing, it is important to develop techniques that improve query response time to meet applications' needs. One such technique is vertical partitioning, which splits a database table into smaller tables containing fewer attributes in order to reduce disk I/O. While many algorithms have been developed for vertical partitioning, none is designed to partition a database stored on cluster computers dynamically, i.e., without human intervention and without a fixed query workload. To fill this gap, this paper introduces a dynamic algorithm, SMOPD-C, that can autonomously partition a distributed database vertically on cluster computers, determine when re-partitioning is needed, and re-partition the database accordingly. The paper then presents comprehensive experiments conducted to study the performance of SMOPD-C using the TPC-H benchmark on a cluster computer. The experimental results show that SMOPD-C can re-partition the database dynamically with high accuracy and achieve a better query cost than the current partitioning configuration.
Much research has been done on distributed databases. The main issue in distributed databases is maintaining consistency, which requires that correctness criteria be met. Many concurrency control methods have been proposed, but they suffer from problems with delay, performance, waiting time, and the number of messages exchanged while maintaining correctness. This paper compares recent concurrency control methods with respect to these parameters.