Distributed database systems offer scalability and fault tolerance by replicating databases across geographically dispersed nodes. This redundancy aims to ensure data availability even during failures and allows for backups in case of disasters. However, maintaining strong data consistency, where all nodes reflect the latest data simultaneously, becomes a challenge in such geographically distributed setups, since database systems need to prioritize availability or performance over strong consistency. This study examines how different data consistency configurations affect the performance of popular NoSQL (Not only SQL) databases, namely Cassandra, MongoDB, and Redis, in a multi-region cloud environment. We use the Yahoo! Cloud Serving Benchmark (YCSB) tool to simulate various workloads, measure performance metrics, and compare the results. Our findings reveal significant performance degradation associated with strong data consistency configurations. For instance, in Cassandra, the number of write/read operations processed per second can decrease by up to 95% for specific workloads. Similarly, enforcing strong data consistency in Redis can make write/read operations take more than 20 times longer to execute.
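The consistency/performance trade-off above can be illustrated with the quorum rule used by Dynamo-style stores such as Cassandra: with N replicas, reads that wait for R replicas are guaranteed to overlap the latest write acknowledged by W replicas when R + W > N. The following is a minimal Python sketch under simplifying assumptions; the function names and round-trip times are hypothetical and are not taken from the study or from YCSB. It also suggests why stronger levels cost more in a multi-region deployment: the coordinator must wait for the k-th fastest replica, which may sit in a distant region.

```python
# Consistency levels expressed as "how many replicas must answer".
# The names mirror Cassandra's ONE / QUORUM / ALL levels; this mapping is a
# simplified assumption for illustration, not Cassandra's own code.
def replicas_required(level: str, n: int) -> int:
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unknown consistency level: {level}")


def is_strongly_consistent(read_level: str, write_level: str, n: int) -> bool:
    """Read and write replica sets must share at least one replica: R + W > N."""
    return replicas_required(read_level, n) + replicas_required(write_level, n) > n


def request_latency_ms(level: str, replica_rtts_ms: list[float]) -> float:
    """The coordinator waits for the k-th fastest replica, k = replicas required."""
    k = replicas_required(level, len(replica_rtts_ms))
    return sorted(replica_rtts_ms)[k - 1]


if __name__ == "__main__":
    # Hypothetical round-trip times to three replicas: same region,
    # nearby region, distant region.
    rtts = [2.0, 35.0, 180.0]
    for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(r, w,
              is_strongly_consistent(r, w, len(rtts)),
              request_latency_ms(r, rtts),
              request_latency_ms(w, rtts))
```

With these hypothetical round-trip times, moving from ONE/ONE to QUORUM/QUORUM makes read and write sets overlap at the price of waiting for the second-closest replica, and writing at ALL waits for the most distant region.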
ISBN: 9798331527495 (electronic)
ISBN: 9798331527501 (print)
Query optimization in distributed databases is increasingly important for improving processing speed and resource utilization, particularly when computational capacity is limited. This work presents a novel hybrid swarm intelligence model combining a genetic algorithm (GA) and ant colony optimization (ACO) to address the query optimization problem. The proposed model first uses GA to generate an initial population of candidate query execution paths, relying on evolutionary selection to obtain diverse solutions. ACO then refines this set through pheromone-based reinforcement, which steers the query paths toward the optimal solution or a close approximation of it. This two-stage design accounts for the improved convergence rate and scalability of the proposed solution on distributed databases. The model was tested on synthetic query logs, with DEAP used to implement the genetic algorithm and Python-based ACO libraries used for the ant colony operations. Performance analysis shows that the model reduces query response time. This hybrid mechanism, though complex, shows potential for other domains that involve large datasets and require efficient query handling.
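The two-stage GA-then-ACO pipeline can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a query execution path is modelled as a permutation of hypothetical execution steps, `cost[a][b]` is an assumed non-negative cost of running step `b` right after step `a`, and the standard library stands in for DEAP and the ACO libraries used in the paper. The GA stage evolves diverse orderings; its best few paths pre-load the ACO pheromone matrix, and ants then reinforce low-cost transitions while pheromone evaporates.

```python
import random


def path_cost(order, cost):
    """Total cost of executing the steps in `order` (hypothetical cost model)."""
    return sum(cost[a][b] for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Order crossover (OX): keep a slice of p1, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]


def genetic_stage(cost, pop_size, generations, rng):
    """GA: evolve a diverse population of candidate execution paths."""
    n = len(cost)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda o: path_cost(o, cost))
        parents = pop[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            child = order_crossover(*rng.sample(parents, 2), rng)
            if rng.random() < 0.2:  # swap mutation keeps diversity
                a, b = rng.sample(range(n), 2)
                child[a], child[b] = child[b], child[a]
            children.append(child)
        pop = parents + children
    return sorted(pop, key=lambda o: path_cost(o, cost))


def aco_stage(cost, seeds, ants, iterations, rng, rho=0.1, q=1.0):
    """ACO: pheromone-guided refinement starting from the GA's best paths."""
    n = len(cost)
    tau = [[1.0] * n for _ in range(n)]
    for order in seeds:  # hybrid step: GA results pre-load the pheromone trail
        for a, b in zip(order, order[1:]):
            tau[a][b] += q / (1 + path_cost(order, cost))
    best = min(seeds, key=lambda o: path_cost(o, cost))
    for _ in range(iterations):
        tours = []
        for _ in range(ants):
            tour = [rng.randrange(n)]
            left = set(range(n)) - {tour[0]}
            while left:
                cur, cand = tour[-1], sorted(left)
                # prefer strong pheromone and cheap transitions
                weights = [tau[cur][c] / (1 + cost[cur][c]) for c in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                left.remove(nxt)
            tours.append(tour)
        tau = [[(1 - rho) * t for t in row] for row in tau]  # evaporation
        for tour in tours:  # reinforcement, stronger for cheaper tours
            for a, b in zip(tour, tour[1:]):
                tau[a][b] += q / (1 + path_cost(tour, cost))
        best = min(tours + [best], key=lambda o: path_cost(o, cost))
    return best


def ga_aco(cost, seed=0):
    rng = random.Random(seed)
    population = genetic_stage(cost, pop_size=20, generations=30, rng=rng)
    best = aco_stage(cost, population[:5], ants=10, iterations=30, rng=rng)
    return best, path_cost(best, cost)


if __name__ == "__main__":
    gen = random.Random(42)
    n = 8
    cost = [[0 if a == b else gen.randint(1, 20) for b in range(n)] for a in range(n)]
    order, total = ga_aco(cost)
    print(order, total)
```

Seeding pheromone from the GA population is the point of the hybrid: ACO starts biased toward transitions that evolution already found useful instead of from a uniform trail. The population size, generation count, evaporation rate `rho` and deposit constant `q` are arbitrary illustration values.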
ISBN: 9798331535100 (electronic)
ISBN: 9798331535117 (print)
Consensus is critical for distributed databases because it keeps state consistent across nodes, reinforcing the robustness of the overall system. However, faults in consensus protocols such as Paxos can cause serious issues in distributed databases, affecting both their correctness and their availability. To uncover such consensus issues automatically, we propose Conan, a framework based on fuzzing-driven fault injection. Conan applies a state-guided fuzzing algorithm to explore the fault search space effectively. Moreover, Conan employs hybrid fault sequences that combine fine-grained message-level faults with coarse-grained system-level faults to strengthen fault injection. We implement and evaluate Conan on three widely used distributed databases: etcd, rqlite, and openGauss. Conan uncovered previously unknown consensus issues, some of which are not detected by existing approaches.
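State-guided fuzzing over hybrid fault sequences can be sketched as a coverage-style loop: mutate a fault sequence, run the system under it, and keep the sequence as a new seed only if it drives the system into an abstract state not seen before. The sketch below is an assumption-laden toy rather than Conan itself: `toy_cluster` is a hypothetical stand-in for running a real database under injected faults, the fault names and the abstract state it returns are invented for illustration, and a real harness would also check safety invariants (for example, at most one leader per term) on every run.

```python
import random
from dataclasses import dataclass

MESSAGE_FAULTS = ("drop", "delay", "duplicate")    # fine-grained, one message
SYSTEM_FAULTS = ("crash", "restart", "partition")  # coarse-grained, node/network
NODES = 3


@dataclass(frozen=True)
class Fault:
    kind: str
    target: int  # message index for message faults, node id for system faults


def random_fault(rng):
    """Draw from both fault families so that sequences end up hybrid."""
    if rng.random() < 0.5:
        return Fault(rng.choice(MESSAGE_FAULTS), rng.randrange(10))
    return Fault(rng.choice(SYSTEM_FAULTS), rng.randrange(NODES))


def toy_cluster(sequence):
    """Hypothetical stand-in for a real cluster run. Returns an abstract state:
    (nodes down, partitioned?, elections capped at 3, dropped messages capped at 3)."""
    down, partitioned, elections, dropped = set(), False, 0, 0
    for f in sequence:
        if f.kind == "crash":
            if f.target == 0 and 0 not in down:  # assume node 0 starts as leader
                elections += 1
            down.add(f.target)
        elif f.kind == "restart":
            down.discard(f.target)
        elif f.kind == "partition":
            partitioned = not partitioned  # toggles a network split
            elections += 1
        elif f.kind == "drop":
            dropped += 1
        # "delay" and "duplicate" have no effect in this toy model
    return (len(down), partitioned, min(elections, 3), min(dropped, 3))


def mutate(sequence, rng, max_len=8):
    """Insert, delete, or replace one fault."""
    seq = list(sequence)
    op = rng.random()
    if op < 0.4 and len(seq) < max_len:
        seq.insert(rng.randrange(len(seq) + 1), random_fault(rng))
    elif op < 0.7 and seq:
        seq.pop(rng.randrange(len(seq)))
    elif seq:
        seq[rng.randrange(len(seq))] = random_fault(rng)
    else:
        seq.append(random_fault(rng))
    return seq


def state_guided_fuzz(run, iterations=500, seed=0):
    rng = random.Random(seed)
    corpus = [[]]  # start from the fault-free execution
    seen = {run([])}
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        state = run(child)
        if state not in seen:  # new abstract state: keep as a seed
            seen.add(state)
            corpus.append(child)
    return corpus, seen


if __name__ == "__main__":
    corpus, seen = state_guided_fuzz(toy_cluster, iterations=300)
    print(len(seen), "distinct states; longest seed:", max(corpus, key=len))
```

Keeping only state-expanding sequences is what makes the search "state-guided": effort concentrates on fault combinations that have already pushed the system somewhere new, rather than on uniformly random sequences.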
ISBN: 9780982148952 (print)
In the last decade there have been significant changes in the computer industry. In database systems, relational database management has become widely accepted for traditional business applications such as order processing, inventory control, banking operations, and flight booking. However, existing relational database management systems have proven inadequate for applications whose requirements differ substantially from those of traditional business database applications. These applications include computer-aided design (CAD), computer-aided manufacturing (CAM), computer-aided software engineering, information systems, office and multimedia systems, digital editing, and geographic information systems.
In this paper, we study the execution of logic queries in a distributed database environment. We assume that each local database system can execute logic queries, and we design methods for the efficient execution of queries requiring data from multiple sites. Conventional optimization strategies that are well known in the field of distributed databases, such as the early evaluation of selection conditions and the clustering of processing so that large sets of tuples are manipulated and exchanged together, are redefined in view of the additional difficulties posed by logic queries, in particular by recursive rules. To allow efficient processing of these logic queries, we present several program transformation techniques that attempt to minimize distribution costs, based on the idea of semi-joins and generalized semi-joins in conventional databases. Although local computation of semi-joins is not possible in the general case, we identify classes of programs for which these transformations succeed in producing set-oriented computation. We describe how processes evaluate the recursive program in a distributed network and develop an efficient method for testing the termination of the computation. Finally, we compare our approach with sequential as well as dataflow-oriented evaluation. Datalog is assumed as the logic programming language and paradigm.
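Two ingredients named above, semi-join reduction to cut shipping costs and iterative evaluation of recursive rules until nothing new is derived, can be illustrated with a short Python sketch. It is a simplified illustration under assumptions, not the paper's program transformations: relations are Python collections of tuples at two hypothetical sites, "shipping" is counted naively as the number of keys or tuples sent between sites, and termination is detected by checking that the global delta is empty, whereas the paper develops a more efficient distributed termination test.

```python
def naive_join(r, s):
    """Ship all of S (at site 2) to site 1 and join R(a, b) with S(b, c)."""
    shipped = len(s)
    result = {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}
    return result, shipped


def semijoin_join(r, s):
    """Ship pi_b(R) to site 2, reduce S to S semijoin R, ship the reduced S back."""
    keys = {b for (_, b) in r}                         # pi_b(R), sent to site 2
    s_reduced = [(b, c) for (b, c) in s if b in keys]  # computed locally at site 2
    shipped = len(keys) + len(s_reduced)
    result = {(a, b, c) for (a, b) in r for (b2, c) in s_reduced if b == b2}
    return result, shipped


def distributed_closure(par_by_site):
    """Semi-naive evaluation of
         anc(X, Z) :- par(X, Z).
         anc(X, Z) :- par(X, Y), anc(Y, Z).
    with par partitioned across sites. Each round, every site joins its local
    par only with the facts that were new in the previous round (the delta)."""
    anc = set().union(*par_by_site)
    delta = set(anc)
    rounds = 0
    while delta:  # naive termination test: no site derived anything new
        rounds += 1
        derived = set()
        for local_par in par_by_site:
            derived |= {(x, z) for (x, y) in local_par for (y2, z) in delta if y == y2}
        delta = derived - anc
        anc |= delta
    return anc, rounds


if __name__ == "__main__":
    r = [(1, "x"), (2, "y")]
    s = [("x", 10), ("y", 13), ("u", 1), ("v", 2), ("w", 3), ("z", 4)]
    print(naive_join(r, s))     # ships all 6 tuples of S
    print(semijoin_join(r, s))  # ships 2 keys + 2 matching tuples
    print(distributed_closure([{(1, 2), (3, 4)}, {(2, 3)}]))
```

The semi-join pays off when few tuples of S match R; the semi-naive loop shows why recursion complicates distribution, since each round may require another exchange of newly derived facts.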
A blockchain is a decentralised linked data structure characterised by its inherent resistance to data modification, but it performs poorly on search queries, primarily because of its inferior data formatting. A distributed database is also a decentralised data structure, featuring fast query processing and well-designed data formatting, but it suffers from weak data reliability. In this work, we showcase CHAINSQL, an open-source system developed by integrating the blockchain with the database; that is, we present a blockchain database application platform that combines the decentralisation, distribution, and auditability of the blockchain with the fast query processing and well-designed data structure of distributed databases. CHAINSQL features a tamper-resistant and consistent multi-active database, a reliable and cost-effective data-level disaster recovery backup, and an auditable transaction log mechanism. The system is presented as an operational multi-active database along with its data-level disaster recovery backup and auditability features. A comprehensive experimental evaluation demonstrates the effectiveness of the system. (C) 2018 Elsevier B.V. All rights reserved.
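The tamper evidence behind an auditable transaction log can be illustrated with a hash chain, in which each entry's hash covers the previous entry's hash. The Python sketch below is a generic illustration, not CHAINSQL's actual log format or API; the class and field names are hypothetical. On its own, a single-node chain only detects modifications if the latest hash is also held somewhere an attacker cannot rewrite, which is what blockchain replication and consensus provide.

```python
import hashlib
import json


def entry_hash(prev_hash: str, tx: dict) -> str:
    """Hash of a transaction bound to the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "tx": tx}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []  # list of (transaction, hash) pairs

    def append(self, tx: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = entry_hash(prev, tx)
        self.entries.append((dict(tx), h))  # store a copy of the transaction
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited transaction breaks every later link."""
        prev = self.GENESIS
        for tx, h in self.entries:
            if entry_hash(prev, tx) != h:
                return False
            prev = h
        return True


if __name__ == "__main__":
    log = AuditLog()
    log.append({"sql": "INSERT INTO accounts VALUES (1, 100)"})
    log.append({"sql": "UPDATE accounts SET balance = 50 WHERE id = 1"})
    print(log.verify())
    log.entries[0][0]["sql"] = "INSERT INTO accounts VALUES (1, 999)"  # tamper
    print(log.verify())
```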
A real-time distributed database system (RTDDBS) must maintain the consistency constraints of objects and must also guarantee the time constraint imposed by each request arriving at the system. The time constraint of a request is usually defined as a deadline period, meaning that the request must be serviced on or before the end of that period. Servicing these requests may incur I/O costs, control-message transfer costs, or data-message transfer costs. In our work, we therefore first present a mathematical model that accounts for all of these costs. Using this cost model, our objective is to service every request on or before the end of its deadline period while minimizing the total servicing cost. To this end, from a theoretical standpoint, we design a dynamic object replication algorithm, referred to as the Real-time distributed dynamic Window Mechanism (RDDWM), that adapts to random patterns of read-write requests. From a practical perspective, we use competitive analysis to study the performance of the RDDWM algorithm under two extreme conditions: when the deadline period of each request is sufficiently long, and when it is very short. Several illustrative examples are provided to aid understanding.
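A cost model of the kind described, combined with a window-based replication rule, can be sketched in Python as follows. This is a simplified illustration under explicit assumptions, not RDDWM: the unit costs are hypothetical, deadlines are ignored entirely (so only the cost side of the problem is shown), a primary site always keeps a copy, and a non-primary site is assumed to hold a replica exactly while its own reads outnumber writes within its last `window` relevant requests. A site that joins is assumed to keep the copy it has just received on a remote read, so joining is charged no extra transfer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Costs:
    io: float = 1.0       # local I/O per object access (assumed unit cost)
    control: float = 0.5  # control message, e.g. a read request
    data: float = 5.0     # data message carrying the object


def read_cost(site, replicas, c):
    if site in replicas:
        return c.io
    return c.control + c.io + c.data  # request, I/O at a replica, ship object back


def write_cost(site, replicas, c):
    # every replica performs I/O; every replica other than the writer receives the data
    return sum(c.io + (0.0 if r == site else c.data) for r in replicas)


def serve(requests, sites, primary, c, window=None):
    """requests: list of (site, "r" | "w"). window=None gives a static scheme
    in which only the primary ever holds the object."""
    replicas = {primary}
    history = {s: deque(maxlen=window or 1) for s in sites}
    total = 0.0
    for site, op in requests:
        if op == "r":
            total += read_cost(site, replicas, c)
        else:
            total += write_cost(site, replicas, c)
        if window is None:
            continue
        for s in sites:  # a write matters to every site, a read only to its issuer
            if op == "w" or s == site:
                history[s].append(op)
        for s in sites:  # hypothetical rule: replicate while reads outnumber writes
            if s != primary:
                if history[s].count("r") > history[s].count("w"):
                    replicas.add(s)
                else:
                    replicas.discard(s)
    return total


if __name__ == "__main__":
    trace = [(1, "r")] * 20 + [(0, "w")] * 2
    print(serve(trace, [0, 1, 2], 0, Costs()))            # static scheme
    print(serve(trace, [0, 1, 2], 0, Costs(), window=4))  # window-based scheme
```

For this read-heavy trace the window-based scheme pays for one remote read and then serves the remaining reads locally, at the price of also shipping the later writes to the new replica; the window length controls how quickly a site reacts when the read/write mix shifts.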
Private processing of database queries protects the confidentiality of sensitive data when queries are answered. It is important to design collusion-resistant protocols that keep privacy protected even when a certain number of honest-but-curious participants collude, sharing their knowledge to gain unauthorised access to sensitive information. A novel setting arises when aggregated queries need to be answered over a large distributed database, but legal requirements or commercial interests forbid making the records in each subdatabase accessible to other counterparts. For example, a very large number of medical records may be stored in a distributed database that is the union of several separate databases from different hospitals, or even from different countries. The present article introduces and investigates two protocols for collusion-resistant private processing of aggregated queries in this novel setting: the Accelerated Multi-round Iterative Protocol (AMIP) and the Restricted Multi-round Iterative Protocol (RMIP). We define a large collection of query functions and show that the AMIP and RMIP protocols can answer all queries in this collection. Our experiments demonstrate that the AMIP protocol outperforms all other applicable algorithms, most significantly in terms of communication complexity.
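To make the setting concrete, the sketch below computes a SUM aggregate across several data holders with additive secret sharing, a standard building block for private aggregation. It is a generic technique swapped in for illustration, not AMIP or RMIP, whose multi-round details are not given here. Each party splits its private value into random shares, sends one share to each party, and publishes only the sum of the shares it received; in the honest-but-curious model, a coalition of fewer than all parties learns nothing beyond what the final total implies. `MODULUS` is an assumed public parameter that must exceed any possible total.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    n = len(private_values)
    inbox = [[] for _ in range(n)]  # inbox[j]: shares received by party j
    for value in private_values:    # party i runs this locally for its own value
        for j, s in enumerate(share(value, n)):
            inbox[j].append(s)
    # each party reveals only the sum of its inbox, never an individual share
    partial_sums = [sum(received) % MODULUS for received in inbox]
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # hypothetical per-hospital patient counts matching some query predicate
    print(secure_sum([120, 45, 300]))
```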
The CAP theorem shows that (strong) consistency, availability, and partition tolerance cannot all be ensured at the same time. Causal consistency is one of the weak consistency models that can be implemented while ensuring availability and partition tolerance in distributed systems. In this work, we propose a tool that automatically checks whether executions of distributed/concurrent systems conform to causal consistency models. Our approach reduces the problem of checking whether an execution is causally consistent to solving Datalog queries. The reduction is based on complete characterizations of the executions that violate causal consistency, in terms of the existence of cycles in suitably defined relations between the operations occurring in these executions. We have implemented the reduction in a testing tool for distributed databases and carried out several experiments on real case studies, which show the efficiency of the suggested approach.
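The reduction to Datalog can be mimicked in Python by computing recursive relations as fixpoints. The sketch below assumes each written value is unique per variable, so every read can be matched to the write it reads from, and it checks only three violation patterns: a cycle in the causal order (program order plus read-from, closed transitively), a read of a value that was never written, and a read whose source write was overwritten by another write to the same variable that causally precedes the read. It is a partial illustration, not the tool's complete characterization; the history format and function names are hypothetical, and `None` stands for a read of the initial value.

```python
def transitive_closure(edges):
    """Fixpoint of co(X, Z) :- co(X, Y), co(Y, Z), as a Datalog engine would compute it."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def check_causal(sessions):
    """sessions: list of lists of ("w" | "r", variable, value) operations."""
    ops, po = [], set()
    for session in sessions:
        for k, (kind, var, val) in enumerate(session):
            ops.append((kind, var, val))
            if k > 0:
                po.add((len(ops) - 2, len(ops) - 1))  # program order
    writer = {(var, val): i for i, (kind, var, val) in enumerate(ops) if kind == "w"}
    wr, thin_air = set(), []
    for i, (kind, var, val) in enumerate(ops):
        if kind == "r" and val is not None:
            if (var, val) in writer:
                wr.add((writer[(var, val)], i))  # read-from relation
            else:
                thin_air.append(i)  # value never written
    co = transitive_closure(po | wr)  # causal order
    write_co_read = sorted({
        r for (w1, r) in wr
        for w2, (kind, var, _) in enumerate(ops)
        if kind == "w" and var == ops[r][1] and w2 != w1
        and (w1, w2) in co and (w2, r) in co  # w1 -> w2 -> r, yet r reads w1
    })
    return {
        "cyclic_co": any(a == b for (a, b) in co),
        "thin_air": thin_air,
        "write_co_read": write_co_read,
    }


if __name__ == "__main__":
    # session 2 reads x = 2 and then the overwritten x = 1: not causally consistent
    print(check_causal([[("w", "x", 1), ("w", "x", 2)],
                        [("r", "x", 2), ("r", "x", 1)]]))
```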
For a distributed database system to function efficiently, the fragments of the database need to be located judiciously at various sites across the relevant communications network. The problem of allocating these fragments to the most appropriate sites is difficult to solve, however, and most available approaches rely on heuristic techniques. Optimal approaches are usually based on mathematical programming, and the available formulations for this problem are based on the linearization of nonlinear binary integer programs; they have been observed to be ineffective except on very small problems. This paper presents new integer programming formulations for the nonredundant version of the fragment allocation problem. The formulation is extended to address problems that have both storage and processing capacity constraints; the approach is observed to be particularly effective in the presence of capacity restrictions. Extensive computational tests conducted over a variety of parameter values indicate that the reformulations are very effective even on relatively large problems, thereby reducing the need for heuristic approaches.
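To make the problem concrete, the sketch below solves a tiny nonredundant allocation instance by exhaustive search. It uses a simplified, assumed model rather than the paper's integer programming formulations: each fragment goes to exactly one site, placing fragment i at site j costs the sum over sites k of (accesses from k to i) times (unit communication cost from k to j), and each site has storage and processing capacities. Interactions between fragments, which are what make the usual formulations nonlinear, are omitted, and exhaustive search is only viable for toy sizes; realistic instances would be handed to an integer programming solver.

```python
from itertools import product


def allocation_cost(assign, freq, dist):
    """freq[k][i]: accesses from site k to fragment i; dist[k][j]: unit cost k -> j."""
    return sum(freq[k][i] * dist[k][j]
               for i, j in enumerate(assign)
               for k in range(len(freq)))


def feasible(assign, size, load, storage_cap, proc_cap):
    """Check the storage and processing capacity of every site."""
    n_sites = len(storage_cap)
    used_s, used_p = [0] * n_sites, [0] * n_sites
    for i, j in enumerate(assign):
        used_s[j] += size[i]
        used_p[j] += load[i]
    return all(used_s[j] <= storage_cap[j] and used_p[j] <= proc_cap[j]
               for j in range(n_sites))


def best_allocation(freq, dist, size, load, storage_cap, proc_cap):
    """Enumerate every site-per-fragment assignment (nonredundant: one site each)."""
    best = None
    for assign in product(range(len(storage_cap)), repeat=len(size)):
        if not feasible(assign, size, load, storage_cap, proc_cap):
            continue
        c = allocation_cost(assign, freq, dist)
        if best is None or c < best[1]:
            best = (assign, c)
    return best


if __name__ == "__main__":
    dist = [[0, 1], [1, 0]]          # two sites, unit cost between them
    freq = [[5, 5, 0], [0, 1, 6]]    # hypothetical access counts per site
    size, load = [1, 1, 1], [1, 1, 1]
    print(best_allocation(freq, dist, size, load, [3, 3], [10, 10]))  # loose capacity
    print(best_allocation(freq, dist, size, load, [1, 3], [10, 10]))  # site 0 holds one fragment
```

In this hypothetical instance the unconstrained optimum places two fragments at site 0, while a storage capacity of one fragment there forces the second onto site 1 and raises the cost from 1 to 5.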