A deadlock detection algorithm is presented by which each node in a network can decide locally whether a deadlock exists. Besides deadlock detection, we also address the problem of detecting cycles in graphs distributed over several nodes, which arises in several locking protocols, such as the RAC-locking protocol in a distributed environment. The proposed algorithm is based on the idea of sending the relevant paths of the wait-for graph to every other node by broadcast messages, requiring only one physical transmission. To detect deadlocks purely locally, each node collects all received paths in a local graph. As this graph has to be identical at all nodes, a protocol has been developed that ensures this equality. Finally, we determine the overhead this protocol causes within the network. In addition, we propose a broadcast two-phase commit protocol that exploits the broadcast-message facility.
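The core mechanism, each node merging the broadcast wait-for paths into a local graph and checking that graph for cycles, can be sketched as follows; the adjacency-set representation and function names are illustrative, not the paper's notation:

```python
def add_path(graph, path):
    """Merge a broadcast wait-for path (a list of transaction ids) into
    the node's local wait-for graph (dict of adjacency sets)."""
    for waiter, holder in zip(path, path[1:]):
        graph.setdefault(waiter, set()).add(holder)

def has_deadlock(graph):
    """Detect a cycle in the local wait-for graph with an iterative DFS.
    A cycle means a set of transactions all waiting on each other."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(graph[start]))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                c = color.get(nxt, WHITE)
                if c == GRAY:          # back edge: cycle found
                    return True
                if c == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, set()))))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False
```

For example, after receiving the paths T1→T2 and T2→T3→T1, `has_deadlock` reports a cycle.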
An overview is presented of an approach to distributed database design which emphasizes high availability in the face of network partitions and other communication failures. This approach is appropriate for applications which require continued operation and can tolerate some loss of integrity of the data. Each site presents its users and application programs with the best possible view of the data which it can, based on those updates which it has received so far. Mutual consistency of replicated copies of data is ensured by using time stamps to establish a known total ordering on all updates, and by a mechanism which ensures the same final result regardless of the order in which a site actually receives these updates. A mechanism is proposed, based on alerters and triggers, by which applications can deal with exception conditions which may arise as a consequence of the high-availability architecture. A prototype system which demonstrates this approach is near completion.
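The timestamp mechanism described above can be illustrated with a minimal last-writer-wins sketch; the `(logical_clock, site_id)` stamp format is an assumption used only to make the total order concrete:

```python
def apply_update(store, key, value, stamp):
    """Apply an update tagged with a globally unique timestamp
    (logical_clock, site_id).  Keeping only the highest-stamped value
    makes the final state independent of the arrival order."""
    current = store.get(key)
    if current is None or stamp > current[1]:
        store[key] = (value, stamp)

# Two sites receive the same updates in different orders:
a, b = {}, {}
updates = [("x", "v1", (1, "siteA")), ("x", "v2", (2, "siteB"))]
for k, v, ts in updates:
    apply_update(a, k, v, ts)
for k, v, ts in reversed(updates):
    apply_update(b, k, v, ts)
assert a == b  # both sites converge to the same final state
```

The tuple comparison on `(clock, site_id)` breaks ties deterministically, which is what makes the ordering total rather than merely partial.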
In many distributed databases, "locality of reference" is crucial to achieve acceptable performance. However, the purpose of data distribution is to spread the data among several remote sites. One way to resolve this contradiction is to use partitioned data techniques. Instead of accessing the entire data, a site works on a fraction that is made locally available, thereby increasing the site's autonomy. We present a theory of partitioned data that formalizes the concept and establishes the basis to develop a correctness criterion and a concurrency control protocol for partitioned databases. Set-serializability is proposed as a correctness criterion and we suggest an implementation that integrates partitioned and non-partitioned data. To complete this study, the policies required in a real implementation are also analyzed.
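As a minimal illustration of the partitioned-data idea (not the paper's set-serializability formalism), a table can be split into per-site fragments by a deterministic hash of the key, so each site then operates autonomously on its locally available fraction:

```python
import zlib

def partition_for(key, n_sites):
    """Deterministically map a record key to one of n_sites fragments
    (CRC32 is used only because it is stable across runs)."""
    return zlib.crc32(key.encode()) % n_sites

def fragment(records, n_sites):
    """Split a table (list of (key, row) pairs) into per-site fragments;
    each site works only on its own fraction, increasing its autonomy."""
    sites = [[] for _ in range(n_sites)]
    for key, row in records:
        sites[partition_for(key, n_sites)].append((key, row))
    return sites
```

A lookup then touches only the one site that `partition_for` selects, which is where the "locality of reference" benefit comes from.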
The constant growth of social media, unconventional web technologies, mobile applications, and Internet of Things (IoT) devices creates challenges for cloud data systems, which must support huge datasets and very high request rates. NoSQL databases, such as Cassandra and HBase, and relational SQL databases with replication, such as Citus/PostgreSQL, have been used to increase horizontal scalability and high availability of data store systems. In this paper, we evaluated three distributed databases on a low-power, low-cost cluster of commodity Single-Board Computers (SBC): the relational Citus/PostgreSQL and the NoSQL databases Cassandra and HBase. The cluster has 15 Raspberry Pi 3 nodes and uses the Docker Swarm orchestration tool for service deployment and ingress load balancing over the SBCs. We believe that a low-cost SBC cluster can support cloud serving goals such as scale-out, elasticity, and high availability. Experimental results clearly demonstrated a trade-off between performance and replication, which provides availability and partition tolerance; both properties are essential for distributed systems built on low-power boards. Cassandra attained better results with its consistency levels specified by the client. Both Citus and HBase provide strong consistency, but it penalizes performance as the number of replicas increases.
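The replication/consistency trade-off observed here is commonly reasoned about with quorum arithmetic; a minimal sketch, with consistency-level names only as illustrative analogues of Cassandra's client-specified levels:

```python
def is_strongly_consistent(n_replicas, read_quorum, write_quorum):
    """With N replicas, overlapping quorums (R + W > N) guarantee a read
    contacts at least one replica holding the latest acknowledged write;
    smaller quorums trade consistency for latency and availability."""
    return read_quorum + write_quorum > n_replicas

# Illustrative levels for a replication factor of N = 3:
assert is_strongly_consistent(3, 2, 2)      # QUORUM reads + QUORUM writes
assert not is_strongly_consistent(3, 1, 1)  # ONE + ONE: eventual only
```

This is also why increasing the number of replicas penalizes strongly consistent configurations: larger N forces larger quorums, and each request must wait on more nodes.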
A benchmark study of modern distributed databases (DDBs) (e.g., Cassandra, MongoDB, Redis, and MySQL) is an important source of information for selecting the right technology for managing data in edge-cloud deployments. While most of the existing studies have investigated the performance and scalability of DDBs in cloud computing, there is a lack of focus on resource utilization (e.g., energy, bandwidth, and storage consumption) of workload offloading for DDBs deployed in edge-cloud environments. For this purpose, we conducted experiments on various physical and virtualized computing nodes, including variously powered servers, Raspberry Pi, and hybrid cloud (OpenStack and Azure). Our extensive experimental results reveal insights into which database under which offloading scenario is more efficient in terms of energy, bandwidth, and storage consumption.
In distributed database systems, it is desirable to allow read and write accesses to occur independently on replicated copies of database files in case of network partitions to increase availability. However, the system should detect mutual conflicts among the copies of the database files when sites from different partitions merge to form one partition. We present a timestamp-based algorithm for the detection of both write-write and read-write conflicts for a single file in distributed databases when sites from different partitions merge. Our algorithm allows read and write operations to occur in different network partitions simultaneously. When the sites from two different partitions merge, the algorithm detects both read-write and write-write conflicts with the help of the stored timestamps and some additional information. Once the conflicts have been detected, we propose reconciliation steps for the resolution of conflicts to bring the file into a consistent state. Our algorithm does not take into account the semantics of the transactions while detecting and resolving conflicts. Our algorithm will be useful in real-time systems where timeliness of operations is more important than response time (delayed commit).
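A simplified sketch of timestamp-based conflict classification at merge time; the record fields (a shared merge-base timestamp plus last read/write stamps recorded in each partition) are assumptions for illustration, not the paper's exact bookkeeping:

```python
def classify_conflict(copy_a, copy_b):
    """Compare the access records kept for one file in two partitions
    after they merge.  An access conflicts only if it happened after the
    partitions separated (i.e., its stamp exceeds the merge base)."""
    a_wrote = copy_a["last_write"] > copy_a["merge_base"]
    b_wrote = copy_b["last_write"] > copy_b["merge_base"]
    a_read  = copy_a["last_read"]  > copy_a["merge_base"]
    b_read  = copy_b["last_read"]  > copy_b["merge_base"]
    if a_wrote and b_wrote:
        return "write-write conflict"
    if (a_wrote and b_read) or (b_wrote and a_read):
        return "read-write conflict"
    return "no conflict"
```

Once classified, a write-write conflict would trigger the reconciliation steps the abstract mentions, while "no conflict" lets the merged partition adopt either copy directly.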
In a distributed database, maintaining large table replicas with frequent asynchronous insertions is a challenging problem that requires carefully managing a tradeoff between consistency and availability. With that motivation in mind, we propose efficient algorithms to repair and measure replica consistency. Specifically, we adapt, extend and optimize distributed set reconciliation algorithms to efficiently compute the symmetric difference between replicated tables in a distributed relational database. Our novel algorithms enable fast synchronization of replicas being updated with small sets of new records, measuring obsolescence of replicas having many insertions and deciding when to update a replica, as each table replica is being continuously updated in an asynchronous manner. We first present an algorithm to repair and measure distributed consistency on a large table continuously updated with new records at several sites when the number of insertions is small. We then present a complementary algorithm that enables fast synchronization of a summarization table based on foreign keys when the number of insertions is large, but happening on a few foreign key values. From a distributed systems perspective, in the first algorithm the large table with data is reconciled, whereas in the second case, its summarization table is reconciled. Both distributed database algorithms have linear communication complexity and cubic time complexity in the size of the symmetric difference between the respective table replicas they work on. That is, they are effective when the network speed is smaller than CPU speed at each site. A performance experimental evaluation with synthetic and real databases shows our algorithms are faster than a previous state-of-the-art algorithm as well as more efficient than transferring complete tables, assuming large replicated tables and sporadic asynchronous insertions.
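The goal of the reconciliation step can be illustrated with a naive version that exchanges full key sets; the paper's set-reconciliation algorithms compute the same symmetric difference with communication proportional to its size rather than to the table size:

```python
def reconcile(replica_a, replica_b):
    """Naive repair of two insert-only table replicas keyed by primary
    key: compute the symmetric difference of the key sets and copy each
    missing row to the other site.  Returns the difference's size."""
    only_a = set(replica_a) - set(replica_b)
    only_b = set(replica_b) - set(replica_a)
    for k in only_a:
        replica_b[k] = replica_a[k]
    for k in only_b:
        replica_a[k] = replica_b[k]
    return len(only_a) + len(only_b)

def obsolescence(replica, reference):
    """Fraction of reference rows missing from a replica: a simple
    consistency metric for deciding whether to synchronize at all."""
    missing = set(reference) - set(replica)
    return len(missing) / len(reference) if reference else 0.0
```

Because insertions are the only updates, copying the missing rows in both directions is enough to make the replicas identical.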
In industrial and government settings, there is often a need to perform statistical analyses that require data stored in multiple distributed databases. However, the barriers to literally integrating these data can be substantial, even insurmountable. In this article we show how tools from information technology, specifically secure multiparty computation and networking, can be used to perform statistically valid analyses of distributed databases. The common characteristic of these methods is that the owners share sufficient statistics computed on the local databases in a way that protects each owner's data from the other owners. Our focus is on horizontally partitioned data, in which data records rather than attributes are spread among the databases. We present protocols for securely performing regression, maximum likelihood estimation, and Bayesian analysis, as well as secure construction of contingency tables. We outline three current research directions: a software system implementing the protocols, secure EM algorithms, and partially trusted third parties, which reduce incentives for owners to be dishonest.
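The flavor of sharing sufficient statistics without exposing local data can be illustrated with the classic ring-based secure-sum protocol (a standard construction used here for illustration, not necessarily the article's exact protocol):

```python
import random

def secure_sum(local_values, modulus=2**61 - 1):
    """Ring-based secure summation of one sufficient statistic: the
    initiator injects a random mask, each owner adds only its own value
    to the running total it receives, and the initiator removes the mask
    at the end.  No owner ever sees another owner's raw contribution."""
    mask = random.randrange(modulus)
    running = mask
    for v in local_values:            # each site adds its local statistic
        running = (running + v) % modulus
    return (running - mask) % modulus

# Three owners jointly compute a sum without revealing any single value:
assert secure_sum([11, 25, 6]) == 42
```

Secure regression builds on exactly this kind of primitive: quantities such as Σx, Σy, and Σxy are summed across owners, and only the pooled statistics are revealed.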
Data availability and security are two important issues in a distributed database system. Existing schemes achieve high availability at the expense of higher storage cost and data security at the expense of higher processing cost. In this concise paper, we develop an integrated methodology that combines the features of some existing schemes dealing with data fragmentation, data encoding, partial replication, and quorum consensus concepts to achieve storage efficient, highly available, and secure distributed database systems.
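As a toy illustration of how fragmentation plus encoding can provide security (the paper's integrated scheme additionally combines partial replication and quorum consensus), a record can be split into two XOR shares, each individually indistinguishable from random noise:

```python
import os

def split_record(data: bytes):
    """2-of-2 XOR secret sharing: either share alone is uniformly random,
    so a single compromised storage site learns nothing.  Replicating
    each share at several sites then restores availability."""
    pad = os.urandom(len(data))
    share = bytes(a ^ b for a, b in zip(data, pad))
    return pad, share

def join_record(pad, share):
    """Recombine the two shares to recover the original record."""
    return bytes(a ^ b for a, b in zip(pad, share))

s1, s2 = split_record(b"salary=90000")
assert join_record(s1, s2) == b"salary=90000"
```

The storage-cost tension the abstract mentions is visible even here: every extra replica of a share improves availability but multiplies storage, which is what quorum-based placement tries to balance.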
ISBN (digital): 9798331518882
ISBN (print): 9798331518899
Credit card fraud has grown increasingly common, and with the rise in cybercrime, numerous cases have been recorded. Distributed search plays a pivotal role in enhancing the performance of fraud detection systems: by enabling the aggregation and retrieval of data from multiple decentralized sources without compromising data privacy, it facilitates the efficient training of models on large, diverse datasets. Another technique for controlling fraud losses is applying Federated Learning (FL) to detect fraudulent transactions; in this way, the models benefit from the dispersed data without actually sharing it. This paper focuses on the implementation of a CNN with FL to improve the security and accuracy of fraud detection in financial transactions. The suggested model employs the Kaggle credit card fraud dataset and uses enhanced techniques such as SMOTE to address the class imbalance problem and one-hot encoding to handle categorical features. The proposed CNN-FL model surpassed traditional ML classifiers such as NB, LR, and Gaussian Naive Bayes, yielding an accuracy of 99.86%, precision of 99.83%, recall of 99.85%, and an F1-score of 99.84%. These results demonstrate the effectiveness of the suggested CNN-based federated learning approach for enriching fraud detection systems, with good generalisation and high accuracy across various types of transactions.
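The aggregation step at the heart of FL-based training can be sketched as follows; the weighting by client dataset size follows the standard FedAvg recipe and is an assumption about this paper's aggregation, not a detail taken from it:

```python
def fed_avg(client_weights, client_sizes):
    """Federated averaging: the server combines locally trained model
    parameters, weighted by each client's dataset size, so raw
    transaction data never leaves the client."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two banks train locally and share only their parameters:
global_w = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
assert global_w == [2.5, 3.5]
```

In a full CNN-FL round, the same weighted average would be applied layer by layer to the CNN's weight tensors before broadcasting the global model back to the clients.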