fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and th...
详细信息
ISBN:
(纸本)9781450335492
fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. We introduce PSYNC, a domain specific language based on the Heard-Of model, which views asynchronous faulty systems as synchronous ones with an adversarial environment that simulates asynchrony and faults by dropping messages. We define a runtime system for PSYNC that efficiently executes on asynchronous networks. We formalize the relation between the runtime system and PSYNC in terms of observational refinement. The high-level lockstep abstraction introduced by PSYNC simplifies the design and implementation of fault-tolerant distributed algorithms and enables automated formal verification. We have implemented an embedding of PSYNC in the SCALA programming language with a runtime system for asynchronous networks. We show the applicability of PSYNC by implementing several important fault-tolerant distributed algorithms and we compare the implementation of consensus algorithms in PSYNC against implementations in other languages in terms of code size, runtime efficiency, and verification.
fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and th...
详细信息
fault-tolerant distributed algorithms play an important role in many critical/high-availability applications. These algorithms are notoriously difficult to implement correctly, due to asynchronous communication and the occurrence of faults, such as the network dropping messages or computers crashing. We introduce PSYNC, a domain specific language based on the Heard-Of model, which views asynchronous faulty systems as synchronous ones with an adversarial environment that simulates asynchrony and faults by dropping messages. We define a runtime system for PSYNC that efficiently executes on asynchronous networks. We formalize the relation between the runtime system and PSYNC in terms of observational refinement. The high-level lockstep abstraction introduced by PSYNC simplifies the design and implementation of fault-tolerant distributed algorithms and enables automated formal verification. We have implemented an embedding of PSYNC in the SCALA programming language with a runtime system for asynchronous networks. We show the applicability of PSYNC by implementing several important fault-tolerant distributed algorithms and we compare the implementation of consensus algorithms in PSYNC against implementations in other languages in terms of code size, runtime efficiency, and verification.
We introduce Distal, a new framework that simplifies turning pseudocode of faulttolerantdistributedalgorithms into efficient executable code. Without proper tool support, even small amounts of pseudocode normally e...
详细信息
ISBN:
(纸本)9781479901814;9781467364713
We introduce Distal, a new framework that simplifies turning pseudocode of faulttolerantdistributedalgorithms into efficient executable code. Without proper tool support, even small amounts of pseudocode normally ends up in several thousands of non-trivial lines of Java or C++. Distal is implemented as a library in Scala and consists of two main parts: a domain specific language (DSL) in which algorithms are expressed and an efficient messaging layer that deals with low level issues such as connection management, threading and (de) serialization. The DSL is designed such that implementations of distributedalgorithms highly resemble the pseudocode found in research papers. By writing code that is close to the protocol description, one can be more convinced that the implemented system really reflects the protocol specification on paper. Distal does not only make it simple and intuitive to implement distributedalgorithms but it also leads to efficient implementations.
fault-tolerantalgorithms, such as Reliable Broadcast, assure the correct operation of modern distributed systems, even when some of the system nodes fail. However, the development of distributedalgorithms is a manua...
详细信息
fault-tolerantalgorithms, such as Reliable Broadcast, assure the correct operation of modern distributed systems, even when some of the system nodes fail. However, the development of distributedalgorithms is a manual and complex process, where slight changes in requirements can require a complete redesign of the algorithm. To automate the process of developing such algorithms, this work presents a new approach that uses Reinforcement Learning to synthesize correct and efficient fault-tolerant distributed algorithms. This work shows the first application of the approach on the synthesis of fault-tolerant Reliable Broadcast algorithms. The presented technique is capable of synthesizing correct and efficient algorithms with the same performance as others available in the literature as well as a new Byzantine tolerant algorithm, in only 12,000 learning episodes. Based on the success of this implementation, we aim, in the future, to extend this technique to other distributedalgorithms such as Consensus.
This article addresses novel real-valued consensus problems in the presence of malicious adversaries that can move within the network and induce faulty behaviors in the attacked agents. By adopting several mobile adve...
详细信息
This article addresses novel real-valued consensus problems in the presence of malicious adversaries that can move within the network and induce faulty behaviors in the attacked agents. By adopting several mobile adversary models from the computer science literature, we develop protocols which can mitigate the influence of such malicious agents. The algorithms follow the class of mean subsequence reduced (MSR) algorithms, under which agents ignore the suspicious values received from neighbors during their state updates. Different from the static adversary models, even after the adversaries move away, the infected agents may remain faulty in their values, whose effects must be taken into account. We develop conditions on the network structures for both the complete and non-complete directed graph cases, under which the proposed algorithms are guaranteed to attain resilient consensus. The tolerance bound for network conditions becomes more strict as the adversaries are allowed to have more power. Extensive simulations are carried out over random graphs to verify the effectiveness of our approach when the information of the adversarial agents in terms of their models and numbers is unknown to the agents.
fault-tolerant distributed algorithms such as Reliable Broadcast, Causal Broadcast, Total Order Broadcast, and Consensus, are at the core of many modern distributed systems. However, the development of distributed alg...
详细信息
ISBN:
(纸本)9798350325454
fault-tolerant distributed algorithms such as Reliable Broadcast, Causal Broadcast, Total Order Broadcast, and Consensus, are at the core of many modern distributed systems. However, the development of distributedalgorithms by humans is a laborious and complex process. This work presents a novel approach to generating distributedalgorithms using Generative Artificial Intelligence that allows for automating the process of generating such algorithms. The paper also summarizes our initial results on using the approach to generate Reliable Broadcast algorithms.
In this paper, we study a distributed approach based on consensus algorithms for clock synchronization in wireless sensor networks. The sensor nodes face two types of uncertainties. One is that some of the nodes in th...
详细信息
In this paper, we study a distributed approach based on consensus algorithms for clock synchronization in wireless sensor networks. The sensor nodes face two types of uncertainties. One is that some of the nodes in the network can be faulty and transmit arbitrary signals by not following the given protocol;similar effects may be caused by false data injection by an external malicious attacker. The other is that the communication is unreliable and the packets exchanged may become lost. To deal with these uncertainties, we propose a resilient consensus-type algorithm based on the so-called mean subsequence reduced (MSR) technique, where each normal node ignores the outliers in the clock data collected from its neighbors and makes updates using data from the past if new data have not arrived yet. We establish network connectivity conditions in terms of graph robustness for the MSR algorithm to attain resilient properties.
distributedfault-tolerant control algorithms are in great demand nowadays due to their practical importance in cloud computing, Internet of Things (IoT) technology, swarm robotics, and other areas. It is usually hard...
详细信息
ISBN:
(纸本)9781450351799
distributedfault-tolerant control algorithms are in great demand nowadays due to their practical importance in cloud computing, Internet of Things (IoT) technology, swarm robotics, and other areas. It is usually hard to make a distributed algorithm fault-tolerant. It is even harder to ensure that such algorithm behaves correctly in the presence of faults of some kind. In this work, we construct a reliable, adaptive, fault-tolerantdistributed mutual exclusion algorithm based on the well-known Ricart-Agrawala algorithm. In order to formally verify it, we use a hybrid approach of deductive reasoning and bounded model-checking. First, a safety property of the Ricart-Agrawala algorithm is proved in Calculus of Inductive Constructions of Coq proof assistant using assertional reasoning. Then, an extension of that algorithm turning it into fault-tolerant and adaptive to participants set change, is formalized in TLA and checked on a bounded model. Besides constructing and verifying the algorithm, this work aims to familiarize those interested in constructing highly reliable components with well established verification methods that were used to verify the proposed algorithm.
Automatic verification of threshold-based fault-tolerant distributed algorithms (FTDA) is challenging: FTDAs have multiple parameters that are restricted by arithmetic conditions, the number of processes and faults is...
详细信息
Automatic verification of threshold-based fault-tolerant distributed algorithms (FTDA) is challenging: FTDAs have multiple parameters that are restricted by arithmetic conditions, the number of processes and faults is parameterized, and the algorithm code is parameterized due to conditions counting the number of received messages. Recently, we introduced a technique that first applies data and counter abstraction and then runs bounded model checking (BMC). Given an FTDA, our technique computes an upper bound on the diameter of the system. This makes BMC complete for reachability properties: it always finds a counterexample, if there is an actual error. To verify state-of-the-art FTDAs, further improvement is needed. In contrast to encoding bounded executions of a counter system over an abstract finite domain in SAT, in this paper, we encode bounded executions over integer counters in SMT. In addition, we introduce a new form of reduction that exploits acceleration and the structure of the FTDAs. This aggressively prunes the execution space to be explored by the solver. In this way, we verified safety of seven FTDAs that were out of reach before.
Counter abstraction is a powerful tool for parameterized model checking, if the number of local states of the concurrent processes is relatively small. In recent work, we introduced parametric interval counter abstrac...
详细信息
Counter abstraction is a powerful tool for parameterized model checking, if the number of local states of the concurrent processes is relatively small. In recent work, we introduced parametric interval counter abstraction that allowed us to verify the safety and liveness of threshold-based fault-tolerant distributed algorithms (FTDA). Due to state space explosion, applying this technique to distributedalgorithms with hundreds of local states is challenging for state-of-the-art model checkers. In this paper, we demonstrate that reachability properties of FTDAs can be verified by bounded model checking. To ensure completeness, we need an upper bound on the distance between states. We show that the diameters of accelerated counter systems of FTDAs, and of their counter abstractions, have a quadratic upper bound in the number of local transitions. Our experiments show that the resulting bounds are sufficiently small to use bounded model checking for parameterized verification of reachability properties of several FTDAs, some of which have not been automatically verified before. (C) 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license.
暂无评论