ISBN (print): 9798400706363
We design and implement PG, a Byzantine fault-tolerant and privacy-preserving multi-sensor fusion system. PG is flexible and extensible, supporting a variety of fusion algorithms and application scenarios. On the theoretical side, PG develops and unifies techniques from dependable distributed systems and modern cryptography. PG can provably protect the privacy of individual sensor inputs and fusion results. In contrast to prior works, PG can provably defend against pollution attacks and guarantee output delivery, even in the presence of malicious sensors that may lie about their inputs, contribute ill-formed inputs, or provide no inputs at all to sway the final result, and in the presence of malicious servers serving as aggregators. On the practical side, we implement PG in the client-server-sensor setting. Moreover, we deploy PG in a cloud-based system with 261 sensors and in a cyber-physical system with 19 resource-constrained sensors. In both settings, we show that PG is efficient and scalable in both failure-free and failure scenarios.
ISBN (print): 9798350350562; 9781713899310
Technologies that interact with the physical world rely on various types of sensors to measure environmental variables. However, sensors can become defective due to aging and harsh conditions, leading to inaccurate readings and incorrect decisions. To address faulty sensors, a distributed network with replicated sensors can be used. In this research, we combine modeling and simulation to develop such applications, proposing the use of the Brooks-Iyengar algorithm for sensor replication in a publish/subscribe architecture. Our approach demonstrates how a modeling and simulation platform can be used to develop embedded applications that achieve multi-sensor fusion and inexact agreement between nodes. Using the Discrete Event System Specification (DEVS), our models are utilized both in simulation environments and on hardware.
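To make the fusion step concrete, here is a minimal Python sketch of the Brooks-Iyengar interval fusion, assuming each of the n nodes has already gathered one interval estimate per sensor and tolerates at most f faulty ones; it is not the paper's DEVS model or publish/subscribe implementation, and the function name and example values are illustrative.

```python
def brooks_iyengar(intervals, f):
    """Brooks-Iyengar fusion sketch: keep the regions supported by at least
    n - f of the reported intervals and return a weighted point estimate
    plus a bounding interval. Illustrative only; f is an assumed tolerance."""
    n = len(intervals)
    points = sorted({p for iv in intervals for p in iv})
    regions = []  # (lo, hi, weight) for each maximal overlap region that survives
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2.0
        weight = sum(1 for a, b in intervals if a <= mid <= b)
        if weight >= n - f:
            regions.append((lo, hi, weight))
    if not regions:
        raise ValueError("no region is supported by n - f sensors")
    total = sum(w for _, _, w in regions)
    value = sum((lo + hi) / 2.0 * w for lo, hi, w in regions) / total
    bound = (min(lo for lo, _, _ in regions), max(hi for _, hi, _ in regions))
    return value, bound

# Four sensor intervals, at most one faulty: fused value ~2.63, bound (1.5, 3.2).
print(brooks_iyengar([(2.7, 6.7), (0.0, 3.2), (1.5, 4.5), (0.8, 2.8)], f=1))
```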
ISBN (print): 9783030576752; 9783030576745
In this paper, we propose a novel fault-tolerant parallel matrix multiplication algorithm called 3D Coded SUMMA that achieves higher failure tolerance than replication-based schemes for the same amount of redundancy. This work bridges the gap between recent developments in coded computing and fault tolerance in high-performance computing (HPC). The core idea of coded computing is the same as algorithm-based fault tolerance (ABFT): weaving redundancy into the computation using error-correcting codes. In particular, we show that MatDot codes, an innovative code construction for parallel matrix multiplications, can be integrated into three-dimensional SUMMA (Scalable Universal Matrix Multiplication Algorithm [30]) in a communication-avoiding manner. To tolerate any two node failures, the proposed 3D Coded SUMMA requires roughly 50% less redundancy than replication, while the overhead in execution time is only about 5-10%.
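As an illustration of the coding idea only, not the communication-avoiding integration into 3D SUMMA described above, the following NumPy sketch shows MatDot encoding and decoding with an m = 2 split: any 2m - 1 = 3 worker products suffice to recover A @ B, so two of five workers may fail. The function names, evaluation points, and matrix sizes are assumptions made for the example.

```python
import numpy as np

def matdot_encode(A, B, m, xs):
    """Split A column-wise and B row-wise into m blocks and evaluate the
    MatDot encoding polynomials at each worker's point x."""
    A_blocks = np.split(A, m, axis=1)              # A = [A_0 ... A_{m-1}]
    B_blocks = np.split(B, m, axis=0)              # B = [B_0; ...; B_{m-1}]
    enc_A = [sum(A_blocks[i] * x**i for i in range(m)) for x in xs]
    enc_B = [sum(B_blocks[j] * x**(m - 1 - j) for j in range(m)) for x in xs]
    return enc_A, enc_B

def matdot_decode(results, xs, m):
    """Interpolate the degree-(2m-2) product polynomial from any 2m-1 worker
    results; its x^(m-1) coefficient equals A @ B."""
    k = 2 * m - 1
    V = np.vander(np.array(xs[:k]), k, increasing=True)
    coeffs = np.tensordot(np.linalg.inv(V), np.stack(results[:k]), axes=1)
    return coeffs[m - 1]

# Toy run (sizes and evaluation points are arbitrary choices for the sketch).
rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
enc_A, enc_B = matdot_encode(A, B, 2, xs)
worker_out = [a @ b for a, b in zip(enc_A, enc_B)]       # each worker multiplies its encoded blocks
survivors, pts = [worker_out[i] for i in (0, 2, 4)], [xs[i] for i in (0, 2, 4)]
assert np.allclose(matdot_decode(survivors, pts, 2), A @ B)   # workers 1 and 3 "failed"
```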
On future extreme-scale computers, it is expected that faults will become an increasingly serious problem as the number of individual components grows and failures become more frequent. This is driving the interest in designing algorithms with built-in fault tolerance that can continue to operate and can restore data even if part of the computation is lost in a failure. For fault-free computations, the use of adaptive refinement techniques in combination with finite element methods is well established. Furthermore, iterative solution techniques that incorporate information about the grid structure, such as the parallel geometric multigrid method, have been shown to be an efficient approach to solving various types of partial differential equations. In this article, we present an advanced parallel adaptive multigrid method that uses dynamic data structures to store a nested sequence of meshes and the iteratively evolving solution. After a fail-stop fault, the data residing on the faulty processor will be lost. However, with suitably designed data structures, the neighbouring processors contain enough information so that a consistent mesh can be reconstructed in the faulty domain, with the goal of resuming the computation without having to restart from scratch. This recovery is based on a set of carefully designed distributed algorithms that build on the existing parallel adaptive refinement routines, but which must be carefully augmented and extended.
This article is concerned with the security of modern Cyber-Physical Systems in the presence of transient sensor faults. We consider a system with multiple sensors measuring the same physical variable, where each sensor provides an interval containing all possible values of the true state. We note that some sensors might output faulty readings and others may be controlled by a malicious attacker. Differing from previous works, in this article we aim to distinguish between faults and attacks and develop an attack detection algorithm for the latter only. To do this, we note that there are two kinds of faults, transient and permanent; the former are benign and short-lived, whereas the latter may have dangerous consequences for system performance. We argue that sensors have an underlying transient fault model that quantifies the amount of time in which transient faults can occur. In addition, we provide a framework for developing such a model if it is not provided by manufacturers. Attacks can manifest as either transient or permanent faults depending on the attacker's goal, and we provide different techniques for handling each kind. For the former, we analyze the worst-case performance of sensor fusion over time given each sensor's transient fault model and develop a filtered fusion interval that is guaranteed to contain the true value and is bounded in size. To deal with attacks that do not comply with sensors' transient fault models, we propose a sound attack detection algorithm based on pairwise inconsistencies between sensor measurements. Finally, we provide a real-data case study on an unmanned ground vehicle to evaluate the various aspects of this article.
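A simplified sketch of the pairwise-inconsistency idea follows; it is not the paper's detector, which reasons about each sensor's transient fault model in detail. Here two interval measurements are treated as inconsistent when they do not intersect, and a pair of sensors is flagged when its disagreement rate over a window exceeds the sum of the two sensors' assumed transient-fault budgets. The budget representation, window, and threshold are assumptions for illustration.

```python
from itertools import combinations

def overlaps(a, b):
    """Two interval measurements of the same variable are consistent iff they intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def detect_attacks(history, budget):
    """history[s]: list of (low, high) intervals over a window for sensor s.
    budget[s]: assumed max fraction of steps in which s may be transiently faulty.
    Flags a pair when its observed disagreement cannot be explained by the two
    transient-fault budgets combined (illustrative rule, not the paper's)."""
    sensors = list(history)
    steps = len(history[sensors[0]])
    alarms = []
    for s, t in combinations(sensors, 2):
        disagreements = sum(
            0 if overlaps(history[s][k], history[t][k]) else 1 for k in range(steps)
        )
        if disagreements / steps > budget[s] + budget[t]:
            alarms.append((s, t))
    return alarms

# Toy window of three steps; sensor "c" disagrees with the others far too often.
hist = {
    "a": [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9)],
    "b": [(1.2, 2.2), (1.0, 2.0), (1.1, 2.0)],
    "c": [(5.0, 6.0), (5.1, 6.1), (5.2, 6.2)],
}
print(detect_attacks(hist, {"a": 0.1, "b": 0.1, "c": 0.1}))   # -> [('a', 'c'), ('b', 'c')]
```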
This article focuses on the design of safe and attack-resilient Cyber-Physical Systems (CPS) equipped with multiple sensors measuring the same physical variable. A malicious attacker may be able to disrupt system performance by compromising a subset of these sensors. Consequently, we develop a precise and resilient sensor fusion algorithm that combines the data received from all sensors by taking into account their specified precisions. In particular, we note that in the presence of a shared bus, in which messages are broadcast to all nodes in the network, the attacker's impact depends on which sensors' transmissions it has observed before sending the corrupted measurements. Therefore, we explore the effects of communication schedules on the performance of sensor fusion and provide theoretical and experimental results advocating the use of the Ascending schedule, which orders sensor transmissions according to their precision, starting from the most precise. In addition, to improve the accuracy of the sensor fusion algorithm, we consider the dynamics of the system in order to incorporate past measurements at the current time. Possible ways of mapping sensor measurement history to the current time are investigated and compared in terms of the confidence in the final output of the sensor fusion. We show that the precision of the algorithm using history is never worse than that of the no-history one, while the benefits may be significant. Furthermore, we utilize the complementary properties of the two methods and show that their combination results in a more precise and resilient algorithm. Finally, we validate our approach in simulation and in experiments on a real unmanned ground robot.
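For intuition, the sketch below implements a Marzullo-style fault-tolerant fusion interval, the basic building block that this line of work refines with sensor precisions, communication schedules, and measurement history (none of which the sketch models): if at most f of the n interval measurements are compromised, every point of the true state lies in at least n - f intervals, so the fused interval is the tightest interval covering all such points.

```python
def fused_interval(intervals, f):
    """Marzullo-style fusion sketch: return the smallest interval containing
    every point covered by at least n - f of the reported intervals, which is
    guaranteed to contain the true value when at most f sensors are faulty."""
    n = len(intervals)
    endpoints = sorted({p for iv in intervals for p in iv})
    covered = [
        (lo, hi)
        for lo, hi in zip(endpoints, endpoints[1:])
        if sum(1 for a, b in intervals if a <= (lo + hi) / 2 <= b) >= n - f
    ]
    return (covered[0][0], covered[-1][1]) if covered else None

# Three sensors, at most one compromised: the outlier (10.0, 11.0) is ignored.
print(fused_interval([(0.5, 1.5), (0.8, 2.0), (10.0, 11.0)], f=1))   # -> (0.8, 1.5)
```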
ISBN (print): 9781479964741
Cloud computing is increasingly important, with the industry moving towards outsourcing computational resources as a means to reduce investment and management costs while improving security, dependability and performance. Cloud operators use multi-tenancy, consolidating virtual machines (VMs) onto a small number of physical machines (PMs), to pool computing resources and thus offer elasticity to clients. Although cloud-based fault tolerance schemes impose communication and synchronization overheads, the cloud offers excellent facilities for critical applications, as it can host varying numbers of replicas in independent resources. Given these contradictory forces, determining whether the cloud can host elastic critical services is a major research question. We address this challenge from the perspective of a standard three-tiered system with relational data. We propose to tolerate Byzantine faults using groups of replicas placed on distinct physical machines, as a means to avoid exposing applications to correlated failures. To improve the scalability of our system, we partition the data to enable parallel accesses. In a realistic setup, this design can reach speedups that largely exceed the number of partitions. Even under wide variations in load, the system keeps latency and throughput within reasonable bounds. We believe that the elasticity we observe demonstrates the feasibility of tolerating Byzantine faults in a cloud-based server using a relational database.
ISBN (print): 9781479957279
Recently, cloud computing frameworks have gained popularity for processing large-scale parallel data applications. They usually generate enormous amounts of intermediate data which are short-lived, yet are important for the completion of the job. When servers fail, the associated intermediate data are lost, which then affects the computation of the whole job. However, existing fault-tolerant processing approaches only adopt simple replication strategies, which can incur significant network overhead, and do not consider the characteristics of the intermediate data. Therefore, in this paper, we propose an efficient cloud computing framework supporting intermediate data fault tolerance, named the IDF_Support framework. By dividing the computing tasks into different classes, the IDF_Support framework can effectively handle intermediate data failures. We then propose two levels of intermediate data fault-tolerant algorithms: the inner-task intermediate data fault-tolerant algorithm (Inner_task IDF), which provides fault tolerance within a task, and the outer-task intermediate data fault-tolerant algorithm (Outer_task IDF), which provides fault tolerance among tasks. The experimental results show that our algorithms maintain the reliability of the system when servers fail.
Overlay networks are expected to operate in hostile environments where node and link failures are commonplace. One way to make overlay networks robust is to design self-stabilizing overlay networks, i.e., overlay networks that can handle node and link failures without any external supervision. In this paper, we first describe a simple framework, which we call the Transitive Closure Framework (TCF), for the self-stabilizing construction of an extensive class of overlay networks. Like previous self-stabilizing algorithms for overlay networks, TCF permits intermediate node degrees to grow to Ω(n), independent of the maximum degree of the target overlay network. However, TCF has several advantages over previous work in this area: (i) it is a "framework" and can be used for the construction of a variety of overlay networks (e.g. LINEAR, SKIP+), not just a particular network, (ii) it runs in an optimal number of rounds for a variety of overlay networks, and (iii) it can easily be composed with other non-self-stabilizing protocols that can recover from specific bad initial states in a memory-efficient fashion. We demonstrate the power of our framework by deriving from TCF a simple self-stabilizing protocol for constructing SKIP+ graphs [R. Jacob, A. Richa, C. Scheideler, S. Schmid, H. Taubig, A distributed polylogarithmic time algorithm for self-stabilizing skip graphs, in: PODC '09: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, ACM, New York, NY, USA, 2009, pp. 131-140] that guarantees optimal convergence time from any configuration. (C) 2013 Elsevier B.V. All rights reserved.
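The toy simulation below sketches the Transitive Closure Framework idea in a synchronous, failure-free setting, which is a simplification of the self-stabilizing protocol in the paper: nodes repeatedly forward their neighbor sets until every node knows the whole component, after which each node locally keeps only its edges in the target overlay (here LINEAR rather than SKIP+). The function names and round structure are assumptions made for the example.

```python
def tcf_round(neighbors):
    """One synchronous round (simplified TCF): every node pushes its current
    neighbor set to its neighbors, so topology knowledge spreads until the
    detected component becomes a clique."""
    new = {u: set(nbrs) for u, nbrs in neighbors.items()}
    for u, nbrs in neighbors.items():
        for v in nbrs:
            new[v] |= (nbrs | {u}) - {v}
    return new

def linear_topology(nodes):
    """Once a node knows every other node, it locally keeps only the edges of
    the target overlay; here the target is LINEAR, a sorted line on node ids."""
    order = sorted(nodes)
    edges = {u: set() for u in order}
    for a, b in zip(order, order[1:]):
        edges[a].add(b)
        edges[b].add(a)
    return edges

# Weakly connected initial state; knowledge floods the component in a few rounds.
nbrs = {1: {3}, 3: {7}, 7: {5}, 5: set()}
for _ in range(3):
    nbrs = tcf_round(nbrs)
assert all(nbrs[u] == set(nbrs) - {u} for u in nbrs)      # every node now knows every other node
print(linear_topology(set(nbrs)))                          # -> {1: {3}, 3: {1, 5}, 5: {3, 7}, 7: {5}}
```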