Communication models are a key aspect of the design and implementation of distributed system architectures. Application logic must consider the guarantees of these models, which fundamentally influence its correctness. Modern multi-core processor architectures face a similar problem when it comes to accessing shared memory: the guarantees of an architecture have a fundamental impact on the observable behavior of software. The formalization of these guarantees in a declarative way has led to powerful tools and algorithms for defining reusable constraints on patterns of memory access events and their relationships, enabling the efficient description and automatic formal analysis of software properties with respect to a specific architecture. The Cat memory modeling language provides a standard means of specifying these constraints. Despite the parallels, the axiomatic modeling and analysis of communication models in distributed systems remain a relatively unexplored area. In this paper, we address this gap and demonstrate how communication models can be mapped to the Cat language. We create a standard library of reusable patterns and demonstrate our approach, called NetworCat, on the simple examples of UDP and TCP; we also present its applicability to the highly configurable OMG-DDS service. This adaptation-based approach enables the use of ever-improving verification tools built for shared-memory concurrency on distributed systems. We believe this not only benefits distributed system analyses by broadening the toolset for verification but also positively impacts the field of memory-model-aware verification by widening its audience to another domain.
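To make the axiomatic style concrete, here is a minimal Python sketch (not NetworCat or the Cat language itself) of the idea the abstract describes: an execution is a set of communication events, relations such as per-endpoint program order and message delivery connect them, and a declarative constraint, here acyclicity of their union, is checked over the candidate execution. The event names, relation names, and the constraint are illustrative assumptions.

    # Illustrative sketch (not NetworCat or Cat): checking a declarative acyclicity
    # constraint over message-passing events, in the axiomatic style that memory
    # models use for shared-memory accesses.

    from itertools import chain

    # Hypothetical relations of one candidate execution, as sets of ordered pairs.
    po = {("send1", "send2"), ("recv1", "recv2")}   # per-endpoint program order
    rf = {("send1", "recv1"), ("send2", "recv2")}   # message delivery ("reads-from")

    def acyclic(edges):
        """Return True if the relation given as a set of (a, b) pairs has no cycle."""
        graph = {}
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {n: WHITE for n in chain.from_iterable(edges)}

        def dfs(node):
            color[node] = GRAY
            for succ in graph.get(node, ()):
                if color[succ] == GRAY:          # back edge, i.e. a cycle
                    return False
                if color[succ] == WHITE and not dfs(succ):
                    return False
            color[node] = BLACK
            return True

        return all(color[n] != WHITE or dfs(n) for n in list(color))

    # The assumed "communication model" accepts the execution iff the combined
    # ordering constraint holds; here acyclic(po | rf) evaluates to True.
    print(acyclic(po | rf))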
Cloud computing, edge computing, replicated databases, and various modern technologies adopt distributed computing concepts for a reliable, high-performance, large-scale computational platform. In distributed computing, leader election is a fundamental problem because the elected leader helps coordinate and utilize the resources efficiently. Several state-of-the-art works note that a good-quality leader is essential, as it improves system performance, simplifies management procedures, reduces coordination complexity, and makes the system more fault-tolerant. However, designing a self-stabilizing leader election algorithm with weak assumptions in a failure-prone distributed environment is challenging. This paper proposes a multi-attribute-based, self-stabilizing, network-partition-tolerant leader election method for failure-prone distributed systems. Here, based on the system requirements, the pertinent attributes of a good-quality leader are identified and assigned weights according to their importance. Then, the identified attributes and their weights are used to elect a suitable node as the leader. We show that the algorithm is self-stabilizing and can tolerate multiple node and link failures. Further, we analyze the proposed algorithm's time, communication, and bit complexities. We consider a distributed database system scenario to simulate the proposed method and compare it with existing approaches to evaluate and validate the proposed method's performance and the elected leader's quality.
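As a rough illustration of the multi-attribute idea, the following Python sketch scores candidate nodes by a weighted sum of normalized attributes and elects the highest-scoring node. The attribute names, weights, and tie-breaking rule are assumptions made for the example, not the paper's exact algorithm, which additionally handles self-stabilization and network partitions.

    # Illustrative sketch (assumed attributes and weights): each node scores the
    # candidates by a weighted sum of normalized attributes and elects the
    # highest-scoring node, breaking ties by the smaller node id.

    # Hypothetical weights reflecting attribute importance (they sum to 1.0 here).
    WEIGHTS = {"cpu_capacity": 0.4, "battery": 0.3, "bandwidth": 0.2, "uptime": 0.1}

    def score(attrs):
        """Weighted sum of attribute values, each assumed normalized to [0, 1]."""
        return sum(WEIGHTS[name] * attrs[name] for name in WEIGHTS)

    def elect_leader(nodes):
        """nodes: dict mapping node id -> attribute dict. Returns the elected id."""
        # Highest score wins; the smaller node id breaks ties deterministically,
        # so every node that sees the same attribute values elects the same leader.
        return max(nodes, key=lambda nid: (score(nodes[nid]), -nid))

    if __name__ == "__main__":
        cluster = {
            1: {"cpu_capacity": 0.9, "battery": 0.5, "bandwidth": 0.7, "uptime": 0.8},
            2: {"cpu_capacity": 0.6, "battery": 0.9, "bandwidth": 0.9, "uptime": 0.4},
            3: {"cpu_capacity": 0.8, "battery": 0.8, "bandwidth": 0.6, "uptime": 0.9},
        }
        print("elected leader:", elect_leader(cluster))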
Safety-critical systems are often designed as real-time distributed systems. Despite the need for strong guarantees of safety and reliability in these systems, applying formal verification methods to real-time distributed systems at the implementation level has faced significant technical challenges. In this paper, we present VERIRT, an end-to-end formal verification framework that closes the formal gap between high-level abstract timed specifications and low-level implementations for real-time distributed systems. Within the framework, we establish a theoretical foundation for constructing formal timed operational semantics by integrating conventional operational semantics and low-level timing assumptions, along with principles for reasoning about their timed behaviors against abstract specifications. We leverage CompCert's correctness proofs to guarantee the correctness of the assembly implementation of real-time distributed systems. We provide two case studies on realistic real-time systems. All the results are formalized in Coq.
ISBN (Print): 9783031826993; 9783031827006
Many distributed systems require temporal properties to hold for correctness. Model checking can verify these properties on a small system, but it does not scale to arbitrarily large systems. This work presents a new method for proving that temporal properties verified on a small system extend to an arbitrarily large system when that system has a ring topology. It uses a model checker to prove temporal properties and properties of a partial order of events in the system. It then admits the partial-order properties as axioms in a theorem prover and proves a conformance relation between an arbitrary-sized ring of nodes and the model-checked base case. The conformance relation is used to prove that adding a new node to the ring does not affect the possible states of the existing nodes in the system, and therefore any properties proven in the small system continue to hold in an arbitrarily large system. We demonstrate the approach in a case study of a nontrivial distributed protocol used by the MyCHIPs digital currency to clear credit.
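The "small system" half of such an approach can be pictured with a minimal Python sketch: exhaustively exploring the state space of a small ring and checking a safety property on it. The token-ring example below is an assumption made for illustration (it is not the MyCHIPs protocol), and extending the result to rings of arbitrary size is exactly what the paper's conformance relation and theorem-prover step provide.

    # Illustrative sketch (assumed token-ring example): exhaustive exploration of
    # a small ring and a safety check on it, i.e. the model-checked "base case".

    N = 3  # size of the small, explicitly checked ring

    def successors(state):
        """state[i] is True iff node i holds the token; the only transition is a
        node passing its token to its clockwise neighbour."""
        for i, has_token in enumerate(state):
            if has_token:
                nxt = list(state)
                nxt[i] = False
                nxt[(i + 1) % N] = True
                yield tuple(nxt)

    def reachable(initial):
        """Exhaustively explore the state space from `initial`."""
        seen, frontier = {initial}, [initial]
        while frontier:
            state = frontier.pop()
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen

    initial = (True,) + (False,) * (N - 1)       # exactly one token, at node 0
    states = reachable(initial)
    # Safety property checked on the small ring: at most one node holds the token.
    assert all(sum(s) <= 1 for s in states), "mutual exclusion violated"
    print(f"explored {len(states)} states; mutual exclusion holds for N = {N}")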
Runtime Verification (RV) refers to a family of techniques in which system executions are observed and checked against formal specifications with the aim of identifying faults. In offline RV, observation and verification are done in two separate and successive steps. In this paper, we define an approach to offline RV of distributed systems (DS) against interactions. Interactions are formal models describing communications within a DS. A DS is composed of subsystems deployed on different machines and interacting via message passing to achieve common goals. Therefore, observing executions of a DS entails logging a collection of local execution traces, one for each subsystem, collected on its host machine. We call such observational artifacts multi-traces. A major challenge in analyzing multi-traces is that there are no practical means to synchronize the ends of observation of all the local traces. We address this via an operation called lifeline removal, which we apply on the fly to the specification during the verification of a multi-trace once a local trace has been entirely analyzed. This operation removes from the interaction the specification of actions occurring on the subsystem that is no longer observed, which may allow further execution of the specification by removing potential deadlocks. We prove the correctness of the resulting RV algorithm and introduce two optimization techniques, which we also prove correct. We implement a Partial Order Reduction (POR) technique by selecting a one-unambiguous action (as a unique first step to a linearization) whose existence is determined via the lifeline removal operator. Additionally, Local Analyses (LOC), i.e., the verification of local traces, can be leveraged during the global multi-trace analysis to prove failure more quickly. Experiments illustrate the application of our RV approach and the benefits of our optimizations.
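A much-simplified Python sketch of the lifeline-removal idea follows: the specification is taken to be a single linear sequence of (lifeline, action) events rather than a full interaction, and once a local trace has been entirely consumed, the remaining specified actions on that lifeline are skipped, reflecting that observation of that subsystem has ended. The names and the linear specification format are assumptions made for brevity.

    # Illustrative sketch (assumed linear specification, not the paper's interaction
    # language): a multi-trace is a local trace per lifeline; we replay the
    # specification and, once a local trace is exhausted, "remove" that lifeline,
    # i.e. skip the remaining specified actions on it.

    def conforms(spec, multitrace):
        """spec: list of (lifeline, action); multitrace: dict lifeline -> action list.
        Returns True if every observed local trace is explained by a prefix of spec."""
        cursors = {l: 0 for l in multitrace}   # how much of each local trace is matched
        for lifeline, action in spec:
            i = cursors[lifeline]
            if i >= len(multitrace[lifeline]):
                continue                        # lifeline removed: observation ended
            if multitrace[lifeline][i] != action:
                return False                    # observed action contradicts the spec
            cursors[lifeline] = i + 1
        # Every observed action must have been explained by some specified action.
        return all(cursors[l] == len(multitrace[l]) for l in multitrace)

    if __name__ == "__main__":
        spec = [("client", "send!req"), ("server", "recv?req"),
                ("server", "send!resp"), ("client", "recv?resp")]
        # The server's log ends early; lifeline removal lets verification proceed.
        multitrace = {"client": ["send!req", "recv?resp"], "server": ["recv?req"]}
        print(conforms(spec, multitrace))       # True under these assumptions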
Context: Information Technology education must cultivate proficiency in distributed systems, including strong hands-on laboratory skills, to meet the needs of society and industry. Given the complexity of distributed systems, any successful methodology for teaching them to novice students must be scaffolded appropriately to ensure that the students acquire the required degree of expertise. Objective: We propose a comprehensive scaffolding approach for inquiry-based, hands-on laboratories in a distributed systems course, which guides not only the learning process but also its assessment. The approach is based mainly on embedded scaffolds, namely explicit coding and experimental milestones and open questions with predefined grades, but also features contingent scaffolds provided by the teacher when additional assistance is needed. Method: We apply the methodology in the context of the subject 'Distributed Network Systems' offered by our university. We compare the students' performance during three academic years using the proposed methodology with that of the three previous years, which still used the former methodology. We use both visual representations and planned Analysis of Variance (ANOVA) tests to verify our hypothesis, defined as a complex contrast. Findings: We find a statistically significant improvement in the students' performance when using the new methodology, both in their grades on the assignments (F(1, 75.364) = 17.770, p = 6.85 x 10^-5) and, more importantly, in their grades on the exam questions about the practicals (F(1, 123.186) = 13.285, p = 3.93 x 10^-4). Implications: Our results encourage other instructors to incorporate embedded scaffolds for teaching and assessing their hands-on laboratories on distributed systems.
The purpose of this paper is to give some results related to the link between actuator and sensor structures and systems analysis. Abstract concepts of actuators and sensors are introduced and applied to the controllability and observability of systems described by partial differential equations. The developed results are illustrated by many examples of specific geometries and systems.
IT experts expect open distributed processing to become the predominant computing infrastructure in the late nineties. All computer-supported workplaces of large enterprises and organizations will then be networked and integrated into cross-regional and cross-sector business and information processes. The size and complexity of such applications, the local autonomy, distribution, and heterogeneity of the participating subsystems, and their asynchronous interaction, however, require new architectures, strategies, and tools for their technical management. In previous work we placed a production rule interpreter into the monitoring, decision, and control action loop to provide flexible, operational semantics for well-understood management policies. In this article we extend this work in two directions. First, we map the structure and dynamic behavior of policies into a graph representation. This semantic representation enables a systematic prediction of the effects of policy executions and allows for a better impact analysis in case of policy changes. Second, we introduce a declarative event definition mechanism. It supports causal and temporal correlation of individual events and serves to instantiate and adapt a predefined generic event handler to the specific needs of the actual management application. Such event handlers join in the interaction between monitoring agents and the policy interpreter. Through event correlation they can significantly reduce the number of events that trigger management actions and help filter out secondary events.
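The event-correlation step can be pictured with a small Python sketch in which an assumed causal rule (a link_down event explains subsequent route_lost events from the same source within a time window) suppresses secondary events before they reach the policy interpreter. The rule, event kinds, and window are hypothetical and are not the article's generic event handler.

    # Illustrative sketch (assumed correlation rule): drop "secondary" events that
    # follow a causally related primary event within a short time window, so that
    # fewer events trigger management actions.

    from dataclasses import dataclass

    @dataclass
    class Event:
        time: float        # seconds since the start of observation
        source: str        # managed object that emitted the event
        kind: str          # e.g. "link_down", "route_lost"

    # Hypothetical causal rule: a link_down on an object explains subsequent
    # route_lost events from the same object for WINDOW seconds.
    WINDOW = 5.0
    CAUSES = {"link_down": {"route_lost"}}

    def correlate(events):
        """Yield only primary events; drop secondary ones explained by an earlier cause."""
        recent = []                      # primary events seen so far
        for ev in sorted(events, key=lambda e: e.time):
            explained = any(
                ev.kind in CAUSES.get(p.kind, set())
                and ev.source == p.source
                and 0 <= ev.time - p.time <= WINDOW
                for p in recent
            )
            if not explained:
                recent.append(ev)
                yield ev

    if __name__ == "__main__":
        log = [Event(0.0, "gw1", "link_down"),
               Event(1.2, "gw1", "route_lost"),
               Event(2.5, "gw1", "route_lost"),
               Event(9.0, "gw2", "route_lost")]
        for ev in correlate(log):
            print(ev)   # only the link_down and the unrelated gw2 event remain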
ISBN (Print): 3540652604
Based on the specific characteristics of electronic commerce (E-Commerce) requirements for adequate software system support, this contribution gives an overview of the distributed systems technology which is (or will shortly be) available for open and heterogeneous electronic commerce applications. Starting from basic communication mechanisms, this includes (transactionally secure) remote procedure call and database access mechanisms, service trading and brokerage functions, as well as security aspects such as notary and non-repudiation functions. Further important elements of a system infrastructure for E-Commerce applications are common middleware infrastructures, componentware techniques, distributed and mobile agent technologies, etc. Increasingly important new topics in this area are currently workflow management support for compound and distributed E-Commerce services, as well as negotiation protocols to support both the settlement and the fulfillment of electronic contracts in E-Commerce applications. In addition to an overview of the state of the art of the respective technology, the paper also briefly presents some aspects of related projects conducted by the authors jointly with international partners (sponsored by EU/ACTS, EU/ESPRIT, DFG) in order to realize some of the important new functions of a systems infrastructure for open distributed E-Commerce applications.
ISBN (Print): 9783319324678; 9783319324661
This paper describes a methodology for working with distributed systems and achieving performance in Big Data through the Hadoop framework, the Python programming language, and the Apache Hive module. The efficiency of the proposed methodology is tested through a case study that addresses a real problem found in the supercomputing environment of the Center for Weather Forecasting and Climate Studies linked to the Brazilian Institute for Space Research (CPTEC / INPE), which provides society with work capable of predicting disasters and saving lives. Three experiments involving the problem were run on the Cray XT-6 supercomputer: (i) the first uses Python programming on a sequential, single-processor architecture; (ii) the second uses Python and the Hadoop framework on a parallel and distributed architecture; (iii) the third combines Hadoop and Hive on a parallel and distributed architecture. The main results of these experiments are compared and discussed, and topics beyond the scope of this research are presented as recommendations and suggestions for future work.
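For readers unfamiliar with the second experimental setup, the following sketch shows the kind of Hadoop Streaming job in Python that it implies: a mapper and a reducer run as separate scripts over line-oriented records, with Hadoop performing the sort between them. The record layout ("station,temperature"), names, and computed statistic are assumptions made for the example, not the paper's actual code.

    # Illustrative sketch (not the paper's code): the shape of a Hadoop Streaming
    # job in Python, computing a per-station maximum temperature from records of
    # the assumed form "station,temperature". On a cluster, map_records and
    # reduce_sorted would live in the scripts passed as -mapper and -reducer;
    # here a small in-memory demo plays the role of Hadoop's shuffle-and-sort.

    def map_records(lines):
        """Emit 'station<TAB>temperature' for each well-formed input record."""
        for line in lines:
            parts = line.strip().split(",")
            if len(parts) == 2:
                station, temp = parts
                try:
                    yield f"{station}\t{float(temp)}"
                except ValueError:
                    pass                      # skip malformed records

    def reduce_sorted(lines):
        """Input arrives sorted by key, so each station is folded in one pass."""
        current, best = None, None
        for line in lines:
            station, temp = line.split("\t")
            temp = float(temp)
            if station != current:
                if current is not None:
                    yield f"{current}\t{best}"
                current, best = station, temp
            else:
                best = max(best, temp)
        if current is not None:
            yield f"{current}\t{best}"

    if __name__ == "__main__":
        records = ["A001,23.5", "B002,19.0", "A001,27.1", "B002,18.4"]
        shuffled = sorted(map_records(records))   # stands in for Hadoop's shuffle
        for line in reduce_sorted(shuffled):
            print(line)                           # A001 27.1 and B002 19.0

A rough HiveQL counterpart of the reduce step, in the spirit of the third experiment, would be a query along the lines of SELECT station, MAX(temp) FROM obs GROUP BY station (table and column names assumed).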