作者:
Wang, YMAT&T Bell Labs.
Murray Hill NJ USA Abstract Authors References Cited By Keywords Metrics Similar Download Citation Email Print Request Permissions
In this paper, we consider the problem of constructing consistent global checkpoints that contain a given set of checkpoints. We address three important issues related to this problem. First, we define the maximum and...
详细信息
In this paper, we consider the problem of constructing consistent global checkpoints that contain a given set of checkpoints. We address three important issues related to this problem. First, we define the maximum and minimum consistent global checkpoints containing a set S, and give algorithms to construct them. These algorithms are based on reachability analysis on a rollback-dependency graph. Second, we introduce a concept called ''rollback-dependency trackability'' that enables this analysis to be performed efficiently for a certain class of checkpoint and communication models. We define the least stringent of these models (''FDAS''), and put it in context with other models defined in the literature. Significant in this is a way to use FDAS to provide efficient rollback recovery for applications that do not satisfy perfect piecewise determinism. Finally, we describe several applications of the theorems and algorithms derived in this paper to demonstrate the capability of our approach to unify, generalize, and extend many previous works.
distributed applications are hard to debug because timing-dependent network communication is a source of non-deterministic behavior. Current approaches to debug non deterministic failures include post-mortem debugging...
详细信息
distributed applications are hard to debug because timing-dependent network communication is a source of non-deterministic behavior. Current approaches to debug non deterministic failures include post-mortem debugging as well as record and replay. However, the first impairs system performance to gather data, whereas the latter requires developers to understand the timing-dependent communication at a lower level of abstraction than they develop at. Furthermore, both approaches require intrusive core library modifications to gather data from live systems. In this paper, we present the Peek-At-Talk debugger for investigating non-deterministic failures with low overhead in a systematic, top-down method, with a particular focus on tool-building issues in the following areas: First, we show how our debugging framework Path Tools guides developers from failures to their root causes and gathers run-time data with low overhead. Second, we present Peek-At-Talk, an extension to our Path Tools framework to record non-deterministic communication and refine behavioral data that connects source code with network events. Finally, we scope changes to the core library to record network communication without impacting other network applications. (C) 2015 Elsevier B.V. All rights reserved.
In a distributed system, detecting whether a given logical predicate is true on the global states is fundamental for testing and debugging the program. Detecting predicates by examining all global states is intractabl...
详细信息
In a distributed system, detecting whether a given logical predicate is true on the global states is fundamental for testing and debugging the program. Detecting predicates by examining all global states is intractable due to the combinatorial nature of the problem. This work designs an efficient online algorithm that identifies the consistent and useless states each time a new state is reported. This paper formulates the optimality of detecting algorithms in terms of pseudo states, which are employed to represent unknown states to the monitor process. Based on this technique, memory space of the debugger can be minimized by removing the useless states without affecting the debugging results. While minimizing memory space, the proposed algorithm requires only O(p (2) M) time in total, where p is the number of processes, and M is the number of reported states.
Development of multi-agent systems is a complex process during which use of adequate development tools is essential. This paper focuses on debugging of multi-agent systems. A survey of existing multi-agent systems dev...
详细信息
Development of multi-agent systems is a complex process during which use of adequate development tools is essential. This paper focuses on debugging of multi-agent systems. A survey of existing multi-agent systems development tools is presented first, Subsequently a new tool is introduced, based on the innovative approach of the developer's conceptual models, These are multiple complementary abstractions of the multi-agent system concentrating on specific aspects of the system. In relation to these abstractions, several system views are defined that allow a graphical representation of the selected aspects of the system state and its dynamic behaviour, It is argued that these views support effectively understanding of system behaviour and thus efficient debugging. Examples of an implementation of this approach in the frame of the Esprit Project ARCHON are also discussed.
To support incremental replay of message-passing applications. processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the current slate of the execution on pas...
详细信息
To support incremental replay of message-passing applications. processes must periodically checkpoint and the content of some messages must be logged, to break dependencies of the current slate of the execution on past events. This paper shows that known adaptive logging algorithms are likely to introduce deadlocks in replay, and we introduce a new algorithm that: (i) prevents deadlocks in replay and (ii) enables the tuning of its behavior to meet specific user needs. (C) 2001 Academic Press.
The predicate control problem involves synchronizing a distributed computation to maintain a given global predicate. In contrast with many popular distributed synchronization problems such as mutual exclusion, readers...
详细信息
The predicate control problem involves synchronizing a distributed computation to maintain a given global predicate. In contrast with many popular distributed synchronization problems such as mutual exclusion, readers writers, and dining philosophers, predicate control assumes a look-ahead, so that the computation is an off-line rather than an on-line input. Predicate control is targeted towards applications such as rollback recovery, debugging, and optimistic computing, in which such computation look-ahead is natural. We define predicate control formally and show that, in its full generality, the problem is NP-complete. We find efficient solutions for some important classes of predicates including "disjunctive predicates", "mutual exclusion predicates", and "readers writers predicates". For each class of predicates, we determine the necessary and sufficient conditions for solving predicate control and describe an efficient algorithm for determining a synchronization strategy. In the case of "independent mutual exclusion predicates", we determine that predicate control is NP-complete and describe an efficient algorithm that finds a solution under certain constraints. (C) 2003 Elsevier Inc. All rights reserved.
It has been shown that global predicate detection in a distributed computation is an NP-complete problem in general. However, efficient predicate detection algorithms exist for some subclasses of predicates, such as s...
详细信息
It has been shown that global predicate detection in a distributed computation is an NP-complete problem in general. However, efficient predicate detection algorithms exist for some subclasses of predicates, such as stable predicates, observer-independent predicates, conjunctions of local predicates, channel predicates, etc. We show here that the problem of deciding whether a given predicate is a member of any of these tractable subclasses is NP-hard in general. We also explore the tractability of linear and regular predicates. In particular, we show that, unless RP = NP, there is no polynomial-time algorithm to detect possibly: B for linear and regular predicates B. (c) 2005 Elsevier B.V. All rights reserved.
We describe two problems and their optimal solutions for partially ordered sets. We first describe an optimal algorithm for computing the largest anti-chain of a partially ordered set given its decomposition into its ...
详细信息
We describe two problems and their optimal solutions for partially ordered sets. We first describe an optimal algorithm for computing the largest anti-chain of a partially ordered set given its decomposition into its chains. Our algorithm requires O(n2m) comparisons where n is the number of chains and m is the maximum number of elements in any chain. We also give an adversary argument to prove that this is a lower bound. Our second problem requires us to find if the given poset is a total order. Our optimal algorithm requires 0(mn log n) comparisons. These algorithms have applications in distributed debugging and recovery in distributed systems.
Recently published algorithms for matching concurrent sets of events have the problem of unbounded message queue growth if events arrive in an undesirable order. This paper presents some algorithms that mitigate this ...
详细信息
Recently published algorithms for matching concurrent sets of events have the problem of unbounded message queue growth if events arrive in an undesirable order. This paper presents some algorithms that mitigate this problem by examining events waiting to be processed and removing those that cannot be part of a concurrent set.
This paper proposes a framework for detecting global state predicates in systems of processes with approximately-synchronized real-time clocks. Timestamps from these clocks are used to define two orderings on events: ...
详细信息
This paper proposes a framework for detecting global state predicates in systems of processes with approximately-synchronized real-time clocks. Timestamps from these clocks are used to define two orderings on events: "definitely occurred before" and "possibly occurred before". These orderings lead naturally to definitions of 3 distinct detection modalities, i.e., 3 meanings of "predicate Phi held during a computation", namely: Poss (db) Phi ("Phi, possibly held"), Def (db) Phi, ("Phi definitely held"), and Inst Phi ("Phi definitely held in a specific global state"). This paper defines these modalities and gives efficient algorithms for detecting them. The algorithms an: based on algorithms of Garg and Waldecker, Alagar and Venkatesan, Cooper and Marzullo, and Fromentin and Raynal. Complexity analysis shows that under reasonable assumptions, these real-time-clock-based detection algorithms are less expensive than detection algorithms based on Lamport's happened-before ordering. Sample applications are given to illustrate the benefits of this approach.
暂无评论