In many distributed systems, tokens are fundamental tools for managing resources shared by processes. Monitoring tokens has therefore become a significant problem in developing distributed programs. This paper formulates the problem of monitoring tokens as the detection of a special class of global predicates, called summative global predicates. Several algorithms for detecting various summative global predicates are developed, and their time complexities are discussed.
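As a rough illustration of what such a predicate might look like, the sketch below assumes that a summative global predicate compares the sum of per-process token counts (plus tokens in transit) with a constant. That definition and the names `total_tokens` and `token_lost` are assumptions made for illustration; the abstract does not give the paper's definitions, and this is not one of its detection algorithms, which would have to reason over the consistent global states of a run rather than a single snapshot.

```python
from typing import Sequence


def total_tokens(local_counts: Sequence[int], in_transit: int = 0) -> int:
    """Sum the tokens held by each process plus tokens still in channels.

    Hypothetical helper: evaluates the summed quantity on ONE global state.
    """
    return sum(local_counts) + in_transit


def token_lost(local_counts: Sequence[int], in_transit: int, expected: int) -> bool:
    """Assumed summative predicate: the system holds fewer tokens than expected."""
    return total_tokens(local_counts, in_transit) < expected


# One snapshot of three processes with one token in transit.
snapshot = [1, 0, 2]
print(total_tokens(snapshot, in_transit=1))
print(token_lost(snapshot, in_transit=1, expected=4))
```

Evaluating a single snapshot is the easy part; the harder question a detection algorithm has to answer is whether some consistent global state of the execution satisfies the predicate.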
Testbeds for wireless IoT devices facilitate testing and validation of distributed target nodes. A testbed usually provides methods to control, observe, and log the execution of the software. However, most of the methods used for tracing the execution require code instrumentation and change essential properties of the observed system. Methods that are non-intrusive are typically not applicable in a distributed fashion due to a lack of time synchronization or necessary hardware/software support. In this article, we present a tracing system for validating time-critical software running on multiple distributed wireless devices that does not require code instrumentation, is non-intrusive and is designed to trace the distributed state of an entire network. For this purpose, we make use of the on-chip debug and trace hardware that is part of most modern microcontrollers. We introduce a testbed architecture as well as models and methods that accurately synchronize the timestamps of observations collected by distributed observers. In a case study, we demonstrate how the tracing system can be applied to observe the distributed state of a flooding-based low-power communication protocol for wireless sensor networks. The presented non-intrusive tracing system is implemented as a service of the publicly accessible open source FlockLab 2 testbed.
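To make the idea of timestamp synchronization concrete, the following sketch assumes a simple linear clock model in which an observer's local time relates to a global reference time by a constant rate and offset, fitted from two reference points. This model and the names `fit_clock` and `to_global` are illustrative assumptions, not the models or methods of the article or of FlockLab 2.

```python
from typing import Tuple

# A reference point pairs a local timestamp with the global time at which
# it was taken: (local_seconds, global_seconds).
RefPoint = Tuple[float, float]


def fit_clock(ref_a: RefPoint, ref_b: RefPoint) -> Tuple[float, float]:
    """Fit global = rate * local + offset through two reference points."""
    (l0, g0), (l1, g1) = ref_a, ref_b
    if l1 == l0:
        raise ValueError("reference points need distinct local timestamps")
    rate = (g1 - g0) / (l1 - l0)
    offset = g0 - rate * l0
    return rate, offset


def to_global(local_ts: float, rate: float, offset: float) -> float:
    """Map a locally recorded timestamp onto the global time base."""
    return rate * local_ts + offset


rate, offset = fit_clock((0.0, 10.0), (100.0, 110.0))
print(to_global(50.0, rate, offset))
```

Once every observer's trace is mapped onto the same time base, events from different nodes can be merged and ordered to reconstruct the distributed state.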
An integrated system design for debugging distributed programs written in concurrent high-level languages is described. A variety of user-interface, monitoring, and analysis tools integrated around a uniform process model are provided. Because the tools are language-based, the user does not have to deal with low-level implementation details of distribution and concurrency, and can instead focus on the logic of the program in terms of language-level objects and constructs. The tools provide facilities for experimentation with process scheduling, environment simulation, and nondeterministic selections. Presentation and analysis of the program's behavior are supported by history replay, state queries, and assertion checking. Assertions are formulated in linear-time temporal logic, a logic particularly well suited to specifying the behavior of distributed programs. The tools are separated into two sets. The language-specific tools interact directly with programs to monitor them and to support on-line experimentation. The language-independent tools support off-line presentation and analysis of the monitored information. This separation makes the system applicable to a wide range of programming languages. In addition, separating interactive experimentation from off-line analysis allows efficient use of both user time and machine resources. The implementation of a debugging facility for OCCAM is described.
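Off-line assertion checking over a recorded history can be pictured with the sketch below. It assumes the history is a finite list of state dictionaries and uses a finite-trace reading of the temporal operators "always" and "eventually"; the function names and the trace format are hypothetical and are not taken from the described system.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Pred = Callable[[State], bool]


def always(pred: Pred, trace: List[State]) -> bool:
    """G pred: pred holds in every recorded state."""
    return all(pred(s) for s in trace)


def eventually(pred: Pred, trace: List[State]) -> bool:
    """F pred: pred holds in at least one recorded state."""
    return any(pred(s) for s in trace)


def leads_to(p: Pred, q: Pred, trace: List[State]) -> bool:
    """G(p -> F q): every state satisfying p is followed (or matched) by one satisfying q."""
    return all(eventually(q, trace[i:]) for i, s in enumerate(trace) if p(s))


# Hypothetical history: a request is issued and later acknowledged.
history = [
    {"req": True, "ack": False},
    {"req": False, "ack": False},
    {"req": False, "ack": True},
]
print(leads_to(lambda s: s["req"], lambda s: s["ack"], history))
```

Because the check runs on a stored history rather than a live program, the same recorded run can be queried with many different assertions without re-executing it.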
ISBN: 9781595937636 (print)
Sensor network debugging is notoriously difficult because many bugs manifest themselves only when they encounter the real world -- exactly when the most powerful debugging tools can no longer be applied. There are currently two common approaches to source-level debugging in wireless sensor networks (WSNs): (i) simulation-based debugging [3] and (ii) wire-based debugging via logic analyzers or in-circuit emulators [1]. The former does not capture the true dynamics of a real deployment, while the latter does not scale with the number of nodes and the geographic size of the network. Thus, existing source-level debugging approaches for WSNs cannot be used in many real deployment environments.
ISBN: 9783540898559 (print)
We present a three-part approach for diagnosing bugs and performance problems in production distributed environments. First, we introduce a novel execution monitoring technique that dynamically injects a fragment of code, the agent, into an application process on demand. The agent inserts instrumentation ahead of the control flow within the process and propagates into other processes, following communication events, crossing host boundaries, and collecting a distributed function-level trace of the execution. Second, we present an algorithm that separates the trace into user-meaningful activities called flows. This step simplifies manual examination and enables automated analysis of the trace. Finally, we describe our automated root cause analysis technique that compares the flows to help the analyst locate an anomalous flow and identify a function in that flow that is a likely cause of the anomaly. We demonstrate the effectiveness of our techniques by diagnosing two complex problems in the Condor distributed scheduling system.