ISBN: 0818608153 (print)
The author presents an algorithm for maintaining consistency and improving the performance of databases with replicated data in distributed real-time systems. The semantic information of read-only transactions is used for improved efficiency, and a multiversion technique is used to increase the degree of concurrency. Related issues, including the consistency of the states seen by transactions, version management, and recovery of replicated data in distributed systems, are discussed.
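The multiversion idea in the abstract above can be illustrated with a minimal sketch. It is not the author's algorithm: replication, real-time constraints, and recovery are left out, and the class and method names (`MultiVersionStore`, `begin_read_only`) are hypothetical. It only shows why keeping old versions lets a read-only transaction see a consistent state without blocking writers.

```python
import bisect


class MultiVersionStore:
    """Toy multiversion store (hypothetical): each key keeps every committed version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending by commit_ts
        self._clock = 0      # logical commit counter

    def write(self, updates):
        """Commit an update transaction; all its writes share one timestamp."""
        self._clock += 1
        for key, value in updates.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def begin_read_only(self):
        """A read-only transaction only needs to remember the current timestamp."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version of key committed at or before snapshot_ts."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        idx = bisect.bisect_right(timestamps, snapshot_ts)
        return versions[idx - 1][1] if idx else None


store = MultiVersionStore()
store.write({"x": 1, "y": 1})
snapshot = store.begin_read_only()   # read-only transaction starts here
store.write({"x": 2, "y": 2})        # a later writer is not blocked
result = (store.read("x", snapshot), store.read("y", snapshot))
```

The read-only transaction still sees `x` and `y` from the same commit, even though a newer commit has been applied in the meantime.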
ISBN: 9780769544502 (print)
Characterizing latent software faults is crucial to address dependability issues of current three-tier systems. A client should not have a misconception that a transaction succeeded, when in reality, it failed due to a silent error. We present a fault injection-based evaluation to characterize silent and non-silent software failures in a representative three-tier web service, one that mimics a day trading application widely used for benchmarking application servers. For failure characterization, we quantify the distribution of silent and non-silent failures, and recommend low-cost application-generic and application-specific consistency checks, which improve the reliability of the application. We inject three variants of null-call, where a callee returns null to the caller without executing business logic. Additionally, we inject three types of unchecked exceptions and analyze the reaction of our application. Our results show that 49% of error injections from null-calls result in silent failures, while 34% of unchecked exceptions result in silent failures. Our generic-consistency check can detect silent failures in null-calls with an accuracy as high as 100%. Non-silent failures with unchecked exceptions can be detected with an accuracy of 42% with our application-specific checks.
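A null-call injection, as described in the abstract above, can be sketched in a few lines. Everything here is an assumption made for illustration: the `buy` and `place_order` functions are hypothetical stand-ins for the trading application, and the null-return check is only one plausible form of a generic consistency check, not necessarily the one the authors used.

```python
def inject_null_call(func):
    """Fault injector: replace func with a stub that skips the logic and returns None."""
    def stub(*args, **kwargs):
        return None
    return stub


def buy(account, price, quantity):
    """Hypothetical business logic: debit the account and return an order record."""
    account["balance"] -= price * quantity
    return {"quantity": quantity, "cost": price * quantity}


def place_order(buy_func, account, price, quantity, check=False):
    """Caller that reports success unless a consistency check notices a problem."""
    order = buy_func(account, price, quantity)
    if check and order is None:
        return "error"  # the failure is reported to the client (non-silent)
    return "ok"         # without the check, a null return passes silently


account = {"balance": 100}
faulty_buy = inject_null_call(buy)
silent = place_order(faulty_buy, account, 10, 2)
detected = place_order(faulty_buy, account, 10, 2, check=True)
```

Without the check, the client is told the order succeeded although the balance never changed, which is the kind of silent failure the abstract describes.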
Advances in field reconfigurable technology have made possible the design and implementation of highly flexible parallel multi-processor-memory systems. System reliability is often an important measure of these systems...
MEADEP is a user-friendly dependability evaluation tool for measurement-based analysis of computing systems, including both hardware and software. Features of MEADEP are: a data processor for converting data in various formats (records with a number of fields stored in a commercial database format) to the MEADEP format, a statistical analysis module for graphical data presentation and parameter estimation, a graphical modeling interface for constructing reliability block and Markov diagrams, and a model solution module for availability/reliability calculation with graphical parametric analysis. Use of the tool on failure data from measurements can provide quantitative assessments of dependability for critical systems, while greatly reducing requirements for specialized skills in data processing, analysis, and modeling from the user. MEADEP has been applied to evaluate dependability for several air traffic control (ATC) systems, and results produced by MEADEP have provided valuable feedback to the program management of these critical systems.
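The kind of availability calculation that reliability block and Markov diagrams lead to can be illustrated with standard textbook formulas. This sketch does not reproduce MEADEP itself; the function names and the example rates are assumptions chosen for illustration.

```python
import math


def steady_state_availability(failure_rate, repair_rate):
    """Two-state Markov model (up/down): long-run fraction of time spent up."""
    return repair_rate / (failure_rate + repair_rate)


def series(*availabilities):
    """Series block diagram: the system is up only if every block is up."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """Parallel block diagram: the system is down only if every block is down."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Illustrative rates only: one failure per 1000 hours, ten-hour mean repair time.
unit = steady_state_availability(failure_rate=0.001, repair_rate=0.1)
redundant_pair = parallel(unit, unit)
chain = series(unit, unit)
```

With these rates a single unit is available about 99% of the time; duplicating it in parallel raises availability, while chaining two units in series lowers it.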
Management policies can be used to specify requirements about the desired behaviour of distributed systems. Violations of policies (faults) can then be detected, isolated, located, and corrected using a policy-driven fault management system. Other work in this area to date has focused on network-level faults. We believe that in a distributed system it is more appropriate to focus on faults at the application level. Furthermore, this work has been largely domain specific; a generic, structured approach to this problem is needed. Our work has focused on policy-driven fault management in distributed systems at the application level. In this paper, we define a generic architecture for policy-driven fault management, and present a prototype system based on this architecture. We also discuss experience to date using and experimenting with our prototype system.
Creating robust software requires not only careful specification and implementation, but also quantitative measurement. This paper describes Ballista exception handling testing of the High Level Architecture Run-Time Infrastructure (HLA RTI). The RTI is a standard distributed simulation system intended to provide completely robust exception handling, yet implementations have normalized robustness failure rates as high as 10%. Non-robust testing responses include exception handler crashes, segmentation violations, 'unknown' exceptions, and task hangs. Other issues include different robustness failure modes across ports to two operating systems, and mandatory client machine rebooting after a particular RTI failure. Testing the RTI led to scalable extensions of the Ballista architecture for handling exception-based error reporting models, testing object-oriented software structures (including call-backs, pass by reference, and constructors), and operating in a state-rich, distributed system environment. These results demonstrate that robustness testing can provide useful feedback to high-quality software development processes, and can be applied to domains well beyond the previous work on testing operating systems.
ISBN: 0818608757 (print)
The authors present an election protocol that does not assume an underlying ring structure and that tolerates failures, including lost messages and network partitioning, during the execution of the protocol itself. The major problem to be solved is that when nodes cannot communicate with one another or messages are lost, a conflict in resolving the election will often arise. In the authors' approach, the conflict is detected by the cohorts (noncandidate participants in the election). Related election protocols are discussed, and the system model is described together with assumptions about the communication subsystem. The protocol and the lost-message situations are then examined.
ISBN: 0818619465 (print)
This paper presents a software modeling environment for estimating the performance of distributed database systems. This tool supports a simulation language, HGPSS, which comprises various simulation primitives, contains a collection of network modules, and allows for the collection of statistics. This paper provides an overview of the HGPSS environment, emphasizing its applicability to the modeling of distributed databases.
ISBN: 0819455547 (print)
This paper describes how parallel retrieval is implemented in the content-based visual information retrieval framework VizIR. Generally, two major use cases for parallelisation exist in visual retrieval systems: distributed querying and simultaneous multi-user querying. Distributed querying includes parallel query execution and querying multiple databases. Content-based querying is a two-step process: transformation of feature space to distance space using distance measures and selection of result set elements from distance space. Parallel distance measurement is implemented by sharing example media and query parameters between querying threads. In VizIR, parallelisation is heavily based on caching strategies. Querying multiple distributed databases is already supported by standard relational database management systems. The most relevant issues here are error handling and minimisation of network bandwidth consumption. Moreover, we describe strategies for distributed similarity measurement and content-based indexing. Simultaneous multi-user querying raises problems such as caching of querying results and usage of relevance feedback and user preferences for query refinement. We propose a 'real' multi-user querying environment that allows users to interact in defining queries and browse through result sets simultaneously. The proposed approach opens an entirely new field of applications for visual information retrieval systems.
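The two-step querying process described above (feature space to distance space, then selection from distance space) can be sketched with worker threads that share the query example. This is not the VizIR API: the function names, the Euclidean distance measure, and the toy feature vectors are all assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Distance measure assumed for this sketch."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query(example, database, k, workers=4):
    """Return the names of the k media objects closest to the example."""
    names = list(database)
    # Step 1: map feature space to distance space; threads share the example.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        distances = list(pool.map(lambda n: euclidean(example, database[n]), names))
    # Step 2: select result set elements from distance space.
    ranked = sorted(zip(distances, names))
    return [name for _, name in ranked[:k]]


# Toy feature vectors keyed by hypothetical media names.
database = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (5.0, 5.0)}
top_two = query((0.9, 0.9), database, k=2)
```

In pure Python the threads mainly illustrate the structure; a real gain would depend on distance measures that release the interpreter lock or on moving the work to separate processes.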
ISBN: 9781479955848 (print)
In order to guarantee data reliability in distributed storage systems, erasure codes are widely used for their desirable storage properties. Nevertheless, these codes have one drawback: a large amount of data is needed to repair a failure, resulting in both high bandwidth consumption in the network and a heavy computational load on the replacement node. To address the repair bandwidth problem, researchers have derived the tradeoff between storage and repair traffic from network coding and proposed regenerating codes. However, the constructions of regenerating codes complicate the systems as well as the recovery calculation. Hence, this paper proposes a distributed repair method based on general erasure codes to mitigate the burden of both recovery computation and network traffic. We observe that distributing recovery computation among helpers can spread out the whole calculation procedure and accelerate repair in practical systems. Furthermore, by combining this technique with network topology, we introduce a novel repair tree to minimize repair traffic. The repair tree is also derived from network coding. The performance of the repair tree is preliminarily analyzed and evaluated, which suggests that the storage-bandwidth bound of regenerating codes can be broken under this model.
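The idea of distributing recovery computation among helpers along a repair tree can be sketched with the simplest erasure code, a single XOR parity block. This is a stand-in chosen for illustration, not the paper's construction: the node names, the tree shape, and the `repair` function are hypothetical. Each helper combines its own block with the partial results from its children and forwards a single block to its parent.

```python
from functools import reduce


def xor_blocks(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def repair(node, tree, blocks, traffic):
    """Return the XOR of all blocks in the subtree rooted at node.

    traffic records how many blocks cross each (child, parent) link.
    """
    partial = blocks[node]
    for child in tree.get(node, []):
        partial = xor_blocks(partial, repair(child, tree, blocks, traffic))
        traffic[(child, node)] = traffic.get((child, node), 0) + 1
    return partial


data = [b"abcd", b"efgh", b"ijkl"]
parity = reduce(xor_blocks, data)

# data[1] is lost; the replacement node "new" starts with an all-zero block.
blocks = {"new": bytes(4), "h0": data[0], "h2": data[2], "hp": parity}
# Assumed topology: "h0" aggregates for "h2" and "hp", then forwards to "new".
tree = {"new": ["h0"], "h0": ["h2", "hp"]}
traffic = {}
recovered = repair("new", tree, blocks, traffic)
```

In this topology the replacement node receives one block and performs one XOR, instead of downloading three blocks and combining them itself; how much total traffic is saved depends on how the tree maps onto the real network.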