This paper describes a nonblocking checkpointing mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency in the execution of state saving and other simulation-specific operations (e.g., event list update, event execution), with the aim of removing the cost of recording state information from the completion time of the parallel simulation application. We present an implementation of a C library supporting nonblocking checkpointing on a Myrinet-based cluster, which demonstrates the practical viability of this checkpointing mode on standard off-the-shelf hardware. Through an empirical study on classical parameterized synthetic benchmarks, we show that, except for applications with minimal state granularity, nonblocking checkpointing improves the speed of parallel execution compared to commonly adopted, optimized checkpointing methods based on the classical blocking mode. A performance study of a Personal Communication System (PCS) simulation is additionally reported to point out the benefits of nonblocking checkpointing for a real-world application.
Fault-tolerant, real-time communication in distributed systems is very important yet difficult to achieve. Traditional protocols like TCP/IP achieve reliable communication through acknowledgment and retransmission schemes, which buy reliability at the cost of performance. In this paper, we discuss how both the timeliness and the fault tolerance of communication can be achieved by using the concept of a real-time channel [1] and exploiting the inherent spatial redundancy of a given network topology. Specifically, we show how isolated-failure-immune real-time channels can be established in wrapped hexagonal mesh networks, thus ensuring timely delivery of messages in the presence of network component failures as long as the failures are isolated. This kind of fault tolerance cannot be achieved with other commonly known topologies such as rings, rectangular meshes, and hypercubes. The proposed approach is to be implemented in an experimental distributed real-time system, called HARTS [2], whose construction is underway.
Different distributed real-time systems (DRS) must handle aperiodic and periodic events under diverse sets of requirements. While existing middleware such as real-time CORBA has shown promise as a platform for distributed systems with time constraints, it lacks the flexible configuration mechanisms needed to manage end-to-end timing easily for a wide range of different DRS with both aperiodic and periodic events. The primary contribution of this work is the design, implementation, and performance evaluation of the first configurable component middleware services for admission control and load balancing of aperiodic and periodic event handling in DRS. Empirical results demonstrate the need for, and the effectiveness of, our configurable component middleware approach in supporting different applications with aperiodic and periodic events, and in providing a flexible software platform for DRS with end-to-end timing constraints.
This paper illustrates that several of the new features specified in the revised Ada standard facilitate programming real-time distributed/parallel applications. In particular, the Ada Distributed Systems Annex suppor...
This paper outlines the key areas of research in distributed real-time systems that are being investigated within the Spring Project at the University of Massachusetts. This includes reflective, multiprocessor operati...
ISBN: 0769515762 (print)
A real-time fault-tolerant multicast protocol is necessary for obtaining high performance in distributed real-time computing systems. The purpose of this paper is to show the efficiency of the RFRM (Release-time based Fault-tolerant Real-time Multicast) protocol, which is based on the idea of attaching an official release time to each multicast message. To this end, a real-time simulation based on the TMO structuring scheme is conducted to evaluate the proposed approach. We experiment with a real-time multicast model that does not rely on acknowledgment messages, reducing message traffic on the network by employing a fault detection mechanism instead. Simulation results indicate the efficiency of the proposed real-time multicast protocol.
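The release-time idea in the abstract above can be illustrated with a small sketch. It is not the RFRM protocol itself: the names `Message`, `Receiver`, `multicast`, and the `delay_bound` parameter are hypothetical, the clock is simulated, and the fault detection mechanism is not modeled. The sketch only assumes that each message carries a release time and that every receiver holds a message until its local clock reaches that time, so all receivers deliver in the same order regardless of arrival order.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Ordering uses release_time first, then seq as a tie-breaker.
    release_time: float
    seq: int
    payload: str = field(compare=False)


class Receiver:
    """Buffers messages and delivers them only once their release time is reached."""

    def __init__(self):
        self._pending = []   # min-heap ordered by (release_time, seq)
        self.delivered = []  # payloads in delivery order

    def receive(self, msg):
        heapq.heappush(self._pending, msg)

    def advance_clock(self, now):
        # Deliver every buffered message whose release time has passed.
        while self._pending and self._pending[0].release_time <= now:
            self.delivered.append(heapq.heappop(self._pending).payload)


def multicast(receivers, payload, send_time, delay_bound, seq):
    # Assumption: release time = send time + an agreed upper bound on delivery delay.
    msg = Message(send_time + delay_bound, seq, payload)
    for r in receivers:
        r.receive(msg)
    return msg
```

Because delivery is driven by the release time stamped on the message rather than by arrival time, two receivers that see the same messages in different network orders still hand them to the application in the same sequence.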
The emerging discipline of responsive systems demands fault-tolerant and real-time performance in uniprocessor, parallel, and distributed computing environments. A new measure of responsiveness is proposed, followed by an introduction to a model for responsive computing. The model, called CONCORDS (CONsensus/COmputation for Responsive Distributed Systems), is based on the integration of various forms of consensus and computation (progress or recovery). The consensus tasks include clock synchronization, diagnosis, check-pointing scheduling and resource allocation.
ISBN: 0818675705 (print)
An object-oriented application software development approach with fault tolerance for distributed real-time systems is presented. This approach is based on the Parallel Object-Oriented Functional computation model (PROOF), extended with real-time and fault tolerance features. The real-time constraints are satisfied by using multi-version methods, which are based on virtual method definitions with different underlying implementations, and by encapsulating time constraints in objects. Fault tolerance in the application software layer is supported by monitoring, checkpoints, and recovery.
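One way to picture a multi-version method is sketched below. This is not the PROOF model's actual mechanism: `Version`, `select_version`, the `worst_case_time` field, and the two example implementations are all hypothetical. The sketch assumes each implementation comes with a known worst-case execution time, that a costlier version gives a better result, and that the caller picks the costliest version that still fits the time constraint.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Version:
    name: str
    worst_case_time: float  # assumed known execution-time bound, in milliseconds
    run: Callable[[List[float]], float]


def select_version(versions, time_budget):
    """Return the costliest version whose worst-case time fits the budget."""
    feasible = [v for v in versions if v.worst_case_time <= time_budget]
    if not feasible:
        raise RuntimeError("no version meets the time constraint")
    # Assumption: a higher worst-case time means a more precise implementation.
    return max(feasible, key=lambda v: v.worst_case_time)


# Two hypothetical implementations of the same logical method (a mean).
VERSIONS = [
    Version("exact_mean", 10.0, lambda xs: sum(xs) / len(xs)),
    Version("sampled_mean", 2.0, lambda xs: sum(xs[::4]) / len(xs[::4])),
]
```

For example, `select_version(VERSIONS, 5.0)` picks the sampled version because the exact one does not fit in 5 ms, while a 20 ms budget selects the exact version.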
With the addition of special needs annexes to Ada 95, the traditional reliance on Ada Compiler Validation Capability style testing may not suffice. This paper explores some alternatives for testing a portion of the Di...
This paper presents an approach to the parallel implementation of wavelet transforms in a distributed computing environment. To achieve robustness and efficiency, we propose a parallel algorithm for the wavelet transform that can be implemented in SIMD, MIMD, and pipeline architectures on the configured system. Our experimental results show that the proposed algorithm speeds up wavelet-based image processing tasks on a network of workstation clusters. © 2000 Academic Press.