ISBN (print): 0769515827
Porting large parallel applications to new and varied distributed computing platforms is a challenging task from a software engineering perspective. The primary aim of this paper is to demonstrate how the development time to port very large applications to the Computational Grid can be significantly reduced. TOP-C and AMPIC are software packages that have each been applied successfully in their respective domains: parallel computing, and process creation/communication over the Computational Grid. We combined the two packages in 1 man-week, thereby leveraging several man-years of previous independent software development. As a real-world test case, the 1,000,000-line sequential Geant4 application was then deployed over the Computational Grid in 3 man-weeks using TOP-C/AMPIC. The cluster parallelization of Geant4 using TOP-C is now included in the Geant4 4.1 distribution, and the integration of TOP-C/AMPIC with the Globus protocols will additionally enable the use of fundamental Grid middleware services in the future. (C) 2003 Elsevier Science B.V. All rights reserved.
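The abstract does not show TOP-C's programming interface. As a rough illustration of why an application whose events are independent, such as Geant4, maps well onto a task-farming library, the following minimal Python sketch uses a master-worker pattern: the master generates task inputs (here, random seeds), workers process them, and the master collects the results. `simulate_event` and `run_master_worker` are hypothetical names, and a thread pool merely stands in for the processes or Grid nodes a real deployment would use; this is not the TOP-C or AMPIC API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_event(seed: int) -> float:
    """Hypothetical stand-in for one independent, expensive event simulation."""
    rng = random.Random(seed)  # per-task seed keeps each result reproducible
    return sum(rng.random() for _ in range(1000))


def run_master_worker(num_events: int, num_workers: int = 4) -> list:
    """Master generates task inputs, workers execute them, master collects results.

    The thread pool is only a local stand-in for remote worker processes.
    """
    task_inputs = range(num_events)  # master: one task input per event
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # workers: execute tasks; map() yields results in task order
        results = list(pool.map(simulate_event, task_inputs))
    return results  # master: results would be checked/aggregated here
```

Because each task depends only on its own seed, `run_master_worker(8)` returns the same list as calling `simulate_event` sequentially over `range(8)`; this independence is what keeps such a port largely mechanical.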
The ‘Distributed Control Lab’ [6] at the Hasso-Plattner-Institute, University of Potsdam, allows experimentation with a variety of physical equipment via the web (intranet and Internet), among them Lego Mindstorms robots and Foucault’s Pendulum. To conduct control experiments, students may write programs, which are validated, run on a simulator, and eventually downloaded to the actual control device. We use online replacement of software components (dynamic reconfiguration) as a safeguard mechanism to avoid damage to our hardware. Our research focuses on extending middleware concepts to embedded devices. The component-based architecture of the laboratory, together with the timing and safety constraints dictated by the experiments, makes our infrastructure an ideal candidate for studying system predictability, availability and security in the context of middleware-based dynamic control systems. In this paper we describe our extensible architecture for hosting physical control experiments and focus on Foucault’s Pendulum as a case study. For the Pendulum we have implemented a dynamic *** algorithm, which is able to replace erroneous user-supplied control programs with a verified safety controller at runtime. In addition, we discuss the design of custom-built controller hardware that allows us to meet the timing constraints of the Pendulum experiment with a commercial-off-the-shelf (COTS) operating system and middleware. The architectural characteristics of our hardware and software, as well as a performance evaluation of the *** process, are discussed in some detail.
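The abstract does not detail its dynamic algorithm. The following Python sketch only illustrates the general idea it names, replacing an erroneous user-supplied controller with a verified safety controller at runtime, under stated assumptions: a controller is a function from a (position, velocity) state to an actuator command, and "erroneous" means the program raises an exception, returns a non-finite or out-of-range command, or lets the state leave a safe envelope. `SafetySupervisor` and its bounds are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

State = Tuple[float, float]            # hypothetical (position, velocity) of the plant
Controller = Callable[[State], float]  # maps a state to an actuator command


@dataclass
class SafetySupervisor:
    """Runs an untrusted controller and permanently switches to a verified
    safety controller once the untrusted one misbehaves."""
    user: Controller
    safety: Controller
    max_position: float  # hypothetical safe-envelope bound on the state
    max_output: float    # hypothetical actuator limit
    using_safety: bool = False

    def step(self, state: State) -> float:
        position, _velocity = state
        if not self.using_safety:
            command: Optional[float]
            try:
                command = self.user(state)
            except Exception:
                command = None  # a crashing user program counts as erroneous
            faulty = (
                command is None
                or not math.isfinite(command)
                or abs(command) > self.max_output
                or abs(position) > self.max_position
            )
            if not faulty:
                return command
            self.using_safety = True  # runtime replacement of the component
        return self.safety(state)
```

Switching is one-way in this sketch, a deliberately conservative choice; a real laboratory might let a revalidated user program take over again.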
A process fails by omission if it "forgets" to send or receive messages. Considering omission failures is crucial for distributed systems, as such failures model both crash failures and incorrect behavior of process input/output buffers (such as buffer overflow). Designing protocols that cope not only with crash failures but also with omission failures is therefore a real challenge as soon as one is interested in obtaining real-time dependable distributed systems. While the consensus problem has received a lot of attention in the crash failure model and in the Byzantine failure model, it has received less attention in the omission failure model. This paper presents a simple uniform consensus protocol for synchronous systems made up of n processes, up to t of which can commit crash or omission failures. This protocol requires t+1 communication steps. Interestingly, since this bound is tight for crash failures and crash failures are a special case of omission failures, t+1 is also a tight lower bound for protocols solving uniform consensus in synchronous systems prone to process omission failures. The protocol assumes t
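The abstract does not give the omission-tolerant protocol itself. To make the t+1 round structure concrete, the following Python sketch simulates the classic FloodSet algorithm for the crash-only special case, which is a different and simpler algorithm than the paper's: each process broadcasts every value it knows for t+1 rounds and then decides the minimum. With at most t crashes, at least one of the t+1 rounds is crash-free; after it all surviving processes hold the same set, which no longer changes. The function name and the crash encoding are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def flood_set(inputs: List[int], t: int,
              crashes: Dict[int, Tuple[int, Set[int]]]) -> Dict[int, int]:
    """Simulate FloodSet consensus over t+1 synchronous rounds.

    crashes maps a process id to (round, receivers): in that round the
    process delivers its message only to `receivers`, then stops forever.
    Returns the decision (minimum known value) of every surviving process.
    """
    if len(crashes) > t:
        raise ValueError("at most t processes may crash")
    n = len(inputs)
    known = [{v} for v in inputs]   # W_i starts with process i's own input
    alive = set(range(n))
    for rnd in range(1, t + 2):     # rounds 1 .. t+1
        inbox = [set() for _ in range(n)]
        for p in alive:             # every live process broadcasts its set
            crash = crashes.get(p)
            receivers = crash[1] if crash and crash[0] == rnd else range(n)
            for q in receivers:
                inbox[q] |= known[p]
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for q in alive:             # merge only after all sends of the round
            known[q] |= inbox[q]
    return {p: min(known[p]) for p in alive}
```

For instance, with inputs `[1, 5, 6, 7]` and t = 2, if process 0 crashes in round 1 after reaching only process 1, and process 1 crashes in round 2 after reaching only process 2, the value 1 reaches process 3 only in round 3; both survivors then decide 1.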
Most of today's distributed computing systems in the field do not support the migration of execution entities among computing nodes at runtime. The relatively static association between units of processing and computing nodes makes it difficult to implement fault-tolerant behavior or load-balancing schemes. Code migration may provide a solution to these problems. It can be defined as the movement of process, object or component instances from one computing node to another during system runtime in a distributed environment. In this paper we describe the integration of a migration facility into the .NET framework with the help of aspect-oriented programming (AOP). AOP is interesting because it addresses non-functional system properties at the middleware level, without the need to manipulate lower system layers such as the operating system itself. We have implemented two proof-of-concept applications: a migrating Web server and a migrating file version checker. The paper contains an experimental evaluation of the performance impact of object migration in the context of these two applications.
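The abstract describes the .NET/AOP integration only at a high level. The Python sketch below illustrates the underlying idea under simplifying assumptions: nodes are in-process dictionaries, migration serializes an object's state with `pickle` and re-creates it on the target node, and a proxy that intercepts attribute access plays the role of the aspect, so callers keep using the same handle. `Node`, `MigratableProxy` and `Counter` are hypothetical names; this is not the authors' .NET implementation.

```python
import pickle
from typing import Dict


class Node:
    """Hypothetical computing node hosting objects by id."""
    def __init__(self, name: str):
        self.name = name
        self.objects: Dict[str, object] = {}


class MigratableProxy:
    """Client-side handle that intercepts attribute access (an aspect-like
    wrapper) and forwards it to whichever node currently hosts the object."""
    def __init__(self, obj_id: str, node: Node, obj: object):
        self._obj_id = obj_id
        self._node = node
        node.objects[obj_id] = obj

    def migrate(self, target: Node) -> None:
        obj = self._node.objects.pop(self._obj_id)
        blob = pickle.dumps(obj)  # serialize instance state for transfer
        target.objects[self._obj_id] = pickle.loads(blob)  # re-instantiate on target
        self._node = target

    def __getattr__(self, name: str):
        # Invoked only for names not defined on the proxy itself.
        return getattr(self._node.objects[self._obj_id], name)


class Counter:
    """Example stateful object whose state must survive migration."""
    def __init__(self) -> None:
        self.hits = 0

    def hit(self) -> int:
        self.hits += 1
        return self.hits
```

A caller holding the proxy calls `hit()` before and after `migrate()` without noticing the move, and the counter continues from its migrated state.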
Over the past 10 years, Thales Naval Nederland (TNN) has successfully applied a purely data-centric architecture called SPLICE in its naval Combat Management systems. This fielded architecture provides the essential non-functional properties demanded in these mission-critical environments, such as (real-time) performance, scalability, fault tolerance and evolvability. Thales recently contributed this knowledge and experience to a joint submission regarding the OMG's Data Distribution Service (DDS) for real-time systems. The SPLICE architecture is characterized by autonomous applications with minimal dependencies, in which function and interaction are clearly separated and SPLICE agents act as real-time information brokers. SPLICE thus offers a normalized environment that is designed once for all applications and which delivers 'the right information at the right place at the right time'.
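The abstract characterizes SPLICE only at the architectural level. As an illustration of the data-centric idea, in which applications exchange data through a broker keyed by topic and instance rather than by addressing each other, here is a minimal single-process Python sketch. `DataSpace` and its methods are hypothetical; they are not the SPLICE or OMG DDS API, and real-time delivery, distribution and quality-of-service policies are omitted.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List

Listener = Callable[[Hashable, Any], None]


class DataSpace:
    """Hypothetical broker: applications share data by topic and key
    instead of addressing each other directly."""

    def __init__(self) -> None:
        self._latest: Dict[str, Dict[Hashable, Any]] = defaultdict(dict)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def publish(self, topic: str, key: Hashable, sample: Any) -> None:
        self._latest[topic][key] = sample  # keep only the newest sample per key
        for listener in self._listeners[topic]:
            listener(key, sample)

    def subscribe(self, topic: str, listener: Listener) -> None:
        self._listeners[topic].append(listener)
        # A late joiner immediately receives the current state of the topic.
        for key, sample in self._latest[topic].items():
            listener(key, sample)

    def read(self, topic: str) -> Dict[Hashable, Any]:
        return dict(self._latest[topic])
```

Publishers and subscribers reference only the topic, never each other, which mirrors the separation of function and interaction described above.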
In applications where data needs to be shared among distributed components, it is desirable to have overall data consistency at all times. This is crucial for safety-critical systems, where inconsistency can lead to failures. Overall continuous data consistency is, however, rarely possible to achieve. For distributed systems, a relaxed view based on the temporal validity of data can be proven sufficient. If components in a distributed computer-based system have different temporal validity constraints for the same data, then as long as these constraints are satisfied, overall system inconsistency is not harmful. We propose the use of a formal analysis technique for guaranteeing the temporal validity of shared data. The approach is based on a real-time temporal logic of knowledge suitable for verification through model checking. It allows us to check that the shared data in the system is consistent "enough" and cannot be a source of failure. We illustrate the approach with an open, dynamic, real-time distributed computer-based system.
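The paper verifies temporal validity by model checking a temporal logic of knowledge, which is beyond a short example. The property being checked can, however, be illustrated with a simple runtime test, assuming each shared sample carries the time it was observed and each component c has its own validity interval d_c: the sample is valid for c at time `now` while `now - timestamp <= d_c`. The names and numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sample:
    value: float
    timestamp: float  # time (s) at which the value was observed at its source


def is_temporally_valid(sample: Sample, now: float, validity: float) -> bool:
    """Valid while the sample's age does not exceed the validity interval."""
    return now - sample.timestamp <= validity


def stale_consumers(sample: Sample, now: float,
                    validity_by_component: Dict[str, float]) -> List[str]:
    """Components for which this shared sample is already too old to use."""
    return sorted(name for name, validity in validity_by_component.items()
                  if not is_temporally_valid(sample, now, validity))
```

Here the same sample can already be stale for a component with a 0.2 s interval while remaining acceptable for one with a 1.0 s interval; a model checker establishes that each component's own constraint holds over all executions, not just at a single instant.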
We present a framework that will enable scalable analysis and design of graceful degradation in distributed embedded systems. We define graceful degradation in terms of utility. A system that degrades gracefully suffers a proportional loss of system utility as individual software and hardware components fail. However, explicitly designing a system to degrade gracefully, i.e., to handle all possible combinations of component failures, becomes impractical for systems with more than a few components. We avoid this exponential complexity of component combinations by exploiting the structure of the system architecture to partition components into subsystems. We view each subsystem as a configuration of components that changes when components are removed or added. Thus, a subsystem's utility changes when components fail or are repaired. We then view the system as a composition of subsystems that each contribute to overall system utility. We demonstrate the scalability of our framework by applying it to an example automobile navigation system. Using this framework, we improve system dependability by identifying architectural properties that enhance a system's ability to degrade gracefully.
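The scalability argument can be made concrete with a small sketch, assuming for illustration only that overall utility is the sum of subsystem utilities: a system of n components partitioned into subsystems of sizes k_1, ..., k_m then needs 2^(k_1) + ... + 2^(k_m) utility evaluations instead of 2^n. The component names and utility values below are hypothetical, loosely inspired by the navigation example, and are not taken from the paper.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Config = FrozenSet[str]
Utility = Callable[[Config], float]
Tables = List[Tuple[List[str], Dict[Config, float]]]


def utility_table(components: List[str], utility: Utility) -> Dict[Config, float]:
    """Evaluate one subsystem's utility for all 2^k subsets of working components."""
    table: Dict[Config, float] = {}
    for r in range(len(components) + 1):
        for working in combinations(components, r):
            table[frozenset(working)] = utility(frozenset(working))
    return table


def build_tables(subsystems: List[Tuple[List[str], Utility]]) -> Tables:
    return [(comps, utility_table(comps, u)) for comps, u in subsystems]


def system_utility(working: Set[str], tables: Tables) -> float:
    """Overall utility, assuming it is the sum of subsystem utilities."""
    return sum(table[frozenset(c for c in comps if c in working)]
               for comps, table in tables)


def navigation(cfg: Config) -> float:
    # Hypothetical: route guidance needs GPS and map; GPS alone gives a position fix.
    return 1.0 if {"gps", "map"} <= cfg else (0.4 if "gps" in cfg else 0.0)


def user_interface(cfg: Config) -> float:
    # Hypothetical: a display is preferred, voice output is a degraded fallback.
    return 0.5 if "display" in cfg else (0.2 if "speaker" in cfg else 0.0)
```

With two subsystems of two components each, `build_tables` evaluates 4 + 4 = 8 configurations rather than the 16 of the unpartitioned system, and the gap grows exponentially with system size.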
In the past few years the massive deployment of real-time multimedia services has motivated the research community to investigate new quality of service (QoS) mechanisms to overcome the limitations of IP networks. Since assured levels of service must be provided, these mechanisms should interact flexibly with network performance management systems. When such integration is achieved, it is possible to trigger actions effectively to prevent QoS failures and simultaneously to balance network load. Mechanisms for rerouting traffic flows of QoS critical applications may be employed to support performance management systems in satisfying the application requirements. In addition, proactive actions could be taken before applications are affected by QoS failures. To meet these goals, a proactive network management and rerouting framework is introduced. The proposed framework is based on active technology and aims at providing the means needed for establishing new routes with sufficient resources for traffic flows of QoS critical distributed applications. A system prototype using the proposed framework was implemented and test results show that its deployment is feasible considering the hardware processing capability available today.
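The abstract does not specify how alternative routes are chosen. A minimal Python sketch of one plausible scheme, assumed here for illustration: a proactive trigger fires when any link on a flow's current path has less residual bandwidth than a safety threshold (before the flow actually suffers a QoS failure), and a breadth-first search then finds the fewest-hop path whose links can all carry the flow's demand. The graph encoding, threshold and bandwidth figures are hypothetical, and the active network technology the framework is built on is not modeled.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: residual bandwidth in Mb/s}


def needs_reroute(path: List[str], graph: Graph, threshold: float) -> bool:
    """Proactive trigger: some link on the path is close to saturation."""
    return any(graph[a][b] < threshold for a, b in zip(path, path[1:]))


def feasible_path(graph: Graph, src: str, dst: str, demand: float) -> Optional[List[str]]:
    """Fewest-hop path whose links all have residual bandwidth >= demand (BFS)."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessor links back to src
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nbr, residual in graph.get(node, {}).items():
            if residual >= demand and nbr not in prev:
                prev[nbr] = node
                queue.append(nbr)
    return None
```

Restricting the search to links with enough residual capacity keeps the sketch simple; a production scheme would also account for the bandwidth the rerouted flow itself consumes and avoid oscillating between paths.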
ISBN (print): 3540402810
The proceedings contain 27 papers. The special focus in this conference is on Modeling, Resource Allocation, Admission Control, Multimedia and Incentives. The topics include: modelling, measurements, and admission control; statistical characterization for per-hop QoS; performance analysis of server sharing collectives for content distribution; an approximation of the end-to-end delay distribution; price-based resource allocation in wireless ad hoc networks; on achieving fairness in the joint allocation of processing and bandwidth resources; distributed admission control for heterogeneous multicast with bandwidth guarantees; subjective impression of variations in layer encoded videos; a moving average predictor for playout delay control in VoIP; a game-based control-theoretic approach to peer-to-peer incentive engineering; improving dependability of real-time communication with preplanned backup routes and spare resource pool; fault tolerance in networks with an advance reservation service; routing and grooming in two-tier survivable optical mesh networks; fast network re-optimization schemes for MPLS and optical networks; hotspot mitigation protocol for mobile ad hoc networks; failure insensitive routing for ensuring service availability; network availability based service differentiation; replica placement for widely distributed systems; using latency quantiles to engineer QoS guarantees for web services; dynamic resource allocation for shared data centers using online measurements; providing deterministic end-to-end fairness guarantees in core-stateless networks; per-domain packet scale rate guarantee for expedited forwarding; online response time optimization of Apache web server; and a practical learning-based approach for dynamic storage bandwidth allocation.