The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. One of the challenges in designing and deploying these systems in a production setting is the need to account for failures, whether in the hardware or in the software. Earlier work has shown that conventional runtime fault-tolerance techniques such as periodic checkpointing are not effective for these emerging systems. Instead, the ability to predict failures can help develop more effective checkpointing strategies. Failure prediction has long been regarded as a challenging research problem, mainly due to the lack of realistic failure data from actual production systems. In this study, we have collected RAS event logs from BlueGene/L over a period of more than 100 days. We have investigated the characteristics of fatal failure events, as well as the correlation between fatal and non-fatal events. Based on these observations, we have developed three simple yet effective failure prediction methods, which can predict around 80% of the memory and network failures and 47% of the application I/O failures.
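The abstract does not spell out the three prediction methods. As a rough illustration of the fatal/non-fatal correlation it mentions, the sketch below measures how many fatal events are preceded by at least one non-fatal event within a look-back window. The `Event` record, its fields, and the window length are hypothetical assumptions, not the BlueGene/L RAS log schema or the authors' predictors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified RAS record (real logs carry many more fields).
struct Event {
    long time;   // seconds since the start of the log
    bool fatal;  // true for fatal events, false for non-fatal ones
};

// Fraction of fatal events that have a non-fatal event at most `window`
// seconds earlier. Assumes `log` is sorted by time.
double precursorRate(const std::vector<Event>& log, long window) {
    assert(window >= 0);
    std::size_t fatalCount = 0, withPrecursor = 0;
    bool seenNonFatal = false;
    long lastNonFatal = 0;  // time of the most recent non-fatal event
    for (const Event& e : log) {
        if (!e.fatal) {
            seenNonFatal = true;
            lastNonFatal = e.time;
        } else {
            ++fatalCount;
            if (seenNonFatal && e.time - lastNonFatal <= window) {
                ++withPrecursor;
            }
        }
    }
    return fatalCount == 0 ? 0.0
                           : static_cast<double>(withPrecursor) / fatalCount;
}
```

A naive predictor built on this idea would raise an alarm (for example, trigger a checkpoint) whenever a non-fatal event appears; `precursorRate` then estimates the share of fatal events such a rule would have warned about for a given window.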
This paper addresses the need for a semantic component in the grid environment to discover and describe grid resources semantically. We propose a semantic grid architecture that introduces a knowledge layer on top of the Gridbus broker architecture, thereby enabling the broker to discover resources semantically. The semantic component in the knowledge layer describes grid resources semantically with the help of an ontology template. The ontology template has been created using the Protege-OWL editor for different types of computing resources in the grid environment. The Globus Toolkit's MDS is used to gather grid resource information, and Protege-OWL libraries are used to dynamically create a knowledge base of grid resources. The Algernon inference engine is used to interact with the knowledge base to discover suitable resources.
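What ontology-based discovery adds over plain attribute matching is inference: a query for a general concept also finds resources described with a more specific one. The sketch below hand-codes a tiny class hierarchy and matches resources by subclass inference plus a numeric constraint. It does not use Protege-OWL or Algernon; the class names, the `Resource` fields, and the `discover` function are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical ontology fragment: each class maps to its direct superclass.
// Assumed acyclic; root classes simply have no entry.
using Ontology = std::map<std::string, std::string>;

// True if `cls` equals `ancestor` or is (transitively) a subclass of it.
bool isSubclassOf(const Ontology& onto, std::string cls, const std::string& ancestor) {
    while (true) {
        if (cls == ancestor) return true;
        auto it = onto.find(cls);
        if (it == onto.end()) return false;  // reached a root class
        cls = it->second;
    }
}

// Hypothetical resource description, e.g. filled from an information service.
struct Resource {
    std::string name;
    std::string type;  // an ontology class
    int cpus;
};

// Names of resources whose type is subsumed by `wantedType` and that
// offer at least `minCpus` processors.
std::vector<std::string> discover(const Ontology& onto,
                                  const std::vector<Resource>& pool,
                                  const std::string& wantedType, int minCpus) {
    std::vector<std::string> hits;
    for (const Resource& r : pool)
        if (r.cpus >= minCpus && isSubclassOf(onto, r.type, wantedType))
            hits.push_back(r.name);
    return hits;
}
```

Here a request for `ComputeResource` matches a resource typed `LinuxCluster` because the hierarchy says every `LinuxCluster` is a `Cluster`, and every `Cluster` is a `ComputeResource`, even though the strings differ.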
With the advent of Grid and application technologies, scientists and engineers are building increasingly complex applications to manage and process large data sets and to execute scientific experiments on distributed r...
An algorithm for time-division multiple access (TDMA) is found to be applicable for converting existing distributed algorithms into a model that is consistent with sensor networks. Such a TDMA service needs to be self-...
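In a TDMA schedule, each sensor node transmits only in its own time slot, so its slot must differ from those of nodes it could interfere with. A common simplification, assumed here, treats any two nodes within two hops as interfering (distance-2 coloring). The greedy sketch below illustrates only that idea; it is not the algorithm from the paper, which the truncated abstract does not describe, and the adjacency-list input is a hypothetical representation.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Greedy distance-2 slot assignment. adj[v] lists v's one-hop neighbors.
// Returns slot[v] such that no two nodes within two hops share a slot.
std::vector<int> assignSlots(const std::vector<std::vector<int>>& adj) {
    const std::size_t n = adj.size();
    std::vector<int> slot(n, -1);  // -1 = not yet assigned
    for (std::size_t v = 0; v < n; ++v) {
        std::set<int> taken;  // slots already used within two hops of v
        for (int u : adj[v]) {
            if (slot[u] >= 0) taken.insert(slot[u]);
            // slot[v] is still -1 here, so v itself is never counted.
            for (int w : adj[u])
                if (slot[w] >= 0) taken.insert(slot[w]);
        }
        int s = 0;
        while (taken.count(s)) ++s;  // smallest free slot
        slot[v] = s;
    }
    return slot;
}

// With a frame of `frameLength` slots, node v may transmit at time step t
// (t >= 0) only when t falls in v's slot.
bool canTransmit(const std::vector<int>& slot, int frameLength, int v, long t) {
    return t % frameLength == slot[v];
}
```

For a four-node chain 0-1-2-3 this yields slots 0, 1, 2, 0: nodes 0 and 3 are three hops apart, so they can safely reuse slot 0 within a three-slot frame.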
This paper identifies challenges in managing resources in a Grid computing environment and proposes a computational economy as a metaphor for the effective management of resources and application scheduling. It identifies d...
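In an economy-based Grid, resource owners charge for usage and a broker schedules jobs within a user's deadline and budget. The sketch below illustrates one simple cost-minimizing policy: give jobs to the cheapest resources first, as long as each can finish its share before the deadline. The `Machine` fields, the flat per-job price, the uniform job length, and the greedy policy are assumptions for illustration; the truncated abstract does not state which scheduling algorithms the paper proposes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical resource model: each machine runs one job at a time.
struct Machine {
    double pricePerJob;    // flat cost charged per job (assumed)
    double secondsPerJob;  // time to run one job on this machine (> 0)
};

// Fill the cheapest machines first, up to what each can finish before the
// deadline. Under this model that yields the minimum total cost, so if it
// exceeds the budget no schedule fits. Returns jobs per machine (same order
// as `machines`), or an empty vector if deadline or budget cannot be met.
std::vector<int> scheduleByCost(const std::vector<Machine>& machines, int jobs,
                                double deadline, double budget) {
    std::vector<std::size_t> order(machines.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return machines[a].pricePerJob < machines[b].pricePerJob;
    });
    std::vector<int> assigned(machines.size(), 0);
    double cost = 0.0;
    for (std::size_t i : order) {
        if (jobs == 0) break;
        // Jobs this machine can run back-to-back before the deadline.
        int capacity = static_cast<int>(deadline / machines[i].secondsPerJob);
        int take = std::min(capacity, jobs);
        assigned[i] = take;
        jobs -= take;
        cost += take * machines[i].pricePerJob;
    }
    if (jobs > 0 || cost > budget) return {};
    return assigned;
}
```

A time-optimizing broker would instead favor the fastest machines and spend more of the budget; the same structure applies with a different sort key.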
ISBN: 1595930396 (print)
This paper describes TRAP/C++, a software tool that enables new adaptable behavior to be added to existing C++ programs in a transparent fashion. In previous investigations, we used an aspect-oriented approach to manually define aspects for adaptation infrastructure, which were woven into the original application code at compile time. In follow-on work, we developed TRAP, a transparent shaping technique for automatically generating adaptation aspects, of which TRAP/J is a specific instantiation. This paper presents our work on building TRAP/C++, which was intended to be a port of TRAP/J to C++. Designing TRAP/C++ required us to overcome two major hurdles: the lack of reflection in C++ and the incompatibility between the management of objects in C++ and the aspect weaving technique used in TRAP/J. We used generative programming methods to produce two tools, TrapGen and TrapCC, that work together to provide the desired TRAP/C++ functionality. We present details of the TRAP/C++ architecture and operation and illustrate them with a case study that adds dynamic auditing capabilities to an existing distributed C++ application. Copyright 2005 ACM.
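Because C++ has no runtime reflection, one way to make a method adaptable is to route calls through a hook object that can be replaced while the program runs. The hand-written sketch below shows that indirection for a single class. It is not TrapGen or TrapCC output; the class names `Account`, `AdaptableAccount`, `DepositHook`, and `AuditHook` are hypothetical, chosen to echo the auditing case study.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Original application class (hypothetical).
class Account {
public:
    virtual ~Account() = default;
    virtual void deposit(int amount) { balance_ += amount; }
    int balance() const { return balance_; }
private:
    int balance_ = 0;
};

// Hook interface: behavior that can be attached or swapped at run time.
class DepositHook {
public:
    virtual ~DepositHook() = default;
    virtual void before(int amount) = 0;
};

// Wrapper that intercepts deposit(); with no hook installed it just forwards.
class AdaptableAccount : public Account {
public:
    void setHook(std::unique_ptr<DepositHook> hook) { hook_ = std::move(hook); }
    void deposit(int amount) override {
        if (hook_) hook_->before(amount);
        Account::deposit(amount);
    }
private:
    std::unique_ptr<DepositHook> hook_;
};

// Example adaptation: record every deposit in an audit trail.
class AuditHook : public DepositHook {
public:
    explicit AuditHook(std::vector<std::string>& trail) : trail_(trail) {}
    void before(int amount) override {
        trail_.push_back("deposit " + std::to_string(amount));
    }
private:
    std::vector<std::string>& trail_;  // owned by the caller
};
```

Callers that hold an `Account&` keep calling `deposit()` unchanged; installing or removing an `AuditHook` alters the behavior without touching them. The cost is that the wrapper class has to exist at compile time, which is the kind of code a generator would be expected to produce.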
ISBN: 9781581139631 (print)
Embedded systems are pervasive and frequently used for critical systems with time-dependent functionality. Dwyer et al. have developed qualitative specification patterns to facilitate the specification of critical properties, such as those that must be satisfied by embedded systems. Thus far, no analogous repository has been compiled for real-time specification patterns. This paper makes two main contributions. First, based on an analysis of the timing-based requirements of several industrial embedded system applications, we created real-time specification patterns in terms of three commonly used real-time temporal logics. Second, to further help readers understand the meaning of a specification, we offer a structured English grammar that includes support for real-time properties. We illustrate the use of the real-time specification patterns in the context of property specifications of a real-world automotive embedded system. Copyright 2005 ACM.
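As an illustration of what a real-time pattern expresses, a bounded-response property ("whenever P holds, S holds within c time units") can be written in metric temporal logic as □(P → ◇[0,c] S). The sketch below checks that property on a finite, time-stamped trace. The choice of pattern, the `Sample` record, and the finite-trace treatment are assumptions for illustration; the abstract does not list the paper's patterns or logics.

```cpp
#include <cstddef>
#include <vector>

// One observation of a timed trace: a timestamp and the values of P and S.
struct Sample {
    double time;
    bool p;
    bool s;
};

// Checks "always (P implies S within c)" on a trace sorted by time.
// S at the same instant as P counts as a timely response. A P whose window
// runs past the end of the trace without an S counts as a violation,
// a conservative choice for finite traces.
bool boundedResponse(const std::vector<Sample>& trace, double c) {
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (!trace[i].p) continue;
        bool answered = false;
        for (std::size_t j = i;
             j < trace.size() && trace[j].time <= trace[i].time + c; ++j) {
            if (trace[j].s) { answered = true; break; }
        }
        if (!answered) return false;
    }
    return true;
}
```

For a trace where P occurs at times 0 and 5 and S at times 2 and 9, the property holds for c = 4 but fails for c = 3, because the response to the second P arrives 4 time units later.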
Recomposable software enables a system to change its structure and behavior during execution, in response to a dynamic execution environment. This paper proposes an approach to ensure that such adaptations are safe wi...
In this work, we present two perspectives on Grid computing, using two different Grid middleware systems as examples: an Enterprise Grid using Xgrid and a Global Grid with Gridbus. We also present the integration of Enterpr...