ISBN: 0818608757 (print)
A description is given of the results of a study of methods of achieving fault tolerance in the Clouds system and, in particular, of achieving increased availability of objects. The problems explored in this work, the model of distributed computation in which the problems posed by the research were examined (the Clouds system), the tools that were used to address these problems (the Aeolus programming language), and some related research are briefly described. The authors present a methodology for achieving available services by conversion of resilient single-site implementations into replicated implementations. A mechanism with which they propose to support this methodology, called distributed locking (DL), is presented. A description is also given of a linguistic feature for the specification of the availability properties of an object replicated via DL. The language runtime support features (primitives) required for DL and the operating system support needed for these features are presented.
ISBN: 0818692219 (print)
This paper outlines a human-centered virtual machine of problem-solving agents, intelligent agents, software agents, and objects. It deals with issues related to high assurance (e.g., reliability, availability, real-time behavior, and others) through the design of a human-centered system architecture in which technology is a primitive. The human-centered virtual machine is based on a number of human-centered perspectives, including the distributed cognition approach. It has been applied to complex, data-intensive, time-critical problems such as real-time alarm processing and fault diagnosis, air combat simulation, and business decision support.
ISBN: 1581130864 (print)
This paper describes a tool-based approach for designing and prototyping a distributed database application. This approach is demonstrated for an Academic Affairs Information System (AAIS) to assist the Webster University main campus and its 70+ remote sites in managing the information required to admit students, approve programs, schedule courses, assign faculty, register students, and generate the required queries and reports. ORACLE Relational Database Management System (RDBMS) tools and products for Windows NT were used to support AAIS requirements analysis, design, and prototype implementation. The Designer/2000 Process Modeler tool was used to document the top-level business functions, and the Data Modeler tool was used to develop a third normal form data model. The Developer/2000 Forms tool was used to prototype several user interface forms for main campus staff, remote staff, and students to enter and update student and program data. A Web server was also installed, along with the Java software and AppletViewer, to test the prototype forms from a Web browser.
ISBN: 9781509019441 (print)
The proceedings contain 39 papers. The topics discussed include: detection of unexpected situations by applying software reliability growth models to test phases; resource/schedule/content model: improving testing effectiveness; static analysis of physical properties in Simulink models; test suites for benchmarks of static analysis tools; optimizing resiliency of distributed video surveillance system for safer city; software-defined networking (SDN) control message classification, verification, and optimization system; integrating formal methods with testing for reliability estimation of component based systems; C-SEC (Cyber SCADA evaluation capability): securing critical infrastructures; operational softwarized networks reliability management; knowledge transition: discovering workflow models from functional tests; and analyzing failure mechanism for complex software-intensive systems.
The authors describe the overall system design for ImageNet and present a system prototype developed on an Ethernet network in the Computer Engineering Research Laboratory at the University of Arizona. ImageNet is a g...
This paper presents a feasibility study of two combined techniques for software fault tolerance in distributed systems. A probabilistic model for each technique is presented that represents each technique's abilit...
ISBN: 3540490183 (print)
The complexity of today's distributed computing environments is such that the presence of bugs and security holes is statistically unavoidable. A very promising approach to this issue is to implement a self-protected system, similar to a natural immune system, which has the ability to detect the intrusion of foreign elements and react while it is still in progress. This paper describes an approach relying on component-based software engineering to ease the protection of distributed systems. The knowledge of the application architecture is used to detect foreign activities and to trigger countermeasures. We focus on a means of recognizing known and unknown attacks independently of legacy software while avoiding false positives. Hence, the scope of the detected attacks is, for the moment, limited to illegal communications. We describe how this approach can be applied to provide self-protection for clustered J2EE applications with a very low overhead.
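The idea of using the declared application architecture to flag illegal communications might be sketched as follows. This is a minimal illustration, not the paper's implementation: the component names, the ALLOWED_BINDINGS set, and both function names are hypothetical.

```python
# Hypothetical sketch: component names and bindings are assumptions,
# not taken from the paper.

# The declared architecture: which component may talk to which.
ALLOWED_BINDINGS = {
    ("web", "app"),
    ("app", "db"),
}


def is_illegal(source, destination, allowed=ALLOWED_BINDINGS):
    """Return True when a communication is not declared in the architecture."""
    return (source, destination) not in allowed


def illegal_communications(observed, allowed=ALLOWED_BINDINGS):
    """Keep only the observed (source, destination) pairs that are undeclared."""
    return [pair for pair in observed if pair not in allowed]
```

Because anything outside the declared bindings is flagged, a check of this kind needs no signature of a specific attack, which is one plausible reading of how unknown attacks can be recognized without false positives on declared traffic.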
Distributed services are often provided by process groups for purposes of reliability, availability, and performance. It is often important for the members of such a group to have a consistent view of the group's membership. For this reason, membership services are an important part of many distributed software systems. Despite their importance, the specification and implementation of membership services in completely asynchronous systems has challenged researchers. Recent papers have demonstrated that earlier specifications are either unsolvable or admit trivial solutions. Informally, membership services require a kind of agreement among processes, and it has been shown that it is impossible to solve many consensus-like problems in completely asynchronous systems. If the specification of a membership service is nearly as strong as that of consensus, the specification will be unsolvable. If it is much weaker, its solutions may be useless. This paper provides an alternative specification of group membership and exhibits an algorithm that satisfies it. The specification is solvable in spite of earlier impossibility results because it permits executions in which all processes are evicted from the process group yet none ever learns that the group has become empty. This represents a weakening of earlier specifications, which required that, at all times, at least one process be aware of a group's membership. However, the new specification cannot be trivially satisfied because it prohibits a potential solution from arbitrarily removing a process for no reason. This specification thus represents an important step towards a better understanding of membership services in completely asynchronous systems.
The authors present an enhancement to distributed file systems that allows the users of the system to keep local copies of important files, decreasing their dependence on file servers. Using the notions of stashing and quasi-copies, the system allows users to tune the quality of the service they want to receive when the file server is not reachable. One of the key points of this work is the focus on the tradeoff between availability and degradation of service. The other main contribution is the design of a distributed file system that is ideally suited to very large distributed systems, in that it provides users with greater tolerance of network partitions and server failures. It is emphasized that the use of stashing does not preclude the use of other performance-enhancing or fault-tolerant techniques. The file system architecture has been implemented, and FACE, a prototype of a file system service based on Sun's NFS, is described. Performance figures are reported. These figures show that the overhead of providing the service is negligible. Current plans also call for porting the FACE design to a number of other processors.
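The tradeoff between availability and degraded service can be illustrated with a small sketch. It is not the FACE design: StashingClient, ServerUnreachable, the fetch callable, and the max_age staleness bound are all hypothetical, and for simplicity the sketch stashes every file it reads rather than only files the user marks as important.

```python
class ServerUnreachable(Exception):
    """Raised by the (hypothetical) fetch callable when the server is down."""


class StashingClient:
    def __init__(self, fetch):
        self._fetch = fetch  # callable(path) -> data; may raise ServerUnreachable
        self._stash = {}     # path -> (data, time the copy was stashed)

    def read(self, path, now, max_age):
        """Read from the server, falling back to a stashed copy if it is fresh enough."""
        try:
            data = self._fetch(path)
        except ServerUnreachable:
            if path not in self._stash:
                raise  # nothing stashed: the file is simply unavailable
            data, stashed_at = self._stash[path]
            if now - stashed_at > max_age:
                raise  # stashed copy is staler than the user tolerates
            return data  # degraded but available service
        self._stash[path] = (data, now)  # refresh the local copy
        return data
```

Here max_age plays the role of a user-chosen quality bound: a larger value buys more availability during a partition at the cost of possibly older data.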
ISBN: 0818608757 (print)
A checkpoint algorithm is presented that benefits from the research in concurrency control, commit, and site recovery algorithms in transaction processing. In the authors' approach a number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tolerating the failure of an arbitrary number of processes. Each process takes checkpoints independently. During recovery after a failure, a process invokes a two-phase rollback algorithm. It collects information about relevant message exchanges in the system in the first phase and uses it in the second phase to determine both the set of processes that must roll back and the set of checkpoints up to which rollback must occur. Concurrent rollbacks are completed in the order of the priorities of the recovering processes. The proposed solution is optimistic in the sense that it does well if failures are infrequent by minimizing overhead during normal processing.
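The second phase described above, deciding which processes must roll back and to which checkpoints, can be illustrated with a generic rollback-dependency computation. This is a textbook-style sketch, not the authors' algorithm: it ignores concurrency and priorities, and the function name, data layout, and local-time model are assumptions. Each process is assumed to have a checkpoint at local time 0, with all events at local times of 1 or more.

```python
import math


def rollback_line(checkpoints, messages, failed):
    """Compute which processes roll back, and to which checkpoint.

    checkpoints: {process: [local checkpoint times, including 0]}
    messages:    [(sender, sent_at, receiver, received_at)] in local times
    failed:      the process that failed and is recovering
    """
    # math.inf means "not rolled back"; the failed process restarts
    # from its most recent checkpoint.
    restore = {p: math.inf for p in checkpoints}
    restore[failed] = max(checkpoints[failed])

    changed = True
    while changed:  # iterate until no further rollback is forced
        changed = False
        for sender, sent_at, receiver, received_at in messages:
            send_undone = sent_at > restore[sender]
            receive_kept = received_at <= restore[receiver]
            if send_undone and receive_kept:
                # Orphan message: the receiver must return to its latest
                # checkpoint taken before the receive.
                restore[receiver] = max(
                    t for t in checkpoints[receiver] if t < received_at
                )
                changed = True
    return {p: t for p, t in restore.items() if t != math.inf}
```

The loop terminates because each forced rollback moves a process to a strictly earlier checkpoint, and there are finitely many checkpoints.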