Hardware-software co-synthesis is the process of partitioning an embedded system specification into hardware and software modules to meet performance, cost and reliability goals. In this paper, we address the problem of hardware-software co-synthesis of fault-tolerant real-time heterogeneous distributed embedded systems. Fault detection capability is imparted to the embedded system by adding assertion and duplicate-and-compare tasks to the task graph specification prior to co-synthesis. The reliability and availability of the architecture are evaluated during co-synthesis. Our algorithm allows the user to specify multiple types of assertions for each task. It uses the assertion or combination of assertions which achieves the required fault coverage without incurring too much overhead. We propose new methods to: 1) perform fault tolerance based task clustering, 2) derive the best error recovery topology using a small number of extra processing elements, 3) exploit multi-dimensional assertions, and 4) share assertions to reduce the fault tolerance overhead. Our algorithm can tackle multirate systems commonly found in multimedia applications. Application of the proposed algorithm to several real-life telecom transport system examples shows its efficacy.
Fault tolerance is a survival attribute of complex computer systems and software in their ability to deliver continuous service to their users in the presence of faults. Formulating an analytic model for dependability and performance evaluation of hardware/software fault tolerant architectures can be quite cumbersome. Also, in practice, isolating the effect of various parameters on a system while holding the others constant requires exploring a variety of scenarios. It is economically infeasible to build several such systems. Simulation offers an attractive mechanism for dependability evaluation and the study of the influence of various parameters on the failure behavior of the system. In this paper, we develop algorithms to simulate the failure behavior of three commonly used fault tolerant architectures, viz., Distributed Recovery Block (DRB), N-Version Programming (NVP) and N-Self Checking Programming (NSCP). We demonstrate the ability of the approach to simulate complex failure scenarios with various dependencies using some illustrative numerical examples.
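To give a rough feel for this kind of failure simulation, the sketch below estimates by Monte Carlo how often an N-Version Programming system produces a correct majority. It is not the paper's algorithm: the function name `simulate_nvp`, the per-version failure probability `p_fail`, the perfect majority voter, and the assumption that versions fail independently are all illustrative choices. The dependencies between failures that the abstract mentions are deliberately left out.

```python
import random


def simulate_nvp(n_versions, p_fail, trials, seed=0):
    """Estimate the fraction of runs in which a majority of versions is correct.

    Assumptions (not from the abstract): versions fail independently with
    probability p_fail, and the voter itself never fails.
    """
    rng = random.Random(seed)          # seeded for reproducible estimates
    majority = n_versions // 2 + 1     # smallest strict majority
    successes = 0
    for _ in range(trials):
        # A version is correct in this run when its draw is not a failure.
        correct = sum(1 for _ in range(n_versions) if rng.random() >= p_fail)
        if correct >= majority:
            successes += 1
    return successes / trials


if __name__ == "__main__":
    # Three versions, each failing 10% of the time; under independence the
    # analytic value is 0.9**3 + 3 * 0.9**2 * 0.1 = 0.972.
    print(simulate_nvp(n_versions=3, p_fail=0.1, trials=10_000))
```

Sweeping `p_fail` or `n_versions` while holding the other fixed is the kind of single-parameter study that the abstract argues is impractical on real hardware.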
The various software systems developed for the DIII-D tokamak have played a highly visible and important role in tokamak operations and fusion research. Because of the heavy reliance on in-house developed software encompassing all aspects of operating the tokamak, much attention has been given to the careful design, development and maintenance of these software systems. Software systems responsible for tokamak control and monitoring, neutral beam injection, and data acquisition demand the highest level of reliability during plasma operations. These systems, made up of hundreds of programs totaling thousands of lines of code, have presented a wide variety of software design and development issues ranging from low-level hardware communications, database management, and distributed process control to man-machine interfaces. The focus of this paper will be to describe how software is developed and managed for the DIII-D control and data acquisition computers. It will include an overview and status of software systems implemented for tokamak control, neutral beam control, and data acquisition. The issues and challenges faced in developing and managing the large amounts of software in support of the dynamic and ever-changing needs of the DIII-D experimental program will be addressed.
Management policies can be used to specify requirements about the desired behaviour of distributed systems. Violations of policies (faults) can then be detected, isolated, located, and corrected using a policy-driven fault management system. Other work in this area to date has focused on network-level faults. We believe that in a distributed system it is more appropriate to focus on faults at the application level. Furthermore, this work has been largely domain-specific; a generic, structured approach to this problem is needed. Our work has focused on policy-driven fault management in distributed systems at the application level. In this paper, we define a generic architecture for policy-driven fault management, and present a prototype system based on this architecture. We also discuss experience to date using and experimenting with our prototype system.
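As a minimal sketch of treating a policy violation as a fault, the snippet below checks application-level measurements against declared policies and reports the corrective action attached to each violated one. Everything here is hypothetical and chosen for illustration: the `Policy` class, the `detect_violations` function, the metric names, and the action strings. None of it reflects the architecture or prototype the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    """A hypothetical application-level policy: a requirement plus a remedy."""
    name: str
    is_satisfied: Callable[[Dict[str, float]], bool]  # requirement over metrics
    corrective_action: str                            # what to do on violation


def detect_violations(policies: List[Policy],
                      metrics: Dict[str, float]) -> List[Policy]:
    """Return the policies that the observed metrics violate (the 'faults')."""
    return [p for p in policies if not p.is_satisfied(metrics)]


if __name__ == "__main__":
    policies = [
        Policy("max_response_time",
               lambda m: m["response_ms"] <= 200.0,
               "restart slow server process"),
        Policy("min_replicas",
               lambda m: m["replicas"] >= 2,
               "start an additional replica"),
    ]
    observed = {"response_ms": 350.0, "replicas": 2}
    for fault in detect_violations(policies, observed):
        print(f"violated: {fault.name} -> {fault.corrective_action}")
```

Keeping the requirement and the corrective action together in one declarative record is one way to keep such a checker independent of any particular application domain.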
The proceedings contain 27 papers from the 1996 IEEE Real-Time Technology and Applications Symposium. Topics discussed include case studies and applications of real-time systems, database systems and concurrency control, software engineering, data communication systems, real-time system development and analysis tools, formal methods and processing scheduling, and operating systems and distributed systems.
Distributed services are often provided by process groups for purposes of reliability, availability, and performance. It is often important for the members of such a group to have a consistent view of the group's membership. For this reason, membership services are an important part of many distributed software systems. Despite their importance, the specification and implementation of membership services in completely asynchronous systems has challenged researchers. Recent papers have demonstrated that earlier specifications are either unsolvable or admit trivial solutions. Informally, membership services require a kind of agreement among processes and it has been shown that it is impossible to solve many consensus-like problems in completely asynchronous systems. If the specification of membership service is nearly as strong as that of consensus, the specification will be unsolvable. If it is much weaker, its solutions may be useless. This paper provides an alternative specification of group membership and exhibits an algorithm that satisfies it. The specification is solvable in spite of earlier impossibility results because it permits executions in which all processes are evicted from the process group yet none ever learns that the group has become empty. This represents a weakening of earlier specifications, which required that, at all times, at least one process be aware of a group's membership. However, the new specification cannot be trivially satisfied because it prohibits a potential solution from arbitrarily removing a process for no reason. This specification thus represents an important step towards a better understanding of membership services in completely asynchronous systems.
ISBN: (Print) 0818674849
Programmability, reliability, and scalability are system requirements that are essential to retaining or establishing the high ground in the new world order for telecommunications solutions. For telephony systems to be successful in the next century, they must deliver 'fifth-generation software flexibility' that enables interoperability while satisfying these three key requirements to give customers the applications they need, when they need them. Further, these requirements must be designed into an open system architecture from the start: systems lacking them cannot be easily extended to leverage their combined advantages. Finally, these solutions must readily build upon inexpensive, off-the-shelf hardware components. Telecommunication is becoming telecomputing. Although traditional fault-tolerance approaches can be used to increase the reliability of the platform, a more general software fault-tolerant architecture is required to integrate new off-the-shelf system components reliably and seamlessly into the existing service. The ease of integration hinges both on the system architecture and on easy programming paradigms to integrate dependability with the service functionality. This paper describes our working prototype of an open Telecomputing system based on these architectural principles.
ISBN: (Print) 0819420867
The ATM-based Metropolitan Area Network (MAN) of Berlin connects two university hospitals (Benjamin Franklin University Hospital and Charite) with the computer resources of the Technical University of Berlin (TUB). Distributed new medical services have been implemented and will be evaluated within the high-speed MAN of Berlin. The network, with its data transmission rates of up to 155 Mbit/s, renders these medical services externally available to practicing physicians. Resource and application sharing is demonstrated by the use of two software systems. The first software system is an interactive 3D reconstruction tool (3D-Medbild), based on a client-server mechanism. This structure allows the use of high-performance computers at the TUB from the low-level workstations in the hospitals. A second software system, RAMSES, utilizes a tissue database of Magnetic Resonance Images. For the remote control of the software, the developed applications use standards such as DICOM 3.0 and features of the World Wide Web. Data security concepts are being tested and integrated for the needs of the sensitive medical data. The high-speed network is the necessary prerequisite for the clinical evaluation of data in a joint teleconference. The transmission of digitized real-time sequences such as video and ultrasound and the interactive manipulation of data are made possible by multimedia tools.
ISBN: (Print) 3540616977
The proceedings contain 35 papers. The special focus in this conference is on Design and Implementation of Symbolic Computation Systems. The topics include: problem-oriented applications of automated theorem proving; a strongly-typed embeddable computer algebra library; a general framework for implementing calculi and strategies; equality elimination for the tableau method; towards lean proof checking; high performance equational theorem proving; a reflective language based on conditional term rewriting; term rewriting systems; generative geometric modeling in a functional environment; exploiting SML for experimenting with algebraic algorithms; conditional categories and domains; parameterizing object specifications; analyzing the dynamics of a Z specification; integer and rational arithmetic on MasPar; parallel 3-primes FFT algorithm; a master-slave approach to parallel term rewriting on a hierarchical multiprocessor; concepts and applications; document-centered presentation of computing software; animating a non-executable formal specification with a distributed symbolic language; uniform representation of basic algebraic structures in computer algebra; integrating computer algebra with proof planning; structures for symbolic mathematical reasoning and computation; an approach to class reasoning in symbolic computation; an intelligent interface to numerical routines; computer algebra and the world wide web; software architectures for computer algebra; a deductive database for mathematical formulas; a system for computer aided constructive algebraic geometry; making systems communicate and cooperate; a database for number fields; compiling residuation for a multiparadigm symbolic programming language and pluggability issues in the multi protocol.
Management policies can be used to specify requirements about the desired behaviour of distributed systems. Violations of policies (faults) can then be detected, isolated, located and corrected using a policy-driven fault management system. Other work in this area to date has focused on network-level faults. We believe that in a distributed system it is more appropriate to focus on faults at the application level. Furthermore, this work has been largely domain-specific; a generic, structured approach to this problem is needed. Our work has focused on policy-driven fault management in distributed systems at the application level. In this paper, we define a generic architecture for policy-driven fault management and present a prototype system based on this architecture. We also discuss experience to date using and experimenting with our prototype system.