Increasing design complexity in current and future generations of microelectronic technologies leads to increased sensitivity to transient bit-flip errors. These errors can cause unpredictable behavior and compromise data integrity and system availability. This paper proposes new solutions to detect all classes of faults, including those that escape conventional software detection mechanisms, allowing full protection against transient bit-flip errors. The proposed solutions, which are particularly well suited to low-cost, safety-critical microprocessor-based applications, have been validated through exhaustive fault injection experiments performed on a set of real and synthetic benchmark programs. The fault model considered was single bit-flip errors corrupting memory cells accessible to the user through the processor instruction set. The results demonstrate the effectiveness of the proposed solutions.
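The fault model in the abstract above (a single bit-flip in a user-accessible memory cell, injected exhaustively) can be illustrated with a minimal sketch. The names `flip_bit`, `additive_checksum`, and `run_campaign` are hypothetical, and the 8-bit additive checksum is only a stand-in detector chosen for brevity; the abstract does not describe the paper's actual detection mechanisms.

```python
def flip_bit(memory: bytearray, byte_index: int, bit_index: int) -> None:
    """Inject a single bit-flip into one memory cell, in place."""
    memory[byte_index] ^= 1 << bit_index


def additive_checksum(memory: bytes) -> int:
    """Stand-in detector (an assumption, not the paper's mechanism)."""
    return sum(memory) & 0xFF


def run_campaign(golden: bytes) -> tuple[int, int]:
    """Flip every bit of every byte exactly once; return (detected, injected)."""
    reference = additive_checksum(golden)
    detected = injected = 0
    for byte_index in range(len(golden)):
        for bit_index in range(8):
            faulty = bytearray(golden)  # fresh copy, so faults never accumulate
            flip_bit(faulty, byte_index, bit_index)
            injected += 1
            if additive_checksum(faulty) != reference:
                detected += 1
    return detected, injected
```

Calling `run_campaign(data)` on a small buffer returns the number of detected faults and the number of injected faults, which together give the detection coverage for that buffer under this fault model.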
This paper evaluates the concurrent error detection capabilities of system-level checks, using fault and error injection. The checks comprise application- and system-level mechanisms to detect control-flow errors. We propose Enhanced Control-flow Checking Using Assertions (ECCA). In ECCA, branch-free intervals (BFIs) in a given high- or intermediate-level program are identified, and the entry and exit points of the intervals are determined. BFIs are then grouped into blocks, the size of which is determined through a performance/overhead analysis. The blocks are then fortified with preinserted assertions. For high-level ECCA, we describe an implementation through a preprocessor that automatically inserts the necessary assertions into the program. We then describe the intermediate-level implementation, made possible through modifications to gcc that make it ECCA-capable. The fault detection capabilities of the checks are evaluated both analytically and experimentally. Fault injection experiments are conducted using FERRARI [1] to determine the fault coverage of the proposed techniques.
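The idea of fortifying blocks with entry assertions can be sketched as follows. This is a simplified illustration, not ECCA's actual assertion encoding: the class `BlockAssertions`, the helper `trace_is_legal`, and the predecessor table are all hypothetical, and each block simply asserts on entry that it was reached from a permitted predecessor.

```python
class ControlFlowError(Exception):
    """Raised when a block is entered from a predecessor that is not permitted."""


class BlockAssertions:
    def __init__(self, allowed_predecessors):
        # Maps each block id to the set of block ids allowed to precede it.
        # None stands for "program start".
        self._allowed = allowed_predecessors
        self._previous = None

    def enter(self, block_id):
        """Entry assertion: check the transition, then record the new block."""
        if self._previous not in self._allowed.get(block_id, set()):
            raise ControlFlowError(
                f"illegal transition {self._previous!r} -> {block_id!r}"
            )
        self._previous = block_id


def trace_is_legal(allowed_predecessors, trace) -> bool:
    """Replay a sequence of block ids through the entry assertions."""
    checker = BlockAssertions(allowed_predecessors)
    try:
        for block_id in trace:
            checker.enter(block_id)
    except ControlFlowError:
        return False
    return True
```

In an instrumented program, a preprocessor would place the `enter` call at the top of each block; larger blocks mean fewer calls and lower overhead, but also coarser detection, which is the trade-off the abstract's performance/overhead analysis addresses.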
A new control flow checking scheme is presented, based on assigned-signature checking using a watchdog processor. This scheme is suitable for a multitasking, multiprocessor environment. The hardware overhead is comparatively low for three reasons. First, being hierarchically structured, the scheme uses only a single watchdog processor to monitor processes on multiple processors. Second, as an assigned-signature scheme, it does not require monitoring the instruction bus of the processors. Third, the run-time and reference signatures are embedded into the checked program; thus, the watchdog processor requires neither a reference database nor a time-consuming search-and-compare engine.
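A rough sketch of why embedding the reference signatures in the checked program removes the need for a reference database: if each report to the watchdog carries the block's own assigned signature together with the signatures of its legal successors, the watchdog only has to remember the last report per process. The `Watchdog` class and its message layout are assumptions made for illustration and do not reproduce the signature encoding of the scheme described above.

```python
class Watchdog:
    """Checks assigned signatures for several processes with no reference table."""

    def __init__(self):
        # Per-process set of signatures that may legally arrive next.
        self._expected = {}

    def report(self, pid, signature, successors) -> bool:
        """Return True if `signature` is a legal next step for process `pid`.

        `successors` is the reference information embedded in the program:
        the signatures allowed to follow the reporting block.
        """
        expected = self._expected.get(pid)
        ok = expected is None or signature in expected  # first report is accepted
        self._expected[pid] = frozenset(successors)
        return ok
```

Because the state is keyed by process id, one watchdog instance can follow interleaved reports from several tasks, which mirrors the multitasking setting of the abstract.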
This paper presents new principles for on-line monitoring in the context of multiprocessors (especially massively parallel processors) and then focuses on the effect of the aliasing probability on the error detection process. In the proposed test architecture, concurrent testing (or on-line monitoring) at the system level is accomplished by enforcing a run-time test of the data and control dependences of the algorithm currently executed in the parallel computer. To support this process, each message contains both source and destination addresses. At each message source, the sequence of destination addresses of the outgoing messages is compressed on a block basis. At the same time, at each destination, the sequence of source addresses of all incoming messages is compressed, also on a block basis. Concurrent compression of the instructions executed by the PEs is also possible. As a result of this procedure, an image of the data dependences and of the control flow of the currently running algorithm is created. This image is compared at the end of each computational block with a reference image created at compilation time. The main results of this work are new principles for the on-line system-level test of multiprocessor systems, based on signaturing and monitoring the data dependences together with the control dependences, and an analytical model and analysis of the address compression process used for monitoring the data routing process.
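The block-wise address compression described above can be sketched with a small linear compressor in the style of a signature register. The 16-bit width, the feedback polynomial `POLY`, and the function names are illustrative assumptions; the abstract does not specify the compressor actually used.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1
POLY = 0x1021  # hypothetical feedback polynomial; bit 0 set keeps each step invertible


def compress_step(state: int, address: int) -> int:
    """Shift the register with feedback, then fold in one address word."""
    msb = (state >> (WIDTH - 1)) & 1
    state = (state << 1) & MASK
    if msb:
        state ^= POLY
    return state ^ (address & MASK)


def signature(addresses) -> int:
    """Compress one block's address sequence into a WIDTH-bit signature."""
    state = 0
    for address in addresses:
        state = compress_step(state, address)
    return state


def block_is_consistent(observed_addresses, reference_signature: int) -> bool:
    """End-of-block check against the signature computed at compile time."""
    return signature(observed_addresses) == reference_signature
```

A source node would call `signature` on the destination addresses of its outgoing messages, and a destination node on the source addresses of its incoming messages. Because the signature is much shorter than the sequence it summarizes, distinct erroneous sequences can map to the reference value; that is the aliasing whose probability the paper analyzes.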
If off-line testing is complemented by on-line checks, in general some of the test hardware is only used either for on-line checking (e.g., control-flow monitors) or for production testing (e.g., pattern generators). ...
In a previous paper, we introduced a new concurrent testing (or on-line monitoring) architecture for massively parallel computers. In the proposed test architecture, on-line checks for both control flow and data routing are accomplished by enforcing the run-time test of compressed (signatured) versions of the control and data dependences of the algorithm executed in the parallel computer. This paper focuses on the results of simulation experiments on the error detection of the proposed test architecture as applied to the routing process. Four sets of experiments were executed, with two compressors or signature analyzers (an MISR and an LFSR) and two error models (the 2^m-ary symmetric channel and the binary symmetric channel). Using a randomized routing process and randomized fault insertion, we obtained detailed figures for the undetected errors at all crucial detection points of the proposed method: the source, the expected destination, and the false destination of the messages. High detection ratios for multiple errors were obtained with compressors of only moderate size, supporting the use of this method in practical applications. The results are independent of the topology of the interconnection network and of the detailed routing algorithm.
An approach to verifying control flow in distributed computer systems (DCS) is presented. The approach is based on control flow checking among software components that are distributed over processors and cooperate with one another. In this approach, the control flow behavior of DCS software is modeled and contained in special software components called verifiers. The verifiers are distributed over the processors and consulted to check the correctness of the control flow in DCS software during its execution. Algorithms for deriving the verifiers are presented. This technique can detect global errors, including synchronization errors, as well as local errors. It can be used for sequential or concurrent software at various levels of detail. Experiments show that using this technique requires no significant overhead.
This paper presents a new concept of on-line control flow monitoring called Roving Monitoring. This technique uses a special-purpose roving monitoring processor to provide continuous and concurrent checking of instruction-level control flow in multiple-processor systems. The roving monitoring processor is time-shared among several application processors to reduce overall monitoring overhead. The design and implementation of a roving monitoring processor with a novel architecture is presented. The roving monitoring concept is shown to be quite feasible.
A control flow checking scheme is presented that can detect control flow errors in programs resulting from software coding errors, hardware malfunctions, or memory mutilation during program execution. In this approach, the program is partitioned into loop-free intervals, and a database containing the path information for each loop-free interval is derived from the detailed design. The path actually traversed in each loop-free interval at run time is recorded and then checked against the information in the database; any discrepancy indicates an error. This approach is general and can detect all uncompensated illegal branches. Any uncompensated error that occurs during the execution of a loop-free interval and manifests itself as a wrong branch, either within the interval or immediately after its execution completes, is also detectable. The approach can also be used to check control flow in the testing phase of program development. The capabilities, limitations, implementation, and overhead of this approach are discussed.
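The record-then-check step for one loop-free interval can be sketched as below. The `PathChecker` class, the interval name, and the node labels are hypothetical; the path database here is a plain dictionary of legal node sequences, whereas the paper derives its database from the detailed design.

```python
class PathChecker:
    """Records the path taken through a loop-free interval and validates it."""

    def __init__(self, valid_paths):
        # Path database: interval name -> set of legal node sequences (tuples).
        self._valid = valid_paths
        self._interval = None
        self._trace = []

    def begin(self, interval):
        """Start recording a new loop-free interval."""
        self._interval = interval
        self._trace = []

    def visit(self, node):
        """Record one node reached at run time."""
        self._trace.append(node)

    def end(self) -> bool:
        """Return True if the recorded path appears in the database."""
        return tuple(self._trace) in self._valid.get(self._interval, set())
```

For an interval with two legal paths, `("a1", "a2", "a4")` and `("a1", "a3", "a4")`, a run that records `a1`, `a4` would fail the check at `end()`, since skipping the middle node corresponds to an illegal branch inside the interval.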