Search results - Inner Mongolia University Library

ALMOST CERTAIN FAULT-DIAGNOSIS THROUGH ALGORITHM-BASED FAULT-TOLERANCE

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1994, vol. 5, no. 5, pp. 532-539

Authors: BLOUGH, DM; PELC, A (UNIV QUEBEC, DEPT INFORMAT, HULL J8X 3X7, QUEBEC, CANADA)

Algorithm-based fault tolerance has been proposed as a technique to detect incorrect computations in multiprocessor systems. In algorithm-based fault tolerance, processors produce data elements that are checked by concurrent error detection mechanisms. In this paper, we investigate the efficacy of this approach for diagnosis of processor faults. Because checks are performed on data elements, the problem of locating data errors must first be solved. We propose a probabilistic model for the faults and errors in a multiprocessor system and use it to evaluate the probabilities of correct error location and fault diagnosis. We investigate the number of checks that are necessary to guarantee error location with high probability. We also give specific check assignments that accomplish this goal. We then consider the problem of fault diagnosis when the locations of erroneous data elements are known. Previous work on fault diagnosis required that the data sets produced by different processors be disjoint. We show, for the first time, that fault diagnosis is possible with high probability, even in systems where processors combine to produce individual data elements.

Keywords: ALGORITHM-BASED FAULT TOLERANCE CONCURRENT ERROR DETECTION FAULT DIAGNOSIS INTERMITTENT FAULTS PROBABILISTIC ANALYSIS

CONSTRUCTION OF CHECK SETS FOR ALGORITHM-BASED FAULT-TOLERANCE

IEEE TRANSACTIONS ON COMPUTERS, 1994, vol. 43, no. 6, pp. 641-650

Authors: GU, DC; ROSENKRANTZ, DJ; RAVI, SS (SUNY ALBANY, DEPT COMP SCI, ALBANY, NY 12222)

Algorithm-based fault tolerance (ABFT) is a popular approach to achieve fault and error detection in multiprocessor systems. The design problem for ABFT is concerned with the construction of a check set of minimum cardinality that detects a specified number of errors or faults. Previous work on this problem has assumed an a priori bound on the size of a check. We motivate and carry out an investigation of the problem without the bounded check size assumption. We establish upper and lower bounds on the number of checks needed to detect a given number of errors. The upper bounds are obtained through new schemes which are easy to implement, and the lower bounds are established using new types of arguments. These bounds are sharply different from those previously established under the bounded check size model. We also show that, unlike error detection, the design problem for fault detection is NP-hard even for detecting only one fault.

Keywords: ALGORITHM-BASED FAULT TOLERANCE ONLINE CHECK ERROR FAULT DETECTION UPPER LOWER BOUNDS NP-COMPLETE

ERROR-CORRECTING CODES OVER Z_{2^m} FOR ALGORITHM-BASED FAULT-TOLERANCE

IEEE TRANSACTIONS ON COMPUTERS, 1994, vol. 43, no. 3, pp. 370-374

Authors: FENG, GL; RAO, TRN; KOLLURU, MS (Center for Adv. Comput. Studies, Southwestern Louisiana Univ., Lafayette, LA, USA)

Algorithm-based fault tolerance is a scheme of low-cost error protection in real-time digital signal processing environments and other computation-intensive tasks. In this paper, a new method for encoding data is proposed and, furthermore, two kinds of error-correcting codes over Z_{2^m}, which can be used with fixed-point arithmetic in practical algorithm-based fault-tolerant systems, are introduced.

Keywords: ALGORITHM-BASED FAULT TOLERANCE BCH-LIKE CODES DECODING DATA ENCODING DATA ERROR-CORRECTING CODES OVER A RING REED SOLOMON-LIKE CODES
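
As a rough illustration of why arithmetic over the ring Z_{2^m} suits fixed-point ABFT, the sketch below protects a matrix-vector product with a single checksum computed modulo 2^m. It is not the BCH-like or Reed-Solomon-like codes introduced in the paper. The function names, the word size `M_BITS = 16`, and the single-checksum scheme are all assumptions made for illustration.

```python
# Hypothetical sketch: one checksum over Z_{2^m}, assuming 16-bit words.
M_BITS = 16
MOD = 1 << M_BITS  # all arithmetic wraps around modulo 2^m


def matvec_mod(a, x):
    """Compute y = A x with every result reduced modulo 2^m."""
    return [sum(aij * xj for aij, xj in zip(row, x)) % MOD for row in a]


def column_checksum(a):
    """Checksum row c = 1^T A (column sums of A), reduced modulo 2^m."""
    return [sum(col) % MOD for col in zip(*a)]


def checksum_ok(a, x, y):
    """Check that sum(y) equals c . x modulo 2^m.

    Because wrap-around is part of the ring arithmetic, the comparison is
    exact: no tolerance is needed, unlike with floating-point checksums.
    """
    expected = sum(c * xj for c, xj in zip(column_checksum(a), x)) % MOD
    return sum(y) % MOD == expected
```

A single corrupted element of `y` changes `sum(y) % MOD` by a nonzero amount, so this check detects it. Locating or correcting the error would need more check symbols, which is where codes of the kind described in the abstract come in.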

PARTITIONED ENCODING-SCHEMES FOR ALGORITHM-BASED FAULT-TOLERANCE IN MASSIVELY-PARALLEL SYSTEMS

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1994, vol. 5, no. 6, pp. 649-653

Authors: REXFORD, J; JHA, NK (PRINCETON UNIV, DEPT ELECT ENGN, PRINCETON, NJ 08544)

This short note considers the applicability of algorithm-based fault tolerance (ABFT) to massively parallel scientific computation. Existing ABFT schemes can provide effective fault tolerance at a low cost for computation on matrices of moderate size; however, the methods do not scale well to floating-point operations on large systems. This short note proposes the use of a partitioned linear encoding scheme to provide scalability. Matrix algorithms employing this scheme are presented and compared to current ABFT schemes. It is shown that the partitioned scheme provides scalable linear codes with improved numerical properties with only a small increase in hardware and time overhead.

Keywords: ALGORITHM-BASED FAULT TOLERANCE CHECKSUM CODE ERROR DETECTION ERROR CORRECTION TRANSIENT ERRORS
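
One plausible reading of a partitioned checksum encoding can be sketched for the product C = A B: instead of one checksum row over all of A, a separate checksum row is kept for each block of rows, so each check sums fewer floating-point terms and a mismatch points to one block. This is a minimal sketch under that assumption, not the scheme from the note. The helper names, the `block` parameter, and the relative tolerance `tol` are hypothetical.

```python
# Hypothetical sketch of per-block column checksums for C = A B.

def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def block_checksums(m, block):
    """One checksum row (column sums) per group of `block` consecutive rows."""
    return [
        [sum(col) for col in zip(*m[i:i + block])]
        for i in range(0, len(m), block)
    ]


def faulty_blocks(a, b, c, block, tol=1e-9):
    """Return indices of row blocks of C whose checksum does not match.

    For each block, (checksum row of A) x B must equal the checksum row of C,
    up to a relative tolerance that absorbs floating-point roundoff.
    """
    expected = matmul(block_checksums(a, block), b)
    actual = block_checksums(c, block)
    bad = []
    for idx, (e_row, a_row) in enumerate(zip(expected, actual)):
        if any(abs(e - v) > tol * max(1.0, abs(e)) for e, v in zip(e_row, a_row)):
            bad.append(idx)
    return bad
```

Smaller blocks mean more checksum rows (more overhead) but shorter sums per check, which is the trade-off the abstract describes as a small increase in overhead for improved numerical properties.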

Fault-tolerant QRD recursive least squares

IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 1996, vol. 143, no. 2, pp. 137-144

Authors: Connolly, MP; Fitzpatrick, P (National Microelectronics Research Centre, Cork, Ireland)

The authors present an algorithm-based fault tolerant scheme for recursive least squares, appropriate for applications in adaptive signal processing. The technique is closely focused on the Gentleman-Kung-McWhirter triangular systolic array architecture for QR decomposition. Assuming that the array is subject to transient faults, widely separated in time and each affecting a single processor, an algorithm is given that corrects the full triangular array with computational overhead equivalent, on average, to the interpolation of a single extra vector into the data stream. No output residuals are lost in the fault recovery. The analysis is extended to a fault-tolerant algorithm for linearly constrained QR decomposition.

Keywords: algorithm-based fault tolerance error correction QR decomposition adaptive filtering linearly constrained QRD

Compiler-assisted generation of error-detecting parallel programs

26th International Symposium on Fault-Tolerant Computing

Authors: RoyChowdhury, A; Banerjee, P (IBM CORP, THOMAS J WATSON RES CTR, YORKTOWN HTS, NY 10598)

ISBN: (print) 0818672617

We have developed an automated, compile time approach to generating error-detecting parallel programs. The compiler is used to identify statements implementing affine transformations within the program and automatically insert code for computing, manipulating, and comparing checksums in order to detect data errors at runtime. Statements which do not implement affine transformations are checked by duplication. Checksums are reused from one loop to the next if this is possible, rather than recomputing checksums for every statement. A global dataflow analysis is performed in order to determine points at which checksums need to be recomputed. We also use a novel method of specifying the data distributions of the check data using data distribution directives so that the computations on the original data and the corresponding check computations are performed on different processors. Results on the time overhead and error coverage of the error detecting parallel programs over the original programs are presented on an Intel Paragon distributed memory multicomputer.

Keywords: algorithm-based fault tolerance checksum encoding parallelizing compilers compiler assisted fault tolerance
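
The kind of check a compiler could insert around affine statements can be illustrated by hand. For y = A x + b, the sum of the outputs must equal (1^T A) x + sum(b). The second function sketches how a checksum vector might be carried across two consecutive affine statements without summing the intermediate data. Whether this matches the compiler's checksum reuse strategy is an assumption, and all names and the tolerance `tol` are hypothetical.

```python
# Hypothetical sketch: checksum tests for affine statements y = A x + b.

def affine(mat, vec, offset):
    """Evaluate the affine statement mat * vec + offset."""
    return [sum(m * v for m, v in zip(row, vec)) + o for row, o in zip(mat, offset)]


def column_sums(mat):
    """Checksum vector 1^T mat."""
    return [sum(col) for col in zip(*mat)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def close(actual, expected, tol=1e-9):
    """Relative comparison that tolerates floating-point roundoff."""
    return abs(actual - expected) <= tol * max(1.0, abs(expected))


def check_one(a, b, x, y):
    """Check y = a x + b via sum(y) == (1^T a) . x + sum(b)."""
    return close(sum(y), dot(column_sums(a), x) + sum(b))


def check_chain(a, b, c, d, x, z):
    """Check z = c (a x + b) + d using only x and precomputed checksum vectors."""
    w_c = column_sums(c)                        # 1^T c
    w_ca = [dot(w_c, col) for col in zip(*a)]   # 1^T c a
    return close(sum(z), dot(w_ca, x) + dot(w_c, b) + sum(d))
```

Statements that are not affine have no such linear invariant, which is consistent with the abstract's statement that they are checked by duplication instead.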

IMPROVED BOUNDS FOR ALGORITHM-BASED FAULT-TOLERANCE

IEEE TRANSACTIONS ON COMPUTERS, 1993, vol. 42, no. 5, pp. 630-635

Authors: ROSENKRANTZ, DJ; RAVI, SS (Dept. of Comput. Sci., State Univ. of New York, Albany, NY, USA)

We establish new lower and upper bounds for the combinatorial problem of constructing minimal test sets for error detection in multiprocessor systems. Our construction for detecting two errors produces minimal test sets, while that for three errors produces test sets whose size exceeds our lower bound by at most one. We also present a divide-and-conquer construction scheme for four or more errors.

Keywords: ALGORITHM-BASED FAULT TOLERANCE ERROR FAULT DETECTION LOWER BOUND ONLINE TEST UPPER BOUND

DESIGN OF ALGORITHM-BASED FAULT-TOLERANT MULTIPROCESSOR SYSTEMS FOR CONCURRENT ERROR-DETECTION AND FAULT-DIAGNOSIS

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1994, vol. 5, no. 10, pp. 1099-1106

Authors: VINNAKOTA, B; JHA, NK (PRINCETON UNIV, DEPT ELECT ENGN, PRINCETON, NJ 08544)

Algorithm-based fault tolerance (ABFT) is a low-overhead system-level concurrent error detection and fault location scheme for multiprocessor systems. In this short note, we present new methods for the design of ABFT systems. Our design procedure is applicable to a wide range of systems in which processors share data elements. A feature of our design approach is that the type of checks to be used in the final system can be controlled by the system designer. We also present some new bounds on the number of checks needed in ABFT system design.

Keywords: ALGORITHM-BASED FAULT TOLERANCE CONCURRENT ERROR DETECTION FAULT DETECTABILITY FAULT DIAGNOSABILITY SYSTEM-LEVEL FAULT TOLERANCE

ERROR DETECTION IN DIGITAL NEURAL NETWORKS - AN ALGORITHM-BASED APPROACH FOR INNER PRODUCT PROTECTION

5th SPIE Conference on Advanced Signal Processing - Algorithms, Architectures, and Implementations

Authors: BREVEGLIERI, L; PIURI, V (POLITECN MILAN, DEPT ELECTR & INFORMAT, I-20133 MILAN, ITALY)

ISBN: (print) 0819416207

Artificial neural networks are an interesting solution for several real-time applications in the area of signal and image processing, in particular since recent advances in VLSI integration technologies allow for efficient hardware realizations. The use of dedicated circuits implementing the neural networks in mission-critical applications requires a high level of protection against errors due to faults, to guarantee output credibility and system availability. In this paper, the problem of concurrent error detection in dedicated neural networks is discussed by adopting an algorithm-based approach to check the inner product, i.e., most of the computation performed in the neural network. The effectiveness and efficiency of this technique are shown and evaluated for the widely used classes of neural paradigms.

Keywords: DIGITAL NEURAL NETWORKS CONCURRENT ERROR DETECTION ALGORITHM-BASED FAULT TOLERANCE

ALGORITHM-BASED FAULT TOLERANCE IN COMPUTATION OF POWER FLOW

33RD MIDWEST SYMP ON CIRCUITS AND SYSTEMS

Authors: CHEN, YP; HAN, JY (Dept of Electr & Comput Eng, Illinois Inst of Technol, Chicago, IL, USA)

ISBN: (print) 0780300815

LU decomposition followed by forward/backward substitution is a very powerful technique for power flow studies. To ensure the reliability of the computation, algorithm-based fault tolerance (ABFT) is applied to LU decomposition in power flow studies. This technique is proposed not only to detect and correct errors caused by hardware failure but also to debug programs. Since ABFT often suffers from roundoff errors when applied to the floating-point number system, a new technique called significant-bit maintenance arithmetic (SBMA) is also suggested for handling numerical problems.

Keywords: ALGORITHM-BASED FAULT TOLERANCE POWER FLOW LU DECOMPOSITION FORWARD AND BACKWARD SUBSTITUTION INSITU METHOD PROGRAM DEBUGGING FLOATING-POINT ARITHMETIC ROUND-OFF ERROR SIGNIFICANT-BIT MAINTENANCE ARITHMETIC
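
To make the roundoff issue concrete, here is a minimal sketch of a row-sum check on an LU factorization: if A = L U, then the row sums of A must equal L times the row sums of U. It is a simple after-the-fact check with a relative tolerance `tol`, not the in-situ method or the SBMA technique named in the abstract. The Doolittle factorization without pivoting and all function names are assumptions chosen to keep the example short.

```python
# Hypothetical sketch: checksum test for A = L U (no pivoting, illustration only).

def lu_decompose(a):
    """Doolittle LU factorization without pivoting; returns (L, U)."""
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    u = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]  # assumes a nonzero pivot
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return l, u


def lu_checksum_ok(a, l, u, tol=1e-9):
    """Check A 1 == L (U 1), comparing row sums within a relative tolerance."""
    u_sums = [sum(row) for row in u]
    for a_row, l_row in zip(a, l):
        expected = sum(a_row)
        actual = sum(lij * s for lij, s in zip(l_row, u_sums))
        if abs(expected - actual) > tol * max(1.0, abs(expected)):
            return False
    return True
```

The tolerance is the weak point of any floating-point check like this one: set too tight, roundoff triggers false alarms, and set too loose, small errors slip through. That is the numerical problem the abstract says SBMA is intended to handle.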
