Refine search results

Document type

  • 51 journal articles
  • 28 conference papers

Holdings

  • 79 electronic items
  • 0 print holdings

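Subject classification

  • 78 Engineering
    • 71 Computer Science and Technology...
    • 57 Electrical Engineering
    • 6 Software Engineering
    • 3 Electronic Science and Technology (may...
    • 3 Information and Communication Engineering
    • 2 Cyberspace Security
    • 1 Control Science and Engineering
  • 6 Natural Science
    • 5 Mathematics
    • 1 Physics
  • 2 Management
    • 2 Management Science and Engineering (may...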

Topics

  • 79 algorithm-based ...
  • 14 concurrent error...
  • 8 fault tolerance
  • 8 matrix multiplic...
  • 7 error detection
  • 5 fault tolerant s...
  • 4 error correction
  • 4 sparse grid comb...
  • 4 checkpointing
  • 4 checksum encodin...
  • 3 fault diagnosis
  • 3 weighted sum par...
  • 3 simd
  • 3 silent errors
  • 3 silent data corr...
  • 3 avx-512
  • 3 high-performance...
  • 3 parallel computi...
  • 3 high performance...
  • 3 pde solvers

Institutions

  • 6 univ calif river...
  • 6 princeton univ d...
  • 6 univ calif davis...
  • 2 princeton univ d...
  • 2 univ calif river...
  • 2 chinese acad sci...
  • 2 australian natl ...
  • 2 oak ridge natl l...
  • 1 italian natl agc...
  • 1 penn state univ ...
  • 1 univ calif davis...
  • 1 univ quebec dept...
  • 1 national microel...
  • 1 sungkyunkwan uni...
  • 1 georgia inst tec...
  • 1 oak ridge natl l...
  • 1 univ lyon inria ...
  • 1 politecn milan d...
  • 1 carnegie mellon ...
  • 1 sandia natl labs...

Authors

  • 9 chen zizhong
  • 8 jha nk
  • 8 redinbo gr
  • 4 wu panruo
  • 4 zhai yujia
  • 4 chen jieyang
  • 4 banerjee p
  • 4 zhao kai
  • 3 nguyen c
  • 3 ouyang kaiming
  • 3 liang xin
  • 3 strazdins peter ...
  • 3 harding brendan
  • 3 li sihuan
  • 3 vinnakota b
  • 3 abraham ja
  • 2 grover pulkit
  • 2 liu jinyang
  • 2 mayo jackson r.
  • 2 tao dingwen

Language

  • 78 English
  • 1 Other
Search criteria: subject term = "algorithm-Based fault tolerance"
79 records; showing 31-40
Highly Scalable Algorithms for the Sparse Grid Combination Technique
29th IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Authors: Strazdins, Peter E.; Ali, Md Mohsin; Harding, Brendan. Affiliations: Australian Natl Univ Res Sch Comp Sci, Canberra ACT, Australia; Australian Natl Univ Inst Math Sci, Canberra ACT, Australia
Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for s...
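
The abstract above stops before describing the method. As a hedged illustration only, assume the classical two-dimensional combination technique: each component grid is identified by a level pair (i, j), and the combined solution is the sum of the component solutions with i + j = n minus the sum of those with i + j = n - 1. The helper names and the `min_level` parameter below are hypothetical, and nothing here reproduces the paper's scalable algorithms.

```python
def combination_coefficients(n, min_level=1):
    """Classical 2-D combination technique coefficients (illustrative sketch).

    Keys are component-grid level pairs (i, j), both levels >= min_level.
    Grids with i + j == n get +1; grids with i + j == n - 1 get -1.
    """
    if n < 2 * min_level:
        raise ValueError("n must be at least 2 * min_level")
    coeffs = {}
    for i in range(min_level, n - min_level + 1):
        coeffs[(i, n - i)] = 1
    for i in range(min_level, n - min_level):
        coeffs[(i, n - 1 - i)] = -1
    return coeffs


def combine(values, coeffs):
    """Weighted sum of per-grid values, e.g. each component solution
    already interpolated to the same point of the full grid."""
    return sum(c * values[grid] for grid, c in coeffs.items())


coeffs = combination_coefficients(4)
print(coeffs)
# {(1, 3): 1, (2, 2): 1, (3, 1): 1, (1, 2): -1, (2, 1): -1}
```

Because the coefficients always sum to 1, a quantity that every component grid resolves exactly passes through `combine` unchanged, which makes a quick sanity check on a coefficient set.
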
Combining backward and forward recovery to cope with silent errors in iterative solvers
29th IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Authors: Fasi, Massimiliano; Robert, Yves; Ucar, Bora. Affiliations: Ecole Normale Super Lyon, Lyon, France; CNRS, F-75700 Paris, France; INRIA Rocquencourt, France; Univ Bologna, I-40126 Bologna, Italy; Univ Knoxville, Knoxville TN, USA
Several recent papers have introduced a periodic verification mechanism to detect silent errors in iterative solvers. Chen [PPoPP'13, pp. 167-176] has shown how to combine such a verification mechanism (a stabili...
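
The abstract is truncated before it names the detector. One widely used ABFT-style check for the sparse matrix-vector product inside iterative solvers is a checksum test: precompute the column sums c = 1^T A once, then after each product y = Ax confirm that sum(y) equals c·x. The sketch below assumes a row-wise sparse format (a list of (column, value) pairs per row) and a hypothetical relative tolerance `rtol`; it illustrates that generic check and is not necessarily the mechanism used by Fasi, Robert and Ucar or by Chen.

```python
def column_checksum(rows, n_cols):
    """c[j] = sum_i A[i][j], computed once before the solver starts."""
    c = [0.0] * n_cols
    for row in rows:
        for j, a in row:
            c[j] += a
    return c


def spmv(rows, x):
    """y = A x for A stored as one list of (column, value) pairs per row."""
    return [sum(a * x[j] for j, a in row) for row in rows]


def spmv_is_consistent(y, x, c, rtol=1e-10):
    """sum(A x) equals (1^T A) x in exact arithmetic, so a mismatch beyond
    rounding flags a silent error in y (or in x, c, or A)."""
    expected = sum(cj * xj for cj, xj in zip(c, x))
    scale = 1.0 + sum(abs(cj * xj) for cj, xj in zip(c, x))
    return abs(sum(y) - expected) <= rtol * scale


# Small tridiagonal test matrix
rows = [[(0, 4.0), (1, -1.0)],
        [(0, -1.0), (1, 4.0), (2, -1.0)],
        [(1, -1.0), (2, 4.0)]]
x = [1.0, 2.0, 3.0]
c = column_checksum(rows, 3)
y = spmv(rows, x)
print(spmv_is_consistent(y, x, c))   # True
y[1] += 0.5                          # inject a silent error
print(spmv_is_consistent(y, x, c))   # False
```

A single checksum only detects the error; deciding whether to correct it in place or roll back to a checkpoint is where the forward and backward recovery named in the title come in.
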
Failure Mitigation in Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement
21st IEEE International On-Line Testing Symposium (IOLTS)
Authors: Anam, Mohammad Ashraful; Andreopoulos, Yiannis. Affiliation: UCL Dept Elect & Elect Engn, Roberts Bldg, Torrington Pl, London WC1E 7JE, England
A new roll-forward technique is proposed that recovers from any single fail-stop failure in M integer data streams (M >= 3) when undergoing linear, sesquilinear or bijective (LSB) operations, such as: scaling, addi...
Correcting DFT Codes with a Modified Berlekamp-Massey Algorithm and Kalman Recursive Syndrome Extension
IEEE TRANSACTIONS ON COMPUTERS, 2014, Vol. 63, No. 1, pp. 196-203
Author: Redinbo, G. Robert. Affiliation: Univ Calif Davis Dept Elect & Comp Engn, Davis CA 95616, USA
Real number block codes derived from the discrete Fourier transform (DFT) are corrected by coupling a very modified Berlekamp-Massey (BM) algorithm with a syndrome extension process. The modified BM algorithm determin...
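
The modified Berlekamp-Massey and Kalman syndrome-extension steps are beyond a short example, but the basic idea of a DFT code can be sketched for the simplest case of a single error. The sketch assumes a complex-valued code of length N whose DFT is forced to zero in the parity bins K..N-1 (the papers treat real-number codes), uses a naive O(N^2) DFT, and locates the error from the ratio of two consecutive syndromes instead of running Berlekamp-Massey. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def encode(data, N):
    """Put the data in bins 0..K-1 and zeros in the parity bins K..N-1."""
    return idft(list(data) + [0] * (N - len(data)))


def correct_single_error(r, K, tol=1e-9):
    """Return (corrected word, error position or None).

    The syndromes are the received word's DFT values in the parity bins.
    One error of value v at position p gives S_k = v * w**(k * p) with
    w = exp(-2j*pi/N), so S_{K+1} / S_K = w**p reveals p. Needs N - K >= 2;
    two or more errors are NOT handled by this shortcut.
    """
    N = len(r)
    S = dft(r)[K:]
    if max(abs(s) for s in S) < tol:
        return list(r), None
    phase = cmath.phase(S[1] / S[0])
    p = round(-phase * N / (2 * cmath.pi)) % N
    v = S[0] / cmath.exp(-2j * cmath.pi * K * p / N)
    fixed = list(r)
    fixed[p] -= v
    return fixed, p


x = encode([1.0, 2.0, -1.0, 0.5], 8)   # K = 4 data bins, 4 parity bins
r = list(x)
r[5] += 3.0                             # corrupt one entry
fixed, p = correct_single_error(r, 4)
print(p)                                # 5
```

With more parity bins the same syndromes can locate several errors, which is the job the Berlekamp-Massey step performs in the general case.
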
Algorithm-Based Fault Tolerance for Fail-Stop Failures
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2008, Vol. 19, No. 12, pp. 1628-1641
Authors: Chen, Zizhong; Dongarra, Jack. Affiliations: Colorado Sch Mines Dept Math & Comp Sci, Golden CO 80401, USA; Univ Tennessee Dept Elect Engn & Comp Sci, Knoxville TN 37996, USA
Fail-stop failures in distributed environments are often tolerated by checkpointing or message logging. In this paper, we show that fail-stop process failures in ScaLAPACK matrix-matrix multiplication kernel can be to...
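
The abstract breaks off before describing the scheme. The general idea behind checksum-based recovery from a fail-stop failure in matrix multiplication can be sketched under simplifying assumptions: C = AB is split into row blocks A_k held by separate (simulated) processes, and one extra process holds the checksum block sum_k A_k. Since (sum_k A_k)B = sum_k A_k B, a block of C lost with its process equals the checksum product minus the surviving blocks. This is a single-machine illustration with hypothetical helper names, not the ScaLAPACK-based scheme from the paper.

```python
import operator


def matmul(A, B):
    """Dense product of list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def combine_blocks(op, X, Y):
    """Apply op entrywise to two equally shaped blocks."""
    return [[op(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def checksum_block(blocks):
    """Entrywise sum of the row blocks; held by the extra checksum process."""
    total = blocks[0]
    for blk in blocks[1:]:
        total = combine_blocks(operator.add, total, blk)
    return total


def recover_lost_block(surviving_C, checksum_C):
    """(sum_k A_k) B == sum_k A_k B, so the lost block of C is the
    checksum product minus every surviving block of C."""
    lost = checksum_C
    for blk in surviving_C:
        lost = combine_blocks(operator.sub, lost, blk)
    return lost


A_blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 1], [1, 0]]]
B = [[1, 1], [0, 2]]
A_sum = checksum_block(A_blocks)             # encoded before the multiply
C_blocks = [matmul(Ak, B) for Ak in A_blocks]
C_sum = matmul(A_sum, B)                     # the checksum process's share
survivors = C_blocks[:1] + C_blocks[2:]      # process 1 fails, block lost
print(recover_lost_block(survivors, C_sum))  # [[5, 17], [7, 23]]
```

A distributed implementation also has to rebuild the failed process's inputs and keep the checksum consistent as the computation proceeds; the sketch shows only the algebra that makes recovery possible.
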
Supporting the Development of Resilient Message Passing Applications using Simulation
22nd Euromicro International Conference on Parallel, Distributed, and Network-based Processing (PDP)
Authors: Naughton, Thomas; Engelmann, Christian; Vallee, Geoffroy; Boehm, Swen. Affiliation: Oak Ridge Natl Lab Comp Sci & Math Div, Oak Ridge TN 37831, USA
An emerging aspect of high-performance computing (HPC) hardware/software co-design is investigating performance under failure. The work in this paper extends the Extreme-scale Simulator (xSim), which was designed for ...
Reducing Overheads for Fault-tolerant Datapaths with Dynamic Partial Reconfiguration
22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Authors: Davis, James J.; Cheung, Peter Y. K. Affiliation: Imperial Coll London Dept Elect & Elect Engn, London SW7 2AZ, England
As process scaling and transistor count inflation continue, silicon chips are becoming increasingly susceptible to faults. Although FPGAs are particularly vulnerable to these effects, their runtime reconfigurability o...
On-line soft error correction in matrix-matrix multiplication
JOURNAL OF COMPUTATIONAL SCIENCE, 2013, Vol. 4, No. 6, pp. 465-472
Authors: Wu, Panruo; Ding, Chong; Chen, Longxiang; Davies, Teresa; Karlsson, Christer; Chen, Zizhong. Affiliations: Colorado Sch Mines Dept Elect Engn & Comp Sci, Golden CO 80401, USA; Univ Calif Riverside Dept Comp Sci & Engn, Riverside CA 92521, USA
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the affected computation...
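
As a hedged illustration of how checksums can locate and correct a soft error in matrix-matrix multiplication, the sketch below uses the classic full-checksum idea of Huang and Abraham: the expected row sums of C = AB are A(Be) and the expected column sums are (e^T A)B, where e is the all-ones vector. A single corrupted entry C[i][j] produces a mismatch in exactly row i and column j, and the size of the mismatch is the error. Unlike the on-line scheme in this paper, the sketch checks only the finished product; the helper names and the absolute tolerance `tol` are assumptions.

```python
def matmul(A, B):
    """Dense product of list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def expected_checksums(A, B):
    """Row sums A(Be) and column sums (e^T A)B of C = AB, computed
    without forming C (e is the all-ones vector)."""
    Be = [sum(row) for row in B]
    row_chk = [sum(a * b for a, b in zip(row, Be)) for row in A]
    eA = [sum(col) for col in zip(*A)]
    col_chk = [sum(a * b for a, b in zip(eA, col)) for col in zip(*B)]
    return row_chk, col_chk


def locate_and_correct(C, row_chk, col_chk, tol=1e-9):
    """Fix a single corrupted entry of C in place; return its (i, j) or None."""
    row_err = [sum(r) - s for r, s in zip(C, row_chk)]
    col_err = [sum(c) - s for c, s in zip(zip(*C), col_chk)]
    bad_rows = [i for i, e in enumerate(row_err) if abs(e) > tol]
    bad_cols = [j for j, e in enumerate(col_err) if abs(e) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("pattern is not a single error; correction not attempted")
    i, j = bad_rows[0], bad_cols[0]
    C[i][j] -= row_err[i]
    return i, j


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
row_chk, col_chk = expected_checksums(A, B)
C = matmul(A, B)                                # [[19, 22], [43, 50]]
C[1][0] += 100                                  # simulated soft error
print(locate_and_correct(C, row_chk, col_chk))  # (1, 0)
print(C)                                        # [[19, 22], [43, 50]]
```

For floating-point data the tolerance would need to scale with the magnitudes of A and B, since rounding alone makes the checksums differ slightly.
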
A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism
26th IEEE International Parallel and Distributed Processing Symposium (IPDPS) / Workshop on High Performance Data Intensive Computing
Authors: Yao, Erlin; Wang, Rui; Chen, Mingyu; Tan, Guangming; Sun, Ninghui. Affiliation: Chinese Acad Sci Inst Comp Technol State Key Lab Comp Architecture, Beijing, Peoples R China
Fault tolerance overhead of high performance computing (HPC) applications is becoming critical to the efficient utilization of HPC systems at large scale. Today's HPC applications typically tolerate fail-stop fail...
Correcting DFT Codes with Modified Berlekamp-Massey Algorithm and Syndrome Extension
17th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC)
Author: Redinbo, Robert. Affiliation: Univ Calif Davis ECE Dept, Davis CA 95616, USA
Real number block codes derived from the discrete Fourier transform (DFT) are corrected by coupling a very modified Berlekamp-Massey algorithm with a syndrome extension process. Enhanced extension recursions based on ...