
Refine Search Results

Document Type

  • Conference paper (1)

Holdings

  • Electronic documents (1)
  • Print holdings (0)

Date Distribution

Subject Classification

  • Engineering (1)
    • Computer Science and Technology... (1)

Topics

  • reliability (1)
  • large-scale dist... (1)
  • fault-tolerance (1)
  • checkpoint/resta... (1)
  • hpc (1)

Institutions

  • louisiana tech u... (1)
  • oak ridge natl l... (1)

Authors

  • naksinehaboon ni... (1)
  • nassar raja (1)
  • paun mihaela (1)
  • liu yudan (1)
  • leangsuksun chok... (1)
  • scott stephen l. (1)

Language

  • English (1)
Search query: "Subject term = large-scale distributed system events log analysis"
1 record found; showing 1-10
An optimal checkpoint/restart model for a large scale High Performance Computing system
10th Workshop on Advances in Parallel and Distributed Computational Models / 22nd IEEE International Parallel and Distributed Processing Symposium
Authors: Liu, Yudan; Nassar, Raja; Leangsuksun, Chokchai (Box); Naksinehaboon, Nichanion; Paun, Mihaela; Scott, Stephen L.
Affiliations: Louisiana Tech Univ, Coll Engn & Sci, Ruston, LA 71270 USA; Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
The increase in the physical size of High Performance Computing (HPC) platforms makes system reliability more challenging. In order to minimize the performance loss (rollback and checkpoint overheads) due to unexpected...
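The trade-off named in the abstract is between checkpoint overhead and rollback loss. Checkpointing often costs time up front, while checkpointing rarely means more work is lost after a failure. The sketch below illustrates this trade-off using Young's classical first-order approximation, T ≈ sqrt(2CM). It assumes a constant checkpoint cost C and an exponential failure distribution with mean time between failures M. This is NOT the model proposed in the paper, whose details are not given here. The function names and sample values are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M).

    Assumes a constant checkpoint cost and exponentially distributed failures
    (hypothetical illustration, not the cited paper's model).
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order expected fraction of time lost.

    C / T is the time spent writing checkpoints; T / (2M) is the average
    rework after a failure (half an interval lost per failure).
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


if __name__ == "__main__":
    c = 60.0          # hypothetical checkpoint cost: 60 s
    m = 24 * 3600.0   # hypothetical MTBF: 24 h
    t = young_interval(c, m)
    print(f"interval = {t:.1f} s, overhead = {overhead_fraction(t, c, m):.4f}")
```

Setting the derivative of C/T + T/(2M) to zero gives T = sqrt(2CM). At that point the checkpoint term and the rollback term are equal. Refined models, such as the one this paper proposes, relax these simplifying assumptions.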