Search Results - Inner Mongolia University Library

26th International Conference on Architecture of Computing Systems (ARCS)

Authors: Lim, JongBeom; Chung, Kwang-Sik; Gil, Joon-Min; Suh, TaeWeon; Yu, HeonChang. Affiliations: Korea Univ, Dept Comp Sci Educ, Seoul, South Korea; Korea Natl Open Univ, Dept Comp Sci, Seoul, South Korea; Catholic Univ, Sch Comp & Informat Commun Engn, Daegu, South Korea

ISBN: (print) 9783642364242; 9783642364235

Determining termination in dynamic environments is hard because nodes join and leave. Previous studies on termination detection rely on structures such as spanning trees or computational trees. In this work, we present an unstructured termination detection algorithm that uses a gossip-based scheme to cope with scalability and fault-tolerance issues. This approach frees the algorithm from maintaining specific structures even when nodes join and leave at runtime. Such dynamic behavior is prevalent in cloud computing environments, yet existing approaches have paid it little attention. To measure the complexity of the proposed algorithm, we use a new metric, self-centered message complexity. Our evaluation over scalable settings shows that an unstructured approach has significant merit in solving scalability and fault-tolerance problems, with lower message complexity than existing algorithms.

Keywords: Termination detection; unstructured algorithm; Gossip; Cloud computing
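
The gossip-based scheme mentioned in the abstract above builds on a simple dissemination primitive. The following is a minimal sketch of push gossip only, not the paper's termination detection algorithm; the function name `push_gossip_rounds`, the node set, and the `max_rounds` cap are assumptions made for illustration. It shows how information can reach every node without a spanning tree or any other maintained structure.

```python
import random

def push_gossip_rounds(nodes, start, rng, max_rounds=1000):
    """Simulate push gossip: each informed node tells one random peer per round.

    Returns (informed, rounds): the set of informed nodes and the rounds used.
    Hypothetical helper for illustration; not taken from the paper.
    """
    informed = {start}
    rounds = 0
    while len(informed) < len(nodes) and rounds < max_rounds:
        # sorted() takes a snapshot, so nodes informed in this round
        # only start sending in the next round.
        for sender in sorted(informed):
            peer = rng.choice([n for n in nodes if n != sender])
            informed.add(peer)
        rounds += 1
    return informed, rounds

nodes = list(range(16))
informed, rounds = push_gossip_rounds(nodes, start=0, rng=random.Random(42))
print(f"{len(informed)} of {len(nodes)} nodes informed after {rounds} rounds")
```

Because the informed set can at most double per round, reaching 16 nodes takes at least 4 rounds. No node keeps per-neighbor state, so a node that leaves simply stops being picked as a peer.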

Unstructured deadlock detection technique with scalability and complexity-efficiency in clouds

INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2014, Vol. 27, No. 6, pp. 852-870

Authors: Lim, JongBeom; Suh, Taeweon; Yu, Heonchang. Affiliation: Korea Univ, Dept Comp Sci Educ, Seoul, South Korea

To detect deadlock in distributed systems, the initiator should construct an efficient explicit or implicit global wait-for graph. In this paper, we present an unstructured deadlock detection algorithm using a gossip protocol in cloud computing environments, where constituent nodes may join and leave at any time. Because of the inherent properties of a gossip protocol, we argue that our proposed deadlock detection algorithm is scalable, fault-tolerant, and efficient while retaining safety and liveness properties. The correctness proof of the algorithm is also provided. The message complexity of our proposed algorithm is O(n), where n is the number of nodes. Our performance evaluation with scalable settings shows that our approach has a significant advantage over previous deadlock detection algorithms in terms of solving scalability, fault-tolerance, and complexity-efficiency issues. Copyright (c) 2013 John Wiley & Sons, Ltd.

Keywords: deadlock detection; unstructured algorithm; gossip protocol; cloud computing
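
The abstract above refers to a global wait-for graph. As a minimal sketch, assuming the single-resource model in which a cycle in the wait-for graph indicates deadlock, a centralized check can be written as a depth-first search. This is not the paper's gossip-based algorithm; the function name `find_deadlock_cycle` and the process names are hypothetical.

```python
def find_deadlock_cycle(wait_for):
    """Return one cycle in the wait-for graph as a list of nodes, or None.

    wait_for maps each process to the processes it is blocked on.
    Hypothetical helper for illustration; not taken from the paper.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(p):
        color[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            state = color.get(q, WHITE)
            if state == GREY:
                # q is already on the current path, so the path closes a cycle.
                return path[path.index(q):] + [q]
            if state == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(wait_for):
        if color.get(p, WHITE) == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None

deadlocked = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}
healthy = {"P1": ["P2"], "P2": ["P3"], "P3": []}
print(find_deadlock_cycle(deadlocked))  # ['P1', 'P2', 'P3', 'P1']
print(find_deadlock_cycle(healthy))     # None
```

The sketch assumes the whole graph is available in one place. In a setting where nodes join and leave, no single node holds that picture, which is the difficulty the abstract describes.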

HPC formulations of optimization algorithms for tensor completion

PARALLEL COMPUTING, 2018, Vol. 74, pp. 99-117

Authors: Smith, Shaden; Park, Jongsoo; Karypis, George. Affiliations: Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455, USA; Facebook, Menlo Pk, CA, USA

Tensor completion is a powerful tool used to estimate or recover missing values in multi-way data. It has seen great success in domains such as product recommendation and healthcare. Tensor completion is most often accomplished via low-rank sparse tensor factorization, a computationally expensive non-convex optimization problem which has only recently been studied in the context of parallel computing. In this work, we study three optimization algorithms that have been successfully applied to tensor completion: alternating least squares (ALS), stochastic gradient descent (SGD), and coordinate descent (CCD++). We explore opportunities for parallelism on shared- and distributed-memory systems and address challenges such as memory- and operation-efficiency, load balance, cache locality, and communication. Among our advancements are a communication-efficient CCD++ algorithm, an ALS algorithm rich in level-3 BLAS routines, and an SGD algorithm which combines stratification with asynchronous communication. Furthermore, we show that introducing randomization during ALS and CCD++ can accelerate convergence. We evaluate our parallel formulations on a variety of real datasets on a modern supercomputer and demonstrate speedups through 16384 cores. These improvements reduce time-to-solution from hours to seconds on real-world datasets. We show that after our optimizations, ALS is advantageous on parallel systems of small-to-moderate scale, while both ALS and CCD++ provide the lowest time-to-solution on large-scale distributed systems. (C) 2017 Elsevier B.V. All rights reserved.

Keywords: Sparse tensor; Tensor completion; unstructured algorithm; Machine learning; Factorization; Recommender system
