Search Results - Inner Mongolia University Library

26th International Conference on Parallel and Distributed Computing (Euro-Par)

Authors: Jeong, Haewon; Yang, Yaoqing; Gupta, Vipul; Engelmann, Christian; Low, Tze Meng; Cadambe, Viveck; Ramchandran, Kannan; Grover, Pulkit (Carnegie Mellon Univ, Pittsburgh PA 15213 USA; Univ Calif Berkeley, Berkeley CA USA; Oak Ridge Natl Lab, Oak Ridge TN USA; Penn State Univ, State Coll PA USA)

ISBN: (print) 9783030576752; 9783030576745

In this paper, we propose a novel fault-tolerant parallel matrix multiplication algorithm called 3D Coded SUMMA that achieves higher failure tolerance than replication-based schemes for the same amount of redundancy. This work bridges the gap between recent developments in coded computing and fault tolerance in high-performance computing (HPC). The core idea of coded computing is the same as that of algorithm-based fault tolerance (ABFT): weaving redundancy into the computation using error-correcting codes. In particular, we show that MatDot codes, an innovative code construction for parallel matrix multiplication, can be integrated into three-dimensional SUMMA (Scalable Universal Matrix Multiplication Algorithm [30]) in a communication-avoiding manner. To tolerate any two node failures, the proposed 3D Coded SUMMA requires about 50% less redundancy than replication, while the overhead in execution time is only about 5-10%.
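The abstract describes MatDot codes only at a high level. As a hedged illustration of the coding idea alone (not of 3D Coded SUMMA or its communication-avoiding data layout), the pure-Python sketch below assumes the standard MatDot construction: A is split into m column blocks and B into m row blocks, each node x computes p_A(x) @ p_B(x) with p_A(x) = sum_i A_i x^i and p_B(x) = sum_i B_i x^(m-1-i), and A @ B is the coefficient of x^(m-1) in that degree-(2m-2) product, so any 2m - 1 surviving node results suffice to decode it. The evaluation points 1..n and all function names are illustrative assumptions; exact `Fraction` arithmetic sidesteps the numerical conditioning issues a real floating-point decoder would face.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain triple-loop product of list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(c, X):
    return [[c * a for a in row] for row in X]


def split_cols(A, m):
    """A -> [A_0, ..., A_{m-1}], equal-width column blocks (m must divide the width)."""
    w = len(A[0]) // m
    return [[row[i * w:(i + 1) * w] for row in A] for i in range(m)]


def split_rows(B, m):
    """B -> [B_0, ..., B_{m-1}], equal-height row blocks, so A @ B = sum_i A_i @ B_i."""
    h = len(B) // m
    return [B[i * h:(i + 1) * h] for i in range(m)]


def evaluate(blocks, x, reverse=False):
    """sum_i blocks[i] * x**i, or x**(m-1-i) when reverse=True."""
    m = len(blocks)
    out = scale(0, blocks[0])
    for i, blk in enumerate(blocks):
        out = add(out, scale(x ** ((m - 1 - i) if reverse else i), blk))
    return out


def basis_coeff(xs, k, degree):
    """Coefficient of x**degree in the k-th Lagrange basis polynomial on points xs."""
    poly, denom = [Fraction(1)], Fraction(1)
    for j, xj in enumerate(xs):
        if j == k:
            continue
        # Multiply poly by (x - xj): new[d] = old[d-1] - xj * old[d].
        poly = [(poly[d - 1] if d > 0 else 0) - xj * (poly[d] if d < len(poly) else 0)
                for d in range(len(poly) + 1)]
        denom *= xs[k] - xj
    return poly[degree] / denom


def matdot(A, B, m, n_nodes, failed):
    """Simulate MatDot: node x computes p_A(x) @ p_B(x) on coded inputs; the
    decoder recovers A @ B from any 2m - 1 surviving results."""
    A_blk, B_blk = split_cols(A, m), split_rows(B, m)
    points = range(1, n_nodes + 1)  # distinct evaluation points (an assumption)
    results = {x: matmul(evaluate(A_blk, x), evaluate(B_blk, x, reverse=True))
               for x in points if x not in failed}
    if len(results) < 2 * m - 1:
        raise RuntimeError("too many failures to decode")
    xs = sorted(results)[:2 * m - 1]
    # A @ B is the x**(m-1) coefficient of the interpolating polynomial.
    out = scale(0, results[xs[0]])
    for k, x in enumerate(xs):
        out = add(out, scale(basis_coeff(xs, k, m - 1), results[x]))
    return out


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[2, 0, 1, 1], [1, 3, 0, 2], [0, 1, 4, 1], [3, 2, 1, 0]]
# m = 2 needs 2m - 1 = 3 results, so 5 nodes tolerate any 2 failures.
C = matdot(A, B, m=2, n_nodes=5, failed={2, 4})
print(C == matmul(A, B))  # True
```

Each node in this sketch multiplies an n x n/m block by an n/m x n block, which is why adding a few coded nodes, rather than whole replicas, is enough to mask a fixed number of failures.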

Keywords: Parallel matrix multiplication; Fault-tolerant algorithms; Algorithm-based fault tolerance; Coded computing; Communication-efficient algorithms; Error detection and correction

Communication-Efficient Distributed Skyline Computation

ACM Conference on Information and Knowledge Management (CIKM)

Authors: Zhang, Haoyu; Zhang, Qin (Indiana Univ, Bloomington IN 47408 USA)

ISBN: (print) 9781450349185

In this paper we study skyline queries in the distributed computational model, where there are s remote sites and a central coordinator; each site holds a piece of the data, and the coordinator wants to compute the skyline of the union of the s datasets. The computation proceeds in rounds, and the goal is to minimize both the total communication cost and the number of rounds. We first give an algorithm with a small communication cost but a potentially large round cost; we show information-theoretically that its communication cost is optimal even if an infinite number of communication rounds is allowed. We next give algorithms with smooth communication-round tradeoffs. We also show a strong lower bound on the communication cost when only one round of communication is allowed. Finally, we demonstrate the superiority of our algorithms over existing ones through an extensive set of experiments on both synthetic and real-world datasets.
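The abstract does not spell out the algorithms, so the sketch below shows only a naive one-round baseline, not the authors' method: every site ships its local skyline and the coordinator filters the union. It illustrates the dominance relation (a smaller-is-better convention is assumed) and why this protocol is correct; its communication cost is the total size of the local skylines, which is the quantity the paper's algorithms aim to keep small. The function names and sample data are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q: p <= q in every coordinate and p < q in at least one
    (smaller-is-better convention, an assumption of this sketch)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Block-nested-loop skyline: keep every point that no other point dominates."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue
        # p survives so far; evict window points that p dominates.
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def one_round_baseline(sites):
    """Naive one-round protocol (hypothetical baseline): each site sends its local
    skyline and the coordinator takes the skyline of what it receives. Correct
    because every global skyline point is in its site's local skyline, and
    dominance is transitive, so no non-skyline point survives the final filter.
    Returns the global skyline and the number of points sent."""
    messages = [skyline(site) for site in sites]
    sent = sum(len(msg) for msg in messages)
    return skyline([p for msg in messages for p in msg]), sent


sites = [
    [(1, 5), (2, 2), (3, 3)],  # (3, 3) is dominated locally by (2, 2)
    [(4, 1), (2, 4)],
    [(5, 5), (1, 6)],
]
result, sent = one_round_baseline(sites)
print(sorted(result), sent)  # [(1, 5), (2, 2), (4, 1)] 6
```

Here 6 points cross the network to produce a 3-point answer; the gap between what is sent and what the answer contains is what communication-efficient protocols try to close.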

Keywords: skyline computation; communication-efficient algorithms; distributed computation

Minimizing Staleness and Communication Overhead in Distributed SGD for Collaborative Filtering

IEEE TRANSACTIONS ON COMPUTERS, 2023, Vol. 72, No. 10, pp. 2925-2937

Authors: Abubaker, Nabil; Caglayan, Orhun; Karsavuran, M. Ozan; Aykanat, Cevdet (Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkiye; Facebook London, London W1T 1FB, England; Lawrence Berkeley Natl Lab, Berkeley CA 94720 USA)

Distributed asynchronous stochastic gradient descent (ASGD) algorithms that approximate low-rank matrix factorizations for collaborative filtering perform one or more synchronizations per epoch, and staleness is reduced with more synchronizations. However, a high number of synchronizations hinders the scalability of the algorithm. We propose a parallel ASGD algorithm, ?-PASGD, for efficiently handling ? synchronizations per epoch in a scalable fashion. For a K-processor system, the proposed algorithm puts an upper limit of K on ?, such that performing ? = K synchronizations per epoch eliminates staleness completely. The rating data used in collaborative filtering are usually represented as sparse matrices. This sparsity makes it possible to reduce staleness and communication overhead combinatorially by intelligently distributing the data among processors. We analyze the staleness and the total communication volume incurred during an epoch of ?-PASGD. Following this analysis, we propose a hypergraph partitioning model that encapsulates reducing staleness and volume while minimizing the maximum number of synchronizations required for stale-free SGD. This encapsulation is achieved with a novel cutsize metric that is realized via a new recursive-bipartitioning-based algorithm. Experiments on up to 512 processors show the importance of the proposed partitioning method in improving staleness, volume, RMSE and parallel runtime.
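The abstract does not detail how ?-PASGD schedules its updates. To illustrate why K synchronizations per epoch on K processors can remove staleness, the sketch below swaps in the well-known DSGD-style stratified schedule (a technique from the literature, not the authors' algorithm): the rating matrix is cut into K x K blocks, and each of K sub-epochs processes K blocks that share no users and no items, so no processor reads a factor row another processor is updating. The modulo block assignment, learning rate, regularization and all function names are hypothetical; the paper instead distributes data via hypergraph partitioning. The K "processors" are simulated sequentially here.

```python
import math


def stratum_schedule(K):
    """K sub-epochs; in sub-epoch s, processor k owns block (k, (k + s) % K)."""
    return [[(k, (k + s) % K) for k in range(K)] for s in range(K)]


def is_conflict_free(stratum):
    """No two blocks in a stratum share a user (row) block or an item (column) block."""
    rows = [r for r, _ in stratum]
    cols = [c for _, c in stratum]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)


def sgd_step(p, q, rating, lr, reg):
    """One standard SGD update for the regularized squared error of rating ~ p . q."""
    err = rating - sum(a * b for a, b in zip(p, q))
    new_p = [a + lr * (err * b - reg * a) for a, b in zip(p, q)]
    new_q = [b + lr * (err * a - reg * b) for a, b in zip(p, q)]
    return new_p, new_q


def run_epoch(ratings, P, Q, K, lr=0.01, reg=0.05):
    """ratings maps (user, item) -> value; users and items go to blocks by
    index % K (a hypothetical partition). The barrier after each stratum is one
    synchronization, so an epoch uses exactly K of them."""
    processed = 0
    for stratum in stratum_schedule(K):
        assert is_conflict_free(stratum)
        for rb, cb in stratum:  # conceptually, one processor per block
            for (u, i), r in ratings.items():
                if u % K == rb and i % K == cb:
                    P[u], Q[i] = sgd_step(P[u], Q[i], r, lr, reg)
                    processed += 1
        # Synchronization point: exchange updated factor rows before the next stratum.
    return processed


def rmse(ratings, P, Q):
    sq = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
             for (u, i), r in ratings.items())
    return math.sqrt(sq / len(ratings))


ratings = {(0, 0): 5, (0, 3): 1, (1, 1): 4, (2, 2): 2, (3, 0): 4, (3, 2): 3, (1, 3): 5}
P = {u: [0.1, 0.1] for u in range(4)}
Q = {i: [0.1, 0.1] for i in range(4)}
before = rmse(ratings, P, Q)
for _ in range(20):
    assert run_epoch(ratings, P, Q, K=2) == len(ratings)  # every rating once per epoch
print(before > rmse(ratings, P, Q))  # True
```

Fewer synchronizations per epoch would merge strata, letting two processors touch the same user or item between barriers; that is the staleness the abstract trades against synchronization cost.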

Keywords: Recommender systems; collaborative filtering; matrix completion; distributed-memory parallel stochastic gradient descent; communication-efficient algorithms; MPI; hypergraph partitioning

High-performance direct algorithms for computing the sign function of triangular matrices

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2018, Vol. 25, No. 2, pp. 1-1

Authors: Stotland, Vadim; Schwartz, Oded; Toledo, Sivan (Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel; Hebrew Univ Jerusalem, Benin Sch Comp Sci & Engn, Jerusalem, Israel)

Algorithms and implementations for computing the sign function of a triangular matrix are fundamental building blocks for computing the sign of arbitrary square real or complex matrices. We present novel recursive and cache-efficient algorithms that are based on Higham's stabilized specialization of Parlett's substitution algorithm for computing the sign of a triangular matrix. We show that the new recursive algorithms are asymptotically optimal in terms of the number of cache misses that they generate. One algorithm that we present performs more arithmetic than the nonrecursive version, but this allows it to benefit from calling highly optimized matrix multiplication routines; the other performs the same number of operations as the nonrecursive version, using custom computational kernels instead. We present implementations of both, as well as a cache-efficient implementation of a block version of Parlett's algorithm. Our experiments demonstrate that the blocked and recursive versions are much faster than the previous algorithms and that the inertia of the matrix strongly influences their relative performance, as predicted by our analysis.
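Parlett's substitution computes f(T) entry by entry from the commutation relation f(T) T = T f(T), which divides by t_jj - t_ii and so breaks down when diagonal entries coincide. The sketch below illustrates one stabilizing idea for the sign function, written from the identities S @ S = I and S @ T = T @ S (the abstract does not give Higham's formulas, so treat the correspondence as an assumption): when s_ii and s_jj agree, use S @ S = I, whose divisor s_ii + s_jj = +-2 never vanishes; when they differ, t_jj - t_ii cannot be zero, so the Parlett relation is safe. It is a plain, unblocked, nonrecursive reference version, not the paper's cache-efficient algorithms, and it assumes a real upper triangular input with nonzero diagonal. Passing `Fraction` entries gives exact results.

```python
from fractions import Fraction


def sign_triangular(T):
    """Matrix sign of an upper triangular T with real, nonzero diagonal (so no
    eigenvalue lies on the imaginary axis). Columns are filled left to right and
    each column bottom-up, so every s_ik and s_kj used below is already known."""
    n = len(T)
    S = [[0] * n for _ in range(n)]
    for j in range(n):
        if T[j][j] == 0:
            raise ValueError("zero diagonal entry: sign is undefined")
        S[j][j] = 1 if T[j][j] > 0 else -1
        for i in range(j - 1, -1, -1):
            inner = range(i + 1, j)
            if S[i][i] == S[j][j]:
                # Same sign: (S @ S)_ij = 0 gives s_ij (s_ii + s_jj) = -sum_k s_ik s_kj.
                total = sum(S[i][k] * S[k][j] for k in inner)
                S[i][j] = -total / Fraction(S[i][i] + S[j][j])
            else:
                # Opposite signs: t_jj - t_ii != 0, so use (S @ T)_ij = (T @ S)_ij.
                num = T[i][j] * (S[j][j] - S[i][i]) + sum(
                    T[i][k] * S[k][j] - S[i][k] * T[k][j] for k in inner)
                S[i][j] = num / (T[j][j] - T[i][i])
    return S


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


T = [[Fraction(v) for v in row] for row in
     [[2, 1, 3, -1], [0, -1, 4, 2], [0, 0, 3, 1], [0, 0, 0, -2]]]
S = sign_triangular(T)
I = [[int(i == j) for j in range(4)] for i in range(4)]
print(matmul(S, S) == I, matmul(S, T) == matmul(T, S))  # True True
```

The sweep touches one entry at a time with poor data reuse; the blocked and recursive variants described in the abstract reorganize this same dependency order so that most of the work becomes matrix-multiplication-like and cache-friendly.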

Keywords: blocked matrix algorithms; cache-efficient algorithms; communication-efficient algorithms; matrix functions; partitioned matrix algorithms
