Search results - Inner Mongolia University Library

BSF: A parallel computation model for scalability estimation of iterative numerical algorithms on cluster computing systems

Journal of Parallel and Distributed Computing, 2021, vol. 149, pp. 193-206

Author: Sokolinsky, Leonid B. (South Ural State Univ, Natl Res Univ, 76 Lenin Prospekt, Chelyabinsk 454080, Russia)

This paper examines a novel parallel computation model called bulk synchronous farm (BSF) that focuses on estimating the scalability of compute-intensive iterative algorithms aimed at cluster computing systems. The main advantage of the proposed model is that it makes it possible to estimate the scalability of a parallel algorithm before its implementation. Another important feature of the BSF model is the representation of problem data in the form of lists, which greatly simplifies the logic of building applications. In the BSF model, a computer is a set of processor nodes connected by a network and organized according to the master/slave paradigm. A cost metric of the BSF model is presented. This cost metric requires the algorithm to be represented in the form of operations on lists. This allows us to derive an equation that predicts the scalability boundary of a parallel program: the maximum number of processor nodes after which the speedup begins to decrease. The paper includes examples of applying the BSF model to designing and analyzing parallel numerical algorithms. The large-scale computational experiments conducted on a cluster computing system confirm the adequacy of the analytical estimations obtained using the BSF model. (C) 2020 Elsevier Inc. All rights reserved.

Keywords: parallel computation model; Cluster computing systems; Iterative numerical algorithms; BSF model; Bulk synchronous farm
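
The scalability boundary mentioned in the abstract can be illustrated with a toy cost model. The sketch below is not the BSF cost metric from the paper: it assumes, purely for illustration, that one iteration on K workers costs `t_work / K` for computation plus `t_comm * K` for the master exchanging messages with each worker in turn. All function names and parameters are hypothetical.

```python
def iteration_time(k, t_work, t_comm):
    """Time of one iteration on k workers under the assumed toy model.

    t_work: total computation time of one iteration on a single worker.
    t_comm: master-side communication time per worker (assumed constant).
    """
    return t_work / k + t_comm * k


def speedup(k, t_work, t_comm):
    """Speedup on k workers relative to a single worker."""
    return iteration_time(1, t_work, t_comm) / iteration_time(k, t_work, t_comm)


def scalability_boundary(t_work, t_comm, k_max):
    """Worker count in 1..k_max with the highest speedup.

    Beyond this count the communication term dominates and the
    speedup starts to decrease.
    """
    return max(range(1, k_max + 1), key=lambda k: speedup(k, t_work, t_comm))


if __name__ == "__main__":
    boundary = scalability_boundary(t_work=100.0, t_comm=1.0, k_max=64)
    print("speedup peaks at", boundary, "workers")
```

Under this assumed model the peak lies near the square root of `t_work / t_comm`; the paper derives its own equation from the list-based cost metric.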

HLog_nGP: A parallel computation model for GPU clusters

Concurrency and Computation-Practice & Experience, 2015, vol. 27, no. 17, pp. 4880-4896

Authors: Lu, Fengshun; Song, Junqiang; Pang, Yufei (China Aerodynam Res & Dev Ctr, Mianyang 621000, Sichuan, Peoples R China; Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China)

A parallel computation model is an abstraction of the performance characteristics of parallel computers, and should evolve with the development of computational infrastructure. Heterogeneous CPU/Graphics Processing Unit (GPU) systems have been and will be important platforms for scientific computing, which creates an urgent demand for new parallel computation models targeting this kind of supercomputer. In this research, we propose a parallel computation model called HLog(n)GP to abstract the computation and communication features of heterogeneous platforms like TH-1A. All the substantial parameters of HLog(n)GP are in vector form and deal with the new features in GPU clusters. A simplified version of the proposed model, HLog(3)GP, is mapped to a specific GPU cluster and verified with two typical benchmarks. Experimental results show that HLog(3)GP outperforms the other two evaluated models and models the new particularities of GPU clusters well. Copyright (c) 2015 John Wiley & Sons, Ltd.

Keywords: parallel computation model; GPU cluster; heterogeneous system; performance evaluation; LogGP; TH-1A; NPB

A Structured Light 3D Measurement System Based on Heterogeneous Parallel Computation Model

2015 15th IEEE ACM International Symposium on Cluster Cloud and Grid Computing (CCGrid 2015)

Authors: Liu, Xiaoyu; Sheng, Hao; Zhang, Yang; Xiong, Zhang (Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China; Beihang Univ, Shenzhen Res Inst, Shenzhen 518057, Peoples R China)

ISBN: (print) 9781479980062

We present a structured light measurement system that collects high-accuracy surface information of the measured object with good real-time performance. By combining the phase-shifting method with a matching method proposed in this paper, which significantly reduces the number of noisy points, we obtain a high-accuracy, noise-free point cloud in a complex industrial environment. The heterogeneous parallel computation model allows the parallelism of the algorithm to be exploited in depth. The OpenMP+CUDA hybrid computing model is then used in the system to achieve better real-time performance.

Keywords: structured light; phase matching; parallel computation model; real-time

BSF-skeleton: A template for parallelization of iterative numerical algorithms on cluster computing systems

MethodsX, 2021, vol. 8, article 101437

Author: Sokolinsky, Leonid B. (South Ural State Univ, 76 Lenin Prospekt, Chelyabinsk 454080, Russia)

This article describes a method for creating applications for cluster computing systems using the parallel BSF-skeleton, which is based on the original BSF (Bulk Synchronous Farm) model of parallel computations developed by the author earlier. This model uses the master/slave paradigm. The main advantage of the BSF model is that it makes it possible to estimate the scalability of a parallel algorithm before its implementation. Another important feature of the BSF model is the representation of problem data in the form of lists, which greatly simplifies the logic of building applications. The BSF-skeleton is designed for creating parallel programs in C++ using the MPI library. The scope of the BSF-skeleton is iterative numerical algorithms of high computational complexity. The BSF-skeleton has the following distinctive features. It completely encapsulates all aspects that are associated with parallelizing a program. It allows error-free compilation at all stages of application development. It supports the OpenMP programming model and workflows. (C) 2021 The Author(s). Published by Elsevier B.V.

Keywords: parallel computation model; C++; MPI; Master/slave framework; Higher-order function; Map/Reduce; Scalability boundary prediction
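
The combination of list-based data, Map/Reduce, and the master/slave paradigm named in the abstract and keywords can be sketched as follows. The BSF-skeleton itself is a C++/MPI template; this is only a sequential Python illustration of the general idea, not the skeleton's API. All function names are hypothetical, and the workers are simulated by ordinary function calls rather than MPI processes.

```python
from functools import reduce
from operator import add


def split(items, k):
    """Split a list into k contiguous sublists of near-equal length."""
    q, r = divmod(len(items), k)
    parts, start = [], 0
    for i in range(k):
        end = start + q + (1 if i < r else 0)
        parts.append(items[start:end])
        start = end
    return parts


def worker(sublist, map_fn, reduce_fn):
    """Apply map_fn to every element, then fold the results locally."""
    mapped = [map_fn(x) for x in sublist]
    return reduce(reduce_fn, mapped) if mapped else None


def master_step(items, k, map_fn, reduce_fn):
    """One iteration: scatter sublists, gather partial folds, fold them.

    Assumes items is non-empty and reduce_fn is associative.
    """
    partials = [worker(part, map_fn, reduce_fn) for part in split(items, k)]
    partials = [p for p in partials if p is not None]
    return reduce(reduce_fn, partials)


if __name__ == "__main__":
    # Sum of squares of 0..9 computed by three simulated workers.
    print(master_step(list(range(10)), 3, lambda x: x * x, add))
```

Because the reduction is assumed to be associative, the result does not depend on how many workers the list is split across, which is what lets the worker count vary freely in this kind of scheme.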

Analytical Estimation of the Scalability of Iterative Numerical Algorithms on Distributed Memory Multiprocessors

Lobachevskii Journal of Mathematics, 2018, vol. 39, no. 4, pp. 571-575

Author: Sokolinsky, L. B. (Natl Res Univ, South Ural State Univ, Lenin Prospekt 76, Chelyabinsk 454080, Russia)

This article presents a new high-level parallel computational model named BSF (Bulk Synchronous Farm). The BSF model extends the BSP model to deal with compute-intensive iterative numerical methods executed on distributed-memory multiprocessor systems. The BSF model is based on the master-worker paradigm and the SPMD programming model. The BSF model makes it possible to predict the upper scalability bound of a BSF-program with great accuracy. The BSF model also provides equations for estimating the speedup and parallel efficiency of a BSF-program.

Keywords: parallel computation model; bulk synchronous farm; BSF model; iterative algorithms; distributed memory; scalability bound

Massively Parallel Cellular Matrix Model for Self-organizing Map Applications

IEEE Conference on Electronics, Circuits, and Systems (ICECS)

Authors: Wang, Hongjian; Mansouri, Abdelkhalek; Creput, Jean-Charles (Univ Technol Belfort Montbeliard, IRTES SET, F-90010 Belfort, France)

ISBN: (print) 9781509002467

We propose the concept of a parallel cellular matrix, which partitions the Euclidean plane defined by the input data into an appropriate number of uniform cell units. Each cell is responsible for a certain part of the data and of the self-organizing map (SOM) network, and carries out massively parallel spiral searches based on the cellular matrix topology. The advantage of the proposed model is that it is decentralized and based on data decomposition. The required number of processing units and amount of memory grow linearly with the problem size. Based on the cellular matrix model, the parallel SOM is implemented to deal with various applications including the traveling salesman problem, structured mesh generation, and superpixel adaptive segmentation map. Experimental results of our GPU implementation show that the running time increases linearly with the input size, with a very small coefficient. The proposed cellular matrix model is suitable for dealing with large-scale problems in a massively parallel way.

Keywords: parallel computation model; self-organizing map; traveling salesman problem; mesh generation; superpixel; GPU implementation

Massively Parallel Cellular Matrix Model for Self-organizing Map Applications

IEEE International Conference on Electronics, Circuits, and Systems

Authors: Hongjian Wang; Abdelkhalek Mansouri; Jean-Charles Creput (IRTES-SET, Universite de Technologie de Belfort-Montbeliard)

ISBN: (print) 9781509002474

Keywords: parallel computation model; Self-organizing map; Traveling salesman problem; Mesh generation; Superpixel; GPU implementation

Performance analysis and optimization of MPI collective operations on multi-core clusters

Journal of Supercomputing, 2012, vol. 60, no. 1, pp. 141-162

Authors: Tu, Bibo; Fan, Jianping; Zhan, Jianfeng; Zhao, Xiaofang (Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China; Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518067, Peoples R China)

The memory hierarchy on multi-core clusters has two aspects: a vertical memory hierarchy and a horizontal memory hierarchy. This paper proposes a new parallel computation model that abstracts the memory hierarchy on multi-core clusters uniformly at both the vertical and horizontal levels. Experimental results show that the new model predicts communication costs for message passing on multi-core clusters more accurately than previous models, which incorporate only the vertical memory hierarchy. The new model provides the theoretical underpinning for the optimal design of MPI collective operations. Aimed at the horizontal memory hierarchy, our methodology for optimizing collective operations on multi-core clusters focuses on hierarchical virtual topology and cache-aware intra-node communication, incorporated into existing collective algorithms in MPICH2. As a case study, a multi-core aware broadcast algorithm has been implemented and evaluated. The results of the performance evaluation show that this methodology for optimizing collective operations on multi-core clusters is efficient.

Keywords: parallel computation model; Multi-core clusters; Memory hierarchy; MPI collective operations; Data tiling

The Transitive Closure and Related Algorithms of Digraph on the Reconfigurable Architecture

Parallel Processing Letters, 2011, vol. 21, no. 1, pp. 27-43

Authors: Pan, Tien-Tai; Lin, Shun-Shii (Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei 10610, Taiwan)

The reconfigurable architecture is a parallel computation model that consists of many processor elements (PEs) and a reconfigurable bus system. Many variants of reconfigurable architectures have been proposed, for example, reconfigurable mesh (R-Mesh), directional reconfigurable mesh (DR-Mesh), processor arrays with reconfigurable bus systems (PARBS), complete directional processor arrays with reconfigurable bus systems (CD-PARBS), reconfigurable multiple bus machine (RMBM), and directional reconfigurable multiple bus machine (directional RMBM). In this paper, a transitive closure (TC) algorithm for digraphs is proposed on the models without the directional capability (non-directional). Some related digraph problems, such as strongly connected digraph, strongly connected component (SCC), cyclic checking, and tree construction, can also be resolved by modifying our transitive closure algorithm. All the proposed algorithms are designed on a three-dimensional (3-D) n × n × n non-directional reconfigurable mesh, where n is the number of vertices in a digraph D, and can resolve the respective problems in O(log d(D)) time, where d(D) is the diameter of the digraph D. The cyclic checking problem can be further reduced to O(log c(D)) time, where c(D) is the minimum distance of cycles in the digraph D. There exist two different approaches: the matrix multiplication approach on the non-directional models for algebraic path problems (APP) and the s-t connectivity approach on the directional models. In this paper, we use the tree construction algorithm to prove that those two approaches are insufficient to resolve all digraph problems and demonstrate why our approach is important and innovative for digraph problems on the reconfigurable models.

Keywords: Reconfigurable architecture; parallel computation model; reconfigurable mesh; directional; transitive closure; strongly connected component; digraph
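
To make the transitive closure and cyclic checking problems from the abstract concrete, here is a minimal sequential sketch. It is not the paper's reconfigurable-mesh algorithm: it uses plain repeated Boolean matrix squaring, where each round doubles the path length covered, so the number of rounds grows with the logarithm of the longest shortest path. The function names are hypothetical.

```python
def transitive_closure(adj):
    """Transitive closure of a digraph by repeated Boolean squaring.

    adj[i][j] is truthy when there is an edge i -> j. Returns a pair
    (reach, rounds): reach[i][j] is True when j is reachable from i by a
    path of length >= 1, and rounds counts the squarings that added
    at least one new pair.
    """
    n = len(adj)
    reach = [[bool(x) for x in row] for row in adj]
    rounds = 0
    while True:
        # New matrix: keep known pairs and add pairs joined through some k.
        nxt = [
            [
                reach[i][j] or any(reach[i][k] and reach[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
        if nxt == reach:
            return reach, rounds
        reach, rounds = nxt, rounds + 1


def has_cycle(adj):
    """A digraph has a cycle exactly when some vertex reaches itself."""
    reach, _ = transitive_closure(adj)
    return any(reach[i][i] for i in range(len(adj)))


if __name__ == "__main__":
    # Path 0 -> 1 -> 2 -> 3: acyclic, and vertex 0 reaches vertex 3.
    path = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
    closure, rounds = transitive_closure(path)
    print(closure[0][3], has_cycle(path), rounds)
```

Each round here costs cubic sequential work; the point of a parallel model such as the reconfigurable mesh is to perform the work of a round in parallel, which this sketch does not attempt to show.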

LilyTask: A task-oriented parallel computation model

5th International Workshop on Advanced Parallel Processing Technologies

Authors: Wang, T; Li, XM (Peking Univ, Sch Elect Engn & Comp Sci, Comp Networks & Distributed Syst Lab, Beijing 100871, Peoples R China)

ISBN: (print) 3540200541

While the BSP model is a concise parallel computation model, it has some limitations in functionality and performance. To overcome those limitations, we propose a task-oriented parallel computation model, named LilyTask, while retaining the virtue of conciseness. In LilyTask, tasks are taken as scheduling units in parallel computation, which may directly reflect the decomposition process of the original problems. Furthermore, a task in LilyTask is allowed to generate subtasks of various granularities at runtime, which makes LilyTask very suitable for irregular problems. The LilyTask model may also be applied to a computational Grid to solve coarse-granularity parallel problems and loosely coupled problem sets.

Keywords: BSP; parallel computation model; task; irregular problem; computational Grid
