This paper examines a novel parallel computation model called bulk synchronous farm (BSF) that focuses on estimating the scalability of compute-intensive iterative algorithms aimed at cluster computing systems. The ma...
详细信息
This paper examines a novel parallel computation model called bulk synchronous farm (BSF) that focuses on estimating the scalability of compute-intensive iterative algorithms aimed at cluster computing systems. The main advantage of the proposed model is that it allows to estimate the scalability of a parallel algorithm before its implementation. Another important feature of the BSF model is the representation of problem data in the form of lists that greatly simplifies the logic of building applications. In the BSF model, a computer is a set of processor nodes connected by a network and organized according to the master/slave paradigm. A cost metric of the BSF model is presented. This cost metric requires the algorithm to be represented in the form of operations on lists. This allows us to derive an equation that predicts the scalability boundary of a parallel program: the maximum number of processor nodes after which the speedup begins to decrease. The paper includes examples of applying the BSF model to designing and analyzing parallel numerical algorithms. The large-scale computational experiments conducted on a cluster computing system confirm the adequacy of the analytical estimations obtained using the BSF model. (C) 2020 Elsevier Inc. All rights reserved.
parallel computation model is an abstraction for the performance characteristics of parallel computers, and should evolve with the development of computational infrastructure. The heterogeneous CPU/Graphics Processing...
详细信息
parallel computation model is an abstraction for the performance characteristics of parallel computers, and should evolve with the development of computational infrastructure. The heterogeneous CPU/Graphics Processing Unit (GPU) systems have been and will be important platforms for scientific computing, which introduces an urgent demand for new parallel computation models targeting this kind of supercomputers. In this research, we propose a parallel computation model called HLog(n)GP to abstract the computation and communication features of heterogeneous platforms like TH-1A. All the substantial parameters of HLog(n)GP are in vector form and deal with the new features in GPU clusters. A simplified version HLog(3)GP of the proposed model is mapped to a specific GPU cluster and verified with two typical benchmarks. Experimental results show that HLog(3)GP outperforms the other two evaluated models and can well model the new particularities of GPU clusters. Copyright (c) 2015 John Wiley & Sons, Ltd.
We present a structured light measurement system to collect high accuracy surface information of the measured object with a good real-time performance. Utilizing phase-shifting method in conjunction with a matching me...
详细信息
ISBN:
(纸本)9781479980062
We present a structured light measurement system to collect high accuracy surface information of the measured object with a good real-time performance. Utilizing phase-shifting method in conjunction with a matching method proposed in this paper which can significantly reduce the noisy points, we can achieve high accuracy and noiseless point cloud in a complex industrial environment. Due to the use of the heterogeneous parallel computation model, the parallelism of the algorithm is developed in a deep way. The OpenMP+CUDA hybrid computing model is then used in the system to get a better real-time performance.
This article describes a method for creating applications for cluster computing systems using the parallel BSF-skeleton based on the original BSF (Bulk Synchronous Farm) model of parallelcomputations developed by the...
详细信息
This article describes a method for creating applications for cluster computing systems using the parallel BSF-skeleton based on the original BSF (Bulk Synchronous Farm) model of parallelcomputations developed by the author earlier. This model uses the master/slave paradigm. The main advantage of the BSF model is that it allows to estimate the scalability of a parallel algorithm before its implementation. Another important feature of the BSF model is the representation of problem data in the form of lists that greatly simplifies the logic of building applications. The BSF-skeleton is designed for creating parallel programs in C++ using the MPI library. The scope of the BSF-skeleton is iterative numerical algorithms of high computational complexity. The BSF-skeleton has the following distinctive features. The BSF-skeleton completely encapsulates all aspects that are associated with parallelizing a program. The BSF-skeleton allows error-free compilation at all stages of application development. The BSF-skeleton supports OpenMP programming model and workflows. (C) 2021 The Author(s). Published by Elsevier B.V.
This article presents a new high-level parallelcomputational model named BSF "- Bulk Synchronous Farm. The BSF model extends the BSP model to deal with the compute intensive iterative numerical methods executed ...
详细信息
This article presents a new high-level parallelcomputational model named BSF "- Bulk Synchronous Farm. The BSF model extends the BSP model to deal with the compute intensive iterative numerical methods executed on distributed-memory multiprocessor systems. The BSF model is based on the master-worker paradigm and the SPMD programming model. The BSF model makes it possible to predict the upper scalability bound of a BSF-program with great accuracy. The BSF model also provides equations for estimating the speedup and parallel efficiency of a BSF-program.
We propose the concept of parallel cellular matrix which partitions the Euclidean plane defined by input data into an appropriate number of uniform cell units. Each cell is responsible of a certain part of the data an...
详细信息
ISBN:
(纸本)9781509002467
We propose the concept of parallel cellular matrix which partitions the Euclidean plane defined by input data into an appropriate number of uniform cell units. Each cell is responsible of a certain part of the data and the network of the self-organizing map (SOM), and carries out massive parallel spiral searches based on the cellular matrix topology. The advantage of the proposed model is that it is decentralized and based on data decomposition. The required processing units and memory are with linearly increasing relationship to the problem size. Based on the cellular matrix model, the parallel SOM is implemented to deal with various applications including the traveling salesman problem, structured mesh generation, and superpixel adaptive segmentation map. Experimental results of our GPU implementation show that the running time increases in a linear way with a very weak increasing coefficient according to the input size. The proposed cellular matrix model is suitable to deal with large scale problems in a massively parallel way.
We propose the concept of parallel cellular matrix which partitions the Euclidean plane defined by input data into an appropriate number of uniform cell units. Each cell is responsible of a certain part of the data an...
详细信息
ISBN:
(纸本)9781509002474
We propose the concept of parallel cellular matrix which partitions the Euclidean plane defined by input data into an appropriate number of uniform cell units. Each cell is responsible of a certain part of the data and the network of the self-organizing map (SOM), and carries out massive parallel spiral searches based on the cellular matrix topology. The advantage of the proposed model is that it is decentralized and based on data decomposition. The required processing units and memory are with linearly increasing relationship to the problem size. Based on the cellular matrix model, the parallel SOM is implemented to deal with various applications including the traveling salesman problem, structured mesh generation, and superpixel adaptive segmentation map. Experimental results of our GPU implementation show that the running time increases in a linear way with a very weak increasing coefficient according to the input size. The proposed cellular matrix model is suitable to deal with large scale problems in a massively parallel way.
Memory hierarchy on multi-core clusters has twofold characteristics: vertical memory hierarchy and horizontal memory hierarchy. This paper proposes new parallel computation model to unitedly abstract memory hierarchy ...
详细信息
Memory hierarchy on multi-core clusters has twofold characteristics: vertical memory hierarchy and horizontal memory hierarchy. This paper proposes new parallel computation model to unitedly abstract memory hierarchy on multi-core clusters in vertical and horizontal levels. Experimental results show that new model can predict communication costs for message passing on multi-core clusters more accurately than previous models, only incorporated vertical memory hierarchy. The new model provides the theoretical underpinning for the optimal design of MPI collective operations. Aimed at horizontal memory hierarchy, our methodology for optimizing collective operations on multi-core clusters focuses on hierarchical virtual topology and cache-aware intra-node communication, incorporated into existing collective algorithms in MPICH2. As a case study, multi-core aware broadcast algorithm has been implemented and evaluated. The results of performance evaluation show that the above methodology for optimizing collective operations on multi-core clusters is efficient.
The reconfigurable architecture is a parallel computation model that consists of many processor elements (PEs) and a reconfigurable bus system. There are many variant proposed reconfigurable architectures, for example...
详细信息
The reconfigurable architecture is a parallel computation model that consists of many processor elements (PEs) and a reconfigurable bus system. There are many variant proposed reconfigurable architectures, for example, reconfigurable mesh (R-Mesh), directional reconfigurable mesh (DR-Mesh), processor arrays with reconfigurable bus systems (PARBS), complete directional processor arrays with reconfigurable bus systems (CD-PARBS), reconfigurable multiple bus machine (RMBM), directional reconfigurable multiple bus machine (directional RMBM), and etc. In this paper, a transitive closure (TC) algorithm of digraph is proposed on the models without the directional capability (non-directional). Some related digraph problems, such as strongly connected digraph, strongly connected component (SCC), cyclic checking, and tree construction, can also be resolved by modifying our transitive closure algorithm. All the proposed algorithms are designed on a three-dimensional (3-D) nxnxn non-directional reconfigurable mesh, n is the number of vertices in a digraph D, and can resolve the respective problems in O(log d(D)) time, d(D) is the diameter of the digraph D. The cyclic checking problem can be further reduced to O(log c(D)) time, c(D) is the minimum distance of cycles in the digraph D. There exist two different approaches: the matrix multiplication approach on the non-directional models for algebraic path problems (APP) and s-t connectivity approach on the directional models. In this paper, we will use the tree construction algorithm to prove those two approaches are insufficient to resolve all digraph problems and demonstrate why our approach is so important and innovative for digraph problems on the reconfigurable models.
While BSP model is a concise parallel computation model, it has some limitations in functionality and performance. To overcome those limitations, we propose a task-oriented parallel computation model, named LilyTask, ...
详细信息
ISBN:
(纸本)3540200541
While BSP model is a concise parallel computation model, it has some limitations in functionality and performance. To overcome those limitations, we propose a task-oriented parallel computation model, named LilyTask, while remaining the virtue of conciseness. In LilyTask, tasks are taken as scheduling units in parallelcomputation, which may directly reflect the decomposition process of original problems. Further more, a task in LilyTask is allowed to generate subtasks of various granularities at runtime, which makes LilyTask very suitable for irregular problems. LilyTask model may also be applied to computational Grid to solve coarse-granularity parallel problems and loosely-coupled problems sets.
暂无评论