An interconnection network can be abstracted into a graph, and the basic mathematical research in the graph can provide a good reference for the research in the practical application. The study of node-independent spa...
详细信息
ISBN:
(纸本)9781665496391
An interconnection network can be abstracted into a graph, and the basic mathematical research in the graph can provide a good reference for the research in the practical application. The study of node-independent spanning trees (node-ISTs) in a graph has received extensive attention because of their application in reliable communication, fault-tolerant broadcasting and secure message distribution, and has achieved remarkable results on many special networks. But there are few results in the line graph of them. As one of the typical variations of hypercube, locally twisted cube has many excellent properties, whose line graph has all the advantages of locally twisted cube. So it makes sense to do some research on the line graph of locally twisted cube. In this paper, we propose a parallel algorithm to construct 2n -2 node-ISTs rooted at node [u,N(u, 2)], where u is an arbitrary node on locally twisted cube and n >= 1. And the correctness of our algorithm is proved.
The Thick Control Flow (TCF) model simplifies parallelprogramming by bundling computations with the same control flow into single flows of variable thickness, and has the prospect of alleviating redundant usage of so...
详细信息
The Thick Control Flow (TCF) model simplifies parallelprogramming by bundling computations with the same control flow into single flows of variable thickness, and has the prospect of alleviating redundant usage of software and hardware resources. While architectures that can support the TCF model have been proposed, current proposals cannot support concurrent memory accesses that can both simplify programming and speed up many parallel algorithms by a logarithmic factor. In this paper, we extend current TCF architectures to efficiently support concurrent read as well as write memory accesses. The solution is based on bounded size step-caches, and exploit the two-part, hybrid, frontend-backend structure of current TCF processors, and synchronization properties of the TCF model itself. According to our simulation-based evaluation, a concurrent memory access TCF processor with B backends can execute algorithms with substantial concurrent memory accesses up to B times faster than a baseline TCF processor not supporting concurrent memory access. The hardware overhead of the solution is estimated to be modest. We include parallel program code to illustrate the gains by supporting concurrent memory accesses. (C) 2018 Elsevier B.V. All rights reserved.
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carr...
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center. The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors or the Institute of Electrical and Electronics Engineers, Inc.
Many traditional clustering algorithms have the scalability problem while dealing with large data sets. One common strategy to handle the problem is to parallelize the algorithms and execute them along with the input ...
详细信息
ISBN:
(纸本)9780769548982;9781467345668
Many traditional clustering algorithms have the scalability problem while dealing with large data sets. One common strategy to handle the problem is to parallelize the algorithms and execute them along with the input data on high-performance computers. However, many graph-based clustering algorithms are hard to be parallelized since they need to calculate the similarity of all-pairs of all data nodes. In this paper, we propose a new parallel clustering algorithm, called the Para-CPLM (parallel Clustering based on Partitions of Local Minimal-spanning-trees), which is based on three strategies - graph-based clustering, granular computing, and partition-and-merge. The Para-CPLM partitions the data domain into several regions for parallel execution, and then establishes a local minimal spanning tree in each region. After being established, the Para-CPLM combines those local minimal spanning trees and applies a method, namely the GBC method, to determine the best number of clusters. After the first phase of clustering, it repeatedly finds better pairs (edges) of the inter-clusters to reform the merged tree structure, such that the tree becomes closer to a global minimal spanning tree. Consequently, it uses the GBC method again to find the best number of clusters. From our experimental results, the Para-CPLM achieves significantly shorter execution time and better scalability while compared with the sequential GBC method. In addition, the clustering results are almost identical to those produced by the sequential GBC method.
In this paper we consider some of the central issues involved in the design of parallel algorithms. We describe several efficient algorithms for idealised shared memory architectures and draw some conclusions as to wh...
详细信息
In this paper we consider some of the central issues involved in the design of parallel algorithms. We describe several efficient algorithms for idealised shared memory architectures and draw some conclusions as to what would be required to implement them on a realistic physical architecture, i.e. one with distributed memory. We also describe some systolic algorithms for matrix computations, sequence comparison and molecular modelling, and briefly discuss their implementation on arrays of transputers. In the final section we discuss the question of whether the current preoccupation with architectural details in parallel algorithm design is likely to persist. We briefly describe some techniques which show that a physically realistic general purpose parallel architecture based on distributed memory can be constructed which will execute any shared memory parallel algorithm with no significant overhead due to communication. We thus have the attractive prospect in the very near future of architectural independence in parallel algorithm design.
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carr...
Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center. The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors or the Institute of Electrical and Electronics Engineers, Inc.
Conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover...
ISBN:
(纸本)0769509363
Conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title-page or half title-pages, blank pages, venue maps or other general information relating to the conference that was part of the original conference proceedings.
In this paper, we describe the design and implementation of a portable run-time system for GOP, a graph-oriented programming framework aiming at providing high-level abstractions for configuring and programming cooper...
ISBN:
(纸本)0769509363
In this paper, we describe the design and implementation of a portable run-time system for GOP, a graph-oriented programming framework aiming at providing high-level abstractions for configuring and programming cooperative parallel processes. The runtime system provides an interface with a library of programming primitives to the low-level facilities required to support graph-oriented communications and synchronization. The implementation is on top of the parallel Virtual Machine (PVM) in a local area network of Sun workstations. Issues related to the implementation of a graph operations in a distributed environment are discussed. Performance of the runtime system is evaluated by estimating the overheads associated with using GOP primitives as opposed to PVM.
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallelprogramming on may-core architectures. We show that the NB-FEB primitive is universal, s...
详细信息
ISBN:
(纸本)9781605583976
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallelprogramming on may-core architectures. We show that the NB-FEB primitive is universal, scalable and feasible. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility).
The split and merge model is a reasonable method for architecture-independent programming of global image processing operations on parallelarchitectures. We consider image connected components from the point of view ...
详细信息
暂无评论