We investigate synchronization activities in applications executing on distributed-memory MIMD architectures. Three applications are used to quantify the performance impact of synchronization as the number of processo...
详细信息
ISBN:
(纸本)081864222X
We investigate synchronization activities in applications executing on distributed-memory MIMD architectures. Three applications are used to quantify the performance impact of synchronization as the number of processors is increased. We also investigate the performance improvement possible when synchronization is supported in hardware. The results show that significant performance improvement can be achieved. The hardware support should include barrier synchronization, operate-and-broadcast, and operations over subsets of processors.
For a database system executing on a multiprocessor system, performance can be improved by inter-query parallelism as well as by intra-query parallelism. Since database applications tend to be 1/0 bound, it is importa...
详细信息
Graph coloring problems are among the most fundamental problems in parallel and distributed computing, and have been studied extensively in both settings. In this context, designing efficient deterministic algorithms ...
详细信息
ISBN:
(纸本)9798350387117;9798350387124
Graph coloring problems are among the most fundamental problems in parallel and distributed computing, and have been studied extensively in both settings. In this context, designing efficient deterministic algorithms for these problems has been found particularly challenging. In this work we consider this challenge, and design a novel framework for derandomizing algorithms for coloring-type problems in the Massively parallel Computation (MPC) model with sublinear space. We give an application of this framework by showing that a recent (degree + 1)-list coloring algorithm by Halldorsson et al. (STOC'22) in the LOCAL model of distributed computation can be translated to the MPC model and efficiently derandomized. Our algorithm runs in O(log log log n) rounds, which matches the complexity of the state of the art algorithm for the (Delta + 1)-coloring problem.
This paper addresses the problem of analyzing the performance of parallel algorithms for the training procedure of a neural network based Fingerprint Image Comparison (FIC) system. The target architecture is assumed t...
详细信息
This paper addresses the problem of analyzing the performance of parallel algorithms for the training procedure of a neural network based Fingerprint Image Comparison (FIC) system. The target architecture is assumed to be a coarse-grain distributed memory parallel architecture. Two types of parallelism: node parallelism and training set parallelism (TSP) are investigated. These algorithms are implemented on a 32 node CM-5. Theoretical analysis and experimental results comparing the performance of these algorithms are presented.
This paper provides necessary and sufficient conditions for deadlock-free unicast and multicast routing with the path-based routing model in interconnection networks which use the wormhole switching technique. The the...
详细信息
This paper provides necessary and sufficient conditions for deadlock-free unicast and multicast routing with the path-based routing model in interconnection networks which use the wormhole switching technique. The theory is developed around three central concepts: channel waiting, False Resource Cycles, and valid destination sets. The first two concepts are suitable extensions to those developed for unicast routing by two authors of this paper;the third concept has been developed by Lin and Ni. The necessary and sufficient conditions can be applied in a straightforward manner to prove deadlock freedom and to find more adaptive routing algorithms for collective communication. The latter point is illustrated by developing two routing algorithms for multicast communication in 2-D mesh architectures. The first algorithm uses fewer resources (channels) than an algorithm proposed in the literature but achieves the same adaptivity. The second achieves full adaptivity for multicast routing.
We describe our implementation, with virtual processing, of several parallel graph algorithms on a 16,384-processor MasPar MP-1. We present extensive test data on our code.
We describe our implementation, with virtual processing, of several parallel graph algorithms on a 16,384-processor MasPar MP-1. We present extensive test data on our code.
B-trees are used for accessing large database files, stored in lexicographic order on the secondary storage devices. Algorithms for concurrent B-tree data structures achieve only limited speedup when implemented on a ...
详细信息
B-trees are used for accessing large database files, stored in lexicographic order on the secondary storage devices. Algorithms for concurrent B-tree data structures achieve only limited speedup when implemented on a parallel computer. To improve the performance, a variant of Blink-tree, called Bmad-tree, which allows insertion without node splits, with multiple access in its leaf nodes, and dilation in both the index and the leaf nodes is proposed.
Branch-and-bound algorithms are general methods applicable to various combinatorial optimization problems and parallelization is one of the most hopeful methods to improve these algorithms. parallel branch-and-bound a...
详细信息
ISBN:
(纸本)0818677937
Branch-and-bound algorithms are general methods applicable to various combinatorial optimization problems and parallelization is one of the most hopeful methods to improve these algorithms. parallel branch-and-bound algorithm implementations can be divided in two types based on whether a central or a distributed control scheme is used. Central control schemes have reduced scalability because of bottleneck problems frequently encountered. In order to solve problem cases that cannot be solved with sequential branch-and-bound algorithm, distributed control schemes are necessary. However compared to central control schemes, higher efficiency is not always achieved through the use of a distributed control scheme. In this paper a mixed control scheme is proposed, changing between the two different types of control schemes during execution. Irt addition, a dynamic load balancing strategy is applied in the distributed control scheme. Performance evaluation for three different cases is carried out: central, distributed, and mixed control scheme. Several TSP instances from the TSPLIB are experimentally solved, using up to 101 workstations. The results of these experiments show that the mixed control scheme is one of the most promising control schemes and furthermore, the hybrid selection rule which was introduced in our previous work has an advantage in parallel branch-and-bound algorithms.
PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We present algorithms for performing these operations on coarse-grained parallel machines. Our algorithms are r...
详细信息
ISBN:
(纸本)0818672552
PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We present algorithms for performing these operations on coarse-grained parallel machines. Our algorithms are relatively architecture independent and can be applied to arrays of arbitrary dimensions with arbitrary distribution along every dimension. Experimental results are presented on the CM-5.
Detecting similar pairs in large biological sequence collections is one of the most commonly performed tasks in computational biology. With the advent of high throughput sequencing technologies the problem regained si...
详细信息
ISBN:
(纸本)9781479941162
Detecting similar pairs in large biological sequence collections is one of the most commonly performed tasks in computational biology. With the advent of high throughput sequencing technologies the problem regained significance as data sets with millions of sequences became ubiquitous. This paper is an initial report on our parallel, distributed memory and sketching-based approach to constructing large-scale sequence similarity graphs. We develop load balancing techniques, derived from multi-way number partitioning and work stealing, to manage computational imbalance and ensure scalability on thousands of processors. Our experimental results show that the method is efficient, and can be used to analyze data sets with millions of DNA sequences in acceptable time limits.
暂无评论