Most static scheduling algorithms that schedule parallel programs represented by directed acyclic graphs (DAGs) are sequential. this paper discusses the essential issues on parallelization of static scheduling algorit...
详细信息
Most static scheduling algorithms that schedule parallel programs represented by directed acyclic graphs (DAGs) are sequential. this paper discusses the essential issues on parallelization of static scheduling algorithms. An efficient parallel scheduling algorithm, the HPMCP algorithm, is proposed. It produces high-quality scheduling and is much faster than existing algorithms.
this paper proposes a linear, deterministic, logical time model for distributed systems. We give an account of causality within distributed systems which undergirds the time model. We discuss some advantages for the a...
详细信息
this paper proposes a linear, deterministic, logical time model for distributed systems. We give an account of causality within distributed systems which undergirds the time model. We discuss some advantages for the application programmer in using our time model.
We empirically analyze and compare two distributed, low-overhead policies for scheduling dynamic tree-structured computations on rings of identical PEs. Our experiments show that both policies give significant paralle...
详细信息
We empirically analyze and compare two distributed, low-overhead policies for scheduling dynamic tree-structured computations on rings of identical PEs. Our experiments show that both policies give significant parallel speedup on large classes of computations, and that one yields almost optimal speedup on moderate size rings. We believe that our methodology of experiment design and analysis will prove useful in other such studies.
this paper addresses the problem of analyzing the performance of parallel algorithms for the training procedure of a neural network based Fingerprint Image Comparison (FIC) system. the target architecture is assumed t...
详细信息
this paper addresses the problem of analyzing the performance of parallel algorithms for the training procedure of a neural network based Fingerprint Image Comparison (FIC) system. the target architecture is assumed to be a coarse-grain distributed memory parallel architecture. Two types of parallelism: node parallelism and training set parallelism (TSP) are investigated. these algorithms are implemented on a 32 node CM-5. theoretical analysis and experimental results comparing the performance of these algorithms are presented.
A sorting device capable of sorting p items in constant time is cabled a p-sorter. It is known that the task of sorting N items using a p-sorter requires at least Omega (N log N/p log p) applications of the p-sorter. ...
详细信息
ISBN:
(纸本)0818676833
A sorting device capable of sorting p items in constant time is cabled a p-sorter. It is known that the task of sorting N items using a p-sorter requires at least Omega (N log N/p log p) applications of the p-sorter. this bound is tight: there exist algorithms that use O (N log N/p log p) calls to the p-sorter to sort N items. However, there is no known implementable algorithm that can sort N items in O (N log N/p log p) time using a p-sorter. the main contribution of this paper is to propose a simple VLSI architecture and to show that in our architecture N items can be sorted in O (N log N/p log p) calls to the p-sorter, while enforcing conflict-free memory accesses. An important feature of our design is that the total additional VLSI area for hardware, other than the memory for data and the p-sorter, is kept to a minimum.
Index-shuffle graphs are introduced as candidate interconnection networks for parallel computers. the comparative advantages of index-shuffle graphs over the standard bounded-degree ''approximations'' ...
详细信息
ISBN:
(纸本)0818676833
Index-shuffle graphs are introduced as candidate interconnection networks for parallel computers. the comparative advantages of index-shuffle graphs over the standard bounded-degree ''approximations'' of the hypercube, namely butterfly-like and shuffle-like graphs, are demonstrated in the theoretical framework of graph embedding and network emulations. An N-node index-shuffle graph emulates: (1) an N-node shuffle-exchange graph with no slowdown, while the currently best emulations of shuffle-like graphs by hypercubes and butterflies incur a slowdown of Omega(log N). (2) its like-sized buttelfly graph with a slowdown O(log log log N), while the currently best emulations of butterfly-like graphs by shuffle-like graphs incur a slow-down of Omega (log log N). (3) an N-node hypercube that executes an on-line leveled algorithm with a slowdown O(log log N) and without data circulation, while the slowdown of currently best such emulations of the hypercube by its bounded-degree shuffle-like and butterfly-like derivatives remains Omega(log N), and only if the entire local data set of every processor is allowed to circulate through the network.
We consider the problem of designing efficient parallel algorithms for summing and prefix summing. In this paper, we present optimal algorithms for summing on a latency-dependent distributed-memory model and show that...
详细信息
We consider the problem of designing efficient parallel algorithms for summing and prefix summing. In this paper, we present optimal algorithms for summing on a latency-dependent distributed-memory model and show that any optimal summing algorithm must have an inherent structure. Moreover, we present optimal or near-optimal algorithms for prefix summing for both non-commutative and commutative binary operators. Furthermore, we show that the optimal algorithms for prefix summing for these two types of operators are not equivalent.
B-trees are used for accessing large database files, stored in lexicographic order on the secondary storage devices. Algorithms for concurrent B-tree data structures achieve only limited speedup when implemented on a ...
详细信息
B-trees are used for accessing large database files, stored in lexicographic order on the secondary storage devices. Algorithms for concurrent B-tree data structures achieve only limited speedup when implemented on a parallel computer. To improve the performance, a variant of Blink-tree, called Bmad-tree, which allows insertion without node splits, with multiple access in its leaf nodes, and dilation in boththe index and the leaf nodes is proposed.
To avoid signal interference in mobile communication it is necessary that the channels used by base stations for broadcast communication within their cells are chosen so that the same channel is never concurrently use...
详细信息
ISBN:
(纸本)0818676833
To avoid signal interference in mobile communication it is necessary that the channels used by base stations for broadcast communication within their cells are chosen so that the same channel is never concurrently used by two neighboring stations. We model this channel allocation problem as a generalized list coloring problem and we provide two distributed solutions which are also able to cope with crash failures by limiting the size of the network affected by a faulty station in terms of the distance from that station. Our first solution uses a powerful synchronization mechanism to achieve a response time that depends only on Delta, the maximum degree of the signal interference graph, and a failure locality of 4. Our second solution is a simple randomized solution in which each node can expect to pick f/(4 Delta) colors where f is the size of the list at the node;the response time of this solution is a constant and the failure locality 1. Besides being efficient (their complexity measures involve only small constants), the protocols presented in this work are simple and easy to apply in practice, provided the existence of distributed infrastructure in networks that are in use.
We give work-optimal and polylogarithmic time parallel algorithms for solving the normalized edit distance problem the normalized edit distance between two strings X and Y with lengths n greater than or equal to m is ...
详细信息
ISBN:
(纸本)0818676833
We give work-optimal and polylogarithmic time parallel algorithms for solving the normalized edit distance problem the normalized edit distance between two strings X and Y with lengths n greater than or equal to m is the minimum quotient of the sum of the costs of edit operations transforming X into Y by the length of the edit path corresponding to those edit operations. Marzal and Vidal proposed a sequential algorithm with a time complexity of O(nm(2)). Ne show that this algorithm can be parallelized work-optimally on an array of n (or m) processors, and on a mesh of n x m processors. We then propose a sublinear time algorithm that is almost work-optimal: using O(mn(1.75)) processors, the time complexity of the algorithm is O(n(0.75) log n) and the total number of operations is O(mn(2.5) log n). this algorithm runs on a CREW PRAM, but is likely to work an weaker PRAM models and hypercubes with minor modifications. Finally, we present a polylogarithmic O(log(2) n) time algorithm based an matrix multiplication which runs on a O(n(6)/log n) processor hypercube.
暂无评论