The proceedings contain 38 papers. The topics discussed include: efficient compilation of high-level data parallelalgorithms;improved abstract parity-declustered layouts for disk arrays;experiences with parallel n-bo...
ISBN:
(纸本)0897916719
The proceedings contain 38 papers. The topics discussed include: efficient compilation of high-level data parallelalgorithms;improved abstract parity-declustered layouts for disk arrays;experiences with parallel n-body simulation;a comparison of parallelalgorithms for connected components;studying overheads in massively parallel min/max-tree evaluation;scheduling trees using FIFO queues: a control-memory tradeoff;SIMD instruction cache;dynamic parallel tree contraction;improved bounds for routing and sorting on multi-dimensional meshes;parallel sorting by overpartitioning;an optimal randomized logarithmic time connectivity algorithm for the EREW PRAM;list ranking and list scan on the CRAY C-90;modeling communication in parallelalgorithms: a fruitful interaction between theory and systems?;on testing cache-coherent shared memories;diffracting trees;programming abstract DEC-alpha based multiprocessors the easy way;scheduling parallelizable tasks to minimize average response time;and an analysis of diffusive load-balancing.
A new approach to parallel sorting called parallel Sorting by OverPartitioning (PSOP) is presented. The approach limits the communication cost by moving each element between processors at most once, and leads to good ...
详细信息
We consider the problem of finding the row maxima of a totally monotone n×n matrix. Many geometric and combinatorial problems can be solved efficiently by reductions to this problem. It is known that the row maxi...
详细信息
ISBN:
(纸本)0898713293
We consider the problem of finding the row maxima of a totally monotone n×n matrix. Many geometric and combinatorial problems can be solved efficiently by reductions to this problem. It is known that the row maxima of a totally monotone n×n matrix can be computed sequentially in O(n) time. We give randomized parallelalgorithms for this problem which run in O(log n) time and O(log log n) time with high probability on the EREW and CRCW PRAMs respectively, and have a time-processor product of O(n). The run-times are the best possible, even for randomized parallelalgorithms. Previous polylogarithmic time parallel solutions to this problem had a time-processor product of Θ(n log n/log log n) or worse. Our result directly improves the parallel complexity of several problems.
A malleable parallel task is one that can be executed on any number of processors, with its execution time being a function of the number of processors allotted to it. A nonmalleable parallel task is one that requires...
详细信息
ISBN:
(纸本)0898713293
A malleable parallel task is one that can be executed on any number of processors, with its execution time being a function of the number of processors allotted to it. A nonmalleable parallel task is one that requires a specific number of processors. Given n independent parallel tasks and m identical processors, we consider the problem of scheduling these tasks to minimize makespan. We show that any algorithm for scheduling nonmalleable tasks can be extended to an algorithm for scheduling malleable tasks. The approximation factor of the new algorithm is identical to that of the original algorithm. Meanwhile, only an O(mn) term is added to the running time. Thus we get the best known approximation factor of a polynomial-time algorithm for scheduling malleable parallel tasks, and the fastest running time in which that factor can be achieved. These results can be extended to specific parallelarchitectures, where not only the number of processors allotted to a task, but also their configuration is a factor. Furthermore, we provide an algorithm for scheduling malleable tasks on a linear array of processors. In this case the addresses of the processors assigned to each task must be contiguous. Our algorithm has an approximation factor of 2, under a natural assumption on the task execution times. In contrast, the best known approximation factor for nonmalleable tasks in this case is 2.5. To the best of our knowledge, this is the only case where the best known approximation factor for malleable tasks is strictly less than that for nonmalleable tasks.
parallelalgorithms for various versions of the stable matching problem are presented. The algorithms are based on the primal-dual interior path-following method for linear programming. The main result is that a stabl...
详细信息
ISBN:
(纸本)0898713293
parallelalgorithms for various versions of the stable matching problem are presented. The algorithms are based on the primal-dual interior path-following method for linear programming. The main result is that a stable matching can be found in O*(√m) time by a polynomial number of processors, where m is the total length of preference lists of individuals.
We devise optimum parallelalgorithms for solving a banded linear system of equations and for computing the determinant of a banded matrix, substantially improving the previous record computational complexity estimate...
详细信息
ISBN:
(纸本)0898713293
We devise optimum parallelalgorithms for solving a banded linear system of equations and for computing the determinant of a banded matrix, substantially improving the previous record computational complexity estimates of [E]. Our algorithms are in NC or RNC and support the optimum bounds on the potential work (the product of time and processor bounds). Moreover, these algorithms are in NC1 or RNC1 if the bandwidth is a constant.
We adapt the Sharesort algorithm of Cypher and Plaxton to run on various parallel models of multi-level storage, and analyze its resulting performance. Sharesort was originally defined in the context of sorting n reco...
详细信息
ISBN:
(纸本)0898713293
We adapt the Sharesort algorithm of Cypher and Plaxton to run on various parallel models of multi-level storage, and analyze its resulting performance. Sharesort was originally defined in the context of sorting n records on an n-processor hypercubic network. In that context, it is not known whether Sharesort is asymptotically optimal. Nonetheless, we find that Sharesort achieves optimal time bounds for parallel sorting in multi-level storage, under a variety of models that have been defined in the literature.
In this paper we present the Segment router, a novel router design for the interconnection networks of massively parallel computers. The design decisions of the Segment router are motivated by the need to improve the ...
详细信息
ISBN:
(纸本)9780897916714
In this paper we present the Segment router, a novel router design for the interconnection networks of massively parallel computers. The design decisions of the Segment router are motivated by the need to improve the network performance when the traffic consists of messages with widely different *** key novelty of the Segment router is that it provides different queueing policies for the two classes of messages it services. Short messages are stored in a centralized, dynamically allocated queue that guarantees storage availability for the whole packet of the short message. Long messages on the other hand are stored in small FIFO buffers associated with the input channels of the router. Thus when a long message becomes blocked, it is stored in segments in the FIFO buffers of a number of routing elements in the network. Furthermore, the physical channels of the Segment router are fairly multiplexed between the two supported classes of messages without the potential for *** simulations we compare the performance of our technique to the performance of multipacket messages and networks implementing lanes. The results clearly demonstrate the performance advantages of our technique.
暂无评论