The proceedings contain 38 papers. The topics discussed include: efficient compilation of high-level data parallelalgorithms;improved abstract parity-declustered layouts for disk arrays;experiences with parallel n-bo...
ISBN:
(纸本)0897916719
The proceedings contain 38 papers. The topics discussed include: efficient compilation of high-level data parallelalgorithms;improved abstract parity-declustered layouts for disk arrays;experiences with parallel n-body simulation;a comparison of parallelalgorithms for connected components;studying overheads in massively parallel min/max-tree evaluation;scheduling trees using FIFO queues: a control-memory tradeoff;SIMD instruction cache;dynamic parallel tree contraction;improved bounds for routing and sorting on multi-dimensional meshes;parallel sorting by overpartitioning;an optimal randomized logarithmic time connectivity algorithm for the EREW PRAM;list ranking and list scan on the CRAY C-90;modeling communication in parallelalgorithms: a fruitful interaction between theory and systems?;on testing cache-coherent shared memories;diffracting trees;programming abstract DEC-alpha based multiprocessors the easy way;scheduling parallelizable tasks to minimize average response time;and an analysis of diffusive load-balancing.
A new approach to parallel sorting called parallel Sorting by OverPartitioning (PSOP) is presented. The approach limits the communication cost by moving each element between processors at most once, and leads to good ...
详细信息
In this paper we present the Segment router, a novel router design for the interconnection networks of massively parallel computers. The design decisions of the Segment router are motivated by the need to improve the ...
详细信息
ISBN:
(纸本)9780897916714
In this paper we present the Segment router, a novel router design for the interconnection networks of massively parallel computers. The design decisions of the Segment router are motivated by the need to improve the network performance when the traffic consists of messages with widely different *** key novelty of the Segment router is that it provides different queueing policies for the two classes of messages it services. Short messages are stored in a centralized, dynamically allocated queue that guarantees storage availability for the whole packet of the short message. Long messages on the other hand are stored in small FIFO buffers associated with the input channels of the router. Thus when a long message becomes blocked, it is stored in segments in the FIFO buffers of a number of routing elements in the network. Furthermore, the physical channels of the Segment router are fairly multiplexed between the two supported classes of messages without the potential for *** simulations we compare the performance of our technique to the performance of multipacket messages and networks implementing lanes. The results clearly demonstrate the performance advantages of our technique.
This paper presents a comparison of the pragmatic aspects of some parallelalgorithms for finding connected components, together with optimizations on these algorithms. The algorithms being compared are two similar al...
详细信息
We present a high-level parallel calculus for nested sequences, NSC, offered as a possible theoretical “core” of an entire class of collection-oriented parallel languages. NSC is based on while-loops as opposed to g...
详细信息
ISBN:
(纸本)9780897916714
We present a high-level parallel calculus for nested sequences, NSC, offered as a possible theoretical “core” of an entire class of collection-oriented parallel languages. NSC is based on while-loops as opposed to general recursion. A formal, machine independent definition of the parallel time complexity and the work complexity of programs in NSC is given. Our main results are: (1) We give a translation method for a particular form of recursion, called map-recursion, into NSC, that preserves the time complexity and adds an arbitrarily small overhead to the work complexity, and (2) We give a compilation method for NSC into a very simple vector parallel machine, which preserves the time complexity and again adds an arbitrarily small overhead to the work complexity.
Recently, several theoretical models of parallelarchitectures have been proposed to replace the PRAM as the model that is presented to an algorithm designer. A primary focus of the new models is to include the cost o...
详细信息
ISBN:
(纸本)9780897916714
Recently, several theoretical models of parallelarchitectures have been proposed to replace the PRAM as the model that is presented to an algorithm designer. A primary focus of the new models is to include the cost of interprocessor communication, which is increasingly important in modern parallelarchitectures. We argue that modeling the communication costs in the architecture or system is only one part of the problem. The other, and usually much more difficult, part is modeling the communication properties of the algorithm itself, which provides necessary inputs into the architectural model to determine overall complexity. In this context, we make three main points in this paper: (i) It is incomplete to describe communication without regard to its relationship with replication. We propose a description of the communication-replication relationship in terms of the working set hierarchy of an algorithm. (ii) Both inherent communication and the communication-replication relationship can be very difficult to model in irregular, dynamic computations that are crucial in many real-world applications. We present some examples that demonstrate this difficulty. (iii) We believe that substantial leverage can be obtained in this effort from the computer systems community, which can provide a hierarchy of simulation and profiling tools—from abstract to detailed—tailored to the needs of the algorithm designers. We propose an initial set of simulation tools, and we discuss possible future refinements to this set.
暂无评论