In PRAM emulations, universal hashing is a well-known method for distributing the address space among memory modules. However, if the memory access patterns of an application often result in high module congestion, it...
详细信息
ISBN:
(纸本)081864222X
In PRAM emulations, universal hashing is a well-known method for distributing the address space among memory modules. However, if the memory access patterns of an application often result in high module congestion, it is necessary to rehash by choosing another hash function and redistributing data on the fly. For the case of linear hash functions h(x) = ax mod m we present an algorithm to rehash an address space of size m on a p processor PRAM emulation in time O(m/p+log p). the algorithm requires O(log m) words of local storage per processor.
Highways is a distributed-programming system, we are building, with high-performance as a major goal. the suite of send primitives implemented in Highways, called Global-Flush Primitives, have three notable aspects. (...
详细信息
ISBN:
(纸本)081864222X
Highways is a distributed-programming system, we are building, with high-performance as a major goal. the suite of send primitives implemented in Highways, called Global-Flush Primitives, have three notable aspects. (1) Global-Flush Primitives permit making an assertion about messages sent in the past of sending m, in the future of sending m, about both, or neither. (2) the past and the future of an event is defined using the relation `happened before.' (3) A message can be sent to any subgroup of processes specified as a parameter.
In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. the approach presented is reasonably general and is app...
详细信息
ISBN:
(纸本)0818675829
In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. the approach presented is reasonably general and is applicable to a wide variety of regular as well as irregular applications. We present performance results for the solution of an unstructured mesh on a cluster of heterogeneous workstations.
Multithreading is often seen as a solution to the problem of large memory latencies that occur when remote data is needed for local computation. this paper quantifies the costs and benefits of software multithreading ...
详细信息
ISBN:
(纸本)081864222X
Multithreading is often seen as a solution to the problem of large memory latencies that occur when remote data is needed for local computation. this paper quantifies the costs and benefits of software multithreading on a distributed memory multiprocessor. We describe the design of a machine-independent software multithreading system as part of a runtime system for a high-level parallel programming language, and present a quantitative analysis of the costs of our multithreading system, as well as its performance on the nCUBE/2 multiprocessor. We show that, in the presence of a sufficient number of remote references to cover the initial costs, our multithreading system provides speedup factors of between 1.27 and 1.65.
Lattice basis reduction has important applications in the areas of computer algebra, cryptography and combinatorial optimization. Several efficient sequential algorithms are known. Recently, parallel algorithms have b...
详细信息
ISBN:
(纸本)081864222X
Lattice basis reduction has important applications in the areas of computer algebra, cryptography and combinatorial optimization. Several efficient sequential algorithms are known. Recently, parallel algorithms have been developed but until now a formal proof for the efficiency of parallel algorithms with n2 processors has been omitted, where n denotes the dimension of the lattice. In this paper, a variant of the well-known basis reduction algorithms is presented that is well suited for the computation with fast floating point arithmetic and for the implementation on a mesh-connected array of n2 processors. In addition, an error analysis and a proof of the parallel efficiency is provided.
this paper presents experimental evidence that the ability to exploit a small degree of control parallelism can provide a significant improvement in performance for SIMD machines. this evidence has been obtained throu...
详细信息
ISBN:
(纸本)081864222X
this paper presents experimental evidence that the ability to exploit a small degree of control parallelism can provide a significant improvement in performance for SIMD machines. this evidence has been obtained through analysis of SIMD instruction traces gathered from a MasPar MP-1 system. Potential speedups of up to 2.5 times conventional SIMD were found. this paper also discusses details of such an architecture, called superscalar SIMD, and of the trace analysis process. We also consider some synchronization and communication issues that may arise.
In this paper, we present algorithms for solving several basic geometric problems of size n in a network of p processors each with O(n/p) local memory. Our algorithms achieve the best possible (up to a constant factor...
详细信息
ISBN:
(纸本)081864222X
In this paper, we present algorithms for solving several basic geometric problems of size n in a network of p processors each with O(n/p) local memory. Our algorithms achieve the best possible (up to a constant factor) time bound in hypercubic parallel architectures such as hypercube, shuffle exchange and cube connected cycles, provided that n≥p1+Ε for some positive constant Ε. Our algorithms use only sorting and parallel prefix that involve interprocessor communications, and can be easily implemented in commercially available parallel computers. Experimentation on nCUBE parallel computers shows that our algorithms run efficiently.
We present a parallel algorithm for the Euclidean distance transformation (EDT). It is a `divide-and-conquer' algorithm based on a fast sequential algorithm for the Signed EDT (SEDT). the combining step that follo...
详细信息
ISBN:
(纸本)081864222X
We present a parallel algorithm for the Euclidean distance transformation (EDT). It is a `divide-and-conquer' algorithm based on a fast sequential algorithm for the Signed EDT (SEDT). the combining step that follows the local partial calculation of the SEDT can be done efficiently after reformulating the SEDT problem as the partial calculation of a Voronoi Diagram. this leads to an algorithm with two local calculation steps whose computational complexity is proportional to the number of pixels of the subregions and a global calculation step with complexity proportional to the image perimeter. this article contains a description of the algorithm, a complexity analysis, a discussion on load balance and timings obtained on an iPSC/2.
Connectivity of a network dictates the routability and survivability of a network. this paper is to investigate the edge connectivity problems in distributed environments. For a given network or graph G, one can distr...
详细信息
ISBN:
(纸本)081864222X
Connectivity of a network dictates the routability and survivability of a network. this paper is to investigate the edge connectivity problems in distributed environments. For a given network or graph G, one can distributively find bridges in the network first then find the 2-edge connected components. Both algorithms proposed in this paper, bridge finding and 2-edge connected component identifying, require only O(m) message complexity, where m is the number of edges in G. Besides, this paper also shows an efficient distributed 2-edge cutset algorithm that has O(n2) message complexity, where n is the number of nodes in G.
Assignment of tasks of a parallel program onto processors of a distributed-memory system is critical to obtain minimal program completion time by minimizing communication overhead. this paper presents task assignment ...
详细信息
ISBN:
(纸本)081864222X
Assignment of tasks of a parallel program onto processors of a distributed-memory system is critical to obtain minimal program completion time by minimizing communication overhead. this paper presents task assignment heuristics for wormhole-routed systems. A Temporal Communication Graph is used to model task graphs and to identify spatial and temporal link contention. the interplay between degree of routing adaptivity, topology, application characteristics, and task assignment is studied by evaluating random task graphs using an event-driven simulator. the study indicates that even for systems supporting fully-adaptive routing, efficient task assignment is necessary to reduce program completion time especially for communication-bound applications.
暂无评论