The PRAM model of parallel computation is examined with respect to wordsize, the number of bits which can be held in each global memory cell. First, adversary arguments are used to show the incomparability of certain ...
Andrews et al. [Automatic method for hiding latency in high bandwidth networks, in: Proceedings of the ACM Symposium on Theory of Computing, 1996, pp. 257-265; Improved methods for hiding latency in high bandwidth networks, in: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 52-61] introduced a number of techniques for automatically hiding latency when performing simulations of networks with unit-delay links on networks with arbitrary, unequal-delay links. In their work, they assume that processors of the host network are identical in computational power to those of the guest network being simulated. They further assume that the links of the host are able to pipeline messages, i.e., they are able to deliver P packets in time O(P + d), where d is the delay on the link. In this paper we examine the effect of eliminating one or both of these assumptions. In particular, we provide an efficient simulation of a linear array of homogeneous processors connected by unit-delay links on a linear array of heterogeneous processors connected by links with arbitrary delay. We show that the slowdown achieved by our simulation is optimal. We then consider the case of simulating cliques by cliques; i.e., a clique of heterogeneous processors with arbitrary-delay links is used to simulate a clique of homogeneous processors with unit-delay links. We reduce the slowdown from the obvious bound of the maximum link delay to the average of the link delays. In the case of the linear array we consider links both with and without pipelining. For the clique simulation the links are not assumed to support pipelining. The main motivation of our results (as was the case with Andrews et al.) is to mitigate the degradation of performance when executing parallel programs designed for different architectures on a network of workstations (NOW). In such a setting it is unlikely that the links provided by the NOW will support pipelining, and it is quite probable …
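To make the two link models and the slowdown claim above concrete, here is a small arithmetic sketch. It is illustrative only, not the paper's analysis; the function names and the example delays are ours.

```python
# Illustrative arithmetic only (ours, not the paper's analysis): delivery time of
# P packets over a single link with delay d under the two link models above, and
# the two slowdown measures discussed for the clique simulation.

def pipelined_time(P: int, d: int) -> int:
    """Pipelined link: packets can be streamed, so P packets arrive in about P + d steps."""
    return P + d

def unpipelined_time(P: int, d: int) -> int:
    """Non-pipelined link: each packet occupies the link for d steps."""
    return P * d

def max_delay_slowdown(delays):
    """Obvious bound: every simulated step waits for the slowest link."""
    return max(delays)

def average_delay_slowdown(delays):
    """Bound achieved for the clique simulation: the average link delay."""
    return sum(delays) / len(delays)

if __name__ == "__main__":
    print(pipelined_time(100, 40), unpipelined_time(100, 40))   # 140 vs 4000 steps
    delays = [1, 2, 4, 40]                                       # hypothetical clique link delays
    print(max_delay_slowdown(delays), average_delay_slowdown(delays))  # 40 vs 11.75
```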
We give two optimal parallel algorithms for constructing the arrangement of n lines in the plane. The first method is quite simple and runs in O(log² n) time using O(n²) work, and the second method, which is more sophisticated, …
ISBN:
(Print) 9781595939739
We present Dreadlocks, an efficient new shared-memory spin lock that actively detects deadlocks. Instead of spinning on a Boolean value, each thread spins on the lock owner's per-thread digest, a compact representation of a portion of the lock's waits-for graph. Digests can be implemented either as bit vectors (for small numbers of threads) or as Bloom filters (for larger numbers of threads). Updates to digests are propagated dynamically as locks are acquired and released. Dreadlocks can be applied to any spin lock algorithm that allows threads to time out. Experimental results show that Dreadlocks outperform timeouts under many circumstances, and almost never do worse.
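The digest mechanism is concrete enough to sketch. Below is a minimal single-process Python sketch of the bit-vector variant; the class and method names are ours, and a real implementation would use hardware atomics and a timeout-capable spin lock rather than a Python guard lock.

```python
# Minimal sketch (ours, not the authors' implementation) of the bit-vector digest
# idea: digest[t] is an int whose set bits are the threads that thread t
# transitively waits for, always including t itself. A thread that finds its own
# bit in the lock owner's digest has closed a cycle in the waits-for graph.
import threading

class DeadlockDetected(Exception):
    pass

class DigestLock:
    """A lock whose current owner is visible to waiting threads."""
    def __init__(self):
        self.owner = None                  # thread id of the owner, or None
        self._guard = threading.Lock()     # stands in for an atomic CAS in this sketch

class Dreadlocks:
    def __init__(self, num_threads: int):
        self.digest = [1 << t for t in range(num_threads)]

    def acquire(self, me: int, lock: DigestLock) -> None:
        while True:
            with lock._guard:
                if lock.owner is None:
                    lock.owner = me
                    self.digest[me] = 1 << me          # acquired: waiting for no one
                    return
                owner = lock.owner
            owner_digest = self.digest[owner]
            if owner_digest & (1 << me):
                # My bit appears in the owner's digest: a waits-for cycle exists.
                self.digest[me] = 1 << me
                raise DeadlockDetected(f"thread {me} would deadlock")
            # Spin: propagate the owner's digest into mine, so threads waiting on me
            # learn (transitively) whom I am waiting for.
            self.digest[me] = (1 << me) | owner_digest

    def release(self, me: int, lock: DigestLock) -> None:
        with lock._guard:
            if lock.owner == me:
                lock.owner = None
```

For larger thread counts the paper replaces the bit vector with a Bloom filter, which may report a spurious cycle (a false positive) but never misses a real one; the abort path is the same either way.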
ISBN:
(Print) 9781595939739
We give a randomized (Las Vegas) parallel algorithm for computing strongly connected components of a graph with n vertices and m edges. The runtime is dominated by O(log² n) multi-source parallel reachability queries; i.e., O(log² n) calls to a subroutine that computes the union of the descendants of a given set of vertices in a given digraph. Our algorithm also topologically sorts the strongly connected components. Using Ullman and Yannakakis's [22] techniques for the reachability subroutine gives our algorithm runtime Õ(t) using mn/t² processors for any (n²/m)^(1/3) ≤ t ≤ n. On sparse graphs, this improves the number of processors needed to compute strongly connected components and topological sort within time n^(1/3) ≤ t ≤ n from the previously best known (n/t)³ [20] to (n/t)².
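The reachability-query viewpoint can be illustrated with a short sequential sketch. The code below uses the classic single-pivot divide-and-conquer scheme (forward closure, backward closure, intersect, recurse); it is not the paper's exact multi-source recursion and does not reproduce its processor bounds or the topological ordering, and the helper names are ours.

```python
# Sequential sketch (ours) of SCC decomposition driven by reachability queries.
# `adj` maps every vertex to a list of its out-neighbours.
import random

def descendants(adj, sources, vertices):
    """Union of the descendants of `sources`, restricted to `vertices` (BFS).
    This plays the role of the multi-source reachability subroutine."""
    seen = {s for s in sources if s in vertices}
    frontier = list(seen)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v in vertices and v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return seen

def ancestors(adj, sources, vertices):
    """Vertices in `vertices` that can reach `sources` (reachability in the reverse graph)."""
    radj = {v: [] for v in vertices}
    for u in vertices:
        for v in adj.get(u, ()):
            if v in vertices:
                radj[v].append(u)
    return descendants(radj, sources, vertices)

def strongly_connected_components(adj, vertices=None):
    """Divide and conquer on a random pivot: its SCC is fwd ∩ bwd, and every other
    SCC lies wholly inside fwd - scc, bwd - scc, or the untouched remainder."""
    if vertices is None:
        vertices = set(adj)
    if not vertices:
        return []
    pivot = random.choice(tuple(vertices))
    fwd = descendants(adj, [pivot], vertices)
    bwd = ancestors(adj, [pivot], vertices)
    scc = fwd & bwd
    components = [scc]
    for part in (fwd - scc, bwd - scc, vertices - fwd - bwd):
        components.extend(strongly_connected_components(adj, part))
    return components

if __name__ == "__main__":
    g = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: []}
    print(strongly_connected_components(g))  # {0, 1, 2}, {3, 4}, {5} in some order
```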
In this paper, we address the question of how efficiently a single constant-degree processor network can simulate the computation of any constant-degree processor network. We show the following lower-bound trade-off: if M is an arbitrary constant-degree processor network of size m that can simulate all constant-degree processor networks of size n with slowdown s, then m·s = Ω(n log m). Our trade-off holds for a very general model of simulations. It covers all previously considered models and all known techniques for simulations among networks. For m ≥ n, this improves a previous lower bound by a factor of log log n, proved for a weaker simulation model. For m < n, this is the first non-trivial lower bound for this problem; in this case, the bound is asymptotically tight.
ISBN:
(Print) 9781450307437
The proceedings contain 50 papers. The topics discussed include: graph expansion and communication costs of fast matrix multiplication; near linear-work parallel SDD solvers, low-diameter decomposition, and low-stretch subgraphs; linear-work greedy parallel approximate set cover and variants; optimizing hybrid transactional memory: the importance of nonspeculative operations; parallelism and data movement characterization of contemporary application classes; work-stealing for mixed-mode parallelism by deterministic team-building; full reversal routing as a linear dynamical system; reclaiming the energy of a schedule: models and algorithms; a tight runtime bound for synchronous gathering of autonomous robots with limited visibility; convergence of local communication chain strategies via linear transformations: or how to trade locality for speed; and convergence to equilibrium of logit dynamics for strategic games.
ISBN:
(Print) 9780897918909
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.
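One of the components named above, processor assignment, lends itself to a small runnable sketch: map newly computed partitions onto processors so that as many mesh cells as possible stay where they already are, keeping the remapping cost low. The greedy rule and all names below are ours, not the paper's assignment algorithm.

```python
# Toy sketch (ours, not the framework's code) of the processor-assignment step
# in a dynamic load-balancing cycle for adaptive grids.

def assign_partitions(current_owner, new_partition, num_procs):
    """current_owner[cell] -> processor currently holding the cell
       new_partition[cell] -> partition id of the cell after repartitioning
       Returns {partition id: processor}, assuming #partitions <= num_procs."""
    # overlap[p][q] = number of cells of new partition p already resident on processor q
    overlap = {}
    for cell, part in new_partition.items():
        row = overlap.setdefault(part, [0] * num_procs)
        row[current_owner[cell]] += 1

    assignment, taken = {}, set()
    pairs = sorted(((row[q], part, q) for part, row in overlap.items()
                    for q in range(num_procs)), reverse=True)
    for count, part, q in pairs:            # hand out the largest overlaps first
        if part not in assignment and q not in taken:
            assignment[part] = q
            taken.add(q)
    return assignment

if __name__ == "__main__":
    current_owner = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}
    new_partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
    print(assign_partitions(current_owner, new_partition, num_procs=3))
    # Each partition lands on the processor already holding most of its cells,
    # so only cell 2 has to be remapped.
```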
Let H be a bipartite graph with bipartition (A, B), where |A| = n and every subset X of A with at most a·n elements has at least b|X| neighbors (b > 1). We consider the problem of computing a matching from a given subset...
ISBN:
(Print) 0897913701
The Program Dependence Graph (PDG), which represents the data and control dependences of a program, is considered. Attention is limited to the subgraph of the PDG that contains only the control dependence edges. This subgraph is called the Control Dependence Graph (CDG). The authors formalize what it means for a CDG to have a corresponding sequential version and characterize the conditions under which there is such a corresponding sequentialization that does not require duplication.