We present a parallel algorithm on a CRCW PRAM for three-dimensional pattern matching, i.e., finding all occurrences of a pattern of size m3 in a text of size n3. The text search of the algorithm is alphabet-independe...
详细信息
ISBN:
(纸本)9780897918909
We present a parallel algorithm on a CRCW PRAM for three-dimensional pattern matching, i.e., finding all occurrences of a pattern of size m3 in a text of size n3. The text search of the algorithm is alphabet-independent and runs in optimal constant time. The preprocessing computes a witness array for the pattern, and it runs in O(log m) time using m3 processors. The witness computation uses multilayer suffix trees and it works for any dimensions. Our result is based on a new characterization of three-dimensional periodicity. The parallel algorithm can be translated into a sequential algorithm for three-dimensional pattern matching whose text search is alphabet-independent and runs in O(n3) time and whose preprocessing runs in O(m3 log σ) time, where σ = min(|Σ|, m3) and Σ is the alphabet of symbols.
We study integrated prefetching and caching in single and parallel disk systems. There exist two very popular approximation algorithms called Aggressive and Conservative for minimizing the total elapsed time in the si...
详细信息
ISBN:
(纸本)9781581136616
We study integrated prefetching and caching in single and parallel disk systems. There exist two very popular approximation algorithms called Aggressive and Conservative for minimizing the total elapsed time in the single disk problem. For D parallel disks, approximation algorithms are known for both the elapsed time and stall time performance measures. In particular, there exists a D-approximation algorithm for the stall time measure that uses D-1 additional memory locations in cache. In the first part of the paper we investigate approximation algorithms for the single disk problem. We give a refined analysis of the Aggressive algorithm, showing that the original analysis was too pessimistic. We prove that our new bound is tight. Additionally we present a new family of prefetching and caching strategies and give algorithms that perform better than Aggressive and Conservative. In the second part of the paper we investigate the problem of minimizing stall time in parallel disk systems. We present a polynomial time algorithm for computing a prefetching/caching schedule whose stall time is bounded by that of an optimal solution. The schedule uses at most 3(D - 1) extra memory locations in cache. This is the first polynomial time algorithm for computing schedules with a minimum stall time. Our algorithm is based on the linear programming approach of [1]. However, in order to achieve minimum stall times, we introduce the new concept of synchronized schedules in which fetches on the D disks are performed completely in parallel.
As parallel machines scale to one million nodes and beyond, it becomes increasingly difficult to build a reliable network that is able to guarantee packet delivery. Eventually large systems will need to employ fault-t...
详细信息
ISBN:
(纸本)9781581135299
As parallel machines scale to one million nodes and beyond, it becomes increasingly difficult to build a reliable network that is able to guarantee packet delivery. Eventually large systems will need to employ fault-tolerant messaging protocols that afford correct execution in the presence of a lossy network. In this paper we present a lightweight protocol that preserves message idempotence and is easy to implement in hardware. We identify the requirements for a correct implementation of the protocol. Experiments are performed in simulation to determine implementation parameters that optimize performance. We find that an aggressive implementation on a fat tree network results in a slowdown of less than 2x compared to buffered wormhole routing on a fault-free network.
We present a parallel solution to the problem of determining the triconnected components of an undirected graph. We obtain significant speedups over the only published optimal (linear-time) serial implementation of a ...
详细信息
ISBN:
(纸本)9781450312134
We present a parallel solution to the problem of determining the triconnected components of an undirected graph. We obtain significant speedups over the only published optimal (linear-time) serial implementation of a triconnected components algorithm running on a modern CPU. This is accomplished on the PRAM-inspired XMT many-core architecture. To our knowledge, no other parallel implementation of a triconnected components algorithm has been published for any platform. Copyright is held by the author/owner(s).
Two hardware barrier synchronization schemes are presented which can support deep levels of control nesting in data parallel programs. Hardware barriers are usually an order of magnitude faster than software implement...
详细信息
ISBN:
(纸本)9780897917179
Two hardware barrier synchronization schemes are presented which can support deep levels of control nesting in data parallel programs. Hardware barriers are usually an order of magnitude faster than software implementations. Since large data parallel programs often have several levels of nested barriers, these schemes provide significant speedups in the execution of such programs on MIMD computers. The first scheme performs code transformations and uses two single-bit-trees to implement unlimited levels of nested barriers. However, this scheme increases the code size. The second scheme uses a more expensive integer-tree to support an exponential number of nested barriers without increasing the code size. Using hardware already available on commercial MIMD computers, this scheme can support more than four billion levels of nesting.
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in parallel. The user model of a subset ...
详细信息
ISBN:
(纸本)089791483X
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in parallel. The user model of a subset barrier is straight forward;a processor that participates in a subset barrier needs to know only the name of the barrier and the number of participating processors. This paper identifies two general communication models for private-memory parallel systems: the bounded buffer broadcast model and the anonymous destination message passing model and presents algorithms for barrier synchronization in the terms of these models. The models are detailed enough to allow meaningful cost estimates for their primitives, yet independent of a specific architecture and can be supported efficiently by a modern private memory parallel system. The anonymous destination message passing model is the most attractive. The time complexity to synchronize over a uni-directional ring of N processors is O(log N) for common cases, and O(√N) in the worst case. The algorithms have been implemented on iWarp, a private-memory parallel system and are now in daily use. The paper concludes with timing measurements obtained on a 64-node system.
We describe an efficient parallel implementation of Goldberg's maximum flow algorithm for a shared-memory multiprocessor. Our main technical innovation is a method that allows a 'global relabeling' heurist...
详细信息
ISBN:
(纸本)089791483X
We describe an efficient parallel implementation of Goldberg's maximum flow algorithm for a shared-memory multiprocessor. Our main technical innovation is a method that allows a 'global relabeling' heuristic to be executed concurrently with the main algorithm. This heuristic is essential for good performance in practice. We present performance results from a Sequent Symmetry for a variety of input distributions. We achieve speed-ups of up to 8.8 with 16 processors, relative to the parallel program with 1 processor (5.8 when compared to our best sequential program). We consider these speed-ups very good and we provide evidence that hardware effects and insufficient parallelism in certain inputs are the main obstacles to achieving better performance.
We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obta...
详细信息
ISBN:
(纸本)9781450328210
We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work.
The PRAM model of parallel computation is examined with respect to wordsize, the number of bits which can be held in each global memory cell. First, adversary arguments are used to show the incomparability of certain ...
详细信息
Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting...
详细信息
Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting trees, and uses it to answer several critical algorithmic design questions. Our model is simple and sufficiently high level to overcome many implementation specific details, and yet as we will show it is rich enough to accurately predict empirically observed behaviors. As a result of our analysis we were able to identify starvation problems in the original diffracting tree algorithm and modify it to a create a more stable version. We are also able to identify the range in which the diffracting tree performs most efficiently, and the ranges in which its performance degrades. We believe our model and modeling approach open the way to steady-state analysis of other distributed-parallel structures such am counting networks and elimination trees.
暂无评论