The original LogP model for parallel computation use four parameters;the communication latency (L), overhead (n), bandwidth (g), and the number of processors (P). This basic model is extended with a linear model for l...
详细信息
The original LogP model for parallel computation use four parameters;the communication latency (L), overhead (n), bandwidth (g), and the number of processors (P). This basic model is extended with a linear model for long messages. The resulting combination which is called the LogGP model for parallel computation has one additional parameter, G, which captures the bandwidth obtained for long messages. In this paper, the algorithm design and analysis under the new model are discussed, and the all-to-all remap, FFT, and radix sort are examined. In addition, the single node scatter problem is, examined. Finally, solutions are derived for the problem, and the their optimality under the LogGP model is proven.
The time required to compute any function of a collection of circular doubly linked lists on a CROW PRAM is shown to be at most a constant factor more than on a CREW PRAM, but this is not true for singly linked lists....
详细信息
ISBN:
(纸本)0897913701
The time required to compute any function of a collection of circular doubly linked lists on a CROW PRAM is shown to be at most a constant factor more than on a CREW PRAM, but this is not true for singly linked lists. A tight lower bound of Ω(log log n) for colouring an n node doubly linked list on a CROW PRAM using a constant number of colours is also obtained.
We present a parallel algorithm on a CRCW PRAM for three-dimensional pattern matching, i.e., finding all occurrences of a pattern of size m3 in a text of size n3. The text search of the algorithm is alphabet-independe...
详细信息
ISBN:
(纸本)9780897918909
We present a parallel algorithm on a CRCW PRAM for three-dimensional pattern matching, i.e., finding all occurrences of a pattern of size m3 in a text of size n3. The text search of the algorithm is alphabet-independent and runs in optimal constant time. The preprocessing computes a witness array for the pattern, and it runs in O(log m) time using m3 processors. The witness computation uses multilayer suffix trees and it works for any dimensions. Our result is based on a new characterization of three-dimensional periodicity. The parallel algorithm can be translated into a sequential algorithm for three-dimensional pattern matching whose text search is alphabet-independent and runs in O(n3) time and whose preprocessing runs in O(m3 log σ) time, where σ = min(|Σ|, m3) and Σ is the alphabet of symbols.
Two hardware barrier synchronization schemes are presented which can support deep levels of control nesting in data parallel programs. Hardware barriers are usually an order of magnitude faster than software implement...
详细信息
ISBN:
(纸本)9780897917179
Two hardware barrier synchronization schemes are presented which can support deep levels of control nesting in data parallel programs. Hardware barriers are usually an order of magnitude faster than software implementations. Since large data parallel programs often have several levels of nested barriers, these schemes provide significant speedups in the execution of such programs on MIMD computers. The first scheme performs code transformations and uses two single-bit-trees to implement unlimited levels of nested barriers. However, this scheme increases the code size. The second scheme uses a more expensive integer-tree to support an exponential number of nested barriers without increasing the code size. Using hardware already available on commercial MIMD computers, this scheme can support more than four billion levels of nesting.
Padded sorting requires n input keys to be output in sorted order in an array with slightly more than n locations, unused locations being filled with a special null value. We show that a deterministic CRCW PRAM with k...
详细信息
The proceedings contain 45 papers. The topics discussed include: buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures;scheduling to minimize power consumption using su...
ISBN:
(纸本)9781450300797
The proceedings contain 45 papers. The topics discussed include: buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures;scheduling to minimize power consumption using submodular functions;collaborative scoring with dishonest participants;securing every bit: authenticated broadcast in radio networks;brief announcement: on speculative replication of transactional systems;data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory;basic network creation games;on the bit communication complexity of randomized rumor spreading;algorithms and application for grids and clouds;towards optimizing energy costs of algorithms for shared memory architectures;brief announcement: on regenerator placement problems in optical networks;best-effort group service in dynamic networks;and implementing and evaluating nested parallel transactions in software transactional memory.
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p...
详细信息
We present a randomized parallel algorithm for constructing the 3D convex hull on a generic p-processor coarse grained multicomputer with arbitrary interconnection network and n/p local memory per processor, where n/p ≥ p2+Ε (for some arbitrarily small Ε > 0). For any given set of n points in 3-space, the algorithm computes the 3D convex hull, with high probability, in O(n log n÷p) local computation time and O(1) communication phases with at most O(n÷p) data sent/received by each processor. That is, with high probability, the algorithm computes the 3D convex hull of an arbitrary point set in time O(n log n÷p + Γn,p), where Γn,p denotes the time complexity of one communication phase. In the terminology of the BSP model, our algorithm requires, with high probability, O(1) supersteps and a synchronization period Θ(n log n÷p). In the LogP model, the execution time of our algorithm is asymptotically optimal for several architectures.
The proceedings contain 47 papers. The topics discussed include: Quancurrent: a concurrent quantiles sketch;an efficient scheduler for task-parallel interactive applications;efficient synchronization-light work steali...
ISBN:
(纸本)9781450395458
The proceedings contain 47 papers. The topics discussed include: Quancurrent: a concurrent quantiles sketch;an efficient scheduler for task-parallel interactive applications;efficient synchronization-light work stealing;balanced allocations in batches: the tower of two choices;massively parallel tree embeddings for high dimensional spaces;deterministic massively parallel symmetry breaking for sparse graphs;an associativity threshold phenomenon in set-associative caches;increment - and - freeze: every cache, everywhere, all of the time;multidimensional approximate agreement with asynchronous fallback;a tight characterization of fast failover routing: resiliency to two link failures is possible;releasing memory with optimistic access: a hybrid approach to memory reclamation and allocation in lock-free programs;transactional composition of nonblocking data structures;applying hazard pointers to more concurrent data structures;and nearly optimal parallelalgorithms for longest increasing subsequence.
Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting...
详细信息
Diffracting trees are an effective and highly scalable distributed-parallel technique for shared counting and load balancing. This paper presents the first steady-state combinatorial model and analysis for diffracting trees, and uses it to answer several critical algorithmic design questions. Our model is simple and sufficiently high level to overcome many implementation specific details, and yet as we will show it is rich enough to accurately predict empirically observed behaviors. As a result of our analysis we were able to identify starvation problems in the original diffracting tree algorithm and modify it to a create a more stable version. We are also able to identify the range in which the diffracting tree performs most efficiently, and the ranges in which its performance degrades. We believe our model and modeling approach open the way to steady-state analysis of other distributed-parallel structures such am counting networks and elimination trees.
The proceedings contain 47 papers. The topics discussed include: parallel minimum cuts in near-linear work and low depth;trees for vertex cuts, hypergraph cuts and minimum hypergraph bisection;dynamic representations ...
ISBN:
(纸本)9781450357999
The proceedings contain 47 papers. The topics discussed include: parallel minimum cuts in near-linear work and low depth;trees for vertex cuts, hypergraph cuts and minimum hypergraph bisection;dynamic representations of sparse distributed networks: a locality-sensitive approach;constant-depth and subcubic-size threshold circuits for matrix multiplication;integrated model, batch, and domain parallelism in training neural networks;brief announcement: on approximating pagerank locally with sublinear query complexity;brief announcement: coloring-based task mapping for dragonfly systems;brief announcement: parallel transitive closure within 3D crosspoint memory;and lock-free contention adapting search trees.
暂无评论