In this paper we describe a technique for finding efficient parallelalgorithms for problems on directed graphs that involve checking the ezistence of certain kinds of paths in the graph. This technique provides effic...
详细信息
Emerging applications in multi-media and the Human Genome Project require storage and searching of large databases of strings - a task for which parallelism seems the only hope. In this paper, we consider the parallel...
详细信息
Emerging applications in multi-media and the Human Genome Project require storage and searching of large databases of strings - a task for which parallelism seems the only hope. In this paper, we consider the parallelism in some of the fundamental problems in compressing strings and in matching large dictionaries of patterns against texts. We present the first work-optimal algorithms for these well-studied problems including the classical dictionary matching problem, optimal compression with a static dictionary and the universal data compression with dynamic dictionary of Lempel and Ziv. All our algorithms are randomized and they are of the Las Vegas type. Furthermore, they are fast, working in time logarithmic in the input size. Additionally, our algorithms seem suitable for a distributed implementation.
The proceedings contains 40 papers from the conference on SPAA 2004 - Sixteenth annualacmsymposium on parallelism in algorithms and architectures. The topics discussed include: On delivery times in packet networksun...
详细信息
The proceedings contains 40 papers from the conference on SPAA 2004 - Sixteenth annualacmsymposium on parallelism in algorithms and architectures. The topics discussed include: On delivery times in packet networksunder adversarial traffic;balanced graph partitioning;online hierarchical cooperative caching;scheduling against an adversarial network;effectively sharing a cache among threads;online algorithms for network design and dynamic analysis of the arrow distributed protocol.
A new approach to parallel sorting called parallel Sorting by OverPartitioning (PSOP) is presented. The approach limits the communication cost by moving each element between processors at most once, and leads to good ...
详细信息
We present parallelalgorithms for several graph and geometric problems, including transitive closure and topological sorting in planar st-graphs, preprocessing planar subdivisions for point location queries, and cons...
详细信息
We implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Our results indicate that, except for certain types of permutations or very high virtual processor ratios, the best method o...
详细信息
We implemented and measured several methods to perform BMMC permutations on the MasPar MP-2. Our results indicate that, except for certain types of permutations or very high virtual processor ratios, the best method overall is the naive method but with virtual-processor numbers computed in Gray-code order. For some permutations, however, the naive method performs very poorly;the best method in these cases is an adaptation of the block BMMC algorithm for parallel disk systems in which the processor elements are treated as independent devices.
In this paper, we study parallelalgorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, w...
详细信息
ISBN:
(纸本)9781595939739
In this paper, we study parallelalgorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallelalgorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.
The methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks were described. Most of the analysis were centered on the simulation of unit-delay rings on netwo...
详细信息
The methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks were described. Most of the analysis were centered on the simulation of unit-delay rings on networks of workstations (NOWs) with arbitrary delays on the links. Emulations were also derived for the wide variety of other unit-delay network architectures on a NOW with high-latency links. The lower bounds that established limits on the degree to which the high latency links were proven, can be mitigated. These bounds demonstrates that overcoming latencies in dataflow types of computations that require access to large local databases is easier.
As the gap between the cost of communication (i.e., data movement) and computation continues to grow, the importance of pursuing algorithms which minimize communication also increases. Toward this end, we seek asympto...
详细信息
ISBN:
(纸本)9781450307437
As the gap between the cost of communication (i.e., data movement) and computation continues to grow, the importance of pursuing algorithms which minimize communication also increases. Toward this end, we seek asymptotic communication lower bounds for general memory models and classes of algorithms. Recent work [2] has established lower bounds for a wide set of linear algebra algorithms on a sequential machine and on a parallel machine with identical processors. This work extends these previous bounds to a heterogeneous model in which processors access data and perform floating point operations at differing speeds. We also present an algorithm for dense matrix multiplication which attains the lower bound.
暂无评论