The goal of the μDatabase project is to understand the behaviour of data structures and their algorithms, both parallel and sequential, in a memory-mapped environment. Memory mapping allows primary-memory pointer-based data structures to be stored on secondary storage, and subsequently traversed and modified, without transforming the embedded pointers. The project incorporates a collaboration between practitioners and theoreticians and has produced a toolkit to support a parallel memory-mapped environment, extensive concurrency tools, an analytical model for the system validated by experiments, and preliminary results involving parallel database join algorithms as well as sequential B-tree and R-tree data structures. Future work includes augmenting the toolkit, developing and testing more parallel algorithms tuned for efficiency in a memory-mapped environment, and extending the current model to cover more general algorithms and capture more low-level details of the physical machine.
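As loose background for the memory-mapping idea described above (this sketch is not part of the μDatabase toolkit), the C program below maps a file containing a pointer-based list and follows its embedded pointers directly; the file name, node layout, and fixed base address are all hypothetical. The technique only works if the mapping lands at the same virtual address every run, which is exactly why no pointer transformation is needed.

```c
/* Minimal sketch: traversing a pointer-based list stored in a memory-mapped
 * file.  The embedded pointers stay valid only if the file is mapped at the
 * same base address used when the structure was built (here requested with an
 * address hint and then verified). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct node {
    long         key;
    struct node *next;   /* raw pointer persisted as-is in the file */
};

int main(void)
{
    const char *path = "udb.dat";          /* hypothetical data file */
    void *base = (void *)0x600000000000UL; /* hypothetical fixed base address */

    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    void *m = mmap(base, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) { perror("mmap"); return 1; }
    if (m != base) {                       /* pointers would be invalid */
        fprintf(stderr, "mapping moved; embedded pointers unusable\n");
        return 1;
    }

    /* The first node is assumed to sit at the start of the file. */
    for (struct node *p = (struct node *)m; p != NULL; p = p->next)
        printf("%ld\n", p->key);

    munmap(m, st.st_size);
    close(fd);
    return 0;
}
```

A production toolkit would reserve the address range more robustly, or swizzle pointers on fault, but the sketch shows why a stable base address lets pointer-based structures move between primary and secondary storage unchanged.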
ISBN (print): 9780897918091
We describe a very simple deterministic parallel algorithm for linear programming in fixed dimension d that takes poly(log log n) time in the common CRCW PRAM model and does optimal O(n) work. Our algorithm is based on multidimensional search and an effective use of approximation algorithms to speed up the basic search in the CRCW model. Our method also yields very fast poly(log log n) algorithms for smallest enclosing sphere and approximate ham-sandwich cuts, as well as an O(log n) time work-optimal algorithm for exact ham-sandwich cuts of separable point sets. For all these problems, particularly for fixed-dimensional linear programming, o(log n) time efficient deterministic PRAM algorithms were not known until very recently.
ISBN (print): 9780897917179
Consider the following NP-hard problems: Given a graph G, find the minimum 2-edge-connected and 2-vertex-connected subgraphs spanning all vertices of G. The past few years have produced exciting sequential algorithms for approximating such minimum subgraphs [6, 7]. The approximation factors are improved from 2 down to 5/4 and 3/2 respectively. Yet the techniques involved are all based on augmenting depth-first-search trees, and no similar progress has been carried over to the parallel context. This paper presents NC algorithms that achieve approximation factors of 3/2 + ε and 7/4 + ε respectively without computing depth-first-search trees.
ISBN (print): 9780897917179
This paper provides a new approach to labeling the connected components of an n × n image on a scan line array processor (comprised of n processing elements). Variations of this approach yield an algorithm guaranteed to complete in o(n lg n) time as well as algorithms likely to approach O(n) time for all or most images. The best previous solutions require using a more complicated architecture or require Ω(n lg n) time. We also show that on a restricted version of the architecture, any algorithm requires Ω(n lg n) time in the worst case.
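For context only, a standard sequential two-pass labeling with union-find is sketched below; it is not the scan line array processor algorithm from the paper, and the 4 × 4 binary image is made up for illustration.

```c
/* Minimal sketch: classic sequential two-pass connected-component labeling
 * of a binary image using union-find (4-connectivity).  This is a serial
 * reference point, not the scan line array processor algorithm. */
#include <stdio.h>

#define N 4
static int parent[N * N];

static int find(int x) {
    while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
}
static void unite(int a, int b) { parent[find(a)] = find(b); }

int main(void)
{
    int img[N][N] = { {1,1,0,0},
                      {0,1,0,1},
                      {0,0,0,1},
                      {1,0,0,1} };

    for (int i = 0; i < N * N; i++) parent[i] = i;

    /* Pass 1: union each foreground pixel with its left/upper foreground
     * neighbours. */
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++) {
            if (!img[r][c]) continue;
            int id = r * N + c;
            if (c > 0 && img[r][c - 1]) unite(id, id - 1);
            if (r > 0 && img[r - 1][c]) unite(id, id - N);
        }

    /* Pass 2: print the representative label of each foreground pixel
     * (-1 marks background). */
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++)
            printf("%3d", img[r][c] ? find(r * N + c) : -1);
        printf("\n");
    }
    return 0;
}
```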
ISBN (print): 9781479969982
Data-parallel architectures must provide efficient support for complex control-flow constructs to run the sophisticated applications coded in modern single-program multiple-data languages. As these architectures have wide datapaths that process a single instruction across parallel threads, a mechanism is needed to track and sequence threads as they traverse potentially divergent control paths through the program. The design space for divergence management ranges from software-only approaches, where divergence is explicitly managed by the compiler, to hardware solutions, where divergence is managed implicitly by the microarchitecture. In this paper, we explore this space and propose a new predication-based approach for handling control-flow structures in data-parallel architectures. Unlike prior predication algorithms, our new compiler analyses and hardware instructions consider the commonality of predication conditions across threads to improve efficiency. We prototype our algorithms in a production compiler and evaluate the tradeoffs between software and hardware divergence management on current GPU silicon. We show that our compiler algorithms make a predication-only architecture competitive in performance with one that has hardware support for tracking divergence.
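To make the notion of predication concrete, the fragment below shows a data-dependent branch and an equivalent branch-free, predicated form in which both sides are computed and a per-element predicate selects the result; this is a generic if-conversion illustration, not the paper's compiler analysis or its hardware instructions.

```c
/* Minimal sketch of if-conversion: a data-dependent branch (divergent when
 * executed across SIMD/SIMT lanes) rewritten into predicated, branch-free
 * form.  A real data-parallel compiler emits per-lane predicate registers;
 * here the predicate is an ordinary boolean per element. */
#include <stdio.h>

#define N 8

/* Branchy version: control flow depends on the data. */
static void scale_branchy(const float *x, float *y)
{
    for (int i = 0; i < N; i++) {
        if (x[i] > 0.0f)
            y[i] = 2.0f * x[i];
        else
            y[i] = -x[i];
    }
}

/* Predicated version: both sides computed, the predicate selects the result. */
static void scale_predicated(const float *x, float *y)
{
    for (int i = 0; i < N; i++) {
        int   p     = (x[i] > 0.0f);   /* predicate for this element */
        float taken = 2.0f * x[i];     /* "then" side */
        float other = -x[i];           /* "else" side */
        y[i] = p ? taken : other;      /* select; lowers to a branch-free pick */
    }
}

int main(void)
{
    float x[N] = { -1, 2, -3, 4, -5, 6, -7, 8 }, a[N], b[N];
    scale_branchy(x, a);
    scale_predicated(x, b);
    for (int i = 0; i < N; i++)
        printf("%g %g\n", a[i], b[i]);   /* both versions agree */
    return 0;
}
```

On a wide datapath the predicated form executes the same instruction sequence for every lane, trading extra arithmetic for the elimination of divergent control flow; the paper's thread-aware analyses aim to apply such conversions only when they pay off.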
ISBN (print): 9781450307437
We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer's workflow", and targeting XMT, a PRAM-inspired many-core architecture, we achieve significantly higher speed-ups than previous approaches. Comparison with the fastest known serial Max-Flow implementation on a modern CPU demonstrates, for the first time, the potential for orders-of-magnitude performance improvement for Max-Flow. Using XMT, the PRAM Max-Flow algorithm is also much easier to program than on other parallel platforms, contributing a powerful example toward dual validation of both PRAM algorithmics and XMT.
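As a point of reference only, a compact sequential max-flow based on BFS augmenting paths (Edmonds-Karp) is sketched below on a small adjacency-matrix graph; it is neither the PRAM/XMT algorithm nor the tuned serial implementation used in the paper's comparison, and the capacities are made up.

```c
/* Minimal sketch: sequential Edmonds-Karp max-flow (BFS augmenting paths)
 * on a small adjacency-matrix graph.  A plain serial baseline for
 * illustration only. */
#include <stdio.h>
#include <string.h>

#define V 6

static int cap[V][V];   /* residual capacities */

static int bfs(int s, int t, int parent[V])
{
    int seen[V] = {0}, queue[V], head = 0, tail = 0;
    queue[tail++] = s;
    seen[s] = 1;
    parent[s] = -1;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < V; v++)
            if (!seen[v] && cap[u][v] > 0) {
                parent[v] = u;
                seen[v] = 1;
                if (v == t) return 1;
                queue[tail++] = v;
            }
    }
    return 0;
}

int main(void)
{
    /* Hypothetical capacities on a 6-vertex network, source 0, sink 5. */
    int c[V][V] = {
        {0,16,13, 0, 0, 0},
        {0, 0,10,12, 0, 0},
        {0, 4, 0, 0,14, 0},
        {0, 0, 9, 0, 0,20},
        {0, 0, 0, 7, 0, 4},
        {0, 0, 0, 0, 0, 0}
    };
    memcpy(cap, c, sizeof cap);

    int flow = 0, parent[V];
    while (bfs(0, 5, parent)) {
        int bottleneck = 1 << 30;
        for (int v = 5; v != 0; v = parent[v]) {
            int u = parent[v];
            if (cap[u][v] < bottleneck) bottleneck = cap[u][v];
        }
        for (int v = 5; v != 0; v = parent[v]) {
            int u = parent[v];
            cap[u][v] -= bottleneck;   /* consume residual capacity */
            cap[v][u] += bottleneck;   /* add reverse residual edge */
        }
        flow += bottleneck;
    }
    printf("max flow = %d\n", flow);   /* 23 for this network */
    return 0;
}
```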
ISBN (print): 9798400704161
The idea of dynamic programming (DP), proposed by Bellman in the 1950s, is one of the most important algorithmic techniques. However, in parallel, many fundamental and sequentially simple problems become more challenging, and it remains open whether they admit (nearly) work-efficient solutions (i.e., solutions whose work is off by at most a polylogarithmic factor from the best sequential solution). In fact, sequential DP algorithms employ many advanced optimizations, such as decision monotonicity or special data structures, and achieve better work than straightforward solutions. Many such optimizations are inherently sequential, which creates extra challenges for a parallel algorithm to achieve the same work bound. The goal of this paper is to achieve (nearly) work-efficient parallel DP algorithms by parallelizing classic, highly optimized and practical sequential algorithms. We show a general framework called the Cordon Algorithm for parallel DP algorithms, and use it to solve several classic problems. Our selection of problems includes Longest Increasing Subsequence (LIS), sparse Longest Common Subsequence (LCS), convex/concave generalized Least Weight Subsequence (LWS), Optimal Alphabetic Tree (OAT), and more. We show how the Cordon Algorithm can be used to achieve the same level of optimization as the sequential algorithms, and achieve good parallelism. Many of our algorithms are conceptually simple, and we show some experimental results as proofs-of-concept.
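As one concrete example of the kind of highly optimized sequential baseline such a parallel DP algorithm must match (this is the classic sequential method, not the Cordon Algorithm), the sketch below computes the LIS length in O(n log n) using a tails array and binary search.

```c
/* Minimal sketch: classic O(n log n) sequential LIS via a "tails" array and
 * binary search.  tails[k] holds the smallest possible tail value of a
 * strictly increasing subsequence of length k+1. */
#include <stdio.h>

static int lis_length(const int *a, int n)
{
    int tails[64];           /* assumes n <= 64 for this sketch */
    int len = 0;
    for (int i = 0; i < n; i++) {
        /* Binary search for the first tail >= a[i]. */
        int lo = 0, hi = len;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (tails[mid] < a[i]) lo = mid + 1;
            else hi = mid;
        }
        tails[lo] = a[i];    /* extend or tighten a subsequence */
        if (lo == len) len++;
    }
    return len;
}

int main(void)
{
    int a[] = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5 };
    int n = (int)(sizeof a / sizeof a[0]);
    printf("LIS length = %d\n", lis_length(a, n));   /* 4, e.g. 1, 4, 5, 9 */
    return 0;
}
```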
This paper experimentally validates performance-related issues for parallel computation models on several parallel platforms (a MasPar MP-1 with 1024 processors, a 64-node GCel, and a CM-5 with 64 processors). Our work consists of three parts. First, there is an evaluation part in which we investigate whether the models correctly predict the execution time of an algorithm implementation. Unlike previous work, which mostly demonstrated a close match between the measured and predicted running times, this paper shows that there are situations in which the models do not precisely predict the actual execution time of an algorithm implementation. Second, there is a comparison part in which the models are contrasted with each other in order to determine which model induces the fastest algorithms. Finally, there is an efficiency validation part in which the performance of the model-derived algorithms is compared with the performance of highly optimized library routines to show the effectiveness of deriving fast algorithms through the formalisms of the models.
ISBN (print): 9781595939739
There are many algorithms to solve large sparse linear systems in parallel; however, most of them require synchronization and thus lack scalability. In this paper, we propose a new distributed numerical algorithm, called the Directed Transmission Method (DTM). DTM is a fully asynchronous, scalable, continuous-time iterative algorithm for solving arbitrarily large sparse linear systems whose coefficient matrix is symmetric positive definite (SPD). DTM can run freely on heterogeneous parallel computers with an arbitrary number of processors, which may be many-core microprocessors, clusters, grids, clouds, or the Internet. We prove that DTM is convergent by making use of the final value theorem of the Laplace transform. Numerical experiments show that DTM is efficient.
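For contrast only, the sketch below runs a plain synchronous Jacobi iteration on a small, hypothetical SPD system; it is not DTM. Each sweep ends with an implicit barrier, and removing exactly this kind of global synchronization is what an asynchronous scheme such as DTM is designed to do.

```c
/* Minimal sketch: synchronous Jacobi iteration on a small SPD system Ax = b.
 * Every sweep ends with an implicit barrier (all new values computed before
 * the next sweep starts); asynchronous schemes aim to remove this global
 * synchronization. */
#include <math.h>
#include <stdio.h>

#define N 3

int main(void)
{
    /* Hypothetical diagonally dominant SPD matrix and right-hand side. */
    double A[N][N] = { {4, 1, 0},
                       {1, 4, 1},
                       {0, 1, 4} };
    double b[N] = { 5, 6, 5 };
    double x[N] = { 0, 0, 0 }, xn[N];

    for (int it = 0; it < 100; it++) {
        double diff = 0.0;
        for (int i = 0; i < N; i++) {
            double s = b[i];
            for (int j = 0; j < N; j++)
                if (j != i) s -= A[i][j] * x[j];
            xn[i] = s / A[i][i];
            diff += fabs(xn[i] - x[i]);
        }
        for (int i = 0; i < N; i++) x[i] = xn[i];   /* the "barrier" point */
        if (diff < 1e-12) break;
    }
    printf("x = %.6f %.6f %.6f\n", x[0], x[1], x[2]);   /* approaches 1, 1, 1 */
    return 0;
}
```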