ISBN:
(Print) 9798400704161
We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously. Given a batch of edge deletion or insertion updates to the graph, our parallel algorithm adjusts the maximal matching to these updates in poly(log n) depth and using poly(log n) amortized work per update. That is, the amortized work for processing a batch of k updates is k poly(log n), while all this work is done in poly(log n) depth, with high probability. This can be seen as a parallel counterpart of the sequential dynamic algorithms for constant-approximate and maximal matching [Onak and Rubinfeld STOC'10; Baswana, Gupta, and Sen FOCS'11; and Solomon FOCS'16]. Our algorithm readily generalizes to maximal matching in hypergraphs of rank r (where each hyperedge has at most r endpoints) with a poly(r) increase in work, while retaining the poly(log n) depth.
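To illustrate the batch-update setting the abstract describes, here is a minimal sequential sketch (not the paper's parallel algorithm, and without its work/depth guarantees) of repairing a maximal matching after a batch of edge deletions and insertions:

```python
# A toy sequential sketch: apply a batch of edge updates, then greedily
# rematch free vertices so the matching is maximal again.
def repair_maximal_matching(adj, matching, deleted_edges, inserted_edges):
    """adj: dict vertex -> set of neighbors; matching: set of frozenset edges."""
    # Apply deletions: drop deleted edges from the graph and the matching.
    for u, v in deleted_edges:
        adj[u].discard(v)
        adj[v].discard(u)
        matching.discard(frozenset((u, v)))
    # Apply insertions to the graph.
    for u, v in inserted_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Greedily rematch: any unmatched vertex grabs a free neighbor if one exists.
    matched = {x for e in matching for x in e}
    for u in list(adj):
        if u in matched:
            continue
        for v in adj[u]:
            if v not in matched:
                matching.add(frozenset((u, v)))
                matched.update((u, v))
                break
    return matching
```

The paper's contribution is doing this kind of repair for k updates in poly(log n) depth with k poly(log n) work; the sketch above only conveys the invariant being maintained.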
ISBN:
(Print) 9780897918091
We consider the problem of sorting a file of N records on the D-disk model of parallel I/O [VS94], in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the forecasting technique of [Knu73], our algorithm is able to read in, at any time, the 'right' block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the 'right' blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. Our analysis technique involves a novel reduction to various maximum occupancy problems. We prove that the expected I/O performance of SRM is efficient under varying sizes of memory and that it compares favorably in practice to disk-striped mergesort (DSM). Our studies indicate that SRM outperforms DSM even when the number D of parallel disks is fairly small.
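The forecasting idea the abstract generalizes can be shown in a toy, single-processor sketch (this is not SRM itself): among the blocks currently in memory, the run whose block has the smallest last key will run dry first, so its next block is the 'right' one to fetch.

```python
# Toy forecast-based multiway merge: each sorted run lives on 'disk' as
# fixed-size blocks; only one block per run is in memory at a time.
def merge_with_forecasting(runs, block_size=2):
    blocks = [[r[i:i + block_size] for i in range(0, len(r), block_size)]
              for r in runs]
    mem = [b.pop(0) if b else [] for b in blocks]  # current block of each run
    out, fetch_order = [], []
    while any(mem):
        # Emit the globally smallest head element among in-memory blocks.
        _, r = min((blk[0], i) for i, blk in enumerate(mem) if blk)
        out.append(mem[r].pop(0))
        if not mem[r] and blocks[r]:
            # Forecasting predicts exactly this event: the run whose block
            # had the smallest LAST key runs dry first, so read its next block.
            mem[r] = blocks[r].pop(0)
            fetch_order.append(r)
    return out, fetch_order
```

SRM's contribution lies in making such prefetch decisions across D parallel disks and, via flushing, evicting blocks without extra I/O; the sketch only conveys the forecasting rule.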
ISBN:
(Print) 9798400704161
We introduce PASGAL (Parallel and Scalable Graph Algorithm Library), a parallel graph library that scales to a variety of graph types, many processors, and large graphs. One special focus of PASGAL is efficiency on large-diameter graphs, which is a common challenge for many existing parallel graph processing systems due to the high overhead of synchronizing threads when traversing the graph in breadth-first order. The core idea in PASGAL is a technique called vertical granularity control (VGC), which hides synchronization overhead through careful algorithm redesign and new data structures. We compare PASGAL with existing parallel implementations on several fundamental graph problems. PASGAL is always competitive on small-diameter graphs, and is significantly faster on large-diameter graphs.
We introduce the Autonomous SIMD (ASIMD) massively parallel architecture, then look at the flexibility, cost, and effectiveness of MIMD and ASIMD parallel systems. We show that ASIMD systems such as the MasPar MP-1 and MP...
We consider the problem of sorting a file of N records on the D-disk model of parallel I/O, in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the technique of forecasting, our algorithm is able to read in, at any time, the 'right' block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the 'right' blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. By analysis of generalized maximum occupancy problems, we are able to derive an analytical upper bound on SRM's expected overhead that is valid for arbitrary inputs. The derived upper bound on the expected I/O performance of SRM indicates that SRM is provably better than disk-striped mergesort (DSM) for realistic parameter values D, M, and B. Average-case simulations show further improvement on the analytical upper bound. Unlike previously proposed optimal sorting algorithms, SRM outperforms DSM even when the number D of parallel disks is small.
ISBN:
(Print) 9780897918091
Modern processors have many levels of parallelism arising from multiple functional units and pipeline stages. In this paper, we consider the interplay between instruction scheduling performed by a compiler and instruction lookahead performed by hardware. Anticipatory instruction scheduling is the process of rearranging instructions within each basic block so as to minimize the overall completion time of a set of basic blocks in the presence of hardware instruction lookahead, while preserving safety by not moving any instructions beyond basic block boundaries. Anticipatory instruction scheduling delivers many of the benefits of global instruction scheduling by accounting for instruction overlap across basic block boundaries arising from hardware lookahead, without compromising safety (as in some speculative scheduling techniques) or serviceability of the compiled program. We present the first provably optimal algorithm for a special case of anticipatory instruction scheduling for a trace of basic blocks on a machine with arbitrary-size lookahead windows. We extend this result to the version of the problem in which a trace of basic blocks is contained within a loop. In addition, we discuss how to modify these special-case optimal algorithms to obtain heuristics for the more general (but NP-hard) problems that occur in practice.
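For readers unfamiliar with the underlying scheduling problem, here is a minimal greedy list-scheduling sketch for a single basic block (a common textbook heuristic, not the paper's provably optimal anticipatory algorithm, and with no lookahead modeling); `deps` and `latency` are illustrative inputs:

```python
# deps: instr -> set of instrs it depends on; latency: instr -> result latency
# in cycles. Issues at most one instruction per cycle, respecting latencies.
def list_schedule(deps, latency):
    done_at = {}              # instr -> cycle its result becomes available
    schedule, cycle = [], 0
    pending = set(deps)
    while pending:
        # An instruction may issue once every dependence has completed.
        issuable = [i for i in pending
                    if all(d in done_at and done_at[d] <= cycle
                           for d in deps[i])]
        if issuable:
            i = min(issuable)                 # deterministic tie-break
            schedule.append((cycle, i))
            done_at[i] = cycle + latency[i]
            pending.remove(i)
        cycle += 1
    return schedule
```

The paper's setting differs in that the hardware's lookahead window lets instructions from a later basic block overlap with the tail of the current one, which is what the anticipatory reordering exploits.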
ISBN:
(Print) 0769525091
A semi-dynamic system is presented that is capable of predicting the performance of parallel programs at runtime. The functionality provided by the system allows for efficient handling of the portability and irregularity of parallel programs. Two forms of parallelism are addressed: loop-level parallelism and task-level parallelism.
The proceedings contain 48 papers from the thirteenth annual ACM Symposium on Parallel Algorithms and Architectures. Topics discussed include: compact routing schemes; simple on-line algorithms for the maximum disjoint paths problem; competitive buffer management for shared-memory switches; attack propagation in networks; the computational power of pipelined memory hierarchies; and a data tracking scheme for general networks.
ISBN:
(Print) 9780897918091
We present parallel algorithms that maintain a 2-3 tree under insertions and deletions. The algorithms are designed for an extension of Valiant's BSP model, BSP*, that rewards the reduction of the overhead involved in communication. The BSP* model is introduced by Baumker et al. in [2]. Our analysis of the data structure goes beyond standard asymptotic analysis: we use Valiant's notion of c-optimality. Intuitively, c-optimal algorithms tend to a speedup of p/c with growing input size (p denotes the number of processors), where the communication time is asymptotically smaller than the computation time. Our first approach allows 1-optimal searching and amortized c-optimal insertion and deletion for a small constant c. The second one allows 2-optimal searching, and c-optimal deletion and insertion for a small constant c. Both results hold with probability 1-o(1) for wide ranges of BSP* parameters, where the ranges become larger with growing input sizes. The first approach allows much larger ranges. Further, both approaches are memory efficient: the total amount of memory used is proportional to the size m of the set being stored. Our results improve previous results by supporting a fully dynamic search tree rather than a static one, and by significantly reducing the communication time. Further, our algorithms use blockwise communication.
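As background on the data structure being maintained, here is a minimal sequential sketch of searching a 2-3 tree (each internal node holds one or two keys and has two or three children); the `Node` class is an illustrative stand-in, not the paper's BSP* representation:

```python
# A 2-3 tree node: 'keys' is a sorted list of 1 or 2 keys; leaves have
# no children, internal nodes have len(keys) + 1 children.
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, key):
    while node is not None:
        if key in node.keys:
            return True
        if not node.children:
            return False          # reached a leaf without finding the key
        # Descend into the child whose key range contains 'key':
        # count how many of the node's keys are smaller than it.
        idx = sum(key > k for k in node.keys)
        node = node.children[idx]
    return False
```

The paper's algorithms distribute such trees across p processors and make searches and rebalancing communication-efficient in the BSP* sense; the sketch only fixes the sequential search invariant.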