In this paper we present a simple parallel sorting algorithm and illustrate two applications. the algorithm (called the (l, m)-merge sort (LMM)) is an extension of the bitonic and odd-even merge sorts. Literature on p...
详细信息
In this paper we present a simple parallel sorting algorithm and illustrate two applications. the algorithm (called the (l, m)-merge sort (LMM)) is an extension of the bitonic and odd-even merge sorts. Literature on parallel sorting is abundant. Many of the algorithms proposed, though being theoretically important, may not perform satisfactorily in practice owing to large constants in their time bounds. the algorithm to be presented in this paper, due partly to its simplicity, results in small constants. We present an implementation for the parallel disk sorting problem. the algorithm is asymptotically optimal (assuming that N is a polynomial in M, where N is the number of records to be sorted and M is the internal memory size). the underlying constant is very small. this algorithm has a better performance than the disk-striped mergesort (DSM) algorithm when the number of disks is large. Our implementation is as simple as that of DSM (requiring no fancy data structures or prefetch techniques). As a second application, we prove that we can get a sparse enumeration sort on the hypercube that is simpler than that of the classical algorithm of Nassimi and Sahni. We also show that Leighton's columnsort algorithm is a special case of LMM.
In this paper, we present a new type of graph decomposition called a cut-cover that combines the notions of graph separators and t-neighborhood covers. We show that graphs with good cut-covers can be emulated in hyper...
详细信息
the proceedings contain 37 papers. the topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under unc...
详细信息
ISBN:
(纸本)159593667X
the proceedings contain 37 papers. the topics discussed include: on triangulation of simple networks;strong-diameter decompositions of minor free graphs;approximation algorithms for multiprocessor scheduling under uncertainty;scheduling DAGs on asynchronous processors;scheduling to minimize gaps and power consumption;cache-oblivious streaming B-trees;an experimental comparison of cache-oblivious and cache-conscious programs;scheduling threads for constructive cache sharing on CMPs;proximity-aware directory-based coherence for multi-core processor architectures;a parallel dynamic programming algorithm on a multi-core architecture;tight bounds for distributed selection;local MST computation with short advice;distributed approximation of capacitated dominating sets;packing to angles and sectors;and the notion of a timed register and its application to indulgent synchronization.
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. there have been many parallel dynamic programming algorithms. the purpose of this paper is to study a family of dyn...
详细信息
ISBN:
(纸本)9781595936677
Dynamic programming is an efficient technique to solve combinatorial search and optimization problem. there have been many parallel dynamic programming algorithms. the purpose of this paper is to study a family of dynamic programming algorithm where data dependence appear between non-consecutive stages, in other words, the data dependence is non-uniform. this kind of dynamic programming is typically called nonserial polyadic dynamic programming. Owing to the: non-uniform data dependence;it is harder to optimize this problem for parallelism and locality on parallelarchitectures. In this paper, we address the chanllenge of exploiting fine grain parallelism and locality of nonserial polyadic dynamic programming on a multi-core architecture. We present a programming and execution model for multi-core architectures with memory hierarchy. In the framework of the new model, the parallelism and locality benifit from a data dependence transformation. We propose a parallel pipelined algorithm for filling the dynamic programming matrix by decomposing the computation operators. the new parallel algorithm tolerates the memory access latency using multi-thread and is easily improved withthe technique. We formulate and analytically solve the optimization problem determing the the size that minimizes the total execution time. the experiments on a simulator give a validation of the proposed model and show that the fine grain parallel algorithm achieves sub-linear speedup and that a potential high scalability on multi-core arichitecture.
We describe an efficient parallel algorithm to solve the single function coarsest partition problem. the algorithm runs in O(logn) time using 0(n log log n) operations on the Arbitrary CRCW PRAM. the previous best kno...
详细信息
Multi-core architectures are commonly used for network applications because the workload is highly parallelizable. Packet scheduling is a critical performance component of these applications and significantly impacts ...
详细信息
the proceedings contain 39 papers. the topics discussed include: Fast greedy algorithms in MapReduce and streaming;reduced hardware transactions: a new approach to hybrid transactional memory;recursive design of hardw...
the proceedings contain 39 papers. the topics discussed include: Fast greedy algorithms in MapReduce and streaming;reduced hardware transactions: a new approach to hybrid transactional memory;recursive design of hardware priority queues;drop the anchor: lightweight memory management for non-blocking data structures;scalable statistics counters;storage and search in dynamic peer-to-peer networks;expected sum and maximum of displacement of random sensors for coverage of a domain;on dynamics in selfish network creation;brief announcement: truly parallel burrows-wheeler compression and decompression;brief announcement: locality in wireless scheduling;brief announcement: universally truthful secondary spectrum auctions;and brief announcement: online batch scheduling for flow objectives.
the proceedings contain 52 papers. the topics discussed include: randomized approximate nearest neighbor search with limited adaptivity;encoding short ranges in TCAM without expansion: efficient algorithm and applicat...
ISBN:
(纸本)9781450342100
the proceedings contain 52 papers. the topics discussed include: randomized approximate nearest neighbor search with limited adaptivity;encoding short ranges in TCAM without expansion: efficient algorithm and applications;extending the nested parallel model to the nested dataflow model with provably efficient schedulers;latency-hiding work stealing: scheduling interacting parallel computations with work stealing;provably good and practically efficient parallel race detection for fork-join programs;dynamic determinacy race detection for task parallelism with futures;RUBIC: online parallelism tuning for co-located transactional memory applications;investigating the performance of hardware transactions on a multi-socket machine;parallelalgorithms for asymmetric read-write costs;general profit scheduling and the power of migration on heterogeneous machines;the power of migration in online machine minimization;fair online scheduling for selfish jobs on heterogeneous machines;scheduling parallelizable jobs online to minimize the maximum flow time;clairvoyant dynamic bin packing for job scheduling with minimum server usage time;parallel equivalence class sorting: algorithms, lower bounds, and distribution-based analysis;and parallel approaches to the string matching problem on the GPU.
the design and use of scalable parallel computers is discussed. Much of the knowledge gained in the study of distributd systems is relevant, although it needs to be used in new manners. the author argues that the chal...
详细信息
ISBN:
(纸本)0897916131
the design and use of scalable parallel computers is discussed. Much of the knowledge gained in the study of distributd systems is relevant, although it needs to be used in new manners. the author argues that the challenge is to achieve maximum reuse of existing technologies, microprocessor architectures, I/O systems, programming languages, operating systems, application codes, without compromising too much parallel performance and scalability. Such skillful compromises are essential to the success of the parallel processing industry, and pose even more intellectual challenges than the 'start from scratch' approach.
Chips (or chip sets) which include one or more CPUs, some local memory, and rudimentary communications and routing hardware are becoming common (eg. transputers, SRCs HNet, the nodes of most MIMD machines). these chip...
详细信息
暂无评论