ISBN: 9780897919890 (print)
This paper envisions an extension to a standard instruction set that efficiently implements PRAM-style algorithms using explicit multi-threaded instruction-level parallelism (ILP). Explicit Multi-Threading (XMT), a fine-grained computational paradigm covering the spectrum from algorithms through architecture to implementation, is introduced; new elements are added where needed.
ISBN: 9781450307437 (print)
We present parallel greedy approximation algorithms for set cover and related problems. These algorithms build on an algorithm for a graph problem that we formulate and study, called Maximal Nearly Independent Set (MANIS), a graph abstraction of a key component in existing work on parallel set cover. We derive a randomized algorithm for MANIS that has O(m) work and O(log^2 m) depth on input with m edges. Using MANIS, we obtain RNC algorithms that yield a (1 + epsilon)H_n-approximation for set cover, a (1 - 1/e - epsilon)-approximation for max cover, and a (4 + epsilon)-approximation for min-sum set cover, all in linear work; and an O(log* n)-approximation for asymmetric k-center for k <= log^{O(1)} n and a (1.861 + epsilon)-approximation for metric facility location, both in essentially the same work bounds as their sequential counterparts.
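For orientation, the sequential greedy rule that these parallel algorithms approximate can be sketched as follows. This is a minimal illustration of the classic sequential baseline, not the MANIS-based parallel algorithm from the abstract; the function names `greedy_set_cover` and `harmonic` are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Sequential greedy set cover (illustrative baseline, not MANIS).

    Repeatedly picks the subset that covers the most still-uncovered
    elements and returns the indices of the chosen subsets.
    """
    subsets = [set(s) for s in subsets]
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest marginal gain.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        gain = uncovered & subsets[best]
        if not gain:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, the factor in the greedy approximation bound."""
    return sum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    cover = greedy_set_cover(range(1, 6), [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
    print(cover, harmonic(5))
```

The inherently sequential part is the repeated "pick the best remaining subset" step, which is the dependency a parallel formulation has to relax.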
ISBN: 9798400704161 (print)
Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications. While shared-memory parallel partitioners have recently matured to achieve the same quality as widely used sequential partitioners, there is still a pronounced quality gap between distributed partitioners and their sequential counterparts. In this work, we shrink this gap considerably by describing the engineering of an unconstrained local search algorithm suitable for distributed partitioners. We integrate the proposed algorithm in a distributed multilevel partitioner. Our extensive experiments show that the resulting algorithm scales to thousands of PEs while computing cuts that are, on average, only 3.5% larger than those of a state-of-the-art high-quality shared-memory partitioner. Compared to previous distributed partitioners, we obtain on average 6.8% smaller cuts than the best-performing competitor while being more than 9 times faster.
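The two quantities that such partitioners trade off, edge cut and block balance, can be computed as in the sketch below. The data layout (weighted edge triples, a `block_of` mapping) and the function names are assumptions made for illustration; they are not taken from the paper.

```python
def edge_cut(edges, block_of):
    """Total weight of edges whose endpoints lie in different blocks.

    edges: iterable of (u, v, weight) triples (assumed layout).
    block_of: mapping from node to block index.
    """
    return sum(w for u, v, w in edges if block_of[u] != block_of[v])


def imbalance(node_weights, block_of, k):
    """Relative overweight of the heaviest block versus the ideal weight total/k."""
    block_weight = [0] * k
    for node, w in node_weights.items():
        block_weight[block_of[node]] += w
    ideal = sum(block_weight) / k
    return max(block_weight) / ideal - 1.0


if __name__ == "__main__":
    # A 4-cycle split into two blocks of two nodes each.
    edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
    block_of = {0: 0, 1: 0, 2: 1, 3: 1}
    weights = {0: 1, 1: 1, 2: 1, 3: 1}
    print(edge_cut(edges, block_of), imbalance(weights, block_of, 2))
```

An imbalance of 0.0 means all blocks have exactly the ideal weight; partitioners typically accept a small positive tolerance.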
ISBN: 9780897917179 (print)
Techniques for quickly executing lengthy computations by the use of molecular parallelism are described. It is demonstrated that molecular computations can be done using short DNA strands by more or less conventional biotechnology engineering techniques within a small number of laboratory steps. Two abstract models of molecular computation are proposed. The first, the Parallel Associative Memory Model, is a very high-level model that includes a Parallel Associative Matching operation, which appears to improve the power of molecular parallelism beyond the operations previously considered by Lipton (1994). The second, the Recombinant DNA Model, is a low-level model that allows operations that are abstractions of very well understood recombinant DNA operations and provides a representation, herein called a complex, for the relevant structural properties of DNA.
NEighborhood MOdeling (NEMO) provides transparent parallelism for developing parallel spatial data processing applications. It addresses the following parallel issues: architecture and machine independence; communication bottlenecks; data visualization; causality errors; load balancing; and data coherence. NEMO can process three types of time-consuming raster neighborhood models: cellular automata, propagation, and neighborhood analysis. NEMO achieves this flexibility through five components in its design: three application drivers (the cellular automata driver, the propagation driver, and the neighborhood analysis automata driver), the display manager, and the raster database manager.
ISBN: 9798400704161 (print)
The idea of dynamic programming (DP), proposed by Bellman in the 1950s, is one of the most important algorithmic techniques. In the parallel setting, however, many fundamental and sequentially simple problems become more challenging, and it remains open whether they admit a (nearly) work-efficient solution (i.e., one whose work is off by at most a polylogarithmic factor from the best sequential solution). In fact, sequential DP algorithms employ many advanced optimizations, such as decision monotonicity or special data structures, and achieve better work than straightforward solutions. Many such optimizations are inherently sequential, which creates extra challenges for a parallel algorithm to achieve the same work bound. The goal of this paper is to achieve (nearly) work-efficient parallel DP algorithms by parallelizing classic, highly optimized, and practical sequential algorithms. We show a general framework called the Cordon Algorithm for parallel DP algorithms, and use it to solve several classic problems. Our selection of problems includes Longest Increasing Subsequence (LIS), sparse Longest Common Subsequence (LCS), convex/concave generalized Least Weight Subsequence (LWS), Optimal Alphabetic Tree (OAT), and more. We show how the Cordon Algorithm can be used to achieve the same level of optimization as the sequential algorithms while achieving good parallelism. Many of our algorithms are conceptually simple, and we show some experimental results as proofs of concept.
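As an example of the kind of optimized sequential DP the abstract refers to, the textbook O(n log n) algorithm for LIS replaces the quadratic recurrence with a binary search over a "tails" array. The sketch below is that sequential baseline only, not the Cordon Algorithm; the function name `lis_length` is hypothetical.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence in O(n log n).

    tails[i] holds the smallest possible last element of an increasing
    subsequence of length i + 1 found so far.
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)  # first position whose tail is >= x
        if i == len(tails):
            tails.append(x)        # x extends the longest subsequence
        else:
            tails[i] = x           # x gives a smaller tail for length i + 1
    return len(tails)


if __name__ == "__main__":
    print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))
```

Each iteration depends on the `tails` array left by the previous one, which illustrates why such optimizations are hard to parallelize without increasing the work.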
An HMM is a very simple parallel machine consisting of finite state devices that can manipulate pointers to each other. The more commonly studied PRAM is a much richer parallel machine with a shared global memory and ...
This paper experimentally validates performance-related issues for parallel computation models on several parallel platforms (a MasPar MP-1 with 1024 processors, a 64-node GCel, and a CM-5 with 64 processors). Our work consists of three parts. First, there is an evaluation part in which we investigate whether the models correctly predict the execution time of an algorithm implementation. Unlike previous work, which mostly demonstrated a close match between the measured and predicted running times, this paper shows that there are situations in which the models do not precisely predict the actual execution time of an algorithm implementation. Second, there is a comparison part in which the models are contrasted with each other in order to determine which model induces the fastest algorithms. Finally, there is an efficiency validation part in which the performance of the model-derived algorithms is compared with the performance of highly optimized library routines to show the effectiveness of deriving fast algorithms through the formalisms of the models.
We discuss parallel sorting algorithms and their implementations suitable for cluster architectures in order to optimize cluster resources. We focus on the time spent in computation and on the load balancing properties when processors run at different speeds, i.e., speeds related by a multiplicative constant factor (our weak definition of a heterogeneous platform). One scheme is under study: parallel sorting by sampling (either the regular sampling technique introduced by Shi and Schaeffer [J. Parallel Distrib. Comput. 14 (4) (1992) 361] or the over-partitioning scheme introduced by Li and Sevcik [Parallel sorting by over-partitioning, in: Proceedings of the Sixth Annual Symposium on Parallel Algorithms and Architectures, ACM Press, New York, June 1994]). What matters in the paper is mainly the load balance factor and not necessarily the execution time. It is clear that improved load balance leads to improved execution time. The results presented in the paper demonstrate that load balancing for computers with heterogeneous processing capacity is more challenging than for the homogeneous case. The survey, through the sorting case study, allows us to identify some algorithmic issues and software challenges that must be mastered on heterogeneous cluster platforms in order to better utilize them: data decomposition techniques, scheduling, and load balancing methods. © 2002 Elsevier Science B.V. All rights reserved.
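The regular sampling idea can be sketched by simulating the partitioning on a single machine. This is a simplified illustration for p equal-speed processors with at least p input elements; it does not model heterogeneous speeds or the over-partitioning variant, the sample and pivot positions are a simplification of the published scheme, and the names `psrs_partition` and `load_balance_factor` are hypothetical.

```python
from bisect import bisect_right


def psrs_partition(data, p):
    """Simulate sorting by regular sampling; return p sorted partitions.

    Assumes len(data) >= p > 0. Concatenating the result is sorted.
    """
    n = len(data)
    # Step 1: split into p contiguous chunks and sort each one locally.
    chunks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # Step 2: take up to p regularly spaced samples from each sorted chunk.
    samples = []
    for chunk in chunks:
        step = max(1, len(chunk) // p)
        samples.extend(chunk[::step][:p])
    samples.sort()
    # Step 3: choose p - 1 pivots at regular positions among the samples.
    pivots = [samples[i * len(samples) // p] for i in range(1, p)]
    # Step 4: route every element to the partition its pivots delimit.
    parts = [[] for _ in range(p)]
    for chunk in chunks:
        for x in chunk:
            parts[bisect_right(pivots, x)].append(x)
    return [sorted(part) for part in parts]


def load_balance_factor(parts):
    """Largest partition size divided by the average size (1.0 is perfect)."""
    sizes = [len(part) for part in parts]
    return max(sizes) / (sum(sizes) / len(sizes))


if __name__ == "__main__":
    parts = psrs_partition(list(range(100, 0, -1)), 4)
    print([len(part) for part in parts], load_balance_factor(parts))
```

On a heterogeneous platform one would presumably want partition sizes proportional to processor speed rather than equal, which is the difficulty the abstract highlights.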
ISBN: 9781595939739 (print)
There are many algorithms for solving large sparse linear systems in parallel; however, most of them require synchronization and thus lack scalability. In this paper, we propose a new distributed numerical algorithm called the Directed Transmission Method (DTM). DTM is a fully asynchronous, scalable, continuous-time iterative algorithm for solving arbitrarily large sparse linear systems whose coefficient matrix is symmetric positive definite (SPD). DTM can run freely on heterogeneous parallel computers with an arbitrary number of processors, which might be manycore microprocessors, clusters, grids, clouds, or the Internet. We prove that DTM is convergent by making use of the final value theorem of the Laplace transform. Numerical experiments show that DTM is efficient.