In this paper we consider the problem of finding perfect matchings in parallel. We present a RNC algorithm with almost optimal work with respect to sequential algorithms, i.e., it uses O(n(omega)) processors, where om...
详细信息
In this paper we consider the problem of finding perfect matchings in parallel. We present a RNC algorithm with almost optimal work with respect to sequential algorithms, i.e., it uses O(n(omega)) processors, where omega is the matrix multiplication exponent. Our algorithm is based on an RNC algorithm for computing determinant of a degree one polynomial matrix which is of independent interest.
The current paper provides preliminary statements of the panelists ahead of a panel discussion at the acm SPAA 2021 conference on the topic: algorithm-friendly architecture versus architecture-friendly algorithms. ...
详细信息
Gang scheduling is a resource management scheme for parallel and distributed systems that combines time-sharing with space-sharing to ensure short response times for interactive tasks and high overall system throughpu...
详细信息
Gang scheduling is a resource management scheme for parallel and distributed systems that combines time-sharing with space-sharing to ensure short response times for interactive tasks and high overall system throughput. In this paper, we present and analyze a queueing theoretic model for a general gang scheduling scheme that forms the basis of a multiprogramming environment currently being developed for IBM's SP2 parallel system and for clusters of workstations. Our model and analysis can be used to tune our scheduler in order to maximize its performance on each hardware platform.
This paper describes the derivation of an empirically efficient parallel two-dimensional Delaunay triangulation program from a theoretically efficient CREW PRAM algorithm. Compared to previous work, the resulting impl...
详细信息
ISBN:
(纸本)9780897918909
This paper describes the derivation of an empirically efficient parallel two-dimensional Delaunay triangulation program from a theoretically efficient CREW PRAM algorithm. Compared to previous work, the resulting implementation is not limited to datasets with a uniform distribution of points, achieves significantly better speedups over good serial code, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for a loosely-coupled cluster of workstations, a distributed-memory multicomputer, and a shared-memory multiprocessor. The Machiavelli toolkit used to transform the nested data parallelism inherent in the divide-and-conquer algorithm into achievable task and data parallelism is also described and compared to previous techniques.
There are many algorithms to solve large sparse linear systems in parallel;however, most of them acquire synchronization and thus are lack of scalability. In this paper, we propose a new distributed numerical algorith...
详细信息
ISBN:
(纸本)9781595939739
There are many algorithms to solve large sparse linear systems in parallel;however, most of them acquire synchronization and thus are lack of scalability. In this paper, we propose a new distributed numerical algorithm, called Directed Transmission Method (DTM). DTM is a fully asynchronous, scalable and continuous-time iterative algorithm to solve the arbitrarily-large sparse linear system whose coefficient matrix is symmetric-positive-definite (SPD). DTM is able to be freely running on the heterogeneous parallel computer with arbitrary number of processors, which might be manycore microprocessors, clusters, grids, clouds, and the Internet. We proved that DTM is convergent by making use of the final value theorem of Laplacian Transformation. Numerical experiments show that DTM is efficient.
In this paper we develop models for and analyze several randomized work stealing algorithms in a dynamic setting. Our models represent the limiting behavior of systems as the number of processors grows to infinity usi...
详细信息
ISBN:
(纸本)9780897919890
In this paper we develop models for and analyze several randomized work stealing algorithms in a dynamic setting. Our models represent the limiting behavior of systems as the number of processors grows to infinity using differential equations. The advantages of this approach include the ability to model a large variety of systems and to provide accurate numerical approximations of system behavior even when the number of processors is relatively small. We show how this approach can yield significant intuition about the behavior of work stealing algorithms in realistic settings.
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LOGP) account for bandwidth limitations...
详细信息
Recently there has been an increasing interest in models of parallel computation that account for the bandwidth limitations in communication networks. Some models (e.g., BSP and LOGP) account for bandwidth limitations using a per-processor parameter g>1, such that each processor can send/receive at most h messages in g·h time. Other models (e.g., PRAM(m)) account for bandwidth limitations as an aggregate parameter mΩ(√lg p) separation known previously.
A straight-line grid embedding of a planar graph is a drawing of the graph on a plane where the vertices are located at grid points and the edges are represented by nonintersecting segments of straight lines joining t...
详细信息
ISBN:
(纸本)089791483X
A straight-line grid embedding of a planar graph is a drawing of the graph on a plane where the vertices are located at grid points and the edges are represented by nonintersecting segments of straight lines joining their incident vertices. Given an n-vertex planar graph with n ≥ 3, a straight-line embedding on a grid of size (n - 2) × (n - 2) can be computed deterministically in O(log n log log n) time with O(n log log n) work on a parallel random access machine. If randomization is used, the complexity is improved to O(log n) expected time with the same work bound. The parallel random access machine used by these algorithms allows concurrent reads and concurrent writes of the shared memory;in case of a write conflict, an arbitrary processor succeeds.
The contributions of this paper are twofold. First, we outline criteria by which any model of asynchronous shared memory parallel computation can be judged. Previous models are considered with respect to these factors...
详细信息
ISBN:
(纸本)0897913701
The contributions of this paper are twofold. First, we outline criteria by which any model of asynchronous shared memory parallel computation can be judged. Previous models are considered with respect to these factors. Next, we introduce a new model, and show that this model fulfils all the listed requirements. We also analyze in our model the complexity of several fundamental parallelalgorithms.
The issue of effectiveness of private caches for processors were studied. Since time for all processors to access the shared memory simultaneously is usually much longer than the time for a processor to access its own...
详细信息
ISBN:
(纸本)9780897918091
The issue of effectiveness of private caches for processors were studied. Since time for all processors to access the shared memory simultaneously is usually much longer than the time for a processor to access its own private cache, scheduling with private caches falls into the distributed memory model where the lower bound applies. The effectiveness of private caches were shown by proving that a version of Dynamic Equi-partition Scheduling Policy (DEQ) achieves a mean response time with five times the optimal mean response time in the cache clock time for a large class of parallel jobs well accepted in the parallel scheduling community. This shows an improvement of system performance by using private caches over that of purely shared memory.
暂无评论