An SIMD implementation of a method for approximating the stationary distribution vector of a Markov chain is presented. A key feature of the implementation is the simultaneous computation of several matrix inverses. C...
详细信息
ISBN:
(纸本)081864222X
An SIMD implementation of a method for approximating the stationary distribution vector of a Markov chain is presented. A key feature of the implementation is the simultaneous computation of several matrix inverses. Computational results from a MasPar MP-1 system are discussed.
We give work-optimal and polylogarithmic time parallel algorithms for solving the normalized edit distance problem The normalized edit distance between two strings X and Y with lengths n greater than or equal to m is ...
详细信息
ISBN:
(纸本)0818676833
We give work-optimal and polylogarithmic time parallel algorithms for solving the normalized edit distance problem The normalized edit distance between two strings X and Y with lengths n greater than or equal to m is the minimum quotient of the sum of the costs of edit operations transforming X into Y by the length of the edit path corresponding to those edit operations. Marzal and Vidal proposed a sequential algorithm with a time complexity of O(nm(2)). Ne show that this algorithm can be parallelized work-optimally on an array of n (or m) processors, and on a mesh of n x m processors. We then propose a sublinear time algorithm that is almost work-optimal: using O(mn(1.75)) processors, the time complexity of the algorithm is O(n(0.75) log n) and the total number of operations is O(mn(2.5) log n). This algorithm runs on a CREW PRAM, but is likely to work an weaker PRAM models and hypercubes with minor modifications. Finally, we present a polylogarithmic O(log(2) n) time algorithm based an matrix multiplication which runs on a O(n(6)/log n) processor hypercube.
This paper proposes a linear, deterministic, logical time model for distributed systems. We give an account of causality within distributed systems which undergirds the time model. We discuss some advantages for the a...
详细信息
This paper proposes a linear, deterministic, logical time model for distributed systems. We give an account of causality within distributed systems which undergirds the time model. We discuss some advantages for the application programmer in using our time model.
On a GPU cluster, the ratio of high computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we p...
详细信息
ISBN:
(纸本)9781538643686
On a GPU cluster, the ratio of high computing power to communication bandwidth makes scaling breadth-first search (BFS) on a scale-free graph extremely challenging. By separating high and low out-degree vertices, we present an implementation with scalable computation and a model for scalable communication for BFS and direction-optimized BFS. Our communication model uses global reduction for high-degree vertices, and point-to-point transmission for low-degree vertices. Leveraging the characteristics of degree separation, we reduce the graph size to one third of the conventional edge list representation. With several other optimizations, we observe linear weak scaling as we increase the number of GPUs, and achieve 259.8 GTEPS on a scale-33 Graph500 RMAT graph with 124 GPUs on the latest CORAL early access system.
Following earlier work on independent multi-walk parallel local search, we present in this paper a framework for dependent multi-walk and its implementation. The new framework provides the possibility to communicate c...
详细信息
ISBN:
(纸本)9781479941162
Following earlier work on independent multi-walk parallel local search, we present in this paper a framework for dependent multi-walk and its implementation. The new framework provides the possibility to communicate configurations between concurrent local search engines in order to better focus the overall search on promising configurations. An MPI-based implementation has been realized and its evaluation on various benchmarks is ongoing.
This paper provides necessary and sufficient conditions for deadlock-free unicast and multicast routing with the path-based routing model in interconnection networks which use the wormhole switching technique. The the...
详细信息
This paper provides necessary and sufficient conditions for deadlock-free unicast and multicast routing with the path-based routing model in interconnection networks which use the wormhole switching technique. The theory is developed around three central concepts: channel waiting, False Resource Cycles, and valid destination sets. The first two concepts are suitable extensions to those developed for unicast routing by two authors of this paper;the third concept has been developed by Lin and Ni. The necessary and sufficient conditions can be applied in a straightforward manner to prove deadlock freedom and to find more adaptive routing algorithms for collective communication. The latter point is illustrated by developing two routing algorithms for multicast communication in 2-D mesh architectures. The first algorithm uses fewer resources (channels) than an algorithm proposed in the literature but achieves the same adaptivity. The second achieves full adaptivity for multicast routing.
Presents an implementation of a depth-first heuristic tree search on the single-instruction, multiple-data (SIMD) Connection Machine. The algorithm is based on Iterative-Deepening-A. Until recently, only highly regula...
详细信息
We develop some stochastic lower and upper bounds for parallel program execution times when there are limited processors. Such analysis can provide important information for job scheduling and resource allocation. For...
详细信息
We develop some stochastic lower and upper bounds for parallel program execution times when there are limited processors. Such analysis can provide important information for job scheduling and resource allocation. For several typical classes of parallel programs, we derive very accurate closed form approximations for the bounds. Examples are also given to demonstrate the quality of the bounds.
This paper is concerned with distributed receding horizon prediction for continuous-time linear stochastic systems with multiple sensors. A distributed fusion with the weighted sum structure is applied to the optimal ...
详细信息
ISBN:
(纸本)9781424443475
This paper is concerned with distributed receding horizon prediction for continuous-time linear stochastic systems with multiple sensors. A distributed fusion with the weighted sum structure is applied to the optimal local receding horizon predictors. The distributed prediction algorithm represents the optimal linear fusion by weighting matrices under the minimum mean square criterion. The algorithm has the parallel structure and allows parallelprocessing of observations making it reliable since the rest faultless sensors can continue to the fusion estimation if some sensors occur faulty. The derivation of equations for error cross-covariances between the local predictors is the key of this paper. Example demonstrates effectiveness of the distributed receding horizon predictor.
Results on how to place a limited number of resources in two dimensional torus-based parallel systems are described. The resources are placed so that every non-resource node is within a given distance d from some reso...
详细信息
ISBN:
(纸本)0818684038
Results on how to place a limited number of resources in two dimensional torus-based parallel systems are described. The resources are placed so that every non-resource node is within a given distance d from some resource node. It is proved that the proposed methods are optimal in terms of reducing the maximum distance between the resource and the non-resource nodes. Simulation results show that the proposed methods are superior to the existing methods in terms of the average message latency.
暂无评论