A class of parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix is presented. An algorithm based on the composite rigid-body spatial inertia method, which results in less data dependency and hence better parallelization efficiency, is used for computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound of O(log_2 n) + O(1) in the computation with O(n^2) processors. The architectural features required for perfect mapping of these algorithms and their communication complexity are analyzed. The performance of the algorithms when mapped on two- and one-dimensional (linear) processor arrays with nearest-neighbor connection is investigated. Mapping on the linear array results in new algorithms with a computational complexity of k_1 n(log_2 n) + k_2 (log_2 n) + k_3. A parallel/pipeline algorithm is also presented which achieves a computation time of k_1 n + k_2 (log_2 n) + k_3 on the linear array. An architecture-oriented approach is used in the design of the algorithms.
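For illustration only, a minimal C++ sketch of the log-depth combining structure this abstract relies on: per-link composites are merged pairwise in ceil(log_2 n) rounds, and each round's merges are mutually independent. The Composite type here carries only mass and center of mass as a stand-in for a full 6x6 spatial inertia, and all names are invented for the sketch rather than taken from the paper.

```cpp
#include <cstdio>
#include <vector>

// Minimal stand-in for a composite rigid body: mass and center of mass only.
// A full implementation would carry a 6x6 spatial inertia instead.
struct Composite {
    double mass;
    double com[3];  // center of mass
};

// Combining two composites is associative (mass sums, centers of mass are
// mass-weighted), which is what allows a log-depth (tree) reduction.
Composite combine(const Composite& a, const Composite& b) {
    Composite c;
    c.mass = a.mass + b.mass;
    for (int k = 0; k < 3; ++k)
        c.com[k] = (a.mass * a.com[k] + b.mass * b.com[k]) / c.mass;
    return c;
}

int main() {
    // One composite per link of a hypothetical 8-link chain.
    std::vector<Composite> body(8);
    for (int i = 0; i < 8; ++i)
        body[i] = {1.0 + i, {double(i), 0.0, 0.0}};

    // Pairwise combination in ceil(log2 n) rounds; the combines within each
    // round are independent and could run on separate processors.
    for (std::size_t stride = 1; stride < body.size(); stride *= 2)
        for (std::size_t i = 0; i + stride < body.size(); i += 2 * stride)
            body[i] = combine(body[i], body[i + stride]);

    std::printf("total mass %.1f, com x %.3f\n", body[0].mass, body[0].com[0]);
    return 0;
}
```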
Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty of predicting memory access patterns or execution paths. Within an irregular application, fine-grained synchronization is one technique for managing the coordination of work; but in practice the actual performance for irregular problems depends on the input, the access pattern to shared data structures, the relative speed of processors, and the hardware support of synchronization primitives. In this paper, we focus on lock-free and mutual exclusion protocols for handling fine-grained synchronization. Mutual exclusion and lock-free protocols have received a fair amount of attention in coordinating accesses to shared data structures from concurrent processes. Mutual exclusion offers a simple programming abstraction, while lock-free data structures provide better fault tolerance and eliminate problems associated with critical sections such as priority inversion and deadlock. These synchronization protocols, however, are seldom used in parallel algorithm designs, especially for algorithms under the SPMD paradigm, as their implementations are highly hardware dependent and their costs are hard to characterize. Using graph-theoretic algorithms for illustrative purposes, we show experimental results on two shared-memory multiprocessors, the IBM pSeries 570 and the Sun Enterprise 4500, demonstrating that irregular parallel algorithms with efficient fine-grained synchronization may yield good performance. (C) 2006 Elsevier Inc. All rights reserved.
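As a hedged illustration of the two protocol families compared in this abstract (not the paper's benchmark code), the following C++ sketch increments a shared counter from several threads once under a std::mutex and once with a lock-free compare-and-swap retry loop; irregular graph codes apply the same two patterns to fine-grained updates of shared data structures.

```cpp
#include <atomic>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kThreads = 4;
constexpr int kIncrementsPerThread = 100000;

long counter_mutex = 0;
std::mutex counter_lock;

std::atomic<long> counter_cas{0};

// Mutual exclusion: simple to reason about, but a stalled lock holder blocks everyone.
void add_with_mutex() {
    for (int i = 0; i < kIncrementsPerThread; ++i) {
        std::lock_guard<std::mutex> guard(counter_lock);
        ++counter_mutex;
    }
}

// Lock-free: retry with compare-and-swap; some thread always makes progress.
void add_with_cas() {
    for (int i = 0; i < kIncrementsPerThread; ++i) {
        long expected = counter_cas.load(std::memory_order_relaxed);
        while (!counter_cas.compare_exchange_weak(expected, expected + 1,
                                                  std::memory_order_relaxed)) {
            // expected was refreshed by compare_exchange_weak; just retry.
        }
    }
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) workers.emplace_back(add_with_mutex);
    for (auto& w : workers) w.join();
    workers.clear();
    for (int t = 0; t < kThreads; ++t) workers.emplace_back(add_with_cas);
    for (auto& w : workers) w.join();

    std::printf("mutex: %ld, lock-free: %ld\n", counter_mutex, counter_cas.load());
    return 0;
}
```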
Given a set of n intervals representing an interval graph, the problem of finding a maximum matching between pairs of disjoint (nonintersecting) intervals has been considered in the sequential model. Here we present parallel algorithms for computing maximum cardinality matchings among pairs of disjoint intervals in interval graphs in the EREW PRAM and hypercube models. For the general case of the problem, our algorithms compute a maximum matching in O(log^3 n) time using O(n/log^2 n) processors on the EREW PRAM and using O(n) processors on the hypercube. For the case of proper interval graphs, our algorithm runs in O(log n) time using O(n) processors if the input intervals are not given already sorted and using O(n/log n) processors otherwise, on the EREW PRAM. On n-processor hypercubes, our algorithm for this case takes O(log n log log n) time for unsorted input and O(log n) time for sorted input. Our parallel results also lead to optimal sequential algorithms for computing maximum matchings among disjoint intervals. We also present an improved parallel algorithm for maximum matching between overlapping intervals in proper interval graphs.
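The sketch below is not the matching algorithm itself; it only illustrates, with invented names, the kind of sorted-endpoint counting subroutine such interval algorithms are typically built from: for each interval it counts how many intervals end strictly before it begins, i.e. how many earlier intervals it is disjoint from.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct Interval { int left, right; };

// For each interval, count how many intervals end strictly before it starts.
// Both the sort and the per-interval searches are easy to parallelize
// (the searches are independent; the sort has standard EREW PRAM counterparts).
std::vector<int> disjoint_predecessor_counts(const std::vector<Interval>& iv) {
    std::vector<int> rights;
    rights.reserve(iv.size());
    for (const auto& x : iv) rights.push_back(x.right);
    std::sort(rights.begin(), rights.end());

    std::vector<int> counts(iv.size());
    for (std::size_t i = 0; i < iv.size(); ++i) {
        // Number of right endpoints strictly smaller than this left endpoint.
        counts[i] = int(std::lower_bound(rights.begin(), rights.end(),
                                         iv[i].left) - rights.begin());
    }
    return counts;
}

int main() {
    std::vector<Interval> iv = {{1, 2}, {3, 10}, {4, 5}, {6, 7}};
    auto c = disjoint_predecessor_counts(iv);
    for (std::size_t i = 0; i < iv.size(); ++i)
        std::printf("[%d,%d] has %d disjoint predecessors\n",
                    iv[i].left, iv[i].right, c[i]);
    return 0;
}
```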
In this paper, we consider some shortest path related problems on interval and circular-arc graphs. For the all-pair shortest path query problem on interval and circular-arc graphs, instead of using sophisticated techniques, we propose simple parallel algorithms using only the parallel prefix and suffix computations and the Euler tour technique. Our preprocessing algorithms run in O(log n) time using O(n/log n) processors. Using the data structure constructed by our preprocessing algorithms, a query for the length of a shortest path between any two vertices can be answered in constant time by a single processor. For the hinge vertex problem on interval graphs, we propose an O(log n) time algorithm using O(n/log n) processors, which leads to a linear-time sequential algorithm. Our algorithms work on the EREW PRAM model.
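A small C++ sketch of the two primitives named above, hedged as an illustration rather than the paper's preprocessing: inclusive prefix sums computed in ceil(log_2 n) data-parallel rounds, with suffix sums obtained by running the same scan from the right.

```cpp
#include <cstdio>
#include <vector>

// Inclusive prefix sums computed in ceil(log2 n) data-parallel rounds
// (Hillis-Steele style); within each round every update is independent,
// which is what makes the primitive suitable for the EREW PRAM.
std::vector<long> prefix_sums(std::vector<long> a) {
    std::vector<long> next(a.size());
    for (std::size_t d = 1; d < a.size(); d *= 2) {
        for (std::size_t i = 0; i < a.size(); ++i)
            next[i] = (i >= d) ? a[i] + a[i - d] : a[i];
        a.swap(next);
    }
    return a;
}

// Suffix sums are the same computation run from the right.
std::vector<long> suffix_sums(const std::vector<long>& a) {
    std::vector<long> r(a.rbegin(), a.rend());
    r = prefix_sums(r);
    return std::vector<long>(r.rbegin(), r.rend());
}

int main() {
    std::vector<long> a = {3, 1, 4, 1, 5, 9, 2, 6};
    auto p = prefix_sums(a);
    auto s = suffix_sums(a);
    for (std::size_t i = 0; i < a.size(); ++i)
        std::printf("i=%zu prefix=%ld suffix=%ld\n", i, p[i], s[i]);
    return 0;
}
```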
External memory (EM) algorithms are designed for large-scale computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Typical EM algorithms are specially crafted for the EM situation. In the past, several attempts have been made to relate the large body of work on parallel algorithms to EM, but with limited success. The combination of EM computing, on multiple disks, with multiprocessor parallelism has been posed as a challenge by the ACM Working Group on Storage I/O for Large-Scale Computing. In this paper we provide a simulation technique which produces efficient parallel EM algorithms from efficient BSP-like parallel algorithms. The technique can accommodate one or multiple processors on the EM target machine, each with one or more disks, and it adapts to the disk blocking factor of the target machine. When applied to existing BSP-like algorithms, our simulation technique produces improved parallel EM algorithms for a large number of problems.
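As a toy illustration of one phase that such a simulation needs (assumptions: a single physical processor; the Message struct and inbox file names are invented, not from the paper), the sketch below routes the messages produced in one BSP superstep by sorting them on destination and then writing each virtual processor's inbox in a single sequential pass, the blockwise access pattern an external-memory simulation relies on.

```cpp
#include <algorithm>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// One BSP-style message: destination virtual processor and a payload.
struct Message {
    int dest;
    long payload;
};

// Route all messages generated in a superstep to per-virtual-processor inbox
// files. Sorting by destination first means each inbox is written in one
// sequential pass, so the writes can be done block by block.
void route_superstep(std::vector<Message> msgs, int num_virtual_procs) {
    std::sort(msgs.begin(), msgs.end(),
              [](const Message& a, const Message& b) { return a.dest < b.dest; });

    std::size_t i = 0;
    for (int p = 0; p < num_virtual_procs; ++p) {
        std::ofstream inbox("inbox_" + std::to_string(p) + ".txt", std::ios::app);
        while (i < msgs.size() && msgs[i].dest == p) {
            inbox << msgs[i].payload << '\n';  // stands in for a block write
            ++i;
        }
    }
}

int main() {
    // Messages produced by the local computation phase of one superstep.
    std::vector<Message> msgs = {{2, 10}, {0, 7}, {1, 42}, {0, 3}, {2, 5}};
    route_superstep(msgs, 3);
    std::printf("routed %zu messages to 3 inbox files\n", msgs.size());
    return 0;
}
```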
In this paper we give parallel algorithms for a number of problems defined on point sets and polygons. All our algorithms have optimal T(n) * P(n) products, where T(n) is the time complexity and P(n) is the number of processors used, and are for the EREW PRAM or CREW PRAM models. Our algorithms provide parallel analogues to well-known phenomena from sequential computational geometry, such as the fact that problems for polygons can oftentimes be solved more efficiently than point-set problems, and that nearest-neighbor problems can be solved without explicitly constructing a Voronoi diagram.
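A classic sequential example of the polygon-versus-point-set phenomenon mentioned in this abstract (illustrative only, not taken from the paper): membership in a convex polygon can be decided in O(log n) time by binary search over the fan of wedges at one vertex, something an unstructured point set does not allow.

```cpp
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// Cross product of (a - o) and (b - o); positive when o->a->b turns left.
double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// O(log n) point-in-convex-polygon test. Vertices must be given in
// counterclockwise order; boundary points are treated as inside.
bool inside_convex(const std::vector<Pt>& p, const Pt& q) {
    int n = int(p.size());
    if (cross(p[0], p[1], q) < 0) return false;      // outside the first wedge
    if (cross(p[0], p[n - 1], q) > 0) return false;  // outside the last wedge
    int lo = 1, hi = n - 1;                           // find the wedge holding q
    while (hi - lo > 1) {
        int mid = (lo + hi) / 2;
        if (cross(p[0], p[mid], q) >= 0) lo = mid; else hi = mid;
    }
    return cross(p[lo], p[lo + 1], q) >= 0;           // final edge test
}

int main() {
    std::vector<Pt> square = {{0, 0}, {4, 0}, {4, 4}, {0, 4}};
    std::printf("%d %d\n", inside_convex(square, {2, 2}),
                inside_convex(square, {5, 2}));  // expect 1 0
    return 0;
}
```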
In this paper, we explore some properties of identification matrices and exhibit some uses of identification matrices in studying the graph isomorphism problem, a famous open problem. We show that, given two graphs in the form of a certain identification matrix, isomorphism can be tested efficiently in parallel if at least one matrix satisfies the circular 1s property, and more efficiently in parallel if at least one matrix satisfies the consecutive 1s property. Graphs which have identification matrices satisfying the consecutive 1s property include, among others, proper interval graphs and doubly convex bipartite graphs. The result presented here substantially broadens the class of graphs for which there are known efficient parallel isomorphism testing algorithms.
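For concreteness, a small check of the consecutive 1s property for a fixed column order is sketched below; deciding whether some column permutation achieves the property is the harder PQ-tree problem and is not attempted here. The function and the example matrices are illustrative, not from the paper.

```cpp
#include <cstdio>
#include <vector>

// Checks whether every row of a 0/1 matrix has its 1s in consecutive columns
// for the given column order.
bool rows_have_consecutive_ones(const std::vector<std::vector<int>>& m) {
    for (const auto& row : m) {
        int blocks = 0;
        for (std::size_t j = 0; j < row.size(); ++j)
            if (row[j] == 1 && (j == 0 || row[j - 1] == 0))
                ++blocks;                       // a new run of 1s starts here
        if (blocks > 1) return false;           // more than one run in this row
    }
    return true;
}

int main() {
    std::vector<std::vector<int>> ok  = {{0, 1, 1, 0}, {1, 1, 0, 0}};
    std::vector<std::vector<int>> bad = {{1, 0, 1, 0}};
    std::printf("%d %d\n", rows_have_consecutive_ones(ok),
                rows_have_consecutive_ones(bad));  // expect 1 0
    return 0;
}
```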
In this paper, sequential and parallel algorithms using derivatives for solving unconstrained one-dimensional global optimization problems are described. Sufficient conditions of convergence to all global minimizers are established for both methods. Conditions on the parallel algorithm that guarantee a significant speedup in comparison with the sequential version of the method are presented. The sequential method is numerically compared with the algorithms of Breiman and Cutler, Pijavskii, and Strongin on a set of 20 test functions taken from the literature. We also present results of numerical experiments illustrating the performance of the parallel method. All experiments have been executed on the parallel computer ALLIANT FX/80. (C) 1999 Elsevier Science Ltd. All rights reserved.
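For reference, a compact C++ sketch of the classical Pijavskii (Shubert) scheme that this abstract uses as a comparator, not the derivative-based method proposed in the paper: the subinterval with the smallest Lipschitz lower bound is repeatedly split at the point where that bound is attained. The test function and the Lipschitz overestimate kLip are invented for the example.

```cpp
#include <cmath>
#include <cstdio>
#include <queue>

// Test function and an overestimate of its Lipschitz constant on [0, 10].
double f(double x) { return std::sin(x) + 0.1 * x; }
const double kLip = 1.2;   // |f'(x)| = |cos(x) + 0.1| <= 1.1 < 1.2

struct Piece {
    double a, b, fa, fb, lb;   // subinterval, endpoint values, lower bound
    bool operator<(const Piece& o) const { return lb > o.lb; }  // min-heap order
};

Piece make_piece(double a, double b, double fa, double fb) {
    // Pijavskii lower bound of f on [a, b] from the two endpoint values.
    return {a, b, fa, fb, 0.5 * (fa + fb) - 0.5 * kLip * (b - a)};
}

int main() {
    const double a = 0.0, b = 10.0, eps = 1e-4;
    double best_x = a, best_f = f(a);
    if (f(b) < best_f) { best_f = f(b); best_x = b; }

    std::priority_queue<Piece> q;
    q.push(make_piece(a, b, f(a), f(b)));

    // Refine the subinterval with the smallest lower bound until that bound
    // certifies the incumbent minimum to within eps.
    while (!q.empty() && q.top().lb < best_f - eps) {
        Piece p = q.top(); q.pop();
        // The lower bound is attained where the two Lipschitz cones intersect.
        double x = 0.5 * (p.a + p.b) + 0.5 * (p.fa - p.fb) / kLip;
        double fx = f(x);
        if (fx < best_f) { best_f = fx; best_x = x; }
        q.push(make_piece(p.a, x, p.fa, fx));
        q.push(make_piece(x, p.b, fx, p.fb));
    }
    std::printf("global minimum ~ f(%.4f) = %.6f\n", best_x, best_f);
    return 0;
}
```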
With the continuous development of hardware and software, Graphics Processing Units (GPUs) have come into use in the general-purpose computation field. They have emerged as computational accelerators that dramatically reduce application execution time compared with CPUs. To achieve high computing performance, a GPU typically includes hundreds of computing units. The high density of computing resources on a chip brings high power consumption, and power consumption has therefore become one of the most important problems in the development of GPUs. This paper analyzes the energy consumption of parallel algorithms executed on GPUs and provides a method for evaluating the energy scalability of parallel algorithms. The parallel prefix sum is then analyzed to illustrate the method for energy conservation, and the energy scalability is experimentally evaluated using Sparse Matrix-Vector Multiply (SpMV). The results show that the optimal number of blocks, the choice of memory, and task scheduling are key to balancing the performance and the energy consumption of GPUs.
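A CPU-side C++ stand-in for the blocked prefix-sum structure discussed in this abstract (a sketch, not the paper's CUDA kernels): independent per-block scans, a scan of the block totals, then a per-block offset fix-up. The block count, exposed here as a parameter, is the kind of tuning knob the abstract weighs against energy consumption.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Two-level blocked inclusive prefix sum mirroring the usual GPU structure:
// (1) each block scans its chunk independently, (2) the per-block totals are
// scanned, (3) each block adds its offset.
std::vector<long> blocked_prefix_sum(const std::vector<long>& in, int num_blocks) {
    std::vector<long> out(in.size());
    std::vector<long> block_total(num_blocks, 0);
    std::size_t chunk = (in.size() + num_blocks - 1) / num_blocks;

    // Phase 1: independent per-block scans (one GPU block each).
    for (int b = 0; b < num_blocks; ++b) {
        std::size_t lo = b * chunk, hi = std::min(in.size(), lo + chunk);
        long running = 0;
        for (std::size_t i = lo; i < hi; ++i) out[i] = (running += in[i]);
        if (lo < hi) block_total[b] = running;
    }
    // Phase 2: exclusive scan of block totals (a single small block).
    std::vector<long> offset(num_blocks, 0);
    for (int b = 1; b < num_blocks; ++b)
        offset[b] = offset[b - 1] + block_total[b - 1];
    // Phase 3: add each block's offset (independent again).
    for (int b = 1; b < num_blocks; ++b) {
        std::size_t lo = b * chunk, hi = std::min(in.size(), lo + chunk);
        for (std::size_t i = lo; i < hi; ++i) out[i] += offset[b];
    }
    return out;
}

int main() {
    std::vector<long> a = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
    auto p = blocked_prefix_sum(a, 3);
    for (long v : p) std::printf("%ld ", v);   // 3 4 8 9 14 23 25 31 36 39
    std::printf("\n");
    return 0;
}
```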
We present parallel algorithms for computing all pair shortest paths in directed graphs. Our algorithm has time complexity O(f(n)/p + I(n) log n) on the PRAM using p processors, where I(n) is log n on the EREW PRAM and log log n on the CRCW PRAM, and f(n) is o(n^3). On the randomized CRCW PRAM we are able to achieve time complexity O(n^3/p + log n) using p processors.
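The sketch below shows the min-plus repeated-squaring scheme that many PRAM all-pairs shortest-path algorithms build on, written sequentially for clarity; it is illustrative and not necessarily the construction of this paper. All n^2 entries of each product are independent, which is where the parallelism comes from.

```cpp
#include <cstdio>
#include <vector>

const long INF = 1L << 60;   // "no edge" marker, large enough to avoid overflow below

using Matrix = std::vector<std::vector<long>>;

// Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j]. The n^2 output
// entries are independent, which is what PRAM APSP algorithms exploit.
Matrix min_plus(const Matrix& a, const Matrix& b) {
    std::size_t n = a.size();
    Matrix c(n, std::vector<long>(n, INF));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            if (a[i][k] < INF)
                for (std::size_t j = 0; j < n; ++j)
                    if (b[k][j] < INF && a[i][k] + b[k][j] < c[i][j])
                        c[i][j] = a[i][k] + b[k][j];
    return c;
}

// Repeated squaring: ceil(log2 n) min-plus products yield all-pairs distances
// (assuming no negative cycles).
Matrix all_pairs(Matrix d) {
    std::size_t n = d.size();
    for (std::size_t i = 0; i < n; ++i) d[i][i] = 0;
    for (std::size_t len = 1; len < n; len *= 2) d = min_plus(d, d);
    return d;
}

int main() {
    // Small directed example: 0->1 (4), 1->2 (1), 0->2 (7), 2->3 (2).
    Matrix w(4, std::vector<long>(4, INF));
    w[0][1] = 4; w[1][2] = 1; w[0][2] = 7; w[2][3] = 2;
    Matrix d = all_pairs(w);
    std::printf("dist(0,3) = %ld\n", d[0][3]);   // expect 7 (path 0->1->2->3)
    return 0;
}
```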