We study scalable parallel computational geometry algorithms for the coarse grained multicomputer model: $p$ processors solving a problem on $n$ data items, where each processor has $O(n/p) \gg O(1)$ local memory and all processors are connected via some arbitrary interconnection network (e.g. mesh, hypercube, fat tree). We present $O(T_{sequential}/p + T_s(n,p))$ time scalable parallel algorithms for several computational geometry problems, where $T_s(n,p)$ refers to the time of a global sort operation. Our results are independent of the multicomputer's interconnection network. Their time complexities become optimal when $T_{sequential}/p$ dominates $T_s(n,p)$ or when $T_s(n,p)$ is optimal. This is the case for several standard architectures, including meshes and hypercubes, and a wide range of ratios $n/p$ that include many of the currently available machine configurations. Our methods also have some important practical advantages: for interprocessor communication, they use only a small fixed number of calls to a single global routing operation, global sort, and all other programming is in the sequential domain. Furthermore, our algorithms use only a small number of very large messages, which greatly reduces the overhead for the communication protocol between processors. (Note, however, that our time complexities account for the lengths of messages.) Experiments show that our methods are easy to implement and give good timing results.
This paper proposes an optimal algorithm for detecting fine or medium grain parallelism in nested loops whose dependences are described by an approximation of distance vectors by polyhedra. In particular, it is optimal for direction vectors, which generalizes Wolf and Lam's algorithm (1991) to the case of several statements. It relies on a dependence uniformization process and on parallelization techniques related to systems of uniform recurrence equations.
The authors propose a new class of interconnection networks called recursive hierarchical swapped networks (RHSNs) for general-purpose parallel processing. The node degrees of RHSNs can vary from a small number to as large as required, depending on the recursive and hierarchical composition parameters and the nucleus graph chosen. The diameter of an RHSN can be asymptotically optimal within a small constant factor. The authors present efficient routing, semigroup computation, ascend/descend, matrix-matrix multiplication, and emulation algorithms, thus demonstrating the versatility of RHSNs. In particular, on suitably constructed RHSNs, matrix multiplication can be performed faster than the DNS algorithm on a hypercube. Furthermore, ascend/descend algorithms, semigroup computation, and parallel prefix computation can be done using algorithms with asymptotically fewer communication steps than on a hypercube.
A fundamental problem in parallel computing is to design high-level, architecture-independent algorithms that execute efficiently on general-purpose parallel machines. The aim is to achieve portability and high performance simultaneously. A key to accomplishing this is the existence of a computation model that can bridge the gap between the high-level programming models and the underlying hardware models. There are currently two factors that make this fundamental problem more tractable. The first is the emergence of a dominant parallel architecture consisting of a number of powerful microprocessors interconnected by either a proprietary interconnect or a standard off-the-shelf interconnect (such as an ATM switch). The second factor is the emergence of standards, such as the message passing standard MPI, for which efficient implementations are either available or about to appear on most machines. Our recent work has exploited these two developments by developing a methodology based on (1) a simple computation model for the current MIMD platforms that incorporates communication cost into the complexity of the algorithms, and (2) an SPMD programming model that makes effective use of communication primitives. We describe our approach for validating the computation model based on extensive experimentation and the development of benchmarks, and discuss its extension to the emerging architecture of clusters of symmetric multiprocessors (SMPs).
Most parallel databases exploit two types of parallelism: intra-query parallelism and inter-transaction concurrency. Between these two cases lies another type of parallelism: inter-query parallelism within a transaction or application. Exploiting inter-query parallelism requires either compiler support to automatically parallelize existing embedded query programs, or programming support to write explicitly parallel query programs. The authors present compiler analysis to automatically detect parallelism in embedded query programs, including compiler algorithms for detecting dependences in such programs. They show that the properties of some aggregate functions, such as MIN and MAX, can help reduce statically computed dependences.
This paper proposes a technique of iterative dynamic programming to plan minimum energy consumption trajectories for robotic manipulators. The dynamic programming method is modified to perform a series of dynamic programming passes over a small reconfigurable grid covering only a portion of the solution space at any one pass. Although strictly no longer a global optimization process, this iterative approach retains the ability to avoid some poor local minima while avoiding the curse of dimensionality associated with a pure dynamic programming approach. The algorithm has an inherent parallel structure, allowing for reduced computation time on parallel architecture computers. No limiting assumptions are made about the performance index, or function to be optimized. As such, extremely complex functions and constraints are easily handled. Joint actuator and time constraints are considered in this work. The modified dynamic programming approach is verified experimentally by planning and executing a minimum energy consumption path for a Reis V15 industrial manipulator.
In the area of automatic parallelization of programs, analyzing and transforming loop nests with parametric affine loop bounds requires fundamental mathematical results. The most common geometrical model of iteration spaces, called the polytope model, is based on mathematics dealing with convex and discrete geometry, linear programming, combinatorics and geometry of numbers. In this paper, we present an automatic method for computing the number of integer points contained in a convex polytope or in a union of convex polytopes. The procedure consists of first, computing the parametric vertices of a polytope defined by a set of parametric linear constraints, and then computing the Ehrhart polynomial, i.e. a parametric expression of the number of integer points. The paper is illustrated with the computation of the maximum available parallelism of a given loop nest.
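As a small illustration of what such a parametric count looks like (this example is not taken from the paper), consider a hypothetical triangular loop nest `for i in 1..n: for j in 1..i`. Its iteration space is the set of integer points of the polytope 1 <= j <= i <= n, and the number of points is the polynomial n(n+1)/2 in the parameter n. The sketch below only checks that closed form against brute-force enumeration; it does not implement the vertex computation or the general Ehrhart polynomial procedure described in the abstract, and the function names are made up for this example.

```python
def count_iterations(n):
    """Brute-force count of integer points (i, j) with 1 <= j <= i <= n."""
    return sum(1 for i in range(1, n + 1) for j in range(1, i + 1))


def triangle_point_count(n):
    """Closed-form count for the same triangle: n(n+1)/2 (assumed example)."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    # Compare the enumeration with the polynomial for a few parameter values.
    for n in range(0, 8):
        print(n, count_iterations(n), triangle_point_count(n))
```

For a loop nest like this one, the polynomial gives the total amount of work as a function of n without enumerating the iterations.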
The authors give work-optimal and polylogarithmic time parallel algorithms for solving the normalized edit distance problem. The normalized edit distance between two strings $X$ and $Y$ with lengths $n \ge m$ is the minimum quotient of the sum of the costs of edit operations transforming $X$ into $Y$ by the length of the edit path corresponding to those edit operations. Marzal and Vidal (1993) proposed a sequential algorithm with a time complexity of $O(nm^2)$. The authors show that this algorithm can be parallelized work-optimally on an array of $n$ (or $m$) processors, and on a mesh of $n \times m$ processors. They then propose a sublinear time algorithm that is almost work-optimal: using $O(mn^{1.75})$ processors, the time complexity of the algorithm is $O(n^{0.75} \log n)$ and the total number of operations is $O(mn^{2.5} \log n)$. This algorithm runs on a CREW PRAM, but is likely to work on weaker PRAM models and hypercubes with minor modifications. Finally, they present a polylogarithmic $O(\log^2 n)$ time algorithm based on matrix multiplication which runs on an $O(n^6 / \log n)$ processor hypercube.
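To make the definition concrete, here is a minimal sequential sketch of a length-indexed dynamic program for the normalized edit distance. It is written from the definition above and is not the authors' parallel algorithm or a verbatim copy of Marzal and Vidal's procedure. It assumes constant costs `ins`, `dele`, and `sub` (hypothetical parameter names), counts a matching character pair as a zero-cost step of the edit path, and indexes the table by the number of insertions `k`. Every edit path from the start to cell (i, j) has exactly i steps that consume a character of `x`, so its length is i + k. With `k` ranging over 0..m, the three nested loops do $O(nm^2)$ work; the table also uses $O(nm^2)$ memory, which a careful implementation would reduce.

```python
import math


def normalized_edit_distance(x, y, ins=1.0, dele=1.0, sub=1.0):
    """Minimum over edit paths of (total cost) / (path length)."""
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 0.0
    inf = math.inf
    # d[k][i][j]: minimum cost of turning x[:i] into y[:j]
    # with exactly k insertions; such a path has length i + k.
    d = [[[inf] * (m + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    d[0][0][0] = 0.0
    for k in range(m + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0 and j == 0:
                    continue  # only the empty path (k = 0) reaches (0, 0)
                best = inf
                if i > 0:  # delete x[i-1]
                    best = min(best, d[k][i - 1][j] + dele)
                if i > 0 and j > 0:  # match (cost 0) or substitute
                    step = 0.0 if x[i - 1] == y[j - 1] else sub
                    best = min(best, d[k][i - 1][j - 1] + step)
                if j > 0 and k > 0:  # insert y[j-1]
                    best = min(best, d[k - 1][i][j - 1] + ins)
                d[k][i][j] = best
    # A complete path with k insertions has length n + k.
    return min(
        d[k][n][m] / (n + k) for k in range(m + 1) if d[k][n][m] < inf
    )


if __name__ == "__main__":
    print(normalized_edit_distance("ab", "a"))
```

With unit costs, transforming "ab" into "a" is cheapest as one match plus one deletion: cost 1 over a path of length 2.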
The authors present a general framework for approximation schemes on parallel processor scheduling. They propose $\epsilon$-approximation algorithms for scheduling on identical, uniform and unrelated machines when the number of processors is fixed. For each of the three problems considered, they perform grouping on job processing times in order to produce a transformed scheduling instance where the number of distinct task types is bounded. They optimally solve the corresponding mixed integer program and prove that the optimal makespans for the initial and the transformed problems can differ at most by a factor of $1+\epsilon$. The complexity of all $\epsilon$-approximation algorithms is $O(n)$, where $n$ is the number of jobs to be scheduled.
In order to use networks of workstations in parallel processing applications, several schemes have been devised to allow processes on different, possibly heterogeneous, platforms to communicate with one another. The Message-Passing Interface (MPI) is one such scheme that allows for message-passing across different architectures. The MPI specification does not make provisions for the migration of a process between machines. This paper describes the work required to modify an MPI implementation to allow for task migration. It also describes "Hector", our heterogeneous computing task allocator that is used to migrate tasks automatically and improve the overall performance of a parallel program.