We present a new parallel algorithm for generating binary trees;it generates trees in A-order using A-sequences representation. This algorithm is adaptive and cost-optimal and is executed on a shared memory multiproce...
详细信息
We present a new parallel algorithm for generating binary trees;it generates trees in A-order using A-sequences representation. This algorithm is adaptive and cost-optimal and is executed on a shared memory multiprocessor. We know of no other parallel algorithm in the literature that generates trees in A-order. This parallel algorithm is designed based on a presented sequential generation algorithm for A-sequences, with O(1) average time complexity. (c) 2005 Elsevier B.V. All rights reserved.
This paper gives improved parallel methods for several exact factorizations of some classes of symmetric positive definite (SPD) matrices. Our factorizations also provide us similarly efficient algorithms for exact co...
详细信息
This paper gives improved parallel methods for several exact factorizations of some classes of symmetric positive definite (SPD) matrices. Our factorizations also provide us similarly efficient algorithms for exact computation of the solution of the corresponding linear systems (which need not be SPD), and for finding rank and determinant magnitude. We assume the input matrices have entries that are rational numbers expressed as a ratio of integers with at most a polynomial number of bits beta. We assume a parallel random access machine (PRAM) model of parallel computation, with unit cost arithmetic operations, including division, over a finite field Z(p), where p is a prime number whose binary representation is linear in the size of the input matrix and is randomly chosen by the algorithm. We require only bit precision O(n(beta + log n)), which is the asymptotically optimal bit precision for beta >= log n. Our algorithms are randomized, giving the outputs with high likelihood >= 1 - 1/n(Omega(1)). We compute LU and QR factorizations for dense matrices, and LU factorizations of sparse matrices which are s(n)-separable, reducing the known parallel time bounds for these factorizations from Omega(log(3) n) to O(log(2) n), without an increase in processors (matching the best known work bounds of known parallel algorithms with polylog time bounds). Using the same parallel algorithm specialized to structured matrices, we compute LU factorizations for Toeplitz matrices and matrices of bounded displacement rank in time O(log(2) n) with n log log n processors, reducing by a nearly linear factor the best previous processor bounds for polylog times (however, these prior works did not generally require unit cost division over a finite field). We use this result to solve in the same bounds: polynomial resultant;and Pade approximants of rational functions;and in a factor O(log n) more time: polynomial greatest common divisors (GCD) and extended GCD;again reducing the best process
This article deals with the parallel computing solution of fluid flow problems using a meshless element-free Galerkin (EFG) method. The EFG method utilizes moving least- square (MLS) approximants to approximate the un...
详细信息
This article deals with the parallel computing solution of fluid flow problems using a meshless element-free Galerkin (EFG) method. The EFG method utilizes moving least- square (MLS) approximants to approximate the unknown field variables. The MLS approximant consists of three components: a weight function, a basis function, and a set of nonconstant coefficients. A new parallel algorithm is proposed for the EFG method. The code has been written in FORTRAN language using MPI message passing library and implemented on a MIMD (multiple-instruction multiple- data)-type PARAM 10000 supercomputer. The code has been validated by solving three model fluid flow problems. A comparison is made among the results (velocities values) obtained by the EFG method with those obtained by the finite-element method (FEM) at a few typical locations. For 8 processors, speedup and efficiency have been obtained as 2.73 and 34.22% for N 1,552 in 1-D (Example I), 7.20 and 90.00% for N 1,462 in 2- D (Example II), and 7.30 and 91.27% for N 1,346 in 2- D (Example III), respectively.
This paper shows how to effectively combine a sampling-based method primarily designed for multiple-query motion planning [probabilistic roadmap method (PRM)] with sampting-based tree methods primarily designed for si...
详细信息
This paper shows how to effectively combine a sampling-based method primarily designed for multiple-query motion planning [probabilistic roadmap method (PRM)] with sampting-based tree methods primarily designed for single-query motion planning (expansive space trees, rapidly exploring random trees, and others) in a novel planning framework that can be efficiently parallelized. Our planner not only achieves a smooth spectrum between multiple-query and single-query planning, but it combines advantages of both. We present experiments which show that our planner is capable of solving problems that cannot be addressed efficiently with PRM or single-query planners. A key advantage of our planner is that it is significantly more decoupled than PRIM and sampling-based tree planners. Exploiting this property, we designed and implemented a parallel version of our planner. Our experiments show that our planner distributes well and can easily solve high-dimensional problems that exhaust resources available to single machines and cannot be addressed with existing planners.
Program slicing is the process of deleting statements in a program that do not affect a given set of variables at a chosen point in the program. In this paper the parallel slicing algorithm is introduced. It is shown ...
详细信息
Program slicing is the process of deleting statements in a program that do not affect a given set of variables at a chosen point in the program. In this paper the parallel slicing algorithm is introduced. It is shown how the control flow graph of the program to be sliced is converted into a network of concurrent processes, thereby producing a parallel version of Weiser's original static slicing algorithm. Keywords:.
Decision trees have been found very effective for classification especially in data mining. Although classification is a well studied problem, most of the current classification algorithms need an in-memory data struc...
详细信息
Automated parametric design is becoming an important issue in engineering. Especially in analog integrated circuit design this trend is becoming more and more obvious. Parametric optimization is a key element of autom...
详细信息
Automated parametric design is becoming an important issue in engineering. Especially in analog integrated circuit design this trend is becoming more and more obvious. Parametric optimization is a key element of automated parametric design. For the reason of computational complexity of the cost function evaluation (which is usually simulation based) optimization can take a lot of time even for solving small design problems. One of the possible ways for attacking the computational complexity of the typical optmization problems is the use of parallel optimization algorithms. Two large families of such algorithms exist: synchronous and asynchronous algorithms. Due to their flexibility and efficiency asynchronous algorithms are becoming a viable option for parallel optimization. An asynchronous parallel simplex algorithm for unconstrained minimization is presented based on the convergent variant of the Nelder-Mead simplex algorithm. Several different approaches to information exchange between workers in a parallel asynchronous optimization system are discussed and evaluated on a set of testbench unconstrained optimization problems. Speedup results are presented for the best-performing information exchange algorithm. The results show a significant speedup for moderately dimensional optimization problems. The defficiencies of the algorithm are discussed and guidelines for future work are given.
We present provably efficient parallel algorithms for sweep scheduling on unstructured meshes. Sweep scheduling is a commonly used technique in radiation transport problems, and involves inverting an operator by itera...
详细信息
We present provably efficient parallel algorithms for sweep scheduling on unstructured meshes. Sweep scheduling is a commonly used technique in radiation transport problems, and involves inverting an operator by iteratively sweeping across a mesh. Each sweep involves solving the operator locally at each cell. However, each direction induces a partial order in which this computation can proceed. On a distributed computing system, the goal is to schedule the computation, so that the length of the schedule is minimized. Several heuristics have been proposed for this problem; but none of the heuristics have worst case performance guarantees. We present a simple, almost linear time randomized algorithm which (provably) gives a schedule of length at most O(log/sup 2/n) times the optimal schedule for instances with n cells, when the communication cost is not considered, and a slight variant, which coupled with a much more careful analysis, gives a schedule of (expected) length O(logmlogloglogm) times the optimal schedule for m processors. These are the first such provable guarantees for this problem. We also design a priority based list schedule using these ideas, with the same theoretical guarantee, but much better performance in practice. We complement our theoretical results with extensive empirical analysis. The results show that (i) our algorithm performs very well and has significantly better performance guarantee in practice and (ii) the algorithm compares favorably with other natural and efficient parallel algorithms proposed in the literature [S. Pautz, (2002), S. Plimpton et al., (2001)].
We present a parallel bisection mesh refinement algorithm based on ALBERT (Adaptive multi-Level finite element toolbox using Bisection refinement and Error control by Residual Techniques). The goal is to develop a par...
详细信息
We present a parallel bisection mesh refinement algorithm based on ALBERT (Adaptive multi-Level finite element toolbox using Bisection refinement and Error control by Residual Techniques). The goal is to develop a parallel adaptive finite element code suitable for distributed memory parallel computers or PC clusters. An overview on the basic strategy for the parallelization of ALBERT is given. Issues on the parallel mesh refinement are addressed. A modified mesh refinement algorithm, which can be implemented efficiently on distributed memory parallel computers, is proposed and its properties are discussed. Numerical experiments with parallel bisection mesh refinement algorithm are shown.
We present an experimental study of parallel biconnected components algorithms employing several fundamental parallel primitives, e.g., prefix sum, list ranking, sorting, connectivity, spanning tree, and tree computat...
详细信息
We present an experimental study of parallel biconnected components algorithms employing several fundamental parallel primitives, e.g., prefix sum, list ranking, sorting, connectivity, spanning tree, and tree computations. Previous experimental studies of these primitives demonstrate reasonable parallel speedups. However, when these algorithms are used as subroutines to solve higher-level problems, there are two factors that hinder fast parallel implementations. One is parallel overhead, i.e., the large constant factors hidden in the asymptotic bounds; the other is the discrepancy among the data structures used in the primitives that brings non-negligible conversion cost. We present various optimization techniques and a new parallel algorithm that significantly improve the performance of finding biconnected components of a graph on symmetric multiprocessors (SMPs). Finding biconnected components has application in fault-tolerant network design, and is also used in graph planarity testing. Our parallel implementation achieves speedups up to 4 using 12 processors on a Sun E4500 for large, sparse graphs, and the source code is freely-available at our Web site.
暂无评论