A distributed memory parallel Gauss-Seidel algorithm for linear algebraic systems is presented, in which a parameter is introduced to adapt the algorithm to different distributed memory parallel architectures. In this algorithm, the coefficient matrix and the right-hand side of the linear algebraic system are first divided into row-blocks in the natural row-wise order according to the performance of the parallel architecture in use. These row-blocks are then distributed among the local memories of all processors through torus-wrap mapping techniques. The solution iteration vector is cyclically conveyed among processors at each iteration so as to decrease the communication. The algorithm is a true Gauss-Seidel algorithm that maintains the convergence rate of the serial Gauss-Seidel algorithm and allows existing sequential codes to run in a parallel environment with little investment in recoding. Numerical results are also given, which show that the algorithm is of relatively high efficiency. (c) 2009 Elsevier Ltd. All rights reserved.
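As a rough illustration of the ingredients this abstract names, the sketch below shows a plain serial Gauss-Seidel sweep together with a one-dimensional wrap (cyclic) assignment of row-blocks to processors. It is not the paper's algorithm: the function names, the fixed sweep count, and the use of a 1-D wrap as a stand-in for torus-wrap mapping are all assumptions made for illustration.

```python
def gauss_seidel(A, b, sweeps=50):
    """Serial Gauss-Seidel: each row update uses the newest values of x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Sum of off-diagonal terms, using already-updated entries of x.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


def wrap_owner(row, block_size, n_procs):
    """Hypothetical 1-D wrap mapping: row-block k goes to processor k mod p."""
    return (row // block_size) % n_procs


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(gauss_seidel(A, b))  # approaches the exact solution (1/11, 7/11)
    print([wrap_owner(r, block_size=2, n_procs=3) for r in range(8)])
```

In this sketch the block size plays the role of a tuning parameter: larger blocks mean fewer, larger messages, while smaller blocks spread consecutive rows over more processors.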
We present simple and fast parallel proximity algorithms for rigid polygonal models. Given two polygon-soup models in space, if they overlap, our algorithm can find all the intersected primitives between them; otherwise, it reports their Euclidean minimum distance. Our algorithm is performed in a parallel fashion and shows scalable performance in terms of the number of available computing cores. The key ingredient of our algorithm is a simple load-balancing metric based on the penetration depth (PD) (for collision detection) and approximate Euclidean distance (for Euclidean distance computation) between bounding volumes. To compute the PD between oriented bounding boxes (OBBs), we present a novel algorithm based on the well-known separating axis theorem (SAT) and also show that the PD can be trivially obtained as a byproduct of SAT. We have implemented these algorithms on a commodity PC with eight cores and benchmarked their performance on complicated geometric models. In practice, the performance of our algorithm shows up to 5 and 9.7 times improvement for collision and distance queries, respectively, compared to single-core computation. Copyright (C) 2010 John Wiley & Sons, Ltd.
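The idea that a penetration depth falls out of a separating-axis test can be sketched in two dimensions. The code below is a simplified illustration, not the paper's 3-D OBB algorithm: the box representation `(center, half_extents, angle)` and all function names are assumptions. For each candidate axis it projects both boxes, measures how far one would have to slide to separate them along that axis, and keeps the smallest such distance.

```python
import math


def obb_corners(center, half_extents, angle):
    """Corners of a 2-D oriented box given its center, half sizes and rotation."""
    cx, cy = center
    hx, hy = half_extents
    c, s = math.cos(angle), math.sin(angle)
    signs = ((1, 1), (-1, 1), (-1, -1), (1, -1))
    return [(cx + sx * hx * c - sy * hy * s,
             cy + sx * hx * s + sy * hy * c) for sx, sy in signs]


def box_axes(angle):
    """The two unit edge directions of a box, used as candidate separating axes."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c, s), (-s, c)]


def penetration_depth(box_a, box_b):
    """Smallest overlap over all candidate axes; 0.0 if some axis separates the boxes."""
    corners_a = obb_corners(*box_a)
    corners_b = obb_corners(*box_b)
    depth = math.inf
    for ax, ay in box_axes(box_a[2]) + box_axes(box_b[2]):
        proj_a = [x * ax + y * ay for x, y in corners_a]
        proj_b = [x * ax + y * ay for x, y in corners_b]
        # Distance needed to slide one interval off the other along this axis.
        overlap = min(max(proj_a) - min(proj_b), max(proj_b) - min(proj_a))
        if overlap <= 0:
            return 0.0  # found a separating axis
        depth = min(depth, overlap)
    return depth


if __name__ == "__main__":
    a = ((0.0, 0.0), (1.0, 1.0), 0.0)
    b = ((1.5, 0.0), (1.0, 1.0), 0.0)
    print(penetration_depth(a, b))  # 0.5 for these axis-aligned boxes
```

A value of this kind could serve as a priority when deciding which bounding-volume pairs to hand to which core, which is the role the abstract assigns to its load-balancing metric.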
ISBN: 9783642142451 (print)
Column scan, or predicate evaluation and filtering over a column of data in a database table, is an important primitive for data mining and data warehousing. In this paper, we present our study on accelerating column scan using a massively parallel accelerator. With a design that takes full advantage of the characteristics of the memory hierarchy and parallel execution in such processors, we have achieved speedups that significantly exceed previously reported results, making the use of such an accelerator for this type of primitive much more viable. Running on a general-purpose graphics processing unit (GPGPU), an NVIDIA GTX 280 GPU, the GPU version is about 5-6 times faster than an implementation on an eight-core CPU, or over 40 times faster than that on a single-core CPU.
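To make the column-scan primitive concrete, here is a minimal CPU-side sketch, assuming a range predicate `low <= value <= high` and a simple split of the column into contiguous chunks. It does not reproduce the paper's GPU design; the names `scan_chunk` and `column_scan` and the chunking scheme are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(task):
    """Evaluate the range predicate on one contiguous slice of the column."""
    column, start, stop, low, high = task
    return [i for i in range(start, stop) if low <= column[i] <= high]


def column_scan(column, low, high, n_chunks=4):
    """Return the row indices whose value satisfies low <= value <= high."""
    n = len(column)
    step = max(1, -(-n // n_chunks))  # ceiling division, at least 1
    tasks = [(column, s, min(s + step, n), low, high) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(scan_chunk, tasks))
    # Chunks are returned in order, so the row indices stay sorted.
    return [i for part in parts for i in part]


if __name__ == "__main__":
    print(column_scan([5, 1, 9, 3, 7, 2], low=3, high=7))  # [0, 3, 4]
```

Each chunk is independent, which is what makes the primitive a natural fit for wide parallel hardware; in CPython the thread pool here only illustrates the decomposition and is not expected to give a real speedup.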
In this paper, a Novel Parallel Quantum Genetic Algorithm (NPQGA) is proposed for the stochastic Job Shop Scheduling Problem with the objective of minimizing the expected value of the makespan, where the processing times are subject to independent normal distributions. Based on the parallel evolutionary idea and some concepts of quantum theory, we simulate a model of parallel quantum computation. In this framework, there are some demes (sub-populations) and some universes (groups of populations), which are structured in super star-shaped topologies. A new migration scheme based on penetration theory is developed to control the migration rate and direction adaptively between demes, and a novel quantum crossover strategy is devised among universes. The quantum evolution is executed in every deme by applying some improvement operators (the coding mechanism aimed at job shop scheduling, the new quantum rotation angle and the catastrophe operator). Experimental results show NPQGA's effectiveness and applicability. (c) 2009 Elsevier Inc. All rights reserved.
ISBN: 9781424472352 (print)
A group of Saul'yev asymmetric difference schemes, based on the follow-flow scheme, for approximating the KdV equation is given here. Using these new asymmetric difference schemes, we construct parallel alternating group schemes based on the follow-flow scheme, which a linearized analysis shows to be absolutely stable. Numerical experiments for the cases of a single soliton solution and a double soliton solution are performed, which show that the accuracy and stability of the schemes are better than those of existing parallel algorithms.
ISBN: 9781424452910 (print)
Finding a vast array of applications, the problem of computing the convex hull of a set of sorted points in the plane is one of the fundamental tasks in pattern recognition, morphology and image processing. The main contribution of this paper is to show a simple parallel algorithm for computing the convex hull of a set of n sorted points in the plane and to evaluate its performance on dual quad-core processors. The experimental results show that our implementation achieves a speed-up factor of approximately 7 using 8 processors. Since a speed-up factor of more than 8 is not possible, our parallel implementation for computing the convex hull is close to optimal. Also, for 2 or 4 processors, we achieved a superlinear speed-up.
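For reference, the serial version of this problem has a well-known linear-time solution once the points are sorted: Andrew's monotone chain. The sketch below shows that serial baseline only; the abstract does not describe how the paper parallelizes the computation, so nothing here should be read as its method. It assumes the input is already sorted by x, then by y, with no duplicate points.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def half_hull(points):
    """Build one chain of the hull, discarding points that do not turn left."""
    hull = []
    for p in points:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def convex_hull_sorted(points):
    """Convex hull of points pre-sorted by (x, y), in counter-clockwise order."""
    if len(points) <= 2:
        return list(points)
    lower = half_hull(points)
    upper = half_hull(reversed(points))
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    pts = sorted([(0, 0), (1, 1), (2, 0), (2, 2), (0, 2)])
    print(convex_hull_sorted(pts))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

With the reported speed-up of about 7 on 8 processors, the parallel efficiency is 7/8, or roughly 0.875.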
ISBN: 9780769536880 (print)
This paper presents the large but useful formulation of expected security cost optimal power flow (ESC-OPF). The corrector equation of the interior-point method (IPM) for ESC-OPF, which has a special structure, is given. A novel parallel IPM based on Multiple Centrality Weighted Correctors (MCWC) is proposed to solve this ESC-OPF model. In this algorithm, through the cooperation of MCWC and a parallel solver for the corrector equation, the number of iterations can be reduced and the CPU time can be cut because of the good speedup.
ISBN: 9781424449217 (print)
A simple yet common scheduling problem is identified as a special case of the R||C_max problem. We name it Linear Makespan Minimization on Unrelated Parallel Machines (LMMUPM). A novel algorithm, MOBSA (Multi-Objective Based Scheduling Algorithm), is presented to solve it. Two auxiliary problems are introduced as the basis of our algorithm. The first one can be reduced to a Multi-Objective Integer Program, while the second is constructed based on the solution of the first one. Results on random datasets revealed that MOBSA produced smaller and more stable makespans than other scheduling algorithms. Additionally, the makespan produced by MOBSA was within 1% of the optimum for every case. Presently, MOBSA has been applied to parallelize EMAN, one of the most popular software packages for cryo-electron microscopy single particle reconstruction. High speedups and ideal load balancing have been obtained. It is expected that MOBSA is also applicable to other similar applications.
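To show what the R||C_max objective measures, the sketch below computes the makespan of an assignment on unrelated machines and builds one with a simple greedy rule (each job goes to the machine that would finish it earliest). This greedy rule is a generic baseline chosen for illustration; it is not MOBSA, and the `times[job][machine]` layout and function names are assumptions.

```python
def makespan(times, assignment):
    """Largest total load over machines; times[j][m] is job j's time on machine m."""
    n_machines = len(times[0]) if times else 0
    loads = [0.0] * n_machines
    for job, machine in enumerate(assignment):
        loads[machine] += times[job][machine]
    return max(loads, default=0.0)


def greedy_assign(times):
    """Baseline heuristic: place each job where it would complete earliest."""
    n_machines = len(times[0]) if times else 0
    loads = [0.0] * n_machines
    assignment = []
    for job_times in times:
        # Ties are broken in favor of the lowest machine index.
        best = min(range(n_machines), key=lambda m: loads[m] + job_times[m])
        loads[best] += job_times[best]
        assignment.append(best)
    return assignment


if __name__ == "__main__":
    times = [[2, 4], [3, 1], [4, 4]]  # 3 jobs, 2 unrelated machines
    plan = greedy_assign(times)
    print(plan, makespan(times, plan))  # [0, 1, 1] 5.0
```

In this small instance the greedy plan is not optimal: assigning the jobs to machines `[0, 1, 0]` gives a makespan of 6, while `[1, 1, 0]` gives 5, so the baseline happens to tie the best value here but carries no guarantee in general.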
The paper introduces the recursive algorithm for fractal graphics and puts forward a parallel algorithm for fractal graphics. Analyzing recursive algorithmic time complexity and speedup rate of the parallel *** experimental results on a PC cluster show that the theoretical analysis and the experimental results of the fractal graphics parallel algorithm are consistent, with a marked speedup rate.
ISBN: 9783642030949 (print)
Generally, Hardware/Software (HW/SW) partitioning can be approximately solved through some kinds of optimization algorithms. Based on the characteristics of both HW/SW partitioning and the Particle Swarm Optimization (PSO) algorithm, a novel parallel HW/SW partitioning method is proposed in this paper. A model of parallel HW/SW partitioning on the basis of the PSO algorithm is established after analyzing the particularity of HW/SW partitioning. A hybrid strategy of PSO and Tabu Search (TS) is proposed, which uses the intrinsic parallelism of PSO and the memory function of TS to speed up and improve the performance of PSO. To settle the problem of premature convergence, the reproduction and crossover operations of the genetic algorithm (GA) are also introduced into the procedure of PSO. Experimental results indicate that the parallel PSO algorithm can efficiently reduce the running time even for large task graphs.