This paper presents a parallel algorithm for constructing Voronoi diagrams based on point-set adaptive grouping. The binary tree splitting method is used to adaptively group the point set in the plane and construct su...
详细信息
This paper presents a parallel algorithm for constructing Voronoi diagrams based on point-set adaptive grouping. The binary tree splitting method is used to adaptively group the point set in the plane and construct sub-Voronoi diagrams for each group. Given that the construction of Voronoi diagrams in each group consumes the majority of time and that construction within one group does not affect that in other groups, the use of a parallel algorithm is *** constructing the sub-Voronoi diagrams, we extracted the boundary points of the four sides of each sub-group and used to construct boundary site Voronoi diagrams. Finally, the sub-Voronoi diagrams containing each boundary point are merged with the corresponding boundary site Voronoi diagrams. This produces the desired Voronoi diagram. Experiments demonstrate the efficiency of this parallel algorithm, and its time complexity is calculated as a function of the size of the point set, the number of processors, the average number of points in each block, and the number of boundary points. Copyright (c) 2013 John Wiley & Sons, Ltd.
A parallel method for globally minimizing a linear program with an additional reverse convex constraint is proposed which combines the outer approximation technique and the cutting plane method. Basically p (less than...
详细信息
A parallel method for globally minimizing a linear program with an additional reverse convex constraint is proposed which combines the outer approximation technique and the cutting plane method. Basically p (less than or equal to n) processors are used for a problem with a variables and a globally optimal solution is found effectively in a finite number of steps. Computational results are presented for test problems with a number of variables up to 80 and 63 linear constraints (plus nonnegativity constraints). These results were obtained on a distributed-memory MIMD parallel computer, DELTA, by running both serial and parallel algorithms with double precision. Also, based on 40 randomly generated problems of the same size, with 16 variables and 32 linear constraints (plus x greater than or equal to 0), the numerical results from different number processors are reported, including the serial algorithm's. (C) 1997 Academic Press.
A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR gates. A layered PMC is a PMC in which all input nodes are in the external face, and the gates c...
详细信息
A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR gates. A layered PMC is a PMC in which all input nodes are in the external face, and the gates can be assigned to layers in such a way that every wire goes between gates in successive layers. Goldschlager, Cook and Dymond, and others have developed NC2 algorithms to evaluate a layered PMC when the output node is in the same face as the input nodes. These algorithms require a large number of processors (Omega(n(6)), where n is the size of the input circuit). In this paper we give an efficient parallel algorithm that evaluates a layered PMC of size n in O (log(2)n) time using only a linear number of processors on an EREW PRAM. Our parallel algorithm is the best possible to within a polylog factor, and is a substantial improvement over the earlier algorithms for the problem.
In this paper there is considered a flexible job shop problem of operations scheduling. The new, very fast method of determination of cycle time is presented. In the design of heuristic algorithm there was the neighbo...
详细信息
In this paper there is considered a flexible job shop problem of operations scheduling. The new, very fast method of determination of cycle time is presented. In the design of heuristic algorithm there was the neighborhood inspired by the game of golf applied. Lower bound of the criterion function was used in the search of the neighborhood.
Slicing of concurrent programs is a compute-intensive task. To speed up the slicing process, we have developed a parallel algorithm. For this purpose we used the concurrent control flow graph (CCFG) as the intermediat...
详细信息
Slicing of concurrent programs is a compute-intensive task. To speed up the slicing process, we have developed a parallel algorithm. For this purpose we used the concurrent control flow graph (CCFG) as the intermediate representation. We used a network of communicating processes to develop our parallel algorithm. We have implemented our parallel algorithm and the experimental results appear promising. Copyright (C) 2004 John Wiley Sons, Ltd.
parallel algorithms for accurate summation and dot product are proposed, They are parallelized versions of fast and accurate algorithms of calculating sum and dot product using error-free transformations which are rec...
详细信息
parallel algorithms for accurate summation and dot product are proposed, They are parallelized versions of fast and accurate algorithms of calculating sum and dot product using error-free transformations which are recently proposed by Ogita et al. [T. Ogita, S.M. Rump, S. Oishi, Accurate sum and dot product, SIAM J. Sci. Comput. 26 (6) (2005) 1955-1988]. They have shown their algorithms are fast in terms of measured computing time. However, due to the strong data dependence in the process of their algorithms, it is difficult to parallelize them. Similarly to their algorithms, the proposed parallel algorithms in this paper are designed to achieve the results as if computed in K-fold working precision with keeping the fastness of their algorithms. Numerical results are presented showing the performance of the proposed parallel algorithm of calculating dot product. (C) 2008 Elsevier B.V. All rights reserved.
Data flow acyclic directed graphs (digraph) are widely used to describe the data dependency of mesh-based scientific computing. The parallel execution of such digraphs can approximately depict the flowchart of paralle...
详细信息
Data flow acyclic directed graphs (digraph) are widely used to describe the data dependency of mesh-based scientific computing. The parallel execution of such digraphs can approximately depict the flowchart of parallel computing. During the period of parallel execution, vertex priorities are key performance factors. This paper firstly takes the distributed digraph and its resource-constrained parallel scheduling as the vertex priorities model, and then presents a new parallel algorithm for the solution of vertex priorities using the well-known technique of forward-backward iterations. Especially, in each iteration, a more efficient vertex ranking strategy is proposed. In the case of simple digraphs, both theoretical analysis and benchmarks show that the vertex priorities produced by such an algorithm will make the digraph scheduling time converge non-increasingly with the number of iterations. In other cases of non-simple digraphs, benchmarks also show that the new algorithm is superior to many traditional approaches. Embedding the new algorithm into the heuristic framework for the parallel sweeping solution of neutron transport applications, the new vertex priorities improve the performance by 20 % or so while the number of processors scales up from 32 to 2048.
A new parallel algorithm has been developed for second-order Moller-Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of ...
详细信息
A new parallel algorithm has been developed for second-order Moller-Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of dispersion interaction. Tests on a moderate number of processors (2-16) show that the program has high CPU and parallel efficiency. Timings are presented for two relatively large molecules, taxol (C47H51NO14) and luciferin (C11H8N2O3S2), the former with the 6-31G* and 6-311G** basis sets (1032 and 1484 basis functions, 164 correlated orbitals), and the latter with the aug-cc-pVDZ and aug-cc-pVTZ basis sets (530 and 1198 basis functions, 46 correlated orbitals). An MP2 energy calculation on C130H10 (1970 basis functions, 265 con-elated orbitals) completed in less than 2 h on 128 processors. (c) 2006 Wiley Periodicals, Inc.
Given a natural number e, an addition chain for e is a finite sequence of numbers having the following properties: (1) the first number is one, (2) every element is the sum of two earlier elements, and (3) the given n...
详细信息
Given a natural number e, an addition chain for e is a finite sequence of numbers having the following properties: (1) the first number is one, (2) every element is the sum of two earlier elements, and (3) the given number occurs at the end of the sequence. We introduce a fast optimal algorithm to generate a chain of short length for the number e of n-bits. The algorithm is based on the right-left binary strategy and barrel shifter circuit. The algorithm uses processors and runs in time under exclusive read exclusive write parallel random access machine.
Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, coll...
详细信息
Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation. (C) 2017 COSPAR. Published by Elsevier Ltd. All rights reserved.
暂无评论