In this paper, we develop a new parallel auxiliary grid algebraic multigrid (AMG) method to leverage the power of graphics processing units (GPUs). In the construction of the hierarchical coarse grid, we use a simple and fixed coarsening procedure based on a region quadtree generated from an auxiliary grid. This allows us to explicitly control the sparsity patterns and operator complexities of the AMG solver. This feature provides (nearly) optimal load balancing and predictable communication patterns on shape-regular grids, which makes our new algorithm suitable for parallel computing, especially on GPUs. We also design a parallel smoother based on a special coloring of the quadtree to accelerate the convergence rate and improve the parallel performance of this solver. Based on the CUDA toolkit [NVIDIA CUDA Programming Guide, NVIDIA Corp., 2010], we implemented our new parallel auxiliary grid AMG method on GPUs. The numerical results of this implementation demonstrate the efficiency of our new method for (nearly) isotropic problems. The results achieve an average speedup of over 4 on quasi-uniform grids and 2 on shape-regular grids when compared to the AMG implementation in CUSP [M. Garland and N. Bell, CUSP: Generic parallel algorithms for Sparse Matrix and Graph Computations, http://***/(2010)].
Computation time and memory requirements are two common problems for magnetotelluric (MT) modeling of three-dimensional conductivity structure. We develop a new parallel processing scheme that can efficiently improve the computational speed of 3D MT modeling. The 3D MT modeling scheme, based on the staggered-grid finite difference method, is implemented in the frequency domain, and the calculation of the EM field for each frequency is independent. Because the problem is therefore naturally parallelizable, the whole computation over all frequencies can be divided into many smaller tasks covering single or multiple frequencies, which are assigned to different computing nodes and calculated in parallel. In this work, by adopting a master-slave parallel mode and a frequency-parallel computation scheme, we have implemented the parallel computation of 3D MT modeling using MPI on the Dawn TC5000A high-performance parallel platform. Furthermore, we tested our parallel algorithm for 3D MT modeling using two 3D theoretical models and analyzed the calculation efficiency on a multi-node computer. The results show that the parallel algorithm is effective and efficient, which lays a solid foundation for subsequent three-dimensional parallel MT inversion.
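The frequency-level decomposition described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MPI implementation: Python threads stand in for MPI ranks, the round-robin split is one possible assignment policy, and `solve_frequency` is a hypothetical placeholder for the staggered-grid forward solve.

```python
from concurrent.futures import ThreadPoolExecutor


def split_frequencies(frequencies, n_workers):
    """Deal the frequency list round-robin into n_workers task lists."""
    return [frequencies[i::n_workers] for i in range(n_workers)]


def solve_frequency(freq):
    """Hypothetical stand-in for one single-frequency forward solve.

    A real solver would assemble and solve the finite difference system;
    here a dummy response is returned so the sketch stays runnable.
    """
    return freq, 1.0 / (1.0 + freq)


def worker(task):
    """Process every frequency assigned to one worker, independently."""
    return [solve_frequency(f) for f in task]


def run_master(frequencies, n_workers):
    """Master: split the work, dispatch it, then gather the results."""
    tasks = split_frequencies(frequencies, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(worker, tasks))
    # Merge the per-worker lists into one frequency -> response mapping.
    return {f: value for chunk in partial_results for f, value in chunk}


if __name__ == "__main__":
    responses = run_master([0.01, 0.1, 1.0, 10.0, 100.0], n_workers=2)
    for freq in sorted(responses):
        print(freq, responses[freq])
```

Because no worker needs data from another, the only communication is the initial scatter of frequencies and the final gather of results.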
As web applications are used worldwide, system performance, especially reliability, becomes more *** performance testing tools such as QA Load and LoadRunner generate stress data with the fixed *** in real time, network traffic is *** focus on generating test data to simulate network traffic accurately for web application reliability *** statistical results of network traffic show that self-similarity is ubiquitous in web *** generating self-similar network traffic is *** nowadays, there is a bottleneck in generating network traffic by single *** need a parallel method to solve this *** this paper we propose a distributed system based on a parallel algorithm to generate self-similar traffic using the fractional Gaussian noise (FGN)*** experiment results show that the network traffic generated by the distributed system has the self-similar property.
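For readers unfamiliar with fractional Gaussian noise, the sketch below generates a short FGN sample from its autocovariance, gamma(k) = 0.5 (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H)), using a Cholesky factorization. It is a serial, textbook-style illustration only: the abstract does not say which generation method the authors use or how their distributed system partitions the work, and all function names here are hypothetical.

```python
import math
import random


def fgn_autocovariance(k, hurst):
    """Autocovariance of unit-variance FGN at lag k for Hurst parameter H."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def generate_fgn(n, hurst, seed=None):
    """Draw n correlated samples by coloring white noise with the factor L."""
    rng = random.Random(seed)
    cov = [[fgn_autocovariance(i - j, hurst) for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(lower[i][k] * white[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    # H > 0.5 gives positively correlated (long-range dependent) increments.
    print(generate_fgn(8, hurst=0.8, seed=42))
```

The Cholesky approach costs O(n^3) time, so it only suits short traces; that cost is one reason faster or parallel generators are of interest.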
In this paper, by integrating the multiple-proposal and multiple-chain algorithms through parallel computing, we develop a highly efficient sampler for approximating statistical distributions, the parallel Multi-proposal and Multi-chain Markov Chain Monte Carlo (pMPMC3) sampler, and we illustrate its performance by calculating P-values (odds ratio significance) for a Genome Wide Association Study (GWAS). Computational results show that, with the convergence condition set as the standard deviation of the P-value being less than 10^-3, pMPMC3 with 4 proposals and 4 chains obtains a convergent P-value within 10^6 iterations, while conventional Monte Carlo simulation does not obtain convergent P-values even in 10^7 iterations. We also test pMPMC3 by varying the number of chains, the number of proposals, and the size of the dataset on a cluster with a maximum of 600 processes; the algorithm scales well.
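The multiple-chain half of this idea, together with a standard-deviation stopping rule, can be illustrated with the sketch below. It is not pMPMC3: the multiple-proposal step is omitted, the chains run in a plain loop rather than in parallel, the target is a standard normal rather than a GWAS statistic, and interpreting the convergence condition as the spread of per-chain estimates is an assumption made here for illustration.

```python
import math
import random
import statistics


def log_target(x):
    """Log-density (up to a constant) of the illustrative target, N(0, 1)."""
    return -0.5 * x * x


def metropolis_step(x, rng, step=1.0):
    """One random-walk Metropolis update of a single chain."""
    proposal = x + rng.gauss(0.0, step)
    log_ratio = log_target(proposal) - log_target(x)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return x


def run_chains(threshold, n_chains=4, tol=1e-3, check_every=1000,
               max_iters=100000, seed=0):
    """Estimate P(X > threshold) with independent chains.

    Stops once the standard deviation of the per-chain estimates drops
    below tol (checked every check_every iterations) or at max_iters.
    Returns (mean estimate, spread across chains, iterations used).
    """
    rngs = [random.Random(seed + c) for c in range(n_chains)]
    states = [0.0] * n_chains
    hits = [0] * n_chains
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # The chains never exchange data, so this inner loop is the part
        # that could be distributed across processes.
        for c in range(n_chains):
            states[c] = metropolis_step(states[c], rngs[c])
            hits[c] += int(states[c] > threshold)
        if iteration % check_every == 0:
            estimates = [h / iteration for h in hits]
            if statistics.stdev(estimates) < tol:
                break
    estimates = [h / iteration for h in hits]
    return statistics.mean(estimates), statistics.stdev(estimates), iteration


if __name__ == "__main__":
    print(run_chains(threshold=1.0))
```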
ISBN: (print) 9781479900305
In this paper, a dynamic delivery and pick-up vehicle routing problem (DVRP) with ready times and deadlines for customer goods is *** using the rolling horizon approach, the DVRP is modeled and *** each decision epoch, the open vehicle routing problem with multiple depots is *** on adaptive memory programming, a master-slave parallel tabu search algorithm is developed, and an insertion procedure is also suggested for real-time urgent orders. Computational experiments reveal that the parallel tabu search algorithm is of high practical value for solving the dynamic vehicle routing problem.
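An insertion procedure for urgent orders is commonly built on cheapest insertion, sketched below for a single open route (one that starts at a depot and does not return). The abstract does not describe the authors' procedure, so this is a generic illustration: it minimizes added Euclidean distance only and ignores ready times, deadlines, capacities, and multiple depots. All names are hypothetical.

```python
import math


def route_length(route):
    """Total Euclidean length of an open route given as a list of (x, y)."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def cheapest_insertion(route, new_stop):
    """Insert new_stop where it adds the least distance.

    route[0] is the depot and stays first. Returns the new route and
    the added length.
    """
    best_pos, best_cost = None, math.inf
    for pos in range(1, len(route) + 1):
        prev = route[pos - 1]
        added = math.dist(prev, new_stop)
        if pos < len(route):
            # Inserting between two stops replaces one edge with two.
            nxt = route[pos]
            added += math.dist(new_stop, nxt) - math.dist(prev, nxt)
        if added < best_cost:
            best_pos, best_cost = pos, added
    return route[:best_pos] + [new_stop] + route[best_pos:], best_cost


if __name__ == "__main__":
    current = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
    updated, extra = cheapest_insertion(current, (1.0, 0.0))
    print(updated, extra)
```

A single insertion scans the route once, which suits real-time orders that must be placed before the next full re-optimization.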
Texture mapping is an important part of the realistic graphics rendering process. In this paper we parallelize the traditional texture mapping algorithm to improve the running speed of the program. The experimental results show that the speedup of the program reaches 1.92 on average.
The purpose of this paper is to validate a new highly parallelizable direction splitting algorithm. The parallelization capabilities of this algorithm are illustrated by providing a highly accurate solution for the start-up flow in a three-dimensional impulsively started lid-driven cavity of aspect ratio 1×1×2 at Reynolds numbers 1000 and 5000. The computations are done in parallel (up to 1024 processors) on adapted grids of up to 2 billion nodes in three space dimensions. Velocity profiles are given at dimensionless times t = 4, 8, and 12; at least four digits are expected to be correct at Re = 1000. Copyright (C) 2011 John Wiley & Sons, Ltd.
Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers.
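The idea of a local update that can be applied to many nodes at once is easiest to see on a regular 2D grid, sketched below. This is a deliberate simplification and not the paper's method: it uses the standard first-order upwind (Godunov) update on a Cartesian grid with a constant, isotropic speed, whereas the paper targets unstructured tetrahedral meshes and anisotropic speed functions. Each sweep reads only the previous iterate, so every node update within a sweep is independent of the others.

```python
import math


def local_update(a, b, h, slowness):
    """First-order upwind solve at one node.

    a and b are the smaller neighbor values in the two grid directions,
    h is the grid spacing and slowness is 1 / speed.
    """
    if abs(a - b) >= slowness * h:
        return min(a, b) + slowness * h
    return 0.5 * (a + b + math.sqrt(2.0 * (slowness * h) ** 2 - (a - b) ** 2))


def solve_eikonal(n, sources, h=1.0, speed=1.0, tol=1e-9, max_sweeps=1000):
    """Jacobi-style iteration for |grad u| = 1 / speed on an n x n grid."""
    sources = set(sources)
    inf = math.inf
    u = [[inf] * n for _ in range(n)]
    for i, j in sources:
        u[i][j] = 0.0
    for _ in range(max_sweeps):
        new = [row[:] for row in u]
        change = 0.0
        for i in range(n):
            for j in range(n):
                if (i, j) in sources:
                    continue
                a = min(u[i - 1][j] if i > 0 else inf,
                        u[i + 1][j] if i < n - 1 else inf)
                b = min(u[i][j - 1] if j > 0 else inf,
                        u[i][j + 1] if j < n - 1 else inf)
                if a == inf and b == inf:
                    continue  # no information has reached this node yet
                candidate = local_update(a, b, h, 1.0 / speed)
                if candidate < u[i][j]:
                    change = max(change, u[i][j] - candidate)
                    new[i][j] = candidate
        u = new
        if change <= tol:
            break
    return u


if __name__ == "__main__":
    arrival = solve_eikonal(9, [(4, 4)])
    print([round(v, 3) for v in arrival[4]])
```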
The use of parallelism may overcome some of the constraints imposed by single-processor computing systems. Besides offering faster solutions, applications that are parallelized can solve bigger or more complex problems. For instance, simulations can be run at finer resolutions, and physical phenomena can potentially be modeled more realistically. We describe in this paper the development of a bio-inspired parallel algorithm used in the three-dimensional simulation of multicellular tissue growth. We report on the different components of the model, in which cellular automata are used to model different types of cell populations that execute persistent random walks on the computational grid, collide, and proliferate until they reach confluence. We also discuss the main issues encountered in the parallelization of the model and its implementation on a parallel machine.
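A toy serial version of such a cellular automaton is sketched below to make the move, collide, and proliferate cycle concrete. It is heavily simplified relative to the model described: there is a single cell type, the walk has no directional persistence, division is a fixed-probability event, and the rules and parameter values are assumptions chosen only for illustration.

```python
import random

# The six face-adjacent neighbors of a site on a 3D lattice.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def step(occupied, size, rng, divide_prob=0.2):
    """Advance every cell once; returns the new set of occupied sites."""
    cells = list(occupied)
    rng.shuffle(cells)  # random update order avoids directional bias
    current = set(occupied)
    for cell in cells:
        dx, dy, dz = rng.choice(NEIGHBORS)
        target = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
        blocked = not all(0 <= c < size for c in target) or target in current
        if blocked:
            continue  # wall or collision: the cell stays where it is
        if rng.random() < divide_prob:
            current.add(target)  # proliferate: a daughter fills the site
        else:
            current.remove(cell)  # migrate: the cell moves to the site
            current.add(target)
    return current


def simulate(size=4, n_seed=3, max_steps=5000, seed=0):
    """Run until confluence (every site occupied) or max_steps."""
    rng = random.Random(seed)
    sites = [(x, y, z) for x in range(size)
             for y in range(size) for z in range(size)]
    occupied = set(rng.sample(sites, n_seed))
    counts = [len(occupied)]
    for _ in range(max_steps):
        if len(occupied) == size ** 3:
            break
        occupied = step(occupied, size, rng)
        counts.append(len(occupied))
    return occupied, counts


if __name__ == "__main__":
    final, history = simulate()
    print(len(final), "cells after", len(history) - 1, "steps")
```

In a parallel version the grid would typically be split into subdomains, and cells crossing a subdomain boundary are where the parallelization issues the abstract mentions would be expected to arise.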