ISBN: 9780769539010 (print)
Modern graphics processing units (GPUs) are programmable, fast, and offer a high performance-to-price ratio, which makes them well suited to parallel computation. On this basis, the article studies general-purpose computing on the GPU and uses the Compute Unified Device Architecture (CUDA) to design new parallel algorithms that accelerate matrix inversion and binarization. The results show that the GPU outperforms the CPU by a multiple that grows as the matrix dimension increases.
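Binarization parallelizes well because each output element depends only on its own input element. The following is a minimal serial sketch in Python, not the CUDA kernel from the paper; it assumes a grayscale image and a fixed global threshold, and the function name `binarize` and the default threshold of 128 are illustrative choices that do not come from the abstract.

```python
def binarize(image, threshold=128):
    """Return a 0/255 image from a grayscale image given as a list of rows.

    Each output pixel depends only on the matching input pixel, so on a GPU
    every pixel could be handled by its own thread (assumption: fixed global
    threshold; the abstract does not say how the threshold is chosen).
    """
    return [[255 if px >= threshold else 0 for px in row] for row in image]


if __name__ == "__main__":
    print(binarize([[0, 127], [128, 255]]))
```

Because no pixel reads another pixel's result, the two nested loops can be replaced by one thread per pixel without any synchronization.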
ISBN: 9783642021718 (print)
We propose a parallel Mean Shift (MS) tracking algorithm on the Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA). The traditional MS algorithm uses a color histogram with a large number of bins, typically 16x16x16, which makes a parallel implementation infeasible. We therefore employ K-Means clustering to partition the object color space, which lets us represent the color distribution with a small number of bins. Based on this compact histogram, all key components of the MS algorithm are mapped onto the GPU. The resulting parallel algorithm consists of six kernel functions, which primarily involve the parallel computation of the candidate histogram and the calculation of the Mean Shift vector. Experiments on the publicly available CAVIAR videos show that the proposed parallel tracking algorithm achieves a large speedup with tracking performance comparable to that of the traditional serial MS tracking algorithm.
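The two components named in the abstract, the candidate histogram and the Mean Shift vector, can be sketched serially as follows. This is only an illustration under stated assumptions: each pixel is already mapped to a compact bin index (for example a K-Means cluster id), a uniform kernel is used, and the names `histogram` and `mean_shift_step` are hypothetical rather than the paper's six kernels.

```python
import math


def histogram(bin_ids, n_bins):
    """Normalized histogram of per-pixel bin indices."""
    counts = [0.0] * n_bins
    for b in bin_ids:
        counts[b] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def mean_shift_step(positions, bin_ids, target_hist):
    """One Mean Shift update of the window center (uniform kernel assumed)."""
    cand_hist = histogram(bin_ids, len(target_hist))
    # Per-pixel weight sqrt(q_u / p_u); cand_hist[b] > 0 for every b that occurs.
    weights = [math.sqrt(target_hist[b] / cand_hist[b]) for b in bin_ids]
    total = sum(weights)
    if total == 0.0:
        # No pixel matches the target model: fall back to the plain centroid.
        weights = [1.0] * len(positions)
        total = float(len(positions))
    x = sum(w * p[0] for w, p in zip(weights, positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, positions)) / total
    return x, y
```

With few bins, the histogram accumulation and the weighted sums are both reductions over pixels, which is the kind of work that maps naturally onto GPU threads.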
ISBN: 9780769536347 (print)
This paper discusses the key issues of multi-train movement simulation on electrified railways. A simulation system has been developed for computational simulation and scheme evaluation. Solving the traction power network equations is CPU-intensive work. Intel (R) architecture processors (Pentium III, Pentium 4) introduce Streaming SIMD Extensions (SSE), an efficient execution model for accelerating applications on a single processor. The paper focuses on methods to accelerate the calculation of the traction power supply. The main procedures have been redesigned to exploit SIMD parallelism on the PC platform. Several optimizations of the parallel algorithm, such as loop unrolling and address computation, are applied during coding. Experimental results show that the performance of the optimized algorithm improves significantly, with a maximum speedup ratio of 3.35. Compared with the conventional algorithm, the SSE algorithm more than doubles the performance of train movement simulation.
ISBN: 9781846260650 (print)
Based on the sub-structure method and a parallel algorithm, spline finite element analysis of plate bending problems is carried out in parallel.
Computationally efficient parallel algorithms for downdating the least squares estimator of the ordinary linear regression are proposed. The algorithms, which are based on the QR decomposition, are block versions of sequential Givens strategies and efficiently exploit the triangular structure of the data matrices. The first strategy utilizes only part of the orthogonal matrix which is derived from the QR decomposition of the initial data matrix. The rest of the orthogonal matrix is not updated or explicitly computed. A modification of the parallel algorithm, which explicitly computes the whole orthogonal matrix in the downdated QR decomposition, is also considered. An efficient distribution of the matrices over the processors is proposed. Furthermore, the new algorithms do not require any inter-processor communication. The theoretical complexities are derived and experimental results are presented and analyzed. The parallel strategies are scalable and highly efficient for large scale downdating least squares problems. A new parallel block-hyperbolic downdating strategy is developed. The algorithm is rich in BLAS-3 computations, involves negligible duplicated computations and requires insignificant inter-processor communication. It is found to outperform the previous downdating strategies and to be highly efficient for large scale problems. The experimental results confirm the derived theoretical complexities. (C) 2008 Elsevier B.V. All rights reserved.
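The sequential Givens strategies mentioned above are built from plane rotations that zero one matrix entry at a time. The sketch below shows only that building block, not the block downdating or block-hyperbolic strategies of the paper; the helper names `givens` and `apply_givens` are hypothetical.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def apply_givens(row_i, row_k, c, s):
    """Rotate two matrix rows; the leading entry of the second row becomes zero
    when (c, s) was computed from the two leading entries."""
    new_i = [c * x + s * y for x, y in zip(row_i, row_k)]
    new_k = [-s * x + c * y for x, y in zip(row_i, row_k)]
    return new_i, new_k
```

Rotations that touch disjoint pairs of rows are independent of one another, which is what block versions of such strategies can exploit when distributing work over processors.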
ISBN: 9783642032745 (print)
A parallel Direct Simulation Monte Carlo (DSMC) algorithm to solve a spatially inhomogeneous nonlinear equation of coagulation is presented. The algorithm is based on simulating the evolution of ensembles of stochastic test particles. It can be implemented effectively on parallel computers of different architectures, including GRID infrastructure based on MPLS networks. The problem of minimizing the computational cost of the algorithm is considered. To implement the algorithm on GRID infrastructure, we carry out a preliminary simulation of the underlying network. This simulation makes it possible to determine the minimal network bandwidth necessary for an efficient parallel decomposition of the DSMC algorithm.
ISBN: 9781424450756 (print)
In this paper, we present an efficient scheme for the parallel solution of the large-scale three-dimensional groundwater flow equation. The scheme has been implemented and the program is parallelized using an SPMD (single program, multiple data) paradigm and a domain decomposition strategy provided in PETSc (Portable Extensible Toolkit for Scientific Computation). The efficiency and scalability of the parallel computing scheme are demonstrated on a simplified groundwater scenario, in which different preconditioned Krylov subspace methods have been tested and analyzed. The numerical results show that the parallel method can enhance the modeling capabilities and accelerate the simulation process significantly.
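As an illustration of what a preconditioned Krylov subspace method looks like, the following is a minimal sketch of the conjugate gradient method with a Jacobi (diagonal) preconditioner. It is plain serial Python on a small dense matrix and does not use PETSc; the abstract does not say which Krylov methods or preconditioners were tested, so this particular pairing, the function name `pcg`, and the tolerance are assumptions. The matrix is assumed symmetric positive definite.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with Jacobi-preconditioned conjugate gradients.

    Assumes A is symmetric positive definite (list of rows) with a
    nonzero diagonal.
    """
    n = len(b)
    x = [0.0] * n
    r = [float(v) for v in b]                 # residual b - A x for x = 0
    m_inv = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [m_inv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

In a domain-decomposed setting, the matrix-vector product and the dot products are the steps that require communication between subdomains; the vector updates are purely local.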
ISBN: 9781424433940 (print)
Buffer analysis is one of the most important GIS functions for spatial analysis. Based on our experience with parallel processing of EO data, we take buffer generation as an example to study how to increase the speed and scale of GIS operations. This paper analyzes the procedure of the original sequential algorithm and then presents the parallelization mechanism. As a result, the algorithm can be easily assembled into an OGC WPS container in a Grid environment, both as a standard web service and as an enhanced WPS service. Some experiments are also included in this work to validate this technical approach.
ISBN: 9783642015090 (print)
The error correction SVM method is an excellent multiclass classification approach and has been applied to face recognition successfully. Yet it suffers from high computational complexity. To reduce the computation time of the algorithm, this paper presents a parallel implementation scheme in which the training and classification tasks are assigned to multiple processors and run on all of them simultaneously. Simulation experiments conducted on a local area network using the Cambridge ORL face database show that the parallel algorithm is effective in speeding up both training and classification while leaving the recognition accuracy unchanged.
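In error-correcting multiclass schemes of this kind, each class is assigned a binary codeword, one binary classifier is trained per code bit, and a sample is assigned to the class whose codeword is nearest to the predicted bits. The sketch below shows only the decoding step; the 7-bit `CODEBOOK` for four classes and the names `hamming` and `decode` are hypothetical, and the per-bit SVMs are assumed to have produced the bits already.

```python
# Hypothetical 7-bit code for four classes; every pair of codewords differs
# in 4 positions, so any single wrong classifier output can be corrected.
CODEBOOK = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits, codebook=CODEBOOK):
    """Return the class label whose codeword is closest to the predicted bits."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))
```

Since each code bit has its own classifier, the per-bit training and prediction tasks are independent, which is what allows them to be assigned to different processors.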
ISBN: 9781424447053 (print)
To solve triangular systems directly, this paper starts from the associated chain of the first-order parallel linear recurrence method and derives the associated chain for an eighth-order triangular system. It then constructs the corresponding associated chain for general systems of equations based on the parity dichotomy method and designs the corresponding parallel algorithm. The algorithm is analyzed in two ways: its parallel speedup ratio and efficiency are derived theoretically and then verified through numerical experiments. The analysis shows that the algorithm has low communication time and good computational efficiency, which indicates that it is practically effective for solving triangular systems.
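A first-order linear recurrence x_i = a_i * x_{i-1} + b_i, which is what solving a lower bidiagonal triangular system amounts to, can be evaluated in a logarithmic number of parallel phases. The sketch below uses recursive doubling, a standard technique swapped in for illustration; it is not the associated-chain or parity dichotomy construction of the paper, and the function names are hypothetical.

```python
def recurrence_serial(a, b, x0):
    """Reference evaluation of x_i = a_i * x_{i-1} + b_i, starting from x0."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def recurrence_doubling(a, b, x0):
    """Same recurrence by recursive doubling.

    Entry i holds an affine map (A[i], B[i]); each phase composes it with the
    map `step` positions earlier, so after about log2(n) phases entry i maps
    x0 directly to x_i.
    """
    n = len(a)
    A, B = list(a), list(b)
    step = 1
    while step < n:
        new_A, new_B = A[:], B[:]
        for i in range(step, n):  # iterations are independent of each other
            new_A[i] = A[i] * A[i - step]
            new_B[i] = A[i] * B[i - step] + B[i]
        A, B = new_A, new_B
        step *= 2
    return [A[i] * x0 + B[i] for i in range(n)]
```

The inner loop reads only values from the previous phase, so all of its iterations could run on separate processors, at the cost of more total arithmetic than the serial loop.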