ISBN: 9783540681052 (print)
Three-dimensional variational assimilation (3D-Var) is currently the most commonly used technique for generating an analysis that provides better, more consistent initial conditions for numerical weather prediction (NWP). The Global and Regional Assimilation Prediction System (GRAPES) is a new-generation NWP system in China, in which 3D-Var is one of the main components and plays an important role in the direct assimilation of non-conventional observations. In this study, the principal theory and serial implementation of GRAPES 3D-Var are first introduced. The details of the distributed parallel computing algorithm of GRAPES 3D-Var are then discussed, including data partitioning strategies, data communication strategies and stagger parallelization strategies. Finally, parallel experimental results on a 16-CPU cluster platform are presented, and the numerical simulations show that the parallel strategies can be combined to achieve good load balance and good performance.
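The analysis mentioned above is conventionally obtained by minimizing the 3D-Var cost function J(x) = ½(x − x_b)ᵀB⁻¹(x − x_b) + ½(Hx − y)ᵀR⁻¹(Hx − y). Here x_b is the background state, y the observations, H the observation operator, and B and R the background- and observation-error covariances. The following is a minimal sketch only, not GRAPES code. It assumes diagonal B = b_var·I and R = r_var·I, an H that simply picks out observed grid points, and plain gradient descent in place of the minimizer a real system would use. All function and parameter names are hypothetical.

```python
def threedvar_cost(x, xb, obs, b_var, r_var):
    """J(x) = Jb + Jo with diagonal B = b_var*I and R = r_var*I (assumption)."""
    jb = 0.5 * sum((xi - xbi) ** 2 for xi, xbi in zip(x, xb)) / b_var
    jo = 0.5 * sum((x[k] - yk) ** 2 for k, yk in obs.items()) / r_var
    return jb + jo


def threedvar_analysis(xb, obs, b_var, r_var, iters=100):
    """Minimize J by gradient descent.

    xb  -- background state, one value per grid point
    obs -- {grid index: observed value}; H just selects those grid points
    """
    x = list(xb)
    # Step = 1 / (largest Hessian eigenvalue), which keeps the descent stable.
    step = 1.0 / (1.0 / b_var + 1.0 / r_var)
    for _ in range(iters):
        # Gradient of Jb: B^{-1} (x - xb)
        grad = [(xi - xbi) / b_var for xi, xbi in zip(x, xb)]
        # Gradient of Jo: H^T R^{-1} (Hx - y), nonzero only at observed points
        for k, yk in obs.items():
            grad[k] += (x[k] - yk) / r_var
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x
```

Consider b_var = 1.0 and r_var = 0.5, with a single observation of 2.0 over a background of 0.0. The analysis at that point becomes 4/3, the familiar x_b + b/(b + r)·(y − x_b). Unobserved points stay at the background, because B is diagonal in this toy setup.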
ISBN: 9783540681052 (print)
In this paper we study multi-installment divisible load processing in a heterogeneous distributed system with limited memory. The divisible load model applies to computations that can be arbitrarily divided into parts and performed independently in parallel. The initial waiting time for the load may be shortened by sending many small chunks of load instead of one huge chunk. The chunk sizes must be adjusted to the communication speeds, computation speeds and memory sizes so that the total processing time is as short as possible. We propose a new, realistic model of memory management and formulate it as a mixed quadratic programming problem, which is solved by a branch-and-bound algorithm. Since this problem is computationally hard, we propose heuristics and analyze their performance in a series of computational experiments.
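The idea of splitting a load into memory-limited installments can be illustrated with a deliberately simple heuristic. This is not the paper's mixed quadratic programming model or any of its heuristics. The sketch assumes that communication time is ignored and that a processor's memory is free again when the next installment arrives. In each round, every processor receives a share of the remaining load proportional to its speed, capped at its memory size. The function name and parameters are hypothetical.

```python
def plan_installments(total_load, speeds, memory):
    """Heuristic multi-installment split (communication time ignored).

    In each round every processor gets a share of the remaining load
    proportional to its speed, but never more than its memory can hold;
    whatever does not fit is sent in later installments.
    """
    remaining = total_load
    total_speed = sum(speeds)
    rounds = []
    while remaining > 1e-9 * total_load:
        shares = [remaining * s / total_speed for s in speeds]
        chunks = [min(share, mem) for share, mem in zip(shares, memory)]
        if sum(chunks) <= 0.0:
            raise ValueError("no processor can accept any load")
        rounds.append(chunks)
        remaining -= sum(chunks)
    return rounds
```

Take a total load of 10, speeds [1, 2, 2] and memory sizes [1, 1, 3]. The processors hit their memory caps in the first two rounds, so the load is delivered in three installments. A real schedule would also have to overlap these transfers with computation, which is where the communication speeds enter.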
ISBN: 9783540681052 (print)
In this paper we present two parallel routines, targeting SMP architectures, for the LU factorization of band matrices arising in model reduction problems. The special properties of these problems often allow pivoting to be eliminated during the factorization, which results in higher efficiency of the parallel routines. The routines also aggregate operations during the iteration, exposing coarser-grain parallelism than their LAPACK counterpart. Experimental results on two different parallel platforms show the benefits of the new approach.
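Dropping pivoting matters because, without row interchanges, L keeps only the kl subdiagonals and U only the ku superdiagonals of the original band. Every elimination loop can therefore be restricted to the band. The serial sketch below assumes that pivoting is safe for the input, for example a diagonally dominant matrix. For readability it stores the band in a dense list of lists rather than LAPACK's compact band storage, and it does not reproduce the paper's parallel routines.

```python
def band_lu_nopivot(A, kl, ku):
    """LU without pivoting for a matrix with kl sub- and ku super-diagonals.

    Returns one array holding L (unit diagonal, strictly below the diagonal)
    and U (on and above the diagonal).
    """
    n = len(A)
    LU = [list(row) for row in A]
    for k in range(n - 1):
        pivot = LU[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: this matrix needs pivoting")
        # Only rows k+1 .. k+kl have nonzeros below the pivot.
        for i in range(k + 1, min(k + kl + 1, n)):
            LU[i][k] /= pivot  # multiplier l_ik
            # Row k of U is nonzero only in columns k+1 .. k+ku.
            for j in range(k + 1, min(k + ku + 1, n)):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU
```

The work per column is O(kl·ku) instead of O(n²), and no fill appears outside the band.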
Sparse matrix-vector multiplication (SpMV) is the most important kernel in the parallel iterative methods used to solve the modified equations arising in power flow calculations for large-scale power systems. In this paper, one improved compre...
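The abstract breaks off before describing the improved compressed format, so what follows is only the standard CSR (compressed sparse row) baseline such formats are usually compared against, not the paper's scheme. Each row's dot product is independent of the others. That independence is what makes row-wise partitioning across threads or processes possible. Names are illustrative.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """y = A x for A stored in CSR form.

    values  -- nonzeros stored row by row
    col_idx -- column index of each nonzero
    row_ptr -- row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Rows are independent: a parallel version hands each worker a block of rows.
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y
```

For example, the matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] is stored as values [4, 1, 3, 2, 5], col_idx [0, 2, 1, 0, 2] and row_ptr [0, 2, 3, 5]. Multiplying it by [1, 2, 3] gives [7, 6, 17].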
ISBN: 9783540681052 (print)
The paper considers the problem of determining optimal sensor locations for estimating unknown parameters in a class of distributed parameter systems when the measurement errors are correlated. Given a finite set of possible sensor positions, the problem is formulated as selecting the gaged sites so as to maximize the log-determinant of the Fisher information matrix associated with the estimated parameters. The search for the optimal solution is performed using a GRASP method combined with a multipoint exchange algorithm. To alleviate the excessive computational cost of large-scale problems, a parallel version of the GRASP solver is developed for computation on a Linux cluster of PCs. The resulting numerical scheme is validated on a simulation example.
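The log-determinant criterion and the exchange idea can be shown on a toy problem. This is a simplified sketch, not the paper's solver. It assumes uncorrelated, unit-variance measurement errors, so each selected site with sensitivity vector g adds g gᵀ to the Fisher information matrix F; with correlated errors F = Gᵀ C⁻¹ G instead. It also replaces GRASP with a multipoint exchange by a plain single-point exchange: swap one site out and one in while log det F improves. The sensitivity values and all names are hypothetical.

```python
import math


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(row) for row in M]
    n = len(A)
    d = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0.0:
            return 0.0
        if p != k:
            A[k], A[p] = A[p], A[k]
            d = -d
        d *= A[k][k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return d


def log_det_fim(sens, selected):
    """log det F with F = sum of g g^T over selected sites (uncorrelated errors)."""
    p = len(sens[0])
    F = [[0.0] * p for _ in range(p)]
    for s in selected:
        g = sens[s]
        for a in range(p):
            for b in range(p):
                F[a][b] += g[a] * g[b]
    d = det(F)
    return math.log(d) if d > 0.0 else -math.inf


def exchange_design(sens, start):
    """Single-point exchange: keep swapping while log det F increases."""
    selected = set(start)
    best = log_det_fim(sens, selected)
    improved = True
    while improved:
        improved = False
        for out in sorted(selected):
            for site in range(len(sens)):
                if site in selected:
                    continue
                trial = (selected - {out}) | {site}
                value = log_det_fim(sens, trial)
                if value > best + 1e-12:
                    selected, best, improved = trial, value, True
                    break
            if improved:
                break
    return sorted(selected), best
```

A single-point exchange like this can stall in a local optimum on larger problems. Randomized multi-start methods such as GRASP, and their parallelization, are aimed at exactly that weakness.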
ISBN: 9781509035199 (print)
Parallel processing has been used to reduce the computing time for processing large tree-structured data sets. Recently, research has been done on parallel computing of tree-structured data on Graphics Processing Units (GPUs). A GPU cannot directly access tree-structured data on hard disks, where it is commonly stored as objects or linked lists. The data must therefore be copied from the hard disk to device memory for the computation, and copying it in its usual pointer-based form is very costly because of the pointer overhead. Existing tree data structures on GPUs are commonly designed to store one particular kind of tree and support only limited types of tree traversal. In this work, a tree data structure is proposed that stores different kinds of trees as a linear data structure, which is fast to copy. The proposed data structure is applied to general trees and binary trees and supports four common types of tree traversal: pre-order, post-order, in-order and breadth-first. Therefore, most tree algorithms can be implemented on GPUs by using this data structure. The results show that the proposed data structure is successfully implemented for all the traversals for both binary and general trees.
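A linear, pointer-free tree layout is sketched below. The layout is hypothetical and not necessarily the paper's. Nodes are numbered in breadth-first order, and the children of node i are stored contiguously in children[offsets[i]:offsets[i + 1]], a CSR-like scheme. Three flat lists can be copied to device memory in one transfer each. Traversals then use index arithmetic and an explicit stack instead of pointers and recursion. In-order traversal is omitted because it needs explicit left/right slots for binary trees. All names are illustrative.

```python
from collections import deque


def flatten_tree(tree):
    """tree is (value, [subtrees]); nodes get ids in breadth-first order.

    Returns flat lists: values[i], and the children of node i stored in
    children[offsets[i]:offsets[i + 1]] (a CSR-like layout, no pointers).
    """
    values, offsets, children = [], [0], []
    queue = deque([tree])
    next_id = 1  # id the next enqueued node will receive
    while queue:
        value, subtrees = queue.popleft()
        values.append(value)
        for sub in subtrees:
            children.append(next_id)
            next_id += 1
            queue.append(sub)
        offsets.append(len(children))
    return values, offsets, children


def kids(offsets, children, i):
    return children[offsets[i]:offsets[i + 1]]


def preorder(values, offsets, children):
    out, stack = [], [0]
    while stack:
        i = stack.pop()
        out.append(values[i])
        # Push in reverse so the leftmost child is visited first.
        stack.extend(reversed(kids(offsets, children, i)))
    return out


def postorder(values, offsets, children):
    # Visit root, then subtrees right-to-left; reversing that gives post-order.
    out, stack = [], [0]
    while stack:
        i = stack.pop()
        out.append(values[i])
        stack.extend(kids(offsets, children, i))
    return out[::-1]


def breadth_first(values):
    # Ids are already in breadth-first order, so BFS is a linear scan.
    return list(values)
```

For the tree ("A", [("B", [("D", [])]), ("C", [])]), pre-order yields A, B, D, C and post-order yields D, B, C, A. Breadth-first order is simply A, B, C, D.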
ISBN: 9783540681052 (print)
The paper presents a theoretical evaluation and numerical measurements of the performance of a new parallel direct solver implemented for the hp Finite Element Method (FEM). The solver uses the substructuring method over non-overlapping subdomains. This method first eliminates the subdomains' internal d.o.f. with respect to the interface d.o.f., then solves the interface problem, and finally solves the internal problems by backward substitution on each subdomain. The interface problem is solved by recursive execution of the direct substructuring method on the tree of separators associated with the subdomains to which the Schur complement approach was applied. We show that the efficiency of the solver grows as the accuracy of the FEM solution is increased by performing hp refinements on the computational mesh. The h refinements consist of breaking some finite elements into smaller son elements; the p refinements consist of increasing the polynomial order of approximation on some finite element edges, faces and interiors.
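For a single subdomain, the three steps named above can be written in block form. Here the subscript I marks internal d.o.f. and Γ marks interface d.o.f.; this notation is ours, not the paper's. The interface system S u_Γ = g is the problem that, according to the abstract, is itself handled by recursive substructuring on the separator tree.

```latex
\begin{bmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{bmatrix}
\begin{bmatrix} u_I \\ u_\Gamma \end{bmatrix}
=
\begin{bmatrix} f_I \\ f_\Gamma \end{bmatrix},
\qquad
S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma},
\qquad
S\,u_\Gamma = f_\Gamma - A_{\Gamma I} A_{II}^{-1} f_I,
\qquad
u_I = A_{II}^{-1}\left(f_I - A_{I\Gamma} u_\Gamma\right).
```

The first row of the block system gives u_I in terms of u_Γ. Substituting that expression into the second row yields the Schur complement system. The last formula is the backward substitution performed on each subdomain.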
ISBN: 9783540681052 (print)
As clock frequencies approach their physical limits, a good way to enhance performance is to increase parallelism by integrating more cores as coprocessors to general-purpose processors, in order to handle the different workloads of scientific and signal processing applications. Many kernels in these applications lend themselves to data-parallel architectures such as array processors. The Basic Linear Algebra Subprograms (BLAS) are standard operations for efficiently solving linear algebra problems on high-performance and parallel systems. In this paper, we implement and evaluate the performance of some important BLAS operations on a matrix coprocessor. Our analytical model shows that the performance of the Level-3 BLAS, represented by the n x n matrix multiply-add operation, approaches the theoretical peak as n increases, since the degree of data reuse is high. However, the performance of Level-1 and Level-2 BLAS operations is low as a result of low data reuse. Fortunately, many applications make intensive use of Level-3 BLAS, with only a small percentage of Level-1 and Level-2 BLAS operations.
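The data-reuse argument can be made concrete by counting flops per word of memory traffic for one call at each BLAS level. This is an idealized count, not the paper's analytical model. It assumes each operand word moves between memory and the coprocessor exactly once. The function name is hypothetical.

```python
def flops_per_word(level, n):
    """Idealized flop-to-memory-traffic ratio for one n-sized BLAS call.

    Assumes every operand word moves between memory and the coprocessor
    exactly once (perfect on-chip reuse); real hardware does worse.
    """
    if level == 1:  # AXPY  y <- a*x + y: 2n flops; read x, y and write y
        return (2 * n) / (3 * n)
    if level == 2:  # GEMV  y <- A*x + y: 2n^2 flops; read A, x, y and write y
        return (2 * n * n) / (n * n + 3 * n)
    if level == 3:  # GEMM  C <- A*B + C: 2n^3 flops; read A, B, C and write C
        return (2 * n ** 3) / (4 * n * n)
    raise ValueError("level must be 1, 2 or 3")
```

Under this count, Level-1 stays at 2/3 flop per word for every n and Level-2 never reaches 2. Level-3 grows as n/2, so on a memory-bound coprocessor only Level-3 can approach peak for large n.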
In this paper, a numerical solution of the Theodorsen integral equation is studied. Using an adequate quadrature formula that eliminates the singularity of the integral part of the Theodorsen equation, we obtain a system of nonlinear algebraic equations. This system may be solved using a Jacobi-type method, and the procedure can easily be implemented in a programming language with parallel facilities. Examples are given using ADA, EVAL, and PARALLAXIS. A convergence result is established. (C) 1999 Elsevier Science Ltd. All rights reserved.
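A Jacobi-type method suits parallel languages because every component of the new iterate is computed from the previous iterate only. The sketch below shows that pattern on a small fixed-point system x = G(x). The system is hypothetical and contractive; it is not the discretized Theodorsen equation, whose quadrature formula the abstract does not give.

```python
import math


def nonlinear_jacobi(g, x0, tol=1e-12, max_iter=1000):
    """Solve x = G(x) by x_new[i] = g(i, x_old) for every i.

    Each component of the new iterate reads only the old iterate, so the
    n updates are independent and could be assigned to different processors.
    """
    x = list(x0)
    for _ in range(max_iter):
        x_new = [g(i, x) for i in range(len(x))]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


def g_example(i, x):
    """Hypothetical contractive system (Lipschitz constant 0.5 in the max norm)."""
    n = len(x)
    return 1.0 + 0.25 * math.sin(x[(i + 1) % n]) + 0.25 * math.cos(x[(i - 1) % n])
```

Convergence here follows from G being a contraction. For the actual Theodorsen system, a separate convergence argument such as the one the paper establishes is needed.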
ISBN: 9781941643730 (print)
We propose a simple yet effective approach to learning bilingual word embeddings (BWEs) from non-parallel document-aligned data (based on the omnipresent skip-gram model) and apply it to bilingual lexicon induction (BLI). We demonstrate the utility of the induced BWEs in the BLI task by reporting results on benchmark BLI datasets for three language pairs. (1) We show that our BWE-based BLI models significantly outperform the MuPTM-based and context-counting models in this setting and obtain the best reported BLI results for all three tested language pairs. (2) We also show that our BWE-based BLI models outperform other BLI models based on recently proposed BWEs that require parallel data for bilingual training.
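Once source and target words live in one shared embedding space, a common way to use BWEs for BLI is to translate a source word as the target word with the highest cosine similarity. The sketch below shows only that retrieval step, not the training of the embeddings. The three-dimensional vectors and the Dutch/English words are hypothetical toy data.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translate(word, src_emb, tgt_emb):
    """Return the target word whose vector is most similar to the source word's."""
    query = src_emb[word]
    return max(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]))


# Hypothetical toy vectors in a shared bilingual space.
SRC_EMB = {"hond": [0.9, 0.1, 0.0], "kat": [0.1, 0.9, 0.1]}
TGT_EMB = {"dog": [0.8, 0.2, 0.0], "cat": [0.0, 1.0, 0.2], "car": [0.0, 0.1, 1.0]}
```

With these vectors, translate("hond", SRC_EMB, TGT_EMB) returns "dog" and translate("kat", SRC_EMB, TGT_EMB) returns "cat". Real evaluations rank the full target vocabulary and report metrics such as top-1 accuracy.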