Search results - Inner Mongolia University Library

Parallel computing of GRAPES 3D-variational data assimilation system

7th International Conference on Parallel Processing and Applied Mathematics

Authors: Zhu, Xiaoqian; Zhang, Weimin; Song, Junqiang (Natl Lab Parallel & Distributed Proc, Changsha 410073, Hunan, Peoples R China)

ISBN: (print) 9783540681052

Three-dimensional variational assimilation (3D-Var) is currently the most commonly used technique for generating an analysis that provides more consistent initial conditions for numerical weather prediction (NWP). The Global and Regional Assimilation Prediction System (GRAPES) is a new-generation NWP system in China, in which 3D-Var is one of the main components and plays an important role in the direct assimilation of non-conventional observations. In this study, the principal theory and serial implementation of GRAPES 3D-Var are introduced first, and then the details of the distributed parallel computing algorithm of GRAPES 3D-Var are discussed, including data partitioning strategies, data communication strategies and stagger parallelization strategies. Finally, parallel experimental results on a 16-CPU cluster platform are presented, and the numerical simulations show that the parallel strategies can be combined to achieve considerable load balancing and good performance.

Keywords: variational data assimilation; parallel computing

A new model of multi-installment divisible loads processing in systems with limited memory

7th International Conference on Parallel Processing and Applied Mathematics

Authors: Drozdowski, Maciej; Lawenda, Marcin (Poznan Univ Tech, Inst Comp Sci, Piotrowo 2, PL-60965 Poznan, Poland; Poznan Supercomp & Networking Ctr, PL-61704 Poznan, Poland)

ISBN: (print) 9783540681052

In this paper we study multi-installment divisible load processing in a heterogeneous distributed system with limited memory. The divisible load model applies to computations which can be arbitrarily divided into parts and performed independently in parallel. The initial waiting for the load may be shortened by sending many small chunks of load instead of one huge chunk. The load chunk sizes must be adjusted to the speeds of communication and computation and to the memory sizes, such that the whole processing time is as short as possible. We propose a new realistic model of memory management and formulate it as a mixed quadratic programming problem, which is solved by a branch and bound algorithm. Since this problem is computationally hard, we propose heuristics and analyze their performance in a series of computational experiments.

Keywords: scheduling; divisible loads; multiple installments; memory limitations

Parallel solution of band linear systems in model reduction

7th International Conference on Parallel Processing and Applied Mathematics

Authors: Remon, Alfredo; Quintana-Orti, Enrique S.; Quintana-Orti, Gregorio (Univ Jaume 1, Dept Ingn & Ciencia Computadores, Castellon de La Plana 12071, Spain)

ISBN: (print) 9783540681052

In this paper we present two parallel routines for the LU factorization of band matrices arising in model reduction problems that target SMP architectures. The special properties of these problems often allow the elimination of pivoting during the factorization, which results in a higher efficiency of the parallel routines. Also, the routines aggregate operations during the iteration, exposing a coarser-grain parallelism than their LAPACK counterpart. Experimental results on two different parallel platforms show the benefits of the new approach.

Keywords: model reduction; band linear systems; LU factorization; multithreaded BLAS; symmetric multiprocessors (SMP)
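
As a rough illustration of what LU factorization of a band matrix without pivoting involves, the following minimal Python sketch factors a matrix in place while touching only entries inside the band. It is not the routines described in the abstract: the function names `band_lu_no_pivot` and `multiply_lu` are hypothetical, the matrix is kept in a plain dense list of lists rather than LAPACK band storage, and the code assumes the matrix can be factored safely without pivoting (for example, because it is strictly diagonally dominant).

```python
def band_lu_no_pivot(a, kl, ku):
    """In-place LU of a square matrix with kl sub- and ku super-diagonals.

    Assumes no pivoting is needed. On return, the strict lower part of `a`
    holds L (unit diagonal implied) and the upper part holds U.
    """
    n = len(a)
    for k in range(n - 1):
        if a[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: pivoting would be required")
        # Without pivoting, updates never leave the band.
        last_row = min(n, k + kl + 1)
        last_col = min(n, k + ku + 1)
        for i in range(k + 1, last_row):
            a[i][k] /= a[k][k]                    # multiplier, stored as L[i][k]
            for j in range(k + 1, last_col):
                a[i][j] -= a[i][k] * a[k][j]      # trailing update inside the band
    return a


def multiply_lu(lu):
    """Rebuild L @ U from the packed factors, for checking the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                total += l_ik * lu[k][j]
            out[i][j] = total
    return out


# A small diagonally dominant tridiagonal matrix (kl = ku = 1).
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
LU = band_lu_no_pivot([row[:] for row in A], kl=1, ku=1)
```

Because no rows are swapped, the factors keep the original bandwidths, which is one reason skipping pivoting can help efficiency.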

An improved sparse matrix-vector multiplication kernel for solving modified equation in large scale power flow calculation on CUDA

2012 IEEE 7th International Power Electronics and Motion Control Conference - ECCE Asia, IPEMC 2012

Authors: Yang, Mei; Sun, Cheng; Li, Zhimin; Cao, Dayong (Department of Electrical Engineering, Harbin Institute of Technology, China; Department of Applied Mathematics, Harbin University of Science and Technology, China)

ISBN: (print) 9781457720864

Sparse matrix-vector multiplication (SpMV) is the most important kernel in parallel iterative methods for solving the modified equation in large-scale power system power flow calculation. In this paper, an improved compressed sparse row (ICSR) storage format is given to address the problem of global memory alignment in the vector kernel on the Graphics Processing Unit (GPU). Experiments on matrices of different sizes demonstrate that the vector kernel with the ICSR storage format improves SpMV performance by 5%-30% compared with the vector kernel with CSR; the effect is more pronounced for large-scale unstructured sparse matrix-vector products. © 2012 IEEE.

Keywords: Graphics processing unit
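
For readers unfamiliar with the baseline format, the sketch below shows standard compressed sparse row (CSR) storage and a sequential SpMV over it in plain Python. It does not reproduce the ICSR layout, whose details the abstract does not give, and it is not a GPU kernel; the helper names `dense_to_csr` and `csr_spmv` are hypothetical. In GPU "vector" kernels as commonly described, the inner per-row loop is what a group of threads shares.

```python
def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)      # nonzero value
                col_idx.append(j)     # its column index
        row_ptr.append(len(values))   # where the next row starts
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i occupy values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3])
```

Rows with different numbers of nonzeros start at arbitrary offsets in `values`, which gives a sense of why memory alignment can matter when a hardware thread group reads a row.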

A parallel sensor selection technique for identification of distributed parameter systems subject to correlated observations

7th International Conference on Parallel Processing and Applied Mathematics

Authors: Baranowski, Przemyslaw; Ucinski, Dariusz (Univ Zielona Gora, Ctr Comp, Ul Podgorna 50, PL-65246 Zielona Gora, Poland; Univ Zielona Gora, Inst Control & Computat Engn, PL-65246 Zielona Gora, Poland)

ISBN: (print) 9783540681052

The paper considers the problem of determining optimal sensor locations so as to estimate unknown parameters in a class of distributed parameter systems when the measurement errors are correlated. Given a finite set of possible sensor positions, the problem is formulated as the selection of the gauged sites so as to maximize the log-determinant of the Fisher information matrix associated with the estimated parameters. The search for the optimal solution is performed using a GRASP method combined with a multipoint exchange algorithm. In order to alleviate the problem of excessive computational costs for large-scale problems, a parallel version of the GRASP solver is developed, aimed at computations on a Linux cluster of PCs. The resulting numerical scheme is validated on a simulation example.

Keywords: distributed parameter systems; parameter estimation; GRASP; parallel computations

Tree Structured Data Processing on GPUs

7th International Conference on Cloud Computing, Data Science and Engineering (Confluence)

Authors: Lu, Yifan; Yang, Lu; Bhavsar, Virendrakumar C.; Kumar, Neetesh (Univ New Brunswick Fredericton, Dept Comp Sci, Fredericton, NB E3B 5A3, Canada; Delhi Technol Univ, Dept Comp Sci & Engn, Delhi 110042, India)

ISBN: (print) 9781509035199

To reduce the computing time for processing large tree-structured data sets, parallel processing has been used. Recently, research has been done on parallel computing of tree-structured data on Graphics Processing Units (GPUs). A GPU device cannot directly access tree-structured data on hard disks, which is commonly stored as objects or linked lists. The data must therefore be copied from hard disk to device memory for the computation, and copying tree-structured data in its normal structure is very costly because of the overhead of many pointers. Existing tree data structures on GPUs are commonly applied to storing a particular kind of tree and support limited types of tree traversals. In this work, a tree data structure is proposed that stores different kinds of trees as a linear data structure (fast to copy). The proposed data structure is applied to general trees and binary trees and supports four common types of tree traversal: pre-order, post-order, in-order and breadth-first. Therefore, most tree algorithms can be implemented on GPUs using the proposed data structure. The results show that the proposed data structure is successfully implemented for all the traversals for binary as well as general trees.

Keywords: Tree traversals; GPU; binary tree; general tree; CUDA; parallel processing
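
To make the idea of storing a tree as a linear, pointer-free structure concrete, here is a minimal Python sketch. The layout is an assumption chosen for illustration (a first-child / next-sibling encoding in three flat lists), not the data structure proposed in the paper, and all function names are hypothetical. Flat integer arrays like these can be copied to a device in a few bulk transfers, unlike a graph of linked objects.

```python
from collections import deque


def flatten(tree):
    """Flatten a nested (label, [children]) tree into three flat lists."""
    labels, first_child, next_sibling = [], [], []

    def visit(node):
        label, children = node
        idx = len(labels)
        labels.append(label)
        first_child.append(-1)       # -1 means "no child"
        next_sibling.append(-1)      # -1 means "no further sibling"
        prev = -1
        for child in children:
            c = visit(child)
            if prev == -1:
                first_child[idx] = c
            else:
                next_sibling[prev] = c
            prev = c
        return idx

    visit(tree)
    return labels, first_child, next_sibling


def children_of(i, first_child, next_sibling):
    """Yield the child indices of node i, left to right."""
    c = first_child[i]
    while c != -1:
        yield c
        c = next_sibling[c]


def preorder(labels, first_child, next_sibling):
    """Iterative pre-order traversal using an explicit stack."""
    out, stack = [], [0]
    while stack:
        i = stack.pop()
        out.append(labels[i])
        kids = list(children_of(i, first_child, next_sibling))
        stack.extend(reversed(kids))  # leftmost child is popped first
    return out


def breadth_first(labels, first_child, next_sibling):
    """Level-by-level traversal using a queue."""
    out, queue = [], deque([0])
    while queue:
        i = queue.popleft()
        out.append(labels[i])
        queue.extend(children_of(i, first_child, next_sibling))
    return out


tree = ("a", [("b", [("d", []), ("e", [])]), ("c", [("f", [])])])
flat = flatten(tree)
```

The same encoding covers general trees (any number of children) and binary trees, since a node's children are simply a sibling chain.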

Performance of multi level parallel direct solver for hp Finite Element Method

7th International Conference on Parallel Processing and Applied Mathematics

Authors: Paszynski, Maciej (AGH Univ Sci & Technol, Dept Comp Sci, PL-30059 Krakow, Poland)

ISBN: (print) 9783540681052

The paper presents a theoretical evaluation and numerical measurements of the performance of a new parallel direct solver implemented for the hp Finite Element Method (FEM). The solver utilizes the substructuring method over non-overlapping sub-domains, which consists of eliminating the sub-domains' internal d.o.f. with respect to the interface d.o.f., then solving the interface problem, and finally solving the internal problems by backward substitution on each sub-domain. The interface problem is solved by recursive execution of the direct substructuring method on the tree of separators associated with the sub-domains on which the Schur complement approach was applied. We show that the efficiency of the solver grows when the accuracy of the FEM solution is increased by performing hp refinements on the computational mesh. The h refinements consist of breaking some finite elements into smaller son elements; the p refinements consist of increasing the polynomial order of approximation on some finite element edges, faces and interiors.

Keywords: parallel direct solvers; substructuring method; Finite Element Method; hp adaptivity

Performance evaluation of basic linear algebra subroutines on a matrix co-processor

7th International Conference on Parallel Processing and Applied Mathematics

Authors: Zekri, Ahmed S.; Sedukhin, Stanislav G. (Univ Aizu, Grad Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima 9658580, Japan)

ISBN: (print) 9783540681052

As clock frequency approaches its physical limits, a good approach to enhancing performance is to increase parallelism by integrating more cores as coprocessors to general-purpose processors in order to handle the different workloads of scientific and signal processing applications. Many kernels in these applications lend themselves to data-parallel architectures such as array processors. The Basic Linear Algebra Subroutines (BLAS) are standard operations for efficiently solving linear algebra problems on high-performance and parallel systems. In this paper, we implement and evaluate the performance of some important BLAS operations on a matrix coprocessor. Our analytical model shows that the performance of the Level-3 BLAS, represented by the n x n matrix multiply-add operation, approaches the theoretical peak as n increases, since the degree of data reuse is high. However, the performance of Level-1 and Level-2 BLAS operations is low as a result of low data reuse. Fortunately, many applications are based on intensive use of Level-3 BLAS with a small percentage of Level-1 and Level-2 BLAS.

Keywords: Signal processing
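
The data-reuse argument can be illustrated with a simple counting exercise. The Python sketch below is not the analytical model from the paper; it uses a deliberately simplified assumption that each operand element is read or written exactly once, and the function name `flops_per_element` is hypothetical. It compares floating-point operations per element touched for one representative operation at each BLAS level.

```python
def flops_per_element(n):
    """Flops per matrix/vector element touched, under a simple counting model.

    Assumption: every input element is read once and every output element
    is written once, with no cache effects modeled.
    """
    # Level 1, y = a*x + y: 2n flops; read x and y, write y (3n elements).
    axpy = (2 * n) / (3 * n)
    # Level 2, y = A*x + y: 2n^2 flops; read A, x, y, write y (n^2 + 3n).
    gemv = (2 * n * n) / (n * n + 3 * n)
    # Level 3, C = A*B + C: 2n^3 flops; read A, B, C, write C (4n^2).
    gemm = (2 * n ** 3) / (4 * n * n)
    return axpy, gemv, gemm


axpy, gemv, gemm = flops_per_element(100)
```

Under this model the Level-1 and Level-2 ratios stay bounded by a small constant regardless of n, while the Level-3 ratio grows linearly with n, which matches the qualitative claim about data reuse.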

The numerical solution of Theodorsen integral equation

Computers & Mathematics with Applications, 1999, vol. 38, no. 9-10, pp. 221-231

Authors: Scheiber, E (Transylvania Univ Brasov, Dept Math, Brasov 2200, Romania)

In this paper, a numerical solution of the Theodorsen integral equation is studied. Using an adequate quadrature formula which eliminates the singularity of the integral part of the Theodorsen equation, we obtain a system of nonlinear algebraic equations. This system may be solved using a Jacobi type method, and the procedure can be easily implemented using a programming language with parallel facilities. Examples are given using ADA, EVAL, and PARALLAXIS. A convergence result is established. (C) 1999 Elsevier Science Ltd. All rights reserved.

Keywords: Theodorsen integral equation; quadrature formula; parallel method
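
The reason a Jacobi type method suits parallel implementation can be seen in a small sketch. The Python code below applies a Jacobi-style fixed-point iteration to a toy two-equation nonlinear system chosen only for illustration; it is not the discretized Theodorsen equation, and the function name `jacobi_fixed_point` and the example system are assumptions. Each component of the new iterate depends only on the previous iterate, so the component updates are independent of one another.

```python
import math


def jacobi_fixed_point(g_funcs, x0, tol=1e-12, max_iter=1000):
    """Solve x = g(x) componentwise, updating all components simultaneously."""
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        # Every component is computed from the previous iterate only, so
        # these evaluations are independent and could run in parallel.
        x_new = [g(x) for g in g_funcs]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


# Toy contractive system: x = 0.5*cos(y), y = 0.5*sin(x).
g_funcs = [
    lambda v: 0.5 * math.cos(v[1]),
    lambda v: 0.5 * math.sin(v[0]),
]
solution, iterations = jacobi_fixed_point(g_funcs, [0.0, 0.0])
```

A Gauss-Seidel variant would instead reuse freshly updated components within the same sweep, which typically converges faster but serializes the updates.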

Bilingual Word Embeddings from Non-Parallel Document-Aligned Data Applied to Bilingual Lexicon Induction

53rd Annual Meeting of the Association for Computational Linguistics (ACL) / 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (IJCNLP)

Authors: Vulic, Ivan; Moens, Marie-Francine (Katholieke Univ Leuven, Dept Comp Sci, Leuven, Belgium)

ISBN: (print) 9781941643730

We propose a simple yet effective approach to learning bilingual word embeddings (BWEs) from non-parallel document-aligned data (based on the omnipresent skip-gram model), and its application to bilingual lexicon induction (BLI). We demonstrate the utility of the induced BWEs in the BLI task by reporting on benchmarking BLI datasets for three language pairs: (1) We show that our BWE-based BLI models significantly outperform the MuPTM-based and context-counting models in this setting, and obtain the best reported BLI results for all three tested language pairs; (2) We also show that our BWE-based BLI models outperform other BLI models based on recently proposed BWEs that require parallel data for bilingual training.

Keywords: Embeddings
