Search Results - Inner Mongolia University Library

A Well-Scaling Parallel Algorithm for the Computation of the Translation Operator in the MLFMA

IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, 2014, Vol. 62, No. 5, pp. 2679-2687

Authors: Michiels, Bart; Bogaert, Ignace; Fostier, Jan; De Zutter, Daniel. Affiliation: Univ Ghent, Dept Informat Technol (INTEC), B-9000 Ghent, Belgium

This paper investigates the parallel, distributed-memory computation of the translation operator with L + 1 multipoles in the three-dimensional Multilevel Fast Multipole Algorithm (MLFMA). A baseline, communication-free parallel algorithm can compute such a translation operator in O(L) time, using O(L²) processes. We propose a parallel algorithm that reduces this complexity to O(log L) time. This complexity is theoretically supported and experimentally validated up to 16,384 parallel processes. For realistic cases, the implementation of the proposed algorithm proves to be up to ten times faster than the baseline algorithm. For a large-scale parallel MLFMA simulation with 4096 parallel processes, the runtime for the computation of all translation operators during the setup stage is reduced from roughly one hour to only a few minutes.

Keywords: distributed-memory architecture; MLFMA; parallel computing; translation operator
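
The O(L) baseline cost follows from the form of the operator. In one common convention (assumed here; texts using the e^{jωt} time dependence write (-j)^l and h_l^(2) instead), the translation operator for a translation vector X and wavenumber k is

T_L(kX, cos θ) = Σ_{l=0}^{L} i^l (2l+1) h_l^(1)(kX) P_l(cos θ),

where h_l^(1) is the spherical Hankel function of the first kind, P_l is the Legendre polynomial, and cos θ = k̂·X̂. Each of the O(L²) sampled directions therefore needs a sum of L + 1 terms. That is O(L) work when one process owns one direction. The Python sketch below evaluates that baseline. Separately, it sums the same L + 1 terms by pairwise tree reduction, which takes ceil(log2(L + 1)) levels if each term sits on its own worker. The tree reduction is only a generic illustration of how O(log L) depth can arise. It is not claimed to be the paper's algorithm, and all function names are hypothetical.

```python
import cmath

# Exact powers of i, indexed by l mod 4 (avoids rounding from (1j) ** l).
I_POW = (1, 1j, -1, -1j)


def spherical_hankel1(L, x):
    """Return [h_0^(1)(x), ..., h_L^(1)(x)] for real x > 0.

    Upward recurrence h_{l+1} = (2l+1)/x * h_l - h_{l-1}; it is stable for
    the Hankel function because the growing y_l component dominates.
    """
    e = cmath.exp(1j * x)
    h = [-1j * e / x, -e * (x + 1j) / (x * x)]  # closed forms of h_0, h_1
    for l in range(1, L):
        h.append((2 * l + 1) / x * h[l] - h[l - 1])
    return h[: L + 1]


def legendre(L, t):
    """Return [P_0(t), ..., P_L(t)] via Bonnet's recurrence."""
    p = [1.0, t]
    for l in range(1, L):
        p.append(((2 * l + 1) * t * p[l] - l * p[l - 1]) / (l + 1))
    return p[: L + 1]


def translation_terms(L, kX, cos_theta):
    """The L + 1 terms i^l (2l+1) h_l^(1)(kX) P_l(cos_theta)."""
    h = spherical_hankel1(L, kX)
    p = legendre(L, cos_theta)
    return [I_POW[l % 4] * (2 * l + 1) * h[l] * p[l] for l in range(L + 1)]


def translation_operator(L, kX, cos_theta):
    """Baseline: one process, one direction, O(L) sequential work."""
    return sum(translation_terms(L, kX, cos_theta))


def tree_sum(values):
    """Pairwise reduction of the terms; returns (total, number_of_levels).

    With one term per worker, the number of levels is ceil(log2(len(values))).
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth
```

For example, `translation_operator(20, 60.0, 0.3)` evaluates the baseline for L = 20, kX = 60, and one direction. `tree_sum(translation_terms(20, 60.0, 0.3))` returns the same value, up to rounding, after 5 reduction levels (21 terms).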

DAIRRy-BLUP: A High-Performance Computing Approach to Genomic Prediction

GENETICS, 2014, Vol. 197, No. 3, pp. 813-+

Authors: De Coninck, Arne; Fostier, Jan; Maenhout, Steven; De Baets, Bernard. Affiliations: Univ Ghent, Res Unit Knowledge Based Syst (KERMIT), Dept Math Modelling Stat & Bioinformat, B-9000 Ghent, Belgium; Ghent Univ, IMinds, IBCN, B-9000 Ghent, Belgium; Ghent Univ, IMinds, Serv Res Unit, Dept Informat Technol, B-9000 Ghent, Belgium; Progeno, B-9052 Zwijnaarde, Belgium

In genomic prediction, common analysis methods rely on a linear mixed-model framework to estimate SNP marker effects and breeding values of animals or plants. Ridge regression-best linear unbiased prediction (RR-BLUP) is based on the assumptions that SNP marker effects are normally distributed, are uncorrelated, and have equal variances. We propose DAIRRy-BLUP, a parallel, distributed-memory RR-BLUP implementation, based on single-trait observations (y), that uses the Average Information algorithm for restricted maximum-likelihood estimation of the variance components. The goal of DAIRRy-BLUP is to enable the analysis of large-scale data sets to provide more accurate estimates of marker effects and breeding values. A distributed-memory framework is required since the dimensionality of the problem, determined by the number of SNP markers, can become too large to be analyzed by a single computing node. Initial results show that DAIRRy-BLUP enables the analysis of very large-scale data sets (up to 1,000,000 individuals and 360,000 SNPs) and indicate that increasing the number of phenotypic and genotypic records has a more significant effect on the prediction accuracy than increasing the density of SNP arrays.

Keywords: distributed-memory architecture; genomic prediction; high-performance computing; simulated data; variance component estimation
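
RR-BLUP estimates of the marker effects can be written as the solution of Henderson's mixed-model equations. Take a single trait with model y = 1μ + Zu + e, where Z (n × m) holds the SNP genotypes, u ~ N(0, σ_u² I), and e ~ N(0, σ_e² I). The equations are then

[ 1'1   1'Z      ] [ μ ]   [ 1'y ]
[ Z'1   Z'Z + λI ] [ u ] = [ Z'y ],   λ = σ_e² / σ_u².

The coefficient matrix is (m + 1) × (m + 1), so its size is set by the number of markers. For m = 360,000, a dense double-precision copy has about 1.3 × 10^11 entries, or roughly 1 TB, which is why a single computing node no longer suffices. The sketch below is a single-node illustration on a toy problem with λ assumed known. It does not implement the Average Information REML step that DAIRRy-BLUP uses to estimate the variance components, and the function names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting; returns x with A x = b."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented copy
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def rr_blup(Z, y, lam):
    """Solve the mixed-model equations for y = 1*mu + Z u + e.

    Z: n rows of m genotype codes; lam = sigma_e^2 / sigma_u^2 (assumed known).
    Returns (mu, u) with u the list of m estimated marker effects.
    """
    n, m = len(Z), len(Z[0])
    W = [[1.0] + [float(z) for z in row] for row in Z]  # W = [1 | Z]
    A = [[sum(W[i][a] * W[i][c] for i in range(n)) for c in range(m + 1)]
         for a in range(m + 1)]
    for j in range(1, m + 1):
        A[j][j] += lam  # ridge penalty on marker effects, not on the intercept
    rhs = [sum(W[i][a] * y[i] for i in range(n)) for a in range(m + 1)]
    sol = solve(A, rhs)
    return sol[0], sol[1:]
```

With λ = 0 and a full-rank design, this reduces to ordinary least squares. Increasing λ shrinks all marker effects toward zero by the same rule, which reflects the equal-variance, uncorrelated-effects assumption stated in the abstract.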

Collective communication: theory, practice, and experience

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2007, Vol. 19, No. 13, pp. 1749-1783

Authors: Chan, Ernie; Heimlich, Marcel; Purkayastha, Avi; van de Geijn, Robert. Affiliations: Univ Texas, Dept Comp Sci, Austin, TX 78712, USA; Univ Texas, Texas Adv Comp Ctr, Austin, TX 78712, USA

We discuss the design and high-performance implementation of collective communication operations on distributed-memory computer architectures. We combine known techniques (many of which were first proposed in the 1980s and early 1990s) with careful exploitation of the communication modes supported by MPI. The resulting implementations improve performance in most situations compared with those currently supported by public-domain implementations of MPI such as MPICH. Performance results from a large Intel Xeon/Pentium 4 (R) processor cluster are included. Copyright (C) 2007 John Wiley & Sons, Ltd.

Keywords: collective communication; distributed-memory architecture; clusters
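
One classic technique of the kind the abstract refers to is the minimum-spanning-tree (MST) broadcast. The ranks are split into two halves, and the root sends the message to one rank in the other half. Both halves then recurse in parallel, each with its own root. Under the usual latency-bandwidth model, where one message of n items costs α + nβ, this takes ceil(log2 p)(α + nβ), which suits short messages. The Python sketch below only simulates the message schedule and makes no MPI calls. Which rank in the other half receives the message is an assumption of this sketch, and the function name is hypothetical.

```python
def mst_broadcast(p, root=0):
    """Simulate an MST broadcast among ranks 0..p-1.

    Returns a list of rounds; each round is a list of (src, dst) messages
    that can proceed concurrently.
    """
    rounds = {}

    def bcast(lo, hi, root, depth):
        # Ranks lo..hi-1 form the current subgroup; `root` already has the data.
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        if root < mid:
            dest, left_root, right_root = mid, root, mid  # send into right half
        else:
            dest, left_root, right_root = lo, lo, root    # send into left half
        rounds.setdefault(depth, []).append((root, dest))
        bcast(lo, mid, left_root, depth + 1)
        bcast(mid, hi, right_root, depth + 1)

    bcast(0, p, root, 0)
    return [rounds[d] for d in sorted(rounds)]
```

For example, `mst_broadcast(8)` produces three rounds: [(0, 4)], then [(0, 2), (4, 6)], then [(0, 1), (2, 3), (4, 5), (6, 7)]. Long messages are usually better served by other schemes, because every round still moves the full message.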

High Performance Fortran for practical scientific algorithms: An up-to-date evaluation

FUTURE GENERATION COMPUTER SYSTEMS, 1999, Vol. 15, No. 3, pp. 343-352

Author: Ding, CHQ. Affiliation: Univ Calif Berkeley, Lawrence Berkeley Lab, Natl Energy Res Sci Comp Ctr, Berkeley, CA 94720, USA

A suite of High Performance Fortran (HPF) coding examples of practical scientific algorithms is examined in detail. These examples are simple but non-trivial, so they allow a fairly good understanding of issues related to different data distributions, different parallel constructs, and different programming styles (static versus dynamic allocation). The examples include 2D stencil solutions of PDEs, the N-body problem, LU factorization, several vector/matrix library routines, and 2D and 3D array redistribution. The performance of the HPF codes is compared with that of hand-written Fortran codes using message-passing libraries. From 1997 to 1998, HPF compilers improved significantly, to the point that HPF codes perform as well as Fortran+MPI codes for all the examples investigated here. However, many important peculiarities of HPF coding remain. (C) 1999 Elsevier Science B.V. All rights reserved.

Keywords: HPF; distributed-memory architecture; parallel programming language; MPI; stencils computation; N-body problem; LU factorization; array redistribution; FFT
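
The role of the data distribution can be made concrete with the two basic HPF distributions, written for instance as `!HPF$ DISTRIBUTE A(BLOCK)` or `!HPF$ DISTRIBUTE A(CYCLIC)`. BLOCK gives each of p processors one contiguous chunk of ceil(n/p) elements. CYCLIC(k) deals blocks of k elements round-robin. The Python sketch below maps an element index to its owner and counts how many neighbour reads of a 1D three-point stencil cross processor boundaries. Indices are 0-based, unlike Fortran's default 1-based indexing. The helper names are hypothetical, and the sketch ignores HPF's alignment and template machinery.

```python
def block_owner(i, n, p):
    """Processor owning 0-based element i of an n-element array under BLOCK.

    Each processor gets a contiguous chunk of ceil(n/p) elements; trailing
    processors may receive fewer elements (or none).
    """
    b = -(-n // p)  # ceil(n / p) using integer arithmetic
    return i // b


def cyclic_owner(i, p, k=1):
    """Processor owning element i under CYCLIC(k): blocks of k dealt round-robin."""
    return (i // k) % p


def remote_refs(owner, n):
    """Count reads a[i-1], a[i+1] of a 1D three-point stencil (interior i only)
    that touch an element owned by a different processor than a[i]."""
    count = 0
    for i in range(1, n - 1):
        for j in (i - 1, i + 1):
            if owner(j) != owner(i):
                count += 1
    return count
```

With n = 16 and p = 4, BLOCK produces 6 off-processor reads (two per internal block boundary). CYCLIC produces 28, because every neighbour lives on another processor. This is why block distributions are the natural choice for stencil codes.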

Multiple sequential staging of tasks: A new approach to parallel computations

COMMUNICATIONS IN NUMERICAL METHODS IN ENGINEERING, 1999, Vol. 15, No. 5, pp. 367-373

Authors: Zhong, YG; Kong, XY; Xu, GM; Kuang, GH. Affiliation: Univ Sci & Technol China, Dept Modern Mech, Hefei 230027, Anhui, Peoples R China

Developing an efficient algorithm for solving a large linear system in a parallel computing environment is the major problem in applying parallel processing to the numerical solution of large-scale engineering problems. This paper presents a new algorithm, Multiple Sequential Staging of Tasks (MSST), to speed up the solution of a large linear system. Sequential Staging of Tasks (SST) is a highly efficient approach to the parallel solution of a large linear system. However, it is not suitable for middle- and large-scale parallel computers because of processor idle periods. MSST removes these idle periods by partitioning the processors into groups and having each group start from a different row of the linear system. As a result, MSST can be performed effectively on middle- and large-scale parallel computers and achieves a higher speed-up. Numerical results were obtained from computer experiments that solved the Poisson equation numerically on a Dawning-1000 supercomputer (a distributed-memory MIMD architecture). The parallel speed-up is satisfactory. Copyright (C) 1999 John Wiley & Sons, Ltd.

Keywords: engineering; numerical solution; parallel approach; higher speed-up; distributed-memory architecture
