The development of smart grids and the increasing scale of power systems put considerable pressure on the electromagnetic transient simulation of a power system. The graphics processing unit (GPU), which features massive concurrent threads and excellent floating-point performance, brings new opportunities to the area of power system simulation. This study introduces a parallel lower-triangular and upper-triangular (LU) decomposition algorithm and a calculation strategy for electromagnetic transient simulation based on the GPU. In this scheme, the GPU performs the computationally intensive part of the simulation in parallel on its many built-in processing cores, while the CPU is assigned to updating history terms and controlling the flow of the simulation. Comparison with the results of CPU-only implementations verifies the validity and efficiency of the proposed method.
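The CPU/GPU division of labour described above can be illustrated with a toy sketch (not the paper's implementation): a single-node RC circuit discretized with the trapezoidal rule, where the "GPU" role reduces to solving the nodal equation each step and the "CPU" role updates the history term.

```python
# Illustrative sketch of the CPU/GPU split (assumed single-node RC example,
# not the paper's code). For large grids, the solve step below becomes the
# parallel triangular (LU) solves the abstract assigns to the GPU.
R, C, h = 1.0, 1e-3, 1e-4        # resistance, capacitance, time step
g_C = 2.0 * C / h                # trapezoidal companion conductance
G = 1.0 / R + g_C                # 1x1 nodal conductance "matrix"

v, i_hist = 0.0, 0.0
for _ in range(2000):
    rhs = 1.0 / R + i_hist       # CPU: assemble RHS (1 V source + history)
    v = rhs / G                  # GPU (conceptually): solve G * v = rhs
    i_hist = 2.0 * g_C * v - i_hist   # CPU: trapezoidal history update

# the capacitor charges toward the 1 V source, so v -> 1.0
```

At each step only the solve is data-parallel; the history update is cheap and sequential per branch, which is why the paper leaves it on the CPU.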
Presented is a parallel algorithm based on the fast multipole method (FMM) for the Helmholtz equation. This variant of the FMM is useful for computing radar cross sections and antenna radiation patterns. The FMM decomposes the impedance matrix into sparse components, reducing the operation count of the matrix-vector multiplication in iterative solvers to O(N^(3/2)) (where N is the number of unknowns). The parallel algorithm divides the problem into groups and assigns the computation involved with each group to a processor node. Careful consideration is given to the communication costs. A time-complexity analysis of the algorithm is presented and compared with empirical results from a Paragon XP/S running the lightweight Sandia/University of New Mexico operating system (SUNMOS). For a 90,000-unknown problem running on 60 nodes, the sparse representation fits in memory and the algorithm computes the matrix-vector product in 1.26 seconds, sustaining an aggregate rate of 1.4 Gflop/s. The corresponding dense matrix would occupy over 100 Gbytes and, assuming that I/O is free, would require on the order of 50 seconds to form the matrix-vector product.
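The headline figures can be sanity-checked with quick arithmetic (assuming 16-byte double-precision complex impedance-matrix entries, which the abstract does not state explicitly):

```python
# Back-of-envelope check of the abstract's figures. The 16-byte entry size
# (double-precision complex) is an assumption, not stated in the abstract.
N = 90_000
dense_gbytes = 16 * N * N / 1e9      # full matrix: ~129.6 GB ("over 100 Gbytes")

# FMM cuts the matvec operation count from O(N^2) to O(N^(3/2)),
# a factor of sqrt(N):
savings = N**2 / N**1.5              # = sqrt(90,000) = 300x fewer operations
```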
This paper presents the results of a study conducted to evaluate the inherent memory reference behavior of several engineering/scientific applications executing on shared-memory, MIN-based parallel systems. In this study, system sizes of two to 64 processors were evaluated. A trace-driven simulation model was used to obtain dynamic reference characteristics of the code, which included explicit declarations of shared variables. Our results indicate that a significant amount of explicitly declared shared data is accessed either read-only by several processors or read-write by a single processor. Furthermore, lines containing synchronization variables tend to see small ownership times at a processor and are accessed by several processors in the system. We also note that, as expected, relatively more references are to data with smaller ownership times as the number of processors increases. Finally, the application data set size can have an impact on ownership time as the number of processors increases.
Computationally efficient serial and parallel algorithms for estimating the general linear model are proposed. The sequential block-recursive algorithm is an adaptation of a known Givens strategy that has as a main component the generalized QR decomposition. The proposed algorithm is based on orthogonal transformations and exploits the triangular structure of the Cholesky QRD factor of the variance-covariance matrix. Specifically, it computes the estimator of the general linear model by recursively solving a series of smaller and smaller generalized linear least squares problems. The new algorithm is found to outperform significantly the corresponding LAPACK routine. A parallel version of the new sequential algorithm, which utilizes an efficient distribution of the matrices over the processors and has low inter-processor communication, is developed. The theoretical computational complexity of the parallel algorithms is derived and analyzed. Experimental results are presented which confirm the theoretical analysis. The parallel strategy is found to be scalable and highly efficient for solving large-scale general linear estimation problems. (c) 2005 Elsevier B.V. All rights reserved.
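For context, the underlying generalized least-squares problem can be sketched with the textbook whitening-plus-QR approach (illustrative only; the paper's contribution is a faster block-recursive and parallel algorithm for this computation, and all data below are synthetic):

```python
import numpy as np

# Model: y = X @ beta + e, with cov(e) = Omega (symmetric positive definite).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
L = np.eye(n) + np.tril(0.1 * rng.standard_normal((n, n)), k=-1)
Omega = L @ L.T                          # variance-covariance matrix
y = X @ beta_true + L @ rng.standard_normal(n)   # correlated noise

# Whiten with the Cholesky factor Cf (Omega = Cf @ Cf.T), then solve the
# resulting ordinary least-squares problem by QR factorization.
Cf = np.linalg.cholesky(Omega)
Xw = np.linalg.solve(Cf, X)              # triangular system Cf @ Xw = X
yw = np.linalg.solve(Cf, y)
Q, Rq = np.linalg.qr(Xw)
beta_hat = np.linalg.solve(Rq, Q.T @ yw)
```

The paper avoids forming this problem monolithically, instead solving a recursion of progressively smaller generalized linear least squares problems.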
We present a parallel algorithm for finding the convex hull of a sorted set of points in the plane. Our algorithm runs in O(log n / log log n) time using O(n log log n / log n) processors in the Common CRCW PRAM computational model, which is shown to be time- and cost-optimal. The algorithm is based on n^(1/3) divide-and-conquer and uses a simple pointer-based data structure.
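The sequential baseline for this setting is simple: once the points are sorted by x-coordinate, Andrew's monotone chain computes the hull in linear time. A sketch (illustrative only; the paper's algorithm is an n^(1/3) divide-and-conquer parallelization, not this code):

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_of_sorted(pts):
    """Convex hull (counter-clockwise) of points pre-sorted by x, then y."""
    lower, upper = [], []
    for p in pts:                         # build lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):               # build upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]        # chain endpoints appear once each

# Interior points such as (3, 0.2) are discarded:
# hull_of_sorted([(0, 0), (1, 1), (2, -1), (3, 0.2), (4, 0)])
#   -> [(0, 0), (2, -1), (4, 0), (1, 1)]
```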
We present scalable algorithms to simulate large-scale stochastic particle systems amenable for modeling dense colloidal suspensions, glasses and gels. To handle the large number of particles and consequent many-body interactions present in such systems, we leverage an Accelerated Stokesian Dynamics (ASD) approach, for which we developed parallel algorithms for a distributed-memory architecture. We present the parallelization of the sparse near-field (including singular lubrication) interactions and of the matrix-free many-body far-field interactions, along with a strategy for communicating and mapping the distributed data structures between the near and far field. Scaling up to tens of thousands of processors for a million particles is demonstrated. In addition, we propose a novel algorithm to efficiently simulate correlated Brownian motion with hydrodynamic interactions. The original Accelerated Stokesian Dynamics approach requires the separate computation of far-field and near-field Brownian forces. Recent advancements propose computation of a far-field velocity using positive spectral Ewald decomposition. We present an alternative approach for calculating the far-field Brownian velocity by implementing the fluctuating force coupling method and embedding it using a nested scheme into ASD. This straightforward and flexible approach reduces the computational time of the Brownian far-field force construction from O((N log N)^(1+|α|)) to O(N log N). (C) 2021 Elsevier Inc. All rights reserved.
Metaheuristics, which provide high-level guidelines for heuristic optimisation, have successfully been applied to many complex problems over the past decades. However, their performance often varies depending on the choice of the initial settings for their parameters and operators, along with the characteristics of the given problem instance. Hence, there is growing interest in designing adaptive search methods that automate the selection of efficient operators and the setting of their parameters during the search process. In this study, an adaptive binary parallel evolutionary algorithm, referred to as ABPEA, is introduced for solving the uncapacitated facility location problem, which is proven to be an NP-hard optimisation problem. The approach uses one unary operator and two binary operators. A reinforcement learning mechanism assigns credits to operators based on their recent impact on generating improved solutions to the problem instance in hand. An operator is selected adaptively with a greedy policy for perturbing a solution. The performance of the proposed approach is evaluated on a set of well-known benchmark instances from ORLib and M*, and its scaling capacity is assessed by running it with different starting points on an increasing number of threads. Parameters are adjusted to derive the best configuration of three different rewarding schemes: instant, average and extreme. A performance comparison with other state-of-the-art algorithms illustrates the superiority of ABPEA. Moreover, ABPEA provides an acceleration of up to a factor of 3.9 when compared to the sequential algorithm based on a single operator.
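The core adaptive mechanism, credit assignment combined with a greedy operator-selection policy, can be sketched generically. The operators, rewarding rule, and toy objective below are illustrative stand-ins for the idea, not ABPEA's actual components:

```python
import random

def flip_one(s):                      # unary operator: flip one random bit
    s = s[:]; s[random.randrange(len(s))] ^= 1; return s

def flip_two(s):                      # flip two random bits
    return flip_one(flip_one(s))

def flip_block(s):                    # flip a short block of bits
    s = s[:]; i = random.randrange(len(s))
    for j in range(i, min(i + 3, len(s))):
        s[j] ^= 1
    return s

def cost(s):                          # toy objective: minimize the ones
    return sum(s)

random.seed(42)
operators = [flip_one, flip_two, flip_block]
credit = [1.0] * len(operators)       # one credit score per operator
sol = [1] * 20
for _ in range(500):
    if random.random() < 0.1:                     # occasional exploration
        k = random.randrange(len(operators))
    else:                                         # greedy on current credit
        k = max(range(len(operators)), key=lambda i: credit[i])
    cand = operators[k](sol)
    gain = cost(sol) - cost(cand)
    credit[k] = 0.5 * credit[k] + 0.5 * max(gain, 0.0)  # "instant"-style reward
    if gain >= 0:                                 # accept non-worsening moves
        sol = cand
```

Operators that recently produced improvements accumulate credit and are chosen more often; decayed credit lets the selection adapt as the search progresses.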
In an earlier paper, an approximate SVD updating scheme was derived as an interlacing of a QR updating on the one hand and a Jacobi-type SVD procedure on the other hand, possibly supplemented with a certain re-orthogonalization scheme. This paper maps this updating algorithm onto a systolic array with O(n^2) parallelism for O(n^2) complexity, resulting in O(n^0) throughput. Furthermore, it is shown how a square-root-free implementation is obtained by combining modified Givens rotations with approximate SVD schemes.
The theme of this paper is that the primary computational bottleneck in the solution of stiff ordinary differential equations (ODEs) and the parallel solution of nonstiff ODEs is the implicitness of the ODE rather than the approximation of the integration process (in conventional terminology, numerical stability rather than accuracy). It may therefore be fruitful to apply, at least conceptually, the iterative techniques needed to overcome implicitness in continuous time, before discretization, that is, to waveforms rather than to values at a point in time. Several classical iterations, based on splitting, are discussed, but the emphasis is on those not based on a partitioning of the ODE system. The shifted Picard iteration is proposed as a compromise between the cheap but slow Picard iteration and the fast but expensive Newton iteration. By varying the shift parameter from one iteration to the next, a good rate of convergence seems possible. As an alternative, the author also examines the more classical acceleration technique applied to the Picard iteration. Some experimental results are given. However, the practical aspects of discretization are beyond the scope of this paper.
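A minimal sketch of the Picard iteration on waveforms (the cheap-but-slow baseline the paper starts from), applied to the scalar example y' = y on a fixed grid with trapezoidal quadrature; the grid and sweep count are illustrative choices:

```python
import math

# Picard iteration on waveforms for y' = f(t, y), y(0) = y0:
#     y_{k+1}(t) = y0 + integral_0^t f(s, y_k(s)) ds
# Each sweep updates the entire waveform at once, which is what makes the
# iteration attractive for parallelism across time.
def picard_waveform(f, y0, ts, sweeps):
    ys = [y0] * len(ts)                      # initial guess: flat waveform
    for _ in range(sweeps):
        fs = [f(t, y) for t, y in zip(ts, ys)]
        new = [y0]
        for i in range(1, len(ts)):          # cumulative trapezoidal rule
            h = ts[i] - ts[i - 1]
            new.append(new[-1] + 0.5 * h * (fs[i - 1] + fs[i]))
        ys = new
    return ys

n = 200
ts = [i / n for i in range(n + 1)]           # time grid on [0, 1]
ys = picard_waveform(lambda t, y: y, 1.0, ts, sweeps=25)
# ys[-1] approximates e = 2.71828... (quadrature error of order h^2)
```

Each sweep is nonstiff and explicit; the slow convergence of exactly this iteration is what motivates the paper's shifted and accelerated variants.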