Search results - Inner Mongolia University Library

arXiv, 2019

Authors: Harris, David G. Affiliation: Department of Computer Science, University of Maryland, College Park, MD 20742, United States

The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection B of "bad" events which are mostly independent and have low probability. In its simplest "symmetric" form, it asserts that whenever a bad-event has probability p and affects at most d bad-events, and epd Copyright © 2019, The Authors. All rights reserved.

Keywords: parallel algorithms

Distributed memory parallel algorithms for minimum spanning trees

2013 World Congress on Engineering, WCE 2013

Authors: Lončar, Vladimir; Škrbić, Srdjan; Balaž, Antun. Affiliations: Department of Mathematics and Informatics, Faculty of Science, University of Novi Sad, Serbia; University of Belgrade, Institute of Physics, Serbia

ISBN: (print) 9789881925282

Finding a minimum spanning tree of a graph is a well-known problem in graph theory with many practical applications. We study serial variants of Prim's and Kruskal's algorithms and present their parallelization targeting message-passing parallel machines with distributed memory. We consider large graphs that cannot fit into the memory of one process. Experimental results show that Prim's algorithm is a good choice for dense graphs, while Kruskal's algorithm is better for sparse ones. The poor scalability of Prim's algorithm comes from its high communication cost, while Kruskal's algorithm showed much better scaling to a larger number of processes.

Keywords: parallel algorithms
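For reference, the serial Kruskal variant mentioned in the abstract above can be sketched as follows. This is a minimal single-process illustration built on a union-find structure, not the authors' distributed implementation; the function names `find` and `kruskal_mst` and the `(weight, u, v)` edge format are assumptions made for this example.

```python
def find(parent, x):
    """Return the representative of x's set, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def kruskal_mst(num_vertices, edges):
    """Serial Kruskal: edges is a list of (weight, u, v) with 0-based vertices.

    Returns (total_weight, chosen_edges) of a minimum spanning forest.
    """
    parent = list(range(num_vertices))
    total = 0
    chosen = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            total += weight
            chosen.append((u, v, weight))
    return total, chosen
```

Calling `kruskal_mst(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)])` on this hypothetical four-vertex graph keeps three edges. The global sort and the shared `parent` array are the parts a distributed version would have to partition across processes.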

Tensors in Modelling Multi-particle Interactions

12th International Conference on Large-Scale Scientific Computations (LSSC)

Authors: Stefonishin, Daniil A.; Matveev, Sergey A.; Zheltkov, Dmitry A. Affiliations: Skolkovo Inst Sci & Technol, Moscow, Russia; RAS, Marchuk Inst Numer Math, Moscow, Russia; Moscow Inst Phys & Technol, Moscow, Russia

ISBN: (print) 9783030410322; 9783030410315

In this work we present recent results on the application of low-rank tensor decompositions to the modelling of aggregation kinetics, taking into account multi-particle collisions (for three and more particles). Such kinetics can be described by a system of nonlinear differential equations whose right-hand side requires N^D operations for its straightforward evaluation, where N is the number of particle size classes and D is the number of particles colliding simultaneously. This complexity can be significantly reduced by applying low-rank tensor decompositions (either Tensor Train or Canonical Polyadic) to accelerate the evaluation of the sums and convolutions in the right-hand side. Based on this drastic reduction in the cost of evaluating the right-hand side, we further utilize a standard second-order Runge-Kutta time integration scheme and demonstrate that our approach yields numerical solutions of the studied equations with very high accuracy in modest times. We also show preliminary results on the parallel scalability of the novel approach and conclude that it can be used efficiently on supercomputers.

Keywords: Tensor train; Aggregation kinetics; parallel algorithms
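The complexity reduction described in the abstract can be illustrated on a single D-fold sum. Assume, for this sketch only, that the collision kernel has canonical polyadic form K[i1, ..., iD] = sum over r of U_1[i1][r] * ... * U_D[iD][r]. Then the sum of K[i1, ..., iD] * n[i1] * ... * n[iD] over all index tuples factorizes into a sum over r of products of D dot products, at a cost of roughly D * N * R operations instead of N^D. The function names and the list-of-lists layout are hypothetical. The paper's full right-hand side (with convolutions and the Tensor Train format) is not reproduced here.

```python
import itertools
import math


def cp_entry(factors, index):
    """Kernel entry K[index] for a CP kernel given as D factor matrices (N x R)."""
    rank = len(factors[0][0])
    return sum(
        math.prod(factors[d][index[d]][r] for d in range(len(factors)))
        for r in range(rank)
    )


def full_sum_naive(factors, n):
    """Direct evaluation over all N**D index tuples."""
    size, order = len(n), len(factors)
    total = 0.0
    for index in itertools.product(range(size), repeat=order):
        total += cp_entry(factors, index) * math.prod(n[i] for i in index)
    return total


def full_sum_low_rank(factors, n):
    """Same sum in O(D * N * R): contract each factor matrix with n first."""
    rank = len(factors[0][0])
    total = 0.0
    for r in range(rank):
        term = 1.0
        for factor in factors:
            term *= sum(factor[i][r] * n[i] for i in range(len(n)))
        total += term
    return total
```

Both functions return the same value up to rounding; only the second avoids looping over every index tuple.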

Massively Parallel Polar Decomposition on Distributed-memory Systems

ACM TRANSACTIONS ON PARALLEL COMPUTING, 2019, vol. 6, no. 1, pp. 1-15

Authors: Ltaief, Hatem; Sukkari, Dalal; Esposito, Aniello; Nakatsukasa, Yuji; Keyes, David. Affiliations: King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, 4700 King Abdullah Blvd, Jeddah 23955, Saudi Arabia; Cray EMEA Res Lab, Bristol, Avon, England; Univ Oxford, Math Inst, Oxford, England

We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions (introduced by Zolotarev (ZOLO) in 1877), this new PD algorithm, ZOLO-PD, converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate up to 17, in contrast to the cubic convergence rate for QDWH. This comes at the price of higher arithmetic costs and a larger memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers. We demonstrate that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3x, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.

Keywords: Polar decomposition; Zolotarev functions; parallel algorithms; strong scaling; distributed-memory systems
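The role of the scalar sign function in the abstract can be seen in a one-variable iteration. The sketch below runs the plain (unweighted) Halley iteration x <- x(3 + x^2)/(1 + 3x^2), which converges cubically to sign(x) for nonzero x. QDWH replaces the fixed coefficients with dynamically chosen weights and acts on matrices, and ZOLO-PD uses higher-degree Zolotarev functions; neither is reproduced here. The function name `halley_sign` and its parameters are assumptions for illustration.

```python
def halley_sign(x, tol=1e-15, max_iter=100):
    """Iterate x <- x * (3 + x^2) / (1 + 3 x^2) until the update stalls.

    Returns (limit, iterations). For nonzero x the limit approximates sign(x).
    """
    for iteration in range(1, max_iter + 1):
        x_next = x * (3.0 + x * x) / (1.0 + 3.0 * x * x)
        if abs(x_next - x) <= tol * abs(x_next):
            return x_next, iteration
        x = x_next
    return x, max_iter
```

Starting from a value close to zero, which loosely mimics an ill-conditioned input, the iterate roughly triples per step before the cubic phase sets in. This slow early phase is what dynamically weighted or higher-order schemes aim to shorten.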

A fast method based on GPU for solidification structure simulation of continuous casting billets

JOURNAL OF COMPUTATIONAL SCIENCE, 2021, vol. 48, pp. 101265-101265

Authors: Wang, Jing Jing; Meng, Hong Ji; Yang, Jian; Xie, Zhi. Affiliation: Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110819, Peoples R China

The present paper develops a fast method to simulate the solidification structure of continuous casting billets with a Cellular Automaton (CA) model. The traditional solution of the CA model on a single CPU takes a long time because of the massive datasets and complicated calculations, making it unrealistic to optimize the parameters through numerical simulation. In this paper, a parallel method based on Graphics Processing Units (GPU) is proposed to accelerate the calculation, with new algorithms for solute redistribution and neighbor capture that avoid data races in parallel computing. This new method was applied to simulate the solidification structure of Fe0.64C alloy, and the simulation results were in good agreement with the experimental results obtained with the same parameters. The absolute computational time for the fast method implemented on a Tesla P100 GPU is 277 s, while the traditional method implemented on a single core of an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz takes 24.57 h. The speedup, the ratio of the absolute computational time of CPU-CA to that of GPU-CA, varies from 300 to 400 as the number of grid cells increases.

Keywords: Fast method; Graphics processing units; Cellular automaton; Solidification structure; parallel algorithms
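As a quick consistency check on the figures quoted in the abstract, the two reported times can be converted to a common unit and divided. This is only arithmetic on the stated numbers, under the assumption that both timings refer to the same problem size:

```latex
\[
\text{speedup} = \frac{t_{\text{CPU}}}{t_{\text{GPU}}}
= \frac{24.57\,\text{h} \times 3600\,\text{s/h}}{277\,\text{s}}
= \frac{88452\,\text{s}}{277\,\text{s}} \approx 319
\]
```

This value lies inside the reported range of 300 to 400.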

New optimized Schwarz algorithms for one dimensional Schrodinger equation with general potential

JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2021, vol. 383, pp. 113018-113018

Authors: Xing, F. Affiliations: Univ Nice Sophia Antipolis, Lab Math JA Dieudonne, UMR 7351 CNRS, Parc Valrose, F-06108 Nice 02, France; INRIA Sophia Antipolis Mediterranee, Team COFFEE, Parc Valrose, F-06108 Nice 02, France

The aim of this paper is to develop new optimized Schwarz algorithms for the one-dimensional Schrodinger equation with linear and nonlinear potentials. The classical algorithm is an iterative process. In the case of a time-independent linear potential, we construct the interface problem explicitly and apply a direct LU method to it. The algorithm therefore becomes a direct process. Thus, the algorithm is independent of the transmission condition and the computational cost is lower. To our knowledge, this is the first time that the Schwarz algorithm has been constructed as a direct process. For the case of a time-dependent linear potential or a nonlinear potential, we propose using a pre-processed linear operator as a preconditioner, which leads to a preconditioned algorithm. Numerically, the convergence is also independent of the transmission condition. In addition, both of these new algorithms, implemented on a parallel cluster, are robust, scalable up to 256 subdomains (MPI processes) and take much less computation time than the classical one, especially for the nonlinear case. (C) 2020 Elsevier B.V. All rights reserved.

Keywords: Schrodinger equation; Optimized Schwarz method; parallel algorithms

Subgradient averaging for multi-agent optimisation with different constraint sets

AUTOMATICA, 2021, vol. 131, pp. 109738-109738

Authors: Romao, Licio; Margellos, Kostas; Notarstefano, Giuseppe; Papachristodoulou, Antonis. Affiliations: Univ Oxford, Dept Engn Sci, Parks Rd, Oxford OX1 3PJ, England; Alma Mater Studiorum Univ Bologna, Dept Elect Elect & Informat Engn G Marconi, Bologna, Italy

We consider a multi-agent setting with agents exchanging information over a possibly time-varying network, aiming at minimising a separable objective function subject to constraints. To achieve this objective we propose a novel subgradient averaging algorithm that allows for non-differentiable objective functions and different constraint sets per agent. Allowing different constraints per agent simultaneously with a time-varying communication network constitutes a distinctive feature of our approach, extending existing results on distributed subgradient methods. To highlight the necessity of dealing with a different constraint set within a distributed optimisation context, we analyse a problem instance where an existing algorithm does not exhibit a convergent behaviour if adapted to account for different constraint sets. For our proposed iterative scheme we show asymptotic convergence of the iterates to a minimum of the underlying optimisation problem for step sizes of the form $\eta/(k+1)$, $\eta > 0$. We also analyse this scheme under a step size choice of $\eta/\sqrt{k+1}$, $\eta > 0$, and establish a convergence rate of $O(\ln k/\sqrt{k})$ in objective value. To demonstrate the efficacy of the proposed method, we investigate a robust regression problem and an $\ell_2$ regression problem with regularisation. (C) 2021 Elsevier Ltd. All rights reserved.

Keywords: Distributed optimisation; Multi-agent networks; parallel algorithms; Subgradient methods; Consensus
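The diminishing step-size rule from the abstract can be illustrated with a textbook averaged, projected subgradient iteration. This is a minimal sketch and not the authors' algorithm: all agents here share one interval constraint and average with equal weights over a complete graph, whereas the paper treats different constraint sets and time-varying networks. The local objectives |x - a_i|, the function names and the parameters are assumptions chosen for illustration.

```python
def sign(value):
    """A valid subgradient of |value| (0 is chosen at the kink)."""
    return (value > 0) - (value < 0)


def averaged_subgradient(targets, lower, upper, eta=1.0, iterations=200):
    """Agent i holds f_i(x) = |x - targets[i]|; the sum is minimised on [lower, upper].

    Every round: average all iterates, take a subgradient step of size
    eta / (k + 1) at the average, then project back onto the interval.
    """
    x = [lower] * len(targets)
    for k in range(iterations):
        mean = sum(x) / len(x)
        step = eta / (k + 1)
        x = [
            min(upper, max(lower, mean - step * sign(mean - a)))
            for a in targets
        ]
    return x
```

With targets 3 and 5 and the interval [0, 2], the constrained minimiser is the boundary point 2, and both agents settle there after a few rounds. With an interior minimiser the iterates keep oscillating around it with an amplitude that shrinks with the step size.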

Convergence and Accuracy of the Method of Iterative Approximate Factorization of Operators in Multidimensional High-Accuracy Bicompact Schemes

Mathematical Models and Computer Simulations, 2020, vol. 12, no. 5, pp. 660-675

Authors: Rogov, B.V.; Chikitkin, A.V. Affiliations: Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow 125047, Russian Federation; Moscow Institute of Physics and Technology (National Research University), Dolgoprudnyi 141700, Moscow oblast, Russian Federation

Abstract: The convergence and accuracy of a method for solving high-order accurate bicompact schemes having the fourth order of approximation in spatial variables on a minimum stencil for a multidimensional inhomogeneous advection equation are investigated. The method is based on the approximate factorization of difference operators of multidimensional bicompact schemes. In addition, it uses iterations to preserve a high (higher than the second) order of accuracy of bicompact schemes in time. The convergence of these iterations for both two- and three-dimensional bicompact schemes as applied to the linear inhomogeneous advection equation with positive constant coefficients is proved using the spectral method. The efficiency of two parallel algorithms for solving equations of multidimensional bicompact schemes is compared. One of them is the spatial marching algorithm for calculating unfactorized schemes, and the other is based on iterative approximate factorization of difference operators of the schemes. © 2020, Pleiades Publishing, Ltd.

Keywords: bicompact schemes; iterative approximate factorization method; multidimensional inhomogeneous advection equation; parallel algorithms

Expediting Parallel Graph Connectivity Algorithms

25th IEEE International Conference on High Performance Computing, Data and Analytics (HiPC)

Authors: Wadwekar, Mihir; Kothapalli, Kishore. Affiliation: Int Inst Informat Technol, Hyderabad 500032, India

ISBN: (print) 9781538683866

Determining whether a graph is k-connected and identifying its k-connected components is a fundamental problem in graph theory. For this reason, there have been several algorithms for this problem in both the sequential and parallel settings. Several recent sequential and parallel algorithms for k-connectivity rely on one or more breadth-first traversals of the input graph. While BFS can be made very efficient in a sequential setting, the same cannot be said of parallel environments. A major factor in this difficulty is the inherent requirement to use a shared queue, to balance work among multiple threads in every round, to synchronize, and the like. Optimizing the execution of BFS on many current parallel architectures is therefore quite challenging. As a result, the time spent by current parallel graph connectivity algorithms on BFS operations is usually a significant portion of their overall runtime. In this paper, we study how one can, in the context of algorithms for graph connectivity, mitigate the practical inefficiency of relying on BFS operations in parallel. Our technique suggests that such algorithms may not require a BFS of the input graph but can instead work with a sparse spanning subgraph of the input graph. The incorrectness introduced by not using a BFS spanning tree can then be offset by further post-processing steps on suitably defined small auxiliary graphs. Our experiments on finding the 2- and 3-connectivity of graphs on Nvidia K40c GPUs improve the state of the art on the corresponding problems by factors of 2.2x and 2.1x, respectively.

Keywords: Testing; parallel algorithms; Phase change random access memory; Graphics processing units; Data structures; Synchronization

A parallel algorithm to generate connected network motifs

IAENG International Journal of Computer Science, 2019, vol. 46, no. 4, pp. 1-6

Authors: Zaenudin, Efendi; Wijaya, Ezra Bernadus; Dessie, Eskezeia Yihunie; Reddy, Mekala Venugopala; Tsai, Jeffrey J.P.; Huang, Chien-Hung; Ng, Ka-Lok. Affiliations: Department of Bioinformatics and Medical Engineering, Asia University; Research Center for Informatics, Indonesian Institute of Sciences, Bandung, Indonesia; Department of Computer Science and Information Engineering, National Formosa University, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University and Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan

Networks of interactions among bio-molecules are fundamental to biological processes. Many works have shown that molecular networks can be analyzed by decomposing them into smaller modules named network motifs. We hypothesize that identifying the set of possible 5-node motifs embedded in a network is a necessary step to elucidate the complex topology of a network. To achieve this goal, it is necessary to determine the complete set of motifs that are composed of five connected nodes. We developed an algorithm to remove motifs composed of disconnected components and implemented a parallelized algorithm to reduce the computation time. Our experiment demonstrated that the proposed parallel algorithm is approximately 1.3 times faster than the serial program for identifying 5-node motifs with all the nodes connected. © 2019 International Association of Engineers.

Keywords: parallel algorithms
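The filtering step described in the abstract, discarding candidate motifs that fall into disconnected components, can be sketched as enumerate-then-test. The example below is a minimal serial illustration on undirected labelled graphs, assuming each candidate is represented as an edge list. The authors' algorithm, its treatment of directed edges or isomorphism classes, and its parallelization are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations


def is_connected(num_nodes, edges):
    """Check connectivity with a simple graph traversal from node 0."""
    adjacency = {node: set() for node in range(num_nodes)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = {0}
    frontier = [0]
    while frontier:
        node = frontier.pop()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            frontier.append(neighbour)
    return len(seen) == num_nodes


def connected_graphs(num_nodes):
    """Yield the edge list of every connected labelled graph on num_nodes nodes."""
    possible_edges = list(combinations(range(num_nodes), 2))
    for mask in range(1 << len(possible_edges)):  # one bit per possible edge
        edges = [e for bit, e in enumerate(possible_edges) if mask >> bit & 1]
        if is_connected(num_nodes, edges):
            yield edges
```

For five nodes this enumerates 1,024 labelled graphs and keeps the 728 connected ones. Because each candidate is tested independently, the loop over `mask` is the natural place to split work across processes.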
