Synchronization, the problem of bringing an automaton to a particular state regardless of its initial state, is an important problem for automata. It has several practical applications and is related to a fifty-year-old conjecture on the length of the shortest synchronizing word. Although shorter words are more effective in practice, finding a shortest one (which is not necessarily unique) is NP-hard. For this reason, various heuristics exist in the literature. However, high-quality heuristics such as SYNCHROP, which produce relatively short sequences, are very expensive and can take hours when the automaton has tens of thousands of states. The SYNCHROP heuristic has frequently been used as a benchmark to evaluate the performance of new heuristics. In this work, we first improve the runtime of SYNCHROP and its variants using algorithmic techniques. We then focus on adapting SYNCHROP to many-core architectures and, overall, obtain more than a 1000x speedup on GPUs compared to the naive sequential implementation commonly used as a baseline in the literature. We also propose two SYNCHROP variants and evaluate their performance.
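The abstract leaves the heuristics themselves abstract. As a point of reference, here is a minimal greedy synchronizing-word heuristic in the pair-merging style such methods share; it is not SYNCHROP (which scores pairs with a cost function rather than picking them naively), and the representation and function names are ours.

```python
from collections import deque

def apply_word(delta, s, w):
    """Run the automaton from state s on word w; delta[state][letter] -> state."""
    for a in w:
        s = delta[s][a]
    return s

def merge_word(delta, p, q):
    """BFS over state pairs: shortest word w with delta*(p, w) == delta*(q, w)."""
    start = frozenset((p, q))
    seen, queue = {start}, deque([(start, [])])
    while queue:
        pair, word = queue.popleft()
        if len(pair) == 1:
            return word
        for a in range(len(delta[0])):
            nxt = frozenset(delta[s][a] for s in pair)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + [a]))
    return None  # p and q cannot be merged: the automaton is not synchronizing

def greedy_sync_word(delta):
    """Greedy heuristic: repeatedly merge a pair of still-active states."""
    active, word = set(range(len(delta))), []
    while len(active) > 1:
        p, q = sorted(active)[:2]   # naive pair choice; SYNCHROP scores pairs instead
        w = merge_word(delta, p, q)
        if w is None:
            return None
        word += w
        active = {apply_word(delta, s, w) for s in active}
    return word

# Cerny automaton with 4 states: letter 0 is a cyclic shift, letter 1 sends 0 to 1.
cerny4 = [[1, 1], [2, 1], [3, 2], [0, 3]]
print(greedy_sync_word(cerny4))     # a synchronizing word (not necessarily shortest)
```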
With the emergence of Online Social Networks (OSNs) as an effective medium of information dissemination, their abuse in spreading misinformation has become a great concern to users. Hence, the misinformation containment problem, in its various forms, has emerged as an important research topic. In general, given a snapshot of an online social network with a set of misinformed nodes and a budget limiting the maximum number of seed nodes, the goal is to determine a set of seed nodes with the correct information that contains the misinformation as early as possible. In this paper, we leverage the community structure of the online social network to select the seed nodes statically, independent of the distribution of misinformed nodes, for faster misinformation containment with a simple one-time computation. We extend the work to OSNs with overlapping communities as well. To the best of our knowledge, ours is the first work in which the topology of the OSN is exploited to combat the spread of misinformation faster. Experiments on real OSNs reveal that the proposed techniques significantly outperform state-of-the-art algorithms in terms of maximum and average infected time and the point of decline, manifesting the key role of community structure in misinformation containment in a social network. Moreover, parallel implementations of the proposed algorithms achieve around a 10x speed-up over the sequential ones, enhancing the scalability of the proposed approach.
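As an illustration of the community-based static selection idea (the paper's exact selection rule is not given in this abstract), the sketch below picks one high-degree node per detected community up to the budget, using networkx; the specific rule and all names are our assumptions.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def community_seeds(G, budget):
    """Statically pick seed nodes from the community structure: one high-degree
    node per community, largest communities first (illustrative rule only)."""
    communities = greedy_modularity_communities(G)
    seeds = []
    for comm in sorted(communities, key=len, reverse=True):
        if len(seeds) >= budget:
            break
        # use the highest-degree node of the community as its local "corrector"
        seeds.append(max(comm, key=G.degree))
    return seeds

G = nx.karate_club_graph()
print(community_seeds(G, budget=3))   # one seed in each of the 3 largest communities
```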
The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations that avoid a collection B of "bad" events that are mostly independent and have ...
ISBN (Print): 9789881925282
Finding a minimum spanning tree of a graph is a well-known problem in graph theory with many practical applications. We study serial variants of Prim's and Kruskal's algorithms and present their parallelization targeting message-passing parallel machines with distributed memory. We consider large graphs that cannot fit into the memory of one process. Experimental results show that Prim's algorithm is a good choice for dense graphs, while Kruskal's algorithm is better for sparse ones. The poor scalability of Prim's algorithm comes from its high communication cost, while Kruskal's algorithm showed much better scaling to larger numbers of processes.
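For concreteness, a serial Prim's algorithm with a binary heap might look like the following sketch; the adjacency-list representation and names are ours, and the paper's serial variants may differ.

```python
import heapq

def prim(adj):
    """Minimum spanning tree via Prim's algorithm with a binary heap.
    adj[u] is a list of (weight, v) pairs; vertices are 0..n-1."""
    n = len(adj)
    in_tree = [False] * n
    mst, total = [], 0.0
    heap = [(0.0, 0, -1)]                  # (edge weight, vertex, parent)
    while heap:
        w, u, p = heapq.heappop(heap)
        if in_tree[u]:
            continue                       # stale entry: u was already reached
        in_tree[u] = True
        if p >= 0:
            mst.append((p, u, w))
            total += w
        for wv, v in adj[u]:
            if not in_tree[v]:
                heapq.heappush(heap, (wv, v, u))
    return mst, total

# Small 4-vertex example; the MST has weight 1 + 2 + 3 = 6.
adj = [[(1, 1), (4, 2)], [(1, 0), (2, 2), (5, 3)],
       [(4, 0), (2, 1), (3, 3)], [(5, 1), (3, 2)]]
print(prim(adj))
```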
ISBN (Print): 9783030410322; 9783030410315
In this work we present recent results on the application of low-rank tensor decompositions to the modelling of aggregation kinetics that takes multi-particle collisions (three and more particles) into account. Such kinetics can be described by a system of nonlinear differential equations whose right-hand side requires N^D operations for its straightforward evaluation, where N is the number of particle size classes and D is the number of particles colliding simultaneously. This complexity can be significantly reduced by applying low-rank tensor decompositions (either Tensor Train or Canonical Polyadic) to accelerate the evaluation of the sums and convolutions in the right-hand side. Based on this drastic reduction in the cost of evaluating the right-hand side, we use a standard second-order Runge-Kutta time-integration scheme and demonstrate that our approach yields numerical solutions of the studied equations with very high accuracy in modest time. We also show preliminary results on the parallel scalability of the novel approach and conclude that it can be used efficiently on supercomputers.
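The cost reduction can be illustrated in the simplest pairwise (D = 2) setting: for a separable, low-rank kernel, each rank-one term of the Smoluchowski right-hand side is a discrete convolution computable in O(N log N) by FFT. The toy sketch below is our own illustration, not the paper's Tensor Train / Canonical Polyadic machinery.

```python
import numpy as np

def smoluchowski_rhs(n, a, b):
    """RHS of the pairwise (D = 2) Smoluchowski equations for a separable kernel
    K[i, j] = sum_r a[r, i] * b[r, j]:

        dn_k/dt = 1/2 * sum_{i+j=k} K[i, j] n_i n_j  -  n_k * sum_j K[k, j] n_j

    Arrays are indexed by cluster size (entry 0 is unused and kept at zero).
    Each rank-one term of the gain sum is a discrete convolution, costing
    O(N log N) via FFT instead of O(N^2) for direct evaluation."""
    N = n.size
    gain, loss = np.zeros(N), np.zeros(N)
    for ar, br in zip(a, b):
        u, v = ar * n, br * n
        conv = np.fft.irfft(np.fft.rfft(u, 2 * N) * np.fft.rfft(v, 2 * N), 2 * N)
        gain += conv[:N]                  # (u * v)[k] = sum_{i+j=k} u_i v_j
        loss += ar * n * np.sum(br * n)   # n_k * sum_r a_r(k) * <b_r, n>
    return 0.5 * gain - loss

# The constant kernel K = 1 is rank one: a = b = all-ones.
N = 1024
n = np.zeros(N); n[1] = 1.0               # monodisperse initial condition
ones = np.ones((1, N))
print(smoluchowski_rhs(n, ones, ones)[:4])   # approx [0, -1, 0.5, 0]
```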
We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate QDWH convergence. Based on the Zolotarev rational functions, introduced by Zolotarev (ZOLO) in 1877, this new PD algorithm, ZOLO-PD, converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed by QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence order of up to seventeen, in contrast to the cubic convergence of QDWH. This comes at the price of higher arithmetic cost and memory footprint; these extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers, and show that, in the presence of a large number of processing units, ZOLO-PD outperforms QDWH by up to 2.3x, especially in situations where QDWH runs out of work, for instance in the strong-scaling mode of operation.
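To make the object concrete: the polar factor can already be computed by the classical Newton iteration X_{k+1} = (X_k + X_k^{-T})/2, which QDWH and ZOLO-PD improve upon with inverse-free, dynamically weighted rational iterations that need far fewer steps. Below is a minimal sketch of the classical iteration, not the paper's algorithm.

```python
import numpy as np

def polar_newton(A, tol=1e-12, max_iter=50):
    """Polar factor U of A = U H via the classical Newton iteration
        X_{k+1} = (X_k + X_k^{-T}) / 2,
    which converges quadratically for nonsingular real A."""
    X = A.copy()
    for _ in range(max_iter):
        Xn = 0.5 * (X + np.linalg.inv(X).T)
        done = np.linalg.norm(Xn - X, 'fro') <= tol * np.linalg.norm(Xn, 'fro')
        X = Xn
        if done:
            break
    H = X.T @ A                    # symmetric positive semidefinite factor
    return X, 0.5 * (H + H.T)      # symmetrize H against round-off

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
U, H = polar_newton(A)
print(np.linalg.norm(U.T @ U - np.eye(5)))   # near machine precision: U is orthogonal
```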
The present paper develops a fast method to simulate the solidification structure of continuous billets with a Cellular Automaton (CA) model. A traditional solution of the CA model on a single CPU takes a long time because of the massive datasets and complicated calculations, making it unrealistic to optimize the parameters through numerical simulation. In this paper, a parallel method based on Graphics Processing Units (GPUs) is proposed to accelerate the calculation, with new algorithms for solute redistribution and neighbor capture that avoid data races in parallel computing. The new method was applied to simulate the solidification structure of an Fe-0.64C alloy, and the simulation results were in good agreement with experimental results under the same parameters. The absolute computational time for the fast method implemented on a Tesla P100 GPU is 277 s, while the traditional method implemented on a single core of an Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40 GHz takes 24.57 h. The speedup, the ratio of the CPU-CA computational time to that of GPU-CA, varies from 300 to 400 as the number of grid cells increases.
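A standard way to avoid the data races mentioned above is double buffering: every cell reads only the old grid and writes only a new one, so all cell updates are independent and can run in parallel. The toy capture rule below is ours, not the paper's solute-redistribution or neighbor-capture algorithm.

```python
import numpy as np

def ca_step(state):
    """One synchronous cellular-automaton step with double buffering: reads come
    from the old grid and writes go to a separate new grid, so the per-cell
    updates are race-free and trivially parallelizable."""
    pad = np.pad(state, 1)
    # count solid von Neumann neighbors of every cell (toy capture rule)
    neighbors = (pad[:-2, 1:-1] + pad[2:, 1:-1] +
                 pad[1:-1, :-2] + pad[1:-1, 2:])
    new_state = state.copy()                     # write target: separate buffer
    new_state[(state == 0) & (neighbors >= 1)] = 1   # empty cell gets "captured"
    return new_state

grid = np.zeros((8, 8), dtype=np.int8)
grid[3:5, 3:5] = 1                   # small solid nucleus of 4 cells
grid = ca_step(grid)
print(grid.sum())                    # prints 12: the 8 boundary cells were captured
```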
The aim of this paper is to develop new optimized Schwarz algorithms for the one-dimensional Schrödinger equation with linear and nonlinear potentials. The classical algorithm is an iterative process. In the case of a time-independent linear potential, we construct the interface problem explicitly and apply a direct LU method to it. The algorithm therefore becomes a direct process: it is independent of the transmission condition, and the numerical computation is cheaper. To our knowledge, this is the first time the Schwarz algorithm has been constructed as a direct process. For the case of a time-dependent linear potential or a nonlinear potential, we propose using a pre-processed linear operator as a preconditioner, which leads to a preconditioned algorithm whose convergence is, numerically, also independent of the transmission condition. In addition, both new algorithms implemented on a parallel cluster are robust, scale up to 256 subdomains (MPI processes), and take much less computation time than the classical one, especially in the nonlinear case.
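As background, the classical iterative Schwarz process that the paper turns into a direct one can be sketched on a simpler model problem. The following alternates direct subdomain solves of a 1D Poisson problem, exchanging Dirichlet data across an overlap; the paper's setting (the Schrödinger equation with optimized transmission conditions) is more involved, and all details below are our illustrative choices.

```python
import numpy as np

def solve_dirichlet(f, left, right, h):
    """Direct solve of -u'' = f on one subdomain with Dirichlet boundary data."""
    m = f.size
    A = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    rhs = f.copy()
    rhs[0] += left / h**2        # fold known boundary values into the RHS
    rhs[-1] += right / h**2
    return np.linalg.solve(A, rhs)

# -u'' = 1 on (0, 1), u(0) = u(1) = 0; exact solution u(x) = x(1-x)/2.
N = 99; h = 1.0 / (N + 1)
x = np.linspace(h, 1 - h, N)
f = np.ones(N)
mid, ov = N // 2, 5              # interface index and overlap half-width
u = np.zeros(N)
for _ in range(30):              # classical alternating Schwarz sweeps
    # left subdomain: points 0 .. mid+ov-1, right boundary value taken from u
    u[:mid + ov] = solve_dirichlet(f[:mid + ov], 0.0, u[mid + ov], h)
    # right subdomain: points mid-ov .. N-1, left boundary value taken from u
    u[mid - ov:] = solve_dirichlet(f[mid - ov:], u[mid - ov - 1], 0.0, h)
print(np.abs(u - x * (1 - x) / 2).max())   # remaining error after 30 sweeps
```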
We consider a multi-agent setting with agents exchanging information over a possibly time-varying network, aiming at minimising a separable objective function subject to constraints. To achieve this objective we propose a novel subgradient averaging algorithm that allows for non-differentiable objective functions and different constraint sets per agent. Allowing different constraints per agent simultaneously with a time-varying communication network constitutes a distinctive feature of our approach, extending existing results on distributed subgradient methods. To highlight the necessity of dealing with different constraint sets within a distributed optimisation context, we analyse a problem instance where an existing algorithm does not exhibit convergent behaviour if adapted to account for different constraint sets. For our proposed iterative scheme we show asymptotic convergence of the iterates to a minimum of the underlying optimisation problem for step sizes of the form η/(k+1), η > 0. We also analyse this scheme under a step-size choice of η/√(k+1), η > 0, and establish a convergence rate of O(ln k/√k) in objective value. To demonstrate the efficacy of the proposed method, we investigate a robust regression problem and an ℓ2 regression problem with regularisation.
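A minimal instance of subgradient averaging with per-agent constraint sets and the η/(k+1) step size might look as follows. The ring network, the objective f_i(x) = |a_i x - b_i|, and the box constraints are illustrative choices of ours, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
m, T, eta = 6, 5000, 1.0
a, b = rng.standard_normal(m), rng.standard_normal(m)
boxes = [(-1.0 + 0.1 * i, 1.0 + 0.1 * i) for i in range(m)]   # per-agent sets

x = np.zeros(m)                      # one scalar iterate per agent
W = np.zeros((m, m))                 # doubly stochastic ring weights
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = W[i, (i + 1) % m] = 0.25

for k in range(T):
    v = W @ x                        # average with neighbors
    g = a * np.sign(a * v - b)       # subgradient of f_i(x) = |a_i x - b_i|
    x = v - eta / (k + 1) * g        # diminishing step size eta/(k+1)
    x = np.clip(x, [lo for lo, _ in boxes], [hi for _, hi in boxes])  # project
print(x)                             # agents approach agreement on a minimiser
```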
ISBN (Print): 9781538683866
Finding whether a graph is k-connected, and identifying its k-connected components, is a fundamental problem in graph theory. For this reason, there have been several algorithms for this problem in both the sequential and parallel settings. Several recent sequential and parallel algorithms for k-connectivity rely on one or more breadth-first traversals of the input graph. While BFS can be made very efficient in a sequential setting, the same cannot be said of parallel environments: much of the difficulty stems from the inherent requirements of a shared queue, per-round work balancing among threads, synchronization, and the like. Optimizing the execution of BFS on many current parallel architectures is therefore quite challenging, and the time spent by current parallel graph-connectivity algorithms on BFS operations is usually a significant portion of their overall runtime. In this paper, we study how one can, in the context of algorithms for graph connectivity, mitigate the practical inefficiency of relying on BFS operations in parallel. Our technique shows that such algorithms may not require a BFS of the input graph but can instead work with a sparse spanning subgraph of it. The incorrectness introduced by not using a BFS spanning tree can then be offset by post-processing steps on suitably defined small auxiliary graphs. Our experiments on finding the 2- and 3-connectivity of graphs on Nvidia K40c GPUs improve the state of the art on the corresponding problems by factors of 2.2x and 2.1x, respectively.
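The "sparse spanning subgraph instead of BFS" idea can be illustrated with a union-find spanning forest, which scans edges in arbitrary order and needs no level-synchronous traversal. This is a sequential sketch of our own; the paper's GPU construction and auxiliary-graph post-processing are not shown.

```python
def spanning_forest(n, edges):
    """Build a sparse spanning subgraph (a spanning forest) with union-find.
    Edges can be processed in any order, which is far friendlier to parallel
    hardware than the round-by-round structure of BFS."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                # u and v are in different components
            parent[ru] = rv
            forest.append((u, v))
    return forest                   # at most n - 1 edges, same components as input

print(spanning_forest(5, [(0, 1), (1, 2), (0, 2), (3, 4)]))
# [(0, 1), (1, 2), (3, 4)] - a forest spanning the input's two components
```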