SPPARKS is an open-source parallel simulation code for developing and running various kinds of on-lattice Monte Carlo models at the atomic or meso scales. It can be used to study the properties of solid-state materials as well as model their dynamic evolution during processing. The modular nature of the code allows new models and diagnostic computations to be added without modification to its core functionality, including its parallel algorithms. A variety of models for microstructural evolution (grain growth), solid-state diffusion, thin film deposition, and additive manufacturing (AM) processes are included in the code. SPPARKS can also be used to implement grid-based algorithms such as phase field or cellular automata models, to run either in tandem with a Monte Carlo method or independently. For very large systems such as AM applications, the Stitch I/O library is included, which enables only a small portion of a huge system to be resident in memory. In this paper we describe SPPARKS and its parallel algorithms and performance, explain how new Monte Carlo models can be added, and highlight a variety of applications which have been developed within the code.
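For readers unfamiliar with the on-lattice Monte Carlo models of grain growth mentioned above, a minimal serial sketch of a Potts-model update sweep is shown below. This is illustrative Python, not SPPARKS code; the zero-temperature accept rule and all names are assumptions made for the sketch:

```python
import random

def potts_grain_growth_step(lattice, n, rng=random):
    """One zero-temperature Metropolis sweep of an n x n Potts lattice.

    Each visited site tries to adopt a random neighbor's spin; the move
    is accepted only if the number of unlike neighbors (the local
    boundary energy) does not increase, which coarsens the grains.
    Periodic boundaries via modular indexing.
    """
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        neighbors = [lattice[(i - 1) % n][j], lattice[(i + 1) % n][j],
                     lattice[i][(j - 1) % n], lattice[i][(j + 1) % n]]
        candidate = rng.choice(neighbors)
        unlike = lambda spin: sum(nb != spin for nb in neighbors)
        if unlike(candidate) <= unlike(lattice[i][j]):
            lattice[i][j] = candidate
    return lattice
```

SPPARKS itself distributes such sweeps across spatial subdomains in parallel; this sketch only shows the per-site update rule.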
The successive over-relaxation (SOR) method is a common iterative algorithm for solving systems of linear equations. When the coefficient matrix is symmetric positive definite, it converges quickly. However,...
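The abstract above is truncated, but the SOR iteration it refers to is standard. A minimal Python sketch follows; the function name, dense list-of-lists interface, and default relaxation factor ω = 1.5 are illustrative choices, not taken from the paper:

```python
def sor_solve(A, b, omega=1.5, tol=1e-10, max_iter=10_000):
    """Successive over-relaxation for Ax = b.

    Converges for symmetric positive definite A when 0 < omega < 2.
    Each component update blends the previous iterate with the
    Gauss-Seidel update, weighted by omega.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_delta = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new = (1 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
            max_delta = max(max_delta, abs(x_new - x[i]))
            x[i] = x_new
        if max_delta < tol:
            break
    return x
```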
ISBN (print): 9781450399135
Computing routing schemes that support both high throughput and low latency is one of the core challenges of network optimization. Such routes can be formalized as h-length flows, which are defined as flows whose flow paths have length at most h. Many well-studied algorithmic primitives, such as maximal and maximum length-constrained disjoint paths, are special cases of h-length flows. Likewise, the optimal h-length flow is a fundamental quantity in network optimization, characterizing, up to poly-log factors, how quickly a network can accomplish numerous distributed primitives. In this work, we give the first efficient algorithms for computing (1 − ε)-approximate h-length flows that are nearly "as integral as possible." We give deterministic algorithms that take Õ(poly(h, 1/ε)) parallel time and Õ(poly(h, 1/ε) · 2^O(√log n)) distributed CONGEST time. We also give a CONGEST algorithm that succeeds with high probability and only takes Õ(poly(h, 1/ε)) time. Using our h-length flow algorithms, we give the first efficient deterministic CONGEST algorithms for the maximal disjoint paths problem with length constraints, settling an open question of Chang and Saranurak (FOCS 2020), as well as essentially-optimal parallel and distributed approximation algorithms for maximum length-constrained disjoint paths. The former greatly simplifies deterministic CONGEST algorithms for computing expander decompositions. We also use our techniques to give the first efficient and deterministic (1 − ε)-approximation algorithms for bipartite b-matching in CONGEST. Lastly, using our flow algorithms, we give the first algorithms to efficiently compute h-length cutmatches, an object at the heart of recent advances in length-constrained expander decompositions.
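The paper's algorithms are parallel and distributed; purely as a sequential baseline, the maximal length-constrained disjoint paths primitive it mentions can be sketched with a depth-bounded BFS that repeatedly extracts an s-t path of at most h edges and removes its edges. All names below are hypothetical, and this is not the authors' method:

```python
from collections import deque

def maximal_h_length_disjoint_paths(adj, s, t, h):
    """Greedily collect edge-disjoint s-t paths with at most h edges.

    `adj` maps each vertex to a set of out-neighbors (directed graph).
    The input dict is copied, so the caller's graph is not mutated.
    """
    adj = {u: set(vs) for u, vs in adj.items()}
    paths = []
    while True:
        # BFS from s, never expanding beyond depth h
        parent, depth = {s: None}, {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            if u == t:
                break
            if depth[u] == h:
                continue
            for v in adj.get(u, ()):
                if v not in parent:
                    parent[v], depth[v] = u, depth[u] + 1
                    q.append(v)
        if t not in parent:
            return paths  # no s-t path of length <= h remains: maximal
        # reconstruct the path and delete its edges
        path, v = [t], t
        while parent[v] is not None:
            u = parent[v]
            adj[u].discard(v)
            path.append(u)
            v = u
        paths.append(path[::-1])
```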
ISBN (print): 9781665477062
In today's world, Online Social Networks (OSNs) play a crucial role in our everyday life, but their abuse to disseminate misinformation has become a major concern. Hence, the misinformation containment (MC) problem has attracted a lot of attention in recent times. For a given OSN with a fixed budget, this paper proposes a trust-based static technique, independent of the distribution of misinformed nodes, that leverages the topology of the network to select a set of trusted seed nodes and thereby contain and decimate misinformation faster. We follow a modified form of the Competitive Linear Threshold Model with One-Direction state Transition (LT1DT) to study the propagation dynamics of both correct information and misinformation. Simulation studies on three real-world OSNs show that the proposed method significantly outperforms earlier work [1] in terms of the maximum number of misinformed nodes, infection time, point of inflection, and number of misinformed nodes in steady state. Moreover, its parallel implementation achieves almost 32x speedup, making the procedure scalable enough to contain and decimate misinformation in large-scale OSNs in real time.
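LT1DT extends the classic Linear Threshold model with competing cascades and one-direction state transitions. As background only, here is a sketch of the plain single-cascade Linear Threshold diffusion that LT1DT builds on; the data-structure choices are assumptions, and this is not the paper's model:

```python
def linear_threshold_spread(adj, weights, thresholds, seeds):
    """Run classic Linear Threshold diffusion to a fixed point.

    A node activates once the total weight of its active in-neighbors
    reaches its threshold. `adj` maps u -> out-neighbors,
    `weights[(u, v)]` is u's influence on v.
    """
    # Build reverse adjacency: who can influence each node
    in_nbrs = {v: [] for v in adj}
    for u, vs in adj.items():
        for v in vs:
            in_nbrs.setdefault(v, []).append(u)

    active = set(seeds)
    changed = True
    while changed:  # iterate until no new activations
        changed = False
        for v in in_nbrs:
            if v in active:
                continue
            influence = sum(weights.get((u, v), 0.0)
                            for u in in_nbrs[v] if u in active)
            if influence >= thresholds.get(v, 1.0):
                active.add(v)
                changed = True
    return active
```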
In this paper, we present PARASOF, an algorithm for solving linear systems with BABD matrices on massively parallel computing systems such as graphics processing units (GPUs). The algorithm is compared with state-of-the-art algorithms, in particular SOF, from which it is inspired and whose stability properties it inherits. We detail its design and implementation issues and give the main figures of its theoretical and experimental performance.
ISBN (print): 9798350308600
S-t connectivity is a decision problem asking, for vertices s and t in a graph, whether t is reachable from s. Many parallel solutions for GPUs have been proposed in the literature to solve the problem. The most efficient, which rely on two concurrent BFS traversals starting from s and t, have shown limitations when applied to sparse graphs (i.e., graphs with low average degree). In this paper we present FAST-CON, an alternative solution based on multi-source BFS and an adjacency matrix that better exploits the massive parallelism of GPU architectures on any type of graph. The results show that FAST-CON achieves speedups of up to one order of magnitude on dense graphs and up to two orders of magnitude on sparse graphs compared to state-of-the-art solutions.
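The two-concurrent-BFS baseline that FAST-CON improves on can be sketched sequentially for undirected graphs as follows. This is illustrative Python, not the GPU implementation, and the smaller-frontier-first heuristic is a common bidirectional-search convention rather than a detail from the paper:

```python
from collections import deque

def st_connected(adj, s, t):
    """s-t connectivity via two BFS frontiers grown toward each other.

    Reachability is reported as soon as one frontier touches a vertex
    already seen by the other. `adj` maps each vertex to its neighbors
    in an undirected graph.
    """
    if s == t:
        return True
    seen_s, seen_t = {s}, {t}
    front_s, front_t = deque([s]), deque([t])
    while front_s and front_t:
        # always expand the smaller frontier (swap roles if needed)
        if len(front_s) > len(front_t):
            front_s, front_t = front_t, front_s
            seen_s, seen_t = seen_t, seen_s
        next_front = deque()
        while front_s:
            u = front_s.popleft()
            for v in adj.get(u, ()):
                if v in seen_t:
                    return True  # the two searches met
                if v not in seen_s:
                    seen_s.add(v)
                    next_front.append(v)
        front_s = next_front
    return False  # one side exhausted: no path exists
```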
ISBN (print): 9798350305487
Modern classification problems tackled using Decision Tree (DT) models often impose demanding constraints in terms of accuracy and scalability. These are often hard to meet due to the ever-increasing volume of data used for training and testing. Bayesian approaches to DTs using Markov Chain Monte Carlo (MCMC) methods have demonstrated great accuracy in a wide range of applications. However, the inherently sequential nature of MCMC makes it unsuitable for meeting both accuracy and scaling constraints. One could run multiple MCMC chains in an embarrassingly parallel fashion, but despite the improved runtime, this approach sacrifices accuracy in exchange for strong scaling. Sequential Monte Carlo (SMC) samplers are another class of Bayesian inference methods with the appealing property of being parallelizable without trading off accuracy. Nevertheless, finding an effective parallelization for the SMC sampler is difficult, because its bottleneck, redistribution, must be parallelized so that the workload is divided equally across the processing elements, especially when dealing with variable-size models such as DTs. This study presents a parallel SMC sampler for DTs on Shared Memory (SM) architectures, with an O(log² N) parallel redistribution for variable-size samples. On an SM machine with 32 cores, experimental results show that the proposed method scales by up to a factor of 16 over its serial implementation and provides accuracy comparable to MCMC while being 51 times faster.
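For context, the redistribution step of an SMC sampler maps N weighted samples to N equally weighted ones. A sequential systematic-resampling sketch is shown below; the paper's contribution is performing this copying in O(log² N) parallel time for variable-size samples, which this sketch makes no attempt at:

```python
import random

def redistribute(samples, weights, rng=random):
    """Systematic resampling: draw one uniform offset, then stride
    through the cumulative weights at equal spacing, copying each
    sample in proportion to its weight."""
    n = len(samples)
    total = sum(weights)
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w
        cumulative.append(acc)
    u = rng.random() * total / n  # single random offset
    out, j = [], 0
    for i in range(n):
        target = u + i * total / n
        while cumulative[j] < target:
            j += 1
        out.append(samples[j])
    return out
```

After redistribution every surviving sample carries equal weight, which is what makes the subsequent SMC steps embarrassingly parallel; the hard part, as the abstract notes, is balancing the copying itself when samples (here, decision trees) have different sizes.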
Massively parallel, distributed-memory algorithms for the Lagrangian particle hydrodynamic method (Samulyak et al., 2018) have been developed, verified, and implemented. The key component of the parallel algorithms is a particle management module that includes the parallel construction of octree databases, dynamic adaptation and refinement of octrees, and particle migration between parallel subdomains. The particle management module is based on the p4est (parallel forest of k-trees) library. The massively parallel Lagrangian particle code has been applied to a variety of fundamental-science and applied problems. A summary of its applications to the injection of impurities into thermonuclear fusion devices and to the simulation of supersonic hydrogen jets in support of laser-plasma wakefield acceleration research is also presented.
ISBN (print): 9798400701559
In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms suffer performance degradation from system call overhead, page faults, or data-copy latency, affecting the efficiency and scalability of HPC applications. To address these issues, we propose PiP-MColl, a Process-in-Process-based Multi-object interprocess MPI Collective design that maximizes small-message MPI collective performance at scale. PiP-MColl features efficient multiple-sender and multiple-receiver collective algorithms and leverages Process-in-Process shared-memory techniques to eliminate unnecessary system calls, page-fault overhead, and extra data copies, improving intra- and inter-node message rate and throughput. The design also boosts performance for larger messages, yielding comprehensive improvements across message sizes. Experimental results show that PiP-MColl outperforms popular MPI libraries, including OpenMPI, MVAPICH2, and Intel MPI, by up to 4.6X for MPI collectives such as MPI_Scatter and MPI_Allgather.
Processing-in-memory (PIM) seeks to eliminate computation-memory data transfer using devices that support both storage and logic. Stateful logic techniques such as IMPLY, MAGIC, and FELIX can perform logic gates within memristive crossbar arrays with massive parallelism. Multiplication via stateful logic is an active field of research due to its wide implications. Recently, RIME became the state-of-the-art algorithm for stateful single-row multiplication by using memristive partitions, reducing the latency of the previous state of the art by 5.1x. In this brief, we begin by proposing novel partition-based computation techniques for broadcasting and shifting data. Then, we design an in-memory multiplication algorithm based on the carry-save add-shift (CSAS) technique. Finally, we develop a novel stateful full adder that significantly improves on the state-of-the-art (FELIX) design. These contributions constitute MultPIM, a multiplier that reduces state-of-the-art time complexity from quadratic to linear-log. For 32-bit numbers, MultPIM improves latency by an additional 4.2x over RIME, while slightly reducing area overhead. Furthermore, we optimize MultPIM for full-precision matrix-vector multiplication and improve latency by 25.5x over FloatPIM matrix-vector multiplication.
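The carry-save add-shift (CSAS) idea underlying MultPIM can be illustrated at the bit level in software: partial products are folded into separate sum and carry words using bitwise full-adder logic, with no carry propagation inside the loop, and a single carry-propagating addition finishes the product. This Python sketch is illustrative only, not the memristive in-memory implementation:

```python
def csas_multiply(a, b, bits=32):
    """Carry-save add-shift multiplication of unsigned integers.

    The accumulator is kept as a (sum, carry) pair whose true value is
    s + c. Each set bit of b contributes a shifted copy of a, folded in
    with a bitwise full adder: sum bits are the XOR of the three words,
    carry bits are the majority, shifted left one place.
    """
    mask = (1 << (2 * bits)) - 1
    s, c = 0, 0
    for i in range(bits):
        if (b >> i) & 1:
            p = (a << i) & mask  # partial product, shifted into place
            s, c = s ^ c ^ p, ((s & c) | (s & p) | (c & p)) << 1
            c &= mask
    return (s + c) & mask  # one final carry-propagating add
```

Deferring carry propagation is what lets a hardware CSAS multiplier (and MultPIM's in-memory variant) process each partial product in constant depth instead of paying a full ripple-carry per step.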