In this work we formally derive and prove the correctness of the algorithms and data structures in a parallel, distributed-memory, generic finite element framework that supports h-adaptivity on computational domains represented as forest-of-trees. The framework is grounded on a rich representation of the adaptive mesh suitable for generic finite elements that is built on top of a low-level, light-weight forest-of-trees data structure handled by a specialized, highly parallel adaptive meshing engine, for which we have identified the requirements it must fulfill in order to be coupled into our framework. Atop this two-layered mesh representation, we build the rest of the data structures required for the numerical integration and assembly of the discrete system of linear equations. We consider algorithms that are suitable for both subassembled and fully assembled distributed data layouts of linear system matrices. The proposed framework has been implemented within the FEMPAR scientific software library, using p4est as a practical forest-of-octrees demonstrator. A strong scaling study of this implementation when applied to Poisson and Maxwell problems reveals remarkable scalability up to 32.2K CPU cores and 482.2M degrees of freedom. In addition, a comparative performance study of FEMPAR and the state-of-the-art deal.II finite element software shows at least comparable performance, and at most a factor of 2-3 improvement, in the h-adaptive approximation of a Poisson problem with first- and second-order Lagrangian finite elements, respectively.
This paper solves the Black-Scholes equation for European options using a time-parallel algorithm combined with the Kansa method. First, the partial differential equation for the price of deri...
Hash tables are a fundamental data structure for effectively storing and accessing sparse data, with widespread usage in domains ranging from computer graphics to machine learning. This study surveys the state-of-the-art research on data-parallel hashing techniques for emerging massively parallel, many-core GPU architectures. It identifies key factors affecting the performance of different techniques and suggests directions for further research.
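As a concrete illustration of one family of techniques such surveys cover, the sketch below shows a lock-free open-addressing insert driven by atomic compare-and-swap, written as a CPU analogue in C++ rather than GPU code; the capacity, key encoding, and absence of deletion or resizing are simplifying assumptions for the example, not features of any particular surveyed implementation.

    // Minimal sketch of lock-free linear-probing insertion, a CPU analogue of a
    // common data-parallel hashing scheme in which each thread claims a slot
    // with compare-and-swap.
    #include <atomic>
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <thread>
    #include <vector>

    struct LockFreeHashSet {
        static constexpr uint64_t EMPTY = 0;          // 0 is reserved as "empty slot"
        std::vector<std::atomic<uint64_t>> table;
        explicit LockFreeHashSet(size_t capacity) : table(capacity) {
            for (auto& slot : table) slot.store(EMPTY, std::memory_order_relaxed);
        }
        bool insert(uint64_t key) {                   // key must be nonzero
            size_t h = std::hash<uint64_t>{}(key) % table.size();
            for (size_t probe = 0; probe < table.size(); ++probe) {
                size_t i = (h + probe) % table.size();
                uint64_t expected = EMPTY;
                if (table[i].compare_exchange_strong(expected, key)) return true; // slot claimed
                if (expected == key) return false;    // key already present
            }
            return false;                             // table full
        }
    };

    int main() {
        LockFreeHashSet set(1 << 20);
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back([&, t] {
                for (uint64_t k = 1; k <= 100000; ++k) set.insert(k * 4 + t + 1);
            });
        for (auto& w : workers) w.join();
        std::cout << "done\n";
    }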
The stability of a social network has been widely studied as an important indicator for both the network holders and the participants. Existing works on reinforcing networks focus on a local view, e.g., the anchored k...
The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification, a standard framework for interoperability among simulators, includes a Data Distribution Management (DDM) service whose responsibility is to report all intersections between a set of subscription and update regions. The algorithms at the core of the DDM service are CPU-intensive, and could greatly benefit from the large computing power of modern multi-core processors. In this article, we propose two parallel solutions to the DDM problem that can operate effectively on shared-memory multiprocessors. The first solution is based on a data structure (the interval tree) that allows concurrent computation of intersections between subscription and update regions. The second solution is based on a novel parallel extension of the Sort Based Matching algorithm, whose sequential version is considered among the most efficient solutions to the DDM problem. Extensive experimental evaluation of the proposed algorithms confirms their effectiveness in taking advantage of multiple execution units in a shared-memory architecture.
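To make the matching problem concrete, the following C++ sketch implements the sequential sweep idea behind Sort Based Matching for one dimension: sort all interval endpoints, then report an intersection whenever an interval opens while intervals of the other kind are still open. It illustrates the sequential baseline the article parallelizes, not the authors' parallel extension; all names are invented for the example.

    // One-dimensional Sort-Based Matching sketch: sweep over sorted endpoints,
    // keeping the sets of currently open subscription and update intervals.
    #include <algorithm>
    #include <cstdio>
    #include <set>
    #include <vector>

    struct Interval { double lo, hi; int id; };
    struct Event    { double x; bool isStart; bool isSubscription; int id; };

    void sortBasedMatching(const std::vector<Interval>& subs,
                           const std::vector<Interval>& upds) {
        std::vector<Event> events;
        for (const auto& s : subs) { events.push_back({s.lo, true,  true, s.id});
                                     events.push_back({s.hi, false, true, s.id}); }
        for (const auto& u : upds) { events.push_back({u.lo, true,  false, u.id});
                                     events.push_back({u.hi, false, false, u.id}); }
        // Start events sort before end events at equal coordinates, so touching
        // intervals count as intersecting (a modeling choice for this sketch).
        std::sort(events.begin(), events.end(), [](const Event& a, const Event& b) {
            return a.x < b.x || (a.x == b.x && a.isStart && !b.isStart);
        });
        std::set<int> openSubs, openUpds;
        for (const auto& e : events) {
            auto& own   = e.isSubscription ? openSubs : openUpds;
            auto& other = e.isSubscription ? openUpds : openSubs;
            if (!e.isStart) { own.erase(e.id); continue; }
            for (int otherId : other)   // every open interval of the other kind overlaps
                std::printf("subscription %d intersects update %d\n",
                            e.isSubscription ? e.id : otherId,
                            e.isSubscription ? otherId : e.id);
            own.insert(e.id);
        }
    }

    int main() {
        std::vector<Interval> subs = {{0.0, 2.0, 0}, {1.5, 4.0, 1}};
        std::vector<Interval> upds = {{1.0, 3.0, 0}};
        sortBasedMatching(subs, upds);   // expect: sub 0 x upd 0, sub 1 x upd 0
    }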
One of the simplest problems on directed graphs is that of identifying the set of vertices reachable from a designated source vertex. This problem can be solved easily sequentially by performing a graph search, but efficient parallel algorithms have eluded researchers for decades. For sparse high-diameter graphs in particular, there is no known work-efficient parallel algorithm with nontrivial parallelism. This amounts to one of the most fundamental open questions in parallel graph algorithms: Is there a parallel algorithm for digraph reachability with nearly linear work? This article shows that the answer is yes, presenting a randomized parallel algorithm for digraph reachability and related problems with expected work Õ(m) and span Õ(n^(2/3)), and hence parallelism Õ(m/n^(2/3)) = Ω̃(n^(1/3)), on any graph with n vertices and m arcs. This is the first parallel algorithm having both nearly linear work and strongly sublinear span, i.e., span Õ(n^(1-ε)) for some constant ε > 0. The algorithm can be extended to produce a directed spanning tree, determine whether the graph is acyclic, topologically sort the strongly connected components of the graph, or produce a directed ear decomposition, all with work Õ(m) and span Õ(n^(2/3)). The main technical contribution is an efficient Monte Carlo algorithm that, through the addition of Õ(n) shortcuts, reduces the diameter of the graph to Õ(n^(2/3)) with high probability. While both sequential and parallel algorithms are known with those combinatorial properties, even the sequential algorithms are not efficient, having sequential runtime Ω(m·n^(Ω(1))). This article presents a surprisingly simple sequential algorithm that achieves the stated diameter reduction and runs in Õ(m) time. Parallelizing that algorithm yields the main result, but doing so involves overcoming several other challenges.
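The toy C++ sketch below illustrates only the generic idea of shortcutting, i.e., adding arcs from sampled vertices to the vertices they reach so that later searches need fewer hops. It uses quadratic work and is not the paper's work-efficient Monte Carlo construction; the sample count and graph are arbitrary choices for the example.

    // Toy diameter reduction by shortcutting: sample a few vertices, BFS from
    // each sample, and add direct "shortcut" arcs to every vertex it reaches.
    #include <cstdio>
    #include <queue>
    #include <random>
    #include <vector>

    using Graph = std::vector<std::vector<int>>;   // adjacency lists of a digraph

    std::vector<int> bfsReach(const Graph& g, int src) {
        std::vector<char> seen(g.size(), 0);
        std::vector<int> order;
        std::queue<int> q;
        q.push(src); seen[src] = 1;
        while (!q.empty()) {
            int v = q.front(); q.pop();
            order.push_back(v);
            for (int w : g[v]) if (!seen[w]) { seen[w] = 1; q.push(w); }
        }
        return order;
    }

    // Quadratic-work toy version; the paper's contribution is achieving a
    // comparable diameter reduction within nearly linear work.
    void addToyShortcuts(Graph& g, int k, std::mt19937& rng) {
        std::uniform_int_distribution<int> pick(0, (int)g.size() - 1);
        for (int i = 0; i < k; ++i) {
            int s = pick(rng);
            for (int v : bfsReach(g, s))
                if (v != s) g[s].push_back(v);     // shortcut arc s -> v
        }
    }

    int main() {
        // A directed path 0 -> 1 -> ... -> 9 (diameter 9 before shortcutting).
        Graph g(10);
        for (int v = 0; v + 1 < 10; ++v) g[v].push_back(v + 1);
        std::mt19937 rng(42);
        addToyShortcuts(g, 3, rng);
        std::printf("vertices reachable from 0: %zu\n", bfsReach(g, 0).size());
    }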
Motivated by large-scale optimization problems arising in the context of machine learning, there have been several advances in the study of asynchronous parallel and distributed optimization methods during the past decade. Asynchronous methods do not require all processors to maintain a consistent view of the optimization variables. Consequently, they generally can make more efficient use of computational resources than synchronous methods, and they are not sensitive to issues like stragglers (i.e., slow nodes) and unreliable communication links. Mathematical modeling of asynchronous methods involves proper accounting of information delays, which makes their analysis challenging. This article reviews recent developments in the design and analysis of asynchronous optimization methods, covering both centralized methods, where all processors update a master copy of the optimization variables, and decentralized methods, where each processor maintains a local copy of the variables. The analysis provides insights into how the degree of asynchrony impacts convergence rates, especially in stochastic optimization methods.
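For intuition about what "asynchronous" means here, the following C++ sketch runs several workers that update a shared parameter vector using relaxed atomic reads and writes, so an update may be computed from a stale value or overwritten by another worker. The quadratic objective, step size, and iteration counts are illustrative choices only, not taken from the article.

    // Hogwild-style asynchronous gradient descent sketch: no locks, no barriers.
    #include <atomic>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main() {
        const int dim = 4, nWorkers = 4, stepsPerWorker = 100000;
        const double stepSize = 1e-4;
        std::vector<double> target = {1.0, 2.0, 3.0, 4.0};
        std::vector<std::atomic<double>> x(dim);           // shared iterate
        for (auto& xi : x) xi.store(10.0, std::memory_order_relaxed);

        auto worker = [&]() {
            for (int t = 0; t < stepsPerWorker; ++t)
                for (int i = 0; i < dim; ++i) {
                    // Relaxed load/store: reads may be stale and updates may be
                    // overwritten -- exactly the asynchrony being modeled.
                    double xi = x[i].load(std::memory_order_relaxed);
                    double grad = xi - target[i];          // gradient of 0.5*(xi - target_i)^2
                    x[i].store(xi - stepSize * grad, std::memory_order_relaxed);
                }
        };

        std::vector<std::thread> pool;
        for (int w = 0; w < nWorkers; ++w) pool.emplace_back(worker);
        for (auto& th : pool) th.join();

        for (int i = 0; i < dim; ++i)
            std::printf("x[%d] = %.4f (target %.1f)\n", i,
                        x[i].load(std::memory_order_relaxed), target[i]);
    }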
Herein, a parallel implementation in OpenMP of the Image Block Representation (IBR) for binary images is investigated. The IBR is a region-based image representation scheme that represents the binary image as a set of non-overlapping rectangular areas of object-level pixels, called blocks. The IBR permits the execution of operations on image areas instead of image points and therefore leads to a substantial reduction of the required computational complexity. The experimental and analytically derived results of the parallel OpenMP implementation on a multicore computer show that very good overall performance can be achieved.
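The C++/OpenMP sketch below illustrates the block-level style of processing: object pixels are grouped into rectangular blocks and a quantity (here the object area) is accumulated per block rather than per pixel. For brevity each block is a single-row run, a simplification of IBR, which also merges runs vertically; the image and quantity computed are arbitrary examples.

    // Block-level processing sketch: extract row runs as blocks, then accumulate
    // a per-block quantity with an OpenMP parallel reduction.
    #include <cstdio>
    #include <vector>

    struct Block { int x1, x2, y1, y2; };            // inclusive pixel bounds

    std::vector<Block> rowRunBlocks(const std::vector<std::vector<int>>& img) {
        std::vector<Block> blocks;
        for (int y = 0; y < (int)img.size(); ++y)
            for (int x = 0; x < (int)img[y].size(); ) {
                if (!img[y][x]) { ++x; continue; }
                int start = x;
                while (x < (int)img[y].size() && img[y][x]) ++x;
                blocks.push_back({start, x - 1, y, y});   // one maximal run per block
            }
        return blocks;
    }

    int main() {
        std::vector<std::vector<int>> img = {
            {0, 1, 1, 1, 0},
            {0, 1, 1, 1, 0},
            {0, 0, 0, 1, 1},
        };
        std::vector<Block> blocks = rowRunBlocks(img);

        long long area = 0;
        #pragma omp parallel for reduction(+ : area)
        for (int b = 0; b < (int)blocks.size(); ++b) {
            const Block& blk = blocks[b];
            area += (long long)(blk.x2 - blk.x1 + 1) * (blk.y2 - blk.y1 + 1);
        }
        std::printf("object area = %lld pixels over %zu blocks\n", area, blocks.size());
    }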
Edit distance has applications in many domains such as bioinformatics, spell checking, plagiarism checking, query optimization, speech recognition, and data mining. Traditionally, edit distance is computed by a dynamic-programming-based sequential solution, which becomes infeasible for large problems. In this paper, we introduce NvPD, a novel algorithm for parallel edit distance computation that resolves the dependencies in the conventional dynamic-programming-based solution. We also establish the correctness of the modified dependencies. NvPD exhibits characteristics such as a balanced workload among processors, low synchronization overhead, and maximum utilization of resources, and it can exploit spatial locality. It requires min(m, n) steps to complete, compared to the diagonal-based approach, which completes in max(m, n) steps. Experimental evaluation using a variety of random and real-life data sets on shared-memory multi-core systems and graphics processing units (GPUs) shows that NvPD outperforms state-of-the-art parallel edit distance algorithms.
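For reference, the following C++/OpenMP sketch shows the conventional diagonal (wavefront) parallelization that the abstract contrasts NvPD with: cells on the same anti-diagonal depend only on earlier diagonals and can be filled concurrently. This simple version sweeps the m+n-1 anti-diagonals one at a time; it is the baseline, not NvPD itself.

    // Wavefront edit distance: parallelize across each anti-diagonal i + j = diag.
    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <vector>

    int editDistanceWavefront(const std::string& a, const std::string& b) {
        const int m = (int)a.size(), n = (int)b.size();
        std::vector<std::vector<int>> d(m + 1, std::vector<int>(n + 1));
        for (int i = 0; i <= m; ++i) d[i][0] = i;
        for (int j = 0; j <= n; ++j) d[0][j] = j;
        for (int diag = 2; diag <= m + n; ++diag) {          // cells with i + j == diag
            int iLo = std::max(1, diag - n), iHi = std::min(m, diag - 1);
            #pragma omp parallel for
            for (int i = iLo; i <= iHi; ++i) {
                int j = diag - i;
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                d[i][j] = std::min({d[i - 1][j] + 1,          // deletion
                                    d[i][j - 1] + 1,          // insertion
                                    d[i - 1][j - 1] + cost}); // match / substitution
            }
        }
        return d[m][n];
    }

    int main() {
        std::printf("edit distance = %d\n", editDistanceWavefront("kitten", "sitting")); // 3
    }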
Existing parallel algorithms for wavelet tree construction have a work complexity of O(n log σ). This paper presents parallel algorithms for the problem with improved work complexity. Our first algorithm is based on parallel integer sorting and has either O(n log log n ⌈log σ / √(log n log log n)⌉) work and polylogarithmic depth, or O(n ⌈log σ / √(log n)⌉) work and sub-linear depth. We also describe another algorithm that has O(n ⌈log σ / √(log n)⌉) work and O(σ + log n) depth. We then show how to use similar ideas to construct variants of wavelet trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet matrices in parallel with lower work complexity than prior algorithms. Finally, we show that the rank and select structures on binary sequences and multiary sequences, which are stored on wavelet tree nodes, can be constructed in parallel with improved work bounds, matching those of the best existing sequential algorithms for constructing rank and select structures.
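For context, the C++ sketch below is the textbook O(n log σ)-work wavelet tree construction that these results improve on: each node over an alphabet range stores one bit per symbol (left or right half) and the sequence is stably split before recursing. It is only the simple baseline, not one of the paper's parallel algorithms; the input sequence is an arbitrary example.

    // Textbook wavelet tree construction; bitvectors are collected in preorder.
    #include <cstdio>
    #include <vector>

    void buildWaveletNode(const std::vector<unsigned>& seq, unsigned lo, unsigned hi,
                          std::vector<std::vector<char>>& nodes) {
        if (hi - lo <= 1 || seq.empty()) return;     // leaf: single symbol, no bits stored
        unsigned mid = lo + (hi - lo) / 2;
        std::vector<char> bits;
        std::vector<unsigned> left, right;
        for (unsigned s : seq) {
            char b = (s >= mid);
            bits.push_back(b);
            (b ? right : left).push_back(s);         // stable split around mid
        }
        nodes.push_back(bits);
        buildWaveletNode(left, lo, mid, nodes);
        buildWaveletNode(right, mid, hi, nodes);
    }

    int main() {
        std::vector<unsigned> seq = {3, 1, 0, 2, 1, 3};
        std::vector<std::vector<char>> nodes;
        buildWaveletNode(seq, 0, 4, nodes);          // alphabet {0,1,2,3}, sigma = 4
        for (size_t i = 0; i < nodes.size(); ++i) {
            std::printf("node %zu bits: ", i);
            for (char b : nodes[i]) std::printf("%d", (int)b);
            std::printf("\n");
        }
    }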