Search Results - Inner Mongolia University Library

Algorithm 995: An Efficient Parallel Anisotropic Delaunay Mesh Generator for Two-Dimensional Finite Element Analysis

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2019, Vol. 45, No. 3, pp. 1–30

Authors: Pardue, Juliette; Chernikov, Andrey (Old Dominion Univ, Dept Comp Sci, 4700 Elkhorn Ave, Norfolk, VA 23529, USA)

A bottom-up approach to parallel anisotropic mesh generation is presented by building a mesh generator starting from the basic operations of vertex insertion and Delaunay triangles. Applications focusing on high-lift design or dynamic stall, or numerical methods and modeling test cases, still focus on two-dimensional domains. This automated parallel mesh generation approach can generate high-fidelity unstructured meshes with anisotropic boundary layers for use in the computational fluid dynamics field. The anisotropy requirement adds a level of complexity to a parallel meshing algorithm by making computation depend on the local alignment of elements, which in turn is dictated by geometric boundaries and the density functions (one-dimensional spacing functions generated from an exponential distribution). This approach yields computational savings in mesh generation and flow solution through well-shaped anisotropic triangles instead of isotropic triangles. The validity of the meshes is shown through solution characteristic comparisons to verified reference solutions. Numerical experiments show a 79% parallel weak scaling efficiency on 1,024 distributed memory nodes and a 72% parallel efficiency over the fastest sequential isotropic mesh generator on 512 distributed memory nodes.

Keywords: Anisotropic mesh generation; parallel algorithms; finite element analysis; boundary layer; computational geometry

Parallel Multidimensional Lookahead Sorting Algorithm

IEEE ACCESS, 2019, Vol. 7, pp. 75446-75463

Authors: Gebali, Fayez; Taher, Mohamed; Zaki, Ahmed M.; El-Kharashi, M. Watheq; Tawfik, Ayman (Univ Victoria, Dept Elect & Comp Engn, Victoria, BC V8W 3P6, Canada; Ain Shams Univ, Fac Engn, Comp & Syst Engn Dept, Cairo 11517, Egypt; Ajman Univ, Elect Engn Dept, Ajman, U Arab Emirates)

This paper presents a new parallel structured lookahead multidimensional sorting algorithm. Our algorithm can be based on any sequential sorting algorithm. The amount of parallelism can be controlled using several parameters such as the number of threads, word size, memory/processor communication overhead, and the dimension of the algorithm. The proposed technique is ideally suited for general purpose graphic processing units and shared-memory massively parallel processor systems. It ensures that data being processed exhibits temporal and spatial locality to maximize the utilization of processor cache. The algorithm achieves a speedup even when a single processor is used. A lookahead algorithm is also proposed to achieve even higher speedup. The performance of the proposed algorithm is verified numerically and experimentally.

Keywords: Lookahead; multicore execution; parallel algorithms; sorting algorithms

A Semi-Automatic Approach for Parallel Problem Solving Using the Multi-BSP Model

PROGRAMMING AND COMPUTER SOFTWARE, 2019, Vol. 45, No. 8, pp. 517-531

Authors: Alaniz, M.; Nesmachnow, S. (Univ Republica, Herrera y Reissig 565, Montevideo, Uruguay)

The Multi-Bulk Synchronous Parallel (Multi-BSP) model is a recently proposed parallel programming model for multicore machines that extends the classic Bulk Synchronous Parallel model. Multi-BSP aims to be a useful model to design algorithms and estimate their running time. This model relies heavily on the correct computation of the parameters that characterize the hardware. Of course, the hardware utilization also depends on the specific features of the problems and the algorithms applied to solve them. This article introduces a semi-automatic approach for solving problems by applying parallel algorithms using the Multi-BSP model. First, the specific multicore computer to use is characterized by applying an automatic procedure. After that, the hardware architecture discovered in the previous step is considered in order to design a portable parallel algorithm. Finally, a fine tuning of parameters is performed to improve the overall efficiency. We propose a specific benchmark for measuring the parameters that characterize the communication and synchronization costs on particular hardware. Our approach discovers the hierarchical structure of the multicore architecture and computes both parameters for each level that can share data and synchronize computing units. A second contribution of our research is a proposal for a Multi-BSP engine. It allows designing algorithms by applying a recursive methodology over the hierarchical tree already built by the benchmark, focusing on three atomic functions based on a divide-and-conquer strategy. The validation of the proposed method is reported by studying an algorithm implemented in a prototype of the Multi-BSP engine, testing different parameter configurations that best fit each problem, and using three different high-performance multicore computers.

Keywords: parallel algorithms

Exaflops biomedical knowledge graph analytics

Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Authors: Ramakrishnan Kannan; Piyush Sao; Hao Lu; Jakub Kurzak; Gundolf Schenk; Yongmei Shi; Seung-Hwan Lim; Sharat Israni; Vijay Thakkar; Guojing Cong; Robert Patton; Sergio E. Baranzini; Richard Vuduc; Thomas Potok (Oak Ridge National Laboratory; Advanced Micro Devices Inc.; University of California San Francisco; Georgia Institute of Technology)

We are motivated by newly proposed methods for mining large-scale corpora of scholarly publications (e.g., the full biomedical literature), which consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases, formulate the relationship-mining problem as an all-pairs shortest paths (APSP) problem, and validate connective paths against curated biomedical knowledge graphs (e.g., Spoke). In this context, we present Coast (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HyPerMod), which guide optimizations and parametric tuning. The proposed Coast algorithm achieved the memory constant parallel efficiency of 99% in the single-precision tropical semiring. Looking forward, Coast will enable the integration of scholarly corpora like PubMed into the Spoke biomedical knowledge graph.

Keywords: high-performance computing; shortest path problem; parallel algorithms
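
The abstract above formulates relationship mining as APSP over the tropical semiring. As a minimal sketch of what that formulation means (not the Coast algorithm itself, whose distributed, communication-optimized design is not described here), the following single-process Python code computes APSP by repeated squaring of the weight matrix under the (min, +) product. The function names and the small example graph are hypothetical, and the sketch assumes a graph without negative cycles.

```python
INF = float("inf")


def min_plus_product(a, b):
    """Multiply two square matrices over the tropical (min, +) semiring.

    Ordinary multiplication becomes addition and ordinary addition becomes min.
    """
    n = len(a)
    return [
        [min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def apsp_by_squaring(weights):
    """All-pairs shortest paths by repeated min-plus squaring.

    `weights[i][j]` is the edge weight from i to j, or INF if there is no edge.
    Assumes no negative cycles (an assumption of this sketch).
    """
    n = len(weights)
    dist = [row[:] for row in weights]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)  # a vertex reaches itself at cost 0
    hops = 1  # dist currently covers paths of at most `hops` edges
    while hops < n - 1:
        dist = min_plus_product(dist, dist)
        hops *= 2
    return dist


# Hypothetical 4-vertex directed graph: 0->1 (3), 1->2 (1), 0->2 (7), 2->3 (2)
example = [
    [0, 3, 7, INF],
    [INF, 0, 1, INF],
    [INF, INF, 0, 2],
    [INF, INF, INF, 0],
]
shortest = apsp_by_squaring(example)
```

Each squaring doubles the maximum path length covered, so about log2(n) min-plus products suffice; the dense product is the kernel that maps naturally onto matrix-multiply hardware.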

Triangle Counting Through Cover-Edges

arXiv, 2022

Authors: Bader, David A.; Li, Fuhuan; Ganeshan, Anya; Gundogdu, Ahmet; Lew, Jason; Alvarado Rodriguez, Oliver; Du, Zhihui (Department of Data Science, New Jersey Institute of Technology, Newark, NJ, United States)

Counting and finding triangles in graphs is often used in real-world analytics to characterize cohesiveness and identify communities in graphs. In this paper, we propose the novel concept of a cover-edge set that can be used to find triangles more efficiently. We use a breadth-first search (BFS) to quickly generate a compact cover-edge set. Novel sequential and parallel triangle counting algorithms are presented that employ cover-edge sets. The sequential algorithm avoids unnecessary triangle-checking operations, and the parallel algorithm is communication-efficient. The parallel algorithm can asymptotically reduce communication on massive graphs such as from real social networks and synthetic graphs from the Graph500 Benchmark. In our estimate from massive-scale Graph500 graphs, our new parallel algorithm can reduce the communication on a scale 36 graph by 1156x and on a scale 42 graph by 2368x. © 2022, CC BY.

Keywords: parallel algorithms
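
The abstract above does not define the cover-edge set, so the following sequential Python sketch rests on an assumption: that the cover-edge set is the set of "horizontal" edges whose two endpoints share a BFS level. Adjacent vertices differ by at most one BFS level, so every triangle contains either exactly one such edge or three of them, which is enough to count each triangle once. The function names and the input format (a dict mapping each integer vertex to a set of neighbours, for a simple undirected graph) are hypothetical.

```python
from collections import deque


def bfs_levels(adj):
    """Assign a BFS level to every vertex, restarting in each component."""
    level = {}
    for root in adj:
        if root in level:
            continue
        level[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    queue.append(v)
    return level


def count_triangles_cover_edges(adj):
    """Count triangles by checking only horizontal (same-level) edges.

    Assumption of this sketch: the horizontal BFS edges form the cover-edge set.
    """
    level = bfs_levels(adj)
    single = 0  # triangles with exactly one horizontal edge, found once
    triple = 0  # triangles with three horizontal edges, found three times
    for u in adj:
        for v in adj[u]:
            if u < v and level[u] == level[v]:  # visit each horizontal edge once
                for w in adj[u] & adj[v]:  # common neighbours close a triangle
                    if level[w] == level[u]:
                        triple += 1
                    else:
                        single += 1
    return single + triple // 3


# Complete graph on 4 vertices, which contains 4 triangles
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
k4_triangles = count_triangles_cover_edges(k4)
```

Only edges inside the cover-edge set trigger a neighbour-set intersection, which is where the saving over checking every edge comes from.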

A Parallel Algorithm for Community Detection in Social Networks, Based on Path Analysis and Threaded Binary Trees

IEEE ACCESS, 2019, Vol. 7, pp. 20499-20519

Authors: Souravlas, Stavros; Sifaleras, Angelo; Katsavounis, Stefanos (Univ Macedonia, Dept Appl Informat, Thessaloniki 54636, Greece; Democritus Univ Thrace, Dept Prod & Management Engn, Xanthi 67100, Greece)

Several synchronous applications are based on graph-structured data; among them, a very important application is community detection. Since the number and size of the networks modeled by graphs grow larger and larger, some level of parallelism is needed to reduce the computational costs of such massive applications. Social networking sites allow users to manually categorize their friends into social circles (referred to as lists on Facebook and Twitter), while users, based on their interests, place themselves into groups of interest. However, community detection is a very effortful procedure, and in addition, these communities need to be updated very often, resulting in more effort. In this paper, we combine parallel processing techniques with a typical data structure, threaded binary trees, to detect communities in an efficient manner. Our strategy is implemented over weighted networks with irregular topologies and is based on a stepwise path detection strategy, where each step finds a link that increases the overall strength of the path being detected. To verify the functionality and parallelism benefits of our scheme, we perform experiments on five real-world data sets: Facebook (R), Twitter (R), Google+ (R), Pokec, and LiveJournal.

Keywords: Community detection; parallel algorithms; binary trees; social circles

Arbitrarily Parallel Turbo Decoding for Ultra-Reliable Low Latency Communication in 3GPP LTE

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2019, Vol. 37, No. 4, pp. 826-838

Authors: Xiang, Luping; Brejza, Matthew E.; Maunder, Robert G.; Al-Hashimi, Bashir M.; Hanzo, Lajos (Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, Hants, England)

In order to meet the latency requirements of the ultra-reliable low latency communication (URLLC) mode of the third-generation partnership project's long term evolution (LTE) mobile communication standard, this paper proposes a novel turbo decoding algorithm that supports an arbitrarily high degree of parallel processing, facilitating significantly higher processing throughputs and substantially lower processing latencies than the state-of-the-art (SOTA) LTE turbo decoder. As in conventional turbo decoding algorithms, the proposed Arbitrarily parallel Turbo Decoder (APTD) decomposes each frame of information bits into a sequence of windows, where the bits within different windows are processed simultaneously using forward and backward recursions in a serial manner. However, in contrast to conventional turbo decoding algorithms, the APTD does not require different windows to be composed of an identical number of bits, which allows the use of an arbitrary number of windows and hence an arbitrary degree of parallelism, when decoding information bits of an arbitrary frame length. Furthermore, conventional turbo decoding algorithms alternate between simultaneously processing the windows in the upper decoder and those in the lower decoder. By contrast, the APTD processes the odd-indexed windows in the upper decoder at the same time as the even-indexed windows in the lower decoder and alternates between this and the reversed arrangement, hence further improving the decoding throughput and latency. Furthermore, the APTD achieves a reduced hardware resource requirement by calculating the extrinsic information based only on the outputs of the forward recursions, rather than based on both the forward and backward recursions of conventional turbo decoding algorithms. We demonstrate that the proposed APTD achieves superior latency, throughput, and computational efficiency than the SOTA LTE turbo decoder at all frame lengths, but particularly at the short frame lengths that are t

Keywords: Turbo decoding; FPTD; parallel algorithms; latency; throughput

Parallel PIPS-SBB: multi-level parallelism for stochastic mixed-integer programs

COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2019, Vol. 73, No. 2, pp. 575-601

Authors: Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak; Shinano, Yuji (Georgia Inst Technol, Coll Comp, Atlanta, GA 30332, USA; Lawrence Livermore Natl Lab, Computat Engn Div, Livermore, CA 94550, USA; Zuse Inst Berlin, Dept Optimizat, Takustr 7, D-14195 Berlin, Germany)

PIPS-SBB is a distributed-memory parallel solver with a scalable data distribution paradigm. It is designed to solve mixed integer programs (MIPs) with a dual-block angular structure, which is characteristic of deterministic-equivalent stochastic mixed-integer programs. In this paper, we present two different parallelizations of Branch & Bound (B&B), implementing both as extensions of PIPS-SBB, thus adding a further layer of parallelism. In the first of the proposed frameworks, PIPS-PSBB, the coordination and load-balancing of the different optimization workers is done in a decentralized fashion. This new framework is designed to ensure all available cores are processing the most promising parts of the B&B tree. The second, ug[PIPS-SBB,MPI], is a parallel implementation using the Ubiquity Generator, a universal framework for parallelizing B&B tree search that has been successfully applied to other MIP solvers. We show the effects of leveraging multiple levels of parallelism in potentially improving scaling performance beyond thousands of cores.

Keywords: MIPs; Stochastic MIPs; parallel algorithms; parallel Branch and Bound

Improved Parallel Resampling Methods for Particle Filtering

IEEE ACCESS, 2019, Vol. 7, pp. 47593-47604

Authors: Nicely, Matthew A.; Wells, B. Earl (Univ Alabama, Dept Elect & Comp Engn, Huntsville, AL 35805, USA)

Particle filter techniques are common methods used to estimate the evolving state of nonlinear, non-Gaussian time-variant systems by utilizing a periodic sequence of noisy measurements. The accuracy of particle filter methods has often been shown to be superior to other state estimation techniques, such as the extended Kalman filter (EKF), for many applications. Unfortunately, the high computational cost and highly nondeterministic runtime behavior of particle filters often preclude their use in hard, real-time environments, where filter response must meet the strict timing requirements of the application. Particle filter algorithms are composed of three main stages: prediction, update, and resampling. General purpose graphics processing units (GPGPUs) have been successfully employed in previous research to accelerate the computation of both the prediction and update stages by exploiting their natural fine-grain parallelism. This research focuses on accelerating the resampling stage for GPGPU execution, which has been much more difficult to parallelize due to its apparently inherent sequential nature. This paper introduces a novel GPGPU implementation of the systematic and stratified resampling algorithms that exploits the monotonically increasing nature of the prefix-sum and the evolutionary nature of the particle weighting process to allow the re-indexing portion of the algorithms to occur in a two-phase, multi-threaded manner. The resulting measured factor of performance improvement for the systematic and stratified algorithms was 15x and 32x, respectively, over the serial implementations.

Keywords: Graphics processing units; parallel algorithms; parallel architectures; parallel programming; particle filters; state estimation; resampling
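
For orientation, the resampling stage named in the abstract above can be sketched in its textbook serial form. The Python code below is a sequential systematic resampler, not the paper's two-phase GPGPU implementation; the function name and the optional `u0` argument (a fixed starting offset, useful for reproducible runs) are hypothetical. It shows the monotonically increasing prefix sum that the abstract says the parallel version exploits.

```python
import random
from itertools import accumulate


def systematic_resample(weights, u0=None):
    """Return the indices of the particles chosen by systematic resampling.

    `weights` are non-negative particle weights (normalized internally).
    `u0` is the starting offset in [0, 1/n); drawn at random if omitted.
    """
    n = len(weights)
    total = sum(weights)
    # Prefix sum of the normalized weights: monotonically non-decreasing.
    cumulative = list(accumulate(w / total for w in weights))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    if u0 is None:
        u0 = random.random() / n
    indices = []
    j = 0
    for i in range(n):
        u = u0 + i / n  # n evenly spaced sample points sharing one offset
        # Because both u and the prefix sum only increase, j never moves back.
        while cumulative[j] < u:
            j += 1
        indices.append(j)
    return indices


# Hypothetical weights; a fixed offset makes the run reproducible
chosen = systematic_resample([0.1, 0.2, 0.3, 0.4], u0=0.125)
```

The single forward-moving index `j` is the sequential dependency that makes this stage harder to parallelize than prediction and update.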

Model-driven transformations for mapping parallel algorithms on parallel computing platforms

2nd International Workshop on Model-Driven Engineering for High Performance and Cloud Computing, MDHPCL 2013 - Co-located with 16th International Conference on Model Driven Engineering Languages and Systems, MODELS 2013

Authors: Arkin, Ethem; Tekinerdogan, Bedir (Aselsan MGEO, Ankara, Turkey; Bilkent University, Dept. of Computer Engineering, Ankara, Turkey)

One of the important problems in parallel computing is the mapping of the parallel algorithm to the parallel computing platform. For this, the corresponding code must be implemented for each parallel node. For platforms with a limited number of processing nodes this can be done manually. However, when the parallel computing platform consists of hundreds of thousands of processing nodes, manual coding of the parallel algorithms becomes intractable and error-prone. Moreover, a change of the parallel computing platform requires considerable coding effort and time. In this paper we present a model-driven approach for generating the code of selected parallel algorithms to be mapped on parallel computing platforms. We describe the required platform independent metamodel, and the model-to-model and the model-to-text transformation patterns. We illustrate our approach for the parallel matrix multiplication algorithm. Copyright © 2013 for the individual papers by the papers' authors.

Keywords: parallel algorithms
