k-core decomposition is a social network analytic that can be applied to centrality analysis, visualization, and community detection. A simple and well-known linear-time sequential algorithm exists for k-core decomposition, and for this reason it is widely used as a network analytic. In this work, we present a new shared-memory parallel algorithm called PKC for k-core decomposition on multicore platforms. This approach improves on state-of-the-art implementations of k-core decomposition algorithms by reducing synchronization overhead and by creating a smaller graph on which to process high-degree vertices. We show that PKC consistently outperforms implementations of other methods on a 32-core multicore server and on a collection of large sparse graphs. We achieve a 2.81× speedup (geometric mean of speedups over 17 large graphs) compared with our implementation of the ParK k-core decomposition algorithm.
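For context, the simple linear-time sequential algorithm referred to above is the classic peeling procedure: repeatedly remove a vertex of minimum remaining degree and record that degree as its core number. A minimal sequential Python sketch is given below; it illustrates only the baseline technique, not the PKC or ParK implementations, and the adjacency-dict input format is an assumption made here for brevity.

from collections import defaultdict

def core_numbers(adj):
    """Sequential k-core decomposition by peeling.

    adj: dict mapping each vertex to the set of its neighbours (simple,
    undirected graph). Returns a dict mapping each vertex to its core number.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    buckets = defaultdict(set)          # vertices bucketed by current degree
    for v, d in degree.items():
        buckets[d].add(v)

    core, removed, k = {}, set(), 0
    for _ in range(len(adj)):
        while not buckets[k]:           # k only ever increases: no remaining
            k += 1                      # vertex can drop below the current level
        v = buckets[k].pop()
        core[v] = k
        removed.add(v)
        for u in adj[v]:
            d = degree[u]
            if u not in removed and d > k:
                buckets[d].discard(u)   # move u down one bucket
                degree[u] = d - 1
                buckets[d - 1].add(u)
    return core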
We describe a parallel, adaptive, multiblock algorithm for explicit integration of time-dependent partial differential equations on two-dimensional Cartesian grids. The grid layout we consider consists of a nested hie...
Random graphs (or networks) have attracted significant interest because of their popularity in modeling and simulating many complex real-world systems. The degree sequence is one of the most important aspects of these systems. Random graphs with a given degree sequence can capture characteristics, such as dependent edges and non-binomial degree distributions, that are absent from classical random graph models such as the Erdős–Rényi model. In addition, they have important applications in uniform sampling of random graphs and in counting the number of graphs having the same degree sequence, as well as in string theory, random matrix theory, and matching theory. In this paper, we present an OpenMP-based shared-memory parallel algorithm for generating a random graph with a prescribed degree sequence, which achieves a speedup of 20.4 with 32 cores. We also present a comparative study of several structural properties of the random graphs generated by our algorithm against those of real-world graphs and random graphs generated by other popular methods. One of the steps in our parallel algorithm requires checking the Erdős–Gallai characterization, i.e., whether there exists a graph obeying the given degree sequence, in parallel. This paper presents a non-trivial parallel algorithm for checking the Erdős–Gallai characterization, which achieves a speedup of 23 with 32 cores.
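For readers unfamiliar with the Erdős–Gallai characterization mentioned above: a degree sequence d_1 >= ... >= d_n is realizable by a simple graph exactly when its sum is even and, for every k, the sum of the k largest degrees is at most k(k-1) + sum over i > k of min(d_i, k). The sequential Python sketch below checks this condition directly; it is only a reference implementation of the characterization, not the authors' parallel algorithm, and it uses a quadratic scan for clarity.

def is_graphical(degrees):
    """Erdős–Gallai check: can this degree sequence be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if n == 0:
        return True
    if d[-1] < 0 or sum(d) % 2 != 0:
        return False                     # negative degree or odd degree sum

    prefix = [0] * (n + 1)               # prefix[k] = sum of the k largest degrees
    for i, x in enumerate(d):
        prefix[i + 1] = prefix[i] + x

    for k in range(1, n + 1):
        # Right-hand side: k*(k-1) plus sum over remaining degrees of min(d_i, k).
        tail = sum(min(x, k) for x in d[k:])
        if prefix[k] > k * (k - 1) + tail:
            return False
    return True

# Example: [3, 3, 2, 2, 2] is graphical, [3, 3, 3, 1] is not.
print(is_graphical([3, 3, 2, 2, 2]), is_graphical([3, 3, 3, 1]))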
This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed memory parallel m...
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. Bulk execution supports fine-grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution of this work is to present a Bitwise Parallel Bulk Computation (BPBC) technique to accelerate the Smith-Waterman Algorithm (SWA). More precisely, the dynamic programming for the SWA repeatedly performs the same computation O(mn) times. Our idea is therefore to convert this computation into a circuit simulation using the BPBC technique so that multiple instances are computed simultaneously. The proposed BPBC technique for the SWA has been implemented on both the GPU and the CPU. Experimental results show that the proposed BPBC for the SWA accelerates the computation by a factor of more than 447 compared with a single-CPU implementation.
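As a reference for the O(mn) computation mentioned above, the Smith-Waterman dynamic program fills an (m+1)×(n+1) score table with the same max-of-four recurrence in every cell. The plain sequential Python sketch below uses a linear gap penalty and illustrative scoring values; it shows the baseline recurrence only, not the bitwise-parallel GPU code.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between strings a and b (O(mn) sequential DP)."""
    m, n = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                        # start a new alignment
                          H[i - 1][j - 1] + s,      # extend by (mis)match
                          H[i - 1][j] + gap,        # gap in b
                          H[i][j - 1] + gap)        # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("ACACACTA", "AGCACACA"))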
A suffix tree is a fundamental and versatile string data structure that is frequently used in important application areas such as text processing, information retrieval, and computational biology. Sequentially, the construction of suffix trees takes linear time, and optimal parallel algorithms exist only for the PRAM model. Recent works mostly target low core-count shared-memory implementations but achieve suboptimal complexity, and prior distributed-memory parallel algorithms have quadratic worst-case complexity. Suffix trees can be constructed from suffix and longest common prefix (LCP) arrays by solving the All-Nearest-Smaller-Values (ANSV) problem. In this paper, we formulate a more generalized version of the ANSV problem and present a distributed-memory parallel algorithm for solving it in O(n/p + p) time. Our algorithm minimizes the overall and per-node communication volume. Building on this, we present a parallel algorithm for constructing a distributed representation of suffix trees, yielding both superior theoretical complexity and better practical performance compared to previous distributed-memory algorithms. We demonstrate the construction of the suffix tree for the human genome, given its suffix and LCP arrays, in under 2 seconds on 1024 Intel Xeon cores.
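For reference, the sequential form of the All-Nearest-Smaller-Values (ANSV) step, which finds for every entry of an array the nearest strictly smaller entry on each side, is a single stack-based pass per direction. The Python sketch below shows the left-to-right pass only; it is not the authors' distributed O(n/p + p) algorithm.

def nearest_smaller_to_left(values):
    """For each index i, return the index of the nearest element to the left of i
    that is strictly smaller than values[i], or -1 if no such element exists.
    (The right-side pass is the mirror image, scanning from the end.)"""
    left = [-1] * len(values)
    stack = []                                   # indices with increasing values
    for i, v in enumerate(values):
        while stack and values[stack[-1]] >= v:  # pop everything not smaller than v
            stack.pop()
        left[i] = stack[-1] if stack else -1
        stack.append(i)
    return left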
Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone; it must be modeled explicitly. This paper analyzes the energy consumption of parallel algorithms executed on a model of shared-memory multicore processors. Specifically, we develop a methodology to evaluate how the energy consumption of a given parallel algorithm changes as the number of cores and their frequency are varied. We use this analysis to establish the optimal number of cores to minimize the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement, and the optimal number of cores to maximize the performance of a parallel algorithm for a specific problem size under a given energy budget. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step to the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected over a wide range of these parameters. (C) 2011 Elsevier Inc. All rights reserved.
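To make this kind of trade-off concrete, the toy Python sketch below searches for the core count that minimizes energy under a deadline, using an illustrative model with perfectly parallel work, a per-core synchronization overhead, dynamic power cubic in the frequency, and a constant static power. The model form and all parameter names here are assumptions made for illustration only, not the paper's model.

def best_core_count(work, deadline, max_cores, freq=2.0e9,
                    dyn_coeff=1.0e-27, static_power=5.0, sync_overhead=1.0):
    """Return (cores, energy in joules) minimizing energy under the deadline,
    or None if no core count meets it.  Toy model, not the paper's."""
    best = None
    for p in range(1, max_cores + 1):
        time = work / (p * freq) + sync_overhead * p      # seconds
        if time > deadline:
            continue                                      # misses the deadline
        power = p * dyn_coeff * freq ** 3 + static_power  # watts
        energy = power * time                             # joules
        if best is None or energy < best[1]:
            best = (p, energy)
    return best

# Example: 1e12 operations, 600 s deadline, up to 64 cores.
print(best_core_count(1.0e12, 600.0, 64))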
In order to solve a differential problem, the Laplace Transform method, when applicable, replaces the problem with a simpler one; the solution is obtained by solving the new problem and then computing the inverse Laplace Transform of the resulting function. In a numerical context, the solution of the transformed problem consists of a sequence of Laplace Transform samples, so most software for numerical inversion cannot be used, since such software requires the transform to be passed, among its parameters, as a function. To fill this gap, we present Talbot Suite DE, a C software collection for Laplace Transform inversion, specifically designed for these problems and based on Talbot's method. It contains both sequential and parallel implementations; the latter is accomplished by means of OpenMP. We also report some performance results. Aimed at non-expert users, the software is equipped with several examples and a User Guide that includes the external documentation, explains how to use all the sample code, and reports results on accuracy and efficiency. Some examples are entirely in C and others combine different programming languages (C/MATLAB, C/FORTRAN). The User Guide also contains useful hints for avoiding possible errors issued during the compilation or execution of mixed-language code.
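For readers unfamiliar with Talbot's method, it evaluates the Bromwich inversion integral on a deformed contour that wraps around the negative real axis, so that a short trapezoidal sum of transform samples converges rapidly. The Python sketch below implements the fixed-Talbot variant of Abate and Valkó as a minimal sequential illustration; it is not code from Talbot Suite DE.

import cmath
import math

def talbot_invert(F, t, M=32):
    """Invert the Laplace transform F(s) at a single time t > 0
    using the fixed-Talbot method with M contour points."""
    r = 2.0 * M / (5.0 * t)                       # contour scale parameter
    total = 0.5 * cmath.exp(r * t) * F(r)         # endpoint term (theta = 0)
    for k in range(1, M):
        theta = k * math.pi / M
        cot = math.cos(theta) / math.sin(theta)
        s = r * theta * (cot + 1j)                # point on the Talbot contour
        sigma = theta + (theta * cot - 1.0) * cot
        total += (cmath.exp(t * s) * F(s) * (1.0 + 1j * sigma)).real
    return (r / M) * total.real

# Example: F(s) = 1/(s + 1) has inverse f(t) = exp(-t).
print(talbot_invert(lambda s: 1.0 / (s + 1.0), 1.0), math.exp(-1.0))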
This paper presents a parallel, GPU-based, deforming-mesh-enabled unsteady numerical solver for moving-body problems using OpenACC. Both 2D and 3D parallel algorithms based on spring-like deforming mesh methods are proposed and then implemented through the OpenACC programming model. Furthermore, these algorithms are coupled with an unstructured-mesh-based numerical solver with an implicit time-integration scheme, which makes the full GPU version of the solver capable of handling unsteady calculations with a deforming mesh. Experimental results show that the proposed parallel deforming mesh algorithm can achieve over a 2.5x speedup on a K20 GPU card in comparison with 20 OpenMP threads on Intel E5-2658 V2 CPU cores. Both 2D and 3D cases are conducted to validate the efficiency, correctness, and accuracy of the present solver.
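As background on the spring-analogy idea that such deforming-mesh algorithms build on, interior mesh nodes can be relaxed iteratively toward the stiffness-weighted average of their neighbours after the boundary nodes have been moved, with stiffness taken as the inverse edge length. The small 2D sequential Python sketch below is illustrative only and is unrelated to the paper's OpenACC implementation.

import math

def relax_spring_mesh(coords, edges, is_boundary, sweeps=50):
    """Spring-analogy mesh deformation (2D, Jacobi-style sweeps).

    coords      : dict node -> (x, y); boundary nodes already hold their new positions.
    edges       : iterable of (u, v) mesh edges.
    is_boundary : dict node -> bool; boundary nodes stay fixed.
    """
    neighbours = {v: [] for v in coords}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    for _ in range(sweeps):
        new_coords = dict(coords)
        for v, nbrs in neighbours.items():
            if is_boundary[v] or not nbrs:
                continue
            wx = wy = wsum = 0.0
            for u in nbrs:
                length = math.hypot(coords[u][0] - coords[v][0],
                                    coords[u][1] - coords[v][1])
                k = 1.0 / max(length, 1e-12)      # stiffer springs on short edges
                wx += k * coords[u][0]
                wy += k * coords[u][1]
                wsum += k
            new_coords[v] = (wx / wsum, wy / wsum)
        coords = new_coords
    return coords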
ISBN (print): 9781509042975
We live in an era of big data, and the analysis of these data is becoming a bottleneck in many domains, including biology and the internet. To make these analyses feasible in practice, we need efficient data reduction algorithms. The Singular Value Decomposition (SVD) is a data reduction technique that has been used in many different applications. For example, SVDs have been extensively used in text analysis. The best known sequential algorithms for computing the SVD take cubic time, which may not be acceptable in practice. As a result, many parallel algorithms have been proposed in the literature. There are two kinds of algorithms for the SVD, namely, QR decomposition and Jacobi iterations. Researchers have found that even though QR is sequentially faster than Jacobi iterations, QR is difficult to parallelize. As a result, most of the parallel algorithms in the literature are based on Jacobi iterations. For example, the Jacobi Relaxation Scheme (JRS) variant of the classical Jacobi algorithm has been shown to be very effective in parallel. In this paper, we propose a novel variant of the classical Jacobi algorithm that is more efficient than the JRS algorithm. Our experimental results confirm this assertion. The key idea behind our algorithm is to select the pivot elements for each sweep appropriately. We also show how to efficiently implement our algorithm on parallel models such as the PRAM and the mesh.
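For reference, the classical one-sided Jacobi iteration that such schemes build on sweeps over all column pairs of the matrix and rotates each pair until the columns are mutually orthogonal; the singular values are then the column norms. The sequential NumPy sketch below shows this baseline with the usual cyclic pivot order; it is neither the JRS scheme nor the pivot-selection variant proposed in the paper.

import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD: returns U, singular values s, Vt with A ≈ U @ diag(s) @ Vt."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for i in range(n - 1):
            for j in range(i + 1, n):              # cyclic sweep over column pairs
                alpha = A[:, i] @ A[:, i]
                beta = A[:, j] @ A[:, j]
                gamma = A[:, i] @ A[:, j]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                       # columns already orthogonal enough
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                if zeta == 0.0:
                    t = 1.0                        # 45-degree rotation when norms match
                else:
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                rot = np.array([[c, s], [-s, c]])  # plane rotation of columns i and j
                A[:, [i, j]] = A[:, [i, j]] @ rot
                V[:, [i, j]] = V[:, [i, j]] @ rot
        if converged:
            break
    sing = np.linalg.norm(A, axis=0)               # singular values = column norms
    U = A / np.where(sing > 0, sing, 1.0)          # normalize columns (zero columns left as-is)
    return U, sing, V.T

# Example usage:
# U, s, Vt = one_sided_jacobi_svd(np.random.rand(6, 4))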