Search Results - Inner Mongolia University Library

arXiv 2017

Authors: Calhoun, Donna A.; Burstedde, Carsten (Boise State University, Boise ID, United States; Universität Bonn, Germany)

We describe a parallel, adaptive, multiblock algorithm for explicit integration of time-dependent partial differential equations on two-dimensional Cartesian grids. The grid layout we consider consists of a nested hierarchy of fixed-size, non-overlapping, logically Cartesian grids stored as leaves in a quadtree. Dynamic grid refinement and parallel partitioning of the grids are done through the highly scalable quadtree/octree library p4est. Because our concept is multiblock, we can easily solve on a variety of geometries, including the cubed sphere. In this paper, we pay special attention to the details of the parallel ghost-filling algorithm needed to ensure that both the corner and edge ghost regions around each grid hold valid values. We have implemented this algorithm in the ForestClaw code using single-grid solvers from Clawpack, a software package for solving hyperbolic PDEs using finite volume methods. We show weak and strong scalability results for scalar advection problems on two-dimensional manifold domains on 1 to 64Ki MPI processes, demonstrating negligible regridding *** Codes 65M08, 65M50, 68W10, 65Y05. Copyright © 2017, The Authors. All rights reserved.

Keywords: parallel algorithms


A parallel firefly meta-heuristics algorithm for financial option pricing

IEEE Symposium Series on Computational Intelligence (SSCI)

Authors: Kevin Mather; Parimala Thulasiraman; Ruppa K. Thulasiram; Sujata Dash (Department of Computer Science, University of Manitoba, Winnipeg MB, Canada)

In this paper we present the design and development of a parallel firefly meta-heuristic algorithm for option pricing. We study the performance of the parallel algorithm both theoretically and experimentally. Our implementation exhibits significant speedup on the financial option pricing problem and demonstrates the utility of the parallel algorithm even when a large number of fireflies is deployed. We also present a detailed analysis of the theoretical runtime cost of the firefly algorithm on both the RAM and P-RAM models of computation. Moreover, we identify certain issues in the algorithm regarding its global memory access pattern, which could be studied for further improvement.

Keywords: Pricing Computational modeling Algorithm design and analysis Contracts Analytical models Europe parallel algorithms


Accelerating the Smith-Waterman Algorithm Using Bitwise Parallel Bulk Computation Technique on GPU

IEEE International Parallel and Distributed Processing Symposium Workshops

Authors: Takahiro Nishimura; Jacir L. Bordim; Yasuaki Ito; Koji Nakano (Department of Information Engineering, Hiroshima University, Higashi-Hiroshima, Japan; Department of Computer Science, University of Brasília, DF, Brazil)

The bulk execution of a sequential algorithm means executing it for many different inputs, either in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. Bulk execution supports fine-grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution of this work is a Bitwise Parallel Bulk Computation (BPBC) technique to accelerate the Smith-Waterman Algorithm (SWA). More precisely, the dynamic programming for the SWA repeatedly performs the same computation O(mn) times. Our idea is therefore to convert this computation into a circuit simulation using the BPBC technique so that multiple instances are computed simultaneously. The proposed BPBC technique for the SWA has been implemented on both the GPU and the CPU. Experimental results show that the proposed BPBC for the SWA accelerates the computation by more than 447 times compared to a single-CPU implementation.

Keywords: Smith-Waterman GPU parallel algorithms Bulk computation Bitwise operations
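The bitwise bulk idea can be illustrated on a much smaller circuit. In the minimal sketch below, each input number is split into bit-planes so that bit j of every plane belongs to instance j; a single bitwise XOR, AND or OR on a whole plane then evaluates one gate for all instances at once. A ripple-carry adder stands in for the far larger Smith-Waterman circuit, and the helper names `pack`, `unpack` and `bitsliced_add` are hypothetical. Python's unbounded integers play the role that fixed-width GPU registers would play, so this shows the general technique, not the authors' implementation.

```python
def pack(values, k):
    """Transpose k-bit integers into k bit-planes.

    Bit j of plane i equals bit i of values[j], so each bit position
    ("lane") of a plane belongs to one independent instance.
    """
    planes = [0] * k
    for j, v in enumerate(values):
        for i in range(k):
            if (v >> i) & 1:
                planes[i] |= 1 << j
    return planes


def unpack(planes, count):
    """Inverse of pack: rebuild `count` integers from their bit-planes."""
    values = [0] * count
    for i, plane in enumerate(planes):
        for j in range(count):
            if (plane >> j) & 1:
                values[j] |= 1 << i
    return values


def bitsliced_add(a_planes, b_planes):
    """Ripple-carry adder evaluated for every lane at once.

    Each gate is a single bitwise operation on a whole plane, so all
    instances are added simultaneously. Returns k + 1 planes (the last
    one holds the final carry).
    """
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)            # sum bit in every lane
        carry = (a & b) | (carry & (a ^ b))  # carry-out in every lane
    out.append(carry)
    return out
```

For example, adding [3, 7, 12, 15] and [5, 9, 1, 15] as 4-bit numbers in one pass over the bit-planes yields [8, 16, 13, 30]; the extra carry plane supplies the fifth result bit.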


Nesterov-based parallel algorithm for large-scale nonnegative tensor factorization

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: A. P. Liavas; G. Kostoulas; G. Lourakis; K. Huang; N. D. Sidiropoulos (Department of ECE, Technical University of Crete, Greece; Department of ECE, University of Minnesota, United States of America)

ISBN: (print) 9781509041183

We consider the problem of nonnegative tensor factorization. Our aim is to derive an efficient algorithm that is also suitable for parallel implementation. We adopt the alternating optimization (AO) framework and solve each matrix nonnegative least-squares problem via a Nesterov-type algorithm for strongly convex problems. We describe a parallel implementation of the algorithm and measure the speedup attained by its Message Passing Interface (MPI) implementation in a parallel computing environment. The attained speedup turns out to be significant, making our algorithm a competitive candidate for solving very large-scale dense nonnegative tensor factorization problems.

Keywords: Tensors constrained optimization CANDECOMP PARAFAC nonnegative factorization parallel algorithms Tensor parallel algorithms Constrained optimization Nonnegative factorization Speeding
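Within the AO framework, each subproblem has the form min over x >= 0 of 0.5 * ||Ax - b||^2. Below is a minimal sketch of a projected Nesterov iteration with the constant momentum weight commonly used for strongly convex problems, beta = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)). It assumes the caller supplies L and mu as the largest and smallest eigenvalues of A^T A, with mu > 0. It solves a single vector problem in pure Python; the function name `nnls_nesterov` is hypothetical, and the paper's exact update rules, stopping criteria and MPI data distribution are not reproduced.

```python
from math import sqrt


def nnls_nesterov(A, b, L, mu, iters=500):
    """Projected Nesterov iteration for min_{x >= 0} 0.5 * ||A x - b||^2.

    Assumption: L and mu are the largest and smallest eigenvalues of
    A^T A, supplied by the caller, with mu > 0 (strong convexity).
    """
    m, n = len(A), len(A[0])
    # The gradient is A^T A x - A^T b; precompute both pieces once.
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # Constant momentum weight for strongly convex problems.
    beta = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu))
    x = [0.0] * n
    y = x[:]
    for _ in range(iters):
        grad = [sum(AtA[i][j] * y[j] for j in range(n)) - Atb[i]
                for i in range(n)]
        # Gradient step of length 1/L, then projection onto x >= 0.
        x_new = [max(0.0, y[i] - grad[i] / L) for i in range(n)]
        # Extrapolate past the new iterate using the momentum weight.
        y = [x_new[i] + beta * (x_new[i] - x[i]) for i in range(n)]
        x = x_new
    return x
```

With A = [[2, 0], [0, 1]] and b = [2, -1] (so L = 4 and mu = 1), the iterate approaches [1.0, 0.0]: the first coordinate matches the unconstrained solution, and the projection holds the second at zero.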


A parallel algorithm for generating a random graph with a prescribed degree sequence

IEEE International Conference on Big Data

Authors: Hasanuzzaman Bhuiyan; Maleq Khan; Madhav Marathe (Department of Computer Science; Network Dynamics and Simulation Science Laboratory (NDSSL), Biocomplexity Institute of Virginia Tech, Blacksburg VA, USA)

Random graphs (or networks) have attracted significantly increased interest due to their popularity in modeling and simulating many complex real-world systems. The degree sequence is one of the most important aspects of these systems. Random graphs with a given degree sequence can capture many characteristics, such as dependent edges and non-binomial degree distributions, that are absent from many classical random graph models such as the Erdős-Rényi model. In addition, they have important applications in uniform sampling of random graphs, counting the number of graphs with the same degree sequence, as well as in string theory, random matrix theory, and matching theory. In this paper, we present an OpenMP-based shared-memory parallel algorithm for generating a random graph with a prescribed degree sequence, which achieves a speedup of 20.4 with 32 cores. We also present a comparative study of several structural properties of the random graphs generated by our algorithm against those of real-world graphs and of random graphs generated by other popular methods. One of the steps in our parallel algorithm requires checking the Erdős-Gallai characterization in parallel, i.e., determining whether there exists a graph obeying the given degree sequence. This paper presents a non-trivial parallel algorithm for checking the Erdős-Gallai characterization, which achieves a speedup of 23 with 32 cores.

Keywords: parallel algorithms Biological system modeling Computational modeling Bipartite graph Computer science Mathematical model Monte Carlo methods
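For reference, the Erdős-Gallai theorem states that a nonincreasing sequence d_1 >= d_2 >= ... >= d_n of nonnegative integers is the degree sequence of a simple graph if and only if its sum is even and, for every k from 1 to n, d_1 + ... + d_k <= k(k - 1) + sum over i > k of min(d_i, k). The plain sequential check below (hypothetical name `is_graphical`) tests each inequality directly in O(n^2) time. It is not the paper's parallel algorithm, but it shows that every k yields a separate inequality, the kind of structure a parallel check can exploit.

```python
def is_graphical(degrees):
    """Return True if some simple graph has this degree sequence.

    Direct O(n^2) check of the Erdős-Gallai conditions.
    """
    d = sorted(degrees, reverse=True)  # the theorem assumes nonincreasing order
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 != 0:
        return False
    prefix = 0
    for k in range(1, n + 1):
        prefix += d[k - 1]  # sum of the k largest degrees
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if prefix > rhs:
            return False
    return True
```

For example, `is_graphical([3, 3, 3, 3])` is True (the complete graph K4), while `is_graphical([3, 3, 1, 1])` is False because the inequality fails at k = 2 (6 > 2 + 1 + 1).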


Parallel Construction of Suffix Trees and the All-Nearest-Smaller-Values Problem

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Patrick Flick; Srinivas Aluru (Georgia Institute of Technology, Atlanta, Georgia, USA)

A suffix tree is a fundamental and versatile string data structure that is frequently used in important application areas such as text processing, information retrieval, and computational biology. Sequentially, suffix trees can be constructed in linear time, and optimal parallel algorithms exist only for the PRAM model. Recent works mostly target low core-count shared-memory implementations but achieve suboptimal complexity, and prior distributed-memory parallel algorithms have quadratic worst-case complexity. Suffix trees can be constructed from suffix and longest common prefix (LCP) arrays by solving the All-Nearest-Smaller-Values (ANSV) problem. In this paper, we formulate a more generalized version of the ANSV problem and present a distributed-memory parallel algorithm that solves it in O(n/p + p) time. Our algorithm minimizes the overall and per-node communication volume. Building on this, we present a parallel algorithm for constructing a distributed representation of suffix trees, yielding both superior theoretical complexity and better practical performance compared to previous distributed-memory algorithms. We demonstrate the construction of the suffix tree for the human genome, given its suffix and LCP arrays, in under 2 seconds on 1024 Intel Xeon cores.

Keywords: Phase change random access memory parallel algorithms Complexity theory Genomics Bioinformatics Iron Arrays
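The classic sequential solution of the ANSV problem uses a monotonic stack and runs in O(n) time. The sketch below (hypothetical names `nearest_smaller_left` and `ansv`) returns, for each position, the index of the nearest strictly smaller value on each side, or -1 if none exists. It is offered only to make the problem concrete; the paper's generalized ANSV formulation and its distributed O(n/p + p) algorithm are not reproduced here.

```python
def nearest_smaller_left(values):
    """For each i, the largest j < i with values[j] < values[i], else -1."""
    result, stack = [], []  # stack holds indices with strictly increasing values
    for i, v in enumerate(values):
        # Entries >= v can never be the nearest smaller value for i or later.
        while stack and values[stack[-1]] >= v:
            stack.pop()
        result.append(stack[-1] if stack else -1)
        stack.append(i)
    return result


def ansv(values):
    """All nearest smaller values on both sides, O(n) total."""
    n = len(values)
    left = nearest_smaller_left(values)
    # Right side: run the same scan on the reversed array, then map
    # reversed index r back to original index n - 1 - r.
    rev = nearest_smaller_left(values[::-1])
    right = [-1 if r == -1 else n - 1 - r for r in reversed(rev)]
    return left, right
```

For [3, 1, 4, 1, 5, 9, 2, 6] this gives left = [-1, -1, 1, -1, 3, 4, 3, 6] and right = [1, -1, 3, -1, 6, 6, -1, -1]; equal values (the two 1s) do not count as smaller.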


PSelInv – A Distributed Memory Parallel Algorithm for Selected Inversion: the non-symmetric Case

arXiv 2017

Authors: Jacquelin, Mathias; Lin, Lin; Yang, Chao (Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley CA 94720, United States; Department of Mathematics, University of California, Berkeley, Berkeley CA 94720, United States)

This paper generalizes the parallel selected inversion algorithm called PSelInv to sparse non-symmetric matrices. We assume a general sparse matrix A has been decomposed as PAQ = LU on a distributed-memory parallel machine, where L and U are lower and upper triangular matrices, and P and Q are permutation matrices, respectively. The PSelInv method computes selected elements of A^{-1}. The selection is confined by the sparsity pattern of the matrix A^T. Our algorithm does not assume any symmetry properties of A, and our parallel implementation is memory efficient, in the sense that the computed elements of A^{-T} overwrite the sparse matrix L + U in situ. PSelInv involves a large number of collective data communication activities within different processor groups of various sizes. To minimize idle time and improve load balancing, tree-based asynchronous communication is used to coordinate all such collective communication. Numerical results demonstrate that PSelInv can scale efficiently to 6,400 cores for a variety of matrices. Copyright © 2017, The Authors. All rights reserved.

Keywords: parallel algorithms


Parallel multi-splitting proximal method for star networks

American Control Conference

Author: Ermin Wei (Department of Electrical Engineering and Computer Science, Northwestern University, Evanston IL 60202, United States of America)

ISBN: (print) 9781509045839

We develop a parallel algorithm based on the proximal method to solve the problem of minimizing a sum of convex (not necessarily smooth) functions over a star network. We show that this method converges to an optimal solution for any choice of constant stepsize for convex objective functions. Under the further assumptions of Lipschitz-continuous gradients and strong convexity of the objective functions, the method converges linearly.

Keywords: choice constant stepsize step size Star networks Constants Objective function Convexity parallel algorithms optimal solution


Energy-performance trade-off analysis of parallel algorithms for shared memory architectures

SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2011, Vol. 1, No. 3, pp. 167-176

Authors: Korthikanti, Vijay Anand; Agha, Gul (201 North Goodwin, Champaign IL 61820, USA)

Energy consumption by computer systems has emerged as an important concern. However, the energy consumed in executing an algorithm cannot be inferred from its performance alone; it must be modeled explicitly. This paper analyzes the energy consumption of parallel algorithms executed on a model of shared-memory multicore processors. Specifically, we develop a methodology to evaluate how the energy consumption of a given parallel algorithm changes as the number of cores and their frequency are varied. We use this analysis to establish the optimal number of cores that minimizes the energy consumed by the execution of a parallel algorithm for a specific problem size while satisfying a given performance requirement, and the optimal number of cores that maximizes the performance of a parallel algorithm for a specific problem size under a given energy budget. We study the sensitivity of our analysis to changes in parameters such as the ratio of the power consumed by a computation step to the power consumed in accessing memory. The results show that the relation between the problem size and the optimal number of cores is relatively unaffected over a wide range of these parameters. (C) 2011 Elsevier Inc. All rights reserved.

Keywords: Energy Performance parallel algorithms Shared memory architectures


Accelerating the phylogenetic parsimony function on heterogeneous systems

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, Vol. 29, No. 8

Authors: Santander-Jimenez, Sergio; Ilic, Aleksandar; Sousa, Leonel; Vega-Rodriguez, Miguel A. (Univ Extremadura, Escuela Politecn, Dept Comp & Commun Technol, Campus Univ S-N, Caceres 10003, Spain; Univ Lisbon, INESC ID, IST, P-1000029 Lisbon, Portugal)

The availability of heterogeneous CPU+GPU systems has opened new opportunities for developing parallel solutions to complex biological problems. The reconstruction of evolutionary histories among species represents a grand computational challenge, which can be addressed by exploiting this kind of hardware. In this research, we study the application of heterogeneous computing with OpenCL to accelerate one of the best-known objective functions for inferring phylogenies, the phylogenetic parsimony function. For this purpose, we design CPU and GPU kernel implementations of this function, proposing a heterogeneous CPU+GPU multi-device approach that distributes multiple parsimony evaluations among processing devices. Experiments on 6 real nucleotide data sets and comparisons with other parallel implementations show the benefits of the proposal, which achieves significant parallel results by combining CPU and GPU capabilities according to the characteristics of the input data.

Keywords: bioinformatics heterogeneous CPU+GPU systems parallel algorithms parallel performance evaluation
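For a fixed rooted binary tree, the parsimony score is the minimum number of character-state changes needed to explain the observed sequences, and the textbook way to compute it one site at a time is Fitch's algorithm. The sketch below assumes a tree written as nested 2-tuples of leaf names and an alignment given as a dict from leaf name to an equal-length nucleotide string; the names `fitch_site` and `parsimony_score` are hypothetical. It is a sequential illustration, not the paper's OpenCL kernels, but the per-site loop shows the kind of independent work that parallel kernels can distribute.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one site.

    tree: a leaf name (str) or a (left, right) tuple of subtrees.
    states: dict mapping each leaf name to its character at this site.
    Returns (candidate state set at this node, minimum changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # Disjoint sets force one change on one of the two child edges.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum Fitch costs over all sites; alignment maps leaf -> sequence."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):  # sites are scored independently
        states = {leaf: seq[site] for leaf, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total
```

For the tree (("A", "B"), ("C", "D")) with sequences A = "AC", B = "AT", C = "GC" and D = "GC", each site costs one change, for a total score of 2.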
