Pair potentials or kernels, ψ(|r|), play a critical role in a number of areas; these include biophysics, electrical engineering, fluid dynamics, diffusion physics, solid state physics, and many more. The need to evaluate these potentials rapidly for N particles gives rise to the classical N-body problem. In this paper, we present scalable parallel algorithms for evaluating these potentials for highly non-uniform distributions. The underlying methodology relies on the accelerated Cartesian expansion (ACE) framework, which is quasi-kernel-independent, requiring only that the kernel be differentiable with known derivatives. The results presented demonstrate the accuracy control, low cost, and parallel scalability offered by this method for several example kernels and for distributions of up to 5 billion particles on 16384 CPU cores. Potential applications of the algorithm span computational physics, engineering, and machine learning, among other disciplines.
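For context on the scale involved, here is a minimal brute-force reference sketch in Python (my own illustration, not the ACE algorithm, which replaces this sum with hierarchical Cartesian Taylor expansions): it evaluates the O(N^2) pairwise sum φ_i = Σ_j q_j ψ(|r_i − r_j|) that fast N-body methods accelerate. The kernel, particle counts, and all names are illustrative assumptions.

```python
import numpy as np

def direct_potentials(points, charges, psi):
    """O(N^2) brute-force evaluation of phi_i = sum_j q_j * psi(|r_i - r_j|).

    This is the baseline that tree-based methods such as ACE accelerate;
    psi can be any differentiable radial kernel, e.g. 1/r or exp(-r).
    """
    n = len(points)
    phi = np.zeros(n)
    for i in range(n):
        r = np.linalg.norm(points - points[i], axis=1)
        mask = r > 0.0                      # skip self-interaction
        phi[i] = np.sum(charges[mask] * psi(r[mask]))
    return phi

# Example: Coulomb-like kernel on 1000 random particles.
rng = np.random.default_rng(0)
pts = rng.random((1000, 3))
q = rng.random(1000)
phi = direct_potentials(pts, q, lambda r: 1.0 / r)
```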
One of the most ambitious trends in current biomedical research is the large-scale genomic sequencing of patients. Novel high-throughput (or next-generation) sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences (reads) in a single experiment at a much lower cost than previously possible. Due to this massive amount of data, efficient algorithms for mapping these sequences to a reference genome are in great demand, and many such algorithms have recently been published. One important feature of these algorithms is support for multithreaded parallel computing to speed up the mapping process. In this paper, we design parallel algorithms based on the message-passing parallelism model to address this problem efficiently. The proposed algorithms also take into account the probability score assigned to each base of occurring at a specific position of a sequence. In particular, we present parallel algorithms for mapping short degenerate and weighted DNA sequences to a reference genome.
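As a toy illustration of the weighted-sequence model the abstract refers to, the following naive Python scan scores each candidate genome position by the product of per-base probabilities (my own sketch; the data layout and the 0.5 threshold are assumptions, and this serial scan is not the paper's parallel algorithm):

```python
def match_probability(weighted_read, genome, pos):
    """Probability that a weighted read occurs at `pos` in the genome.

    weighted_read: one dict per position mapping base -> probability,
    e.g. [{'A': 0.9, 'C': 0.1}, ...]. The occurrence probability is the
    product of the probabilities of the genome bases under the read's model.
    """
    p = 1.0
    for i, dist in enumerate(weighted_read):
        p *= dist.get(genome[pos + i], 0.0)
        if p == 0.0:
            break
    return p

def map_read(weighted_read, genome, threshold=0.5):
    """Report all positions where the read occurs with probability >= threshold."""
    m = len(weighted_read)
    return [j for j in range(len(genome) - m + 1)
            if match_probability(weighted_read, genome, j) >= threshold]

genome = "ACGTACGT"
read = [{'A': 1.0}, {'C': 0.8, 'G': 0.2}, {'G': 0.9, 'T': 0.1}]
print(map_read(read, genome))   # [0, 4]: P = 1.0 * 0.8 * 0.9 = 0.72
```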
Several leading-edge applications, such as pathology detection, biometric identification, and face recognition, rely mainly on blob and line detection. Eigenvalue computing has commonly been employed for this task because of its accuracy and robustness. However, eigenvalue computing requires intensive computation, heavy memory access, and overlapping data, all of which lead to long execution times. To overcome these limitations, we propose in this paper a new parallel strategy for implementing eigenvalue computing on a graphics processing unit (GPU). Our contributions are (1) optimizing instruction scheduling to reduce computation time, (2) efficiently partitioning the processing into blocks to increase the occupancy of streaming multiprocessors, (3) splitting input data efficiently across shared memory to benefit from its lower access time, and (4) proposing a new shared-memory data layout that avoids access conflicts and reduces memory bank accesses. Experimental results show that our GPU parallel strategy for eigenvalue computing achieves speedups of 27× over a multithreaded implementation, 16× over a predefined function in the OpenCV library, and 8× over a predefined function in the cuBLAS library, all measured against a quad-core multi-CPU platform. We then evaluate our parallel strategy on an eigenvalue-based method for retinal thick vessel segmentation, which is essential for detecting ocular pathologies. Eigenvalue computing executes in 0.017 s on images from the Structured Analysis of the Retina (STARE) database. Accordingly, we achieve real-time thick retinal vessel segmentation with an average execution time of about 0.039 s.
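The eigenvalue computation in question is typically the per-pixel eigendecomposition of the 2×2 image Hessian. Here is a serial NumPy sketch of that closed form (my own reference illustration, not the paper's GPU kernel):

```python
import numpy as np

def hessian_eigenvalues(image):
    """Per-pixel eigenvalues of the 2x2 image Hessian, used for line/blob detection.

    For the symmetric matrix [[Ixx, Ixy], [Ixy, Iyy]] the eigenvalues are
    (Ixx + Iyy)/2 +/- sqrt(((Ixx - Iyy)/2)**2 + Ixy**2); vessels show one
    large-magnitude eigenvalue and one small one, blobs two large ones.
    """
    img = image.astype(float)
    Iy, Ix = np.gradient(img)          # derivatives along rows (y) and columns (x)
    Ixy, Ixx = np.gradient(Ix)
    Iyy, _ = np.gradient(Iy)
    half_trace = (Ixx + Iyy) / 2.0
    root = np.sqrt(((Ixx - Iyy) / 2.0) ** 2 + Ixy ** 2)
    return half_trace + root, half_trace - root
```

Because every pixel is independent, the computation is embarrassingly parallel, which is what makes block-wise GPU partitioning and shared-memory tiling effective.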
ISBN (print): 9781479904617
Determinants have been used intensively in a variety of applications throughout history and have influenced many fields of mathematics, such as linear algebra. The determinant of a square matrix can be computed by a variety of methods, including the well-known Leibniz formula and Laplace expansion, which compute the determinant of any N×N matrix in O(n!) time. Decomposition methods, such as LU decomposition, Cholesky decomposition, and QR decomposition, have replaced these naive methods with a significantly reduced complexity of O(n^3). In this paper, we introduce two parallel algorithms, one for Laplace expansion and one for LU decomposition. We then analyze them and compare them with their respective sequential algorithms in terms of run time, speedup, and efficiency, with the new algorithms providing better results: at best, the parallel Laplace expansion was 129% faster, and the parallel LU decomposition 44% faster.
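A minimal Python sketch contrasting the two approaches the paper parallelizes; the O(n!) recursion and the O(n^3) elimination below are the textbook serial versions, not the paper's parallel algorithms:

```python
import math

def det_laplace(A):
    """Laplace expansion along the first row: O(n!) time."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(minor)
    return total

def det_lu(A):
    """Determinant via Gaussian elimination (LU-style), O(n^3) time."""
    A = [row[:] for row in A]          # work on a copy
    n, det = len(A), 1.0
    for k in range(n):
        # Partial pivoting: swap in the largest pivot, flipping the sign.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[p][k] == 0.0:
            return 0.0
        if p != k:
            A[k], A[p] = A[p], A[k]
            det = -det
        det *= A[k][k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
    return det

M = [[4.0, 3.0], [6.0, 3.0]]
assert math.isclose(det_laplace(M), det_lu(M))   # both give -6
```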
We present a parallel algorithm for multivariate Radial Basis Function Partition of Unity Method (RBF-PUM) interpolation. The concurrent nature of the RBF-PUM enables the design of parallel algorithms for dealing with a large number of scattered data points in high space dimensions. To exploit this concurrency efficiently, our algorithm uses shared-memory parallel processors through the OpenCL standard. This efficiency is achieved by a parallel space-partitioning strategy whose computational time complexity is linear in the number of input and evaluation points. The speed of our approach allows for a computationally more intensive construction of the interpolant. In fact, the RBF-PUM can be coupled with a cross-validation technique that searches for optimal values of the shape parameters associated with each local RBF interpolant, thus reducing the global interpolation error. The numerical experiments support our claims by illustrating the interpolation errors and the running times of our algorithm.
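A minimal sketch of a single-patch Gaussian RBF interpolant, the building block that the partition of unity stitches together with weight functions (a serial NumPy illustration; the Gaussian kernel choice and the eps value are assumptions, not the paper's OpenCL code):

```python
import numpy as np

def rbf_interpolant(centers, values, eps):
    """Fit a Gaussian RBF interpolant on one PU patch: solve A c = f with
    A_ij = exp(-(eps * |x_i - x_j|)^2). eps is the shape parameter that
    RBF-PUM tunes per patch, e.g. via cross-validation."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    A = np.exp(-(eps * d) ** 2)
    coeff = np.linalg.solve(A, values)
    def evaluate(x):
        r = np.linalg.norm(centers - x, axis=1)
        return np.exp(-(eps * r) ** 2) @ coeff
    return evaluate

# Toy patch: interpolate f(x, y) = sin(x) * y on 50 scattered points.
rng = np.random.default_rng(1)
X = rng.random((50, 2))
f = np.sin(X[:, 0]) * X[:, 1]
s = rbf_interpolant(X, f, eps=3.0)
print(abs(s(np.array([0.5, 0.5])) - np.sin(0.5) * 0.5))   # small error
```

Since each patch solves its own small linear system, the patches can be distributed across compute units independently, which is the concurrency the paper exploits.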
We propose parallel synchronous and asynchronous implementations of the coupled simulated annealing (CSA) algorithm in a shared-memory architecture. The original CSA was implemented synchronously in a distributed-memory architecture; it synchronizes at each temperature update, which leads to idling and loss of efficiency as the number of processors increases. The proposed synchronous CSA (SCSA) is implemented like the original, but in a shared-memory architecture. The proposed asynchronous CSA (ACSA) does not synchronize, allowing greater parallel efficiency at larger processor counts. Results from extensive experiments show that the proposed ACSA delivers much better solution quality than the serial version and the SCSA. The experiments also show that the ACSA outperforms the SCSA for less computationally intensive problems and when a larger number of processing cores is available. Moreover, the parallel efficiency of the ACSA improves as the problem size increases. With the advent of the multi-core era, the proposed algorithm becomes more attractive than the original synchronous CSA.
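To make the coupling concrete, here is a serial Python sketch of one synchronous CSA sweep. The acceptance formula follows a common CSA variant with an ensemble-wide normalization term gamma; treat the exact form as an assumption, and note this toy is not the paper's shared-memory implementation:

```python
import math, random

def csa_sweep(states, energies, objective, neighbor, t_acc, t_gen):
    """One synchronous sweep over m coupled optimizers (a sketch).

    Worse candidates are accepted with a probability coupled to the
    energies of ALL current states (assumed variant):
    A_i = exp((E_i - E_max)/T_acc) / gamma, with
    gamma = sum_j exp((E_j - E_max)/T_acc). The asynchronous variant lets
    each worker read a possibly stale gamma instead of synchronizing here.
    """
    e_max = max(energies)
    gamma = sum(math.exp((e - e_max) / t_acc) for e in energies)
    for i in range(len(states)):
        cand = neighbor(states[i], t_gen)
        e_cand = objective(cand)
        accept = (e_cand < energies[i] or
                  random.random() < math.exp((energies[i] - e_max) / t_acc) / gamma)
        if accept:
            states[i], energies[i] = cand, e_cand

# Toy usage: minimize f(x) = x^2 with 4 coupled optimizers.
random.seed(0)
f = lambda x: x * x
nbr = lambda x, t: x + random.gauss(0.0, t)
xs = [random.uniform(-10, 10) for _ in range(4)]
es = [f(x) for x in xs]
for step in range(200):
    csa_sweep(xs, es, f, nbr, t_acc=1.0 / (1 + step), t_gen=5.0 / (1 + step))
print(min(es))   # close to 0
```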
ISBN (print): 9781450362177
We design a generic method to reduce the task of finding weighted matchings to that of finding short augmenting paths in unweighted graphs. This method enables us to provide efficient implementations for approximating weighted matchings in the massively parallel computation (MPC) model and in the streaming model. For the MPC and the multi-pass streaming model, we show that any algorithm computing a (1 − δ)-approximate unweighted matching in bipartite graphs can be translated into an algorithm that computes a (1 − ε(δ))-approximate maximum weighted matching. Furthermore, this translation incurs only a constant-factor overhead in the complexity (the constant depends on ε > 0). Instantiating this with the current best MPC algorithm for unweighted matching yields a (1 − ε)-approximation algorithm for maximum weighted matching that uses O_ε(log log n) rounds, O(m/n) machines per round, and Õ_ε(n poly(log n)) memory per machine. This improves upon the previous best approximation guarantee of (1/2 − ε) for weighted graphs. In the context of single-pass streaming with random edge arrivals, our techniques yield a (1/2 + c)-approximation algorithm, thus breaking the natural barrier of 1/2.
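For contrast with the guarantees above, here is the classic offline greedy 1/2-approximation for maximum weight matching, the kind of baseline the reduction improves upon (a Python illustration of the textbook algorithm, not the paper's method):

```python
def greedy_weighted_matching(edges):
    """Offline greedy: scan edges by decreasing weight and keep an edge if
    both endpoints are still free. This is the textbook 1/2-approximation
    for maximum weight matching -- the guarantee the paper's reduction
    pushes past (to 1 - eps in MPC, 1/2 + c in random-arrival streaming).
    """
    matched, matching = set(), []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching

# Greedy keeps (a, b, 3) and (c, d, 2) for total weight 5.
print(greedy_weighted_matching([("a", "b", 3), ("b", "c", 2.5), ("c", "d", 2)]))
```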
The surge in the availability of genomic data holds promise for determining the genetic causes of observed individual traits, with applications such as discovering the genetic roots of phenotypes, be they molecular phenotypes such as gene expression or metabolite concentrations, or complex phenotypes such as diseases. However, the growing sizes of these datasets and the quadratic, cubic, or higher scaling characteristics of the relevant algorithms pose a serious computational challenge, necessitating leadership-scale computing. In this paper we describe a new approach to computing vector similarity metrics, suitable for parallel systems equipped with graphics processing units (GPUs) or Intel Xeon Phi processors. Our primary focus is the Proportional Similarity metric applied to Genome Wide Association Studies (GWAS) and Phenome Wide Association Studies (PheWAS). We describe the implementation of the algorithms on accelerated processors, the methods used to eliminate redundant calculations arising from symmetries, and techniques for efficiently mapping the calculations to many-node parallel systems. Results demonstrate high per-node performance and parallel scalability, with rates of more than five quadrillion (5 × 10^15) elementwise comparisons per second achieved on the ORNL Titan system. In a companion paper we describe corresponding techniques applied to calculations of the Custom Correlation Coefficient for comparative genomics applications.
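The Proportional Similarity metric is commonly given as the Czekanowski index; a naive serial NumPy reference is sketched below (the formula and the all-pairs loop are for illustration only, not the paper's accelerated implementation):

```python
import numpy as np

def proportional_similarity(u, v):
    """Czekanowski's Proportional Similarity between nonnegative vectors:
    PS(u, v) = 2 * sum_i min(u_i, v_i) / sum_i (u_i + v_i).
    The min/sum structure is what maps well to fused GPU operations."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return 2.0 * np.minimum(u, v).sum() / (u.sum() + v.sum())

def all_pairs_ps(X):
    """Naive O(n^2 m) all-pairs reference over the rows of X; symmetry
    (PS(u, v) == PS(v, u)) lets optimized codes halve the work."""
    n = len(X)
    return np.array([[proportional_similarity(X[i], X[j]) for j in range(n)]
                     for i in range(n)])

print(proportional_similarity([1, 2, 0], [2, 2, 1]))   # 2*(1+2+0)/(3+5) = 0.75
```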
Multiple Sequence Alignment (MSA) is a basic operation in bioinformatics, used to highlight the similarities among a set of sequences. The MSA problem has been proven NP-hard and thus requires large amounts of memory and computing power. It can be modeled as a search for the minimum-cost path in a graph, and the A-Star algorithm has been adapted to solve it both sequentially and in parallel. Designing a parallel A-Star-based MSA solver faces challenges such as an irregular dependency pattern and substantial memory requirements. In this paper, we propose PA-Star, a locality-sensitive multithreaded strategy based on A-Star that computes optimal MSAs using both RAM and disk to store nodes. Experimental results obtained on 3 different machines show that the optimizations used in PA-Star achieve a speedup of 1.88× in serial execution, and that parallel execution attains a speedup of 5.52× with 8 cores. We also show that PA-Star outperforms a state-of-the-art MSA tool based on A-Star, executing up to 4.77× faster. Finally, we show that our disk-assisted strategy retrieves the optimal alignment when other tools fail.
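For reference, here is a textbook A-Star skeleton in Python, the serial core that PA-Star parallelizes (the state representation, heuristic, and all names here are illustrative assumptions, not PA-Star's code):

```python
import heapq, itertools

def a_star(start, is_goal, neighbors, heuristic):
    """Textbook A* search. neighbors(state) yields (next_state, edge_cost);
    heuristic must be admissible (never overestimate the remaining cost)
    for the returned cost to be optimal. For MSA, states are positions in
    the k-dimensional alignment lattice and the heuristic typically sums
    optimal pairwise alignment costs."""
    tie = itertools.count()                       # tiebreaker for the heap
    open_heap = [(heuristic(start), next(tie), 0, start)]
    best_g = {start: 0}
    while open_heap:
        f, _, g, state = heapq.heappop(open_heap)
        if is_goal(state):
            return g
        if g > best_g.get(state, float("inf")):
            continue                              # stale heap entry
        for nxt, cost in neighbors(state):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(open_heap, (ng + heuristic(nxt), next(tie), ng, nxt))
    return None

# Toy usage: shortest path from 0 to 5 on the integer line with unit costs.
print(a_star(0, lambda s: s == 5,
             lambda s: [(s + 1, 1), (s - 1, 1)],
             lambda s: abs(5 - s)))               # 5
```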
Combinatorial branch and bound searches are a common technique for solving global optimisation and decision problems. Their performance often depends on good search-order heuristics, refined over decades of algorithms research. Parallel search necessarily deviates from the sequential search order, sometimes dramatically and unpredictably, e.g. by distributing work at random. This can disrupt effective search-order heuristics and lead to unexpected and highly variable parallel performance, which makes it hard to reason about the parallel performance of combinatorial searches. This paper presents a generic parallel branch and bound skeleton, implemented in Haskell, with replicable parallel performance. The skeleton aims to preserve the search-order heuristic by distributing work in an ordered fashion, closely following the sequential search order. We demonstrate the generality of the approach by applying the skeleton to 40 instances of three combinatorial problems: Maximum Clique, 0/1 Knapsack, and Travelling Salesperson. The overheads of our Haskell skeleton are reasonable, giving slowdown factors of between 1.9 and 6.2 compared with a class-leading, dedicated, and highly optimised C++ Maximum Clique solver. We demonstrate scaling up to 200 cores of a Beowulf cluster, achieving speedups of 100× for several Maximum Clique instances. We demonstrate low variance of parallel performance across all instances of the three combinatorial problems and at all scales up to 200 cores, with median relative standard deviation (RSD) below 2%. Parallel solvers that do not follow the sequential search order exhibit far higher variance, with median RSD exceeding 85% for Knapsack.
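To make the role of search order concrete, here is a compact serial branch and bound for 0/1 Knapsack, one of the paper's three benchmark problems (a Python toy under my own assumptions, not the Haskell skeleton; the skeleton parallelizes trees like this while handing out subtrees close to this heuristic-guided sequential order):

```python
def knapsack_bnb(items, capacity):
    """Depth-first branch and bound for 0/1 knapsack.

    Items are (value, weight) pairs; the search-order heuristic explores
    items sorted by value density and tries 'include' before 'exclude'.
    The bound is the fractional-knapsack relaxation of the remaining items.
    """
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    best = 0

    def bound(i, value, room):
        # Greedy fractional relaxation: an upper bound on any completion.
        for v, w in items[i:]:
            if w <= room:
                room -= w
                value += v
            else:
                return value + v * room / w
        return value

    def search(i, value, room):
        nonlocal best
        best = max(best, value)
        if i == len(items) or bound(i, value, room) <= best:
            return                               # prune: bound can't beat incumbent
        v, w = items[i]
        if w <= room:
            search(i + 1, value + v, room - w)   # heuristic: include item i first
        search(i + 1, value, room)               # then exclude it

    search(0, 0, capacity)
    return best

print(knapsack_bnb([(60, 10), (100, 20), (120, 30)], 50))   # 220
```

How quickly a good incumbent is found determines how much gets pruned, which is why random work distribution (and thus a different visit order) can make parallel run times so variable.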