Search results - Inner Mongolia University Library

31st IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPS)

Authors: Nishimura, Takahiro; Bordim, Jacir L.; Ito, Yasuaki; Nakano, Koji. Affiliations: Hiroshima Univ, Dept Informat Engn, 1-4-1 Kagamiyama, Higashihiroshima 7398527, Japan; Univ Brasilia, Dept Comp Sci, BR-70910900 Brasilia, DF, Brazil

ISBN: (print) 9780769561493

The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine-grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution of this work is to present a Bitwise Parallel Bulk Computation (BPBC) to accelerate the Smith-Waterman Algorithm (SWA). More precisely, the dynamic programming for the SWA repeatedly performs the same computation O(mn) times. Thus, our idea is to convert this computation into a circuit simulation using the BPBC technique to compute multiple instances simultaneously. The proposed BPBC technique for the SWA has been implemented on the GPU and CPU. Experimental results show that the proposed BPBC for SWA accelerates the computation by over 447 times compared to a single CPU implementation.
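Illustrative sketch (not taken from the paper): the O(mn) dynamic-programming recurrence that the abstract refers to can be written as below. The scoring values (match=2, mismatch=-1, gap=-1), the linear gap penalty, and the function name are assumptions chosen for illustration. Every cell applies the same update rule, which is the property that lets many instances be evaluated together; the bitwise circuit simulation itself is not shown.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of strings a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not values from the paper.
    """
    m, n = len(a), len(b)
    prev = [0] * (n + 1)          # row i-1 of the DP table
    best = 0
    for i in range(1, m + 1):
        curr = [0] * (n + 1)      # row i of the DP table
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Same update for every cell: clamp at 0, then take the best of
            # diagonal (substitution), up (gap in b) and left (gap in a).
            curr[j] = max(0,
                          prev[j - 1] + sub,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "CG"))
```

The table has (m+1)(n+1) cells and each is filled in constant time, which gives the O(mn) cost quoted in the abstract.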

Keywords: Smith-Waterman; GPU; parallel algorithms; bulk computation; bitwise operations

Parallel Modularity Clustering

International Conference on Computational Science (ICCS)

Authors: Fender, Alexandre; Emad, Nahid; Petiton, Serge; Naumov, Maxim. Affiliations: Nvidia Corp, Santa Clara, CA 95050, USA; Maison Simulat, Saclay, France; Univ Versailles, LI PaRAD, Versailles, France; Univ Lille I Sci & Technol, Villeneuve Dascq, France

In this paper we develop a parallel approach for computing the modularity clustering often used to identify and analyse communities in social networks. We show that modularity can be approximated by looking at the largest eigenpairs of the weighted graph adjacency matrix that has been perturbed by a rank one update. Also, we generalize this formulation to identify multiple clusters at once. We develop a fast parallel implementation for it that takes advantage of the Lanczos eigenvalue solver and k-means algorithm on the GPU. Finally, we highlight the performance and quality of our approach versus existing state-of-the-art techniques. (C) 2017 The Authors. Published by Elsevier B.V.
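A minimal sketch, assuming the rank-one update mentioned in the abstract is the standard modularity matrix B = A - d d^T / (2m), where d is the degree vector and m is the total edge weight. The function names and the toy graph are hypothetical. The sketch only builds B and scores a given partition; the Lanczos and k-means steps are not shown.

```python
def modularity_matrix(adj):
    """Return B = A - d d^T / (2m) for a symmetric weighted adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]   # weighted degree of each vertex
    two_m = sum(deg)                  # equals 2m for an undirected graph
    return [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)]
            for i in range(n)]


def modularity(adj, labels):
    """Modularity Q of the partition given by labels[i] = community of vertex i."""
    n = len(adj)
    b = modularity_matrix(adj)
    two_m = sum(sum(row) for row in adj)
    total = sum(b[i][j]
                for i in range(n) for j in range(n)
                if labels[i] == labels[j])
    return total / two_m


def toy_graph():
    """Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3."""
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = [[0] * 6 for _ in range(6)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1
    return adj


if __name__ == "__main__":
    print(modularity(toy_graph(), [0, 0, 0, 1, 1, 1]))
```

For the toy graph, splitting along the bridge edge gives Q = 5/14, while putting every vertex in one community gives Q = 0.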

Keywords: modularity; assortativity coefficient; spectral clustering; community detection; graphs; parallel algorithms; Lanczos; k-means; CUDA; GPU

A New Parallel Training Algorithm for Optimum-Path Forest-Based Learning

21st Iberoamerican Congress on Pattern Recognition (CIARP)

Authors: Culquicondor, Aldo; Castelo-Fernandez, Cesar; Papa, Joao Paulo. Affiliations: Univ Catolica San Pablo, Escuela Ciencia Computac, Arequipa, Peru; Sao Paulo State Univ UNESP, Comp Sci Dept, Bauru, Brazil

ISBN: (print) 9783319522777; 9783319522760

In this work, we present a new parallel-driven approach to speed up the Optimum-Path Forest (OPF) training phase. In addition, we show how to make OPF training up to five times faster using a simple parallel-friendly data structure, while achieving the same accuracy as traditional OPF. To the best of our knowledge, no prior work has attempted to parallelize OPF, which is the main contribution of this paper. The experiments are carried out on four public datasets and show that the proposed approach maintains the trade-off between efficiency and effectiveness.

Keywords: Optimum-path forest; parallel algorithms; Graph algorithms

Optimal Representation for Right-to-Left Parallel Scalar Point Multiplication

5th International Symposium on Computing and Networking (CANDAR)

Authors: Phalakarn, Kittiphon; Phalakarn, Kittiphop; Suppakitpaisarn, Vorapong. Affiliations: Chulalongkorn Univ, Dept Comp Engn, Bangkok, Thailand; Univ Tokyo, Dept Comp Sci, Tokyo, Japan

ISBN: (print) 9781538620878

This paper introduces an optimal representation for a right-to-left parallel elliptic curve scalar point multiplication. The right-to-left approach is easier to parallelize than the conventional left-to-right approach. However, unlike the left-to-right approach, there is still no work considering number representations for the right-to-left parallel calculation. By simplifying the implementation by Robert, we devise a mathematical model to capture the computation time of the calculation. Then, for any arbitrary amount of doubling time and addition time, we propose algorithms to generate representations which minimize the time in that model. As a result, we can show a negative result that a conventional representation like NAF is almost optimal. The parallel computation time obtained from any representation cannot be better than NAF by more than 1%.
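A minimal sketch of the non-adjacent form (NAF) mentioned in the abstract, together with a right-to-left scalar multiplication that consumes the digits from least to most significant. Plain integers stand in for elliptic-curve points here (an assumption made so the example is self-contained), so "doubling" is adding a value to itself and "addition" is integer addition. The function names are hypothetical, and this is not the authors' algorithm for generating optimal representations.

```python
def naf(k):
    """Return the NAF digits of k >= 0, least significant digit first.

    Each digit is -1, 0 or 1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # 1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d            # makes k divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def scalar_mul_right_to_left(k, point):
    """Compute k * point by scanning NAF digits from the least significant end.

    `point` is an int standing in for a curve point (illustrative assumption).
    """
    acc = 0          # running sum (the "addition" stream)
    power = point    # holds 2^i * point at step i (the "doubling" stream)
    for d in naf(k):
        if d == 1:
            acc += power
        elif d == -1:
            acc -= power
        power += power   # doubling never reads acc
    return acc


if __name__ == "__main__":
    print(naf(7), scalar_mul_right_to_left(7, 5))
```

Because the doubling stream never reads `acc`, the two streams can proceed on separate processors, which is why the right-to-left order is easier to parallelize.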

Keywords: information and communication security; efficient implementation; elliptic curve cryptography; scalar point multiplication; binary representation; parallel algorithms

Parallel multigrid technique: Reduction to independent problems

Mathematical Models and Computer Simulations, 2017, Vol. 9, No. 1, pp. 120-126

Authors: Martynenko, S.I.; Volokhov, V.M.; Yanovskiy, L.S. Affiliation: Institute of Problems of Chemical Physics, Russian Academy of Sciences, Chernogolovka, Moscow oblast 142432, Russian Federation

The unsatisfactory operation of a parallel multigrid algorithm is caused by two reasons: the imbalanced load of processors and the intensive exchanges of data between them. The further development of the parallel universal multigrid technique based on the reduction of a difference initial boundary value problem to a set of independent problems is considered. The universal multigrid technique is a single-grid algorithm, which uses the fundamental multigrid principle to minimize the number of problem-dependent components. The use of the same grid for the calculation of a correction eliminates all the difficulties produced by imbalanced loads and intensive exchanges on coarse grids. It has been shown that it is possible to decrease the volume of stored data and the time of computation and to attain nearly absolute parallelism in some cases. The results of some computational experiments with the difference six-order approximation pattern are presented. © 2017, Pleiades Publishing, Ltd.

Keywords: geometric multigrid methods; parallel algorithms

Parallel Longest Common Sequence Algorithm on Multicore Systems Using OpenACC, OpenMP and OpenMPI

11th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC)

Authors: Li, Zuqing; Goyal, Aakashdeep; Kimm, Haklin. Affiliation: East Stroudsburg Univ Penn, Dept Comp Sci, East Stroudsburg, PA 18301, USA

ISBN: (print) 9781538634417

The longest common subsequence (LCS) problem arises in many research areas. It is known to be NP-hard for an arbitrary number of input sequences. In this paper, we present a parallel LCS algorithm using the GPU-based OpenACC model; it builds on the existing dynamic programming approach and applies a parallel anti-diagonal scheme to eliminate data dependencies. The proposed algorithm has been benchmarked using four computing models: OpenMPI, OpenMP, hybrid OpenMPI & OpenMP, and OpenACC. The parallel LCS algorithm was run on Swiss-Prot databases under each of these models, and the execution times, speed-ups and speed-ratios were measured and compared extensively. Our experimental results reveal that our algorithm on OpenACC (on GPU) is around 16 times faster than execution on a single CPU, and around 2 times faster than on the octa-core processor systems. The OpenACC model performs best among the four tested models in solving the LCS problem.
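A minimal sketch of the anti-diagonal traversal the abstract describes, written as sequential Python for clarity. Cells with the same i + j depend only on the two previous anti-diagonals, so the inner loop is the part a parallel implementation could distribute across threads. The function name is hypothetical, and no OpenACC, OpenMP or MPI code is shown.

```python
def lcs_length_antidiagonal(a, b):
    """Length of the longest common subsequence of a and b.

    The DP table is filled one anti-diagonal (i + j = d) at a time.
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(2, m + n + 1):
        lo = max(1, d - n)        # keeps j = d - i <= n
        hi = min(m, d - 1)        # keeps j = d - i >= 1
        # All cells in this loop are independent of each other.
        for i in range(lo, hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


if __name__ == "__main__":
    print(lcs_length_antidiagonal("ABCBDAB", "BDCABA"))
```

A row-by-row traversal gives the same table, but each cell then depends on its left neighbour in the same row, which is the data dependency the anti-diagonal order removes.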

Keywords: Heuristic algorithms; Dynamic programming; Graphics processing units; Computational modeling; Programming; Multicore processing; Parallel algorithms; LCS1 gene; Legal Executions; Parallel Lines; Multi-core processors; algorithms

Systematic derivation of efficient parallel algorithms for generate-test-α computation

Computer Software, 2012, Vol. 29, No. 1, pp. 159-175

Author: Emoto, Kento. Affiliation: School of Information Science and Technology, University of Tokyo, Japan

What we call "generate-test-α" is a computation pattern in which we do some extra computation, such as choosing the optimal solution, after the usual generate & test computation that enumerates all solutions passing the test. A naive parallel algorithm of the generate-test-α can be given as a composition of parallel skeletons, but it will suffer from a heavy computation cost when the number of generated candidates is large. Such a situation often occurs when we generate a set of substructures from a source data structure. It is known in the field of skeletal parallel programming that a certain class of simplified computation without test phases can be given efficient linear cost algorithms by making systematic transformations exploiting semirings. However, no transformation is known as yet to optimize the generate-test-α computation uniformly. In this paper, we propose a novel transformation to embed the test phases into semirings so that generate-test-α computation can be transformed into a simplified generate-α computation. This transformation allows us to reuse efficient parallel algorithms of generate-α for the generate-test-α computation. In addition, we give powerful optimizations for a class of generate-α computations, so that we can give uniform optimizations for a wide class of generate-test-α computations.

Keywords: parallel algorithms

A parallel algorithm for minimum spanning tree on GPU

8th International Symposium on Computer Architecture and High Performance Computing (SBAC-PADW)

Authors: de Alencar Vasconcellos, Jucele Franca; Caceres, Edson Norberto; Mongelli, Henrique; Song, Siang Wun. Affiliations: Univ Fed Mato Grosso do Sul, Coll Comp, Campo Grande, MS, Brazil; Univ Sao Paulo, Inst Math & Stat, Sao Paulo, SP, Brazil

ISBN: (print) 9781538648193

Computing a minimum spanning tree (MST) of a graph is a fundamental problem in Graph Theory and arises as a subproblem in many applications. In this paper, we propose a parallel MST algorithm and implement it on a GPU (Graphics Processing Unit). Previous parallel MST algorithms make heavy use of parallel list ranking. Although list ranking is available in several parallel libraries, it is very time-consuming. Using a different graph decomposition, called strut, we devised a new parallel MST algorithm that does not use the list ranking procedure. Based on the BSP/CGM model, we proved that our algorithm is correct and that it finds the MST after O(log p) iterations (communication and computation rounds). To show that our algorithm performs well on real parallel machines, we have implemented it on a GPU. The design of the parallel algorithm allowed us to exploit the computing power of the GPU. The efficiency of the algorithm was confirmed by our experimental results. The tests performed show that, for randomly constructed graphs with vertex numbers varying from 10,000 to 30,000 and density between 0.02 and 0.2, the algorithm constructs an MST in a maximum of six iterations. When the graph is not very sparse, our implementation achieved a speedup of more than 50, and for some instances as high as 296, over a sequential minimum spanning tree algorithm previously proposed in the literature.

Keywords: Bipartite graph; Graphics processing units; parallel algorithms; Algorithm design and analysis; Vegetation; Transforms

Improved Parallel Construction of Wavelet Trees and Rank/Select Structures

Data Compression Conference (DCC)

Author: Shun, Julian. Affiliation: Univ Calif Berkeley, Berkeley, CA 94720, USA

ISBN: (print) 9781509067213

Existing parallel algorithms for wavelet tree construction have a work complexity of $O(n \log \sigma)$. This paper presents parallel algorithms for the problem with improved work complexity. Our first algorithm is based on parallel integer sorting and has either $O(n \log\log n \lceil \log\sigma / \sqrt{\log n \log\log n} \rceil)$ work and polylogarithmic depth, or $O(n \lceil \log\sigma / \sqrt{\log n} \rceil)$ work and sublinear depth. We also describe another algorithm that has $O(n \lceil \log\sigma / \sqrt{\log n} \rceil)$ work and $O(\sigma + \log n)$ depth. We then show how to use similar ideas to construct variants of wavelet trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet matrices in parallel with lower work complexity than prior algorithms. Finally, we show that the rank and select structures on binary sequences and multiary sequences, which are stored on wavelet tree nodes, can be constructed in parallel with improved work bounds, matching those of the best existing sequential algorithms for constructing rank and select structures.
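A minimal sequential sketch of a rank structure over a binary sequence, of the kind the abstract says is stored on wavelet tree nodes: prefix counts are sampled once per block and the remainder of the query is counted directly. The class name and the block size are hypothetical, and this is not the paper's parallel construction.

```python
class RankBits:
    """Answers rank1(i) = number of 1 bits in bits[0:i] using sampled prefix counts."""

    def __init__(self, bits, block=8):
        self.bits = list(bits)
        self.block = block
        # prefix[q] = number of 1 bits in bits[0 : q * block]
        self.prefix = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, i):
        """Count the 1 bits among the first i positions (0 <= i <= len(bits))."""
        q, r = divmod(i, self.block)
        start = q * self.block
        # Whole blocks come from the table, the partial block is scanned.
        return self.prefix[q] + sum(self.bits[start:start + r])

    def rank0(self, i):
        """Count the 0 bits among the first i positions."""
        return i - self.rank1(i)


if __name__ == "__main__":
    rb = RankBits([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block=4)
    print(rb.rank1(5), rb.rank0(5))
```

Each block's count depends only on that block's bits, and the table is a prefix sum over those counts, which is the kind of step that lends itself to parallel construction.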

Keywords: parallel algorithms; Complexity theory; Standards; Sorting; Table lookup

Parallel Algorithm for Single-Source Earliest-Arrival Problem in Temporal Graphs

46th International Conference on Parallel Processing Workshops (ICPPW)

Authors: Ni, Peng; Hanai, Masatoshi; Tan, Wen Jun; Wang, Chen; Cai, Wentong. Affiliations: SAP Innovat Ctr Network, Potsdam, Germany; Nanyang Technol Univ, Singapore, Singapore

ISBN: (print) 9781538610428

Many real-world networks, including online social networks and communication networks, are commonly modeled as temporal graphs. Answering earliest-arrival queries in temporal graphs is one of the most fundamental problems, with numerous applications such as information diffusion and measuring temporal closeness centrality. As graph sizes are growing rapidly, reducing query execution time becomes even more important. In this paper, we propose a novel edge-centric parallel algorithm for solving the single-source earliest-arrival problem in temporal graphs, based on a new data structure named Edge-Scan-Dependency Graph (ESD-Graph). We evaluate the proposed parallel algorithm by theoretical analysis as well as by empirical experiments on real-world temporal graphs and synthetic graphs. Empirical results show that the new parallel algorithm outperforms the existing serial algorithm by up to 8.2 and 9.5 times on multi-core processors for real-world data and synthetic data, respectively.
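A minimal sketch of a serial edge-scan baseline for the single-source earliest-arrival problem; it is not the ESD-Graph algorithm proposed in the paper. It assumes each temporal edge is a tuple (u, v, departure, duration) with strictly positive duration, so that one pass over the edges sorted by departure time is enough. The function name and the toy data are hypothetical.

```python
import math


def earliest_arrival(edges, source, t_start=0):
    """Earliest arrival time at every vertex reachable from `source`.

    edges: iterable of (u, v, departure, duration), duration > 0 (assumption).
    Returns a dict mapping each reachable vertex to its earliest arrival time.
    """
    arrival = {source: t_start}
    # Scan edges in order of departure time. With positive durations, any edge
    # that can extend a newly improved arrival appears later in this order.
    for u, v, depart, duration in sorted(edges, key=lambda e: e[2]):
        reach_u = arrival.get(u, math.inf)
        if depart >= reach_u and depart + duration < arrival.get(v, math.inf):
            arrival[v] = depart + duration
    return arrival


if __name__ == "__main__":
    toy_edges = [
        ("a", "b", 1, 2),   # a -> b, leaves at 1, arrives at 3
        ("b", "c", 2, 1),   # leaves b before b is reached, so unusable
        ("c", "d", 3, 1),   # leaves c before c is reached, so unusable
        ("b", "c", 4, 3),   # b -> c, leaves at 4, arrives at 7
        ("a", "c", 5, 5),   # a -> c, arrives at 10, later than 7
    ]
    print(earliest_arrival(toy_edges, "a"))
```

In the toy data, vertex d stays unreachable because its only incoming edge departs before c can be reached, which shows why time ordering matters in temporal graphs.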

Keywords: Algorithm design and analysis; Arrays; parallel algorithms; Image edge detection; Multicore processing
