As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems is of great interest. When a single-stream recurrent neural network (RNN) is executed...
In this paper we show that many sequential randomized incremental algorithms are in fact parallel. We consider algorithms for several problems including Delaunay triangulation, linear programming, closest pair, smalle...
ISBN: 9781538683859 (print)
We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy. We propose the first shared-memory and distributed-memory parallel algorithms for the adaptive anonymity problem that achieve this goal and produce high-quality anonymized datasets. The new algorithm is based on an optimization procedure that iteratively computes weights on the edges of a dissimilarity matrix and, at each iteration, computes a minimum-weight b-Edge Cover in the graph. We describe how a 2-approximation algorithm for computing the b-Edge Cover can be used to solve the adaptive anonymity problem in parallel. We are able to solve adaptive anonymity problems with hundreds of thousands of instances and hundreds of features on a supercomputer in under five minutes. Our algorithm scales up to 8K cores on a distributed-memory supercomputer, while also providing good speedups on shared-memory multiprocessors. On smaller problems where a Belief Propagation algorithm is feasible, our algorithm is two orders of magnitude faster.
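To make the b-Edge Cover subproblem concrete: a b-Edge Cover is a set of edges in which every vertex v is incident to at least b(v) chosen edges. The Python sketch below is an illustrative assumption, not necessarily the algorithm used in the paper. It lets each vertex independently pick its b(v) lightest incident edges and takes the union. With non-negative weights, the total is at most twice the optimum, because each vertex's picks weigh no more than the optimal cover's edges at that vertex, and each optimal edge is counted at most twice. The function name and input layout are hypothetical.

```python
import heapq
from collections import defaultdict


def nearest_neighbor_b_edge_cover(edges, b):
    """Return a set of edges covering every vertex v at least b[v] times.

    edges: dict mapping (u, v) tuples to non-negative weights (hypothetical layout).
    b:     dict mapping each vertex to its required coverage.
    """
    # Collect the incident edges of every vertex.
    incident = defaultdict(list)
    for (u, v), weight in edges.items():
        incident[u].append((weight, (u, v)))
        incident[v].append((weight, (u, v)))

    cover = set()
    # Each vertex chooses independently, so this loop could run in parallel.
    for vertex, need in b.items():
        if len(incident[vertex]) < need:
            raise ValueError(f"vertex {vertex!r} has fewer than {need} edges")
        for _, edge in heapq.nsmallest(need, incident[vertex]):
            cover.add(edge)
    return cover
```

Because every vertex makes its choice without consulting the others, the per-vertex loop has no dependencies between iterations, which is the property that makes this style of heuristic attractive for parallel execution.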
ISBN: 9781538683859 (print)
Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require large shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and can therefore run seamlessly on shared-memory and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand-challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, which consists of 7.5 billion reads (2.6 TBytes in size).
ISBN: 9781728137735 (print)
The k nearest neighbors (kNN) algorithm finds the k closest points in a metric space. Due to its high computational cost, many parallel solutions have been proposed, including some implementations targeted at modern accelerators. However, most approaches assume relatively low dimensionality and dense data. Such conditions do not apply to textual datasets, which are known for their high dimensionality and sparsity. This work presents a fine-grained parallel algorithm that applies a filtering technique based on the most common important terms of the query document, using an inverted index, together with its implementation on a GPU. Our method improves the top-k nearest neighbors search in textual datasets by up to 37× with a single GPU.
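The filtering idea in this abstract can be illustrated with a small sequential Python sketch. It is an assumption-laden illustration, not the paper's GPU implementation. Documents are sparse term-weight dictionaries, the inverted index maps each term to the documents containing it, and only documents that share one of the query's highest-weighted terms are scored. The function names, the `num_terms` parameter, and the use of cosine similarity are all hypothetical choices.

```python
import heapq
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id, vector in enumerate(docs):
        for term in vector:
            index[term].append(doc_id)
    return index


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_knn(query, docs, index, k, num_terms):
    """Return ids of the k most similar documents among filtered candidates."""
    # Keep only the query's num_terms highest-weighted terms.
    top_terms = heapq.nlargest(num_terms, query, key=query.get)
    # Candidates are documents that appear in those terms' posting lists.
    candidates = set()
    for term in top_terms:
        candidates.update(index.get(term, ()))
    # Score the candidates only; all other documents are skipped.
    scored = ((cosine(query, docs[doc_id]), doc_id) for doc_id in candidates)
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]
```

The trade-off in this sketch is that a small `num_terms` shrinks the candidate set and the scoring work, but a true neighbor that shares none of the selected terms is never scored.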
ISBN: 9781728112787 (print)
Recently, improving the efficiency of semantic reasoning on large-scale knowledge graphs has gained considerable attention from researchers and engineers worldwide. Most existing distributed parallel algorithms for inference based on the OWL Horst ruleset require multiple iterations. Moreover, during these iterations, repeatedly stored data generate redundant records, which lowers the overall efficiency of reasoning. To address these challenges, we first present a storage solution that combines variable storage and a multivariable connector, in accordance with the characteristics of the OWL Horst ruleset in the context of knowledge graphs, with the aim of reducing repeated data storage and data transmission cost. Then, on the basis of this scheme, a streaming reasoning algorithm is introduced to curtail iterations and improve efficiency. Experimental results on the LUBM and DBpedia datasets demonstrate that the proposed framework and algorithm deliver superior scalability and efficiency.
Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness a...
ISBN: 9781538657119 (print)
The article presents an approach to the automatic adaptation of sequential algorithms for parallel execution on embedded systems with specialized, highly integrated multi-core processors, and to the optimization of RAM access during parallel execution. The mathematical apparatus of extended Petri nets is proposed to simulate and perform the adaptation of the code. It also makes it possible to verify program code and to define control and data relations between operations. An approach based on Petri nets is proposed for estimating timing characteristics when optimizing code for embedded systems with asymmetric parallelism.
We propose PSM, an algorithmic framework to parallelize a common class of subgraph matching algorithms that are based on recursive backtracking. Specifically, we abstract the matching process as a tree search in the state space, and different matching algorithms as different orders in the search. Subsequently, we parallelize such subgraph matching by dividing up the state space search tree and exploring it in parallel. Unlike traditional approaches that parallelize the search by each individual state, we dynamically split the state tree into search regions, each of which consists of a subtree. We further optimize PSM for load balance and communication efficiency. As case studies, we have parallelized three representative recursive-backtracking-based subgraph matching algorithms in PSM and studied their performance in comparison with their sequential counterparts. Our results show that the PSM-style parallel algorithms achieved a speedup of 15X-19X on the in-memory execution time on a twenty-core machine.
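The subtree-based view of backtracking search can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not PSM itself. It splits the state tree statically at the first level (one subtree per candidate for the first pattern vertex), whereas the abstract describes dynamic splitting with load balancing. Graphs are assumed to be dictionaries mapping each vertex to its set of neighbors, matching is non-induced (every pattern edge must map to a data edge), and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def extend(pattern, data, order, mapping):
    """Recursive backtracking: yield every full embedding that extends mapping."""
    if len(mapping) == len(order):
        yield dict(mapping)
        return
    u = order[len(mapping)]          # next pattern vertex to match
    used = set(mapping.values())     # data vertices already taken
    for v in data:
        if v in used:
            continue
        # Every already-matched pattern neighbor of u must be adjacent to v.
        if all(mapping[w] in data[v] for w in pattern[u] if w in mapping):
            mapping[u] = v
            yield from extend(pattern, data, order, mapping)
            del mapping[u]           # backtrack


def search_subtree(pattern, data, order, root):
    """Explore the subtree in which the first pattern vertex maps to root."""
    return list(extend(pattern, data, order, {order[0]: root}))


def parallel_match(pattern, data, workers=4):
    """Split the state tree into first-level subtrees and search them concurrently."""
    order = list(pattern)            # matching order: pattern vertices as given
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda v: search_subtree(pattern, data, order, v), data)
        return [match for part in parts for match in part]
```

Each subtree owns its private `mapping`, so the workers share only read-only graph data. In CPython, threads will not speed up this CPU-bound search because of the global interpreter lock; the thread pool is used here only to keep the sketch short, and a process pool or a native implementation would be needed for real speedups.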
We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph, suitable for distributed memory computers. The presented algorithm is based on the PUSH-RELABEL algorithm, which is known to be one of the fastest algorithms for the bipartite matching problem. Previous attempts at developing parallel implementations of it have focused on shared memory computers using only a limited number of processors. We first present a straightforward adaptation of these shared memory algorithms to distributed memory computers. However, this is not a viable approach, as it requires too much communication. We then develop our new algorithm by modifying the previous approach through a sequence of steps, with the main goals of reducing the amount of communication and improving load balance. The first goal is achieved by changing the algorithm so that many push and relabel operations can be performed locally between communication rounds, and also by selecting augmenting paths that cross processor boundaries infrequently. To achieve good load balance, we limit the speed at which global relabelings traverse the graph. In several experiments on a large number of instances, we study the weak and strong scalability of our algorithm using up to 128 processors. The algorithm can also be used to find epsilon-approximate matchings quickly. © 2011 Elsevier B.V. All rights reserved.