Search results - Inner Mongolia University Library

arXiv, 2018

Authors: Sung, Wonyong; Park, Jinhwan (Seoul National University, Seoul, Republic of Korea)

As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems is of great interest. When a single-stream recurrent neural network (RNN) is executed for a personal user on an embedded system, it demands a large number of DRAM accesses because the network is usually much bigger than the cache and the weights of an RNN are used only once at each time step. We overcome this problem by parallelizing the algorithm and executing multiple time steps at a time. This approach also reduces power consumption by lowering the number of DRAM accesses. QRNN (Quasi-Recurrent Neural Network) and SRU (Simple Recurrent Unit) based recurrent neural networks are used for the implementation. The experiments for SRU showed speedups of about 300% and 930% when the number of time steps processed together is 4 and 16, respectively, on an ARM CPU based system. Copyright © 2018, The Authors. All rights reserved.
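The multi-time-step idea in the abstract above can be illustrated with a minimal sketch. It assumes a simplified SRU-style cell (one input projection and one forget gate, with no reset gate or highway connection), and the function name `sru_like_layer` and the `row_fetches` counter are hypothetical names introduced here to stand in for DRAM traffic. This is not the authors' implementation. Because the input projections do not depend on the previous state, each weight row can be fetched once and reused for every time step in a block.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(row, x):
    return sum(w * v for w, v in zip(row, x))

def sru_like_layer(xs, W, Wf, bf, block=1):
    """Run a simplified SRU-style recurrence, `block` time steps at a time.

    Returns the cell state after every step and the number of weight rows
    fetched, used here as a stand-in for DRAM accesses (an assumption).
    """
    n = len(W)
    c = [0.0] * n
    states, row_fetches = [], 0
    for start in range(0, len(xs), block):
        chunk = xs[start:start + block]
        proj = [[0.0] * n for _ in chunk]
        gate = [[0.0] * n for _ in chunk]
        for i in range(n):
            # Fetch each weight row once and reuse it for all steps in the chunk.
            w_row, wf_row = W[i], Wf[i]
            row_fetches += 2
            for t, x in enumerate(chunk):
                proj[t][i] = dot(w_row, x)
                gate[t][i] = sigmoid(dot(wf_row, x) + bf[i])
        # The element-wise recurrence is cheap and stays sequential in time.
        for t in range(len(chunk)):
            c = [gate[t][i] * c[i] + (1.0 - gate[t][i]) * proj[t][i]
                 for i in range(n)]
            states.append(c)
    return states, row_fetches

xs = [[0.1 * t, 1.0 - 0.05 * t] for t in range(8)]
W = [[0.5, -0.2], [0.3, 0.8]]
Wf = [[0.1, 0.4], [-0.6, 0.2]]
bf = [0.0, 0.1]
seq_states, seq_fetches = sru_like_layer(xs, W, Wf, bf, block=1)
blk_states, blk_fetches = sru_like_layer(xs, W, Wf, bf, block=4)
```

In this sketch, processing four steps per block produces the same states as step-by-step execution while fetching each weight row a quarter as often.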

Keywords: parallel algorithms

Parallelism in randomized incremental algorithms

arXiv, 2018

Authors: Blelloch, Guy E.; Gu, Yan; Shun, Julian; Sun, Yihan (CMU; MIT CSAIL)

In this paper we show that many sequential randomized incremental algorithms are in fact parallel. We consider algorithms for several problems including Delaunay triangulation, linear programming, closest pair, smallest enclosing disk, least-element lists, and strongly connected components. We analyze the dependences between iterations in an algorithm, and show that the dependence structure is shallow with high probability, or that by violating some dependences the structure is shallow and the work is not increased significantly. We identify three types of algorithms based on their dependences and present a framework for analyzing each type. Using the framework gives work-efficient polylogarithmic-depth parallel algorithms for most of the problems that we study. This paper shows the first incremental Delaunay triangulation algorithm with optimal work and polylogarithmic depth, which has been an open problem for over 30 years. This result is important since most implementations of parallel Delaunay triangulation use the incremental approach. Our results also improve bounds on strongly connected components and least-element lists, and significantly simplify parallel algorithms for several problems. Copyright © 2018, The Authors. All rights reserved.

Keywords: parallel algorithms

Adaptive Anonymization of Data using b-Edge Cover

ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis

Authors: Arif Khan; Krzysztof Choromanski; Alex Pothen; S. M. Ferdous; Mahantesh Halappanavar; Antonino Tumeo (Pacific Northwest National Laboratory; Google Brain Robotics; Purdue University)

ISBN: (print) 9781538683859

We explore the problem of sharing data that pertains to individuals with anonymity guarantees, where each user requires a desired level of privacy. We propose the first shared-memory as well as distributed-memory parallel algorithms for the adaptive anonymity problem that achieve this goal and produce high-quality anonymized datasets. The new algorithm is based on an optimization procedure that iteratively computes weights on the edges of a dissimilarity matrix, and at each iteration computes a minimum weighted b-Edge Cover in the graph. We describe how a 2-approximation algorithm for computing the b-Edge Cover can be used to solve the adaptive anonymity problem in parallel. We are able to solve adaptive anonymity problems with hundreds of thousands of instances and hundreds of features on a supercomputer in under five minutes. Our algorithm scales up to 8K cores on a distributed-memory supercomputer, while also providing good speedups on shared-memory multiprocessors. On smaller problems where a Belief Propagation algorithm is feasible, our algorithm is two orders of magnitude faster.

Keywords: Approximation algorithms; Data privacy; Privacy; Optimization; Medical services; Memory management; Parallel algorithms; Physician services; Store management; Anonymity; Supercomputer; Medicine; Optimization procedure; Self tuning

Extreme Scale De Novo Metagenome Assembly

ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis

Authors: Evangelos Georganas; Rob Egan; Steven Hofmeyr; Eugene Goltsman; Bill Arndt; Andrew Tritt; Aydın Buluç; Leonid Oliker; Katherine Yelick (Intel Corp.; Joint Genome Institute; National Energy Research Scientific Computing Center)

ISBN: (print) 9781538683859

Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into an accurate representation of the underlying microbiome's genomes. State-of-the-art tools require big shared-memory machines and cannot handle contemporary metagenome datasets that exceed terabytes in size. In this paper, we introduce the MetaHipMer pipeline, a high-quality and high-performance metagenome assembler that employs an iterative de Bruijn graph approach. MetaHipMer leverages a specialized scaffolding algorithm that produces long scaffolds and accommodates the idiosyncrasies of metagenomes. MetaHipMer is end-to-end parallelized using the Unified Parallel C language and therefore can run seamlessly on shared- and distributed-memory systems. Experimental results show that MetaHipMer matches or outperforms the state-of-the-art tools in terms of accuracy. Moreover, MetaHipMer scales efficiently to large concurrencies and is able to assemble previously intractable grand challenge metagenomes. We demonstrate the unprecedented capability of MetaHipMer by computing the first full assembly of the Twitchell Wetlands dataset, consisting of 7.5 billion reads with a size of 2.6 TBytes.
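For readers unfamiliar with the data structure named in the abstract above, the following is a minimal single-threaded sketch of building a de Bruijn graph from reads. The function name `de_bruijn` and the toy reads are assumptions made for illustration. It does not reproduce MetaHipMer's distributed construction, error handling, or scaffolding.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return graph

reads = ["ACGT", "CGTA"]
graph = de_bruijn(reads, 3)
```

An iterative approach in the spirit of the abstract would repeat this construction for several values of `k`. How results are carried from one value to the next is specific to the pipeline and is not shown here.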

Keywords: Bioinformatics; Genomics; Tools; DNA; Pipelines; Parallel algorithms; Iterative algorithms; Metagenome; Assemblies; Piping; C language

Simpósio em Sistemas Computacionais (WSCAD-SSC)

Authors: Leonardo Afonso Amorim; Mateus F. Freitas; Paulo Henrique da Silva; Wellington S. Martins (Instituto de Informática (INF), Universidade Federal de Goiás (UFG), Alameda Palmeiras, Goiás, Brazil)

ISBN: (print) 9781728137735

The k nearest neighbors (kNN) algorithm finds the k closest points in a metric space. Due to its high computational cost, many parallel solutions have been proposed, including some implementations targeted at modern accelerators. However, most approaches assume relatively low dimensionality and dense data. Such conditions do not apply to textual datasets, which are known for their high dimensionality and sparsity. This work presents a fine-grained parallel algorithm, together with its GPU implementation, that applies a filtering technique based on the most common important terms of the query document using an inverted index. Our method speeds up top-k nearest neighbor search in textual datasets by up to 37× with a single GPU.
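The filtering idea in the abstract above can be sketched sequentially as follows. The names `build_inverted_index` and `knn_filtered`, the dot-product scoring, and the rule of keeping the `num_terms` highest-weighted query terms are assumptions made for illustration. The paper's actual term selection and GPU kernels are not described in the abstract and are not reproduced here.

```python
import heapq
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the (doc_id, weight) pairs of documents containing it."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index

def knn_filtered(index, query, k, num_terms):
    """Score only documents that share one of the query's heaviest terms."""
    top_terms = heapq.nlargest(num_terms, query.items(), key=lambda kv: kv[1])
    scores = defaultdict(float)
    for term, q_weight in top_terms:
        for doc_id, d_weight in index.get(term, ()):
            scores[doc_id] += q_weight * d_weight  # partial dot product
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])

docs = [
    {"gpu": 1.0, "knn": 0.5},
    {"text": 1.0},
    {"gpu": 0.2, "text": 0.3},
]
index = build_inverted_index(docs)
neighbors = knn_filtered(index, {"gpu": 1.0, "text": 0.1}, k=2, num_terms=1)
```

With `num_terms=1` only the postings list for `"gpu"` is visited, so document 1 is never scored. That trade of exactness for fewer candidates is the point of the filter.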

Keywords: Graphics processing units; Indexes; Approximation algorithms; Parallel algorithms; Sorting; Proposals

EMSR: An Efficient Method of Streaming Reasoning

International Conference on Cloud Computing, Big Data and Blockchain (ICCBB)

Authors: Juan Li; Jingbin Wang; Jing Lin (College of Mathematics and Computer Science, Fuzhou University, Fuzhou)

ISBN: (print) 9781728112787

Recently, the question of how to improve the efficiency of semantic reasoning on large-scale knowledge graphs has gained considerable attention from researchers and engineers worldwide. Most existing distributed parallel algorithms for inference based on the OWL Horst ruleset require multiple iterations. Moreover, during this process, repeatedly stored data generates redundant records, which lowers the overall efficiency of reasoning. To address these challenges, we first present a storage solution that combines variable storage and a multivariable connector in accordance with the characteristics of the OWL Horst ruleset in the context of knowledge graphs, aiming to reduce repeated data storage and data transmission cost. Then, on the basis of this scheme, a streaming reasoning algorithm is introduced to reduce the number of iterations and improve efficiency. Experimental results on the LUBM and DBpedia datasets demonstrate that the proposed framework and algorithm deliver superior scalability and efficiency.

Keywords: Data handling; Graph theory; Inference mechanisms; Knowledge representation languages; Parallel algorithms; Data processing; Reasoning; Solution storage; Ruleset; Streaming; Data storage; Data transmission

Scalable Katz ranking computation in large static and dynamic graphs

arXiv, 2018

Authors: van der Grinten, Alexander; Bergamini, Elisabetta; Green, Oded; Bader, David A.; Meyerhenke, Henning (Department of Computer Science, Humboldt-Universität zu Berlin, Germany; Karlsruhe Institute of Technology, Germany; School of Computational Science and Engineering, Georgia Institute of Technology, United States)

Network analysis defines a number of centrality measures to identify the most central nodes in a network. Fast computation of those measures is a major challenge in algorithmic network analysis. Aside from closeness and betweenness, Katz centrality is one of the established centrality measures. In this paper, we consider the problem of computing rankings for Katz centrality. In particular, we propose upper and lower bounds on the Katz score of a given node. While previous approaches relied on numerical approximation or heuristics to compute Katz centrality rankings, we construct an algorithm that iteratively improves those upper and lower bounds until a correct Katz ranking is obtained. We extend our algorithm to dynamic graphs while maintaining its correctness guarantees. Experiments demonstrate that our static graph algorithm outperforms both numerical approaches and heuristics with speedups between 1.5× and 3.5×, depending on the desired quality guarantees. Our dynamic graph algorithm improves upon the static algorithm for update batches of less than 10000 edges. We provide efficient parallel CPU and GPU implementations of our algorithms that enable near real-time Katz centrality computation for graphs with hundreds of millions of nodes in fractions of seconds. Copyright © 2018, The Authors. All rights reserved.
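The bound-refinement idea in the abstract above can be sketched for an undirected graph. The sketch assumes one simple pair of bounds: the lower bound is the truncated Katz sum, and the upper bound adds a geometric tail derived from the fact that the number of walks ending at a node grows by at most a factor of the maximum degree per step. These may differ from the exact bounds in the paper, and the name `katz_bounds` is hypothetical.

```python
def katz_bounds(adj, alpha, rounds):
    """Return (lower, upper) bounds on the Katz score of every node.

    adj maps each node to a list of neighbours (undirected graph).
    Assumes alpha * max_degree < 1 so that the tail series converges.
    """
    max_deg = max(len(nbrs) for nbrs in adj.values())
    assert alpha * max_deg < 1.0
    walks = {v: 1.0 for v in adj}      # walks of length 0 ending at v
    lower = {v: 0.0 for v in adj}
    factor = alpha                     # alpha**i for the current length i
    for _ in range(rounds):
        walks = {v: sum(walks[u] for u in adj[v]) for v in adj}
        for v in adj:
            lower[v] += factor * walks[v]
        factor *= alpha
    # Tail: sum over j >= 1 of alpha**(r+j) * w_{r+j}(v)
    #       <= alpha**r * w_r(v) * sum over j >= 1 of (alpha * max_deg)**j
    tail = alpha * max_deg / (1.0 - alpha * max_deg)
    upper = {v: lower[v] + (factor / alpha) * walks[v] * tail for v in adj}
    return lower, upper

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
lower, upper = katz_bounds(path, alpha=0.2, rounds=1)
```

On this three-node path, a single round already gives `lower["b"] > upper["a"]`, so the middle node is known to rank first without computing exact scores. More rounds tighten both bounds.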

Keywords: parallel algorithms

The Automatic Algorithms' Adaptation Method for Embedded Multi-Core Configurations

East-West Design & Test Symposium (EWDTS)

Authors: Alexey N. Ivutin; Alexander S. Novikov; Anna G. Troshina (Department of Computer Technology, Tula State University, Tula, Russia)

ISBN: (print) 9781538657119

The article presents an approach to the automatic adaptation of sequential algorithms for parallel execution on embedded systems with specialized, highly integrated multi-core processors, while optimizing access to RAM during parallel execution. The mathematical apparatus of extended Petri nets is proposed to model and perform the adaptation of the code. It also makes it possible to verify program code and to define control and data relations between operations. An approach based on Petri nets is proposed for estimating the timing characteristics used in code optimization for embedded systems with asymmetric parallelism.

Keywords: Embedded systems; Multiprocessing systems; Parallel algorithms; Petri nets; Program verification

Parallelizing Recursive Backtracking Based Subgraph Matching on a Single Machine

International Conference on Parallel and Distributed Systems (ICPADS)

Authors: Shixuan Sun; Qiong Luo (Hong Kong University of Science and Technology, Hong Kong, China)

We propose PSM, an algorithmic framework to parallelize a common class of subgraph matching algorithms, which are based on recursive backtracking. Specifically, we abstract the matching process as a tree search in the state space and different matching algorithms as different orders in the search. Subsequently, we parallelize such subgraph matching by dividing up the state space search tree and exploring it in parallel. Unlike traditional approaches that parallelize the search by each individual state, we dynamically split the state tree into search regions, each of which consists of a subtree. We further optimize PSM for load balance and communication efficiency. As case studies, we have parallelized three representative recursive backtracking based subgraph matching algorithms in PSM and studied their performance in comparison with their sequential counterparts. Our results show that the PSM-style parallel algorithms achieved a speedup of 15× to 19× in in-memory execution time on a twenty-core machine.
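The view of matching as a tree search in the abstract above can be made concrete with a minimal sketch. It counts injective, edge-preserving mappings of a query graph into a data graph by recursive backtracking, and hands each first-level subtree to a worker. The static split at the root, the function names `extend` and `count_matches`, and the thread pool are assumptions made for illustration. PSM splits the tree dynamically into regions, which this sketch does not attempt, and CPython threads will not give a real speedup on this CPU-bound loop.

```python
from concurrent.futures import ThreadPoolExecutor

def extend(query_adj, data_adj, order, mapping):
    """Count completions of a partial mapping by recursive backtracking."""
    if len(mapping) == len(order):
        return 1
    u = order[len(mapping)]            # next query vertex in matching order
    count = 0
    for v in data_adj:
        if v in mapping.values():      # keep the mapping injective
            continue
        # Every already-mapped query neighbour of u must be adjacent to v.
        if all(mapping[w] in data_adj[v] for w in query_adj[u] if w in mapping):
            mapping[u] = v
            count += extend(query_adj, data_adj, order, mapping)
            del mapping[u]             # backtrack
    return count

def count_matches(query_adj, data_adj, workers=4):
    """Explore each root-level subtree of the search tree as a separate task."""
    order = list(query_adj)

    def explore(root):
        return extend(query_adj, data_adj, order, {order[0]: root})

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(explore, data_adj))

triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
in_k4 = count_matches(triangle, k4)
in_square = count_matches(triangle, square)
```

Each task owns its own `mapping` dictionary, so the subtrees are explored without shared mutable state. Root-level subtrees can differ greatly in size, which is the load-balance problem the abstract's dynamic splitting addresses.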

Keywords: Task analysis; Heuristic algorithms; Multicore processing; Computer science; Parallel algorithms; Indexes

Parallel algorithms for bipartite matching problems on distributed memory computers

Parallel Computing, 2011, vol. 37, no. 12, pp. 820-845

Authors: Langguth, Johannes; Patwary, Md. Mostofa Ali; Manne, Fredrik (Univ Bergen, Dept Informat, N-5008 Bergen, Norway)

We present a new parallel algorithm for computing a maximum cardinality matching in a bipartite graph suitable for distributed memory computers. The presented algorithm is based on the push-relabel algorithm, which is known to be one of the fastest algorithms for the bipartite matching problem. Previous attempts at developing parallel implementations of it have focused on shared memory computers using only a limited number of processors. We first present a straightforward adaptation of these shared memory algorithms to distributed memory computers. However, this is not a viable approach as it requires too much communication. We then develop our new algorithm by modifying the previous approach through a sequence of steps with the main goal being to reduce the amount of communication and to increase load balance. The first goal is achieved by changing the algorithm so that many push and relabel operations can be performed locally between communication rounds and also by selecting augmenting paths that cross processor boundaries infrequently. To achieve good load balance, we limit the speed at which global relabelings traverse the graph. In several experiments on a large number of instances, we study weak and strong scalability of our algorithm using up to 128 processors. The algorithm can also be used to find epsilon-approximate matchings quickly. © 2011 Elsevier B.V. All rights reserved.

Keywords: Bipartite graphs; Parallel algorithms; Matching
