ISBN (print): 9781450380751
This paper describes a new technique for parallelizing protein clustering, an important bioinformatics computation for the analysis of protein sequences. Protein clustering identifies groups of proteins that are similar because they share long sequences of similar amino acids. Given a collection of protein sequences, clustering can significantly reduce the computational effort required to identify all similar sequences by avoiding many negative comparisons. The challenge, however, is to build a clustering that misses as few similar sequences (or elements, more generally) as possible. In this paper, we introduce precise clustering, a property that requires each pair of similar elements to appear together in at least one cluster. We show that transitivity in the data can be leveraged to merge clusters while maintaining a precise clustering, providing a basis for independently forming clusters. This allows us to reformulate clustering as a bottom-up merge of independent clusters in a new algorithm called ClusterMerge. ClusterMerge exposes parallelism, enabling fast and scalable implementations. We apply ClusterMerge to find similar amino acid sequences in a collection of proteins. ClusterMerge identifies 99.8% of the similar pairs found by a full O(n^2) comparison while performing only half as many comparisons. More importantly, ClusterMerge is highly amenable to parallel and distributed computation. Our implementation achieves a speedup of 604x on 768 cores (1,400x faster than a comparable single-threaded clustering implementation), a strong scaling efficiency of 90%, and a weak scaling efficiency of nearly 100%.
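As an illustrative sketch of the bottom-up merge idea described in the abstract: clusters carry a representative element, and two clusters are merged whenever their representatives are similar enough that transitivity is assumed to cover all cross pairs. The similarity function, the threshold, and the use of the first member as representative are stand-in assumptions, not the paper's actual criteria.

```python
# Hedged sketch of a representative-based, bottom-up cluster merge in the
# spirit of ClusterMerge. The `similar` function and `transitive_bound`
# threshold are illustrative placeholders supplied by the caller.

def cluster_merge(elements, similar, transitive_bound):
    """Greedily merge singleton clusters whenever their representatives are
    similar enough that (assumed) transitivity covers all cross pairs."""
    clusters = [{"rep": e, "members": [e]} for e in elements]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if similar(clusters[i]["rep"], clusters[j]["rep"]) >= transitive_bound:
                    # Absorb cluster j into cluster i, keeping i's representative.
                    clusters[i]["members"] += clusters[j]["members"]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters
```

In the real algorithm, these pairwise merge decisions are independent of one another, which is what exposes the parallelism the abstract refers to.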
We present a cache-efficient parallel algorithm for the sequence alignment with gap penalty problem for shared-memory machines using multiway divide-and-conquer and not-in-place matrix transposition. Our r-way divide-and-conquer algorithm, for a fixed natural number r >= 2, performs Θ(n^3) work, achieves Θ(n^(log_r(2r-1))) span, and incurs O(n^3/(BM) + (n^2/B) log √M) serial cache misses for n > γM, and O((n^2/B) log(n/√M)) serial cache misses for α√M < n <= γM, where M is the cache size, B is the cache-line size, and α and γ are constants. Published by Elsevier B.V.
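The Θ(n^3)-work recurrence underlying this problem is the classic "gap problem" DP, where a contiguous run of insertions or deletions costs a function of the gap length. The sketch below is the plain sequential cubic DP, not the paper's cache-efficient divide-and-conquer reorganization; the cost functions s (substitution) and w (gap length penalty) are illustrative placeholders.

```python
# Sequential "gap problem" DP: D[i][j] is the minimum cost of aligning
# x[:i] with y[:j], where closing a gap of length g costs w(g).

def align(x, y, s, w):
    m, n = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:
                # Align x[i-1] against y[j-1].
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + s(x[i - 1], y[j - 1]))
            # Close a gap in y (skip y[q:j]) -- the inner loop over all
            # split points q is what makes the total work cubic.
            for q in range(j):
                D[i][j] = min(D[i][j], D[i][q] + w(j - q))
            # Close a gap in x (skip x[p:i]).
            for p in range(i):
                D[i][j] = min(D[i][j], D[p][j] + w(i - p))
    return D[m][n]
```

The paper's contribution is to reorganize exactly this dependency structure into r-way recursive blocks so that the cache bounds quoted above hold.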
ISBN (print): 9781728161617
A novel parallel solver based on the adaptive integral method (AIM) is proposed for the electromagnetic analysis of electrical interconnects in layered media. We show that graph partitioning techniques can be used to optimally distribute, across thousands of processes, the computations related to both matrix filling and system solution. The proposed workload distribution strategy is compared to existing techniques through a scalability study on a large realistic interposer model in layered media.
ISBN (print): 9781450328210
We describe a simple algorithm for spectral graph sparsification, based on iterative computations of weighted spanners and uniform sampling. Leveraging the algorithms of Baswana and Sen for computing spanners, we obtain the first distributed spectral sparsification algorithm. We also obtain a parallel algorithm with improved work and time guarantees. Combining this algorithm with the parallel framework of Peng and Spielman for solving symmetric diagonally dominant linear systems, we get a parallel solver which is much closer to being practical and significantly more efficient in terms of the total work.
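One iteration of the spanner-plus-sampling recipe can be sketched as follows. The spanner subroutine is passed in as a parameter (in the paper it would be a Baswana-Sen spanner); the sampling probability and the 1/p reweighting are the generic importance-sampling step, not the paper's exact parameters.

```python
import random

# One round of spanner-based sparsification: keep every spanner edge at its
# original weight, and keep each remaining edge with probability p, scaling
# its weight by 1/p so edge weights are preserved in expectation.

def sparsify_round(n, edges, spanner, p=0.5, seed=0):
    """edges: dict {(u, v): weight}. spanner(n, edges) -> iterable of edge keys."""
    rng = random.Random(seed)
    keep = set(spanner(n, edges))
    out = {}
    for e, w in edges.items():
        if e in keep:
            out[e] = w                 # spanner edges are always retained
        elif rng.random() < p:
            out[e] = w / p             # reweight to keep the expectation
    return out
```

Iterating this round (recomputing a spanner on the sampled graph each time) drives the edge count down while the spanner edges keep every distance, hence every Laplacian quadratic form, approximately intact.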
ISBN (print): 9781450328210
The analysis of several algorithms and data structures can be framed as a peeling process on a random hypergraph: vertices with degree less than k are removed until no vertices of degree less than k remain. The remaining hypergraph is known as the k-core. In this paper, we analyze parallel peeling processes, where in each round all vertices of degree less than k are removed. It is known that, below a specific edge density threshold, the k-core is empty with high probability. We show that, with high probability, below this threshold, only (1/log((k-1)(r-1))) log log n + O(1) rounds of peeling are needed to obtain the empty k-core for r-uniform hypergraphs. Interestingly, we show that above this threshold, Ω(log n) rounds of peeling are required to find the non-empty k-core. Since most algorithms and data structures aim to peel to an empty k-core, this asymmetry appears fortunate. We verify the theoretical results both with simulation and with a parallel implementation using graphics processing units (GPUs). Our implementation provides insights into how to structure parallel peeling algorithms for efficiency in practice.
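The round-synchronous peeling rule described above can be sketched directly; this is the definitional process (each round removes all low-degree vertices at once), not the paper's GPU implementation.

```python
# Round-synchronous peeling to the k-core of a hypergraph. In each round,
# ALL vertices of degree < k are removed simultaneously, together with every
# hyperedge touching a removed vertex; rounds repeat until nothing changes.

def parallel_peel(num_vertices, edges, k):
    """Return (k-core vertex set, number of peeling rounds).

    edges: list of hyperedges, each a tuple of vertex ids.
    """
    alive_v = set(range(num_vertices))
    alive_e = set(range(len(edges)))
    rounds = 0
    while True:
        # Degree of each live vertex = number of live hyperedges containing it.
        deg = {v: 0 for v in alive_v}
        for ei in alive_e:
            for v in edges[ei]:
                deg[v] += 1
        peeled = {v for v, d in deg.items() if d < k}
        if not peeled:
            return alive_v, rounds
        rounds += 1
        alive_v -= peeled
        # An edge survives only if all of its endpoints survive.
        alive_e = {ei for ei in alive_e
                   if all(v in alive_v for v in edges[ei])}
```

The `rounds` counter returned here is exactly the quantity the abstract bounds: (1/log((k-1)(r-1))) log log n + O(1) below the threshold, Ω(log n) above it.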
ISBN (print): 9781728199986
We develop the first distributed-memory parallel implementation of Symmetric Nonnegative Matrix Factorization (SymNMF), a key data analytics kernel for clustering and dimensionality reduction. Our implementation includes two different algorithms for SymNMF, which give comparable results in terms of time and accuracy. The first algorithm is a parallelization of an existing sequential approach that uses solvers for nonsymmetric NMF. The second algorithm is a novel approach based on the Gauss-Newton method. It exploits second-order information without incurring large computational and memory costs. We evaluate the scalability of our algorithms on the Summit system at Oak Ridge National Laboratory, scaling up to 128 nodes (4,096 cores) with 70% efficiency. Additionally, we demonstrate our software on an image segmentation task.
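For reference, SymNMF seeks a nonnegative H minimizing ||A - HH^T||_F^2 for a symmetric nonnegative A. The step below is a known multiplicative-update heuristic for this objective (with damping β), shown only to make the objective concrete; it is neither of the paper's two algorithms, and in particular not its Gauss-Newton method.

```python
# One damped multiplicative update for SymNMF: minimize ||A - H H^T||_F^2
# over H >= 0. A is n x n symmetric, H is n x k, both as nested lists.

def symnmf_step(A, H, beta=0.5, eps=1e-9):
    n, k = len(H), len(H[0])
    # AH = A @ H
    AH = [[sum(A[i][l] * H[l][j] for l in range(n)) for j in range(k)]
          for i in range(n)]
    # HtH = H^T @ H  (k x k)
    HtH = [[sum(H[l][a] * H[l][b] for l in range(n)) for b in range(k)]
           for a in range(k)]
    # HHtH = H @ (H^T H)
    HHtH = [[sum(H[i][a] * HtH[a][j] for a in range(k)) for j in range(k)]
            for i in range(n)]
    # Damped multiplicative rule: H <- H * (1 - beta + beta * AH / (H H^T H)).
    return [[H[i][j] * (1 - beta + beta * AH[i][j] / (HHtH[i][j] + eps))
             for j in range(k)] for i in range(n)]
```

Each step keeps H nonnegative by construction, which is why multiplicative rules are a common baseline against which second-order methods like Gauss-Newton are compared.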
ISBN (print): 9781450369794
In this paper, we study submodular function minimization in the adaptive complexity model. Seminal work by Grötschel, Lovász, and Schrijver shows that with oracle access to a function f, the problem of submodular minimization can be solved exactly with poly(n) queries to f. A long line of work has since been dedicated to the acceleration of submodular minimization. In particular, recent work obtains a (strongly) polynomial time algorithm with Õ(n^3) query complexity. A natural way to accelerate computation is via parallelization, though very little is known about the extent to which submodular minimization can be parallelized. A natural measure for the parallel runtime of a black-box optimization algorithm is its adaptivity, as recently introduced in the context of submodular maximization. Informally, the adaptivity of an algorithm is the number of sequential rounds it makes when each round can execute polynomially many function evaluations in parallel. In the past two years there have been breakthroughs in the study of adaptivity for both submodular maximization and convex minimization; in particular, an exponential improvement in the parallel running time of submodular maximization was obtained with an O(log n)-adaptive algorithm. Whether submodular minimization can enjoy, thanks to parallelization, the same dramatic speedups as submodular maximization is unknown. To date, we do not know of any polynomial time algorithm for solving submodular minimization whose adaptivity is subquadratic in n. We initiate the study of the adaptivity of submodular function minimization by giving the first non-trivial lower bound for the parallel runtime of submodular minimization. We show that there is no o(log n/log log n)-adaptive algorithm with poly(n) queries which solves the problem of submodular minimization. This is the first adaptivity lower bound for unconstrained submodular optimization (whether for maximization or minimization) and the analysis relies on
ISBN (print): 9781450367356
The DBSCAN method for spatial clustering has received significant attention due to its applicability in a variety of data analysis tasks. There are fast sequential algorithms for DBSCAN in Euclidean space that take O(n log n) work for two dimensions, sub-quadratic work for three or more dimensions, and can be computed approximately in linear work for any constant number of dimensions. However, existing parallel DBSCAN algorithms require quadratic work in the worst case. This paper bridges the gap between theory and practice of parallel DBSCAN by presenting new parallel algorithms for Euclidean exact DBSCAN and approximate DBSCAN that match the work bounds of their sequential counterparts, and are highly parallel (polylogarithmic depth). We present implementations of our algorithms along with optimizations that improve their practical performance. We perform a comprehensive experimental evaluation of our algorithms on a variety of datasets and parameter settings. Our experiments on a 36-core machine with two-way hyper-threading show that our implementations outperform existing parallel implementations by up to several orders of magnitude, and achieve speedups of up to 33x over the best sequential algorithms.
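For readers unfamiliar with the baseline, here is a minimal sequential DBSCAN sketch with brute-force O(n^2) neighborhood queries in 2D. This is the textbook definition (core points, border points, noise), not the paper's parallel algorithm, whose point is precisely to avoid this quadratic work.

```python
# Minimal sequential DBSCAN. Returns a label per point: a cluster id
# (0, 1, ...) or -1 for noise. Neighborhoods are computed by brute force.

def dbscan(points, eps, min_pts):
    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)       # None = unvisited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # noise (may be reclaimed as border)
            continue
        cluster += 1                    # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point: claim, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also core: expand from it
                queue.extend(j_nbrs)
    return labels
```

The parallel algorithms in the paper compute the same labeling (up to cluster renumbering) in polylogarithmic depth while matching the sub-quadratic sequential work bounds.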
ISBN (print): 9781665415576
Fast domain propagation of linear constraints has become a crucial component of today's best algorithms and solvers for mixed integer programming and pseudo-Boolean optimization to achieve peak solving performance. Irregularities in the form of dynamic algorithmic behaviour, dependency structures, and sparsity patterns in the input data make efficient implementations of domain propagation on GPUs and, more generally, on parallel architectures challenging. This is one of the main reasons why domain propagation in state-of-the-art solvers remains single-threaded. In this paper, we present a new algorithm for domain propagation which (a) avoids these problems and allows for an efficient implementation on GPUs, and is (b) capable of running propagation rounds entirely on the GPU, without any need for synchronization or communication with the CPU. We present extensive computational results which demonstrate the effectiveness of our approach and show that ample speedups are possible on practically relevant problems: on state-of-the-art GPUs, our geometric mean speed-up for reasonably large instances is around 10x to 20x and can be as high as 195x on favorably large instances.
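The underlying sequential operation is textbook activity-based bound tightening for one linear constraint sum_i a[i]*x[i] <= b over box domains lb[i] <= x[i] <= ub[i]; the sketch below shows one such round. The paper's contribution is running many of these rounds, over many constraints, entirely on the GPU, which the sketch does not attempt.

```python
# One round of activity-based domain propagation for a single constraint
#   sum_i a[i] * x[i] <= b,   with   lb[i] <= x[i] <= ub[i].

def propagate(a, b, lb, ub):
    """Return tightened (lb, ub) implied by the constraint."""
    n = len(a)
    # Minimal activity: every term at its smallest attainable value.
    min_act = sum(a[i] * (lb[i] if a[i] > 0 else ub[i]) for i in range(n))
    new_lb, new_ub = list(lb), list(ub)
    for j in range(n):
        if a[j] == 0:
            continue
        # Residual minimal activity contributed by the OTHER terms.
        term_min = a[j] * (lb[j] if a[j] > 0 else ub[j])
        residual = min_act - term_min
        slack = b - residual            # a[j] * x[j] must be <= slack
        if a[j] > 0:
            new_ub[j] = min(new_ub[j], slack / a[j])
        else:
            new_lb[j] = max(new_lb[j], slack / a[j])
    return new_lb, new_ub
```

Note that each variable's tightening depends only on the shared minimal activity, which is what makes the per-variable loop embarrassingly parallel within a round.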
ISBN (print): 9781450369794
We present a (1 + ε)-approximate parallel algorithm for computing shortest paths in undirected graphs, achieving poly(log n) depth and m·poly(log n) work for n-node, m-edge graphs. Although sequential algorithms with (nearly) optimal running time have been known for several decades, near-optimal parallel algorithms have turned out to be a much tougher challenge. For (1 + ε)-approximation, all prior algorithms with poly(log n) depth perform at least Ω(m·n^c) work for some constant c > 0. Improving this long-standing upper bound, obtained by Cohen (STOC'94), has been open for 25 years. We develop several new tools of independent interest. One of them is a new notion beyond hopsets, the low-hop emulator: a poly(log n)-approximate emulator graph in which every shortest path has at most O(log log n) hops (edges). Direct applications of low-hop emulators are parallel algorithms for poly(log n)-approximate single-source shortest paths (SSSP), Bourgain's embedding, metric tree embedding, and low-diameter decomposition, all with poly(log n) depth and m·poly(log n) work. To boost the approximation ratio to (1 + ε), we introduce compressible preconditioners and apply them inside Sherman's framework (SODA'17) to solve the more general problem of uncapacitated minimum-cost flow (a.k.a. the transshipment problem). Our algorithm computes a (1 + ε)-approximate uncapacitated minimum-cost flow in poly(log n) depth using m·poly(log n) work. As a consequence, it also improves the state-of-the-art sequential running time from m·2^(O(√(log n))) to m·poly(log n).