Search Results - Inner Mongolia University Library

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Dhulipala, Laxman; Dong, Xiaojun; Gowda, Kishen N.; Gu, Yan (Univ Maryland, College Pk, MD 20742 USA; Univ Calif Riverside, Riverside, CA 92521 USA)

ISBN: (print) 9798400704161

Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree T, the SLD of T is a binary dendrogram that summarizes the n − 1 clusterings obtained by contracting the edges of T in order of weight. Existing algorithms for computing the SLD all require Ω(n log n) work, where n = |T|. Furthermore, to the best of our knowledge, no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem. In this paper, we design faster parallel algorithms for computing SLDs, both in theory and in practice, based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires O(n log h) work and O(log² n log² h) depth, where h is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem, inspired by the nearest-neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves O(n log h) work and O(h log n) depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees and obtain up to 150x speedup over the highly efficient Union-Find algorithm typically used to compute SLDs in practice.

Keywords: Single-Linkage Clustering; Hierarchical Graph Clustering; HAC; Dendrograms; Parallel Algorithms
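The Union-Find baseline mentioned in the abstract can be sketched as follows: sort the tree edges by weight, and each time an edge joins two components, create a new dendrogram node whose children are the two current clusters. This is a minimal sequential sketch under assumed conventions (vertices and leaves numbered 0..n-1, the i-th merge creating node n + i, edges given as (u, v, w) triples); the function name `single_linkage_dendrogram` is hypothetical, and this is the O(n log n) baseline, not the paper's parallel algorithm.

```python
def single_linkage_dendrogram(n, edges):
    """Union-Find SLD sketch for a weighted tree on vertices 0..n-1.

    edges: list of (u, v, w) triples forming a spanning tree.
    Returns a list of merges (child_a, child_b, w); merge i creates node n + i.
    """
    parent = list(range(n))
    size = [1] * n
    cluster = list(range(n))  # dendrogram node currently representing each root

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    # Sorting dominates the cost: O(n log n) work, the bound the paper improves on.
    for w, u, v in sorted((w, u, v) for u, v, w in edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            raise ValueError("input edges contain a cycle; expected a tree")
        if size[ru] < size[rv]:
            ru, rv = rv, ru  # union by size
        parent[rv] = ru
        size[ru] += size[rv]
        merges.append((cluster[ru], cluster[rv], w))
        cluster[ru] = n + len(merges) - 1
    return merges


# Path 0 -(3)- 1 -(1)- 2 -(2)- 3
print(single_linkage_dendrogram(4, [(0, 1, 3), (1, 2, 1), (2, 3, 2)]))
# [(1, 2, 1), (4, 3, 2), (5, 0, 3)]
```

Node 4 is the cluster {1, 2}, node 5 adds vertex 3, and node 6 (the root) adds vertex 0 at weight 3.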

Teaching Parallel Algorithms Using the Binary-Forking Model

1st International Conference on Smart Energy Systems and Artificial Intelligence (SESAI)

Authors: Blelloch, Guy E.; Gu, Yan; Sun, Yihan (Carnegie Mellon Univ, Pittsburgh, PA 15213 USA; Univ Calif Riverside, Riverside, CA 92521 USA)

ISBN: (print) 9798350364613; 9798350364606

In this paper, we share our experience teaching parallel algorithms with the binary-forking model. With hardware advances, multicore computers are now ubiquitous. This has created substantial demand in both research and industry to harness the capabilities of parallel computing. It is thus important to incorporate parallelism into computer science education, especially in the early stages of the curriculum. However, it is commonly believed that understanding and using parallelism requires a deep understanding of computer systems and architecture, which complicates introducing parallelism to young students and non-experts. We propose using the binary-forking model, introduced in our previous research, to teach parallel algorithms. This model is meant to capture the performance of algorithms on modern multicore shared-memory machines, and it is a simple abstraction that isolates algorithm design ideas from system-level details. The abstraction allows for simple analysis based on the work-span model in theory, and can be directly implemented as parallel programs in practice. In this paper, we briefly overview some basic primitives in this model and provide a list of algorithms that we believe are well suited to parallel algorithm courses.

Keywords: Binary-Forking Model; Computer Science Education; Fork-Join; Parallel Algorithms; Parallel Programming
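To illustrate the work-span analysis the abstract refers to, the sketch below sums an array by binary forking: each call splits its range in two and (conceptually) forks both halves. Python here only simulates the cost model, since the two recursive calls actually run one after the other; the function name `fork_sum` and the unit-cost accounting are assumptions for illustration. Work adds the costs of both branches, while span takes the longer of the two forked branches.

```python
def fork_sum(a, lo=0, hi=None):
    """Return (sum, work, span) of a[lo:hi] under a binary-forking cost model.

    Assumes a non-empty range. Each leaf and each join costs one unit.
    """
    if hi is None:
        hi = len(a)
    if hi - lo == 1:
        return a[lo], 1, 1
    mid = (lo + hi) // 2
    # In the binary-forking model these two calls are forked in parallel.
    left, w_left, s_left = fork_sum(a, lo, mid)
    right, w_right, s_right = fork_sum(a, mid, hi)
    # Join: work adds up, span is the longer branch plus the join step.
    return left + right, w_left + w_right + 1, max(s_left, s_right) + 1


total, work, span = fork_sum(list(range(1, 9)))
print(total, work, span)  # 36 15 4
```

For n elements this accounting gives work 2n − 1 and span ⌈log₂ n⌉ + 1, that is, Θ(n) work and Θ(log n) span.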

Parallel Algorithms for Hierarchical Nucleus Decomposition

Proceedings of the ACM on Management of Data, 2024, Vol. 2, No. 1, pp. 1-27

Authors: Jessica Shi; Laxman Dhulipala; Julian Shun (MIT CSAIL, Cambridge, MA, USA; University of Maryland, College Park, College Park, MD, USA)

Nucleus decompositions have been shown to be a useful tool for finding dense subgraphs. The coreness value of a clique represents its density based on the number of other cliques it is adjacent to. One useful output of nucleus decomposition is to generate a hierarchy among dense subgraphs at different resolutions. However, existing parallel algorithms for nucleus decomposition do not generate this hierarchy, and only compute the coreness values. This paper presents a scalable parallel algorithm for hierarchy construction, with practical optimizations, such as interleaving the coreness computation with hierarchy construction and using a concurrent union-find data structure in an innovative way to generate the hierarchy. We also introduce a parallel approximation algorithm for nucleus decomposition, which achieves much lower span in theory and better performance in practice. We prove strong theoretical bounds on the work and span (parallel time) of our *** a 30-core machine with two-way hyper-threading, our parallel hierarchy construction algorithm achieves up to a 58.84x speedup over the state-of-the-art sequential hierarchy construction algorithm by Sariyuce et al. and up to a 30.96x self-relative parallel speedup. On the same machine, our approximation algorithm achieves a 3.3x speedup over our exact algorithm, while generating coreness estimates with a multiplicative error of 1.33x on average.

Keywords: Graph Processing; Parallel Algorithms
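In the (r, s) nucleus framework, the case r = 1, s = 2 (vertices and edges) is the familiar k-core decomposition, where the coreness of a vertex is the largest k such that the vertex belongs to a subgraph of minimum degree k. The sequential peeling sketch below computes these coreness values for intuition only; it does not build the hierarchy and is not the paper's parallel algorithm. The function name `core_numbers` is hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Coreness of every vertex by repeatedly peeling a minimum-degree vertex."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree (O(n) scan; a bucket queue is faster).
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # coreness never decreases along the peeling order
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# A triangle a-b-c with a pendant vertex d attached to c.
print(sorted(core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]).items()))
# [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
```

The triangle forms the 2-core and the pendant vertex only reaches the 1-core; a hierarchy would nest the 2-core inside the 1-core component.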

Parallel Algorithms for Minimal Nondeterministic Finite Automata Inference

FUNDAMENTA INFORMATICAE, 2021, Vol. 178, No. 3, pp. 203-227

Authors: Jastrzab, Tomasz; Czech, Zbigniew J.; Wieczorek, Wojciech (Silesian Tech Univ, Gliwice, Poland; Univ Bielsko Biala, Bielsko Biala, Poland)

The goal of this paper is to develop parallel algorithms that, given a learning sample as input, identify a regular language by means of a nondeterministic finite automaton (NFA). A sample is a pair of finite sets containing positive and negative examples. Given a sample, a minimal NFA that represents the target regular language is sought. We define the task of finding an NFA that accepts all positive examples and rejects all negative ones as a constraint satisfaction problem, and then propose parallel algorithms to solve it. The results of comprehensive computational experiments on a variety of inference tasks are reported. The question of minimizing an NFA consistent with a learning sample is computationally hard.

Keywords: Parallel Algorithms; Learning Regular Languages Using Nondeterministic Finite Automata; Constraint Satisfaction and Satisfiability Problems; Grammatical Inference
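The constraint at the heart of the problem is that a candidate NFA must accept every positive example and reject every negative one. The sketch below only checks that constraint for a given automaton by subset simulation; it does not search for a minimal NFA. The representation (integer states, a transition dictionary keyed by (state, symbol)) and the names `accepts` and `is_consistent` are assumptions for illustration.

```python
def accepts(delta, initial, final, word):
    """Simulate an NFA on `word` by tracking the set of reachable states."""
    current = set(initial)
    for symbol in word:
        current = {t for q in current for t in delta.get((q, symbol), ())}
        if not current:  # no run survives; reject early
            return False
    return bool(current & final)


def is_consistent(delta, initial, final, positives, negatives):
    """True if the NFA accepts all positive and rejects all negative examples."""
    return (all(accepts(delta, initial, final, w) for w in positives)
            and not any(accepts(delta, initial, final, w) for w in negatives))


# Two-state NFA over {a, b} accepting exactly the words that end in 'a'.
delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
print(is_consistent(delta, {0}, {1}, positives=["a", "ba"], negatives=["", "ab"]))  # True
```

An inference procedure would search over transition sets of a fixed size until `is_consistent` holds, increasing the number of states when no consistent automaton exists.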

pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores

ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 2022, Vol. 48, No. 4, p. 44

Authors: Sobczyk, Aleksandros; Gallopoulos, Efstratios (IBM Res Europe, Zurich, Switzerland; Swiss Fed Inst Technol, Zurich, Switzerland; Univ Patras, HPCLAB, Comp Engn & Informat Dept, Patras, Greece)

We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on "tall-and-skinny" matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity, and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression, and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package whose core is written in C++ and parallelized with OpenMP, and which is compatible with the standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.

Keywords: Parallel Algorithms; Sparse Data Structures; Sketching; Column Subset Selection; Regression; Preconditioning; Statistical Leverage Scores
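CountSketch compresses a tall matrix A with n rows into an m-row matrix SA by hashing each row i to a bucket h(i) and adding it there with a random sign s(i). The pure-Python sketch below shows that definition on nested lists; pylspack's own implementation is in C++ with OpenMP and works on SciPy/NumPy structures, so the function names here (`count_sketch`, `random_count_sketch`) are hypothetical and not the package API.

```python
import random


def count_sketch(A, h, s, m):
    """Return S·A, where row i of A is added to row h[i] with sign s[i]."""
    d = len(A[0])
    SA = [[0.0] * d for _ in range(m)]
    for i, row in enumerate(A):
        target = SA[h[i]]
        for j, x in enumerate(row):
            target[j] += s[i] * x
    return SA


def random_count_sketch(A, m, seed=0):
    """Draw a random bucket and sign for every row, then apply the sketch."""
    rng = random.Random(seed)
    h = [rng.randrange(m) for _ in A]
    s = [rng.choice((-1, 1)) for _ in A]
    return count_sketch(A, h, s, m)


A = [[1, 2], [3, 4], [5, 6]]
print(count_sketch(A, h=[0, 1, 0], s=[1, -1, -1], m=2))  # [[-4.0, -4.0], [-3.0, -4.0]]
```

Because each input row touches exactly one output row, applying the sketch costs a single pass over the entries of A, which is what makes it attractive for tall-and-skinny inputs.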

Fast Parallel Algorithms for Submodular p-Superseparable Maximization

21st International Workshop on Approximation and Online Algorithms (WAOA), part of the ALGO Conference

Authors: Cervenjak, Philip; Gan, Junhao; Wirth, Anthony (Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic, Australia)

ISBN: (print) 9783031498145; 9783031498152

Maximizing a non-negative, monotone, submodular function f over n elements under a cardinality constraint k (SMCC) is a well-studied NP-hard problem. It has important applications in, e.g., machine learning and influence maximization. Though the theoretical problem admits polynomial-time approximation algorithms, solving it in practice often involves frequently querying submodular functions that are expensive to compute. This has motivated significant research into designing parallel approximation algorithms in the adaptive complexity model; adaptive complexity (adaptivity) measures the number of sequential rounds of poly(n) function queries an algorithm requires. The state-of-the-art algorithms can achieve (1 − 1/e − ε)-approximate solutions with O(ε⁻² log n) adaptivity, which approaches the known adaptivity lower bounds. However, the O(ε⁻² log n) adaptivity only applies to maximizing worst-case functions that are unlikely to appear in practice. Thus, in this paper, we consider the special class of p-superseparable submodular functions, which places a reasonable constraint on f, based on the parameter p, and is more amenable to maximization, while also having real-world applicability. Our main contribution is the algorithm LS+GS, a finer-grained version of the existing LS+PGB algorithm, designed for instances of SMCC when f is p-superseparable; it achieves an expected (1 − 1/e − ε)-approximate solution with O(ε⁻² log(pk)) adaptivity, independent of n. Additionally, unrelated to p-superseparability, our LS+GS algorithm uses only O(ε⁻¹ n + ε⁻² log n) oracle queries, which has an improved dependence on ε⁻¹ over the state-of-the-art LS+PGB; this is achieved through the design of a novel thresholding subroutine.

Keywords: Parallel Algorithms; Approximation Algorithms; Submodular Maximization
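For context on adaptivity, the classic sequential greedy algorithm for SMCC achieves a (1 − 1/e) approximation but needs k adaptive rounds, because each pick depends on all previous picks. The sketch below runs it on a coverage function (a standard monotone submodular example) and counts oracle queries; it is the textbook baseline, not LS+GS or LS+PGB, and the name `greedy_max_coverage` is hypothetical.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily by marginal coverage gain.

    sets: dict mapping a name to a set of covered items.
    Returns (chosen names, covered items, number of marginal-gain queries).
    """
    chosen, covered, queries = [], set(), 0
    for _ in range(k):  # k sequential rounds: adaptivity k
        candidates = [name for name in sorted(sets) if name not in chosen]
        if not candidates:
            break
        gains = {}
        for name in candidates:  # queries within one round are independent
            gains[name] = len(sets[name] - covered)
            queries += 1
        best = max(candidates, key=gains.__getitem__)  # first maximum wins ties
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered, queries


sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1}}
chosen, covered, queries = greedy_max_coverage(sets, k=2)
print(chosen, len(covered), queries)  # ['A', 'C'] 6 7
```

Low-adaptivity algorithms aim to replace the k dependent rounds with far fewer rounds, each issuing many independent queries in parallel.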

High-Performance and Flexible Parallel Algorithms for Semisort and Related Problems

35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Dong, Xiaojun; Wu, Yunshu; Wang, Zhongqi; Dhulipala, Laxman; Gu, Yan; Sun, Yihan (Univ Calif Riverside, Riverside, CA 92521 USA; Univ Maryland, College Pk, MD 20742 USA)

ISBN: (print) 9781450395458

Semisort is a fundamental algorithmic primitive widely used in the design and analysis of efficient parallel algorithms. It takes as input an array of records and a function that extracts a key from each record, and reorders the records so that records with equal keys are contiguous. Since many applications only require collecting equal values, not fully sorting the input, semisort is broadly applicable, e.g., in string algorithms, graph analytics, and geometry processing, among many other domains. However, despite dozens of recent papers that use semisort in their theoretical analysis and the existence of an asymptotically optimal parallel semisort algorithm, most implementations of these parallel algorithms implement semisort using comparison or integer sorting in practice, due to potential performance issues in existing semisort implementations. In this paper, we revisit the semisort problem, with the goal of achieving a high-performance parallel semisort implementation with a flexible interface. Our approach can easily be extended to two related problems, histogram and collect-reduce. Our algorithms achieve strong speedups in practice and, importantly, outperform state-of-the-art parallel sorting and semisorting methods for almost all settings we tested, with varying input sizes, distributions, and key types. On average (geometric mean), our semisort implementation is at least 1.27x faster than the best of the tested baselines. We also test two important applications with real-world data and show that our algorithms improve performance (by up to 2.13x) over existing approaches. We believe that many other parallel algorithm implementations can be accelerated using our results.

Keywords: Semisort; Collect-Reduce; Histogram; Sorting; Group-By; Parallel Algorithms; Shared-Memory Parallelism
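Semisort only promises that records with equal keys end up contiguous, with no particular order between groups, which is a weaker contract than sorting. The sequential hash-based sketch below shows that contract together with the two related problems named in the abstract, histogram and collect-reduce; it is not the paper's parallel algorithm, and the function names are hypothetical.

```python
from collections import Counter


def semisort(records, key):
    """Reorder records so that records with equal keys are contiguous."""
    buckets = {}
    for r in records:
        buckets.setdefault(key(r), []).append(r)
    # Groups appear in first-seen order; keys need only be hashable, not comparable.
    return [r for group in buckets.values() for r in group]


def histogram(records, key):
    """Count how many records share each key."""
    return Counter(key(r) for r in records)


def collect_reduce(records, key, value, combine):
    """Combine the values of all records that share a key."""
    out = {}
    for r in records:
        k = key(r)
        out[k] = combine(out[k], value(r)) if k in out else value(r)
    return out


words = ["pear", "fig", "plum", "kiwi", "apple"]
print(semisort(words, key=len))  # ['pear', 'plum', 'kiwi', 'fig', 'apple']
```

Note that the length-4 group precedes the length-3 group: the output is grouped but not sorted, which is exactly the freedom that lets semisort avoid the cost of full sorting.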

Modern Parallel Algorithms

48th International Symposium on Mathematical Foundations of Computer Science, MFCS 2023

Author: Czumaj, Artur (University of Warwick, Coventry, United Kingdom)

ISBN: (print) 9783959772921

Recent advances in the design of efficient parallel algorithms have largely focused on the now-classical model of parallel computing called Massively Parallel Computation (MPC), which follows the framework of MapReduce systems. In this talk, we will survey recent advances in the design of algorithms for graph problems in the MPC model and mention some interesting open questions in this area. © Artur Czumaj

Keywords: Parallel Algorithms

Parallel Algorithms for Simulation of the Suspension Transport in Coastal Systems Based on the Explicit-Implicit and Splitting Schemes

17th International Scientific Conference on Parallel Computational Technologies, PCT 2023

Authors: Sukhinov, A.I.; Chistyakov, A.E.; Sidoryakina, V.V.; Kuznetsova, I. Yu.; Atayan, A.M.; Porksheyan, M.V. (Don State Technical University, Rostov-on-Don, Russia; Southern Federal University, Rostov-on-Don, Russia)

ISBN: (print) 9783031388637

We consider two difference schemes that describe the convective-diffusion transfer and settling of multifractional suspensions in coastal systems. The first is an explicit-implicit scheme with a reduced number of arithmetic operations. This difference scheme uses an explicit approximation of the diffusion-convection operator (on the lower time layer) along the horizontal directions and an implicit approximation along the vertical direction. We determine the admissible values of the time step for this scheme from the conditions of monotonicity, solvability, and stability. We consider this scheme, which naturally leads to a parallel algorithm, appropriate for grids with a relatively moderate number of nodes (up to several hundred) along each horizontal direction. The admissible value of the time step in this case is in the interval from s to 1 s. The second is an additive scheme obtained by splitting the original three-dimensional spatial problem into a chain of two-dimensional problems in the horizontal directions and a one-dimensional problem in the vertical direction. In this case, the allowable time step can be increased to several hundred seconds. We consider in detail the parallel implementation, based on decomposition of the grid domain, of the set of two-dimensional diffusion-convection problems included in the chain. The speedup of the algorithm was estimated on the K60 computer cluster installed at the Keldysh Institute of Applied Mathematics (Russian Academy of Sciences). © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Parallel Algorithms
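In the explicit-implicit scheme, only the vertical part of the operator is treated implicitly, so each time step requires solving a tridiagonal system along every vertical grid column; since the implicit part does not couple neighboring columns, this is one way to see why the scheme parallelizes naturally. The sketch below rests on simplifying assumptions: pure vertical diffusion with a constant coefficient, backward Euler in time, and zero-flux boundaries, solved by the Thomas algorithm. It omits convection, settling, and the explicit horizontal terms, and the names `thomas` and `implicit_vertical_step` are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal, b = diagonal, c = super-diagonal."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_vertical_step(column, r):
    """One backward-Euler diffusion step on a vertical column, r = D*dt/dz**2.

    Zero-flux boundaries at top and bottom (assumes at least two nodes).
    """
    n = len(column)
    a = [0.0] + [-r] * (n - 1)
    c = [-r] * (n - 1) + [0.0]
    b = [1 + r] + [1 + 2 * r] * (n - 2) + [1 + r]
    return thomas(a, b, c, column)


# Each column is independent, so columns can be distributed across workers.
columns = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
new = [implicit_vertical_step(col, r=0.5) for col in columns]
print([round(sum(col), 12) for col in new])  # [1.0, 5.0]: mass is conserved
```

Every column of this system matrix sums to 1, so the zero-flux step conserves the total amount of suspended matter in each column, and the backward-Euler step stays stable for any r > 0.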

Parallel Algorithms Align with Neural Execution
arXiv, 2023

Authors: Engelmayer, Valerie; Georgiev, Dobrik; Veličković, Petar (University of Augsburg, Germany; University of Cambridge, United Kingdom; Google DeepMind, United Kingdom)

Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms, however, may exploit their full computational power and therefore require fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting, and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, the parallel versions achieve (often strongly) superior predictive performance. © 2023, CC BY.

Keywords: Parallel Algorithms
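A standard way to see why parallel algorithms need fewer execution steps for sorting is odd-even transposition sort: within each round, all compare-exchange operations touch disjoint pairs and can happen at once, so n rounds suffice for n elements, whereas a sequential executor performs the same comparisons one at a time. The sketch below counts both quantities; it is an illustrative assumption, not necessarily the parallel sorting algorithm used in the paper, and `odd_even_sort` is a hypothetical name.

```python
def odd_even_sort(values):
    """Odd-even transposition sort; returns (sorted list, rounds, comparisons)."""
    a = list(values)
    n = len(a)
    comparisons = 0
    for rnd in range(n):  # n rounds are enough to sort n elements
        # All pairs in one round are disjoint, so they could run in one parallel step.
        for i in range(rnd % 2, n - 1, 2):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a, n, comparisons


print(odd_even_sort([5, 1, 4, 2, 3]))  # ([1, 2, 3, 4, 5], 5, 10)
```

In general this is n parallel rounds versus n(n − 1)/2 sequential compare-exchange steps; if each parallel round maps to one layer of execution, the step count drops from quadratic to linear.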
