Search Results - Inner Mongolia University Library

PARALLEL ALGORITHMS FOR TENSOR TRAIN ARITHMETIC

SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2022, Vol. 44, No. 1, pp. C25-C53

Authors: Al Daas, Hussam; Ballard, Grey; Benner, Peter. Max Planck Inst Dynam Complex Tech Syst, Dept Computat Methods Syst & Control Theory, D-39106 Magdeburg, Germany; Wake Forest Univ, Comp Sci Dept, Winston Salem, NC 27106, USA

We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthonormalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we propose a data distribution and strategy that parallelizes computations for individual cores within the TT format. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2GB TT tensor, and our implementation achieves a 34x speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors up to over 10,000 cores for all mathematical operations.
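As a reading aid for the TT format named in the abstract, here is a minimal sequential sketch in Python. It shows the standard representation, in which a tensor entry is a product of matrix slices of the cores, and the textbook TT addition that concatenates cores. Addition sums the ranks, which is why rounding is needed afterwards. This is not the authors' distributed-memory implementation. The nested-list core layout `core[a][i][b]` and the names `tt_entry` and `tt_add` are assumptions made for illustration, and the sketch assumes at least two cores.

```python
from itertools import product

# Assumed layout: a TT tensor is a list of d >= 2 cores. Core k is a nested
# list indexed as core[a][i][b] with shape (r_{k-1}, n_k, r_k), r_0 = r_d = 1.

def tt_entry(cores, index):
    """Evaluate one entry X[i_1, ..., i_d] as a product of core slices."""
    vec = [1.0]  # row vector of length r_0 = 1
    for core, i in zip(cores, index):
        r_right = len(core[0][i])
        vec = [sum(vec[a] * core[a][i][b] for a in range(len(core)))
               for b in range(r_right)]
    return vec[0]

def tt_add(x, y):
    """Return TT cores representing x + y; the TT ranks of x and y add up."""
    d = len(x)
    result = []
    for k, (cx, cy) in enumerate(zip(x, y)):
        n = len(cx[0])
        rx_r, ry_r = len(cx[0][0]), len(cy[0][0])
        if k == 0:
            # First core: concatenate along the right rank.
            core = [[cx[0][i] + cy[0][i] for i in range(n)]]
        elif k == d - 1:
            # Last core: concatenate along the left rank.
            core = [[list(row) for row in block] for block in cx + cy]
        else:
            # Middle core: block-diagonal arrangement of the two cores.
            core = [[cx[a][i] + [0.0] * ry_r for i in range(n)]
                    for a in range(len(cx))]
            core += [[[0.0] * rx_r + cy[a][i] for i in range(n)]
                     for a in range(len(cy))]
        result.append(core)
    return result

# Two small 2 x 2 x 2 tensors with TT ranks (1, 2, 2, 1) and (1, 1, 1, 1).
x = [
    [[[1.0, 2.0], [0.5, -1.0]]],
    [[[1.0, 0.0], [2.0, 1.0]], [[0.0, 1.0], [1.0, 3.0]]],
    [[[1.0], [2.0]], [[-1.0], [0.5]]],
]
y = [
    [[[2.0], [3.0]]],
    [[[1.0], [-1.0]]],
    [[[0.5], [4.0]]],
]
z = tt_add(x, y)

# Every entry of z equals the sum of the corresponding entries of x and y.
for idx in product(range(2), repeat=3):
    assert abs(tt_entry(z, idx) - (tt_entry(x, idx) + tt_entry(y, idx))) < 1e-12
```

In this sketch the middle core of `z` has left and right rank 3 (2 + 1), illustrating the rank growth that rank truncation is meant to undo.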

Keywords: low-rank tensor format; tensor train; parallel algorithms; QR; SVD

Are Parallel Algorithms Ready for Prime Time?

35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Author: Blelloch, Guy E. Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: (print) 9781450395458

I've spent my career trying to make parallel algorithms accessible to the masses, working from the programming language, systems and algorithms sides. For much of this time, unfortunately, parallel machines were not ready for prime time. They were expensive, hard to access, quirky and there was a lack of software support. Parallel algorithms and programming were reserved for a small cadre of experts. However, with advances over the past fifteen or so years we have gone from a situation where all commodity machines had a single processor to one in which all but perhaps a toaster has multiple processors (cores), some with hundreds+. Given the state of modern machines one should wonder whether we are at a point where parallel algorithms are ready for prime time and can replace sequential algorithms. Or perhaps a better question is whether we are at a point where algorithms should be algorithms, some with more parallelism than others? In the talk I argued that parallel algorithms are indeed ready for prime time. In particular, they supply useful abstractions, are broadly applicable, support general techniques, lead to interesting theoretical questions, are elegant, are easy to program, rely on a simple cost model, and importantly can lead to good efficiency on modern multicore machines, very much more so than sequential algorithms. As some evidence, I described our experience implementing a set of 60+ parallel algorithms across a wide set of domains.(1) With the right abstractions and techniques, the code is rarely significantly more complicated than the sequential counterpart, and sometimes simpler. Furthermore, the algorithms often get near perfect speedup relative to good sequential algorithms on a modern multicore. As such, it could be that the limiting factor in broadly adopting parallel algorithms is now more a social one than a technical one.

Keywords: parallel algorithms

Parallel algorithms for the creation of medical database

2nd International Scientific Conference on Metrological Support of Innovative Technologies, ICMSIT II-2021

Authors: Rakhimov, Bakhtiyar S.; Mekhmanov, Mukhiddin S.; Bekchanov, Bakhtiyar G. Urgench Branch of Tashkent Medical Academy, Urgench, Uzbekistan

Analysis of parallel algorithms for graphics processors allows you to determine the bottlenecks of an algorithm that affect its performance on a particular computing system. Algorithms can be analyzed both at the AGM level and at the level of the CPU-GPU system. The model does not reflect the overlap between computational operations and accesses to global memory, and only estimates an upper limit on the execution time of the algorithm on the AGM. Even so, any parallel algorithm for the AGM needs to reduce this parameter, because the multiprocessor spends a large number of clock cycles on global memory access operations. © Published under licence by IOP Publishing Ltd.

Keywords: parallel algorithms

Accelerating Large-Scale Sorting through Parallel Algorithms

Journal of Computer and Communications, 2024, Vol. 12, No. 1, pp. 131-138

Authors: Yahya Alhabboub; Fares Almutairi; Mohammed Safhi; Yazan Alqahtani; Adam Almeedani; Yasir Alguwaifli. College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

This study explores the application of parallel algorithms to large-scale sorting, focusing on the QuickSort method. QuickSort is implemented in both sequential and parallel forms, and the paper provides a detailed comparison of their performance. The efficacy of both techniques is investigated through the lens of array generation and pivot selection to manage datasets of varying sizes. Running times measured with the C++ chrono library were 16,499.2 milliseconds for the serial implementation and 16,339 milliseconds for the parallel implementation when sorting an array. These results suggest that while the performance gains of the parallel approach over its serial counterpart are not immediately pronounced for smaller datasets, the benefits are expected to be more substantial as the dataset size increases.
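To make the serial-versus-parallel comparison concrete, the sketch below times a sequential QuickSort against a task-parallel variant. It is written in Python rather than the paper's C++, and it is not the authors' code: the middle-element pivot, the `depth` cutoff, and the function names are assumptions for illustration. Because CPython threads share the global interpreter lock, this version shows the task structure rather than a real speedup.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def partition(a):
    """Three-way partition around the middle element (assumed pivot rule)."""
    pivot = a[len(a) // 2]
    less = [v for v in a if v < pivot]
    equal = [v for v in a if v == pivot]
    greater = [v for v in a if v > pivot]
    return less, equal, greater

def quicksort(a):
    """Sequential QuickSort returning a new sorted list."""
    if len(a) <= 1:
        return list(a)
    less, equal, greater = partition(a)
    return quicksort(less) + equal + quicksort(greater)

def parallel_quicksort(a, pool, depth=2):
    """Sort the two partitions as concurrent tasks for the first `depth` levels.

    Below that depth the sequential version is used, so only a few tasks are
    created. The pool needs more workers than the number of tasks that wait
    on subtasks (one such task for depth=2).
    """
    if depth == 0 or len(a) <= 1:
        return quicksort(a)
    less, equal, greater = partition(a)
    left = pool.submit(parallel_quicksort, less, pool, depth - 1)
    right = parallel_quicksort(greater, pool, depth - 1)
    return left.result() + equal + right

data = [random.randrange(1_000_000) for _ in range(20_000)]

start = time.perf_counter()
serial = quicksort(data)
serial_ms = (time.perf_counter() - start) * 1000

with ThreadPoolExecutor(max_workers=4) as pool:
    start = time.perf_counter()
    parallel = parallel_quicksort(data, pool)
    parallel_ms = (time.perf_counter() - start) * 1000

assert serial == parallel == sorted(data)
print(f"serial: {serial_ms:.1f} ms, parallel: {parallel_ms:.1f} ms")
```

The `depth` cutoff reflects a common design choice: spawning tasks only near the top of the recursion keeps scheduling overhead small relative to the work per task, which matters most on small inputs.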

Keywords: Sorting Algorithm; Quick Sort; QuickSort Parallel; Parallel Algorithms

Implementation of the computer tomography parallel algorithms with the incomplete set of data

PEERJ COMPUTER SCIENCE, 2021, Vol. 7, pp. e339-e339

Author: Pleszczynski, Mariusz. Silesian Tech Univ Gliwice, Fac Appl Math, Gliwice, Slaskie, Poland

Computer tomography has a wide field of applicability; however, most of its applications assume that the data obtained from the scans of the examined object satisfy the expectations regarding their amount and quality. Unfortunately, sometimes such data cannot be obtained, and we then deal with an incomplete set of data. In the paper we consider an unusual case of such a situation, which may occur when access to the examined object is difficult. Previous research conducted by the author showed that CT algorithms can be used successfully in this case as well, but the reconstruction time is problematic. One possibility for reducing the reconstruction time is to execute the calculations in parallel. In the analyzed approach the system of linear equations is divided into blocks, such that each block is operated on by a different thread. Until now such investigations were performed only theoretically. In the current paper the usefulness of the parallel-block approach proposed by the author is examined. The conducted research has shown that, for an incomplete data set as well, it is possible to select optimal values of the reconstruction parameters in the analyzed algorithm. We can also obtain (for a given number of pixels) a reconstruction with a given maximum error. The paper indicates the differences between the classical and the examined problem of CT. The obtained results confirm that the real implementation of the parallel algorithm is also convergent, which means it is useful.

Keywords: Computer tomography; Parallel algorithms; Incomplete set of data; Big Data; Signal and data processing

Fast parallel algorithms for finding the longest flow paths in flow direction grids

ENVIRONMENTAL MODELLING & SOFTWARE, 2023, Vol. 167, No. 1

Authors: Kotyra, Bartlomiej; Chabudzinski, Lukasz. Marie Curie Sklodowska Univ, Inst Comp Sci, ul Akad 9, PL-20033 Lublin, Poland; Marie Curie Sklodowska Univ, Inst Earth & Environm Sci, Al Krasnicka 2d, PL-20718 Lublin, Poland

In hydrological modeling, the longest flow path is an important feature used to characterize a catchment. Many existing GIS platforms offer dedicated software tools for its identification and delineation, generally implementing methods based on searching through the flow direction data. Unfortunately, currently available algorithms for this task often turn out to be inefficient, especially when working with modern large datasets. Moreover, existing methods often rely on incorrect assumptions or perform calculations in a way that can lead to precision issues. In this work, new parallel algorithms were developed, tested and presented. Measurements show that two of the newly proposed implementations are able to identify the longest flow paths in significantly less time compared with other existing methods.

Keywords: Longest flow path; GIS; Hydrology; Parallel algorithms; High-performance computing; OpenMP

Parallel algorithms for maximizing monotone one-sided σ-smooth functions

arXiv, 2022

Authors: Zhang, Hongxiang; Cheng, Yukun; Wu, Chenchen; Xu, Dachuan; Du, Dingzhu. Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing 100124, China; School of Business, Suzhou University of Science and Technology, Suzhou 215009, China; College of Science, Tianjin University of Technology, Tianjin 300384, China; Department of Computer Science, University of Texas at Dallas, Dallas 75083, United States

In this paper, we study the problem of maximizing a monotone normalized one-sided σ-smooth (OSS for short) function F(x), subject to a convex polytope (which need not be downward-closed [1]). A function F(x) is one-sided σ-smooth if (Equation presented), for all x, u ≥ 0, x ≠ 0. This problem was first introduced by Mehrdad et al. [1] to characterize the multilinear extension of some set functions. In contrast to the serial Jump-Start Continuous Greedy (JSCG for short) algorithm of Mehrdad et al. [1], we propose the Jump-Start Parallel Greedy (JSPG for short) algorithm, the first parallel algorithm for this problem. The approximation ratio of the JSPG algorithm is proved to be (Equation presented) for any α ∈ (0, 1] and ε > 0, which improves the approximation ratio of the JSCG algorithm in [1]. We also prove that our JSPG algorithm runs in O(log n/ε²) adaptive rounds and consumes O(n log n/ε²) queries, where the number of adaptive rounds and function evaluation queries approximately matches the known results for parallel submodular maximization. In addition, we study the stochastic version of maximizing a monotone normalized OSS function, in which the objective function F(x) is defined as F(x) = E_{y∼T} f(x, y). Here f is a stochastic function with respect to the random variable Y, and y is the realization of Y drawn from a probability distribution T. For this stochastic version, we design the Stochastic Parallel-Greedy (SPG) algorithm, which achieves a result of (Equation presented), with the same time complexity as the JSPG algorithm. Here (Equation presented) is related to the preset parameters σ, L, D and time t. Copyright © 2022, The Authors. All rights reserved.

Keywords: parallel algorithms

Streaming and Massively Parallel Algorithms for Euclidean Max-Cut

arXiv, 2025

Authors: Menand, Nicolas; Waingarten, Erik. University of Pennsylvania, United States

Given a set of vectors X = {x_1, ..., x_n} ⊂ ℝ^d, the Euclidean max-cut problem asks to partition the vectors into two parts so as to maximize the sum of Euclidean distances which cross the partition. We design new algorithms for Euclidean max-cut in models for massive datasets:

• We give a fully-scalable constant-round MPC algorithm using O(nd) + n · poly(log(n)/ε) total space which gives a (1 + ε)-approximate Euclidean max-cut.
• We give a dynamic streaming algorithm using poly(d log ∆/ε) space when X ⊆ [∆]^d, which provides oracle access to a (1 + ε)-approximate Euclidean max-cut.

Recently, Chen, Jiang, and Krauthgamer [STOC'23] gave a dynamic streaming algorithm with space poly(d log ∆/ε) to approximate the value of the Euclidean max-cut, but could not provide oracle access to an approximately optimal cut. This was left open in that work, and we resolve it here. Both algorithms follow from the same framework, which analyzes a "parallel" and "subsampled" (Euclidean) version of a greedy algorithm of Mathieu and Schudy [SODA'08] for dense max-cut. © 2025, CC BY.
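The objective defined in the abstract can be illustrated with a short Python sketch. It computes the value of a cut and runs a plain sequential greedy that visits the points in random order and puts each one on the side that cuts more distance to the points already placed. This is only a simplified baseline in the spirit of the greedy approach the abstract refers to, not the paper's MPC or streaming algorithm. The function names and the `seed` parameter are assumptions for illustration.

```python
import math
import random

def cut_value(points, side):
    """Sum of Euclidean distances over pairs placed on opposite sides."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if side[i] != side[j]:
                total += math.dist(points[i], points[j])
    return total

def greedy_cut(points, seed=0):
    """Place points one at a time, in random order, on the better side."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    side = [None] * len(points)
    for i in order:
        # dist_to[s] = total distance from point i to placed points on side s
        dist_to = [0.0, 0.0]
        for j in range(len(points)):
            if side[j] is not None:
                dist_to[side[j]] += math.dist(points[i], points[j])
        # Joining side 1 cuts the edges to side 0, and vice versa.
        side[i] = 1 if dist_to[0] >= dist_to[1] else 0
    return side

# Corners of the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
side = greedy_cut(points)
total = sum(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
print(cut_value(points, side), "out of total pairwise distance", total)
```

Each point cuts at least half of its distance to the earlier points, so this greedy always achieves at least half of the total pairwise distance. It takes quadratic time, which is the kind of cost the massive-data models in the abstract are meant to avoid.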

Keywords: parallel algorithms

Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering

arXiv, 2024

Authors: Dhulipala, Laxman; Dong, Xiaojun; Gowda, Kishen N.; Gu, Yan. University of Maryland, College Park, MD, United States; University of California, Riverside, CA, United States

Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree T, the SLD of T is a binary dendrogram that summarizes the n − 1 clusterings obtained by contracting the edges of T in order of weight. Existing algorithms for computing the SLD all require Ω(n log n) work where n = |T|. Furthermore, to the best of our knowledge no prior work provides a parallel algorithm obtaining non-trivial speedup for this problem. In this paper, we design faster parallel algorithms for computing SLDs both in theory and in practice based on new structural results about SLDs. In particular, we obtain a deterministic output-sensitive parallel algorithm based on parallel tree contraction that requires O(n log h) work and O(log² n log² h) depth, where h is the height of the output SLD. We also give a deterministic bottom-up algorithm for the problem inspired by the nearest-neighbor chain algorithm for hierarchical agglomerative clustering, and show that it achieves O(n log h) work and O(h log n) depth. Our results are based on a novel divide-and-conquer framework for building SLDs, inspired by divide-and-conquer algorithms for Cartesian trees. Our new algorithms can quickly compute the SLD on billion-scale trees, and obtain up to 150x speedup over the highly-efficient Union-Find algorithm typically used to compute SLDs in practice. © 2024, CC BY.
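The definition of the SLD above can be read as an algorithm: sort the tree edges by weight and merge the two clusters joined by each edge. The Python sketch below implements that sequential Union-Find baseline, which carries the Ω(n log n) sorting cost mentioned in the abstract. It is not the paper's parallel algorithm. The output encoding (leaf ids below n, merge ids from n upward) and the function name are assumptions for illustration.

```python
def single_linkage_dendrogram(n, edges):
    """Build the SLD of a weighted tree on vertices 0..n-1.

    `edges` is a list of (weight, u, v) tuples forming a tree. The result has
    one (weight, left_child, right_child) tuple per merge. Child ids below n
    are tree vertices (leaves); id n + k is the cluster made by the k-th merge.
    """
    parent = list(range(n))   # union-find forest over tree vertices
    node_of = list(range(n))  # dendrogram node id of each union-find root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for weight, u, v in sorted(edges):  # contract edges in order of weight
        ru, rv = find(u), find(v)       # distinct, because the input is a tree
        merges.append((weight, node_of[ru], node_of[rv]))
        parent[ru] = rv
        node_of[rv] = n + len(merges) - 1
    return merges

# Path 0 - 1 - 2 - 3 with edge weights 3, 1, 2.
edges = [(3, 0, 1), (1, 1, 2), (2, 2, 3)]
for merge in single_linkage_dendrogram(4, edges):
    print(merge)
```

For this path the three merges nest inside one another, so the dendrogram has height 3. That is the parameter h in the abstract's output-sensitive bounds.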

Keywords: parallel algorithms

Evaluating the Influence of Graph Characteristics on Parallel Algorithms for Derived Graph Structures

IEEE International Conference on High Performance Computing Workshops (HiPCW)

Authors: Maulein Pathak; Samarth Kapila; Yogish Sabharwal; Neelima Gupta. Dept of Computer Science, University of Delhi, India; Keshav Mahavidyalaya, University of Delhi, India; University Of British Columbia, Vancouver; IBM Research India, India

ISBN: (electronic) 9798331509118

ISBN: (print) 9798331509125

This work investigates how graph characteristics affect the quality of derived graphs, specifically focusing on graph spanners. Graph spanners retain all vertices and a subset of edges while preserving shortest distances with an allowable stretch, making them essential for efficiently approximating graph structures. We emphasize recent advancements in parallel algorithms for constructing spanners in sparse graphs, building on the work of Miller et al. and Forster et al. By extracting key graph properties and employing data analysis techniques—such as correlation analysis, linear regression, and random forest regression—we examine the relationships between these characteristics and the size of the derived graphs, which is vital for optimizing spanner construction in real-world applications.

Keywords: Data analysis; Correlation; High performance computing; Conferences; Linear regression; Buildings; Focusing; Data mining; Parallel algorithms
