Search Results - Inner Mongolia University Library

36th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2024

ISBN: 9798400704161 (print)

The proceedings contain 54 papers. The topics discussed include: expediting hazard pointers with bounded RCU critical sections;Alock: asymmetric lock primitive for RDMA systems;when is parallelism fearless and zero-cost with rust?;efficient parallel reinforcement learning framework using the reactor model;parallel best arm identification in heterogeneous environments;brief announcement: lock-free learned search data structure;brief announcement: LIT: lookup interlocked table for range queries;brief announcement: a fast scalable detectable unrolled lock-based linked list;scheduling out-trees online to optimize maximum flow;optimizing dynamic data center provisioning through speed scaling: a primal-dual perspective;scheduling jobs with work-inefficient parallel solutions;and multi bucket queues: efficient concurrent priority scheduling.


Parallel Minimum Cuts in O(m log² n) Work and Low Depth

ACM Transactions on Parallel Computing, 2023, vol. 10, no. 4, pp. 1-28

Authors: Anderson, Daniel; Blelloch, Guy E. (Carnegie Mellon Univ, 5000 Forbes Ave, Pittsburgh, PA 15213, USA)

We present a randomized O(m log² n) work, O(polylog n) depth parallel algorithm for minimum cut. This algorithm matches the work bounds of a recent sequential algorithm by Gawrychowski, Mozes, and Weimann [ICALP'20], and improves on the previously best parallel algorithm by Geissmann and Gianinazzi [SPAA'18], which performs O(m log⁴ n) work in O(polylog n) depth. Our algorithm makes use of three components that might be of independent interest. First, we design a parallel data structure that efficiently supports batched mixed queries and updates on trees. It generalizes and improves the work bounds of a previous data structure of Geissmann and Gianinazzi and is work efficient with respect to the best sequential algorithm. Second, we design a parallel algorithm for approximate minimum cut that improves on previous results by Karger and Motwani. We use this algorithm to give a work-efficient procedure to produce a tree packing, as in Karger's sequential algorithm for minimum cuts. Last, we design an efficient parallel algorithm for solving the minimum 2-respecting cut problem.

Keywords: minimum cut; parallel algorithms; graph algorithms; dynamic trees


Practical Massively Parallel Sorting

27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Axtmann, Michael; Bingmann, Timo; Sanders, Peter; Schulz, Christian (Karlsruhe Inst Technol, Karlsruhe, Germany)

ISBN: 9781450335881 (print)

Previous parallel sorting algorithms do not scale to the largest available machines, since they either have prohibitive communication volume or prohibitive critical path length. We describe algorithms that are a viable compromise and overcome this gap both in theory and practice. The algorithms are multi-level generalizations of the known algorithms sample sort and multiway mergesort. In particular, our sample sort variant turns out to be very scalable both in theory and practice, where it scales up to 2^15 MPI processes with outstanding performance, in particular for medium-sized inputs. Some tools we develop may be of independent interest: a simple, practical, and flexible sorting algorithm for very small inputs, a near linear time optimal algorithm for solving a constrained bin packing problem, and an algorithm for data delivery that guarantees a small number of message startups on each processor.
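
For readers unfamiliar with sample sort, the following is a minimal single-level sketch, simulated sequentially in Python. It is not the multi-level algorithm of the paper: the function name `sample_sort` and the parameters `num_buckets` and `oversampling` are illustrative assumptions, and each bucket stands in for the data one process would sort locally.

```python
import random
from bisect import bisect_right

def sample_sort(data, num_buckets=4, oversampling=8, seed=0):
    """Single-level sample sort, simulated sequentially (illustrative only)."""
    # Tiny inputs are not worth sampling; sort them directly.
    if len(data) <= num_buckets * oversampling:
        return sorted(data)
    rng = random.Random(seed)
    # Draw a random sample and take evenly spaced splitters from it.
    sample = sorted(rng.sample(data, num_buckets * oversampling))
    splitters = [sample[i * oversampling] for i in range(1, num_buckets)]
    # Route each element to the bucket whose key range contains it.
    buckets = [[] for _ in range(num_buckets)]
    for x in data:
        buckets[bisect_right(splitters, x)].append(x)
    # Each bucket would be sorted by one process; concatenate in bucket order.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because the splitters are sorted, concatenating the locally sorted buckets in order yields a globally sorted sequence; oversampling only affects how evenly the buckets are filled.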

Keywords: parallel sorting; multiway mergesort; sample sort


Space and Time Efficient Parallel Graph Decomposition, Clustering, and Diameter Approximation

27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Ceccarello, Matteo; Pietracaprina, Andrea; Pucci, Geppino; Upfal, Eli (Univ Padua, Dept Informat Engn, Padua, Italy; Brown Univ, Dept Comp Sci, Providence, RI 02912, USA)

ISBN: 9781450335881 (print)

We develop a novel parallel decomposition strategy for unweighted, undirected graphs, based on growing disjoint connected clusters from batches of centers progressively selected from yet uncovered nodes. With respect to similar previous decompositions, our strategy exercises a tighter control on both the number of clusters and their maximum radius. We present two important applications of our parallel graph decomposition: (1) k-center clustering approximation; and (2) diameter approximation. In both cases, we obtain algorithms which feature a polylogarithmic approximation factor and are amenable to a distributed implementation that is geared for massive (long-diameter) graphs. The total space needed for the computation is linear in the problem size, and the parallel depth is substantially sublinear in the diameter for graphs with low doubling dimension. To the best of our knowledge, ours are the first parallel approximations for these problems which achieve sub-diameter parallel time, for a relevant class of graphs, using only linear space. Besides the theoretical guarantees, our algorithms allow for a very simple implementation on clustered architectures: we report on extensive experiments which demonstrate their effectiveness and efficiency on large graphs as compared to alternative known approaches.
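
The general idea of growing disjoint clusters from batches of centers can be sketched as follows. This is a simplified sequential illustration, not the paper's algorithm: the abstract does not give the batch schedule or the number of growth steps, so `batch_size` and `steps_per_round` are hypothetical parameters, and the graph is assumed to be an adjacency dictionary with comparable node ids.

```python
import random

def grow_clusters(adj, batch_size=1, steps_per_round=1, seed=0):
    """Map every node of an undirected graph to the center of its cluster."""
    rng = random.Random(seed)
    owner = {}      # node -> center of the cluster that covers it
    frontier = []   # nodes covered during the most recent growth step
    while len(owner) < len(adj):
        # Select a new batch of centers among the nodes not yet covered.
        uncovered = sorted(v for v in adj if v not in owner)
        for c in rng.sample(uncovered, min(batch_size, len(uncovered))):
            owner[c] = c
            frontier.append(c)
        # Grow all existing clusters by a few breadth-first steps.
        for _ in range(steps_per_round):
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if v not in owner:
                        owner[v] = owner[u]
                        next_frontier.append(v)
            frontier = next_frontier
    return owner
```

Every node joins a cluster through a neighbor already in that cluster, so clusters stay connected and disjoint; in this sketch, larger batches trade more clusters for smaller radii.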

Keywords: parallel graph algorithms; graph decomposition; k-center problem; diameter approximation; MapReduce


Exploiting flexibly assignable work to improve load balance

Fourteenth Annual ACM Symposium on Parallel Algorithms and Architectures

Authors: Pinar, Ali; Hendrickson, Bruce (NERSC, Lawrence Berkeley National Lab., Berkeley, CA 94720, United States; Parallel Computing Sciences Dept., Sandia National Laboratories, Albuquerque, NM 87185-1110, United States)

ISBN: 9781581135299 (print)

In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in assignment of tasks to improve load balance. We first model the problem in terms of network flow and use combinatorial techniques for its solution. Our parametric search algorithms use maximum flow algorithms for probing on a candidate optimal solution value. We describe two algorithms to solve the assignment problem with log W_T and |P| probe calls, where W_T and |P|, respectively, denote the total workload and number of processors. We also define augmenting paths and cuts for this problem, and show that any algorithm based on augmenting paths can be used to find an optimal solution for the task assignment problem. We then consider a continuous version of the problem, and formulate it as a linearly constrained optimization problem, i.e., min ‖Ax‖∞, s.t. Bx = d. To avoid solving an intractable ∞-norm optimization problem, we show that in this case minimizing the 2-norm is sufficient to minimize the ∞-norm, which reduces the problem to the well-studied linearly-constrained least squares problem. The continuous version of the problem has the advantage of being easily amenable to parallelization.

Keywords: algorithms


Automatic parallel I/O performance optimization in Panda

Proceedings of the 1998 10th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA

Authors: Chen, Y.; Winslett, M.; Cho, Y.; Kuo, S. (Univ of Illinois)

ISBN: 9780897919890 (print)

Parallel I/O systems typically consist of individual processors, communication networks, and a large number of disks. Managing and utilizing these resources to meet performance, portability and usability goals of applications has become a significant challenge. We believe that a parallel I/O system that automatically selects efficient I/O plans for user applications is a solution to this problem. In this paper, we present such an automatic performance optimization approach for scientific applications performing collective I/O requests on multidimensional arrays. Under our approach, an optimization engine in a parallel I/O system selects optimal I/O plans automatically without human intervention based on a description of the application I/O requests and the system configuration. To validate our hypothesis, we have built an optimizer that uses rule-based and randomized search-based algorithms to select optimal parameter settings in Panda, a parallel I/O library for multidimensional arrays. Our performance results obtained from two IBM SPs with significantly different configurations show that the Panda optimizer is able to select high-quality I/O plans and deliver high performance under a variety of system configurations.

Keywords: parallel algorithms


Simple randomized mergesort on parallel disks

Proceedings of the 1996 8th Annual ACM Symposium on Parallel Algorithms and Architectures

Authors: Barve, Rakesh D.; Grove, Edward F.; Vitter, Jeffrey Scott (Duke Univ, Durham, NC, United States)

ISBN: 9780897918091 (print)

We consider the problem of sorting a file of N records on the D-disk model of parallel I/O [VS94] in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the forecasting technique of [Knu73], our algorithm is able to read in, at any time, the 'right' block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the 'right' blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. Our analysis technique involves a novel reduction to various maximum occupancy problems. We prove that the expected I/O performance of SRM is efficient under varying sizes of memory and that it compares favorably in practice to disk-striped mergesort (DSM). Our studies indicate that SRM outperforms DSM even when the number D of parallel disks is fairly small.

Keywords: parallel processing systems


Localized algorithm for parallel association mining

Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA

Authors: Zaki, Mohammed Javeed; Parthasarathy, Srinivasan; Li, Wei (Univ of Rochester, Rochester, United States)

Discovery of association rules is an important database mining problem. Mining for association rules involves extracting patterns from large databases and inferring useful rules from them. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the commonly occurring patterns or itemsets (sets of items), thus incurring high I/O overhead. In the parallel case, these algorithms do a reduction at the end of each pass to construct the global patterns, thus incurring high synchronization cost. In this paper we describe a new parallel association mining algorithm. Our algorithm is a result of detailed study of the available parallelism and the properties of associations. The algorithm uses a scheme to cluster related frequent itemsets together, and to partition them among the processors. At the same time it also uses a different database layout which clusters related transactions together, and selectively replicates the database so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithm eliminates the need for further communication or synchronization. The algorithm further scans the local database partition only three times, thus minimizing I/O overheads. Unlike previous approaches, the algorithm uses simple intersection operations to compute frequent itemsets and doesn't have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster inter-connected by the Memory Channel network. We present results on the performance of our algorithm on various databases, and compare it against a well known parallel algorithm. Our algorithm outperforms it by more than an order of magnitude.
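
To illustrate how intersection operations can replace hash-structure lookups when counting itemset support, here is a small sequential sketch. It assumes a vertical layout in which each item maps to the set of transaction ids containing it; that layout, and the names `vertical_layout` and `frequent_pairs`, are assumptions made for this example and are not taken from the abstract, which does not spell out the paper's data layout or partitioning scheme.

```python
from itertools import combinations

def vertical_layout(transactions):
    """Map each item to the set of ids of the transactions containing it."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists

def frequent_pairs(transactions, min_support):
    """Return {(a, b): support} for item pairs meeting min_support."""
    tidlists = vertical_layout(transactions)
    # Only frequent single items can be part of a frequent pair.
    frequent_items = sorted(
        item for item, tids in tidlists.items() if len(tids) >= min_support
    )
    result = {}
    for a, b in combinations(frequent_items, 2):
        # The support of {a, b} is the size of the id-set intersection.
        common = tidlists[a] & tidlists[b]
        if len(common) >= min_support:
            result[(a, b)] = len(common)
    return result
```

Once the id sets are built, no further pass over the transactions is needed: larger itemsets could be counted the same way by intersecting the id sets of their subsets.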

Keywords: parallel algorithms


SPAA'12 - Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures

24th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA'12

ISBN: 9781450312134 (print)

The proceedings contain 40 papers. The topics discussed include: time vs. space trade-offs for rendezvous in trees;allowing each node to communicate only once in a distributed system: shared whiteboard models;optimal and competitive runtime bounds for continuous, local gathering of mobile robots;online multi-robot exploration of grid graphs with rectangular obstacles;in search of parallel dimensions;delegation and nesting in best-effort hardware transactional memory;design, verification and applications of a new read-write lock algorithm;a lock-free B+tree;brief announcement: the problem based benchmark suite;brief announcement: subgraph isomorphism on a multithreaded shared memory architecture;efficient cache oblivious algorithms for randomized divide-and-conquer on the multicore model;a scalable framework for heterogeneous GPU-based clusters;and faster and simpler width-independent parallel algorithms for positive semidefinite programming.


Efficient low-contention parallel algorithms

Journal of Computer and System Sciences, 1996, vol. 53, no. 3, pp. 417-442

Authors: Gibbons, PB; Matias, Y; Ramachandran, V (Univ Texas, Dept Comp Sci, Austin, TX 78712, USA)

The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type noncombining networks. This paper describes fast, low-contention, work-optimal, randomized QRQW PRAM algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known EREW PRAM algorithms for these problems, while avoiding the high-contention steps typical of CRCW PRAM algorithms. An illustrative experiment demonstrates the performance advantage of a new QRQW random permutation algorithm when compared with the popular EREW algorithm. Finally, this paper presents new randomized algorithms for integer sorting and general sorting. (C) 1996 Academic Press, Inc.

Keywords: parallel algorithms
