Search Results - Inner Mongolia University Library

Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Williams, Ryan (Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States)

ISBN: (print) 9781581139860

We study the relatively old problem of asymptotically reducing the runtime of serial computations with polynomial-size Boolean circuits. To the best of our knowledge, no progress on this problem has been formally reported in the literature for general computational models, although we observe that early work of Chandra, Stockmeyer, and Vishkin implies the existence of non-uniform unbounded fan-in circuits of $t^{O(1)}$ size and $O(t/\log\log n)$ depth for time-$t$ Turing machines. We give an algorithmic size-depth tradeoff for parallelizing time-$t$ random access Turing machines, a model at least as powerful as logarithmic-cost RAMs. Our parallel simulation yields logspace-uniform Boolean circuits of $t^{O(1)}$ size and $O(t/\log t)$ depth with semi-unbounded fan-in gates. In fact, for appropriate $d$, uniform circuits of size $t^{O(1)} 2^{O(t/d)}$ and depth $O(d)$ can simulate time $t$. One corollary is that any log-cost time-$t$ RAM can be simulated by a log-cost CRCW PRAM using $t^{O(1)}$ processors and $O(t/\log t)$ time. This is a major improvement over previous parallel speedups, which could only guarantee an $\Omega(\log t)$ speedup with an exponential number of processors. Copyright 2005 ACM.

Keywords: parallel processing systems

BSP versus LogP

Algorithmica, 1999, Vol. 24, No. 3-4, pp. 405-421

Authors: Bilardi, G; Herley, KT; Pietracaprina, A; Pucci, G; Spirakis, P (Univ Padua, Dipartimento Elettron & Informat, I-35131 Padua, Italy; Univ Illinois, Dept Elect Engn & Comp Sci, Chicago, IL 60607, USA; Natl Univ Ireland, Univ Coll Cork, Dept Comp Sci, Cork, Ireland; Comp Technol Inst, GR-26110 Patras, Greece)

A quantitative comparison of the BSP and LogP models of parallel computation is developed. We concentrate on a variant of LogP that disallows the so-called stalling behavior, although issues surrounding the stalling phenomenon are also explored. Very efficient cross simulations between the two models are derived, showing their substantial equivalence for algorithmic design guided by asymptotic analysis. It is also shown that the two models can be implemented with similar performance on most point-to-point networks. In conclusion, within the limits of our analysis, which is mainly asymptotic in nature, BSP and (stall-free) LogP can be viewed as closely related variants within the bandwidth-latency framework for modeling parallel computation. BSP seems somewhat preferable due to its greater simplicity and portability, and slightly greater power. LogP lends itself more naturally to multiuser mode.
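As a reminder of what the two models charge, their textbook cost formulas can be written down directly: a BSP superstep costs w + h·g + l, and under LogP a processor issuing h messages pays the gap between successive sends plus overhead and latency for the last one. The sketch below is a minimal illustration assuming these standard formulas (parameter names follow the usual L, o, g convention, with l for the BSP barrier cost, and the LogP case assumes h distinct receivers); it is not the cross simulation developed in the paper.

```python
def bsp_superstep_cost(w: float, h: int, g: float, l: float) -> float:
    """BSP cost of one superstep: local work w, an h-relation delivered
    at g time units per message, plus barrier synchronization cost l."""
    return w + h * g + l


def logp_send_h(h: int, L: float, o: float, g: float) -> float:
    """LogP time until the last of h messages, sent back-to-back by one
    processor to h distinct receivers, has been absorbed (no stalling).

    Successive sends start max(g, o) apart; the last message then pays the
    sender overhead o, the network latency L and the receiver overhead o."""
    if h <= 0:
        return 0.0
    return (h - 1) * max(g, o) + 2 * o + L


if __name__ == "__main__":
    # Hypothetical machine parameters, chosen only for illustration.
    print(bsp_superstep_cost(w=100, h=8, g=4, l=50))  # 100 + 32 + 50
    print(logp_send_h(h=8, L=20, o=2, g=4))           # 7*4 + 2*2 + 20
```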

Keywords: models of computation; parallel computation; bridging models; portability; BSP model; LogP model

In-place techniques for parallel convex hull algorithms

3rd Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA 1991

Authors: Ghouse, Mujtaba R.; Goodrich, Michael T. (Dept. of Computer Science, Johns Hopkins University, Baltimore, MD 21218-2686, United States)

ISBN: (print) 0897914384

We present a number of efficient parallel algorithms for constructing 2- and 3-dimensional convex hulls on a randomized CRCW PRAM. Specifically, we show how to build the convex hull of $n$ pre-sorted points in the plane almost surely in $O(1)$ time using $O(n \log n)$ processors or, alternatively, almost surely in $O(\log^* n)$ time using an optimal number of processors. We also show how to find the convex hull of $n$ unsorted points in $\mathbb{R}^2$ (resp., $\mathbb{R}^3$) in $O(\log^2 n)$ time using $O(n \log h)$ work (resp., $O(\log^2 n)$ time using $O(\min\{n \log^2 h, n \log n\})$ work), with very high probability, where $h$ is the number of edges in the convex hull ($h$ is $O(n)$, but can be as small as $O(1)$). Our algorithms for unsorted input depend on new in-place procedures, that is, procedures that are defined on a subset of the input elements and that work without reordering the input. For the pre-sorted case we also exploit a technique that allows one to modify an algorithm that assumes it is given points so that it can be used on hulls; we call such algorithms point-hull invariant. © 1991 ACM.
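For intuition about the pre-sorted planar case, a sequential baseline helps: once points are sorted by x-coordinate, Andrew's monotone-chain method builds the hull in linear time with two stack sweeps. The sketch below is that standard sequential algorithm, not the paper's randomized CRCW PRAM procedure; the function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_presorted(pts: List[Point]) -> List[Point]:
    """Convex hull of points already sorted by (x, y), in counter-clockwise
    order starting from the leftmost point. Collinear boundary points are dropped."""
    if len(pts) <= 2:
        return list(pts)
    lower: List[Point] = []
    for p in pts:  # left-to-right sweep builds the lower chain
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # right-to-left sweep builds the upper chain
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    pts = sorted([(0, 0), (1, 1), (2, 0), (2, 2), (0, 2)])
    print(hull_presorted(pts))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```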

Keywords: parallel algorithms

Constructing trees in parallel

1st Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA 1989

Authors: Atallah, M.J.; Kosaraju, S.R.; Larmore, L.L.; Miller, G.L.; Teng, S.-H. (Department of Computer Science, Purdue University, United States; Department of Computer Science, Johns Hopkins University, United States; ICS, UC Irvine, United States; School of Computer Science, CMU, United States; Department of Computer Science, USC, United States)

ISBN: (print) 089791323X

Two deterministic CREW parallel algorithms are presented for constructing Huffman codes from a given list of frequencies: one runs in $O(\log^2 n)$ time using $n^2/\log n$ processors, and the other in $O(\log n)$ time using $n^3/\log n$ processors. The time can be reduced to $O(\log n (\log\log n)^2)$ on a CRCW model, using only $n^2/(\log\log n)^2$ processors. Also presented is an optimal $O(\log n)$ time, $O(n/\log n)$ processor EREW parallel algorithm for constructing a tree given a list of leaf depths when the depths are monotonic. An $O(\log^2 n)$ time, $n$ processor parallel algorithm is given for the general tree construction problem. We also give an $O(\log^2 n)$ time, $n^2/\log^2 n$ processor algorithm that finds a nearly optimal binary search tree. An $O(\log^2 n)$ time, $n^{2.36}$ processor algorithm for recognizing linear context-free languages is given. A crucial ingredient in achieving these bounds is a formulation of these problems as multiplications of special matrices, which we call concave matrices. The structure of these matrices makes their parallel multiplication dramatically more efficient than that of arbitrary matrices. © 1989 ACM.
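Two of the problems above have simple sequential counterparts that make the statements concrete: Huffman's greedy merging yields a codeword length (leaf depth) for each symbol, and when leaf depths are given in non-decreasing order a tree can be read off by assigning consecutive binary codewords (the canonical-code construction). The following is a minimal sequential sketch of both, assuming integer depths of at least 1; it does not reproduce the concave-matrix parallel algorithms of the paper, and the function names are illustrative.

```python
import heapq
from typing import List, Tuple


def huffman_code_lengths(freqs: List[float]) -> List[int]:
    """Sequential Huffman construction; returns each symbol's leaf depth."""
    n = len(freqs)
    if n == 1:
        return [1]
    depth = [0] * n
    # Heap entries: (subtree weight, unique tie-breaker, symbols in subtree).
    heap: List[Tuple[float, int, List[int]]] = [(f, i, [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    next_id = n
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            depth[s] += 1  # merging pushes every leaf below one level deeper
        heapq.heappush(heap, (w1 + w2, next_id, s1 + s2))
        next_id += 1
    return depth


def codes_from_monotone_depths(depths: List[int]) -> List[str]:
    """Prefix-free codewords for leaves with non-decreasing depths
    (canonical-code construction). Raises ValueError if no such tree exists."""
    codes: List[str] = []
    code, prev = 0, 0
    for d in depths:
        if d < max(prev, 1):
            raise ValueError("depths must be non-decreasing and at least 1")
        code <<= d - prev  # descend to depth d from the next free position
        if code >= 1 << d:
            raise ValueError("depths violate Kraft's inequality")
        codes.append(format(code, f"0{d}b"))
        code += 1
        prev = d
    return codes


if __name__ == "__main__":
    lengths = huffman_code_lengths([5, 9, 12, 13, 16, 45])
    print(lengths)  # [4, 4, 3, 3, 3, 1]
    print(codes_from_monotone_depths(sorted(lengths)))
    # ['0', '100', '101', '110', '1110', '1111']
```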

Keywords: parallel algorithms

On testing consecutive-ones property in parallel

Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA'95

Authors: Annexstein, F.S.; Swaminathan, R.P. (Univ of Cincinnati, Cincinnati, OH, United States)

ISBN: (print) 9780897917179

An $n \times m$ (0,1)-matrix is said to satisfy the consecutive-ones property if there is a permutation of the rows of the matrix such that in each column all nonzero entries are adjacent. The problem of determining such a permutation, if one exists, is the consecutive-ones property problem. Previously, Klein and Reif [13] gave a parallel solution for the consecutive-ones property problem with an algorithm based on complicated parallel PQ-tree manipulations. The work complexity of this algorithm was improved in [14] to run in $O(\log^2 n)$ time with a linear number of CRCW processors. We present a new algorithm for this problem, based on a less sophisticated data structure, that improves upon the processor bounds of the previous algorithms by a factor of $\log n/\log\log n$ in general, and by a factor of $\log n$ for sufficiently dense problem instances. Our algorithm uses a novel divide-and-conquer approach and uses, as its fundamental data structure, the decomposition of graphs into triconnected components. Solutions to the consecutive-ones problem have important applications to a variety of problems in computational molecular biology, databases, distributed computing, VLSI placement and routing, and graph and network theory.
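The definition can be checked mechanically. The sketch below (illustrative helper names) tests whether a given row order makes the 1s in every column contiguous, and finds such an order by exhaustive search; the search is exponential and only meant to illustrate the property on tiny matrices, unlike the PQ-tree or triconnected-component methods discussed above.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple

Matrix = Sequence[Sequence[int]]


def has_consecutive_ones(matrix: Matrix, row_order: Sequence[int]) -> bool:
    """True if, with rows arranged as in row_order, the 1s of every
    column occupy a contiguous run of rows."""
    n_cols = len(matrix[0]) if matrix else 0
    for c in range(n_cols):
        positions = [k for k, r in enumerate(row_order) if matrix[r][c] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def find_c1p_order(matrix: Matrix) -> Optional[Tuple[int, ...]]:
    """Exhaustive search over row permutations (exponential: tiny inputs only)."""
    for order in permutations(range(len(matrix))):
        if has_consecutive_ones(matrix, order):
            return order
    return None


if __name__ == "__main__":
    m1 = [[1, 0], [0, 1], [1, 1]]
    print(find_c1p_order(m1))  # (0, 2, 1)
    m2 = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]  # every pair of rows shares a column
    print(find_c1p_order(m2))  # None
```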

Keywords: parallel processing systems

Linear-time approximation schemes for scheduling malleable parallel tasks

10th Annual ACM-SIAM Symposium on Discrete Algorithms

Authors: Jansen, K; Porkolab, L (IDSIA Lugano, CH-6900 Lugano, Switzerland)

ISBN: (print) 0898714346

A malleable parallel task is one whose execution time is a function of the number of (identical) processors allotted to it. We study the problem of scheduling a set of $n$ independent malleable tasks on a fixed number of parallel processors, and propose an approximation scheme that, for any fixed $\epsilon > 0$, computes in $O(n)$ time a non-preemptive schedule of length at most $(1 + \epsilon)$ times the optimum.
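To make the task model concrete: once each malleable task has been given a processor count, the instance becomes a set of rigid tasks that can be placed greedily. The sketch below is a simple list-scheduling heuristic under that assumption (processors need not be contiguous, and the allotment is supplied by the caller); it is not the linear-time approximation scheme of the paper and carries no (1 + ε) guarantee. The function name and the linear-speedup tasks in the example are hypothetical.

```python
import heapq
from typing import Callable, List, Tuple


def list_schedule(m: int, p: List[Callable[[int], float]],
                  allot: List[int]) -> Tuple[float, List[float]]:
    """Greedy non-preemptive schedule for malleable tasks with a fixed allotment.

    Task j runs on allot[j] identical processors for p[j](allot[j]) time.
    Tasks are placed in list order, each on the allot[j] processors that
    become idle earliest. Returns (makespan, start times)."""
    free = [0.0] * m  # idle-from time of each processor (already a valid min-heap)
    starts: List[float] = []
    for pj, k in zip(p, allot):
        if not 1 <= k <= m:
            raise ValueError("allotment must be between 1 and m")
        chosen = [heapq.heappop(free) for _ in range(k)]
        start = chosen[-1]  # all k chosen processors are idle by this time
        finish = start + pj(k)
        for _ in range(k):
            heapq.heappush(free, finish)
        starts.append(start)
    return max(free), starts


if __name__ == "__main__":
    # Hypothetical tasks with perfect linear speedup: p_j(k) = work_j / k.
    p = [lambda k, w=w: w / k for w in (4.0, 2.0)]
    print(list_schedule(2, p, [2, 1]))  # (4.0, [0.0, 2.0])
```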

Keywords: parallel processing systems

Parallel metric tree embedding based on an algebraic view on Moore-Bellman-Ford

28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016

Authors: Friedrichs, Stephan; Lenzen, Christoph (Max Planck Institute for Informatics, Saarbrücken; Graduate School of Computer Science, Saarbrücken, Germany)

ISBN: (print) 9781450342100

A metric tree embedding of expected stretch $\alpha$ maps a weighted $n$-node graph $G = (V, E, \omega)$ to a weighted tree $T = (V_T, E_T, \omega_T)$ with $V \subseteq V_T$ such that $\mathrm{dist}(v, w, G) \le \mathrm{dist}(v, w, T)$ and $E[\mathrm{dist}(v, w, T)] \le \alpha \cdot \mathrm{dist}(v, w, G)$ for all $v, w \in V$. Such embeddings are highly useful for designing fast approximation algorithms, as many hard problems are easy to solve on tree instances. However, to date the best parallel $\mathrm{polylog}\, n$-depth algorithm that achieves an asymptotically optimal expected stretch of $\alpha \in O(\log n)$ uses $\Omega(n^2)$ work and requires a metric as input. In this paper, we show how to achieve the same guarantees using $\tilde{O}(m^{1+\epsilon})$ work, where $m$ is the number of edges of $G$ and $\epsilon > 0$ is an arbitrarily small constant. Moreover, one may reduce the work further to $\tilde{O}(m + n^{1+\epsilon})$, at the expense of increasing the expected stretch $\alpha$ to $O(\epsilon^{-1} \log n)$, by using the spanner construction of Baswana and Sen as a preprocessing step. Our main tool in deriving these parallel algorithms is an algebraic characterization of a generalization of the classic Moore-Bellman-Ford algorithm. We consider this framework, which subsumes a large variety of previous "Moore-Bellman-Ford-flavored" algorithms, to be of independent interest.
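The algebraic view of Moore-Bellman-Ford starts from the observation that one round of edge relaxations is a matrix-vector product over the (min, +) semiring, so h rounds yield h-hop distances. The following minimal sketch shows only that classic special case, assuming an undirected graph given as a symmetric dense weight matrix with math.inf for non-edges; the paper's framework generalizes this considerably, and the function name is illustrative.

```python
import math
from typing import List

INF = math.inf


def mbf_distances(weights: List[List[float]], source: int, hops: int) -> List[float]:
    """Distances from `source` using at most `hops` edges, computed as
    `hops` (min, +) matrix-vector products x <- A (x) x."""
    n = len(weights)
    # Adjacency matrix over the (min, +) semiring: 0 on the diagonal keeps
    # the current estimate, INF marks a missing edge.
    A = [[0.0 if v == w else weights[v][w] for w in range(n)] for v in range(n)]
    x = [INF] * n
    x[source] = 0.0
    for _ in range(hops):
        # One Bellman-Ford round: x_v = min over w of (A[v][w] + x_w).
        x = [min(A[v][w] + x[w] for w in range(n)) for v in range(n)]
    return x


if __name__ == "__main__":
    W = [[INF, 1, 5], [1, INF, 1], [5, 1, INF]]
    print(mbf_distances(W, 0, 1))  # [0.0, 1.0, 5.0]
    print(mbf_distances(W, 0, 2))  # [0.0, 1.0, 2.0]
```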

Keywords: parallel algorithms

Brief Announcement: A Parallel Architecture for Dynamic Approximate Membership

35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Even, Guy; Domingues, Gabriel Marques; Toutian, Parham (Tel Aviv Univ, Tel Aviv, Israel)

ISBN: (print) 9781450395458

We present the first parallel architecture for a dynamic approximate membership data structure (i.e., a filter) that supports insertions, deletions, and approximate membership queries. Our architecture borrows techniques from PRAM emulation to obtain a parallel filter based on two levels of fingerprint dictionaries. A key component in the architecture is a special-purpose wide-word processor we designed to support operations over small dictionaries. We implemented this architecture on an FPGA running at 100 MHz. The implementation stores up to 1.44 million keys, has a false-positive rate of less than 0.3%, receives batches of 16 operations per cycle, preserves sequential order, and runs with a stable throughput of over a billion operations per second with respect to several benchmarks.
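A fingerprint-based filter can be sketched in a few lines: each key is hashed to a bucket and a short fingerprint, and each bucket is a small multiset of fingerprints, which is what makes deletions possible. The class below is a single-level software illustration with hypothetical names and an assumed BLAKE2b-based hash (from Python's hashlib); it does not model the two-level design, the PRAM emulation, or the FPGA wide-word processor described above.

```python
import hashlib
from collections import Counter
from typing import List, Tuple


class FingerprintFilter:
    """Dynamic approximate-membership filter (illustrative sketch).

    Each key maps to a (bucket, fingerprint) pair. Buckets are small multisets
    of fingerprints, so deletions work and inserted keys never produce false
    negatives; distinct keys sharing a pair cause false positives."""

    def __init__(self, n_buckets: int, fp_bits: int = 8) -> None:
        self.n_buckets = n_buckets
        self.fp_bits = fp_bits
        self.buckets: List[Counter] = [Counter() for _ in range(n_buckets)]

    def _locate(self, key: str) -> Tuple[int, int]:
        # 64-bit hash: the low bits form the fingerprint, the rest pick the bucket.
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        fp = h & ((1 << self.fp_bits) - 1)
        bucket = (h >> self.fp_bits) % self.n_buckets
        return bucket, fp

    def insert(self, key: str) -> None:
        bucket, fp = self._locate(key)
        self.buckets[bucket][fp] += 1

    def delete(self, key: str) -> None:
        """Remove one copy of key; only valid for keys that were inserted."""
        bucket, fp = self._locate(key)
        if self.buckets[bucket][fp] == 0:
            raise KeyError(key)
        self.buckets[bucket][fp] -= 1
        if self.buckets[bucket][fp] == 0:
            del self.buckets[bucket][fp]

    def query(self, key: str) -> bool:
        bucket, fp = self._locate(key)
        return self.buckets[bucket][fp] > 0  # Counter returns 0 for absent entries


if __name__ == "__main__":
    f = FingerprintFilter(n_buckets=1024, fp_bits=12)
    f.insert("alice")
    print(f.query("alice"))  # True
```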

Keywords: data structures; Bloom filter; PRAM

Large-scale sorting in parallel memories

Third Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA'91

Authors: Nodine, M.H.; Vitter, J.S. (Dept. of Computer Science, Brown University, Providence, R.I.)

ISBN: (print) 0897914384

We present several algorithms for sorting efficiently with parallel two-level and multilevel memories. Our main result is an elegant, easy-to-implement, optimal, deterministic algorithm for external sorting with P disk drives. This result answers the open problem posed by Vitter and Shriver. Our measure of performance is the number of parallel input/output (I/O) operations, in which each of the P disks can simultaneously transfer a block of B contiguous records. Our optimal algorithm is deterministic, and thus it improves upon the optimal randomized algorithm of [ViS] as well as the well-known deterministic but nonoptimal technique of disk striping. The second part of the paper broadens our coverage from two-level memories to more general multilevel memories. In particular, we consider the blocked uniform memory hierarchy (UMH) introduced by Alpern, Carter, and Feig, and its parallelization P-UMH, along with new variants. We give optimal and nearly optimal algorithms for a wide range of bandwidth degradations, including a parsimonious algorithm for constant bandwidth. We also develop optimal sorting algorithms for all bandwidths for other versions of UMH and P-UMH, including natural restrictions we introduce called RUMH and P-RUMH, which more closely correspond to current programming languages. © 1991 ACM.
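The gap between disk striping and an optimal parallel-disk sort can be seen from the standard asymptotic bounds of the parallel disk model, with N records, internal memory M, block size B, and P disks: optimal sorting needs on the order of (N/(PB))·log(N/B)/log(M/B) parallel I/Os, whereas striping behaves like a single disk with block size PB. The sketch below evaluates both expressions with constant factors dropped; it illustrates the bounds only, is not an implementation of the paper's algorithm, and the sizes in the example are hypothetical.

```python
import math


def pdm_sort_ios(N: float, M: float, B: float, P: int) -> float:
    """Parallel I/Os for optimal sorting in the parallel disk model,
    constants dropped: (N / (P*B)) * log(N/B) / log(M/B). Assumes M > B."""
    return (N / (P * B)) * math.log(N / B) / math.log(M / B)


def striping_sort_ios(N: float, M: float, B: float, P: int) -> float:
    """Disk striping: the P disks act as one disk with block size P*B.
    Assumes M > P*B."""
    return pdm_sort_ios(N, M, P * B, 1)


if __name__ == "__main__":
    # Illustrative sizes: 2^30 records, 2^20 in memory, 2^10 per block, 64 disks.
    N, M, B, P = 2**30, 2**20, 2**10, 64
    print(round(pdm_sort_ios(N, M, B, P)))       # about 32768
    print(round(striping_sort_ios(N, M, B, P)))  # about 57344
```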

Keywords: Bandwidth

Brief Announcement: Work Stealing through Partial Asynchronous Delegation

36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)

Authors: Wang, Jiawei; Liu, Yutao; Fu, Ming; Haertig, Hermann; Chen, Haibo (Tech Univ Dresden, Huawei Dresden Res Ctr, Dresden, Germany; Huawei Dresden Res Ctr, Dresden, Germany; Huawei Cent Software Inst, Shenzhen, Peoples R China; Tech Univ Dresden, Dresden, Germany; Shanghai Jiao Tong Univ, Huawei Cent Software Inst, Shanghai, Peoples R China)

ISBN: (print) 9798400704161

Work stealing is a well-established technique in multi-core systems that aims to improve load balancing and task scheduling efficiency. Each processing unit maintains its own task queue and, when idle, steals tasks from other units. Traditional work-stealing approaches face performance bottlenecks due to costly synchronization primitives and contention arising from concurrent access by both the queue owner and thieves. The state-of-the-art solution addresses these issues through coarse-grained synchronization; however, it restricts stealing in specific scenarios, thereby limiting parallelism. We introduce PadWS, a partial and asynchronous delegated work-stealing algorithm. PadWS employs a block-based design in which, in common cases, the queue owner and thieves work on separate blocks, reducing metadata contention. Delegation is partially enabled for the block in which the owner is located, allowing thieves to steal from it, an approach that departs from the current block-based approach. Additionally, our delegation strategy is asynchronous, which removes the need for thieves to spin-wait after sending a request.
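For reference, the traditional design that PadWS improves on can be sketched as a per-worker deque guarded by one lock: the owner pushes and pops at one end, while thieves take from the other. This is a minimal illustration with hypothetical names, using Python's threading.Lock and collections.deque; it shows where owner-thief contention arises, not PadWS's block-based or delegation mechanism.

```python
import threading
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class LockedWorkDeque(Generic[T]):
    """Per-worker task queue for classic work stealing.

    The owner pushes and pops at the bottom (LIFO, good locality); thieves
    steal the oldest task from the top. A single lock serializes owner and
    thieves, which is the contention point that block-based and
    delegation-based designs try to avoid."""

    def __init__(self) -> None:
        self._items: Deque[T] = deque()
        self._lock = threading.Lock()

    def push(self, task: T) -> None:
        """Owner only: add a new task at the bottom."""
        with self._lock:
            self._items.append(task)

    def pop(self) -> Optional[T]:
        """Owner only: take the most recently pushed task, or None if empty."""
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self) -> Optional[T]:
        """Any thief: take the oldest task, or None if empty."""
        with self._lock:
            return self._items.popleft() if self._items else None


if __name__ == "__main__":
    q: LockedWorkDeque[int] = LockedWorkDeque()
    for t in range(4):
        q.push(t)
    print(q.steal(), q.pop())  # 0 3
```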

Keywords: parallel processing; scheduling; work stealing; delegation
