The PRAM model of parallel computation is examined with respect to wordsize, the number of bits which can be held in each global memory cell. First, adversary arguments are used to show the incomparability of certain ...
Andrews et al. [Automatic method for hiding latency in high bandwidth networks, in: Proceedings of the ACM Symposium on Theory of Computing, 1996, pp. 257-265; Improved methods for hiding latency in high bandwidth networks, in: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 52-61] introduced a number of techniques for automatically hiding latency when performing simulations of networks with unit-delay links on networks with arbitrary, unequal-delay links. In their work, they assume that processors of the host network are identical in computational power to those of the guest network being simulated. They further assume that the links of the host are able to pipeline messages, i.e., they are able to deliver P packets in time O(P + d), where d is the delay on the link. In this paper we examine the effect of eliminating one or both of these assumptions. In particular, we provide an efficient simulation of a linear array of homogeneous processors connected by unit-delay links on a linear array of heterogeneous processors connected by links with arbitrary delay. We show that the slowdown achieved by our simulation is optimal. We then consider the case of simulating cliques by cliques; i.e., a clique of heterogeneous processors with arbitrary-delay links is used to simulate a clique of homogeneous processors with unit-delay links. We reduce the slowdown from the obvious bound of the maximum link delay to the average of the link delays. In the case of the linear array we consider links both with and without pipelining. For the clique simulation the links are not assumed to support pipelining. The main motivation of our results (as was the case with Andrews et al.) is to mitigate the degradation of performance when executing parallel programs designed for different architectures on a network of workstations (NOW). In such a setting it is unlikely that the links provided by the NOW will support pipelining, and it is quite probable …
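To make the two link models and the slowdown claim above concrete, here is a small arithmetic sketch. It is illustrative only, not the paper's analysis; the function names and the example delays are ours.

```python
# Illustrative arithmetic only (ours, not the paper's analysis): delivery time of
# P packets over a single link with delay d under the two link models above, and
# the two slowdown measures discussed for the clique simulation.

def pipelined_time(P: int, d: int) -> int:
    """Pipelined link: packets can be streamed, so P packets arrive in about P + d steps."""
    return P + d

def unpipelined_time(P: int, d: int) -> int:
    """Non-pipelined link: each packet occupies the link for d steps."""
    return P * d

def max_delay_slowdown(delays):
    """Obvious bound: every simulated step waits for the slowest link."""
    return max(delays)

def average_delay_slowdown(delays):
    """Bound achieved for the clique simulation: the average link delay."""
    return sum(delays) / len(delays)

if __name__ == "__main__":
    print(pipelined_time(100, 40), unpipelined_time(100, 40))   # 140 vs 4000 steps
    delays = [1, 2, 4, 40]                                       # hypothetical clique link delays
    print(max_delay_slowdown(delays), average_delay_slowdown(delays))  # 40 vs 11.75
```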
We give two optimal parallel algorithms for constructing the arrangement of n lines in the plane. The first method is quite simple and runs in O(log² n) time using O(n²) work, and the second method, which is more sophisticated, …
ISBN:
(Print) 9781595939739
We present Dreadlocks, an efficient new shared-memory spin lock that actively detects deadlocks. Instead of spinning on a Boolean value, each thread spins on the lock owner's per-thread digest, a compact representation of a portion of the lock's waits-for graph. Digests can be implemented either as bit vectors (for small numbers of threads) or as Bloom filters (for larger numbers of threads). Updates to digests are propagated dynamically as locks are acquired and released. Dreadlocks can be applied to any spin lock algorithm that allows threads to time out. Experimental results show that Dreadlocks outperform timeouts under many circumstances, and almost never do worse.
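The digest mechanism is concrete enough to sketch. Below is a minimal single-process Python sketch of the bit-vector variant; the class and method names are ours, and a real implementation would use hardware atomics and a timeout-capable spin lock rather than a Python guard lock.

```python
# Minimal sketch (ours, not the authors' implementation) of the bit-vector digest
# idea: digest[t] is an int whose set bits are the threads that thread t
# transitively waits for, always including t itself. A thread that finds its own
# bit in the lock owner's digest has closed a cycle in the waits-for graph.
import threading

class DeadlockDetected(Exception):
    pass

class DigestLock:
    """A lock whose current owner is visible to waiting threads."""
    def __init__(self):
        self.owner = None                  # thread id of the owner, or None
        self._guard = threading.Lock()     # stands in for an atomic CAS in this sketch

class Dreadlocks:
    def __init__(self, num_threads: int):
        self.digest = [1 << t for t in range(num_threads)]

    def acquire(self, me: int, lock: DigestLock) -> None:
        while True:
            with lock._guard:
                if lock.owner is None:
                    lock.owner = me
                    self.digest[me] = 1 << me          # acquired: waiting for no one
                    return
                owner = lock.owner
            owner_digest = self.digest[owner]
            if owner_digest & (1 << me):
                # My bit appears in the owner's digest: a waits-for cycle exists.
                self.digest[me] = 1 << me
                raise DeadlockDetected(f"thread {me} would deadlock")
            # Spin: propagate the owner's digest into mine, so threads waiting on me
            # learn (transitively) whom I am waiting for.
            self.digest[me] = (1 << me) | owner_digest

    def release(self, me: int, lock: DigestLock) -> None:
        with lock._guard:
            if lock.owner == me:
                lock.owner = None
```

For larger thread counts the paper replaces the bit vector with a Bloom filter, which may report a spurious cycle (a false positive) but never misses a real one; the abort path is the same either way.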
ISBN:
(Print) 9781595939739
We give a randomized (Las Vegas) parallel algorithm for computing strongly connected components of a graph with n vertices and m edges. The runtime is dominated by O(log² n) multi-source parallel reachability queries; i.e., O(log² n) calls to a subroutine that computes the union of the descendants of a given set of vertices in a given digraph. Our algorithm also topologically sorts the strongly connected components. Using Ullman and Yannakakis's [22] techniques for the reachability subroutine gives our algorithm runtime Õ(t) using mn/t² processors for any (n²/m)^(1/3) ≤ t ≤ n. On sparse graphs, this improves the number of processors needed to compute strongly connected components and topological sort within time n^(1/3) ≤ t ≤ n from the previously best known (n/t)³ [20] to (n/t)².
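The reachability-query viewpoint can be illustrated with a short sequential sketch. The code below uses the classic single-pivot divide-and-conquer scheme (forward closure, backward closure, intersect, recurse); it is not the paper's exact multi-source recursion and does not reproduce its processor bounds or the topological ordering, and the helper names are ours.

```python
# Sequential sketch (ours) of SCC decomposition driven by reachability queries.
# `adj` maps every vertex to a list of its out-neighbours.
import random

def descendants(adj, sources, vertices):
    """Union of the descendants of `sources`, restricted to `vertices` (BFS).
    This plays the role of the multi-source reachability subroutine."""
    seen = {s for s in sources if s in vertices}
    frontier = list(seen)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v in vertices and v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return seen

def ancestors(adj, sources, vertices):
    """Vertices in `vertices` that can reach `sources` (reachability in the reverse graph)."""
    radj = {v: [] for v in vertices}
    for u in vertices:
        for v in adj.get(u, ()):
            if v in vertices:
                radj[v].append(u)
    return descendants(radj, sources, vertices)

def strongly_connected_components(adj, vertices=None):
    """Divide and conquer on a random pivot: its SCC is fwd ∩ bwd, and every other
    SCC lies wholly inside fwd - scc, bwd - scc, or the untouched remainder."""
    if vertices is None:
        vertices = set(adj)
    if not vertices:
        return []
    pivot = random.choice(tuple(vertices))
    fwd = descendants(adj, [pivot], vertices)
    bwd = ancestors(adj, [pivot], vertices)
    scc = fwd & bwd
    components = [scc]
    for part in (fwd - scc, bwd - scc, vertices - fwd - bwd):
        components.extend(strongly_connected_components(adj, part))
    return components

if __name__ == "__main__":
    g = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: []}
    print(strongly_connected_components(g))  # {0, 1, 2}, {3, 4}, {5} in some order
```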
In this paper, we address the question of how efficiently a single constant-degree processor network can simulate the computation of any constant-degree processor network. We show the following lower-bound trade-off: if M is an arbitrary constant-degree processor network of size m that can simulate all constant-degree processor networks of size n with slowdown s, then m·s = Ω(n log m). Our trade-off holds for a very general model of simulations. It covers all previously considered models and all known techniques for simulations among networks. For m ≥ n, this improves a previous lower bound by a factor of log log n, proved for a weaker simulation model. For m < n, this is the first non-trivial lower bound for this problem; in this case, the bound is asymptotically tight.
ISBN:
(Print) 9781450307437
The proceedings contain 50 papers. The topics discussed include: graph expansion and communication costs of fast matrix multiplication; near linear-work parallel SDD solvers, low-diameter decomposition, and low-stretch subgraphs; linear-work greedy parallel approximate set cover and variants; optimizing hybrid transactional memory: the importance of nonspeculative operations; parallelism and data movement characterization of contemporary application classes; work-stealing for mixed-mode parallelism by deterministic team-building; full reversal routing as a linear dynamical system; reclaiming the energy of a schedule: models and algorithms; a tight runtime bound for synchronous gathering of autonomous robots with limited visibility; convergence of local communication chain strategies via linear transformations: or how to trade locality for speed; and convergence to equilibrium of logit dynamics for strategic games.
ISBN:
(Print) 9780897918909
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. We present a novel method to dynamically balance the processor workloads with a global view. This paper presents, for the first time, the implementation and integration of all major components within our dynamic load balancing strategy for adaptive grid calculations. Mesh adaption, repartitioning, processor assignment, and remapping are critical components of the framework that must be accomplished rapidly and efficiently so as not to cause a significant overhead to the numerical simulation. Previous results indicated that mesh repartitioning and data remapping are potential bottlenecks for performing large-scale scientific calculations. We resolve these issues and demonstrate that our framework remains viable on a large number of processors.
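One of the components named above, processor assignment, lends itself to a small runnable sketch: map newly computed partitions onto processors so that as many mesh cells as possible stay where they already are, keeping the remapping cost low. The greedy rule and all names below are ours, not the paper's assignment algorithm.

```python
# Toy sketch (ours, not the framework's code) of the processor-assignment step
# in a dynamic load-balancing cycle for adaptive grids.

def assign_partitions(current_owner, new_partition, num_procs):
    """current_owner[cell] -> processor currently holding the cell
       new_partition[cell] -> partition id of the cell after repartitioning
       Returns {partition id: processor}, assuming #partitions <= num_procs."""
    # overlap[p][q] = number of cells of new partition p already resident on processor q
    overlap = {}
    for cell, part in new_partition.items():
        row = overlap.setdefault(part, [0] * num_procs)
        row[current_owner[cell]] += 1

    assignment, taken = {}, set()
    pairs = sorted(((row[q], part, q) for part, row in overlap.items()
                    for q in range(num_procs)), reverse=True)
    for count, part, q in pairs:            # hand out the largest overlaps first
        if part not in assignment and q not in taken:
            assignment[part] = q
            taken.add(q)
    return assignment

if __name__ == "__main__":
    current_owner = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}
    new_partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
    print(assign_partitions(current_owner, new_partition, num_procs=3))
    # Each partition lands on the processor already holding most of its cells,
    # so only cell 2 has to be remapped.
```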
Let H be a bipartite graph with bipartition (A, B), where |A| = n and every subset X of A with at most a·n elements has at least b|X| neighbors (b > 1). We consider the problem of computing a matching from a given subset...
ISBN:
(Print) 0897913701
The Program Dependence Graph (PDG), which represents the data and control dependences of a program, is considered. Attention is limited to the subgraph of the PDG that contains only the control dependence edges. This subgraph is called the Control Dependence Graph (CDG). The authors formalize what it means for a CDG to have a corresponding sequential version and characterize the conditions under which there is such a corresponding sequentialization that does not require duplication.