Novel architectures for P2P applications were discussed. A distributed hash table (DHT) approach is studied. The probability that a processor participates in a random lookup should be small. The memory requirement of each server should be low. An overlay network should be built that performs lookup operations efficiently.
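As a rough illustration of why such an overlay keeps per-server memory low while lookups stay efficient, the sketch below simulates Chord-style finger-table routing on a hash ring. All parameters (16-bit identifiers, 256 nodes) and helper names are illustrative, not taken from the paper: each node stores only O(log N) finger entries, yet lookups finish in O(log N) hops.

```python
import hashlib
import random
from bisect import bisect_left

M = 16                          # identifier bits (illustrative)
RING = 1 << M

def hash_id(key: str) -> int:
    """Hash a key onto the identifier ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % RING

def successor(ids, point):
    """First node id clockwise from `point` (ids sorted), wrapping around."""
    i = bisect_left(ids, point)
    return ids[i] if i < len(ids) else ids[0]

def dist(a, b):
    """Clockwise ring distance from a to b."""
    return (b - a) % RING

def lookup_hops(ids, start, key):
    """Greedy Chord-style routing: each hop follows the farthest of the
    node's O(log N) fingers that does not overshoot the target."""
    target = hash_id(key)
    owner = successor(ids, target)
    cur, hops = start, 0
    while cur != owner:
        # a real node precomputes this finger table once, not per lookup
        fingers = [successor(ids, (cur + (1 << i)) % RING) for i in range(M)]
        best = max((f for f in fingers if 0 < dist(cur, f) <= dist(cur, target)),
                   key=lambda f: dist(cur, f),
                   default=owner)   # no usable finger: last hop is the successor
        cur, hops = best, hops + 1
    return hops

random.seed(0)
ids = sorted(random.sample(range(RING), 256))
hops = [lookup_hops(ids, random.choice(ids), f"key{i}") for i in range(200)]
print(max(hops))   # stays near log2(256) = 8, far below the 256 nodes
```

The point of the simulation is the scaling: participation in a random lookup touches O(log N) of the N nodes, and each node's routing state is O(log N) entries.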
ISBN: (Print) 9780897917179
Mixed task and data parallelism exists naturally in many applications, but utilizing it may require sophisticated scheduling algorithms and software support. Recently, significant research effort has been applied to exploiting mixed parallelism in both theory and systems communities. In this paper, we ask how much mixed parallelism will improve performance in practice, and how architectural evolution impacts these estimates. First, we build and validate a performance model for a class of mixed task and data parallel problems based on machine and problem parameters. Second, we use this model to estimate the gains from mixed parallelism for some scientific applications on current machines. This quantifies our intuition that mixed parallelism is best when either communication is slow or the number of processors is large. Third, we show that, for balanced divide and conquer trees, a simple one-time switch between data and task parallelism gets most of the benefit of general mixed parallelism. Fourth, we establish upper bounds to the benefits of mixed parallelism for irregular task graphs. Apart from these detailed analyses, we provide a framework in which other applications and machines can be evaluated.
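To make the intuition concrete, here is a toy cost model (our own illustration, not the paper's validated model) for a balanced binary divide-and-conquer tree of the given depth: `alpha` stands for a per-step communication overhead, and the one-time-switch schedule divides the processor group along with the recursion until each subtree owns one processor, then runs subtrees as independent tasks.

```python
def t_data(w, depth, P, alpha):
    """Pure data parallelism: every node runs on all P processors, one node
    at a time; each parallel step pays the communication overhead alpha."""
    return sum(2**d * (w / 2**d / P + alpha) for d in range(depth + 1))

def t_task(w, depth, P):
    """Pure task parallelism: each node runs on one processor, so the top of
    the tree has too few nodes to keep P processors busy."""
    return sum((w / 2**d) * max(1.0, 2**d / P) for d in range(depth + 1))

def t_switch(w, depth, P, alpha):
    """One-time switch: split the processor group with the recursion until
    each subtree owns one processor, then run subtrees as serial tasks."""
    return sum(w / P + (alpha if 2**d <= P else 0.0) for d in range(depth + 1))

w, depth, P, alpha = 1e6, 20, 256, 50.0
print(t_data(w, depth, P, alpha))    # overhead alpha paid at every node
print(t_task(w, depth, P))           # serial bottleneck near the root
print(t_switch(w, depth, P, alpha))  # close to w*(depth+1)/P
```

Consistent with the abstract's intuition, the switched schedule beats both pure strategies, and its advantage grows when `alpha` (slow communication) or `P` is large.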
ISBN: (Print) 9781450312134
A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ballard, Demmel, Holtz, Lipshitz, and Schwartz, 2012). Both algorithms scale perfectly, but only up to some number of processors where the inter-processor communication no longer scales. We obtain a memory-independent communication cost lower bound on classical and Strassen-based distributed-memory matrix multiplication algorithms. These bounds imply that no classical or Strassen-based parallel matrix multiplication algorithm can strongly scale perfectly beyond the ranges already attained by the two parallel algorithms mentioned above. The memory-independent bounds and the strong scaling bounds generalize to other algorithms. Copyright is held by the author/owner(s).
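As a back-of-the-envelope illustration (our numbers, not the paper's): for classical matmul, the memory-dependent per-processor bandwidth bound of order n^3/(P*sqrt(M)) shrinks linearly in 1/P only until it meets a memory-independent bound of order n^2/P^(2/3), which happens near P = n^3/M^(3/2).

```python
def comm_lower_bounds(n, P, M):
    """Per-processor bandwidth lower bounds for classical matrix
    multiplication (Omega-bounds with constants dropped). The memory-
    dependent term scales like 1/P only until the memory-independent
    term takes over."""
    mem_dep = n**3 / (P * M**0.5)   # Omega(n^3 / (P * sqrt(M)))
    mem_indep = n**2 / P**(2/3)     # Omega(n^2 / P^(2/3))
    return max(mem_dep, mem_indep)

n, M = 4096, 2**20          # matrix dimension, words of memory per processor
p_star = n**3 / M**1.5      # beyond this P, perfect strong scaling must stop
print(p_star)               # 64.0 for these illustrative numbers
```

Past `p_star`, adding processors still reduces the per-processor bound, but only as P^(-2/3) rather than P^(-1), which is exactly the "perfect strong scaling range" the abstract describes.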
A novel comprehensive and coherent approach for the purpose of increasing instruction-level parallelism (ILP) is devised. The key new tool in our envisioned system update is the addition of a parallel prefix-sum (PS) instruction, which will have efficient implementation in hardware, to the instruction-set architecture. This addition gives for the first time a concrete way for recruiting the whole knowledge base of parallel algorithms for that purpose. The potential increase in ILP is demonstrated by experimental results for a test application. The main technical contribution is in the form of a 'completeness theorem'. Perhaps surprisingly, this paper proves that in an envisioned system which employs parallel PS functional units, a proper use of a serial programming language suffices for the following. With a moderate effort, one can program a parallel algorithm (in a serial language), so that a parallelizing compiler (even without run-time methods!) will be able to extract the same (i.e., 'complete') ILP from such serial code as from code written in a parallel language. Alternatively, rather than have the programmer produce the serial code, a precompiler could derive it from a parallel language. The most interesting idea in the proof is the reliance on the new parallel PS for circumventing collision-ambiguity in references to memory. Other new ideas in the paper include hardware-design of a prefix-sum unit and an on-line algorithm for high-bandwidth register-files. An informal upshot of this paper is the following general insight: to accommodate parallelism in uniprocessor systems (from algorithms to ILP), it is sufficient to only add (and, of course, incorporate) parallel prefix-sum functional units to standard serial system designs.
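For intuition, the classic use of prefix-sum to remove write collisions is array compaction: prefix-summing the keep-flags hands every surviving element a unique destination index, so all writes are independent and could issue simultaneously. A minimal serial emulation (illustrative only, not the paper's hardware design):

```python
from itertools import accumulate

def exclusive_prefix_sum(flags):
    """What a hardware PS unit would return: the exclusive running sum."""
    return list(accumulate([0] + flags[:-1]))

def parallel_compact(data, keep):
    """Each element that passes `keep` gets a unique output slot computed
    from the prefix sum of the keep-flags, so no two writes ever share an
    index (the collision-ambiguity the PS instruction circumvents)."""
    flags = [1 if keep(x) else 0 for x in data]
    idx = exclusive_prefix_sum(flags)
    out = [None] * sum(flags)
    for i, x in enumerate(data):   # iterations are independent: a compiler
        if flags[i]:               # could schedule them all in parallel
            out[idx[i]] = x
    return out

print(parallel_compact(range(10), lambda x: x % 3 == 0))  # [0, 3, 6, 9]
```

Because the destination indices are computed before any write happens, the compiler can prove the writes disjoint, which is the essence of how a PS instruction exposes ILP from serial-looking code.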
ISBN: (Print) 0897913701
We present a parallel randomized algorithm for deciding whether two planar graphs are isomorphic. Assuming that we have a tree of separators for each planar graph, our algorithm takes O(log(n)) time with P = O(n^1.5·√log(n)) processors, with failure probability at most 1/n, where n is the number of vertices. The algorithm needs 2·log(m)·log(n) + O(log(n)) random bits. The number of random bits can be decreased to O(log(n)) by increasing the number of processors to n^(3/2+ε). This algorithm significantly improves the previous bound of n^4 processors.
ISBN: (Print) 9781450345934
Dynamic algorithms are used to compute a property of some data while the data undergoes changes over time. Many dynamic algorithms have been proposed but nearly all are sequential. In this paper, we present our ongoing work on designing a parallel algorithm for the dynamic trees problem, which requires computing a property of a forest as the forest undergoes changes. Our algorithm allows insertion and/or deletion of both vertices and edges anywhere in the input and performs updates in parallel. We obtain our algorithm by applying a dynamization technique called self-adjusting computation to the classic algorithm of Miller and Reif for tree contraction.
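For reference, here is a sequential simulation of randomized rake-and-compress tree contraction in the spirit of Miller and Reif's algorithm, computing a sum over a rooted tree. The coin-flip independent-set rule and the concrete representation are our simplification, not the paper's dynamized version.

```python
import random
from collections import defaultdict

def contract_sum(parent, value):
    """Each round rakes every leaf into its parent, then splices out an
    independent set of unary chain nodes chosen by coin flips; the parallel
    version finishes in O(log n) rounds with high probability."""
    value, parent = dict(value), dict(parent)
    root = next(v for v, p in parent.items() if p is None)
    children = defaultdict(set)
    for v, p in parent.items():
        if p is not None:
            children[p].add(v)
    alive, rounds = set(parent), 0
    while len(alive) > 1:
        rounds += 1
        # rake: fold every current leaf into its parent ("in parallel")
        for v in [u for u in alive if u != root and not children[u]]:
            value[parent[v]] += value[v]
            children[parent[v]].discard(v)
            alive.discard(v)
        # compress: splice a unary node only if it wins its coin flip while
        # its single child loses, so no two adjacent nodes are spliced
        unary = [(u, next(iter(children[u])), parent[u])
                 for u in alive if u != root and len(children[u]) == 1]
        flips = {u: random.random() < 0.5 for u, _, _ in unary}
        for v, c, p in unary:
            if flips[v] and not flips.get(c, False):
                value[c] += value[v]
                parent[c] = p
                children[p].discard(v)
                children[p].add(c)
                alive.discard(v)
    return value[root], rounds

# a path 0-1-2-3-4 rooted at 0, plus a leaf 5 hanging off node 2
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 2}
total, rounds = contract_sum(parent, {v: v for v in parent})
print(total)   # 0+1+2+3+4+5 = 15
```

The dynamic-trees work sketched in the abstract takes a static algorithm of this shape and, via self-adjusting computation, replays only the contraction steps that an edge or vertex update actually invalidates.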
ISBN: (Print) 9781450307437
This paper presents the design and analysis of a near linear-work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input of an SDD n-by-n matrix A with m non-zero entries and a vector b, our algorithm computes a vector x̃ such that ||x̃ − A⁺b||_A ≤ ε·||A⁺b||_A in O(m·log^O(1)(n)·log(1/ε)) work and O(m^(1/3+θ)·log(1/ε)) depth for any fixed θ > 0. The algorithm relies on a parallel algorithm for generating low-stretch spanning trees or spanning subgraphs. To this end, we first develop a parallel decomposition algorithm that, in polylogarithmic depth and Õ(|E|) work, partitions a graph into components with polylogarithmic diameter such that only a small fraction of the original edges are between the components. This can be used to generate low-stretch spanning trees with average stretch O(n^α) in O(n^(1+α)) work and O(n^α) depth. Alternatively, it can be used to generate spanning subgraphs with polylogarithmic average stretch in O(|E|) work and polylogarithmic depth. We apply this subgraph construction to derive our solver. By using the linear system solver in known applications, our results imply improved parallel randomized algorithms for several problems, including single-source shortest paths, maximum flow, min-cost flow, and approximate max-flow.
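The flavor of the decomposition can be seen in the classic sequential ball-growing procedure that such parallel algorithms emulate (a simplified sketch with an illustrative `beta`; the paper's contribution is achieving this in polylogarithmic depth, which this serial loop does not attempt):

```python
def low_diameter_decomposition(adj, beta=0.5):
    """Grow a BFS ball from an unassigned vertex until the number of edges
    leaving the ball is at most beta times the edges inside it (plus one),
    cut the ball off as a component, and repeat. Radii stay small and only
    a modest fraction of edges end up between components."""
    assigned, comp = {}, 0
    for s in adj:
        if s in assigned:
            continue
        ball = {s}
        while True:
            internal = sum(1 for u in ball for v in adj[u] if v in ball) // 2
            boundary = {v for u in ball for v in adj[u]
                        if v not in ball and v not in assigned}
            cut = sum(1 for u in ball for v in adj[u] if v in boundary)
            if cut <= beta * (internal + 1):
                break                 # ball is cheap to cut off: stop growing
            ball |= boundary          # otherwise absorb the whole next layer
        for v in ball:
            assigned[v] = comp
        comp += 1
    return assigned

# an 8-cycle splits into a few short arcs
cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
labels = low_diameter_decomposition(cycle)
print(len(set(labels.values())))   # 3 components for this cycle
```

Each component here has small diameter by construction, and the cut condition keeps inter-component edges rare, which is what low-stretch spanning-subgraph constructions need.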
We present parallel algorithms for evaluating game trees. These algorithms parallelize the "left-to-right" sequential algorithm for evaluating AND/OR trees and the α-β pruning procedure for evaluating MIN/...
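For orientation, the serial left-to-right α-β procedure being parallelized looks like the following textbook sketch (our illustration, not the paper's parallel version), on a game tree given as nested lists with numeric leaves:

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Left-to-right alpha-beta evaluation of a MIN/MAX game tree: a subtree
    is pruned as soon as the window [alpha, beta] closes."""
    if isinstance(node, (int, float)):
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        v = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, v)
            alpha = max(alpha, best)
        else:
            best = min(best, v)
            beta = min(beta, best)
        if beta <= alpha:   # remaining children cannot change the result
            break
    return best

tree = [[3, 5], [2, [9, 1]], [4, 6]]   # MAX over three MIN subtrees
print(alphabeta(tree))                  # 4
```

The strict left-to-right dependence of `alpha` and `beta` on earlier siblings is exactly what makes parallelizing this procedure nontrivial.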
ISBN: (Print) 0897913701
The authors address the apparently difficult problem of doing parallel transitive closure when the (directed) graph is sparse and/or only single-source information is desired. O(e) work is their target for the single-source problem. When the graph is sparse, the all-pairs transitive closure problem can be solved by performing a depth-first search from each node, taking O(ne) time; that is their target for the all-pairs problem. The authors do not reach either target, except for the all-pairs case when e is fairly large. However, they make significant progress, in the sense that they give the first algorithms that simultaneously use time much less than linear and work less than M(n), the cost of multiplying two n-by-n matrices.
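The sequential target is easy to state: single-source reachability by BFS (or DFS) does O(n + e) work. A minimal sketch (the open question in the abstract is matching this work bound with low parallel time, which this serial loop does not address):

```python
from collections import deque

def reachable(adj, s):
    """Single-source reachability in O(n + e) work via BFS."""
    seen = {s}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

adj = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
print(sorted(reachable(adj, 0)))   # [0, 1, 2, 3]  (4 is not reached)
```

Running this from every source gives the O(ne) all-pairs baseline the abstract mentions; the difficulty is that neither traversal parallelizes to sublinear time without blowing up the work.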
Andrews et al. [Automatic method for hiding latency in high bandwidth networks, in: Proceedings of the ACM Symposium on Theory of Computing, 1996, pp. 257-265; Improved methods for hiding latency in high bandwidth networks, in: Proceedings of the Eighth Annual ACM Symposium on Parallel Algorithms and Architectures, 1996, pp. 52-61] introduced a number of techniques for automatically hiding latency when performing simulations of networks with unit delay links on networks with arbitrary unequal delay links. In their work, they assume that processors of the host network are identical in computational power to those of the guest network being simulated. They further assume that the links of the host are able to pipeline messages, i.e., they are able to deliver P packets in time O(P + d) where d is the delay on the link. In this paper we examine the effect of eliminating one or both of these assumptions. In particular, we provide an efficient simulation of a linear array of homogeneous processors connected by unit-delay links on a linear array of heterogeneous processors connected by links with arbitrary delay. We show that the slowdown achieved by our simulation is optimal. We then consider the case of simulating cliques by cliques; i.e., a clique of heterogeneous processors with arbitrary delay links is used to simulate a clique of homogeneous processors with unit delay links. We reduce the slowdown from the obvious bound of the maximum delay link to the average of the link delays. In the case of the linear array we consider both links with and without pipelining. For the clique simulation the links are not assumed to support pipelining. The main motivation of our results (as was the case with Andrews et al.) is to mitigate the degradation of performance when executing parallel programs designed for different architectures on a network of workstations (NOW). In such a setting it is unlikely that the links provided by the NOW will support pipelining and it is quite probab
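The headline improvement is easy to quantify on toy numbers (illustrative delays, not from the paper): a step-by-step simulation stalls every round on the slowest link, whereas the abstract's clique schedule pays only the average delay.

```python
def slowdown_bounds(delays):
    """Slowdown of simulating a unit-delay clique on a clique with the given
    link delays: the obvious bound stalls each step on the slowest link; the
    improved bound amortizes to the average of the link delays."""
    return max(delays), sum(delays) / len(delays)

worst, avg = slowdown_bounds([1, 1, 2, 4, 32])   # one very slow link
print(worst, avg)   # 32 vs 8.0
```

The gap widens exactly when delays are skewed, which is the expected situation on a NOW where one congested or distant link should not throttle the whole simulation.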