ISBN (print): 9781605586069
Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms that minimize communication. In this paper we first extend known lower bounds on the communication cost (both for bandwidth and for latency) of conventional (O(n³)) matrix multiplication to Cholesky factorization, which is used for solving dense symmetric positive definite linear systems. Second, we compare the cost of various Cholesky decomposition implementations to this lower bound and draw the following conclusions: (1) "Naive" sequential algorithms for Cholesky attain neither the bandwidth nor the latency lower bound. (2) The sequential blocked algorithm in LAPACK (with the right block size), as well as various recursive algorithms [AP00, GJ01, ACW01, ST04] and one based on work of Toledo [Tol97], can attain the bandwidth lower bound. (3) The LAPACK algorithm can also attain the latency bound if used with blocked data structures rather than column-wise or row-wise matrix data structures, though the Toledo algorithm cannot. (4) The recursive sequential algorithm due to [AP00] attains the bandwidth and latency lower bounds at every level of a multi-level memory hierarchy, in a "cache-oblivious" way. (5) The parallel implementation of Cholesky in the ScaLAPACK library (again with the right block size) attains both the bandwidth and latency lower bounds to within a poly-logarithmic factor. Combined with prior results in [DGHL08a, DGHL08b, DGX08], this gives a complete set of communication-optimal algorithms for O(n³) implementations of three basic factorizations of dense linear algebra: LU with pivoting, QR, and Cholesky. But it goes beyond this prior work on sequential LU and QR by optimizing communication for any number of levels of memory hierarchy.
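As a concrete illustration of the blocked approach the abstract discusses, here is a minimal, pure-Python sketch of a right-looking blocked Cholesky factorization (illustrative only, not the LAPACK implementation; the block size b stands in for a cache-tuned parameter). Processing the matrix tile-by-tile is what lets blocked algorithms reuse data in fast memory and approach the bandwidth lower bound.

```python
import math

def cholesky_blocked(A, b=2):
    """Return lower-triangular L with L * L^T == A, for a dense SPD matrix
    given as a list of lists, working in b-by-b blocks (right-looking)."""
    n = len(A)
    L = [[A[i][j] for j in range(n)] for i in range(n)]
    for k in range(0, n, b):
        kb = min(k + b, n)
        # Factor the diagonal block with a small unblocked Cholesky.
        for j in range(k, kb):
            for i in range(j, kb):
                s = L[i][j] - sum(L[i][p] * L[j][p] for p in range(k, j))
                L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
        # Triangular solve: finish the panel below the diagonal block.
        for i in range(kb, n):
            for j in range(k, kb):
                s = L[i][j] - sum(L[i][p] * L[j][p] for p in range(k, j))
                L[i][j] = s / L[j][j]
        # Symmetric rank-b update of the trailing submatrix.
        for i in range(kb, n):
            for j in range(kb, i + 1):
                L[i][j] -= sum(L[i][p] * L[j][p] for p in range(k, kb))
    # Zero the strictly upper triangle, which still holds entries of A.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L
```

In a cache-aware setting b would be chosen so that three b-by-b tiles fit in fast memory; the cache-oblivious variant of [AP00] achieves the same effect by recursion instead of an explicit block size.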
ISBN (print): 9781605583976
We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on many-core architectures. We show that the NB-FEB primitive is universal, scalable, and feasible. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable: its memory requests to the same memory location can be combined into a single memory request, which mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility).
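The abstract does not spell out the primitive's interface, so the following is a hypothetical lock-based rendering of the idea only: each word carries a value plus a full/empty flag, and every operation returns the old (value, flag) pair immediately instead of blocking on the flag. The operation names (`tfas`, `store_clear`) and their exact semantics are illustrative assumptions, not the paper's formal definitions.

```python
import threading

class NBFEBWord:
    """Hypothetical non-blocking full/empty-bit word (sketch, not the paper's
    spec). The lock stands in for hardware atomicity of a single instruction."""

    def __init__(self, value=0):
        self._lock = threading.Lock()
        self._value = value
        self._full = False

    def tfas(self, new_value):
        """Test-flag-and-store: store new_value and set the flag, but only if
        the flag was empty; always return the old (value, flag) pair."""
        with self._lock:
            old = (self._value, self._full)
            if not self._full:
                self._value = new_value
                self._full = True
            return old

    def store_clear(self, new_value):
        """Store new_value and clear the flag; return the old pair."""
        with self._lock:
            old = (self._value, self._full)
            self._value = new_value
            self._full = False
            return old

    def load(self):
        with self._lock:
            return (self._value, self._full)
```

Under these assumed semantics, a "first write wins" conditional store like `tfas` is enough to let any number of threads agree on a single value, which is the shape of argument behind the universality claim.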
ISBN (print): 9781605586137
We show the existence of ε-nets of size O(1/ε log log 1/ε) for planar point sets and axis-parallel rectangular ranges. The same bound holds for points in the plane with "fat" triangular ranges, and for point sets in R³ and axis-parallel boxes; these are the first known non-trivial bounds for these range spaces. Our technique also yields improved bounds on the size of ε-nets in the more general context considered by Clarkson and Varadarajan. For example, we show the existence of ε-nets of size O(1/ε log log log 1/ε) for the dual range space of "fat" regions and planar point sets (where the regions are the ground objects and the ranges are subsets stabbed by points). Plugging our bounds into the technique of Bronnimann and Goodrich, we obtain improved approximation factors (computable in randomized polynomial time) for the HITTING SET and SET COVER problems associated with the corresponding range spaces.
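To make the underlying definition concrete, here is a small brute-force checker (a sketch, not the paper's construction) for whether a candidate set N is an ε-net for a planar point set P under axis-parallel rectangular ranges. It suffices to enumerate the combinatorially distinct rectangles spanned by coordinates of points in P.

```python
from itertools import combinations_with_replacement

def in_rect(p, r):
    x0, x1, y0, y1 = r
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

def is_eps_net(N, P, eps):
    """N is an eps-net if every axis-parallel rectangle containing at least
    eps * |P| points of P also contains a point of N."""
    xs = sorted({p[0] for p in P})
    ys = sorted({p[1] for p in P})
    threshold = eps * len(P)
    for x0, x1 in combinations_with_replacement(xs, 2):
        for y0, y1 in combinations_with_replacement(ys, 2):
            r = (x0, x1, y0, y1)
            heavy = sum(in_rect(p, r) for p in P) >= threshold
            if heavy and not any(in_rect(q, r) for q in N):
                return False  # a heavy rectangle escaped the net
    return True
```

The point of the paper's bound is that nets of size O(1/ε log log 1/ε) always exist for this range space, beating the general O(1/ε log 1/ε) bound that holds for any range space of bounded VC dimension.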
ISBN (print): 9781605583969
The paper presents distributed and parallel δ-approximation algorithms for covering problems, where δ is the maximum number of variables on which any constraint depends (for example, δ = 2 for VERTEX COVER). Specific results include the following. For WEIGHTED VERTEX COVER: the first distributed 2-approximation algorithm taking O(log n) rounds, and the first parallel 2-approximation algorithm in RNC. The algorithms generalize to covering mixed integer linear programs (CMIP) with two variables per constraint (δ = 2). For any covering problem with monotone constraints and submodular cost: a distributed δ-approximation algorithm taking O(log² |C|) rounds, where |C| is the number of constraints. (Special cases include CMIP, facility location, and probabilistic (two-stage) variants of these problems.)
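The distributed and parallel algorithms are the paper's contribution; what follows is only a sequential sketch of the classic primal-dual/local-ratio scheme for weighted vertex cover that such algorithms build on: repeatedly take an uncovered edge and pay down both endpoints by the smaller residual weight, adding a vertex to the cover when its residual weight reaches zero. The resulting cover weighs at most twice the optimum.

```python
def weighted_vertex_cover(weights, edges):
    """Local-ratio 2-approximation for weighted vertex cover (sequential
    sketch). weights: dict vertex -> nonnegative weight; edges: pair list."""
    residual = dict(weights)
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue                       # edge already covered
        pay = min(residual[u], residual[v])
        residual[u] -= pay                 # charge both endpoints equally
        residual[v] -= pay
        if residual[u] == 0:
            cover.add(u)
        if residual[v] == 0:
            cover.add(v)
    return cover
```

The 2-approximation follows because every unit paid is charged to an edge, any cover must pay at least half of each edge's charge, and the algorithm's cover costs exactly the total charge.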
ISBN (print): 9781605586137
On Saturday, May 30, one day before the start of the regular STOC 2009 program, a workshop was held in celebration of Leslie Valiant's 60th birthday. Talks were given by Jin-Yi Cai, Stephen Cook, Vitaly Feldman, Mark Jerrum, Michael Kearns, Mike Paterson, Michael Rabin, Rocco Servedio, Paul Valiant, Vijay Vazirani, and Avi Wigderson. The workshop was organized by Michael Kearns, Rocco Servedio, and Salil Vadhan, with support from the STOC local arrangements team and program committee. To accompany the workshop, here we briefly survey Valiant's many fundamental contributions to the theory of computing.
Suppose we have a parallel or distributed system whose nodes have limited capacities, such as processing speed, bandwidth, memory, or disk space. How does the performance of the system depend on the amount of heterogeneity of its capacity distribution? We propose a general framework to quantify the worst-case effect of increasing heterogeneity in models of parallel systems. Given a cost function g(C,W) representing the system's performance as a function of its nodes' capacities C and workload W (such as the makespan of an optimum schedule of jobs W on machines C), we say that g has price of heterogeneity α when for any workload, cost cannot increase by more than a factor α if node capacities become arbitrarily more heterogeneous. The price of heterogeneity also upper bounds the "value of parallelism": the maximum benefit obtained by increasing parallelism at the expense of decreasing processor speed. We give constant or logarithmic bounds on the price of heterogeneity of several well-known job scheduling and graph degree/diameter problems, indicating that in many cases, increasing heterogeneity can never be much of a disadvantage.
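A toy rendering (not from the paper) of the cost function g(C, W): here g is the optimum makespan of jobs W on machines with speeds C, computed by brute force on a tiny instance. Comparing a homogeneous speed vector against a more heterogeneous one with the same total capacity shows how the cost can shift as capacities concentrate.

```python
from itertools import product

def opt_makespan(speeds, jobs):
    """g(C, W): optimum makespan of jobs (work amounts) on machines with the
    given speeds, by exhaustive search over all assignments."""
    best = float('inf')
    for assignment in product(range(len(speeds)), repeat=len(jobs)):
        load = [0.0] * len(speeds)
        for job, m in zip(jobs, assignment):
            load[m] += job / speeds[m]   # processing time = work / speed
        best = min(best, max(load))
    return best
```

For two unit jobs, two unit-speed machines give makespan 1, while speeds (1.9, 0.1) of the same total force both jobs onto the fast machine for makespan 2/1.9 ≈ 1.05: heterogeneity hurts here, but only by a small factor, which is the kind of worst-case ratio the price of heterogeneity bounds.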
ISBN (print): 9781605581668
Parallel architectures are the way of the future, but are notoriously difficult to program. In addition to the low-level constructs they often present (e.g., locks, DMA, and non-sequential memory models), most parallel programming environments admit data races: the environment may make nondeterministic scheduling choices that can change the function of the program. We believe the solution is model-based design, where the programmer is presented with a constrained higher-level language that prevents certain unwanted behavior. In this paper, we describe a compiler for the SHIM scheduling-independent concurrent language that generates code for the Cell Broadband heterogeneous multicore processor. The complexity of the code our compiler generates relative to the source illustrates how difficult it is to write code for the Cell manually. We demonstrate the efficacy of our compiler on two examples. While the SHIM language is (by design) not ideal for every algorithm, it works well for certain applications and simplifies the parallel programming process, especially on the Cell architecture. Copyright 2009 ACM.
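A tiny sketch of scheduling independence in the spirit of SHIM's communication model (this is plain Python with a bounded queue, not SHIM): when the only inter-thread interaction is a blocking channel, the program computes the same result under every interleaving the scheduler may choose.

```python
import queue
import threading

def producer(chan):
    for x in [1, 2, 3]:
        chan.put(x)                  # communication is the only shared state

def consumer(chan, out):
    total = 0
    for _ in range(3):
        total += chan.get()          # blocks until the producer sends
    out.append(total)

def run():
    chan = queue.Queue(maxsize=1)    # size-1 buffer approximates rendezvous
    out = []
    threads = [threading.Thread(target=producer, args=(chan,)),
               threading.Thread(target=consumer, args=(chan, out))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[0]
```

Replacing the channel with an unsynchronized shared variable would reintroduce exactly the data races the abstract describes: the result would then depend on which thread the scheduler runs first.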
ISBN (print): 9781605583969
We present a randomized coloring algorithm for the unstructured radio network model, a model comprising autonomous nodes, asynchronous wake-up, no collision detection, and an unknown but geometric network topology. The current state-of-the-art coloring algorithm needs O(Δ · log n) time with high probability and uses O(Δ) colors, where n and Δ are the number of nodes in the network and the maximum degree, respectively; this algorithm requires knowledge of a linear bound on n and Δ. We improve this result in three ways. Firstly, we improve the time complexity: instead of the logarithmic factor we need only a polylogarithmic additive term; more specifically, our time complexity is O(Δ + log Δ · log n) given an estimate of n and Δ, and O(Δ + log² n) without knowledge of Δ. Secondly, our vertex coloring algorithm needs only Δ + 1 colors. Thirdly, our algorithm manages to do a distance-d coloring with asymptotically optimal O(Δ) colors for constant d.
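A sequential simulation (illustrative only; it ignores the radio-network complications of collisions and asynchronous wake-up that the paper actually handles) of the basic randomized round structure behind (Δ + 1)-coloring: each round, every uncolored node proposes a color unused by its already-colored neighbors, and keeps it if no neighbor proposed the same color that round.

```python
import random

def delta_plus_one_coloring(adj, seed=0):
    """adj[v]: list of neighbors of v. Returns a proper coloring with colors
    drawn from {0, ..., Delta}, simulating synchronous randomized rounds."""
    rng = random.Random(seed)
    n = len(adj)
    delta = max((len(nbrs) for nbrs in adj), default=0)
    palette = range(delta + 1)
    color = [None] * n
    while any(c is None for c in color):
        proposal = {}
        for v in range(n):
            if color[v] is None:
                taken = {color[u] for u in adj[v] if color[u] is not None}
                # A free color always exists: |taken| <= deg(v) <= delta.
                proposal[v] = rng.choice([c for c in palette if c not in taken])
        for v, c in proposal.items():
            # Keep the color only if no neighbor proposed the same one.
            if all(proposal.get(u) != c for u in adj[v]):
                color[v] = c
    return color
```

Each node's proposal list is nonempty because an uncolored node has at most Δ colored neighbors, which is the whole reason Δ + 1 colors suffice.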
ISBN (print): 9781605581668
Experimental studies have shown that electing a leader based on measurements of the underlying communication network can be beneficial. We use this approach to study the problem of electing a leader that is eventually not only correct (as captured by the Ω failure detector abstraction), but also optimal with respect to the transmission delays to its peers. We give definitions of this problem and a suitable model, which allows an analytical treatment of the problem, in contrast to previous work on this topic. Copyright 2009 ACM.
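A hedged sketch of the selection criterion only (not the paper's algorithm, which must also cope with failures and changing measurements): among the processes currently deemed alive, elect the one minimizing an aggregate of its measured delays to its peers, breaking ties by process id so that all processes with the same view agree.

```python
def elect_leader(delays, alive):
    """delays[i][j]: measured transmission delay from process i to process j;
    alive: set of process ids believed correct. Returns the elected leader."""
    def score(i):
        peers = [delays[i][j] for j in alive if j != i]
        avg = sum(peers) / len(peers) if peers else 0.0
        return (avg, i)          # tie-break deterministically by id
    return min(alive, key=score)
```

Averaging is one illustrative choice of aggregate; any metric that all processes evaluate identically on the same measurements yields the agreement needed for an eventual-leader abstraction.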