Search Results - Inner Mongolia University Library

Proceedings of the twenty-first annual symposium on parallelism in algorithms and architectures

Author: Sarita V. Adve, University of Illinois at Urbana-Champaign, Urbana, IL, USA

ISBN: (print) 9781605586069

The era of parallel computing for the masses is here, but writing correct parallel programs remains a challenge--popular parallel environments offer no analogs for the concepts of structured and safe sequential programming. The memory model forms the heart of the concurrency semantics of any (shared-memory) parallel language or hardware. Unfortunately, it has involved a tradeoff between programmability and performance, and has arguably been one of the most challenging and contentious areas in shared-memory specification. Recent broad community-scale efforts have finally led to a convergence in this debate, with popular languages such as Java and C++ and most hardware vendors publishing compatible memory model specifications. Although this convergence is a dramatic improvement, it has exposed fundamental shortcomings in current popular languages and systems that prevent achieving the vision of structured and safe parallel programming. I will discuss the path to the above convergence, the hard lessons learned, and their implications. A cornerstone of this convergence has been the view that the memory model should be a contract between the programmer and the system--if the programmer writes well-structured (data-race-free) programs, the system will provide high programmability (sequential consistency) and performance. I will discuss why this view is the best we can do with current popular languages, and why it should be unacceptable moving forward. I will then discuss research directions that eliminate worrying about the memory model, but require rethinking popular parallel languages and hardware. In particular, I will argue for languages that eliminate data races by design and provide determinism by default, while retaining the advantages of modern object-oriented programming. I will also argue for hardware that takes advantage of such disciplined programming models to enable energy-efficient performance scalability. I will use the Deterministic Parallel Java language

Keywords: memory models; memory consistency models; safe programming; multicore architecture

Annual ACM Symposium on Parallelism in Algorithms and Architectures: Foreword

Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008, p. iii

Author: Shavit, Nir, Tel-Aviv University, Sun Labs, Israel

Brief Announcement: Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)

20th ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Dorrigiv, Reza; Lopez-Ortiz, Alejandro; Salinger, Alejandro. Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada

ISBN: (print) 9781595939739

Over the last five years, major microprocessor manufacturers have released plans for a rapidly increasing number of cores per microprocessor, with upwards of 64 cores by 2015. In this setting, a sequential RAM computer will no longer accurately reflect the architecture on which algorithms are being executed. In this paper we propose a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures. This model supports a high level of abstraction that simplifies the design and analysis of parallel programs. More importantly, we show that in many instances it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms.

Keywords: algorithms; Theory

Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

20th ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Arge, Lars; Goodrich, Michael T.; Nelson, Michael; Sitchinava, Nodari. Univ Aarhus, MADALGO, Aarhus, Denmark; Univ Calif Irvine, Irvine, CA 92697, USA

ISBN: (print) 9781595939739

In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.

Keywords: Parallel External Memory; PEM; private-cache CMP
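Prefix sums, named in the abstract above as one of the building blocks, can be illustrated with a generic blocked scan. This is only a minimal sketch and not the algorithm from the paper: the function name `blocked_prefix_sums`, the `workers` parameter, and the use of a thread pool are assumptions made for illustration. Each worker touches one contiguous block, which loosely mirrors the idea of keeping every core's accesses within its own cache-sized portion of the input.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def blocked_prefix_sums(values, workers=4):
    """Inclusive prefix sums computed block by block (hypothetical helper)."""
    n = len(values)
    if n == 0:
        return []
    workers = max(1, min(workers, n))
    size = -(-n // workers)  # ceiling division: elements per block
    blocks = [values[i:i + size] for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: each worker scans only its own contiguous block.
        local = list(pool.map(lambda b: list(accumulate(b)), blocks))
        # Phase 2: a short sequential scan over one total per block.
        totals = [scan[-1] for scan in local]
        offsets = [0] + list(accumulate(totals))[:-1]
        # Phase 3: each worker shifts its block by the sum of all earlier blocks.
        shifted = pool.map(
            lambda pair: [x + pair[1] for x in pair[0]], zip(local, offsets)
        )
        return [x for block in shifted for x in block]
```

For example, `blocked_prefix_sums([1, 2, 3, 4, 5], workers=2)` splits the input into `[1, 2, 3]` and `[4, 5]`, scans each block, and then adds the first block's total to the second block.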

Processor efficient parallel matching

Theory of Computing Systems, 2008, vol. 42, no. 1, pp. 73-90

Author: Sankowski, Piotr. Warsaw Univ, Inst Informat, PL-02097 Warsaw, Poland

In this paper we consider the problem of finding perfect matchings in parallel. We present an RNC algorithm with almost optimal work with respect to sequential algorithms, i.e., it uses O(n^ω) processors, where ω is the matrix multiplication exponent. Our algorithm is based on an RNC algorithm for computing the determinant of a degree-one polynomial matrix, which is of independent interest.

Keywords: parallel algorithms; maximum matching; fast matrix multiplication; RNC algorithm

Directed Transmission Method, a Fully Asynchronous Approach to Solve Sparse Linear Systems in Parallel

20th ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Wei, Fei; Yang, Huazhong. Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China

ISBN: (print) 9781595939739

There are many algorithms to solve large sparse linear systems in parallel; however, most of them require synchronization and thus lack scalability. In this paper, we propose a new distributed numerical algorithm, called Directed Transmission Method (DTM). DTM is a fully asynchronous, scalable and continuous-time iterative algorithm to solve an arbitrarily large sparse linear system whose coefficient matrix is symmetric positive definite (SPD). DTM can run freely on a heterogeneous parallel computer with an arbitrary number of processors, which might be manycore microprocessors, clusters, grids, clouds, and the Internet. We prove that DTM is convergent by making use of the final value theorem of the Laplace transform. Numerical experiments show that DTM is efficient.

Keywords: Asynchronous Algorithm; Convergence Theory; Directed Transmission Method (DTM); Distributed Algorithm; Sparse Linear System; Virtual Transmission Method (VTM)

Finding Strongly Connected Components in Parallel using O(log² n) Reachability Queries

20th ACM Symposium on Parallelism in Algorithms and Architectures

Author: Schudy, Warren. Brown Univ, Providence, RI 02912, USA

ISBN: (print) 9781595939739

We give a randomized (Las-Vegas) parallel algorithm for computing strongly connected components of a graph with n vertices and m edges. The runtime is dominated by O(log² n) multi-source parallel reachability queries; i.e., O(log² n) calls to a subroutine that computes the union of the descendants of a given set of vertices in a given digraph. Our algorithm also topologically sorts the strongly connected components. Using Ullman and Yannakakis's [22] techniques for the reachability subroutine gives our algorithm runtime Õ(t) using mn/t² processors for any (n²/m)^(1/3) ≤ t ≤ n. On sparse graphs, this improves the number of processors needed to compute strongly connected components and topological sort within time n^(1/3) ≤ t ≤ n from the previously best known (n/t)³ [20] to (n/t)².

Keywords: Graph algorithms; Parallel algorithms; Strongly connected components; Topological sort; Transitive closure bottleneck
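The reachability subroutine described in the abstract above (the union of the descendants of a given set of vertices) can be sketched sequentially as a multi-source breadth-first search. This is a minimal illustration, not the paper's parallel algorithm. The adjacency-dictionary representation, the function names, and the choice to count the sources themselves as descendants are all assumptions. The helper `component_of` relies only on the standard fact that the strongly connected component of a vertex is the set of vertices it reaches that also reach it.

```python
from collections import deque


def descendants(graph, sources):
    """Union of the vertices reachable from any vertex in `sources`.

    Assumption: the sources themselves are included in the result.
    """
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(graph):
    """Return the digraph with every edge direction flipped."""
    rev = {}
    for u, outs in graph.items():
        rev.setdefault(u, [])
        for v in outs:
            rev.setdefault(v, []).append(u)
    return rev


def component_of(graph, v):
    """SCC containing v: vertices that v reaches and that also reach v."""
    return descendants(graph, [v]) & descendants(reverse(graph), [v])
```

With `graph = {1: [2], 2: [3], 3: [1, 4], 4: []}`, vertices 1, 2 and 3 lie on a cycle and form one component, while vertex 4 is a component on its own.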

DREADLOCKS: Efficient Deadlock Detection

20th ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Koskinen, Eric; Herlihy, Maurice. Brown Univ, Dept Comp Sci, Providence, RI 02912, USA

ISBN: (print) 9781595939739

We present Dreadlocks, an efficient new shared-memory spin lock that actively detects deadlocks. Instead of spinning on a Boolean value, each thread spins on the lock owner's per-thread digest, a compact representation of a portion of the lock's waits-for graph. Digests can be implemented either as bit vectors (for small numbers of threads) or as Bloom filters (for larger numbers of threads). Updates to digests are propagated dynamically as locks are acquired and released. Dreadlocks can be applied to any spin lock algorithm that allows threads to time out. Experimental results show that Dreadlocks outperform timeouts under many circumstances, and almost never do worse.

Keywords: Concurrency; parallel programming; deadlock; deadlock detection; Bloom filters; transactional memory
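The digest idea in the abstract above can be illustrated with a single-threaded simulation that stores each digest as a bit vector. This is a sketch under stated assumptions and not the authors' lock: the class `DigestTable`, its method names, and the eager recomputation of digests are hypothetical. The assumed rule is that a waiting thread's digest is its own bit plus the digest of the thread it waits on, and that a thread detects a deadlock when it finds its own bit in the lock owner's digest.

```python
class DigestTable:
    """Per-thread digests stored as bit vectors (ints). All names are hypothetical."""

    def __init__(self, num_threads):
        # Assumption: a thread that is not waiting has a digest holding only itself.
        self.digest = [1 << t for t in range(num_threads)]
        self.waits_on = {}  # waiter thread id -> thread id it is waiting on

    def try_wait(self, waiter, owner):
        """Record that `waiter` spins on a lock held by `owner`.

        Returns False (deadlock detected) if `waiter` already appears in the
        owner's digest, because waiting would close a cycle.
        """
        if self.digest[owner] & (1 << waiter):
            return False
        self.waits_on[waiter] = owner
        self._propagate()
        return True

    def stop_waiting(self, waiter):
        """Record that `waiter` acquired its lock or gave up."""
        self.waits_on.pop(waiter, None)
        self._propagate()

    def _propagate(self):
        # Recompute every digest to a fixed point: own bit plus the digest of
        # the thread being waited on. A real lock would propagate lazily.
        self.digest = [1 << t for t in range(len(self.digest))]
        changed = True
        while changed:
            changed = False
            for waiter, owner in self.waits_on.items():
                merged = self.digest[waiter] | self.digest[owner]
                if merged != self.digest[waiter]:
                    self.digest[waiter] = merged
                    changed = True
```

In this sketch, if thread 0 waits on thread 1 and thread 1 waits on thread 2, then thread 2 attempting to wait on thread 0 is rejected, since thread 0's digest already contains thread 2. For larger thread counts the abstract mentions Bloom filters in place of bit vectors, which this sketch does not attempt.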

Checkpoints and Continuations Instead of Nested Transactions

20th ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Koskinen, Eric; Herlihy, Maurice. Brown Univ, Dept Comp Sci, Providence, RI 02912, USA

ISBN: (print) 9781595939739

We present a mechanism for partially aborting transactions through the use of data structure checkpoints and control-flow continuations. In particular, we show that boosted transactions [9] already have built-in restoration points and afford a simple, efficient implementation. Our mechanism is far simpler than previous work, which relied on complex nesting schemes to establish checkpoints. We demonstrate syntactic advantages, quantify the overhead of checkpoints, and explore several examples illustrating the utility of partially aborting transactions. We additionally present a novel queue-based spin lock which allows threads to time out and differ in priority. Unlike the known lock due to Craig [5], our lock is more efficient for priority schemes of few levels.

Keywords: Concurrency; parallel programming; transactional memory; boosting; checkpoints; continuations

Parallelizing Dynamic Information Flow Tracking

20th ACM Symposium on Parallelism in Algorithms and Architectures

Authors: Ruwase, Olatunji; Gibbons, Phillip B.; Mowry, Todd C.; Ramachandran, Vijaya; Chen, Shimin; Kozuch, Michael; Ryan, Michael. Carnegie Mellon Univ, Pittsburgh, PA 15213, USA

ISBN: (print) 9781595939739

Dynamic information flow tracking (DIFT) is an important tool for detecting common security attacks and memory bugs. A DIFT tool tracks the flow of information through a monitored program's registers and memory locations as the program executes, detecting and containing/fixing problems on-the-fly. Unfortunately, sequential DIFT tools are quite slow, and DIFT is quite challenging to parallelize. In this paper, we present a new approach to parallelizing DIFT-like functionality. Extending our recent work on accelerating sequential DIFT, we consider a variant of DIFT that tracks the information flow only through unary operations (relaxed DIFT), and yet makes sense for detecting security attacks and memory bugs. We present a parallel algorithm for relaxed DIFT, based on symbolic inheritance tracking, which achieves linear speed-up asymptotically. Moreover, we describe techniques for reducing the constant factors, so that speed-ups can be obtained even with just a few processors. We implemented the algorithm in the context of a Log-Based Architectures (LBA) system, which provides hardware support for logging a program trace and delivering it to other (monitoring) processors. Our simulation results on SPEC benchmarks and a video player show that our parallel relaxed DIFT reduces the overhead to as low as 1.2X using 9 monitoring cores on a 16-core chip multiprocessor.

Keywords: dynamic information flow tracking (DIFT); program monitoring; log-based monitoring; parallel algorithm; taint analysis
