检索结果-内蒙古大学图书馆

36th annual International symposium on Computer Architecture

作者： Bhattacharjee, Abhishek Martonosi, Margaret Princeton Univ Dept Elect Engn Princeton NJ 08544 USA

ISBN: (纸本)9781605585260

With the shift towards chip multiprocessors (CMPs), exploiting and managing parallelism has become a central problem in computer systems. Many issues of parallelism management boil down to discerning which running threads or processes are critical, or slowest, versus which are non-critical If one can accurately predict critical threads in a parallel program, then one can respond in a variety of ways. Possibilities include running the critical thread at a faster clock rate, performing load balancing techniques to offload work onto currently non-critical threads, or giving the critical thread more on-chip resources to execute faster. This paper proposes and evaluates simple but effective thread criticality predictors for parallel applications. We show that accurate predictors can be built using counters that are typically already available on-chip. Our predictor, based on memory hierarchy statistics, identifies thread criticality with an average accuracy of 93% across a range of architectures. We also demonstrate two applications of our predictor. First, we show how Intel's Threading Building Blocks (TBB) parallel runtime system can benefit from task stealing techniques that use our criticality predictor to reduce load imbalance. Using criticality prediction to guide TBB's task-stealing decisions improves performance by 13-32% for TBB-based PARSEC benchmarks running on a 32-core CMP. As a second application, criticality prediction guides dynamic energy optimizations in barrier-based applications. By running the predicted critical thread at the full clock rate and frequency-scaling non-critical threads, this approach achieves average energy savings of 15% while negligibly degrading performance for SPLASH-2 and PARSEC benchmarks.

关键词： Thread Criticality Prediction parallel Processing Intel TBB DVFS Caches

来源：评论

学校读者我要写书评

暂无评论

A New Approach to parallelising Tracing algorithms 09

A New Approach to Parallelising Tracing Algorithms

引用

International Conference on Memory Management

作者： Oancea, Cosmin E. Mycroft, Alan Watt, Stephen M. Univ Cambridge Comp Lab Cambridge CB2 3QG England

ISBN: (纸本)9781605583471

Tracing algorithms visit reachable nodes in a graph and are central to activities such as garbage collection, marshalling etc. Traditional sequential algorithms use a worklist, replacing a nodes with their unvisited children. Previous work on parallel tracing is processor-oriented in associating one worklist per processor: worklist insertion and removal requires no locking, and load balancing requires only occasional locking. However, since multiple queues may contain the same node, significant locking is necessary to avoid concurrent visits by competing processors. This paper presents a memory-oriented solution: memory is partitioned into segments and each segment has its own worklist containing only nodes in that segment. At a given time at most one processor owns a given worklist. By arranging separate single-readersingle-writer forwarding queues to pass nodes from processor i to processor j we can process objects in an order that gives lock-free mainline code and improved locality of reference. This refactoring is analogous to the way in which a compiler changes an iteration space to eliminate data dependencies. While it is clear that our solution can be more effective on NUMA systems, and even necessary when processor-local memory may not be addressed from other processors, slightly surprisingly, it often gives significantly better speed-up on modem multi-cores architectures too. Using caches to hide memory latency loses much of its effectiveness when there is significant cross-processor memory contention or when locking is necessary.

关键词： memory-centric tracing algorithm parallel

来源：评论

学校读者我要写书评

暂无评论

SPAA'09: Proceedings of the Twenty-First annual symposium on parallelism in algorithms and architectures - Foreword

Annual ACM Symposium on Parallelism in Algorithms and Archit...

引用

annual acm symposium on parallelism in algorithms and architectures 2009年 iii页

作者： Bender, Michael A. Stony Brook University Tokutek Inc. United States

来源：评论

学校读者我要写书评

暂无评论

The Theory of Deadlock Avoidance via Discrete Control 09

The Theory of Deadlock Avoidance via Discrete Control

引用

36th acm SIGPLAN-SIGACT symposium on Principles of Programming Languages

作者： Wang, Yin Lafortune, Stephane Kelly, Terence Kudlur, Manjunath Mahlke, Scott Univ Michigan Ann Arbor MI 48109 USA

ISBN: (纸本)9781605583792

Deadlock in multithreaded programs is an increasingly important problem as ubiquitous multicore architectures force parallelization upon an ever wider range of software. This paper presents a theoretical foundation for dynamic deadlock avoidance in concurrent programs that employ conventional mutual exclusion and synchronization primitives (e. g., multithreaded C/Pthreads programs). Beginning with control flow graphs extracted from program source code, we construct a formal model of the program and then apply Discrete Control Theory to automatically synthesize deadlock-avoidance control logic that is implemented by program instrumentation. At run time, the control logic avoids deadlocks by postponing lock acquisitions. Discrete Control Theory guarantees that the program instrumented with our synthesized control logic cannot deadlock. Our method furthermore guarantees that the control logic is maximally permissive: it postpones lock acquisitions only when necessary to prevent deadlocks, and therefore permits maximal run-time concurrency. Our prototype for C/Pthreads scales to real software including Apache, OpenLDAP, and two kinds of benchmarks, automatically avoiding both injected and naturally occurring deadlocks while imposing modest runtime overheads.

关键词： algorithms Languages Theory Verification Dynamic Deadlock Avoidance Discrete Control Theory Concurrent Programming parallel Programming Multithreaded Programming Multicore Processors

来源：评论

学校读者我要写书评

暂无评论

Optimizing techniques for saturated arithmetic with first-order linear recurrence 09

Optimizing techniques for saturated arithmetic with first-or...

引用

24th annual acm symposium on Applied Computing, SAC 2009

作者： Zhang, Weihua Liu, Lili Zhang, Chen Zhang, Hongjiong Zang, Binyu Zhu, Chuanqi Parallel Processing Institute Fudan University Shanghai China

ISBN: (纸本)9781605581668

Saturated arithmetic is a typical operation in multimedia applications, most multimedia extensions in the instruction set architecture (ISA) of modern processors provide saturation instructions for such operation. Therefore, extensive researches have focused on how to utilize saturation instructions to optimize programs. Previous algorithms mainly focus on purely saturated arithmetic, however saturated arithmetic is often mingled with first-order linear recurrence (FOLR) in real life applications. When FLOR pattern appears in the program, previous algorithms can not identify the saturated arithmetic as well. In fact, the saturated arithmetic with FOLR (SAWF) is a new and significant pattern, especially, SAWF with one as coefficient is frequently used in multimedia applications. Hence, it is necessary to explore a method with which such pattern can be efficiently vectorized. This paper discusses how to vectorize SAWF, explores the efficient method to vectorize SAWF with one as coefficient and gives its evaluation and implement a library for the optimizing technique. Such an implementation manner can make compilers are able to exploit it more easily. The experimental results shows the optimizing technique can achieve a speedup of 1.19 to 1.46 on Pentium IV processor. At the same time, the optimizing techniques in this paper can also be used to develop a library for SAWF so a programmer can benefit even without changing the compiler. Copyright 2009 acm.

关键词： Computer architecture

来源：评论

学校读者我要写书评

暂无评论

Preliminary Results on NB-FEB, a Synchronization Primitive for parallel Programming

引用

acm SIGPLAN NOTICES 2009年第4期44卷 295-296页

作者： Ha, Phuong Hoai Tsigas, Philippas Anshus, Otto J. Univ Tromso N-9001 Tromso Norway Chalmers Univ Technol Gothenburg Sweden

We introduce a non-blocking full/empty bit primitive, or NB-FEB for short, as a promising synchronization primitive for parallel programming on may-core architectures. We show that the NB-FEB primitive is universal, scalable and feasible. NB-FEB, together with registers, can solve the consensus problem for an arbitrary number of processes (universality). NB-FEB is combinable, namely its memory requests to the same memory location can be combined into only one memory request, which consequently mitigates performance degradation due to synchronization "hot spots" (scalability). Since NB-FEB is a variant of the original full/empty bit that always returns a value instead of waiting for a conditional flag, it is as feasible as the original full/empty bit, which has been implemented in many computer systems (feasibility).

关键词： algorithms Reliability Theory many-core architectures non-blocking synchronization full/empty bit universal primitives combinability

来源：评论

学校读者我要写书评

暂无评论

Session details: High-performance parallel computation 09

Session details: High-performance parallel computation

引用

Proceedings of the twenty-first annual symposium on parallelism in algorithms and architectures

作者： Vitus Leung Sandia National Labs

No abstract available.

ISBN: (纸本)9781605586069

No abstract available.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Session details: Brief announcements: performance of parallel algorithm 09

Session details: Brief announcements: performance of paralle...

引用

Proceedings of the twenty-first annual symposium on parallelism in algorithms and architectures

作者： Jeremy Fineman Massachusetts Institute of Technology

No abstract available.

ISBN: (纸本)9781605586069

No abstract available.

关键词：

来源：评论

学校读者我要写书评

暂无评论

Improved Approximation algorithms for Scheduling with Fixed Jobs 09

Improved Approximation Algorithms for Scheduling with Fixed ...

引用

annual acm-Society for Industrial and Applied Mathmatics symposium on Discrete algorithms

作者： Florian Diedrich Klaus Janseny Institut fur Informatik Universitat zu Kiel

We study two closely related problems in non-preemptive scheduling of sequential jobs on identical parallel machines. In these two settings there are either fixed jobs or nonavailability intervals during which the machines are not available; in either case, the objective is to minimize the makespan. Both formulations have different applications, e.g. in turnaround scheduling or overlay computing. For both problems we contribute approximation algorithms with an improved ratio of 3/2 + ε, respectively. For scheduling with fixed jobs, a lower bound of 3/2 on the approximation ratio has been obtained by Scharbrodt, Steger&Weisser; for scheduling with non-availability we provide the same lower bound. In total, our approximation ratio for both problems is essentially tight via suitable inapproximability results. We use dual approximation, creation of a gap structure and job configurations, and a PTAS for the multiple subset sum problem. However, the main feature of our algorithms is a new technique for the assignment of large jobs via flexible rounding. Our new technique is based on an interesting cyclic shifting argument in combination with a network flow model for the assignment of jobs to large gaps.

关键词： Approximation algorithms jobs gap structure new technique Flow models parallel machines Fixed

来源：评论

学校读者我要写书评

暂无评论

Emerging challenges and opportunities in parallel computing: the cretaceous redux? 09

Emerging challenges and opportunities in parallel computing:...

引用

Proceedings of the twenty-first annual symposium on parallelism in algorithms and architectures

作者： Bruce Hendrickson Sandia National Laboratories Albuquerque NM USA

ISBN: (纸本)9781605586069

Despite decades of activity, parallel computing remains immature. Like much of computer science, advances in the field are driven by a mixture of theoretical insights and technological advances. But in parallel computing, the gap between theory and practice remains disconcertingly *** theoretical concepts in parallel computing were developed in the seventies including P-completeness, PRAMs, boolean circuits, and more. A rich set of algorithms and complexity classes was built on top of these models, and these ideas influenced early thinking about parallel machines. But applied parallel computing today is little in uenced by these *** the past twenty years computational scientists and engineers have embraced parallelism as an essential computational tool for solving ever larger and more complex problems. By the mid-nineties, the computational science community had settled on a computing paradigm involving distributed memory machines programmed with explicit message passing. Although not elegant, this approach allows for portability and the use of inexpensive commodity hardware. This hardware is sufficiently different from the abstract models proposed in the seventies that insights from the theory community have had minimal impact on mainstream parallel computing practice.A variety of alternative theoretical models have been proposed in the interim that are more reflective of hardware realities than their counterparts from the seventies. These include the Bulk Synchronous Processing and LogP complexity models as well as computational abstractions like Active Messages partitioned global address space languages. But these concepts and tools have been consigned to niche roles within the larger MPI-dominated parallel computing ***, several forces are rearranging the parallel computing landscape and will raise the importance of ideas and techniques from the theory community. The most familiar change is the rise of multi-core processors. The need for n

关键词： parallel computing

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：