Search Results - Inner Mongolia University Library

8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Rencuzogullari, U.; Dwarkadas, S. (Univ Rochester, Dept Comp Sci, Rochester, NY 14627, USA)

ISBN: (print) 9781581133462

Networks of workstations (NOWs), which are generally composed of autonomous compute elements networked together, are an attractive parallel computing platform since they offer high performance at low cost. The autonomous nature of the environment, however, often results in inefficient utilization due to load imbalances caused by three primary factors: 1) unequal load (compute or communication) assignment to equally-powerful compute nodes, 2) unequal resources at compute nodes, and 3) multiprogramming. These load imbalances result in idle waiting time on cooperating processes that need to synchronize or communicate data. Additional waiting time may result due to local scheduling decisions in a multiprogrammed environment. In this paper, we present a combined approach of compile-time analysis, run-time load distribution, and operating system scheduler cooperation for improved utilization of available resources in an autonomous NOW. The techniques we propose allow efficient resource utilization by taking into consideration all three causes of load imbalance in addition to locality of access in the process of load distribution. The resulting adaptive load distribution and cooperative scheduling system allows applications to take advantage of parallel resources when available by providing better performance than when the loaded resources are not used at all.

Keywords: unbalanced load; load distribution; parallel processing (computers); networks of workstations; workstations; autonomous network
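The run-time load distribution idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the function name `partition_iterations` and the `relative_power` measurements are hypothetical. It assumes each node reports a relative compute power, and it hands out contiguous blocks of loop iterations in proportion to that power (contiguous blocks being one simple way to respect locality of access).

```python
def partition_iterations(n_iters, relative_power):
    """Split range(n_iters) into contiguous (start, end) blocks, one per node.

    relative_power is a hypothetical per-node measurement (for example,
    iterations completed per second in the previous phase). Block sizes are
    proportional to it; block boundaries come from rounding a running sum so
    that the blocks always cover the full range exactly.
    """
    total = sum(relative_power)
    bounds = []
    start = 0
    cumulative = 0.0
    for power in relative_power:
        cumulative += power
        end = round(n_iters * cumulative / total)
        bounds.append((start, end))
        start = end
    return bounds


# Three nodes; the third is assumed to be twice as fast as the others.
blocks = partition_iterations(100, [1, 1, 2])
```

A loaded or slower node would simply report a smaller `relative_power` on the next redistribution step and receive a smaller block.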


Contention elimination by replication of sequential sections in distributed shared memory programs


8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Lu, H.; Cox, A.L.; Zwaenepoel, W. (Department of Computer Science, Rice University, Houston, TX 77005, United States)

ISBN: (print) 9781581133462

In shared memory programs, contention often occurs at the transition between a sequential and a parallel section of the code. As all threads start executing the parallel section, they often access data just modified by the thread that executed the sequential section, causing a flurry of data requests to converge on that processor. We address this problem in a software distributed shared memory system by replicating the execution of the sequential sections on all processors. Communication during this replicated sequential execution is reduced by using multicast. We have implemented replicated sequential execution with multicast support in OpenMP/NOW, a version of OpenMP that runs on networks of workstations. We do not rely on compile-time data analysis, and therefore we can handle irregular and pointer-based applications. We show significant improvement for two pointer-based applications that suffer from severe contention without replicated sequential execution.

Keywords: parallel processing systems


Accurate data redistribution cost estimation in software distributed shared memory systems


8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Morris III, D.G.; Lowenthal, D.K. (Hewlett-Packard Company, 19447 Pruneridge Avenue, Cupertino, CA 95014, United States)

ISBN: (print) 9781581133462

Distributing data is one of the key problems in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in programs where data redistribution between computational phases is considered. The global data distribution problem is to find the optimal distribution in multi-phase parallel programs. Solving this problem requires accurate knowledge of data redistribution cost. We are investigating this problem in the context of a software distributed shared memory (SDSM) system, in which obtaining accurate redistribution cost estimates is difficult. This is because SDSM communication is implicit: It depends on access patterns, page locations, and the SDSM consistency protocol. We have developed integrated compile- and run-time analysis for SDSM systems to determine accurate redistribution cost estimates with low overhead. Our resulting system, SUIF-Adapt, can efficiently and accurately estimate execution time, including redistribution, to within 5% of the actual time in all of our test cases and is often much closer. These precise costs enable SUIF-Adapt to find efficient global data distributions in multiple-phase programs.

Keywords: parallel processing systems


Reference idempotency analysis: A framework for optimizing speculative execution


8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Seon Wook Kim; Ooi, C.-L.; Eigenmann, R.; Falsafi, B.; Vijaykumar, T.N. (Intel Corp., Champaign, IL, United States)

Recent proposals for multithreaded architectures allow threads with unknown dependences to execute speculatively in parallel. These architectures use hardware speculative storage to buffer uncertain data, track data dependences and roll back incorrect executions. Because all memory references access the speculative storage, current proposals implement this storage using small memory structures for fast access. The limited capacity of the speculative storage causes considerable performance loss due to speculative storage overflow whenever a thread's speculative state exceeds the storage capacity. Larger threads exacerbate the overflow problem but are preferable to smaller threads, as larger threads uncover more parallelism. In this paper, we discover a new program property called memory reference idempotency. Idempotent references need not be tracked in the speculative storage, and instead can directly access non-speculative storage (i.e., the conventional memory hierarchy). Thus, we reduce the demand for speculative storage space. We define a formal framework for reference idempotency and present a novel compiler-assisted speculative execution model. We prove the necessary and sufficient conditions for reference idempotency using our model. We present a compiler algorithm to label idempotent memory references for the hardware. Experimental results show that for our benchmarks, over 60% of the references in non-parallelizable program sections are idempotent.

Keywords: parallel processing systems


Evaluation of computing paradigms for n-body simulations on distributed memory architectures


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 25-36

Authors: McCurdy, Collin; Mellor-Crummey, John (Univ of Wisconsin, Madison, United States)

The efficiency of HPF with respect to irregular applications is still largely unproven. While recent work has shown that a highly irregular hierarchical n-body force calculation method can be implemented in HPF, we have found that the implementation contains inefficiencies which cause it to run up to a factor of three times slower than our hand-coded, explicitly parallel implementation. Our work examines these inefficiencies, determines that most of the extra overhead is due to a single aspect of the communication strategy, and demonstrates that fixing the communication strategy can bring the overheads of the HPF application to within 25% of those of the hand-coded version.

Keywords: parallel processing systems


Dynamic instrumentation of threaded applications


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 49-59

Authors: Xu, Zhichen; Miller, Barton P.; Naim, Oscar (Univ of Wisconsin, Madison, WI, United States)

The design of non-trace-based performance instrumentation techniques for threaded programs is investigated to provide detailed performance data while maintaining control of instrumentation costs. The design is based on low-contention data structures. Paradyn's dynamic instrumentation is extended to handle threaded programs. To associate data with individual threads, all threads share the same instrumentation code, and each thread is assigned its own private copy of performance counters or timers. The asynchrony in a threaded program poses a major challenge to dynamic instrumentation.

Keywords: parallel processing systems


Code motion for explicitly parallel programs


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 13-24

Authors: Knoop, Jens; Steffen, Bernhard (Universitaet Dortmund, Dortmund, Germany)

In comparison to automatic parallelization, which is thoroughly studied in the literature, classical analyses and optimizations of explicitly parallel programs were more or less neglected. This may be due to the fact that naive adaptations of the sequential techniques fail, and their straightforward correct ones have unacceptable costs caused by the interleavings, which manifest the possible executions of a parallel program. Recently, however, we showed that unidirectional bitvector analyses can be performed for parallel programs as easily and as efficiently as for sequential ones, a necessary condition for the successful transfer of the classical optimizations to the parallel setting. In this article we focus on possible subsequent code motion transformations, which turn out to require much more care than originally conjectured. Essentially, this is due to the fact that interleaving semantics, although being adequate for correctness considerations, fails when it comes to reasoning about efficiency of parallel programs. This deficiency, however, can be overcome by strengthening the specific treatment of synchronization points.

Keywords: parallel processing systems


Performance prediction of large parallel applications using parallel simulations


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 151-162

Authors: Bagrodia, Rajive; Deelman, Ewa; Docy, Steven; Phan, Thomas (Univ of California, Los Angeles, Los Angeles, United States)

Accurate simulation of large parallel applications can be facilitated with the use of direct execution and parallel discrete event simulation. This paper describes the use of COMPASS, a direct execution-driven, parallel simulator for performance prediction of programs that include both communication and I/O intensive applications. The simulator has been used to predict the performance of such applications on both distributed memory machines like the IBM SP and shared-memory machines like the SGI Origin 2000. The paper illustrates the usefulness of COMPASS as a versatile performance prediction tool. We use both real-world applications and synthetic benchmarks to study application scalability, sensitivity to communication latency, and the interplay between factors like communication pattern and parallel file system caching on application performance. We also show that the simulator is accurate in its predictions and that it is also efficient in its ability to use parallel simulation to reduce its own execution time which, in some cases, has yielded a near-linear speedup.

Keywords: parallel processing systems
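For readers unfamiliar with discrete event simulation, the core loop can be sketched as follows. This is a generic sequential sketch, not COMPASS (which is parallel and driven by direct execution); the names `simulate`, `on_send`, `on_recv`, and the `LATENCY` value are all hypothetical.

```python
import heapq

LATENCY = 5.0  # hypothetical one-way network latency, in simulated time units


def simulate(initial_events):
    """Process events in timestamp order and return the (time, name) trace.

    Each event is (timestamp, name, handler). handler(now) returns a list of
    new events to schedule. A sequence number breaks timestamp ties so that
    handlers themselves are never compared by the heap.
    """
    queue = []
    seq = 0
    for ts, name, handler in initial_events:
        heapq.heappush(queue, (ts, seq, name, handler))
        seq += 1
    trace = []
    while queue:
        now, _, name, handler = heapq.heappop(queue)
        trace.append((now, name))
        for ts, new_name, new_handler in handler(now):
            heapq.heappush(queue, (ts, seq, new_name, new_handler))
            seq += 1
    return trace


def on_recv(now):
    # Receiving a message schedules nothing further in this toy model.
    return []


def on_send(now):
    # A send completes at the receiver LATENCY time units later.
    return [(now + LATENCY, "recv", on_recv)]


trace = simulate([(0.0, "send", on_send), (2.0, "send", on_send)])
```

Changing `LATENCY` and re-running is the toy analogue of a latency-sensitivity study like the one the abstract mentions.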


Basic compiler algorithms for parallel programs


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 1-12

Authors: Lee, Jaejin; Padua, David A.; Midkiff, Samuel P. (Univ of Illinois, Urbana, United States)

Traditional compiler techniques developed for sequential programs do not guarantee the correctness (sequential consistency) of compiler transformations when applied to parallel programs. This is because traditional compilers for sequential programs do not account for the updates to a shared variable by different threads. We present a concurrent static single assignment (CSSA) form for parallel programs containing cobegin/coend and parallel do constructs and post/wait synchronization primitives. Based on the CSSA form, we present copy propagation and dead code elimination techniques. Also, a global value numbering technique that detects equivalent variables in parallel programs is presented. By using global value numbering and the CSSA form, we extend classical common subexpression elimination, redundant load/store elimination, and loop invariant detection to parallel programs without violating sequential consistency. These optimization techniques are the most commonly used techniques for sequential programs. By extending these techniques to parallel programs, we can guarantee the correctness of the optimized program and maintain single processor performance in a multiprocessor environment.

Keywords: parallel processing systems
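The hazard described in the abstract above, a sequential optimization becoming incorrect once another thread can update a shared variable, can be shown with a small sketch. It is not the CSSA algorithm from the paper. It simply enumerates every sequentially consistent interleaving of two tiny threads and compares the outcomes before and after a naive copy propagation that rewrites `b = a` into `b = x`. All function and variable names are hypothetical.

```python
from itertools import combinations


def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        slot_set = set(slots)
        i1, i2 = iter(t1), iter(t2)
        yield [next(i1) if k in slot_set else next(i2) for k in range(n)]


def outcomes(t1, t2):
    """Return the set of final (a, b) values over all interleavings."""
    results = set()
    for schedule in interleavings(t1, t2):
        state = {"x": 0, "a": None, "b": None}  # x is the shared variable
        for stmt in schedule:
            stmt(state)
        results.add((state["a"], state["b"]))
    return results


def read_x_into_a(s): s["a"] = s["x"]   # a = x
def copy_a_into_b(s): s["b"] = s["a"]   # b = a
def read_x_into_b(s): s["b"] = s["x"]   # b = x (after naive copy propagation)
def write_x(s): s["x"] = 1              # x = 1, executed by the other thread


original = outcomes([read_x_into_a, copy_a_into_b], [write_x])
propagated = outcomes([read_x_into_a, read_x_into_b], [write_x])
```

In the original program `a` and `b` always agree. After the naive rewrite, the outcome `(0, 1)` appears, which no interleaving of the original program can produce, so the transformation is not safe without accounting for the concurrent write.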


SUIF Explorer: An interactive and interprocedural parallelizer


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 37-48

Authors: Liao, Shih-Wei; Diwan, Amer; Bosch Jr., Robert P.; Ghuloum, Anwar; Lam, Monica S. (Stanford Univ, United States)

The SUIF Explorer is an interactive parallelization tool that is more effective than previous systems in minimizing the number of lines of code that require programmer assistance. First, the interprocedural analyses in the SUIF system are successful in parallelizing many coarse-grain loops, thus minimizing the number of spurious dependences requiring attention. Second, the system uses dynamic execution analyzers to identify those important loops that are likely to be parallelizable. Third, the SUIF Explorer is the first to apply program slicing to aid programmers in interactive parallelization. The system guides the programmers in the parallelization process using a set of sophisticated visualization techniques. This paper demonstrates the effectiveness of the SUIF Explorer with three case studies. The programmer was able to speed up all three programs by examining only a small fraction of the program and privatizing a few variables.

Keywords: parallel processing systems
