Search results - Inner Mongolia University Library

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP, 1999, pp. 96-106

Authors: Scherer, Alex; Lu, Honghui; Gross, Thomas; Zwaenepoel, Willy. ETH Zurich, Zurich, Switzerland

We present a system that allows OpenMP programs to execute on a network of workstations with a variable number of nodes. The ability to adapt to a variable number of nodes allows a program to take advantage of additional nodes that become available after it starts execution, or to gracefully scale down when the number of available nodes is reduced. We demonstrate that the cost of adaptation is modest; the system allows a program to adapt at a moderate rate without much performance loss. Two ideas underlie the efficiency of our design. First, we recognize that OpenMP programs exhibit convenient adaptation points during their execution, points at which the cost of adaptation can be much reduced. Second, by allowing a process a certain grace period before it must leave a node, we ensure that most adaptations can occur at these adaptation points, and thus at low cost. Migration of a process, a much more expensive method for providing adaptivity, is used only as a back-up solution, when the process cannot reach an adaptation point within the grace period. Our implementation consists of an OpenMP pre-processor that generates TreadMarks distributed shared memory (DSM) programs, and a version of TreadMarks modified to adapt to a variable number of nodes. Using a DSM as the underlying substrate facilitates the data (re-)distribution necessary after an adaptation.
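The policy stated in the abstract can be illustrated with a minimal sketch. The names `Decision` and `choose_adaptation`, the parameters, and the time units are hypothetical and do not come from the paper or from TreadMarks. The sketch only encodes the stated rule: a process that must leave a node adapts at the next adaptation point if it can reach one within the grace period, and is migrated otherwise.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str   # "adapt_at_point" (cheap path) or "migrate" (back-up path)
    delay: float  # time until the node is released, in the caller's units


def choose_adaptation(time_to_next_point: float, grace_period: float) -> Decision:
    """Prefer the cheap path when an adaptation point falls within the grace period."""
    if time_to_next_point <= grace_period:
        # The process reaches an adaptation point in time: adapt there.
        return Decision("adapt_at_point", time_to_next_point)
    # No adaptation point within the grace period: fall back to migration.
    return Decision("migrate", grace_period)
```

A longer grace period makes the cheap path more likely, at the price of releasing the node later.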

Keywords: parallel processing systems

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

The proceedings contain 25 papers. Topics discussed include data and task parallelism, irregular applications, coherence protocols, shared memory, compilers, and performance issues.

Keywords: parallel processing systems

Proceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

The proceedings contain 21 papers from the Fifth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP). Topics discussed include data parallel programs; data libraries; data caches; data acces...

Keywords: parallel processing systems

Automatic placement of communications in mesh-partitioning parallelization

6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Hascoet, L. INRIA Sophia-Antipolis, B.P. 93, 06902 Sophia-Antipolis, France

ISBN: (print) 9780897919067

We present a tool for mesh-partitioning parallelization of numerical programs working iteratively on an unstructured mesh. This conventional method splits a mesh into sub-meshes, adding some overlap on the boundaries of the sub-meshes. The program is then run in SPMD mode on a parallel architecture with distributed memory. It is necessary to add calls to communication routines at a few carefully selected locations in the code. The tool presented here uses the data-dependence information to mechanize the placement of these synchronizations. Additionally, we see that there is not a unique solution for placing these synchronizations, and performance depends on this choice.
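The idea of splitting a mesh into sub-meshes with overlap can be sketched on a deliberately simplified case. The example below assumes a one-dimensional mesh of consecutively numbered cells, whereas the paper targets unstructured meshes. The function `partition_with_overlap` and its parameters are hypothetical and are not part of the tool described.

```python
def partition_with_overlap(n_cells, n_parts, overlap=1):
    """Split cells 0..n_cells-1 into n_parts contiguous sub-meshes.

    Each sub-mesh is returned as a half-open range (lo, hi) that is
    widened by `overlap` cells on every interior boundary.
    """
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for p in range(n_parts):
        # Spread the remainder over the first `extra` sub-meshes.
        size = base + (1 if p < extra else 0)
        stop = start + size
        # Widen the owned range by the overlap, clipped to the mesh bounds.
        lo = max(0, start - overlap)
        hi = min(n_cells, stop + overlap)
        ranges.append((lo, hi))
        start = stop
    return ranges
```

The cells that appear in two neighbouring ranges are the ones whose values would have to be exchanged by the communication routines mentioned in the abstract.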

Keywords: parallel processing systems

Space and time efficient execution of parallel irregular computations

6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Fu, C.; Yang, T. Univ of California, Santa Barbara, United States

Solving problems of large sizes is an important goal for parallel machines with multiple CPU and memory resources. In this paper, issues of efficient execution of overhead-sensitive parallel irregular computation under memory constraints are addressed. The irregular parallelism is modeled by task dependence graphs with mixed granularities. The trade-off in achieving both time and space efficiency is investigated. The main difficulty of designing efficient run-time system support is caused by the use of fast communication primitives available on modern parallel architectures. A run-time active memory management scheme and new scheduling techniques are proposed to improve memory utilization while retaining good time efficiency, and a theoretical analysis on correctness and performance is provided. This work is implemented in the context of the RAPID system [5], which provides run-time support for parallelizing irregular code on distributed memory machines, and the effectiveness of the proposed techniques is verified on sparse Cholesky and LU factorization with partial pivoting. The experimental results on Cray-T3D show that solvable problem sizes can be increased substantially under limited memory capacities and that the loss of execution efficiency caused by the extra memory management overhead is reasonable.

Keywords: parallel processing systems

Shared-memory performance profiling

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Xu, Zhichen; Larus, James R.; Miller, Barton P. Univ of Wisconsin, Madison, WI, United States

ISBN: (print) 9780897919067

This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with the Blizzard fine-grain distributed shared memory system. This approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric manner. As a demonstration, Paradyn helped us improve the performance of a new shared-memory application program by a factor of four.

Keywords: parallel processing systems

Parallel breadth-first BDD construction

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Yang, Bwolen; O'Hallaron, David R. Carnegie Mellon Univ, Pittsburgh, United States

With the increasing complexity of protocol and circuit designs, formal verification has become an important research area, and binary decision diagrams (BDDs) have been shown to be a powerful tool in formal verification. This paper presents a parallel algorithm for BDD construction targeted at shared memory multiprocessors and distributed shared memory systems. This algorithm focuses on improving memory access locality through specialized memory managers and partial breadth-first expansion, and on improving processor utilization through dynamic load balancing. The results on a shared memory system show speedups of over two on four processors and speedups of up to four on eight processors. The measured results clearly identify the main source of bottlenecks and point out some interesting directions for further improvements.
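The reported speedups can be read as parallel efficiency, using the standard definition of speedup divided by processor count. This is a small arithmetic sketch. The helper name `parallel_efficiency` is hypothetical, and the inputs are the round bounds quoted in the abstract, not measured data points.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Standard definition: fraction of ideal linear speedup achieved."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Bounds quoted in the abstract: over 2x on 4 processors, up to 4x on 8.
lower_bound_on_four = parallel_efficiency(2.0, 4)
upper_bound_on_eight = parallel_efficiency(4.0, 8)
```

Under this reading, efficiency is above one half on four processors and at most one half on eight.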

Keywords: parallel algorithms

Space-efficient implementation of nested parallelism

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Narlikar, Girija J.; Blelloch, Guy E. CMU Sch of Computer Science, Pittsburgh, United States

Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This paper presents a scheduling algorithm that is provably space-efficient and time-efficient for nested parallel languages. In addition to proving the space and time bounds of the parallel schedule generated by the algorithm, we demonstrate that it is efficient in practice. We have implemented a runtime system that uses our algorithm to schedule parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.

Keywords: parallel processing systems

High performance Fortran for highly irregular problems

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Hu, Y. Charlie; Johnsson, S. Lennart; Teng, Shang-Hua. Harvard Univ, Cambridge, United States

We present a general data parallel formulation for highly irregular problems in High Performance Fortran (HPF). Our formulation consists of (1) a method for linearizing irregular data structures, (2) a data parallel implementation (in HPF) of graph partitioning algorithms applied to the linearized data structure, and (3) techniques for expressing irregular communication and nonuniform computations associated with the elements of linearized data structures. We demonstrate and evaluate our formulation on a parallel, hierarchical N-body method for the evaluation of potentials and forces of nonuniform particle distributions. Our experimental results demonstrate that efficient data parallel (HPF) implementations of highly nonuniform problems are feasible with the proper language/compiler/runtime support. Our data parallel N-body code provides a much needed 'benchmark' code for evaluating and improving HPF compilers.

Keywords: parallel processing systems

Compilation of parallel multimedia computations - extending retiming theory and Amdahl's law

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Prasanna, G. N. Srinivasa. Lucent Technologies, Murray Hill, NJ, United States

ISBN: (print) 9780897919067

Multimedia applications operate on datastreams. A large class of multimedia applications is described by the macro-dataflow graph model. This study examined how such multimedia applications can be compiled to run efficiently on parallel machines, by optimizing both throughput (T) and latency (L), using two techniques based on task speedup functions. The first technique chooses an appropriate pipeline structure for the system, while the second exploits the dataset parallelism intrinsic in the periodic datastream and runs multiple datasets in parallel (task/cluster multiplicity) for each clustering. Both techniques were used to compile real-time image-processing problems on an NCUBE-2 multiprocessor. The two techniques showed substantial performance gains.

Keywords: Program compilers
