Search results - Inner Mongolia University Library

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 1999, pp. 173-182

Authors: Maassen, Jason; van Nieuwpoort, Rob; Veldema, Ronald; Bal, Henri E.; Plaat, Aske (Vrije Universiteit, Amsterdam, Netherlands)

Java offers interesting opportunities for parallel computing. In particular, Java Remote Method Invocation (RMI) provides an unusually flexible kind of Remote Procedure Call (RPC). Unlike RPC, RMI supports polymorphism, which requires the system to be able to download remote classes into a running application. Sun's RMI implementation achieves this kind of flexibility by passing around object type information and processing it at run time, which causes a major run-time overhead. Using Sun's JDK 1.1.4 on a Pentium Pro/Myrinet cluster, for example, the latency for a null RMI (without parameters or a return value) is 1228 μsec, which is about a factor of 40 higher than that of a user-level RPC. In this paper, we study an alternative approach for implementing RMI, based on native compilation. This approach allows for better optimization, eliminates the need for processing of type information at run time, and makes a lightweight communication protocol possible. We have built a Java system based on a native compiler, which supports both compile-time and run-time generation of marshallers. We find that almost all of the run-time overhead of RMI can be pushed to compile time. With this approach, the latency of a null RMI is reduced to 34 μsec, while still supporting polymorphic RMIs (and allowing interoperability with other JVMs).

Keywords: parallel processing systems
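
The idea in the abstract above, moving type processing out of the call path, can be illustrated with a minimal Python sketch. It is not the authors' Java system: `make_marshaller`, the `_FORMATS` table, and the field list are hypothetical names chosen for illustration. The layout of a record is resolved once, ahead of time, so each later call only packs or unpacks bytes.

```python
import struct

# Hypothetical mapping from a field's Python type to a struct format code.
_FORMATS = {int: "q", float: "d"}


def make_marshaller(field_types):
    """Build pack/unpack functions once for a fixed list of (name, type) fields."""
    names = [name for name, _ in field_types]
    # The type information is processed here, once, not on every call.
    layout = struct.Struct("<" + "".join(_FORMATS[t] for _, t in field_types))

    def pack(values):
        """Serialize a dict of field values using the precomputed layout."""
        return layout.pack(*(values[name] for name in names))

    def unpack(payload):
        """Rebuild the dict of field values from the packed bytes."""
        return dict(zip(names, layout.unpack(payload)))

    return pack, unpack
```

A caller would build the pair once per record type, for example `pack, unpack = make_marshaller([("id", int), ("weight", float)])`, and reuse it for every message of that type.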

Space-time memory: A parallel programming abstraction for interactive multimedia applications

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 1999, pp. 183-192

Authors: Ramachandran, Umakishore; Nikhil, Rishiyur S.; Harel, Nissim; Rehg, James M.; Knobe, Kathleen (Georgia Inst of Technology, Atlanta, United States)

Realistic interactive multimedia involving vision, animation, and multimedia collaboration is likely to become an important aspect of future computer applications. The scalable parallelism inherent in such applications, coupled with their computational demands, makes them ideal candidates for SMPs and clusters of SMPs. These applications have novel requirements that offer new kinds of challenges for parallel system design. We have designed a programming system called Stampede that offers many functionalities needed to simplify development of such applications (such as high-level data sharing abstractions, dynamic cluster-wide threads, and multiple address spaces). We have built Stampede and it runs on clusters of SMPs. To date we have implemented two applications on Stampede, one of which is discussed herein. In this paper we describe a part of Stampede called Space-Time Memory (STM). It is a novel data sharing abstraction that enables interactive multimedia applications to manage a collection of time-sequenced data items simply, efficiently, and transparently across a cluster. STM relieves the application programmer from low-level synchronization and data communication by providing a high-level interface that subsumes buffer management, inter-thread synchronization, and location transparency for data produced and accessed anywhere in the cluster. STM also automatically handles garbage collection of data items that will no longer be accessed by any of the application threads. We discuss ease-of-use issues for developing applications using STM, and present preliminary performance results to show that STM's overhead is low.

Keywords: parallel processing systems
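
As a rough single-process illustration of the abstraction described above (time-stamped items, blocking reads, and reclamation once no thread needs an item), the sketch below may help. The class `SpaceTimeChannel` and its `put`, `get`, and `consume` methods are hypothetical names, not Stampede's actual interface, and cluster-wide location transparency is not modelled.

```python
import threading


class SpaceTimeChannel:
    """Toy channel of time-stamped items shared by a fixed set of consumers."""

    def __init__(self, consumers):
        self._consumers = set(consumers)
        self._items = {}      # timestamp -> item
        self._pending = {}    # timestamp -> consumers that still need the item
        self._cond = threading.Condition()

    def put(self, timestamp, item):
        """Publish an item for a timestamp and wake any waiting readers."""
        with self._cond:
            self._items[timestamp] = item
            self._pending[timestamp] = set(self._consumers)
            self._cond.notify_all()

    def get(self, timestamp, timeout=None):
        """Block until the item for `timestamp` exists, then return it."""
        with self._cond:
            found = self._cond.wait_for(lambda: timestamp in self._items, timeout)
            if not found:
                raise TimeoutError(f"no item for timestamp {timestamp}")
            return self._items[timestamp]

    def consume(self, timestamp, consumer):
        """Record that `consumer` is done with the item; reclaim it when unused."""
        with self._cond:
            waiting = self._pending[timestamp]
            waiting.discard(consumer)
            if not waiting:
                del self._items[timestamp]
                del self._pending[timestamp]

    def live_timestamps(self):
        """Return the timestamps whose items have not been reclaimed yet."""
        with self._cond:
            return sorted(self._items)
```

In this sketch an item is reclaimed only after every registered consumer has called `consume` for its timestamp, which stands in for the automatic garbage collection mentioned in the abstract.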

Transparent adaptive parallelism on NOWs using OpenMP

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 1999, pp. 96-106

Authors: Scherer, Alex; Lu, Honghui; Gross, Thomas; Zwaenepoel, Willy (ETH Zurich, Zurich, Switzerland)

We present a system that allows OpenMP programs to execute on a network of workstations with a variable number of nodes. The ability to adapt to a variable number of nodes allows a program to take advantage of additional nodes that become available after it starts execution, or to gracefully scale down when the number of available nodes is reduced. We demonstrate that the cost of adaptation is modest; the system allows a program to adapt at a moderate rate without much performance loss. Two ideas underlie the efficiency of our design. First, we recognize that OpenMP programs exhibit convenient adaptation points during their execution, points at which the cost of adaptation can be much reduced. Second, by allowing a process a certain grace period before it must leave a node, we ensure that most adaptations can occur at these adaptation points, and thus at low cost. Migration of a process, a much more expensive method for providing adaptivity, is used only as a back-up solution, when the process cannot reach an adaptation point within the grace period. Our implementation consists of an OpenMP pre-processor that generates TreadMarks distributed shared memory (DSM) programs, and a version of TreadMarks modified to adapt to a variable number of nodes. Using a DSM as the underlying substrate facilitates the data (re-)distribution necessary after an adaptation.

Keywords: parallel processing systems
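
The notion of an adaptation point can be sketched in Python as a sequential simulation, assuming that the boundary between two parallel loops is such a point. `block_partition`, `run_adaptive`, and the `nodes_at_step` callback are hypothetical names; the grace period, process migration, and the TreadMarks DSM layer from the abstract are not modelled.

```python
def block_partition(n_iterations, n_nodes):
    """Split range(n_iterations) into n_nodes contiguous, nearly equal blocks."""
    base, extra = divmod(n_iterations, n_nodes)
    blocks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def run_adaptive(data, n_steps, nodes_at_step):
    """Run n_steps loops over data, re-partitioning at each loop boundary.

    nodes_at_step(step) is a hypothetical callback that returns how many
    nodes are available before the given step.
    """
    history = []
    for step in range(n_steps):
        n_nodes = nodes_at_step(step)          # adaptation point
        blocks = block_partition(len(data), n_nodes)
        for block in blocks:                   # each block would run on one node
            for i in block:
                data[i] += 1
        history.append([len(b) for b in blocks])
    return history
```

Because the iteration space is re-split only between loops, no node holds partially processed work when the node count changes, which is why adapting at such points is cheap in this toy model.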

Code motion for explicitly parallel programs

Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Jens Knoop; Bernhard Steffen (Fachbereich Informatik, Universität Dortmund, Baroper Straße 301, D-44227 Dortmund, Germany)

ISBN: (print) 9781581131000

In comparison to automatic parallelization, which is thoroughly studied in the literature [31, 33], classical analyses and optimizations of explicitly parallel programs were more or less neglected. This may be due to the fact that naive adaptations of the sequential techniques fail [24], and their straightforward correct ones have unacceptable costs caused by the interleavings, which manifest the possible executions of a parallel program. Recently, however, we showed that unidirectional bitvector analyses can be performed for parallel programs as easily and as efficiently as for sequential ones [17], a necessary condition for the successful transfer of the classical optimizations to the parallel *** this article we focus on possible subsequent code motion transformations, which turn out to require much more care than originally conjectured [17]. Essentially, this is due to the fact that interleaving semantics, although being adequate for correctness considerations, fails when it comes to reasoning about efficiency of parallel programs. This deficiency, however, can be overcome by strengthening the specific treatment of synchronization points.

Keywords: code motion (partial redundancy elimination); interleaving semantics; shared memory; bitvector problems; synchronization; code optimization; data-flow analysis
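
For readers unfamiliar with the term, a unidirectional bitvector analysis can be sketched for the sequential case only. The example below computes available expressions, a standard forward analysis used by code motion, with one bit per expression. It is not the parallel algorithm of the paper, and the function and parameter names are illustrative assumptions.

```python
def available_expressions(nodes, edges, gen, kill, entry, n_exprs):
    """Forward all-paths bitvector analysis; bit i stands for expression i."""
    full = (1 << n_exprs) - 1
    preds = {n: [] for n in nodes}
    for src, dst in edges:
        preds[dst].append(src)
    # Optimistic start: everything is available except at the entry node.
    avail_in = {n: (0 if n == entry else full) for n in nodes}
    avail_out = {n: (avail_in[n] & ~kill[n]) | gen[n] for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n != entry:
                # An expression is available on entry only if it is
                # available at the exit of every predecessor.
                new_in = full
                for p in preds[n]:
                    new_in &= avail_out[p]
                avail_in[n] = new_in
            new_out = (avail_in[n] & ~kill[n]) | gen[n]
            if new_out != avail_out[n]:
                avail_out[n] = new_out
                changed = True
    return avail_in
```

On a diamond-shaped flow graph where one branch computes two expressions and the other computes only the first, only the first expression is reported as available at the join.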

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

The proceedings contain 25 papers. Topics discussed include data and task parallelism, irregular applications, coherence protocols, shared memory, compilers, and performance issues.

Keywords: parallel processing systems

Automatic placement of communications in mesh-partitioning parallelization

6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Hascoet, L (INRIA Sophia-Antipolis, B.P. 93, 06902 Sophia-Antipolis, France)

ISBN: (print) 9780897919067

We present a tool for mesh-partitioning parallelization of numerical programs working iteratively on an unstructured mesh. This conventional method splits a mesh into sub-meshes, adding some overlap on the boundaries of the sub-meshes. The program is then run in SPMD mode on a parallel architecture with distributed memory. It is necessary to add calls to communication routines at a few carefully selected locations in the code. The tool presented here uses the data-dependence information to mechanize the placement of these synchronizations. Additionally, we see that there is not a unique solution for placing these synchronizations, and performance depends on this choice.

Keywords: parallel processing systems
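
The method described above (sub-meshes with overlap, plus a communication step at a chosen place in the iteration) can be sketched on a one-dimensional structured mesh. This is a simplified stand-in for the unstructured meshes and automatic placement in the paper: the overlap refresh is placed by hand before each sweep, and all function names are hypothetical.

```python
def smooth(values):
    """One sweep: each interior value becomes the mean of its two neighbours."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i + 1]) / 2.0
    return out


def split_with_overlap(n, parts):
    """Return (own_lo, own_hi, lo, hi): owned range and range widened by one cell."""
    size = n // parts
    pieces = []
    for p in range(parts):
        own_lo = p * size
        own_hi = n if p == parts - 1 else own_lo + size
        pieces.append((own_lo, own_hi, max(own_lo - 1, 0), min(own_hi + 1, n)))
    return pieces


def partitioned_sweeps(values, parts, sweeps):
    """Run sweeps on overlapping sub-meshes, refreshing overlaps before each one."""
    pieces = split_with_overlap(len(values), parts)
    state = list(values)                  # stands in for the distributed data
    for _ in range(sweeps):
        # Communication step: every sub-mesh receives fresh overlap cells.
        chunks = [state[lo:hi] for _, _, lo, hi in pieces]
        # Computation step: each sub-mesh is swept independently.
        swept = [smooth(chunk) for chunk in chunks]
        # Only owned cells are written back; overlap cells are discarded.
        for (own_lo, own_hi, lo, _), result in zip(pieces, swept):
            state[own_lo:own_hi] = result[own_lo - lo:own_hi - lo]
    return state
```

With the refresh placed before every sweep, the partitioned run reproduces the unpartitioned result. Omitting or moving that step would let a sub-mesh compute with stale overlap values, which is the kind of placement decision the tool mechanizes.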

Space and time efficient execution of parallel irregular computations

6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Fu, C; Yang, T (Univ of California, Santa Barbara, United States)

Solving problems of large sizes is an important goal for parallel machines with multiple CPU and memory resources. In this paper, issues of efficient execution of overhead-sensitive parallel irregular computation under memory constraints are addressed. The irregular parallelism is modeled by task dependence graphs with mixed granularities. The trade-off in achieving both time and space efficiency is investigated. The main difficulty of designing efficient run-time system support is caused by the use of fast communication primitives available on modern parallel architectures. A run-time active memory management scheme and new scheduling techniques are proposed to improve memory utilization while retaining good time efficiency, and a theoretical analysis on correctness and performance is provided. This work is implemented in the context of the RAPID system [5], which provides run-time support for parallelizing irregular code on distributed memory machines, and the effectiveness of the proposed techniques is verified on sparse Cholesky and LU factorization with partial pivoting. The experimental results on Cray-T3D show that solvable problem sizes can be increased substantially under limited memory capacities and that the loss of execution efficiency caused by the extra memory management overhead is reasonable.

Keywords: parallel processing systems

Shared-memory performance profiling

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Xu, Zhichen; Larus, James R.; Miller, Barton P. (Univ of Wisconsin, Madison, WI, United States)

ISBN: (print) 9780897919067

This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with the Blizzard fine-grain distributed shared memory system. This approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric manner. As a demonstration, Paradyn helped us improve the performance of a new shared-memory application program by a factor of four.

Keywords: parallel processing systems

Parallel breadth-first BDD construction

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Yang, Bwolen; O'Hallaron, David R. (Carnegie Mellon Univ, Pittsburgh, United States)

With the increasing complexity of protocol and circuit designs, formal verification has become an important research area, and binary decision diagrams (BDDs) have been shown to be a powerful tool in formal verification. This paper presents a parallel algorithm for BDD construction targeted at shared memory multiprocessors and distributed shared memory systems. This algorithm focuses on improving memory access locality through specialized memory managers and partial breadth-first expansion, and on improving processor utilization through dynamic load balancing. The results on a shared memory system show speedups of over two on four processors and speedups of up to four on eight processors. The measured results clearly identify the main source of bottlenecks and point out some interesting directions for further improvements.

Keywords: parallel algorithms

Space-efficient implementation of nested parallelism

Proceedings of the 1997 6th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Authors: Narlikar, Girija J.; Blelloch, Guy E. (CMU Sch of Computer Science, Pittsburgh, United States)

Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This paper presents a scheduling algorithm that is provably space-efficient and time-efficient for nested parallel languages. In addition to proving the space and time bounds of the parallel schedule generated by the algorithm, we demonstrate that it is efficient in practice. We have implemented a runtime system that uses our algorithm to schedule parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.

Keywords: parallel processing systems
