A central problem in executing performance-critical parallel and distributed applications on shared networks is the selection of computation nodes and communication paths for execution. Automatic selection of nodes is complex, as the best choice depends on the application structure as well as the expected availability of computation and communication resources. This paper presents a solution to this problem for realistic application and network scenarios. A new algorithm to jointly analyze computation and communication resources for different application demands is introduced, and a framework for automatic node selection is developed on top of Remos, a query interface to network information. The paper reports results from a set of applications, including Airshed pollution modeling and magnetic resonance imaging, executing on a high-speed network testbed. The results demonstrate that node selection is effective in enhancing application performance in the presence of computation load as well as network traffic. Under the network conditions used for the experiments, the increase in execution time due to compute loads and network congestion was reduced by half with node selection. The node selection algorithms developed in this research are also applicable to dynamic migration of long-running jobs.
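To make the flavor of such a selection algorithm concrete, here is a minimal sketch in C. It is not the paper's actual algorithm; the `est_time` cost model and all names are illustrative assumptions. It ranks candidate nodes by an estimated completion time combining the two kinds of measurements a Remos-style query interface supplies: compute availability and path bandwidth.

```c
/* Hypothetical sketch of joint node selection, not the paper's actual
 * algorithm. Candidates are ranked by an estimated completion time that
 * combines compute availability and path bandwidth, the two resource
 * measurements a Remos-style query interface would supply. */
#include <stdio.h>
#include <float.h>

typedef struct {
    const char *name;
    double cpu_avail;   /* fraction of one CPU available (0..1]  */
    double bw_mbps;     /* measured bandwidth to the master node */
} Node;

/* Estimated time = compute phase stretched by load + data transfer. */
static double est_time(const Node *n, double work_s, double data_mb) {
    return work_s / n->cpu_avail + data_mb * 8.0 / n->bw_mbps;
}

int main(void) {
    Node candidates[] = {
        { "hostA", 0.90,  95.0 },
        { "hostB", 0.40, 600.0 },   /* fast link, loaded CPU     */
        { "hostC", 0.95,  10.0 },   /* idle CPU, congested link  */
    };
    double work_s = 30.0, data_mb = 100.0;   /* per-node demand  */
    const Node *best = NULL;
    double best_t = DBL_MAX;
    for (int i = 0; i < 3; i++) {
        double t = est_time(&candidates[i], work_s, data_mb);
        printf("%s: %.1f s\n", candidates[i].name, t);
        if (t < best_t) { best_t = t; best = &candidates[i]; }
    }
    printf("selected: %s\n", best->name);
    return 0;
}
```

The point of the joint analysis is visible even in this toy version: neither the fastest link (hostB) nor the idlest CPU (hostC) wins on its own; the choice depends on the application's ratio of computation to communication demand.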
In this paper we describe a new approach to implementing network protocols that enables them to have high performance and high flexibility, while retaining complete conformity to existing application programming interfaces...
MPI is a message-passing standard widely used for developing high-performance parallel applications. Because of the restrictions in the MPI computation model, conventional implementations on shared memory machines map each MPI node to an OS process, which suffers serious performance degradation in the presence of multiprogramming, especially when a space/time sharing policy is employed in OS job scheduling. In this paper, we study compile-time and run-time support for MPI by using threads and demonstrate our optimization techniques for executing a large class of MPI programs written in C. The compile-time transformation adopts thread-specific data structures to eliminate the use of global and static variables in C code. The run-time support includes an efficient point-to-point communication protocol based on a novel lock-free queue management scheme. Our experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, using the proposed techniques is competitive with SGI's native MPI implementation in a dedicated environment, and it has significant performance advantages, with up to a 23-fold improvement, in a multiprogrammed environment.
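The compile-time transformation can be illustrated with a small, self-contained sketch. This is not TMPI's generated code, only a hand-written approximation of the technique the abstract describes: each global variable becomes a per-thread slot, and every use of the former global is routed through a thread-specific lookup, so multiple MPI nodes can safely share one address space as threads.

```c
/* Illustrative sketch (not TMPI's actual output) of eliminating a
 * global variable in an MPI program via thread-specific data, so that
 * MPI "nodes" can run as threads of one process without interfering.
 *
 *   Original:  int iter_count = 0;   ->   one slot per node thread  */
#include <pthread.h>
#include <stdio.h>

#define MAX_NODES 64

static int iter_count_tsd[MAX_NODES];   /* replaces the global        */
static pthread_key_t rank_key;          /* holds this thread's rank   */

static int *iter_count(void) {          /* every former use of the
                                           global goes through here   */
    int rank = (int)(long)pthread_getspecific(rank_key);
    return &iter_count_tsd[rank];
}

static void *mpi_node(void *arg) {
    pthread_setspecific(rank_key, arg); /* rank assigned at spawn     */
    for (int i = 0; i < 5; i++)
        (*iter_count())++;              /* was: iter_count++          */
    printf("node %ld: iter_count = %d\n", (long)arg, *iter_count());
    return NULL;
}

int main(void) {
    pthread_t t[4];
    pthread_key_create(&rank_key, NULL);
    for (long r = 0; r < 4; r++)
        pthread_create(&t[r], NULL, mpi_node, (void *)r);
    for (int r = 0; r < 4; r++)
        pthread_join(t[r], NULL);
    return 0;
}
```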
A design pattern is a mechanism for encapsulating the knowledge of experienced designers into a re-usable artifact. Parallel design patterns reflect commonly occurring parallel communication and synchronization structures. Our tools, CO2P3S (Correct Object-Oriented Pattern-based Parallel Programming System) and MetaCO2P3S, use generative design patterns. A programmer selects the parallel design patterns that are appropriate for an application, and then adapts the patterns for that specific application by selecting from a small set of code-configuration options. CO2P3S then generates a custom framework for the application that includes all of the structural code necessary for the application to run in parallel. The programmer is only required to write simple code that launches the application and to fill in some application-specific sequential hook routines. We use generative design patterns to take an application specification (parallel design patterns + sequential user code) and use it to generate parallel application code that achieves good performance in shared memory and distributed memory environments. Although our implementations are for Java, the approach we describe is tool- and language-independent. This paper describes generalizing CO2P3S to generate distributed-memory parallel solutions.
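The division of labor between generated framework and user code can be sketched as follows. CO2P3S itself generates Java frameworks; this language-neutral C approximation, with all names invented, only illustrates the structure: the framework owns the iteration (and, in a real system, the parallelism), while the user supplies a single sequential hook routine.

```c
/* Language-neutral illustration of the pattern-framework structure:
 * the "generated" side owns all structural code, the user side only
 * fills in a sequential hook. (CO2P3S generates Java frameworks;
 * everything here is invented for illustration.) */
#include <stdio.h>

/* ---- "generated" framework side: iteration and boundaries ---- */
typedef double (*update_hook)(double self, double left, double right);

static void mesh_run(double *row, int n, int steps, update_hook hook) {
    double next[64];
    for (int s = 0; s < steps; s++) {
        for (int i = 1; i < n - 1; i++)  /* a real framework would
                                            partition this loop across
                                            threads or processes     */
            next[i] = hook(row[i], row[i - 1], row[i + 1]);
        for (int i = 1; i < n - 1; i++)
            row[i] = next[i];
    }
}

/* ---- user side: one application-specific sequential hook ---- */
static double average_hook(double self, double left, double right) {
    return (self + left + right) / 3.0;  /* e.g. simple smoothing */
}

int main(void) {
    double row[8] = { 0, 0, 0, 9, 9, 0, 0, 0 };
    mesh_run(row, 8, 10, average_hook);
    for (int i = 0; i < 8; i++) printf("%.2f ", row[i]);
    printf("\n");
    return 0;
}
```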
In an intelligent memory architecture, the main memory of a computer is enhanced with many simple processors. The result is a highly parallel, heterogeneous machine that is able to exploit computation in the main memory. While several instantiations of this architecture have been proposed, the question of how to effectively program them with little effort has remained a major challenge. In this paper, we show how to effectively hand-program an intelligent memory architecture at a high level and with very modest effort. We use FlexRAM as a prototype architecture. To program it, we propose a family of high-level compiler directives inspired by OpenMP, called CFlex. Such directives enable the processors in memory to execute the program in cooperation with the main processor. In addition, we propose libraries of highly-optimized functions called Intelligent Memory Operations (IMOs). These functions program the processors in memory through CFlex, but make them completely transparent to the programmer. Simulation results show that, with CFlex and IMOs, a server with 64 simple processors in memory runs on average 10 times faster than a conventional server. Moreover, conventional programs averaging 240 lines are transformed into CFlex parallel form with only 7 CFlex directives and 2 additional statements on average.
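As a rough illustration of the directive style described above: the abstract does not give the actual CFlex pragma spelling, so the `#pragma cflex` syntax below is an assumption. The idea is that OpenMP-like annotations mark which loops run on the processors in memory and which stay on the main processor.

```c
/* Hypothetical sketch of CFlex-style directives; the real pragma
 * syntax may differ. Annotations steer loops either to the simple
 * processors in memory or to the main (host) processor. */
#include <stddef.h>

#define N 1048576
float a[N], b[N], c[N];

void vector_add(void) {
    /* memory-bandwidth-bound loop: run on the memory processors */
    #pragma cflex parallel on_pmem
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

float sample_host(void) {
    /* small serial tail stays on the main processor */
    float s = 0.0f;
    #pragma cflex on_phost
    for (size_t i = 0; i < N; i += N / 16)
        s += c[i];
    return s;
}
```

An IMO, as the abstract describes it, would wrap such an annotated loop inside a library function, so the directives disappear from user code entirely.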
ISBN (print): 0897915895
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or on overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
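For reference, the model's basic point-to-point costs follow directly from the four parameters; these are the standard LogP expressions:

```latex
% LogP point-to-point costs, using the model's four parameters:
% L = latency, o = per-message send/receive overhead,
% g = minimum gap between consecutive messages, P = processor count.
\begin{align*}
  T_{\text{one message}} &= L + 2o \\
  T_{n\ \text{pipelined messages}} &= (n-1)\,g + L + 2o
     \qquad (\text{assuming } g \ge o)
\end{align*}
```

An algorithm analyzed this way stays portable: plugging in a new machine's measured L, o, g, and P re-derives its cost without changing the algorithm.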
We present a system that allows OpenMP programs to execute on a network of workstations with a variable number of nodes. The ability to adapt to a variable number of nodes allows a program to take advantage of additional nodes that become available after it starts execution, or to gracefully scale down when the number of available nodes is reduced. We demonstrate that the cost of adaptation is modest; the system allows a program to adapt at a moderate rate without much performance loss. Two ideas underlie the efficiency of our design. First, we recognize that OpenMP programs exhibit convenient adaptation points during their execution, points at which the cost of adaptation can be much reduced. Second, by allowing a process a certain grace period before it must leave a node, we ensure that most adaptations can occur at these adaptation points, and thus at low cost. Migration of a process, a much more expensive method for providing adaptivity, is used only as a back-up solution, when the process cannot reach an adaptation point within the grace period. Our implementation consists of an OpenMP pre-processor that generates TreadMarks distributed shared memory (DSM) programs, and a version of TreadMarks modified to adapt to a variable number of nodes. Using a DSM as the underlying substrate facilitates the data (re-)distribution necessary after an adaptation.
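A plausible adaptation point in this setting is the boundary between parallel regions, where shared data is consistent and no loop iteration is in flight. The sketch below uses an invented `maybe_adapt()` hook to mark where such a point would fall; in the actual system the adaptation logic lives inside the modified TreadMarks runtime, not in user code.

```c
/* Hedged sketch: marking the natural adaptation point between OpenMP
 * parallel regions. maybe_adapt() is invented for illustration; the
 * real system adapts inside the modified TreadMarks runtime. */
#include <omp.h>
#include <stdio.h>

#define N 1000000
static double x[N];

/* hypothetical runtime hook: remap work if the node set changed */
static void maybe_adapt(void) {
    /* e.g. check for node join/leave requests, redistribute data */
}

int main(void) {
    for (int step = 0; step < 100; step++) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            x[i] = x[i] * 0.5 + 1.0;

        maybe_adapt();   /* adaptation point: between parallel regions */
    }
    printf("x[0] = %f\n", x[0]);
    return 0;
}
```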