Search results - Inner Mongolia University Library

Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Kunle Olukotun, Stanford

No abstract available.

ISBN: (print) 9781450326568


Automatic node selection for high performance applications on networks


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 163-172

Authors: Subhlok, Jaspal; Lieu, Peter; Lowekamp, Bruce (Univ of Houston, Houston, United States)

A central problem in executing performance-critical parallel and distributed applications on shared networks is the selection of computation nodes and communication paths for execution. Automatic selection of nodes is complex because the best choice depends on the application structure as well as on the expected availability of computation and communication resources. This paper presents a solution to this problem for realistic application and network scenarios. A new algorithm that jointly analyzes computation and communication resources for different application demands is introduced, and a framework for automatic node selection is developed on top of Remos, a query interface to network information. The paper reports results from a set of applications, including Airshed pollution modeling and magnetic resonance imaging, executing on a high-speed network testbed. The results demonstrate that node selection is effective in enhancing application performance in the presence of computation load as well as network traffic. Under the network conditions used for the experiments, node selection halved the increase in execution time caused by compute loads and network congestion. The node selection algorithms developed in this research are also applicable to dynamic migration of long-running jobs.

Keywords: parallel processing systems
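To make the idea of jointly weighing computation and communication concrete, the following Python sketch greedily picks k nodes. It is not the algorithm from the paper and does not use Remos; the function name select_nodes, the input dictionaries, and the scoring rule (a candidate is only as good as the smaller of its free CPU fraction and its worst normalized bandwidth to the nodes already chosen) are hypothetical assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def select_nodes(cpu_avail: Dict[str, float],
                 bandwidth: Dict[Tuple[str, str], float],
                 k: int) -> List[str]:
    """Hypothetical greedy node selection (not the paper's algorithm).

    cpu_avail maps node -> fraction of CPU currently free (0..1).
    bandwidth maps a node pair -> available bandwidth normalized to 0..1.
    Pairs that are missing are treated as unreachable (bandwidth 0).
    """
    if not 1 <= k <= len(cpu_avail):
        raise ValueError("k must be between 1 and the number of nodes")

    def bw(a: str, b: str) -> float:
        # Links are assumed symmetric, so look the pair up in either order.
        return bandwidth.get((a, b), bandwidth.get((b, a), 0.0))

    # Seed the set with the node that has the most free CPU.
    chosen = [max(cpu_avail, key=cpu_avail.get)]
    while len(chosen) < k:
        def score(node: str) -> float:
            # A candidate is limited by its own free CPU and by its
            # slowest link to any node already in the set.
            worst_link = min(bw(node, c) for c in chosen)
            return min(cpu_avail[node], worst_link)

        candidates = [n for n in cpu_avail if n not in chosen]
        chosen.append(max(candidates, key=score))
    return chosen
```

With cpu = {"a": 1.0, "b": 0.9, "c": 0.5, "d": 0.95} and a congested a-b link (0.2, all other links 0.8 or higher), select_nodes(cpu, links, 3) yields ["a", "d", "c"]: node "b" has plenty of free CPU but is penalized for its slow link to "a", which is the kind of trade-off a compute-only or network-only selector would miss.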


Evaluation of predicated array data-flow analysis for automatic parallelization


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 84-95

Authors: Moon, Sungdo; Hall, Mary W. (Univ of Southern California, Marina del Rey, United States)

This paper presents an evaluation of a new analysis for parallelizing compilers called predicated array data-flow analysis. This analysis extends array data-flow analysis for parallelization and privatization by associating predicates with data-flow values. These predicates can be used to derive conditions under which dependences can be eliminated or privatization is possible. These conditions can be used both to enhance compile-time analysis and to introduce run-time tests that guard safe execution of a parallelized version of a computation. Compared to previous work that combines predicates with array data-flow analysis, our approach is distinguished by two features: (1) it derives low-cost run-time parallelization tests; and (2) it incorporates predicate embedding and predicate extraction, which translate between the domain of predicates and data-flow values to derive more precise analysis results. We present extensive experimental results across three benchmark suites and one additional program, demonstrating that predicated array data-flow analysis parallelizes more than 40% of the remaining inherently parallel loops left unparallelized by the SUIF compiler and that it yields improved speedups for 5 programs.

Keywords: parallel processing systems
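The central idea here, a predicate that holds exactly when a loop carries no cross-iteration dependence and that is checked at run time to guard a parallel version, can be illustrated on a hypothetical loop `for i in range(n): a[i + k] = a[i] + 1`. The predicate `k == 0 or k >= n` (assuming k >= 0) was derived by hand for this example; it is not output from SUIF or from the paper's analysis, and the thread pool merely stands in for parallel execution (Python threads will not give a real speedup here).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def shift_add(a: List[int], k: int, n: int) -> str:
    """Run `for i in range(n): a[i + k] = a[i] + 1`, in parallel when safe.

    Hypothetical example loop; assumes k >= 0 and len(a) >= n + k.
    """
    if k < 0 or len(a) < n + k:
        raise ValueError("example assumes k >= 0 and len(a) >= n + k")

    def body(i: int) -> None:
        a[i + k] = a[i] + 1

    # Run-time test: iteration i writes a[i + k], and iteration i + k would
    # later read it. That cross-iteration dependence exists only when
    # 0 < k < n, so the loop is safe to run in parallel when k == 0 or k >= n.
    if k == 0 or k >= n:
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(body, range(n)))  # wait for every iteration
        return "parallel"

    # Predicate is false: fall back to the original sequential loop.
    for i in range(n):
        body(i)
    return "sequential"
```

The guard costs a couple of integer comparisons, which mirrors the abstract's emphasis on low-cost run-time tests: when k is only known at run time, a purely compile-time analysis would have to assume the dependence and leave the loop sequential.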


StackThreads/MP: Integrating futures into calling standards


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 60-71

Authors: Taura, Kenjiro; Tabata, Kunio; Yonezawa, Akinori (Univ of Tokyo, Tokyo, Japan)

An implementation scheme for fine-grain multithreading is described that needs no changes to current calling standards for sequential languages and only modest extensions to sequential compilers. Like previous similar systems, it performs an asynchronous call as if it were an ordinary procedure call, and detaches the callee from the caller when the callee suspends or either of them migrates to another processor. Unlike previous similar systems, it detaches and connects arbitrary frames generated by off-the-shelf sequential compilers obeying calling standards. As a consequence, it requires neither a front-end preprocessor nor a native code generator with a built-in notion of parallelism. In practice, the system works with the unmodified GNU C Compiler (GCC). Desirable extensions to sequential compilers for guaranteeing the portability and correctness of the scheme are clarified and argued to be modest. Experiments indicate that sequential performance is not sacrificed for practical applications and that both sequential and parallel performance are comparable to Cilk, whose current implementation requires a fairly sophisticated preprocessor to C. These results show that efficient asynchronous calls (a.k.a. future calls) can be integrated into current calling standards with a very small impact on both sequential performance and compiler engineering.

Keywords: parallel processing systems


Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 1999, pp. 107-118

Authors: Tang, Hong; Shen, Kai; Yang, Tao (Univ of California, Santa Barbara, CA, United States)

MPI is a message-passing standard widely used for developing high-performance parallel applications. Because of restrictions in the MPI computation model, conventional implementations on shared memory machines map each MPI node to an OS process, which suffers serious performance degradation in the presence of multiprogramming, especially when a space/time sharing policy is employed in OS job scheduling. In this paper, we study compile-time and run-time support for MPI using threads and demonstrate our optimization techniques for executing a large class of MPI programs written in C. The compile-time transformation adopts thread-specific data structures to eliminate the use of global and static variables in C code. The run-time support includes an efficient point-to-point communication protocol based on a novel lock-free queue management scheme. Our experiments on an SGI Origin 2000 show that our MPI prototype, called TMPI, which uses the proposed techniques, is competitive with SGI's native MPI implementation in a dedicated environment and has significant performance advantages, with up to a 23-fold improvement, in a multiprogrammed environment.

Keywords: parallel processing systems
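The compile-time transformation described in this abstract, replacing global and static variables with thread-specific data so that several MPI ranks can run as threads in one address space, can be sketched by analogy with Python's threading.local. This is only an analogy for the C-level rewriting TMPI performs; the names next_message_id and run_ranks and the per-rank counter are hypothetical.

```python
import threading
from typing import Dict, List

# In the original C program this would be something like a `static int`
# counter. Once each MPI rank becomes a thread, a plain global would be
# shared by all ranks, so it is moved into per-thread storage instead.
_rank_state = threading.local()


def next_message_id() -> int:
    """Return the next message id for the calling rank (thread)."""
    if not hasattr(_rank_state, "counter"):
        _rank_state.counter = 0  # each thread lazily gets its own copy
    _rank_state.counter += 1
    return _rank_state.counter


def run_ranks(num_ranks: int, calls_per_rank: int) -> Dict[int, List[int]]:
    """Run each hypothetical 'rank' as a thread and collect its ids."""
    results: Dict[int, List[int]] = {}
    lock = threading.Lock()

    def rank_main(rank: int) -> None:
        ids = [next_message_id() for _ in range(calls_per_rank)]
        with lock:  # protect the shared results dictionary
            results[rank] = ids

    threads = [threading.Thread(target=rank_main, args=(r,))
               for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

run_ranks(4, 3) gives every rank its own sequence [1, 2, 3]. Had the counter stayed an ordinary module-level global, the four ranks would have drawn interleaved ids from a single shared sequence, which is the semantic change the transformation must prevent.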


Session details: parallel algorithms 08


Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Greg Bronevetsky, Lawrence Livermore National Laboratory

No abstract available.

ISBN: (print) 9781595937957


Performance without pain = productivity: data layout and collective communication in UPC


Authors: Nishtala, Rajesh; Almási, George; Caşcaval, Călin (Computer Science Division, University of California at Berkeley, Berkeley, CA, United States; IBM T.J. Watson Research Center, Yorktown Heights, NY, United States)

ISBN: (print) 9781595939609

The next generations of supercomputers are projected to have hundreds of thousands of processors. However, as the number of processors grows, the scalability of applications will be the dominant challenge. This forces us to reexamine some of the fundamental ways in which we approach the design and use of parallel languages and runtime systems. In this paper we show how the globally shared arrays in a popular Partitioned Global Address Space (PGAS) language, Unified Parallel C (UPC), can be combined with a new collective interface to improve both performance and scalability. This interface allows subsets, or teams, of threads to perform a collective together. As opposed to MPI's communicators, our interface allows sets of threads to be placed in teams instantly rather than by explicitly constructing communicators, thus allowing more dynamic team construction and manipulation. We motivate our ideas with three application kernels: dense matrix multiplication, dense Cholesky factorization, and multidimensional Fourier transforms. We describe how these three applications can be written succinctly in UPC, thereby aiding productivity. We also show how such an interface allows for scalability by running on up to 16,384 processors on BlueGene/L. In a few lines of UPC code, we wrote a dense matrix multiply routine that achieves 28.8 TFlop/s and a 3D FFT that achieves 2.1 TFlop/s. We analyze our performance results through models and show that the machine resources, rather than the interfaces themselves, limit performance. Copyright © 2008 ACM.

Keywords: parallel programming


Session details: parallel applications 07


Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: P. Sadayappan, Ohio State University

No abstract available.

ISBN: (print) 9781595936028


Session details: programming model extensions 08


Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Author: Lauren Smith, U.S. Department of Defense

No abstract available.

ISBN: (print) 9781595937957


Exploiting task-level concurrency in a programmable network interface


ACM SIGPLAN Notices, 2003, Vol. 38, No. 10, pp. 61-72

Authors: Kim, HY; Pai, VS; Rixner, S (Rice Univ, Houston, TX 77251, USA)

Programmable network interfaces offer the potential to extend the functionality of network services but incur instruction-processing overheads compared to application-specific network interfaces. This paper aims to offset those performance disadvantages by exploiting task-level concurrency in the workload to parallelize the network interface firmware for a programmable controller with two processors. By carefully partitioning the handler procedures that process the various events related to the progress of a packet, the system can minimize sharing, achieve load balance, and efficiently utilize on-chip storage. Compared to the uniprocessor firmware released by the manufacturer, the parallelized network interface firmware increases throughput by 65% for bidirectional UDP traffic of maximum-sized packets, 157% for bidirectional UDP traffic of minimum-sized packets, and 32-107% for real network services. This parallelization results in performance within 10-20% of a modern ASIC-based network interface for real network services.

Keywords: experimentation, performance, programmable network interface, parallel programming, Ethernet, firmware
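The load-balancing part of partitioning handler procedures across the controller's two processors can be illustrated with a longest-processing-time-first (LPT) greedy assignment. This is a generic scheduling heuristic swapped in for illustration, not the partitioning used in the paper; it ignores the data-sharing and on-chip storage concerns the abstract also mentions, and the handler names and cycle costs below are invented.

```python
from typing import Dict, List, Tuple


def partition_handlers(costs: Dict[str, int],
                       num_procs: int = 2) -> Tuple[List[List[str]], List[int]]:
    """Assign each event handler to a processor using the LPT heuristic.

    costs maps a (hypothetical) handler name -> estimated cycles per packet.
    Returns the handler list for each processor and each processor's load.
    """
    loads = [0] * num_procs
    assignment: List[List[str]] = [[] for _ in range(num_procs)]
    # Place the most expensive handlers first; break ties by name so the
    # result is deterministic.
    for name, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Give the handler to the currently least-loaded processor.
        target = min(range(num_procs), key=lambda p: loads[p])
        assignment[target].append(name)
        loads[target] += cost
    return assignment, loads
```

For costs {"send_dma": 40, "recv_dma": 35, "send_frame": 30, "recv_frame": 25, "mailbox": 10}, the sketch produces loads of 75 and 65 cycles. A real firmware partition would also weigh which handlers touch the same packet state, since splitting them across processors adds synchronization that a pure cost balance does not see.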
