Conference proceedings front matter may contain advertisements, welcome messages, committee or program information, and other miscellaneous conference information. In some cases it may also include the cover art, table of contents, copyright statements, title page or half-title pages, blank pages, venue maps, or other general information relating to the conference that was part of the original conference proceedings.
ISBN: (print) 0769521517
This paper describes an approach for automatically generating optimized parallel code from serial Fortran programs annotated with high-level directives. A preprocessor analyzes both the program and the directives and generates efficient parallel Fortran code that runs on a number of parallel architectures, such as clusters or SMPs. The unique aspect of this approach is that the directives and optimizations can be customized and extended by the expert programmers who would be using them in their applications. This approach enables the creation of parallel extensions to Fortran that are specific to individual applications or science domains.
ISBN: (print) 0769521517
The CCA is a component architecture for high-performance scientific applications. In this architecture, components are parallel entities that are connected directly or in a distributed manner. The problem of communication between scientific parallel programs with differing numbers of processes is called the "MxN problem". This paper discusses problems and solutions regarding the MxN problem in the context of the CCA. We also present a prototype implementation of a distributed CCA framework with MxN capabilities. This implementation reuses many MPI concepts and constructions to build the parallel-remote port invocation mechanism. Leveraging MPI helps developers who are familiar with that communication library and benefits from its performance and high degree of scalability.
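As a rough illustration of what the MxN problem involves, the sketch below computes which index ranges each of M sending processes would have to pass to each of N receiving processes. It is not the paper's mechanism: it assumes a one-dimensional array with a simple block distribution on both sides, and the names `block_bounds` and `mxn_schedule` are hypothetical.

```python
def block_bounds(n_items, n_procs, rank):
    """Half-open range [start, stop) owned by `rank` under a block distribution.

    Assumption: the first (n_items % n_procs) ranks hold one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def mxn_schedule(n_items, m, n):
    """List the (sender, receiver, start, stop) transfers from m to n processes."""
    transfers = []
    for sender in range(m):
        s_start, s_stop = block_bounds(n_items, m, sender)
        for receiver in range(n):
            r_start, r_stop = block_bounds(n_items, n, receiver)
            # A transfer is needed only where the two owned ranges overlap.
            start, stop = max(s_start, r_start), min(s_stop, r_stop)
            if start < stop:
                transfers.append((sender, receiver, start, stop))
    return transfers


if __name__ == "__main__":
    for transfer in mxn_schedule(n_items=12, m=3, n=2):
        print(transfer)
```

With 12 items moving from 3 processes to 2, the middle sender has to split its block between both receivers, which is the kind of bookkeeping an MxN-capable framework would need to hide from the application.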
ISBN: (print) 0769521517
This paper introduces a new, high-level parallel programming construct called MultiLoop that is designed to extend existing imperative languages such as C and Java. A MultiLoop statement translates to an SPMD specification of a named group of synchronous-iterative processes. For efficient iterative communication, MultiLoop provides a new publish/subscribe model of shared variable access. Under this model, the sequential consistency of shared memory is maintained by a new, simple, and efficient adaptation of the virtual time paradigm. Virtual time is a localised message tagging and queuing procedure that provides a highly efficient alternative to barrier calls. ML-C, a prototype implementation based on C, has been developed. We describe the programming model, discuss its implementation, and present some empirical data showing good performance obtained for an example of the target class of applications.
ISBN: (print) 0769521517
Parallel/distributed application development is a very difficult task for non-expert programmers, and therefore support tools are needed for all phases of the development cycle of this kind of application. This means that developing applications using predefined programming structures (frameworks) should be easier than doing it from scratch. We propose to take advantage of the knowledge about the structure of the application in order to develop a dynamic and automatic tuning tool. In this sense, we have designed POETRIES, which is a dynamic performance tuning tool based on the idea that a performance model could be associated with the high-level structure of the application. This way, the tool could efficiently make better tuning decisions. Specifically, we focus this work on the definition of the performance model associated with applications developed with the Master-Worker framework.
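To make the idea of a structure-specific performance model concrete, here is a toy sketch of how a model tied to the Master-Worker structure could drive a tuning decision such as the number of workers. It is not the POETRIES model, which the abstract does not specify. The cost formula is an assumption made purely for illustration (work proceeds in rounds, the master sends one task to each worker serially, then all workers compute), and the names `predicted_time` and `best_worker_count` are hypothetical.

```python
import math


def predicted_time(n_tasks, task_time, send_time, n_workers):
    """Toy cost model (an assumption, not taken from the paper).

    Each round the master sends one task to every worker in turn
    (n_workers * send_time), then the workers compute in parallel (task_time).
    """
    rounds = math.ceil(n_tasks / n_workers)
    return rounds * (n_workers * send_time + task_time)


def best_worker_count(n_tasks, task_time, send_time, max_workers):
    """Pick the worker count with the lowest predicted time under the toy model."""
    candidates = range(1, max_workers + 1)
    return min(candidates, key=lambda w: predicted_time(n_tasks, task_time, send_time, w))


if __name__ == "__main__":
    workers = best_worker_count(n_tasks=100, task_time=10, send_time=1, max_workers=32)
    print(workers, predicted_time(100, 10, 1, workers))
```

Even this crude model shows why using every available processor is not automatically the best choice: beyond some point the master's serial sending cost outweighs the extra parallelism.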
ISBN: (print) 0769521517
In the era of future embedded systems, the designer is confronted with multi-processor systems for both performance and energy reasons. Exploiting (sub)task-level parallelism is becoming crucial because instruction-level parallelism alone is insufficient. The challenge is to build compiler tools that support the exploration of task-level parallelism in programs. To achieve this goal, we have designed an analysis framework to evaluate the potential parallelism of sequential object-oriented programs. Parallel-performance and data-access analysis are the crucial techniques for estimating the transformation effects. We have implemented support for platform-independent data-access analysis and profiling of Java programs, which is an extension to our earlier parallel-performance analysis framework. The toolkit comprises automated design-time analysis for performance and data-access characterisation, program instrumentation, program-profiling support, and post-processing analysis. We demonstrate the usability of our approach on a number of realistic Java applications.
ISBN: (print) 0769521517
The pipeline is a simple and intuitive structure to speed up many problems. Novice parallel programmers are usually taught this structure early on. However, expert parallel programmers typically eschew using the pipeline in coarse-grained applications because it has three serious problems that make it difficult to implement efficiently. First, processors are idle when the pipeline is not full. Second, load balancing is crucial to obtaining good speedup. Third, it is difficult to incrementally incorporate more processors into an existing pipeline. Instead, experts recast the problem as a master/slave structure, which does not suffer from these problems. This paper details a transformation that allows programs written in a pipeline style to execute using the master/slave structure. Parallel programmers can benefit from both the intuitive simplicity of the pipeline and the efficient execution of a master/slave structure. This is demonstrated by performance results from two applications.
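The first two pipeline problems can be quantified with a short calculation. The sketch below is not from the paper. It assumes an idealised pipeline with one processor per stage, fixed per-stage times, no communication cost, and items that pass through the stages in order, so the total time is the time to push the first item through every stage plus one slowest-stage interval for each remaining item. The function names are hypothetical.

```python
def pipeline_time(stage_times, n_items):
    """Completion time of an idealised pipeline with one processor per stage.

    The first item takes sum(stage_times) to fill the pipeline; after that,
    one item completes per max(stage_times), since the slowest stage sets the pace.
    """
    return sum(stage_times) + (n_items - 1) * max(stage_times)


def pipeline_speedup(stage_times, n_items):
    """Speedup over running every stage of every item on a single processor."""
    serial_time = n_items * sum(stage_times)
    return serial_time / pipeline_time(stage_times, n_items)


if __name__ == "__main__":
    balanced = [1, 1, 1, 1]    # four equal stages
    unbalanced = [1, 1, 1, 3]  # one slow stage
    print(pipeline_speedup(balanced, 10))    # limited by fill and drain time
    print(pipeline_speedup(unbalanced, 10))  # limited further by load imbalance
```

Under these assumptions, four balanced stages processing 10 items finish in 13 time units instead of the ideal 10, and a single slow stage stretches the run to 33 time units while the other three processors wait on it.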