ISBN: (print) 9781538620878
This paper presents an extension of a library for the Coq interactive theorem prover that enables the development of correct functional parallel programs based on sequential program transformation and automatic parallelization using an algorithmic skeleton named accumulate. Such an algorithmic skeleton is a pattern of a parallel algorithm, provided as a higher-order function that is implemented in parallel. The use of this framework is illustrated with the bracket matching problem, including experiments on a parallel machine.
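To make concrete why bracket matching suits skeleton-based parallelization, the following is a minimal Python sketch and not the paper's Coq development or its accumulate skeleton. It assumes a single bracket type and uses a plain reduction with an associative combining operator; the names `summarize`, `combine`, `summarize_chunk`, and `is_balanced` are hypothetical.

```python
from functools import reduce


def summarize(ch):
    """Map one character to (unmatched closing, unmatched opening) counts."""
    if ch == "(":
        return (0, 1)
    if ch == ")":
        return (1, 0)
    return (0, 0)


def combine(left, right):
    """Associative operator: opens left over on the left cancel closes on the right."""
    left_close, left_open = left
    right_close, right_open = right
    matched = min(left_open, right_close)
    return (left_close + right_close - matched,
            left_open + right_open - matched)


def summarize_chunk(text):
    """Sequentially reduce one chunk to its summary pair."""
    return reduce(combine, map(summarize, text), (0, 0))


def is_balanced(text, chunks=4):
    """Split the input, summarize each chunk independently, then combine."""
    size = max(1, -(-len(text) // chunks))  # ceiling division
    parts = [text[i:i + size] for i in range(0, len(text), size)]
    # Each chunk summary is independent, so this step could run in parallel.
    partials = [summarize_chunk(part) for part in parts]
    return reduce(combine, partials, (0, 0)) == (0, 0)
```

Because `combine` is associative, the chunk summaries can be computed on separate processors and merged in any grouping, which is the property a parallel skeleton relies on.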
Although the field of datacenter computing is arguably still in its relative infancy, a sizable body of work from both academia and industry is already available and some consistent technological trends have begun to emerge. This special issue presents a small sample of the work underway by researchers and professionals in this new field. The selection of articles presented reflects the key role that hardware-software codesign plays in the development of effective datacenter-scale computer systems.
Large-scale parallel codes require the data to be decomposed between the set of processes active in the computation. This data decomposition implies recurring communication schemes. The paper introduces generic template classes in C++ for describing the data decomposition. The aim is to store the data in arbitrary existing efficient sequential data structures. Each entry in the sequential data structure corresponds to an entry in the virtual global view of the container. Once the decomposition is set up, the needed communication schemes can be created automatically and can be used to communicate values from containers of various types. Even containers with a varying number of values associated with an entry are possible. The framework abstracts the decomposition information and the communication in the client code from the eventual choice of parallel paradigm. A prototype based on the Message Passing Interface (MPI) standard is presented. It relieves the user from specifying information that is already known at compile time.
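The idea that communication schemes follow mechanically from the decomposition can be sketched as below. This is an illustration in Python rather than the paper's C++ template classes or its MPI prototype; the block distribution and the names `block_owner` and `build_schedule` are assumptions made for the example.

```python
def block_owner(global_index, n_global, n_procs):
    """Return the rank owning a global index under a block distribution.

    Assumption: the first (n_global % n_procs) ranks own one extra entry.
    """
    base, extra = divmod(n_global, n_procs)
    boundary = extra * (base + 1)  # first index owned by a "small" block
    if global_index < boundary:
        return global_index // (base + 1)
    return extra + (global_index - boundary) // base


def build_schedule(needed, n_global, n_procs):
    """Derive send/receive lists from the global indices each rank reads.

    needed[rank] is an iterable of global indices required by that rank.
    Returns (send, recv), where send[owner][rank] and recv[rank][owner]
    list the global indices to transfer.
    """
    send = {rank: {} for rank in range(n_procs)}
    recv = {rank: {} for rank in range(n_procs)}
    for rank, indices in needed.items():
        for g in sorted(set(indices)):
            owner = block_owner(g, n_global, n_procs)
            if owner != rank:  # locally owned entries need no communication
                recv[rank].setdefault(owner, []).append(g)
                send[owner].setdefault(rank, []).append(g)
    return send, recv
```

The schedule depends only on index ownership, not on the element type, which is why one schedule can be reused for containers of various types.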
We propose a novel execution model for the implicitly parallel execution of data-parallel programs in the presence of general I/O operations. This model is called hybrid because it combines the advantages of the standard execution models fork/join and SPMD. Based on program analysis, the hybrid model adapts itself to one or the other at the granularity of individual instructions. We outline compilation techniques that systematically derive the organization of parallel code from data flow characteristics, aiming at the reduction of execution mode switches in general and synchronization/communication requirements in particular. Experiments based on a prototype implementation show the effectiveness of the hybrid execution model for reducing parallel overhead.
We give an in-depth introduction to the design of our functional array programming language SAC, the main aspects of its compilation into host machine code, and its parallelisation based on multi-threading. The language design of SAC aims at combining high-level, compositional array programming with fully automatic resource management for highly productive code development and maintenance. We outline the compilation process that maps SAC programs to computing machinery. Here, our focus is on optimisation techniques that aim at restructuring entire applications from nested compositions of general fine-grained operations into specialised coarse-grained operations. We present our implicit parallelisation technology for shared memory architectures based on multi-threading and discuss further optimisation opportunities on this level of code generation. Both optimisation and parallelisation rigorously exploit the absence of side-effects and the explicit data flow characteristic of a functional setting.
Fortran 90 provides a rich set of array intrinsic functions that are useful for representing array expressions and data parallel programming. However, the application of these intrinsic functions to sparse data sets in distributed memory environments is currently not supported by vendors of Fortran 90 and HPF compilers. Our recent research work has been aimed at providing parallel processing support for sparse array intrinsics of Fortran 90. Our supporting library uses the following two-level design: (1) in our low-level routines, a sparse input matrix needs to be specified with compression/distribution schemes by programmers, and (2) in the high-level representation, sparse array functions are overloaded for array intrinsic interfaces so that programmers need not be concerned about low-level details. This raises a very interesting optimization problem in the strategies used to transform high-level representations to low-level routines by the automatic selection of distribution and compression schemes for sparse data sets. In this paper, we propose solutions to address this optimization problem, which is shown to be NP-hard. We develop a heuristic algorithm based on annotated program graphs. To the best of our knowledge, our selection scheme is the first to automatically select compression and distribution schemes for sparse data arrays with the array intrinsics of Fortran 90 in a distributed memory environment. Experimental results show that our selection algorithms are consistent with our cost model, and effective in selecting appropriate compression and distribution schemes for improving the performance of application programs that operate on sparse data sets. Our experiments were performed on an IBM SP-2 machine using our parallel sparse array intrinsics for Fortran 90. (C) 2004 Published by Elsevier B.V.
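As a rough illustration of what a compression scheme for a sparse matrix looks like, the sketch below builds a compressed row storage (CRS) representation and evaluates a row-wise sum on it, analogous in spirit to Fortran 90's `SUM(A, DIM=2)`. It is written in Python, is not the library described in the abstract, and ignores distribution entirely; `to_crs` and `row_sums` are hypothetical names, and CRS is only one of several possible schemes.

```python
def to_crs(dense):
    """Compress a dense row-major matrix into CRS form.

    Returns (values, col_idx, row_ptr): the nonzero values, their column
    indices, and for each row the offset of its first stored value.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # marks the end of the current row
    return values, col_idx, row_ptr


def row_sums(crs):
    """Sum each row using only the stored nonzeros."""
    values, _col_idx, row_ptr = crs
    return [sum(values[row_ptr[i]:row_ptr[i + 1]])
            for i in range(len(row_ptr) - 1)]
```

A row-oriented scheme makes row-wise reductions cheap, whereas a column-wise reduction over the same storage must scatter through `col_idx`. This illustrates why the cost of an operation can depend on the scheme chosen for its operands.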
We developed a theory in order to address crucial questions of program design methodology. This theory deals with data locality, which is a main issue in parallel programming. In this article, we regard this theory and its model as a minimum semantic domain for data-parallel languages. The introduction of a semantic domain is justified because the classical data-parallel languages (HPF and C*) have different intuitive semantics: they use different concepts to express data locality, namely alignment in HPF and shape in C*. Consequently, these two languages each define their own balance between compiler and programmer investments in order to reach program efficiency. We present our theory as a foundation for defining a better balance. (C) 2003 Elsevier B.V. All rights reserved.
High Performance Fortran (HPF) offers an attractive high-level language interface for programming scalable parallel architectures, providing the user with directives for the specification of data distribution and delegating to the compiler the task of generating an explicitly parallel program. Available HPF compilers can handle regular codes quite efficiently, but dramatic performance losses may be encountered for applications that are based on highly irregular, dynamically changing data structures and access patterns. In this paper we introduce the Vienna Fortran Compiler (VFC), a new source-to-source parallelization system for HPF+, an optimized version of HPF, which addresses the requirements of irregular applications. In addition to extended data distribution and work distribution mechanisms, HPF+ provides the user with language features for specifying certain information that decisively influences a program's performance. This comprises data locality assertions, non-local access specifications and the possibility of reusing runtime-generated communication schedules of irregular loops. Performance measurements of kernels from advanced applications demonstrate that with a high-level data-parallel language such as HPF+, a performance close to that of hand-written message-passing programs can be achieved even for highly irregular codes.
In this paper we present the strong points and limitations of semi-automatic parallelization, data parallel programming and message passing programming. We apply these methods to two numerical algorithms, namely a two-dimensional Fourier transform algorithm and a conjugate gradient program. We implemented both programs with each of the methods on a Cray T3D. The results of these experiments support our proposition that, when the three methods are combined, efficiency, portability and ease of parallel programming may be achieved.