ISBN (Print): 0769512585
UPC, or Unified Parallel C, is a parallel extension of ANSI C. UPC is developed around the distributed shared-memory programming model, with constructs that allow programmers to exploit memory locality by placing data close to the threads that manipulate it, in order to minimize remote accesses. Under the UPC memory sharing model, each thread owns a private memory and has a logical association (affinity) with a partition of the shared memory. This paper discusses an early release of UPC_Bench, a benchmark designed to reveal UPC compiler performance weaknesses and uncover opportunities for compiler optimizations. The experimental results from UPC_Bench on the Compaq AlphaServer SC show that UPC_Bench is capable of discovering such compiler performance problems. Further, they show that if such performance pitfalls are avoided through compiler optimizations, distributed shared-memory programming paradigms can deliver high performance while retaining ease of programming.
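To make the affinity notion concrete, here is a minimal plain-C sketch of UPC's default cyclic layout, under which element i of a shared array has affinity to thread i % THREADS. The THREADS constant and the threadof() helper only model the real UPC constructs (THREADS, upc_threadof) so the example compiles with any C compiler; it illustrates the local-versus-remote cost model UPC_Bench probes, and is not actual UPC code.

```c
#include <stdio.h>

#define THREADS 4   /* stand-in for the UPC THREADS constant */
#define N 16

/* Affinity of element i under UPC's default (cyclic) layout. */
static int threadof(int i) { return i % THREADS; }

int main(void) {
    /* Count, for thread 0, how many accesses to a[0..N-1] are local.
     * In UPC, accesses where threadof(i) != MYTHREAD become remote
     * reads, which is exactly the cost a locality-aware program and
     * an optimizing compiler try to avoid. */
    int mythread = 0, local = 0, remote = 0;
    for (int i = 0; i < N; i++) {
        if (threadof(i) == mythread) local++; else remote++;
    }
    printf("thread %d: %d local, %d remote accesses\n",
           mythread, local, remote);
    return 0;
}
```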
ISBN (Print): 354041729X
Our proposal has the following key features: 1) The separation of a distributed program into a pure algorithm (PurAl) and a distribution/communication declaration (DUAL). This yields flexible programs capable of handling different kinds of data/program distribution with no change to the pure algorithm. 2) Implicit or automatic handling of communication via externally mapped variables and generalizations of assignment and reference to these variables. This provides a unified, device-independent view and processing of internal data and external distributed data at the user programming language level. 3) Programs need only know of the direct bindings with distributed correspondents (mailbox driver, file manager, remote task, window manager, etc.). This avoids the need for a central description of all the interconnections. The main short-range benefit of this proposal is to facilitate parallel computation; parallel programming is a fundamental challenge in computer science today. Improving these techniques will simplify programming, eliminate communication statements, and unify the various forms of communication through an implicit method of data transfer, which is becoming essential with the proliferation of distributed networked environments. We present two experiments on the separation between PurAl and DUAL, using a preprocessor or an object-type library. This new approach might be of interest to both academic and industrial researchers.
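As a rough illustration of features 1 and 2, the following C sketch models an externally mapped variable as a pair of get/put operations, so the pure algorithm (PurAl) is written once and the distribution/communication binding (DUAL) is supplied separately. All identifiers are illustrative; the paper's own experiments used a preprocessor and an object-type library rather than this exact interface.

```c
#include <stdio.h>

/* DUAL side: a binding maps reference/assignment to some external medium. */
typedef struct {
    double (*get)(int i);           /* generalized reference  */
    void   (*put)(int i, double v); /* generalized assignment */
} Binding;

/* PurAl side: the pure algorithm, unchanged for any distribution. */
static void scale(Binding b, int n, double factor) {
    for (int i = 0; i < n; i++)
        b.put(i, b.get(i) * factor);    /* looks like local access */
}

/* One possible binding: plain local memory. A mailbox driver, file
 * manager or remote task would supply different get/put functions
 * without any change to scale(). */
static double store[8] = {1, 2, 3, 4, 5, 6, 7, 8};
static double local_get(int i)           { return store[i]; }
static void   local_put(int i, double v) { store[i] = v; }

int main(void) {
    Binding local = { local_get, local_put };
    scale(local, 8, 2.0);
    for (int i = 0; i < 8; i++) printf("%g ", store[i]);
    printf("\n");
    return 0;
}
```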
Over the years, hundreds of parallel programming Application Programming Interfaces (APIs) have come and gone. Most grow a small user base and then fade away. A few APIs, however, are "winners" and catch...
ISBN (Digital): 9783540454014
ISBN (Print): 3540419446
Clusters of shared-memory multiprocessors (SMPs) have become the most promising parallel computing platforms for scientific computing. However, SMP clusters significantly increase the complexity of user application development when using the low-level application programming interfaces MPI and OpenMP, forcing users to deal with both distributed-memory and shared-memory parallelization details. In this paper we present extensions of High Performance Fortran for SMP clusters which enable the compiler to adopt a hybrid parallelization strategy, efficiently combining distributed-memory with shared-memory parallelism. By means of a small set of new language features, the hierarchical structure of SMP clusters may be specified. This information is utilized by the compiler to derive inter-node data mappings for controlling distributed-memory parallelization across the nodes of a cluster, and intra-node data mappings for extracting shared-memory parallelism within nodes. Additional mechanisms are proposed for specifying inter- and intra-node data mappings explicitly, for controlling specific shared-memory parallelization issues, and for integrating OpenMP routines in HPF applications. The proposed features are being realized within the ADAPTOR and VFC compilers. The parallelization strategy for clusters of SMPs adopted by these compilers is discussed, as well as a hybrid-parallel execution model based on a combination of MPI and OpenMP. Early experimental results indicate the effectiveness of the proposed features.
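The hybrid execution model, MPI across nodes and OpenMP within a node, can be illustrated with a small hand-written C program of the shape such a compiler might target. This is only a sketch assuming a combined MPI/OpenMP toolchain (e.g., mpicc -fopenmp); it is not output of ADAPTOR or VFC.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Inter-node mapping: block-distribute the iteration space
     * across MPI ranks (one or more per cluster node). */
    int chunk = N / size, lo = rank * chunk;
    int hi = (rank == size - 1) ? N : lo + chunk;

    /* Intra-node mapping: shared-memory parallelism over the
     * locally owned block via OpenMP threads. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = lo; i < hi; i++)
        local += 1.0 / (1.0 + i);

    double total;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", total);
    MPI_Finalize();
    return 0;
}
```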
ISBN (Print): 9781581133363
We provide a parametric framework for verifying safety properties of concurrent Java programs. The framework combines thread-scheduling information with information about the shape of the heap. This leads to error-detection algorithms that are more precise than existing techniques. The framework also provides the most precise shape-analysis algorithm for concurrent programs. In contrast to existing verification techniques, we do not put a bound on the number of allocated objects. The framework even produces interesting results when analyzing Java programs with an unbounded number of threads. The framework is applied to successfully verify the following properties of a concurrent program: concurrent manipulation of a linked-list-based ADT preserves the ADT datatype invariant [19]; the program does not perform inconsistent updates due to interference; the program does not reach a deadlock; and the program does not produce run-time errors due to illegal thread interactions. We also find bugs in erroneous versions of such implementations. A prototype of our framework has been implemented.
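The paper targets Java, but the deadlock property it verifies is easy to reproduce in an analogous C/POSIX Threads sketch: two threads acquiring the same two locks in opposite orders. A given run may or may not hang, which is exactly why a verifier must consider all interleavings rather than rely on testing. The example is illustrative and not taken from the paper.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *t1(void *arg) {
    pthread_mutex_lock(&a);      /* holds a, wants b */
    pthread_mutex_lock(&b);
    pthread_mutex_unlock(&b);
    pthread_mutex_unlock(&a);
    return arg;
}

static void *t2(void *arg) {
    pthread_mutex_lock(&b);      /* holds b, wants a: cycle -> deadlock */
    pthread_mutex_lock(&a);
    pthread_mutex_unlock(&a);
    pthread_mutex_unlock(&b);
    return arg;
}

int main(void) {
    pthread_t x, y;
    pthread_create(&x, NULL, t1, NULL);
    pthread_create(&y, NULL, t2, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    puts("finished without deadlocking (this run)");
    return 0;
}
```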
ISBN (Print): 3540422935
This paper presents the recent development of the environment of on-line tools for parallel programming support, based on a universal monitoring system, the OCM, which is built in compliance with the OMIS specification. Issues covered include enhancements needed both at the monitoring level and at the user interface level in order to achieve full tool support for message-passing parallel applications, and to enable interoperability of tools. We focus on the evolution of the environment towards support for performance analysis of MPI applications, and interoperability of two tools: the PATOP performance analyzer and the DETOP debugger. We also outline perspectives for further research to extend the environment's capabilities to support other parallel programming paradigms.
ISBN (Print): 0769512585
This paper proposes a set of extensions to the OpenMP programming model to express complex pipelined computations. This is accomplished by defining, in the form of directives, precedence relations among the tasks originated from work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs; the programmer then defines the precedence relations using this name space. This relieves the programmer of the burden of defining complex synchronization data structures and inserting explicit synchronization actions in the program, which make the program difficult to understand and maintain. This work is done transparently by the compiler with the support of the OpenMP runtime library. The proposal is motivated and evaluated with a synthetic multi-block example. The paper also includes a description of the compiler and runtime support in the framework of the NanosCompiler for OpenMP.
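To see what the proposed directives would replace, here is a sketch in standard OpenMP C of a small pipeline in which stage i may process unit j only after stage i-1 has finished it. The progress counters, busy-waits, and flushes are precisely the hand-built synchronization machinery the proposal moves into the compiler and runtime. Names and sizes are illustrative, and the sketch assumes the runtime actually delivers NI threads.

```c
#include <omp.h>
#include <stdio.h>

#define NI 4                     /* pipeline stages             */
#define NJ 8                     /* work units through the pipe */

int main(void) {
    volatile int done[NI];       /* done[i] = units finished by stage i */
    double cell[NI][NJ] = {{0}};
    for (int i = 0; i < NI; i++) done[i] = 0;

    #pragma omp parallel num_threads(NI)
    {
        int i = omp_get_thread_num();          /* one thread per stage */
        for (int j = 0; j < NJ; j++) {
            if (i > 0)                         /* precedence (i-1,j) -> (i,j) */
                while (done[i - 1] <= j) {
                    #pragma omp flush
                }
            cell[i][j] = (i ? cell[i - 1][j] : 1.0) + 1.0;   /* the "work" */
            #pragma omp flush
            done[i] = j + 1;                   /* signal stage i+1 */
            #pragma omp flush
        }
    }
    /* each stage adds 1, starting from 1.0 at stage 0 */
    printf("cell[%d][%d] = %g (expected %d)\n",
           NI - 1, NJ - 1, cell[NI - 1][NJ - 1], NI + 1);
    return 0;
}
```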
Authors:
Pressel, DM; Sahu, J; Heavey, KR
USA Res Lab, Computat & Informat Sci Directorate, Aberdeen Proving Ground, MD 21005, USA
USA Res Lab, Weap & Mat Res Directorate, Aberdeen Proving Ground, MD 21005, USA
ISBN (Digital): 9783540454014
ISBN (Print): 3540419446
One of the major challenges facing high performance computing is the daunting task of producing programs that will achieve acceptable levels of performance when run on parallel architectures. Although many organizations have been actively working in this area for some time, many programs have yet to be parallelized. Furthermore, some programs that were parallelized were done so for obsolete systems. These programs may run poorly, if at all, on the current generation of parallel computers. Therefore, a straightforward approach is needed for parallelizing vectorizable codes without introducing any changes to the algorithm or the convergence properties of the codes. The combination of loop-level parallelism and RISC-based shared-memory SMPs has proven to be a successful approach to solving this problem.
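A minimal example of this loop-level approach: a vectorizable loop parallelized with a single OpenMP directive, leaving the algorithm and its convergence behavior untouched. The kernel is illustrative, assuming any OpenMP-capable C compiler (e.g., cc -fopenmp).

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    /* static: keeps the large arrays off the stack */
    static double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

    /* Independent iterations: the same property that makes the loop
     * vectorizable makes it safe for loop-level parallelism. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];

    printf("a[%d] = %g\n", N - 1, a[N - 1]);
    return 0;
}
```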
ISBN (Print): 354042346X
POSIX Threads and OpenMP were used to implement parallelism in the nuclear reactor transient analysis code PARCS on multiprocessor SUN and SGI workstations. The achievable parallel performance for practical applications is compared for each of the code modules using POSIX Threads and OpenMP. A detailed analysis of cache misses was performed on the SGI to explain the observed performance. Considering the effort required for implementation, the directive-based standard OpenMP appears to be the preferred choice for parallel programming on a shared-memory machine.
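The implementation-effort gap behind this conclusion can be sketched by writing the same reduction once with POSIX Threads and once with OpenMP; PARCS itself is a large reactor code, so this toy kernel is only a stand-in. Build with, e.g., cc -fopenmp -pthread.

```c
#include <pthread.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000
#define NT 4
static double a[N];

/* --- POSIX Threads version: explicit partitioning and joining --- */
typedef struct { int lo, hi; double sum; } Work;

static void *partial(void *p) {
    Work *w = p;
    for (int i = w->lo; i < w->hi; i++) w->sum += a[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) a[i] = 1.0;

    pthread_t tid[NT];
    Work w[NT];
    double s1 = 0.0;
    for (int t = 0; t < NT; t++) {
        w[t] = (Work){ t * (N / NT),
                       (t + 1 == NT) ? N : (t + 1) * (N / NT), 0.0 };
        pthread_create(&tid[t], NULL, partial, &w[t]);
    }
    for (int t = 0; t < NT; t++) { pthread_join(tid[t], NULL); s1 += w[t].sum; }

    /* --- OpenMP version: one directive does the same job --- */
    double s2 = 0.0;
    #pragma omp parallel for reduction(+:s2)
    for (int i = 0; i < N; i++) s2 += a[i];

    printf("pthreads: %g  openmp: %g\n", s1, s2);
    return 0;
}
```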
ISBN (Print): 0769512585
Incremental stack-copying is a technique which has been successfully used to support efficient parallel execution of a variety of search-based AI systems, e.g., logic-based and constraint-based systems. The idea of incremental stack-copying is to copy only the difference between the data areas of two agents, instead of copying them entirely, when distributing parallel work. In order to further reduce communication during stack-copying and make its implementation efficient on message-passing platforms, a new technique, called stack-splitting, has recently been proposed. In this paper, we describe a scheme to effectively combine stack-splitting with incremental stack-copying, to achieve superior parallel performance in a non-shared-memory environment. We also describe a scheduling scheme for this incremental stack-splitting strategy. These techniques are currently being implemented in the PALS system, a parallel constraint logic programming system.
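The core of incremental stack-copying can be sketched in a few lines of C: since the receiving agent already shares a prefix of the sender's data area, only the differing suffix needs to be transferred. Real systems such as PALS apply this to several execution data areas and combine it with stack-splitting; the single int-array "stack" below is a deliberately simplified model.

```c
#include <stdio.h>
#include <string.h>

#define DEPTH 8

/* Copy src into dst, transferring only the entries past the longest
 * common prefix. Returns how many entries were copied (the "increment"). */
static int incremental_copy(int *dst, const int *src, int n) {
    int common = 0;
    while (common < n && dst[common] == src[common]) common++;
    memcpy(dst + common, src + common, (n - common) * sizeof *src);
    return n - common;
}

int main(void) {
    int sender[DEPTH]   = {1, 2, 3, 4, 5, 6, 7, 8};
    int receiver[DEPTH] = {1, 2, 3, 4, 0, 0, 0, 0};  /* shares a prefix */

    int copied = incremental_copy(receiver, sender, DEPTH);
    printf("copied %d of %d entries\n", copied, DEPTH); /* copied 4 of 8 */
    return 0;
}
```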