This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed ...
详细信息
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. Our connected components algorithm uses a novel approach for parallel merging which performs drastically limited updating during iterative steps, and concludes with a total consistency update at the final step. The algorithms have been coded in Split-C and run on a variety of platforms. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for these two primitives, even when compared with machine-specific implementations. More efficient implementations of Split-C will likely result in even faster execution times.
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows pr...
ISBN:
(纸本)9780897917001
It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization.
In this paper, we propose a straightforward solution to the problems of compositional parallelprogramming by using skeletons as the uniform mechanism for structured composition. In our approach parallel programs are ...
ISBN:
(纸本)9780897917001
In this paper, we propose a straightforward solution to the problems of compositional parallelprogramming by using skeletons as the uniform mechanism for structured composition. In our approach parallel programs are constructed by composing procedures in a conventional base language using a set of high-level, pre-defined, functional, parallel computational forms known as skeletons. The ability to compose skeletons provides us with the essential tools for building further and more complex application-oriented skeletons specifying important aspects of parallel computation. Compared with the process network based composition approach, such as PCN, the skeleton approach abstracts away the fine details of connecting communication ports to the higher level mechanism of making data distributions conform, thus avoiding the complexity of using lower level ports as the means of interaction. Thus, the framework provides a natural integration of the compositional programming approach with the data parallelprogramming paradigm.
Data-parallel languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. Novel features of these languages call for the develop...
ISBN:
(纸本)9780897917001
Data-parallel languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. Novel features of these languages call for the development of new techniques in both compilers and run-time systems. In this paper, we present an improved algorithm for finding the local memory access sequence in computations involving regular sections of arrays with cyclic(k) distributions. After establishing the fact that regular section indices correspond to elements of an integer lattice, we show how to find a lattice basis that allows for simple and fast enumeration of memory accesses. The complexity of our algorithm is shown to be lower than that of the previous solution for the same problem. In addition, the experimental results demonstrate the efficiency of our method in practice.
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed ...
详细信息
ISBN:
(纸本)9780897917001
This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components. Our general framework is a single-address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. Our connected components algorithm uses a novel approach for parallel merging which performs drastically limited updating during iterative steps, and concludes with a total consistency update at the final step. The algorithms have been coded in Split-C and run on a variety of platforms. Our experimental results are consistent with the theoretical analysis and provide the best known execution times for these two primitives, even when compared with machine-specific implementations. More efficient implementations of Split-C will likely result in even faster execution times.
For a long time efficient use of parallel computers has been hindered by dependencies introduced in software through low-level implementation practice. In this paper we present a programming environment and language c...
ISBN:
(纸本)9780897917001
For a long time efficient use of parallel computers has been hindered by dependencies introduced in software through low-level implementation practice. In this paper we present a programming environment and language called Object-Math (Object oriented Mathematical language for scientific computing), which aims at eliminating this problem by allowing the user to represent mathematical equation-based models directly in the system. The system performs analysis of mathematical models to extract parallelism and automatically generates parallel code for numerical *** the context of industrial applications in mechanical analysis, we have so far primarily explored generation of parallel code for solving systems of ordinary differential equations (ODEs), in addition to preliminary work on generating code for solving partial differential equations. Two approaches to extracting parallelism have been implemented and evaluated: extracting parallelism at the equation system level and at the single equation level, respectively. We found that for several applications the corresponding systems of equations do not partition well into subsystems. This means that the equation system level approach is of restricted general applicability. Thus, we focused on the equation-level approach which yielded significant parallelism for ODE systems solution. For the bearing simulation applications we present here, the achieved speedup is however critically dependent on low communication latency of the parallel computer.
The proceedings contain 26 papers. The topics discussed include: LogP: towards a realistic model of parallel computation;exploiting task and data parallelism on a multicomputer;ActorSpace: an open distributed programm...
ISBN:
(纸本)0897915895
The proceedings contain 26 papers. The topics discussed include: LogP: towards a realistic model of parallel computation;exploiting task and data parallelism on a multicomputer;ActorSpace: an open distributed programming paradigm;experiences using the ParaScope editor: an interactive parallelprogramming tool;perturbation analysis of high level instrumentation for SPMD programs;integrating message-passing and shared-memory: early experience;using scheduler information to achieve optimal barrier synchronization performance;and a concurrent copying garbage collector for languages that distinguish (im)mutable data.
The symposium materials contain 26 papers covering the spectrum from models of parallel computing to implementation techniques, and from compilation algorithms to application development tools and case studies, thus s...
详细信息
ISBN:
(纸本)0897915895
The symposium materials contain 26 papers covering the spectrum from models of parallel computing to implementation techniques, and from compilation algorithms to application development tools and case studies, thus satisfying the goal of broadly covering the active areas of parallelprogramming research.
Efficient parallel execution of a high-level data-parallel language based on nested sequences, higher order functions and generalized iterators can be realized in the vector model using a suitable representation of ne...
详细信息
作者:
Siegl, Kurt
Johannes Kepler University LinzA-4040 Austria
||MAPLE|| (speak: parallel Maple) is a portable system for parallel symbolic computation. The system is built as an interface between the parallel declarative programming language Strand and the sequential computer al...
详细信息
ISBN:
(纸本)0897915895
||MAPLE|| (speak: parallel Maple) is a portable system for parallel symbolic computation. The system is built as an interface between the parallel declarative programming language Strand and the sequential computer algebra system Maple, thus providing the elegance of Strand and the power of the existing sequential algorithms in Maple. The implementation of difFerent parallelprogramming paradigms shows that it is fairly easy to parallelize even complex algebraic algorithms using this system. Sample applications (among them algorithms solving multivariate nonlinear equation systems) are implemented on various parallel architectures. For example a straightforward parallelization of the complex and important problem of real root isolation has been parallelized using a generic Strand program of fewer than 20 Unes of code and a slight modification of 5 lines in the original sequential Maple source. Even with such a simple modification we gained a speed-up of 5 times, that is better than those reported by others in the literature.
暂无评论