We examine the suitability of CORBA based solutions for meeting application requirements in the field of distributed parallel programming. We outline concepts defined within CORBA which are helpful for the development of parallel applications, and we describe which programming techniques are at hand for this purpose. Subsequently, we present an object group service which facilitates the development of CORBA based distributed and parallel software applications. Moreover, we introduce some basic ideas of how the Unified Modeling Language (UML) can be used for modeling parallel applications.
Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel language support for hierarchical applications by comparing implementations of the NAS MG benchmark in several parallel programming languages: Co-Array Fortran, High Performance Fortran, Single Assignment C, and ZPL. We evaluate each language in terms of its portability, its performance, and its ability to express the algorithm clearly and concisely. Experimental platforms include the Cray T3E, IBM SP, SGI Origin, Sun Enterprise 5500, and a high-performance Linux cluster. Our findings indicate that while it is possible to achieve good portability, performance, and expressiveness, most languages currently fall short in at least one of these areas. We find a strong correlation between expressiveness and a language’s support for a global view of computation, and we identify key factors for achieving portable performance in multigrid applications.
ISBN: 0769505892 (print)
Using design algorithms and experimental results, this paper describes the design, implementation, and performance of DPVM (Dawning PVM), a port of PVM (Parallel Virtual Machine) to Dawning cluster systems. DPVM is derived from the original PVM 3.3.11 source code to retain compatibility with existing PVM applications. However, it uses a low-level direct communication protocol and the resource management software of the Dawning2000 cluster system. Experimental results show that DPVM is a sound PVM implementation on an IBM RS6000 workstation cluster and that it uses the Dawning cluster system efficiently, thanks to its low-level communication protocol and resource management service.
Volume rendering is computation intensive and can benefit from the combined processing power of a workstation cluster. Carefully balancing the workload among all workstations in the cluster is critical to achieving high efficiency in parallel volume rendering. We describe how the load balance of a BSP model-based parallel volume rendering program can be improved substantially using a profile visualiser. The profile visualiser distinguishes between two types of load imbalance: that caused by poor design of the parallel volume renderer and that caused by other users sharing the workstations. This information allows us to concentrate on improving the performance of the parallel program by designing a better load balancing strategy for the parallel volume renderer.
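As a rough illustration of why load balance matters in a BSP program: every superstep ends in a barrier, so its duration is set by the slowest process. The sketch below is not the profile visualiser from the abstract; the function name and the timing data layout are assumptions made for illustration. It computes a per-superstep imbalance factor (slowest process time divided by mean process time), where 1.0 means perfectly balanced.

```python
def superstep_imbalance(times):
    """Return one imbalance factor per superstep.

    times[s][p] is the (hypothetical) compute time in seconds of
    process p during superstep s. All times are assumed positive.
    """
    report = []
    for step in times:
        slowest = max(step)             # the barrier waits for this process
        mean = sum(step) / len(step)    # ideal time if work were spread evenly
        report.append(slowest / mean)
    return report


if __name__ == "__main__":
    # Two supersteps on two processes: the first balanced, the second not.
    print(superstep_imbalance([[2.0, 2.0], [1.0, 3.0]]))
```

A factor well above 1.0 in a given superstep only says that an imbalance exists; telling apart a poor partitioning from interference by other users requires additional information, which is the role the abstract assigns to the profile visualiser.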
In recent years there has been growing interest in systematic methods for refining Z specifications into programs. We consider a transformational programming strategy known as filter promotion and examine its use for refining a class of Z specifications into sequential as well as parallel programs. This strategy is particularly useful for transforming specifications of generate-and-test problems into efficient algorithms. We find it convenient to use different notations at different levels of abstraction: Z to capture the starting specification, Bird-Meertens functional notation to express algorithms, and Hoare's CSP to describe parallelism and communication. The basic ideas are illustrated by systematic transformational developments of sequential and parallel algorithms for sorting and searching problems.
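The idea behind filter promotion can be sketched on sorting viewed as a generate-and-test problem. The paper works in Z, Bird-Meertens notation, and CSP; the Python below is only an informal stand-in, and all function names are assumptions. The first function generates every permutation and tests it afterwards; the second promotes the "is ordered" test into the generator, so that a partial permutation is abandoned as soon as it becomes unordered.

```python
from itertools import permutations


def is_ordered(seq):
    """True if seq is in non-decreasing order."""
    return all(a <= b for a, b in zip(seq, seq[1:]))


def sort_generate_and_test(xs):
    """Specification-style version: generate all permutations, then filter."""
    return next(list(p) for p in permutations(xs) if is_ordered(p))


def sort_promoted(xs):
    """Filter promoted into the generator: prune unordered prefixes early."""
    def extend(prefix, remaining):
        if not remaining:
            return prefix
        for i, x in enumerate(remaining):
            if prefix and prefix[-1] > x:
                continue  # this prefix can never become an ordered permutation
            result = extend(prefix + [x], remaining[:i] + remaining[i + 1:])
            if result is not None:
                return result
        return None

    return extend([], list(xs))


if __name__ == "__main__":
    print(sort_generate_and_test([3, 1, 2]))
    print(sort_promoted([3, 1, 2]))
```

Both functions compute the same result; the promoted version simply avoids constructing permutations whose prefix already fails the test. It is still far from an efficient sorting algorithm, which in a transformational development would be reached through further refinement steps.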
In this paper we study one kind of irregular computation on distributed arrays, the irregular prefix operation, which is currently not well supported by the standard data-parallel language HPF2. We show a parallel implementation that efficiently takes advantage of the independent computations arising in this irregular operation. Our approach is based on a directive that characterizes an irregular prefix operation and on inspector/executor support, implemented in the CoLuMBO library, which optimizes execution by using an asynchronous communication scheme and, in turn, communication/computation overlap. We validate our contribution with results obtained on an IBM SP2 for basic experiments and for a sparse Cholesky factorization algorithm applied to real-size problems.
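The general inspector/executor pattern can be sketched on a prefix-style computation whose dependences are only known at run time. This is not the CoLuMBO API or the HPF2 directive from the abstract; the function names, the `deps` layout, and the additive update are assumptions chosen for illustration. The inspector scans the dependence lists once and groups iterations into waves of mutually independent work; the executor then processes the waves in order.

```python
def inspect(deps):
    """Inspector: group iterations into waves of independent work.

    deps[i] lists the indices j < i whose results iteration i needs
    (hypothetical layout). Iterations in the same wave do not depend
    on one another.
    """
    level = [0] * len(deps)
    for i, ds in enumerate(deps):
        level[i] = 1 + max((level[j] for j in ds), default=-1)
    waves = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        waves[lv].append(i)
    return waves


def execute(values, deps, waves):
    """Executor: run the schedule built by the inspector.

    Each element accumulates its own value plus the results of its
    dependences. The inner loop is sequential here, but all iterations
    of one wave could run concurrently.
    """
    out = list(values)
    for wave in waves:
        for i in wave:
            out[i] = values[i] + sum(out[j] for j in deps[i])
    return out


if __name__ == "__main__":
    deps = [[], [], [0], [0, 1], [2]]
    waves = inspect(deps)
    print(waves)
    print(execute([1, 2, 3, 4, 5], deps, waves))
```

The schedule depends only on `deps`, so it can be computed once and reused whenever the same dependence structure is executed repeatedly with different values, which is the usual motivation for separating the inspector from the executor.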
Collective operations on distributed data sets foster a high-level data-parallel programming style that eases many aspects of parallel programming significantly. In this paper we describe how higher-order collective operations on distributed object sets can be introduced in a structured way by means of reusable topology classes and C++ templates.
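To give a flavour of what a higher-order collective operation looks like, here is a minimal sketch. The paper builds these from C++ templates and reusable topology classes; the Python class below is a toy stand-in, its name and methods are assumptions, and it ignores topology and communication entirely. Each inner list plays the role of the objects held by one node, and the collective operations take the per-element function or the combining operator as a parameter.

```python
from functools import reduce


class DistributedSet:
    """Toy model of an object set partitioned across nodes (hypothetical)."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def map(self, fn):
        """Apply fn to every element; each partition is processed independently."""
        return DistributedSet([[fn(x) for x in part] for part in self.partitions])

    def reduce(self, op, identity):
        """Reduce locally per partition, then combine the partial results.

        op is assumed to be associative with the given identity element.
        """
        partials = [reduce(op, part, identity) for part in self.partitions]
        return reduce(op, partials, identity)


if __name__ == "__main__":
    data = DistributedSet([[1, 2], [3], []])
    print(data.map(lambda x: x * x).reduce(lambda a, b: a + b, 0))
```

In a real distributed setting the combination of partial results would follow some communication pattern, for example a tree or a ring, which is the kind of concern a topology abstraction would encapsulate.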
Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel computing environments on clusters of workstations. We propose a new, efficient coordinated checkpointing technique, called coherence-based coordinated checkpointing (CCC), for SDSM. Our CCC minimizes both the checkpointing overhead during failure-free execution and the cost of recovery from failures by leveraging existing coherence information maintained by SDSM. In the presence of system failures, it allows SDSM to recover from the most recent checkpoint, saving re-computation time. We have performed experiments on a cluster of eight Sun Ultra-5 workstations, comparing our CCC technique against both simple coordinated checkpointing (SCC) and incremental coordinated checkpointing (ICC) by implementing all three techniques in TreadMarks, a state-of-the-art SDSM system. The experimental results demonstrate that our CCC technique consistently outperforms both the SCC and ICC techniques. In particular, our technique increases execution time by only 0.5% to 4% for a 2-minute checkpointing interval during failure-free execution, while the SCC and ICC techniques incur execution time overheads of 4% to 100% and 3% to 64%, respectively, for the same checkpointing interval.
ISBN: 0769505892 (print)
This paper presents the benchmarking results and performance evaluation of the PC cluster built at the National Center for High-Performance Computing (NCHC) in Taiwan. The evaluation compares different cluster architectures and software platforms. The results indicate that PC clusters in general have a better cost/performance ratio and have the potential to outperform some conventional parallel machines, such as the IBM SP2, for specific application problems. However, the performance of a PC cluster is largely application-dependent. The results in this paper provide useful information about which kinds of applications have the largest potential to take advantage of PC clusters and about the relative advantages of different cluster architectures and software platforms.
Parallel discrete event simulation (PDES) techniques have not yet made a substantial impact on the network simulation community because of the need to recast the simulation models using a new set of tools. To address this problem, we present a case study in transparently parallelizing a widely used network simulator called ns. Using this parallel ns does not require the modeler to learn any new tools or complex PDES techniques. The paper describes our approach and the design choices made in building the parallel ns, and presents preliminary performance results, which are very encouraging.