Redundancy is a basic technique for achieving fault tolerance, but the overhead it introduces may degrade system performance. In this paper, we propose efficient replication-based algorithms for fault...
ISBN: 9781595931672 (print)
Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMT-based architectures, it is imperative to gain insight into the interaction between meshing algorithms and these systems. We focus on Parallel Constrained Delaunay Mesh (PCDM) generation. We exploit coarse-grain parallelism at the subdomain level and fine-grain parallelism at the element level. This multigrain data-parallel approach targets clusters built from low-end, commercially available SMTs. Our experimental evaluation shows that current SMTs are not capable of executing fine-grain parallelism in PCDM. However, experiments on a simulated SMT indicate that with modest hardware support it is possible to exploit fine-grain parallelism opportunities. The exploitation of fine-grain parallelism results in higher performance than a pure MPI implementation and closes the gap between the performance of PCDM and the state-of-the-art sequential mesher on a single physical processor. Our findings extend to other adaptive and irregular multigrain parallel algorithms. Copyright 2005 ACM.
ISBN: 3540292357 (print)
Measurement and modelling of distributions of data communication times is commonly done for telecommunication networks, but this has not previously been done for message passing communications on parallel computers. We have used the MPIBench program to measure distributions of point-to-point MPI communication times for two different parallel computers, with a low-end Ethernet network and a high-end Quadrics network respectively. Here we present and discuss the results of efforts to fit the measured distributions with standard probability distribution functions such as exponential, lognormal, Erlang, gamma, Pearson 5 and Weibull distributions.
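The fitting step described above can be illustrated with a minimal sketch. It is not the authors' procedure and does not use MPIBench: the `times` list is synthetic data standing in for measured point-to-point times, and the helper names (`fit_exponential`, `fit_lognormal`, and the log-likelihood functions) are hypothetical. Only two of the listed candidate distributions are fitted, using their closed-form maximum-likelihood estimates, and the fits are compared by log-likelihood.

```python
import math
import random


def fit_exponential(samples):
    """Maximum-likelihood rate of an exponential distribution (1 / sample mean)."""
    return len(samples) / sum(samples)


def fit_lognormal(samples):
    """Maximum-likelihood (mu, sigma) of a lognormal: mean and std of the log-samples."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def loglik_exponential(samples, rate):
    """Log-likelihood of the samples under an exponential with the given rate."""
    return sum(math.log(rate) - rate * x for x in samples)


def loglik_lognormal(samples, mu, sigma):
    """Log-likelihood of the samples under a lognormal with parameters (mu, sigma)."""
    return sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in samples
    )


# Assumption: synthetic "communication times" drawn from a lognormal,
# used here only because no measured data is available in this listing.
rng = random.Random(0)
times = [rng.lognormvariate(3.0, 0.25) for _ in range(5000)]

rate = fit_exponential(times)
mu, sigma = fit_lognormal(times)
ll_exp = loglik_exponential(times, rate)
ll_logn = loglik_lognormal(times, mu, sigma)
print("exponential log-likelihood:", ll_exp)
print("lognormal   log-likelihood:", ll_logn)
```

A higher log-likelihood indicates a better fit for the same data. With measured times, a fixed minimum latency would typically be subtracted before fitting, which this sketch omits.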
ISBN: 3540292357 (print)
This paper proposes efficient techniques to reconfigure a multi-processor array that is embedded in a 6-port switch lattice in the form of a rectangular grid. It has been shown that the proposed architecture with 6-port switches eliminates gate delays and notably increases the harvest when compared with one using 4-port switches. A new rerouting algorithm combines the latest techniques to maximize harvest without increasing reconfiguration time. Experimental results show that the new reconfiguration algorithm consistently outperforms the most efficient algorithm proposed in the literature.
A semi-dynamic system is presented that is capable of predicting the performance of parallel programs at runtime. The functionality provided by the system allows for efficient handling of portability and irregularity of parallel programs. Two forms of parallelism are addressed: loop-level parallelism and task-level parallelism.
ISBN: 0769524869 (print)
The present paper discusses scalable implementations of sparse matrix-vector products using OpenMP to execute the iterative method on the SGI Altix3700, the IBM eServer p5 595 and the Sun Fire 15K. Three storage formats (CRS, BSR and DIA) for sparse matrices are evaluated. The present implementation provides satisfactory scalability. In some cases, an optimal storage format with data conversion should be used. In addition, the influence of the cache/memory bus architectures on the optimum choice of the storage format is examined.
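For readers unfamiliar with the CRS (compressed row storage) format named above, the following is a minimal sequential sketch of a sparse matrix-vector product. It is an illustration only, not the paper's OpenMP implementation; the function names `dense_to_crs` and `crs_matvec` and the example matrix are assumptions made for this sketch.

```python
def dense_to_crs(dense):
    """Convert a dense row-major matrix (list of lists) to CRS arrays.

    Returns (values, col_idx, row_ptr): the nonzero values in row order,
    their column indices, and the offset in `values` where each row starts.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def crs_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A stored in CRS form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Each row is independent of the others, so this outer loop is the
    # natural candidate for work sharing among threads.
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix.
A = [
    [4, 0, 1],
    [0, 3, 0],
    [2, 0, 5],
]
values, col_idx, row_ptr = dense_to_crs(A)
y = crs_matvec(values, col_idx, row_ptr, [1, 2, 3])
print(y)  # [7.0, 6.0, 17.0]
```

The indirect access `x[col_idx[k]]` is what makes the kernel sensitive to cache and memory bus behaviour, which is one reason the best storage format can differ between machines.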
ISBN: 3540292357 (print)
In this paper we propose a new parallelization scheme for Simulated Annealing - Hierarchical Parallel SA (HPSA). This new scheme features coarse granularity in parallelization, directed at message-passing systems such as clusters. It combines heuristics such as adaptive clustering with SA to achieve more efficiency in local search. Through experiments with various optimization problems and comparison with some available schemes, we show that HPSA is a powerful general-purpose optimization method. It can also serve as a framework for meta-heuristics to gain broader application.
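As background for the scheme above, here is a minimal sketch of plain sequential simulated annealing, the building block that HPSA parallelizes. It is not HPSA itself: the abstract does not give the hierarchical or clustering details, so none are shown. The function name, the geometric cooling schedule, all parameter defaults, and the toy cost function are assumptions for illustration.

```python
import math
import random


def simulated_annealing(cost, neighbor, start, t_start=1.0, t_end=1e-3,
                        alpha=0.95, steps_per_temp=50, seed=0):
    """Minimize `cost` with a basic simulated annealing loop.

    cost:     function mapping a state to a number
    neighbor: function (state, rng) -> candidate state
    The temperature is cooled geometrically by `alpha` (an assumed schedule).
    Returns the best state seen and its cost.
    """
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            cand = neighbor(current, rng)
            cand_cost = cost(cand)
            delta = cand_cost - current_cost
            # Metropolis rule: always accept improvements, and accept a
            # worse candidate with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha
    return best, best_cost


# Toy problem (hypothetical): minimize (x - 3)^2 starting far from the optimum.
def toy_cost(x):
    return (x - 3.0) ** 2


def toy_neighbor(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_c = simulated_annealing(toy_cost, toy_neighbor, start=-10.0)
print(best_x, best_c)
```

In a coarse-grained message-passing setting, each process would typically run a loop like this on its own state and exchange information only occasionally; how HPSA organizes that exchange is not specified in the abstract.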
This paper presents a parallel version of the Propagation Algorithm, which belongs to the region-growing family of algorithms. The main goal of our implementation is to decrease the Propagation Algorithm execution time...
The Array-OL specification model was introduced to model systematic signal processing applications. This model is multidimensional and can express the full potential parallelism of an application: both task and data parallelism. The Array-OL language expresses data dependences and thus allows many execution orders. To execute Array-OL applications on distributed architectures, we show how to project such a specification onto the Kahn process network model of computation. We show how Array-OL code transformations make it possible to choose a projection adapted to the target architecture.
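The Kahn process network model mentioned above can be sketched as a set of processes that communicate only through unbounded FIFO channels with blocking reads. The sketch below illustrates that model of computation alone; it does not implement Array-OL or the paper's projection. The process names (`source`, `scale`, `pairwise_sum`), the `END` sentinel, and the use of threads are assumptions chosen for illustration.

```python
import queue
import threading

END = object()  # sentinel marking the end of a token stream (an assumption of this sketch)


def source(out_q, data):
    """Process that emits each item of `data` as a token."""
    for item in data:
        out_q.put(item)
    out_q.put(END)


def scale(in_q, out_q, factor):
    """Process that consumes one token and produces one scaled token."""
    while True:
        item = in_q.get()  # blocking read, as in a Kahn process network
        if item is END:
            out_q.put(END)
            return
        out_q.put(item * factor)


def pairwise_sum(in_q, out_q):
    """Process that consumes two tokens and produces their sum."""
    while True:
        a = in_q.get()
        if a is END:
            out_q.put(END)
            return
        b = in_q.get()
        if b is END:
            # Odd-length stream: pass the last token through unchanged.
            out_q.put(a)
            out_q.put(END)
            return
        out_q.put(a + b)


def run_network(data, factor):
    """Wire source -> scale -> pairwise_sum and collect the output tokens."""
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()  # unbounded FIFOs
    threads = [
        threading.Thread(target=source, args=(q1, data)),
        threading.Thread(target=scale, args=(q1, q2, factor)),
        threading.Thread(target=pairwise_sum, args=(q2, q3)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        item = q3.get()
        if item is END:
            break
        result.append(item)
    for t in threads:
        t.join()
    return result


print(run_network([1, 2, 3, 4], 10))  # [30, 70]
```

Because each process only performs blocking reads on its input channel, the output is the same regardless of how the threads are scheduled, which is the property that makes this model attractive as a target for distributed execution.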