An efficient assignment of tasks to the processors is imperative for achieving a fast job turnaround time in a parallel or distributed environment. The assignment problem is well known to be NP-complete, except in a f...
详细信息
An efficient assignment of tasks to the processors is imperative for achieving a fast job turnaround time in a parallel or distributed environment. The assignment problem is well known to be NP-complete, except in a few special cases. Thus heuristics are used to obtain suboptimal solutions in reasonable amount of time. While a plethora of such heuristics have been documented in the literature, in this paper we aim to develop techniques for finding optimal solutions under the most relaxed assumptions. We propose a best-first search based parallel algorithm that generates optimal solution for assigning an arbitrary task graph to an arbitrary network of homogeneous or heterogeneous processors. The parallel algorithm running on the Intel Paragon gives optimal assignments for problems of medium to large sizes. We believe our algorithms to be novel in solving an indispensable problem in parallel and distributedcomputing.
This paper introduces an algorithm that can generate huge node dataflow by compiling existing programs. The purpose of this algorithm is to improve the speed of parallel processing and utilize the large amount of exis...
详细信息
ISBN:
(纸本)0818678763
This paper introduces an algorithm that can generate huge node dataflow by compiling existing programs. The purpose of this algorithm is to improve the speed of parallel processing and utilize the large amount of existing program resourses. In addition, this idea of huge node data flow algorithm can also be used in distributed processing and multi-thread processing.
We introduce a new parallel programming paradigm, namely synchronous parallel critical sections. Such parallel critical sections must be seen in the context of switching between synchronous and asynchronous modes of c...
详细信息
We introduce a new parallel programming paradigm, namely synchronous parallel critical sections. Such parallel critical sections must be seen in the context of switching between synchronous and asynchronous modes of computation. Thread farming allows to generate bunches of threads to solve independent subproblems asynchronously and in parallel. Opposed to that, synchronous parallel critical sections allow to organize bunches of asynchronous parallel threads to execute certain tasks jointly and synchronously. We show how the PRAM language Fork95 can be extended by a construct join supporting parallel critical sections. We explain its semantics and implementation, and discuss possible applications.
In this paper, an approach using least square method for Callback implementation is presented. Experiments on Callback approved that Callback was not only effective in high-level Metasystem but also in low level heter...
详细信息
ISBN:
(纸本)0818678763
In this paper, an approach using least square method for Callback implementation is presented. Experiments on Callback approved that Callback was not only effective in high-level Metasystem but also in low level heterogeneous system.
In this paper a parallel algorithm for solving tridiagonal equations based on recurrence is presented. Compared with the parallel prefix method (PP) [3] which is also based on the recursive method, the computation cos...
详细信息
ISBN:
(纸本)0818678763
In this paper a parallel algorithm for solving tridiagonal equations based on recurrence is presented. Compared with the parallel prefix method (PP) [3] which is also based on the recursive method, the computation cost is reduced by a factor of two while maintaining the same communication cost. The method can be viewed as a modified prefix method or prefix with substructuring. The complexity of the algorithm is analysed using the BSP model(Bulk Synchronous parallel). Experimental results are obtained on a Sun workstation using the Oxford BSP Library.
In the field of parallel FEM methods a number of highly efficient solutions for distributed memory systems are existing, but the passage to adaptive parallel FEM simulations leads, in all probability, to a more dynami...
详细信息
In the field of parallel FEM methods a number of highly efficient solutions for distributed memory systems are existing, but the passage to adaptive parallel FEM simulations leads, in all probability, to a more dynamic behaviour with respect to data placement and load balancing. Therefore shared-memory architecture seems to be a more appropriate solution for getting efficient implementations. This work presents a parallelized CG-method for shared memory systems which was implemented on a 4-processor SMP system and makes explicit use of shared memory to enhance the communication between different domains. It is based on an idea for implementing parallelization on distributed memory systems and represents an appropriate modification of this method. The results show that an increased synchronization expense can partially compensate the advantages of shared memory communication depending on the levels of refinement and the processor number.
We present a strategy for optimizing parallel algorithms introducing redundant computations. In order to calculate the optimal amount of redundancy, we generalize the LogP model to capture messages of varying sizes us...
详细信息
ISBN:
(纸本)0818678763
We present a strategy for optimizing parallel algorithms introducing redundant computations. In order to calculate the optimal amount of redundancy, we generalize the LogP model to capture messages of varying sizes using functions instead of constants for the machine parameters. We validate our method for a wave simulation algorithm on a Parsytec PowerXplorer with eight processors and a workstation cluster with four workstations.
The framework of constructing a distributed multimedia system based on the server/client architecture is described in this paper. We focus our attention on the realization of synchronization presentation of different ...
详细信息
ISBN:
(纸本)0818678763
The framework of constructing a distributed multimedia system based on the server/client architecture is described in this paper. We focus our attention on the realization of synchronization presentation of different media in a multimedia application, and a set of eos(quality of service) parameters is given as a criterion to make a trade-off between overall performance of the system and the synchronization presentation in each multimedia application.
This paper characterizes the structure and resource requirements of the NAS parallel Benchmarks (NPB), a popular benchmark suite used to evaluate various parallel computers. The phase parallel model is used to obtain ...
详细信息
ISBN:
(纸本)0818678763
This paper characterizes the structure and resource requirements of the NAS parallel Benchmarks (NPB), a popular benchmark suite used to evaluate various parallel computers. The phase parallel model is used to obtain parameter values for memory, I/O, and communication latency and bandwidth requirements. These quantitative parameters are useful in the design and evaluation of various parallel computers. The results of this study is being used in designing Dawning 2000, which is NCIC's second generation MPP.
In this paper, we consider the parallel implementation of solving generalized eigenproblem of Hermitian type matrices on Dawning-1000. It arises from the theoretical analysis of nonlinear optical crystal structures. W...
详细信息
ISBN:
(纸本)0818678763
In this paper, we consider the parallel implementation of solving generalized eigenproblem of Hermitian type matrices on Dawning-1000. It arises from the theoretical analysis of nonlinear optical crystal structures. We uses Cholesky factorisation, Househoulder transformation, bisection method and inverse iteration to complete the computation. The implementation is base on the BLAS library and communication function library provided on Dawning-1000. The numerical results show very good performance and the application in physics is satisfactory.
暂无评论