An efficient assignment of tasks to processors is imperative for achieving a fast job turnaround time in a parallel or distributed environment. The assignment problem is well known to be NP-complete, except in a few special cases. Thus heuristics are used to obtain suboptimal solutions in a reasonable amount of time. While a plethora of such heuristics have been documented in the literature, in this paper we aim to develop techniques for finding optimal solutions under the most relaxed assumptions. We propose a parallel algorithm based on best-first search that generates optimal solutions for assigning an arbitrary task graph to an arbitrary network of homogeneous or heterogeneous processors. The parallel algorithm running on the Intel Paragon gives optimal assignments for problems of medium to large sizes. We believe our algorithms to be novel in solving an indispensable problem in parallel and distributed computing.
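To make the idea of best-first search for optimal assignment concrete, here is a minimal sequential sketch. It is not the paper's parallel algorithm: the cost model (total execution cost plus communication volume times link cost), the lower bound, and the names `exec_cost`, `comm`, and `link` are assumptions chosen for illustration, and all costs are assumed non-negative.

```python
import heapq
from itertools import count


def best_first_assign(exec_cost, comm, link):
    """Best-first search for a minimum-cost task-to-processor assignment.

    Assumed (hypothetical) cost model, all values non-negative:
      exec_cost[t][q]  cost of running task t on processor q
      comm[t][u]       communication volume between tasks t and u
      link[q][r]       cost per unit volume between processors q and r
    Returns (total_cost, assignment), where assignment[t] is a processor index.
    """
    n = len(exec_cost)
    # Admissible lower bound on the cost still to come: every unassigned
    # task runs on its cheapest processor and communicates for free.
    remaining = [0] * (n + 1)
    for t in range(n - 1, -1, -1):
        remaining[t] = remaining[t + 1] + min(exec_cost[t])

    tie = count()  # tie-breaker so the heap never compares tuples of assignments
    heap = [(remaining[0], next(tie), 0, ())]  # (bound, tie, cost so far, partial)
    while heap:
        _, _, cost, partial = heapq.heappop(heap)
        t = len(partial)
        if t == n:
            # The bound never overestimates, so the first complete
            # assignment popped from the heap is optimal.
            return cost, list(partial)
        for q in range(len(exec_cost[t])):
            step = exec_cost[t][q] + sum(
                comm[t][u] * link[q][partial[u]] for u in range(t)
            )
            new_cost = cost + step
            heapq.heappush(
                heap,
                (new_cost + remaining[t + 1], next(tie), new_cost, partial + (q,)),
            )
    return None
```

Tasks are assigned in index order, so each search node is a partial assignment of the first few tasks. A tighter lower bound would prune more of the search tree; the one used here is deliberately simple.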
Neural systems, as processors of time-sequence patterns, have been successfully applied to several speaker-dependent speech recognition tasks. They can be efficiently implemented by a pipeline architecture. In this paper, a parallel time-delay speech recognition computation for VLSI neural systems is presented. The system design methodology emphasizes coordination between the computational model, the architectural description, and the VLSI systolic implementation. Examples of time-delay speech recognition applications to VLSI neural system design and performance analysis are given to illustrate the effectiveness of the parallel computation.
In this paper, we present an original approach for the design and execution of distributed applications that require numerous tasks of variable grain. The approach is based on the concept of a task cluster, an entity that groups tasks with strong logical interaction and guarantees efficient communication between them. We describe the implementation of the model, which relies mainly on lightweight processes as support for the distributed tasks. We also illustrate the use of the proposed approach on real-size applications, where it has improved both the ease of design and the performance.
This paper presents the coherent parallel programming concept using a new parallel language called C (pronounced C parallel). The language is based on the standard C language with a small set of extended constructs for parallelism and process interaction. At the core of C is a structured construct called the coherent region, which facilitates the development of coherent programs, i.e., parallel programs that are structured, determinate, terminative, and compositional. We present the basic features of C and show that the coherent region is a versatile construct.
ISBN: 0818678763 (print)
In this paper, an approach using the least-squares method for Callback implementation is presented. Experiments on Callback showed that it is effective not only in a high-level Metasystem but also in a low-level heterogeneous system.
In this work, it is argued that, because of recent advances in network and CPU technologies, workstation clusters are poised to become the primary parallel computing infrastructure for science and engineering computing. After analyzing and comparing the communication performance of three popular networks (10 Mbps Ethernet, 100 Mbps Ethernet, and 640 Mbps Myrinet) on an experimental workstation cluster, it is pointed out that two main factors hinder the wider application of workstation clusters. These problems are overcome by implementing two workstation cluster systems for different performance/price ratio requirements.
ISBN: 0818678763 (print)
We present a strategy for optimizing parallel algorithms by introducing redundant computations. In order to calculate the optimal amount of redundancy, we generalize the LogP model to capture messages of varying sizes, using functions instead of constants for the machine parameters. We validate our method for a wave simulation algorithm on a Parsytec PowerXplorer with eight processors and a workstation cluster with four workstations.
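The trade-off between redundancy and communication can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: it assumes a 1-D stencil where each process owns `n_local` cells and exchanges a halo of width `k` with two neighbours every `k` steps, and it treats the LogP latency `L` and the send/receive overheads `o_send` and `o_recv` as functions of the message size in bytes. All function and parameter names are hypothetical.

```python
def message_time(m, L, o_send, o_recv):
    """Time to deliver one message of m bytes, with size-dependent parameters."""
    return o_send(m) + L(m) + o_recv(m)


def time_per_step(k, n_local, t_cell, bytes_per_cell, L, o_send, o_recv):
    """Average time per simulation step when halos of width k are exchanged
    every k steps (assumed 1-D domain with two neighbours)."""
    # After an exchange, step i (0-based) can still update k - 1 - i halo
    # cells on each side; these updates duplicate work done by a neighbour.
    redundant_cells = sum(2 * (k - 1 - i) for i in range(k))
    compute = (k * n_local + redundant_cells) * t_cell
    # One halo message per neighbour, each carrying k cells.
    comm = 2 * message_time(k * bytes_per_cell, L, o_send, o_recv)
    return (compute + comm) / k


def best_redundancy(max_k, **params):
    """Halo width in 1..max_k that minimises the average time per step."""
    return min(range(1, max_k + 1), key=lambda k: time_per_step(k, **params))
```

A wider halo amortises the fixed part of the message cost over more steps but adds redundant cell updates, so the best `k` depends on how the message-time functions grow with size. For example, `best_redundancy(32, n_local=100, t_cell=1.0, bytes_per_cell=8, L=lambda m: 50 + 0.1 * m, o_send=lambda m: 5 + 0.05 * m, o_recv=lambda m: 5 + 0.05 * m)` searches halo widths up to 32 for made-up machine parameters.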
ISBN: 0818678763 (print)
Exact direction and distance vectors are essential for detecting hierarchical parallelism and examining the legality of loop transformations for a multiple-level loop nest. Much of this work has concentrated on array references. Little has been done to address the problem of finding precise dependences between scalar references, except to use extended SSA form with factored use-def links. In this paper, we present a technique for calculating precise direction and distance vectors for scalar references within nested loops without using any form of SSA. To do this, we use conventional use-def links in combination with joint dominator and joint postdominator relationships, which are extended from the dominator and postdominator relationships, respectively, of standard data flow analysis. The precision of the dependence information gathered by our algorithm cannot be achieved by traditional analysis of dominators or reaching definitions.
ISBN: 0818678763 (print)
This paper describes a framework for constructing a distributed multimedia system based on the server/client architecture. We focus on realizing the synchronized presentation of different media in a multimedia application, and a set of QoS (quality of service) parameters is given as a criterion for the trade-off between the overall performance of the system and the synchronized presentation in each multimedia application.
ISBN: 0818678763 (print)
In this paper, we consider the parallel solution of the generalized eigenproblem for Hermitian matrices on the Dawning-1000. The problem arises from the theoretical analysis of nonlinear optical crystal structures. We use Cholesky factorisation, Householder transformation, the bisection method, and inverse iteration to complete the computation. The implementation is based on the BLAS library and the communication function library provided on the Dawning-1000. The numerical results show very good performance, and the application in physics is satisfactory.
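The abstract does not say how the four named methods fit together. One standard arrangement, assumed here rather than taken from the paper, treats the problem $Ax = \lambda Bx$ with $A$ Hermitian and $B$ Hermitian positive definite as follows (the symbols $L$, $C$, $Q$, $T$, $y$, $z$ are introduced only for this sketch):

```latex
\begin{align*}
B &= L L^{H} && \text{(Cholesky factorisation)} \\
C &= L^{-1} A L^{-H}, \quad C y = \lambda y, \quad y = L^{H} x && \text{(reduction to a standard Hermitian problem)} \\
C &= Q T Q^{H} && \text{(Householder reduction, $T$ Hermitian tridiagonal)} \\
T z &= \lambda z && \text{(bisection for $\lambda$, inverse iteration for $z$)} \\
x &= L^{-H} Q z && \text{(back-transformation of the eigenvector)}
\end{align*}
```

The second line follows by substituting $B = LL^{H}$ into $Ax = \lambda Bx$ and multiplying from the left by $L^{-1}$, so the eigenvalues of $C$ are exactly those of the original pencil.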