Dynamic load balancing is crucial for the performance of many parallel algorithms. Random polling, a simple randomized load balancing algorithm, has proved very efficient in practice for applications such as parallel depth-first search. This paper presents a detailed analysis of the algorithm that takes into account many aspects of the underlying machine and of the application being load balanced. It derives tight scalability bounds that, for the first time, explain the superior performance of random polling analytically. In some cases, the algorithm even turns out to be optimal. Some of the proof techniques employed may also be useful for analyzing other parallel algorithms.
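To make the idea of random polling concrete, here is a minimal simulation sketch. It assumes a synchronous round model in which each idle processor polls one uniformly random other processor and, if that processor holds at least two work units, takes half of them; afterwards every busy processor completes one unit. The round model, the half-splitting rule, and the name `random_polling` are illustrative assumptions, not the machine model analyzed in the paper.

```python
import random


def random_polling(work, seed=0):
    """Simulate random polling on len(work) processors; return rounds until all work is done.

    Hypothetical model: synchronous rounds, unit-time work items, split-in-half donation.
    """
    rng = random.Random(seed)
    load = list(work)
    p = len(load)
    rounds = 0
    while any(load):
        rounds += 1
        # Polling phase: each idle processor asks one random *other* processor for work.
        for i in range(p):
            if load[i] == 0 and p > 1:
                j = rng.randrange(p - 1)
                if j >= i:
                    j += 1  # shift past i so the target is never the poller itself
                if load[j] >= 2:
                    moved = load[j] // 2  # donor keeps the larger half, so it stays busy
                    load[j] -= moved
                    load[i] += moved
        # Work phase: every processor that holds work completes one unit.
        for i in range(p):
            if load[i] > 0:
                load[i] -= 1
    return rounds
```

With all 64 units starting on one of 4 processors, the result must lie between 16 rounds (perfect balance) and 64 rounds (no balancing at all); random polling lands somewhere in between depending on the random choices.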
We give an efficient shortest path algorithm on a mesh-connected processor array for n×n banded matrices with bandwidth b. We use a [b/2]×[b/2] semisystolic processor array. The input data is supplied to the processor array from the host computer, and the output of the array can also be fed back into it through the host computer. The algorithm computes all-pairs shortest distances within the band in 7n-4[b/2]-1 steps.
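When checking a mesh implementation like this one, a sequential reference is handy. The sketch below is a plain Floyd-Warshall computation (not the paper's systolic algorithm) that reports only the pairs inside the band, assuming the band means |i - j| <= `half_width`; that parameter name and band convention are hypothetical. The helper `mesh_steps` simply evaluates the step count 7n-4[b/2]-1 quoted above, taking the value of [b/2] directly as `half_band` rather than guessing how it is rounded.

```python
INF = float("inf")


def band_shortest_distances(weight, half_width):
    """Sequential Floyd-Warshall reference.

    weight[i][j] is the edge length (INF if absent). Paths may leave the band,
    but only pairs with |i - j| <= half_width are returned.
    """
    n = len(weight)
    dist = [row[:] for row in weight]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return {(i, j): dist[i][j]
            for i in range(n) for j in range(n) if abs(i - j) <= half_width}


def mesh_steps(n, half_band):
    """Step count 7n - 4[b/2] - 1 from the abstract; half_band is the value of [b/2]."""
    return 7 * n - 4 * half_band - 1
```

For example, with edges 0-1 and 1-2 of length 1 and a direct 0-2 edge of length 5, the reference reports a distance of 2 for the pair (0, 2).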
In parallel programs where the problem data is generated dynamically, an efficient load balancing algorithm is very valuable. The token distribution problem (TDP) is a generalization of the static load balancing problem. This paper describes a novel algorithm for solving the TDP on networks with k-ary d-cube topology. Compared with other algorithms, our method is more general and does not require every processor to know the exact number of tokens held by each processor. The correctness of the algorithm is proved, and its complexity is studied informally.
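For readers unfamiliar with the topology, the sketch below shows the standard neighbor rule of a k-ary d-cube: nodes are labeled by d base-k digits, and each node is linked to the nodes that differ by ±1 (mod k) in exactly one digit. It only illustrates the network on which the TDP is solved; it is not the paper's distribution algorithm, and the function name is hypothetical.

```python
def kary_dcube_neighbors(node, k, d):
    """Return the sorted neighbor labels of `node` in a k-ary d-cube (wraparound torus)."""
    # Decode the node label into d base-k digits, least significant first.
    digits = []
    x = node
    for _ in range(d):
        digits.append(x % k)
        x //= k
    neighbors = set()
    for dim in range(d):
        for step in (1, -1):
            nd = list(digits)
            nd[dim] = (nd[dim] + step) % k  # move one hop along this dimension
            neighbors.add(sum(v * k**i for i, v in enumerate(nd)))
    neighbors.discard(node)  # only matters in the degenerate case k == 1
    return sorted(neighbors)
```

Using a set matters for k = 2: there +1 and -1 reach the same node, so the 2-ary d-cube correctly collapses to the ordinary d-dimensional hypercube with d neighbors per node.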
ISBN: 0818665076 (print)
In this paper we consider the simple polygon visibility problem: given a simple polygon P with N vertices and a point z in the interior of the polygon, find all boundary points of P that are visible from z. We present an O(log N log log N) time algorithm that solves the simple polygon visibility problem on a √N×√N RMESH. Previously, the best known algorithm for the problem on a √N×√N RMESH took O(log² N) time.
Parallel multi-layer classifier architectures with an increasing hierarchical order offer considerable design flexibility for dealing with a wide variety of properties. The pipeline processing model is especially appropriate for realising such architectures. This gives hierarchical classifiers a distinct advantage in real-time applications, which demand high operating speed, in addition to potentially better classification performance. This paper describes an example application of a cascaded form of the BWS and FWS networks, both of which are representatives of array-memory-based statistical classifiers. As with most pipelined architectures, the complex interactions between successive processing layers of the cascaded network are a major drawback: they impose performance bottlenecks that hinder a highly parallel realisation of the classifier. This paper describes an efficient data-parallel implementation of the BWS-FWS. For completeness, a brief review of multi-layer classifiers is presented first. The new algorithm for combining the BWS and FWS networks is then described and implemented on two distributed-memory processor arrays, the MasPar MP-1 and a network of transputers. An analysis of the performance obtained is also presented.
ISBN: 0818665076 (print)
In this paper, an O(n log n) time algorithm for finding all the maximal cliques of an interval graph is proposed. The algorithm can also be implemented in parallel in O(log n) time using O(n²) processors. The maximal cliques of an interval graph contain important structural information, and many problems on interval graphs can be solved once all the maximal cliques are known. It is shown that cut vertices, bridges, and vertex connectivity can all be determined easily once the maximal cliques are known. Finally, the all-pairs shortest path problem for interval graphs is solved using the relationships between maximal cliques. The all-pairs shortest path algorithm can also be parallelized to run in O(log n) time using O(n²) processors.
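A common way to obtain the maximal cliques of an interval graph in O(n log n) time is an endpoint sweep. After sorting, every position where a left endpoint is immediately followed by a right endpoint marks a maximal clique, namely the set of intervals open at that moment. The sketch below assumes closed intervals given as (lo, hi) pairs. It is a textbook sweep, not necessarily the paper's exact procedure, and because it lists each clique explicitly, the output size can exceed the O(n log n) sorting cost.

```python
def maximal_cliques(intervals):
    """Return the maximal cliques of the interval graph of closed intervals (lo, hi).

    Each clique is a sorted list of interval indices, reported in sweep order.
    """
    events = []
    for idx, (lo, hi) in enumerate(intervals):
        events.append((lo, 0, idx))  # kind 0 = start; sorts before ends at the same point
        events.append((hi, 1, idx))  # kind 1 = end
    events.sort()
    active = set()
    cliques = []
    last_was_start = False
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            last_was_start = True
        else:
            # A start followed directly by an end means the active set cannot grow further.
            if last_was_start:
                cliques.append(sorted(active))
            active.discard(idx)
            last_was_start = False
    return cliques
```

For instance, `maximal_cliques([(0, 3), (1, 4), (2, 5), (6, 7)])` yields `[[0, 1, 2], [3]]`. Placing starts before ends at equal coordinates makes touching intervals such as (1, 2) and (2, 3) count as adjacent, which is the closed-interval convention.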
ISBN: 0818665076 (print)
Dynamic load balancing schemes are essential for efficiently executing non-uniform problems on highly parallel multicomputer systems. Their objective is to minimize the total execution time of a single application. This paper proposes the adaptive receiver-initiated diffusion (ARID) strategy for distributed dynamic load balancing and describes its principle and control protocol. Its communication overhead and its effects on system stability and performance efficiency are analyzed. Finally, simulation experiments compare the adaptive strategy with other dynamic load balancing schemes.
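The core of receiver-initiated diffusion is that underloaded nodes, not overloaded ones, start the transfer. The sketch below shows one round of a generic, non-adaptive version. It assumes a fixed `low_water` threshold, processes requests one node at a time, and has each requesting node take half the load difference from its most-loaded neighbor. The threshold, the half-difference rule, and all names are hypothetical; the adaptive control protocol of ARID itself is not reproduced here.

```python
def receiver_initiated_round(load, neighbors, low_water):
    """One round of generic receiver-initiated diffusion (hypothetical, non-adaptive).

    load[u] is node u's work count; neighbors[u] lists u's neighbors.
    Requests are served sequentially against the current loads, so no load goes negative.
    """
    new = list(load)
    for u in range(len(new)):
        if new[u] < low_water:
            # The underloaded node asks its currently most-loaded neighbor (first one on ties).
            v = max(neighbors[u], key=lambda n: new[n])
            amount = (new[v] - new[u]) // 2  # move half the difference, rounded down
            if amount > 0:
                new[v] -= amount
                new[u] += amount
    return new
```

On a 4-node ring with all 8 units on node 0, a single round yields `[2, 2, 2, 2]`. Serving requests against the current loads, rather than a snapshot, keeps a donor polled by several receivers from handing out more work than it holds.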
How can a user write a program that is portable and efficient across widely different parallel architectures, such as SIMD, MIMD, shared-memory, distributed-memory, and workstation clusters? The following issues are considered: which language should be used; how suitable a single language is for different applications; how efficient a portable program can be; and how that efficiency will be achieved.
We present two new fault-tolerant routing algorithms for hypercubes. The first algorithm requires only local knowledge of the faults, whereas the second requires global knowledge. Unlike previous fault-tolerant routing algorithms, ours take the dynamic conditions of the network (link contention) into account, and we show that checking for dynamic conditions in fault-tolerant algorithms is essential. Performance evaluation by extensive simulation of our algorithms and of other fault-tolerant routing algorithms shows that ours outperform previous algorithms by as much as 50% in time and 500% in space. We also observed that global information about the location of faults gives no additional benefit, regardless of whether the dynamic conditions in the network are considered.
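As a point of reference for the global-knowledge setting, the sketch below finds a shortest path between two hypercube nodes using only fault-free links, via breadth-first search. Each hop flips exactly one address bit. It assumes the fault set is a list of node pairs, and, unlike the algorithms described above, it ignores link contention entirely. The function name and fault representation are hypothetical.

```python
from collections import deque


def route_avoiding_faults(src, dst, n, faulty_links):
    """Shortest fault-free path from src to dst in an n-dimensional hypercube, or None.

    faulty_links is an iterable of (u, v) node pairs; global fault knowledge is assumed.
    """
    blocked = {frozenset(link) for link in faulty_links}
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            # Walk the predecessor chain back to src to recover the path.
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for dim in range(n):
            v = u ^ (1 << dim)  # neighbor across dimension `dim`
            if v not in prev and frozenset((u, v)) not in blocked:
                prev[v] = u
                queue.append(v)
    return None  # dst is unreachable over fault-free links
```

With no faults, routing from 0 to 7 in a 3-cube gives `[0, 1, 3, 7]`. If the link (0, 1) is faulty, the search detours to `[0, 2, 3, 7]`, which still has minimal length.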
A lower bound on the finishing time of optimal schedules is used as an absolute performance measure for static scheduling heuristics. This paper presents an efficient method of computing such a bound, based on estimating overlaps among the execution ranges of tasks in a given task graph and analyzing the delays of tasks on the critical paths of the graph. The bound computed by this method is shown to be of higher quality than those computed by other known methods. Directions for future work on this topic are also indicated.
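For comparison, the simplest well-known lower bound on the makespan of any schedule is max(critical path length, total work / p). The sketch below computes that baseline. It assumes known task execution times, precedence edges given as (before, after) pairs, p identical processors, and no communication delays. It is not the overlap-based method described above, which is designed to produce better bounds. The function name is hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def simple_lower_bound(weights, edges, p):
    """Baseline makespan lower bound: max(critical path, total work / p).

    weights maps task -> execution time; edges are (before, after) precedence pairs.
    """
    preds = {t: set() for t in weights}
    for a, b in edges:
        preds[b].add(a)
    # Earliest finish time of each task with unlimited processors = longest path ending at it.
    finish = {}
    for t in TopologicalSorter(preds).static_order():
        finish[t] = weights[t] + max((finish[q] for q in preds[t]), default=0)
    critical_path = max(finish.values(), default=0)
    return max(critical_path, sum(weights.values()) / p)
```

For a diamond-shaped graph a→b, a→c, b→d, c→d with times 2, 3, 4, 1, the critical path a→c→d has length 7, which dominates 10/2 = 5 on two processors. On one processor, the total work of 10 becomes the binding term.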