A parallel algorithm to find k, 2 &le k &le nn-2, spanning trees from a connected, weighted and undirected graph G(V, E, W) in the order of increasing weight is presented. It runs in O(T(n) + k log n) time wit...
详细信息
A parallel algorithm to find k, 2 &le k &le nn-2, spanning trees from a connected, weighted and undirected graph G(V, E, W) in the order of increasing weight is presented. It runs in O(T(n) + k log n) time with O(n2/log n) processors on a CREW PRAM, where n = /V/, m = /E/ and T(n), O(log n) &le T(n) &le O(log2 n), is the time of the fastest parallelalgorithms to find a minimum spanning tree of G on a CREW PRAM with no more than O(n2/log n) processors. Since T(n) = O(log2 n) for the time being, this result shows that to find k minimum spanning trees can be done in the same time bound as to find just one when k &le O(log n) on a CREW PRAM.
Mesh is one of the most widely used interconnection networks for multiprocessor systems. In this paper, we propose an approach to partition a given mesh into m submeshes which can be allocated to m tasks with grid str...
详细信息
Mesh is one of the most widely used interconnection networks for multiprocessor systems. In this paper, we propose an approach to partition a given mesh into m submeshes which can be allocated to m tasks with grid structures. We adapt two-dimensional packing to solve the submesh allocation problem. Due to the intractability of the two-dimensional packing problem, finding an optimal solution is computationally infeasible. We develop an efficient heuristic packing algorithm called TP-heuristic. Allocating a submesh to each task is achieved using the results of packing. We propose two different methods called uniform scaling and non-uniform scaling. Experiments were carried out to test the accuracy of solutions provided by our allocation algorithm.
In this paper, a model is presented for representing and comparing workloads, based on the way they would exercise parallel machines. This workload characterization is derived from parallel instruction centroid and pa...
详细信息
In this paper, a model is presented for representing and comparing workloads, based on the way they would exercise parallel machines. This workload characterization is derived from parallel instruction centroid andparallel workload similarity. The centroid is a simple measure that aggregates average parallelism, instruction mix, and critical path length. When captured with abstracted information about communication requirements, the result is a powerful tool in understanding the requirements of workloads and their potential performance on target machines. The workload similarity is based on measuring the normalized Euclidean distance (ned) between workload centroids. It will be shown that this workload representation method outperforms comparable ones in accuracy, as well as in time and space requirements. Analysis of the NAS parallel Benchmarks and their performance is presented to demonstrate some of the applications, such as performance prediction with good accuracy, and insight provided by this model.
This paper discusses the use of run-time feedback for optimizing the execution of parallel computations. Four levels of feedback are distinguished, and the applicability and limitations of each are discussed. A two-pa...
详细信息
This paper discusses the use of run-time feedback for optimizing the execution of parallel computations. Four levels of feedback are distinguished, and the applicability and limitations of each are discussed. A two-part scheduling paradigm known as SEDIA (Static Exploration/Dynamic Instantiation and Activation) that addresses these limitations to perform robust scheduling in the presence of variant run-time behavior is introduced. A key component of this scheduling paradigm is an abstract model of run-time information fidelity, which has evolved from our previous work in the area of Trace Recovery, employing control-theoretic concepts.
In this paper a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve LANs' reliability. Its function modes, technical characters, hardware and software architecture...
详细信息
ISBN:
(纸本)0818678704
In this paper a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve LANs' reliability. Its function modes, technical characters, hardware and software architectures and some key, implementation techniques, such as logical addresses andparallel mechanisms, are described in details. Our prototype system and analyzing results suggest that the scheme presented in the paper not only provides an effective approach to high reliable LANs, but also can improve their performance greatly.
In a parallel computer system with faulty processors, it is highly desirable to reconfigure the system by eliminating the faulty ones and thereby restore the system to some operational state. In the reconfiguration fi...
详细信息
In a parallel computer system with faulty processors, it is highly desirable to reconfigure the system by eliminating the faulty ones and thereby restore the system to some operational state. In the reconfiguration finding the maximum size fault-free subsystem is the main problem. In this paper, we propose an efficient scheme for identifying maximum size fault-free submeshes in a faulty two-dimensional(2D) mesh system. For this, the relations between two submeshes in a 2D mesh have been defined. Then we take two phase approach. In the first phase, an efficient algorithm for determining maximal faulty submeshes in a faulty mesh has been introduced. In the second phase, we have introduced a procedure to identify the maximal fault-free submeshes by splitting all faulty submeshes from a whole mesh. The time complexity of the proposed scheme is O(Nf2) where Nf is the number of faulty processors in a 2D mesh. The proposed scheme can be utilized to the task allocation in 2D meshes in the presence of failed nodes.
The Hierarchical PRAM (H-PRAM) (T. Heywood, S. Ranka, 1992) model is a dynamically partitionable PRAM, which charges for communication and synchronization, and allows parallelalgorithms to abstractly represent genera...
详细信息
The main features of an integrated system to support the technology of application problem parallelization, development (assembly) of parallel programs, and tuning to available resources of specific multiprocessor sys...
详细信息
作者:
Bode, ArndtInstitut für Informatik
Lehrstuhl für Rechnertechnik und Rechnerorganisation Technische Universitat Munchen MunchenD-80290 Germany
This article covers research at Technische Universität München on distributed andparallelarchitectures and applications. First, an overview on the parallel processing research organization is given. The se...
详细信息
This paper presents a new methodology to automatically synthesize asynchronous circuits from descriptions based on process algebra. Traditionally, syntax-directed techniques have been used to generate a netlist of bas...
详细信息
暂无评论