The Minimum Spanning Tree (MST) problem with an added constraint that no node in the spanning tree has the degree more than a specified integer d, is known as the Degree-Constrained MST (d-MST) problem. Since computin...
详细信息
The Minimum Spanning Tree (MST) problem with an added constraint that no node in the spanning tree has the degree more than a specified integer d, is known as the Degree-Constrained MST (d-MST) problem. Since computing the d-MST is NP-hard for every d in the range 2 &le d &le (n - 2) where n denotes the total number of nodes, several approximate algorithms have been proposed in the literature. We have previously proposed two approximate algorithms, TC-RNN and IR, for the d-MST problem. Our experimental results show that while the IR algorithm is faster, the TC-RNN algorithm consistently produces spanning trees with a smaller weight. In this paper, we propose a new algorithm, TC-NNC, which is an improved version of TC-RNN. Our experiments using randomly generated, weighted graphs as input demonstrate that the execution time of TC-NNC is smaller than that of TC-RNN, and is very close to that of IR. Further, the quality-of-solution of TC-NNC is better than that of IR and is very close to that of TC-RNN.
We present three deterministic parallel selection algorithms with analysis on clusters. The first and second algorithms are proposed on the CGM (Coarse Grained Multicomputer) model. The first algorithm achieves optima...
详细信息
We present three deterministic parallel selection algorithms with analysis on clusters. The first and second algorithms are proposed on the CGM (Coarse Grained Multicomputer) model. The first algorithm achieves optimality with respect to its computation time and runs in O(n/p) computation time and O(min(log p, log log n)) communication rounds if n/p/spl ges/p/sup /spl epsiv// for any /spl epsiv/>0, where n is the number of input elements and p is the number of processors. The second algorithm achieves optimality with respect to the number of communication rounds and runs in O(n/p log p) computation time and the constant number of communication rounds if n/p>p/sup /spl epsiv// for any /spl epsiv/>0. The third selection algorithm is a hybrid algorithm of the above two algorithms and suitable for practical cluster computing. The algorithms are implemented using PVM, and its experimental results are also presented.
A novel network structure called generalized Hierarchical Completely-Connected networks (HCC) is proposed, and its properties and features are evaluated. A set of the HCCs constructed by the proposed method includes s...
详细信息
A novel network structure called generalized Hierarchical Completely-Connected networks (HCC) is proposed, and its properties and features are evaluated. A set of the HCCs constructed by the proposed method includes some conventional hierarchical networks (then it is called generalized). The construction of an HCC is started from a basic block (a level-1 block) which consists of n nodes with a constant degree. Then a level-h (h/spl ges/2) block is constructed recursively by interconnecting any pair of macro nodes (n level-(h-1) blocks) completely. An HCC has the constant node degree regardless of increasing its size (the number of nodes). Furthermore, since an HCC has a hierarchically structured character and the feature of uniformity, a wide variety of inter-cluster connections are possible.
Most parallelprogramming models for distributed-memory architectures are based on individual threads interacting via send and receive operations. We show that a more structured model, BSP, gains substantial performan...
详细信息
Most parallelprogramming models for distributed-memory architectures are based on individual threads interacting via send and receive operations. We show that a more structured model, BSP, gains substantial performance improvements by exploiting the extra information implicit in its structure. In particular, each thread learns something about global state whenever it receives a message. This information can be used to modify its own behavior to improve collective use of the communication system. The programming model's semantics also provides implicit knowledge that can be exploited to increase performance. We show that these effects are useful at the application level by comparing the performance of BSP and MPI implementations of the NAS parallel benchmarks.
To minimize the execution time of an iterative application in a heterogeneous parallel computing environment, an appropriate mapping scheme is needed for matching and scheduling the subtasks of the application onto th...
详细信息
To minimize the execution time of an iterative application in a heterogeneous parallel computing environment, an appropriate mapping scheme is needed for matching and scheduling the subtasks of the application onto the processors. When some of the characteristics of the application subtasks are unknown a priori and will change from iteration to iteration during execution-time, a semi-static methodology can be employed, that starts with an initial mapping but dynamically decides whether to perform a remapping between iterations of the application, by observing the effects of these dynamic parameters on the application's execution time. The objective of this study is to implement and evaluate such a semi-static methodology. For analyzing the effectiveness of the proposed scheme, it is compared with two extreme approaches: a completely dynamic approach using a fast mapping heuristic and an ideal approach that uses a genetic algorithm on-line but ignores the time for remapping. Experimental results indicate that the semi-static approach outperforms the dynamic approach and is reasonably close to the ideal but infeasible approach.
For an effective Internet based distributed parallel computing platform, Java-Internet Computing Environment (JICE) is designed and implemented with multithreading and remote method invocation mechanisms provided in J...
详细信息
For an effective Internet based distributed parallel computing platform, Java-Internet Computing Environment (JICE) is designed and implemented with multithreading and remote method invocation mechanisms provided in Java. Specifically, JICE supports a shared memory system model for communication between any two nodes. Under the JICE, communication time is a major candidate of performance bottleneck. To reduce this communication overhead, a method of grouping is designed based on the optimal communication time. Communication performance given by grouping is evaluated through the analysis of execution time and verified via experiments. The results show that communication time can be reduced about 80% from executing some Java benchmarks on JICE.
This paper presents a randomized parallel algorithm for the Maximal Independent Set problem. Our algorithm uses a BSP-like computer with p processors and requires that n+m/p=/spl Omega/(p) for a graph with 72 vertices...
详细信息
This paper presents a randomized parallel algorithm for the Maximal Independent Set problem. Our algorithm uses a BSP-like computer with p processors and requires that n+m/p=/spl Omega/(p) for a graph with 72 vertices and m edges. Under this scalability assumption, and after a preprocessing phase, it computes a maximal independent set after O(log p) communication rounds, with high probability, each round requiring linear computation time O(n+p/p). The preprocessing phase is deterministic and important in order to ensure that degree computations can be implemented efficiently. For this, we give an optimal parallel BSP/CGM algorithm to the p-quantiles search problem, which runs in O(m log p/p) time and a constant number of communication rounds, and could be of interest in its own right, as shown in the text.
There are well defined methods for supporting regular problems with scalable performance, typified by the HPF language and the BSP model. Less well understood is the solution of more irregular problems, supporting com...
详细信息
There are well defined methods for supporting regular problems with scalable performance, typified by the HPF language and the BSP model. Less well understood is the solution of more irregular problems, supporting complex shared data structures and task dependencies, and typically requiring dynamic load balancing to sustain high performance. It is demonstrated how the use of Shared Abstract Data Types (SADTs), together with an extended BSP cost model, can support irregular problems in a structured manner An SADT is an extension of a serial ADT which allows the concurrent invocation of its methods. A number of SADTs are used to implement a solution of the travelling salesman problem on the Cray T3D machine, and a description of the restructuring of a parallel CFD code using SADTs is provided, with initial results given for the Cray T3E.
In this paper, we report a performance gap between a schedule with good makespan on the task scheduling model and the corresponding parallel program on distributed memory parallel machines. The main reason is the soft...
详细信息
In this paper, we report a performance gap between a schedule with good makespan on the task scheduling model and the corresponding parallel program on distributed memory parallel machines. The main reason is the software overhead in the interprocessor communication. Therefore, speedup ratios of schedules on the model do not well approximate to those of parallel programs on the machines. The purpose of the paper is to get a task scheduling algorithm that generates schedules with good approximation to the corresponding parallel program. For this purpose, we propose an algorithm BCSH that generates only bulk synchronous style schedules. In those schedules, computation phases and communication phases appear alternately. All interprocessor communications are done only in the latter phases, and thus the corresponding parallel programs can make better use of the message packaging technique easily. It reduces some software overheads of messages from a source processor to the same destination processor to almost one software overhead, and improves the performance of a parallel program significantly. Finally we show some results of performance comparisons between BCSH and Kruatrachue's algorithm DSH. The schedules by the latter are famous for their good makespans. However the approximations of ours are much better than those of Kruatrachue's.
The Minimum Spanning Tree (MST) problem with an added constraint that no node in the spanning tree has the degree more than a specified integer d, is known as the Degree-Constrained MST (d-MST) problem. Since computin...
详细信息
The Minimum Spanning Tree (MST) problem with an added constraint that no node in the spanning tree has the degree more than a specified integer d, is known as the Degree-Constrained MST (d-MST) problem. Since computing the d-MST is NP-hard for every d in the range 2/spl les/d/spl les/(n-2) where n denotes the total number of nodes, several approximate algorithms have been proposed in the literature. We have previously proposed two approximate algorithms, TC-RNN and IR, for the d-MST problem (L.J. Mao et al., 1997). Our experimental results show that while the IR algorithm is faster, the TC-RNN algorithm consistently produces spanning trees with a smaller weight. We propose a new algorithm, TC-NNC, which is an improved version of TC-RNN. Our experiments using randomly generated, weighted graphs as input, demonstrate that the execution time of TC-NNC is smaller than that of TC-RNN, and is very close to that of IR. Further, the quality-of-solution of TC-NNC is better than that of IR and is very close to that of TC-RNN.
暂无评论