The ParaStation communication fabric provides a high-speed communication network with user-level access to enable efficient parallel computing on workstation clusters. The architecture, implemented on off-the-shelf workstations coupled by the ParaStation communication hardware, removes the kernel and common network protocols from the communication path while still providing full protection in a multiuser, multiprogramming environment. The programming interface presented by ParaStation consists of a UNIX socket emulation and widely used parallel programming environments such as PVM, P4, and MPI. This allows a wide range of client/server and parallel applications to be ported to the ParaStation architecture. Implementations of ParaStation on various platforms, such as Digital's AlphaGeneration workstations and Linux PCs, achieve end-to-end (process-to-process) latencies as low as 2 µs and a sustained bandwidth of up to 15 Mbyte/s per channel with small packets. Benchmarks using PVM on ParaStation demonstrate real application performance of 1 GFLOP on an 8-node cluster. (C) 1998 Elsevier Science Inc. All rights reserved.
This paper considers the problem of optimal configuration of the monitoring units in a hierarchical distributed monitoring system. A hierarchical distributed monitoring system consists of a hierarchy of monitoring units which are grouped and distributed onto the physical network. The architecture lends itself to parallel processing, reducing the complexity of distributed monitoring caused by factors such as the collection and processing of large quantities of monitoring data. Furthermore, the topology-specific partitioning of the monitoring units allows complex, topology-specific events to be monitored and evaluated in a natural and efficient way. The optimal configuration problem is concerned with finding a hierarchical partition of the monitoring units such that the total processing cost is minimized. It is an NP-complete problem. In this paper, we study heuristics for obtaining a near-optimal grouping of monitoring units. Simulations of heuristic algorithms for mesh and hypercube networks are presented. The results suggest further heuristics specific to the system topology. Although the paper is targeted at distributed monitoring, we believe that the results can also be applied to other hierarchical control problems in distributed computing. (C) 1998 Elsevier Science Inc. All rights reserved.
A new class of interconnection networks, the hypernetworks, has been proposed recently. Hypernetworks are characterized by hypergraphs. Compared with point-to-point networks, they allow for increased resource sharing and communication bandwidth utilization, and they are especially suitable for optical interconnects. One way to derive a hypernetwork is by finding the dual of a point-to-point network. The hypercube Qn, where n is the dimension, is a very popular point-to-point network. In this article, we consider using the dual Q*n of the hypercube Qn as an interconnection network. We investigate the properties of Q*n and present a set of fundamental data communication algorithms for Q*n. Our results indicate that the hypernetwork Q*n is a useful and promising interconnection structure for high-performance parallel and distributed computing systems.
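The dual construction mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual hypergraph dual: every link of Qn becomes a node of Q*n, and every node of Qn becomes a hyperlink joining the n links incident to it. The function names are hypothetical and are not taken from the article.

```python
def hypercube_links(n):
    """Links of Qn as pairs (u, v) with u < v; nodes are the integers 0 .. 2**n - 1."""
    links = []
    for u in range(2 ** n):
        for bit in range(n):
            v = u ^ (1 << bit)  # flip one bit to reach a neighbor
            if u < v:
                links.append((u, v))
    return links


def dual_hypercube(n):
    """Assumed dual of Qn: returns (nodes, hyperlinks).

    nodes      -- the links of Qn, each one a node of the dual
    hyperlinks -- dict mapping each Qn node to the dual nodes it groups together
    """
    nodes = hypercube_links(n)
    hyperlinks = {u: [] for u in range(2 ** n)}
    for link in nodes:
        for endpoint in link:
            hyperlinks[endpoint].append(link)
    return nodes, hyperlinks


if __name__ == "__main__":
    nodes, hyperlinks = dual_hypercube(3)
    print(len(nodes), "nodes,", len(hyperlinks), "hyperlinks")
```

Under this assumption each dual node belongs to exactly two hyperlinks, and each hyperlink connects n nodes, which is what makes the structure a candidate for shared (for example optical) media.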
Dynamic task assignment and migration are key techniques for load balancing, which plays an important role in achieving high performance in distributed computing systems. In this paper, we describe the design and implementation of an online thread scheduling and migration system (S&M) based on previous work on LWP-MPI. Experimental results show that performance is enhanced.
The problem of embedding link-disjoint Hamiltonian cycles into torus networks is addressed. The maximum number of link-disjoint cycles is limited to half the degree of the node in a regular network. Simple methods are presented to embed the maximum number of link-disjoint Hamiltonian cycles in an r-dimensional torus network. An algorithm for finding a Hamiltonian cycle in an r-dimensional torus in the presence of a set of faulty links is also given. Copyright (C) 1997 Elsevier Science Ltd
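For the two-dimensional case, the bound stated in this abstract gives at most two link-disjoint Hamiltonian cycles (node degree 4). The sketch below builds such a pair on a k x k torus with k >= 3 using a Gray-code-style ordering. It is one known way to do this and is not claimed to be the method of the paper; all function names are hypothetical.

```python
def torus_cycles(k):
    """Two closed node sequences on the k x k torus (assumes k >= 3).

    Step i is split as i = a*k + b. The first cycle visits (a, (b - a) mod k);
    the second visits the same pair with its coordinates swapped.
    """
    first, second = [], []
    for i in range(k * k):
        a, b = divmod(i, k)
        c = (b - a) % k
        first.append((a, c))
        second.append((c, a))
    return first, second


def cycle_links(cycle):
    """Set of undirected links used by a closed node sequence."""
    n = len(cycle)
    return {frozenset((cycle[i], cycle[(i + 1) % n])) for i in range(n)}


def is_hamiltonian(cycle, k):
    """True if the sequence visits every torus node once and only uses torus links."""
    n = k * k
    if len(cycle) != n or len(set(cycle)) != n:
        return False
    for i in range(n):
        (x1, y1), (x2, y2) = cycle[i], cycle[(i + 1) % n]
        dx, dy = (x2 - x1) % k, (y2 - y1) % k
        one_step_x = dx in (1, k - 1) and dy == 0
        one_step_y = dx == 0 and dy in (1, k - 1)
        if not (one_step_x or one_step_y):
            return False
    return True


if __name__ == "__main__":
    c1, c2 = torus_cycles(4)
    print(is_hamiltonian(c1, 4), is_hamiltonian(c2, 4))
    print("shared links:", len(cycle_links(c1) & cycle_links(c2)))
```

The checker functions are independent of the construction, so they can also be used to validate cycles produced by other embedding methods.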
The generalised computational model of term graph rewriting systems (TGRSs) has been used extensively as an implementation vehicle for a number of, often divergent, programming paradigms ranging from the traditional functional programming ones to the (concurrent) logic programming ones and various amalgamations of them, to (concurrent) object-oriented ones. More recently, the relationship between TGRSs and process calculi (such as the pi-calculus) as well as linear logic has also been explored. In this paper we describe our experience in using the intermediate compiler target language Dactl based on TGRSs for mapping a variety of programming paradigms of the aforementioned types onto it. In particular, we concentrate on some of the issues that we feel have played an important role in our work (in, say, affecting performance, etc.), the aim being to derive a list of features that we feel every language model which intends to be used as an intermediate representation between (concurrent) high-level languages and (parallel) computer architectures must have. (C) 1997 Elsevier Science B.V.
This paper presents the design philosophy and implementation of the BALANCE system. BALANCE is a flexible, network-independent and computer-architecture-independent load balancing system which allows the building of reusable parallel and distributed applications. By implementing related services as generic servers with their connection endpoints registered in BALANCE, the clients can easily access the servers by server system calls. To demonstrate the flexibility of BALANCE, several widely different applications have been implemented and evaluated, including system servers, parallel and distributed applications and a scheduling testbed. The use of generic servers to improve system modularity and code reuse is also discussed. (C) 1997 by John Wiley & Sons, Ltd.
Fault-tolerant algorithms are presented for broadcasting on the star graph. In our algorithm, fault tolerance is achieved by constructing an isomorphism of the star network, such that the faulty nodes minimally disrupt the message passing sequence. It is shown that, in the presence of r (1 ≤ r ≤ k-2) faults, at most r extra steps are required by our algorithm to perform a one-to-all broadcasting in the k-star network. Our algorithm has the same time complexity as an optimal broadcasting algorithm, and, since it takes advantage of the hierarchical nature of the star graph network, it can be implemented easily. Our algorithm can also be used to perform all-to-all broadcasting in a faulty star graph.
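As background for this abstract, the sketch below builds the k-star topology and floods a message from one node while skipping a given set of faulty nodes. It assumes the standard definition (nodes are permutations of k symbols, and a link swaps the first symbol with the symbol in another position) and an all-port model in which a node informs all neighbors in one step. It is a plain breadth-first flood, not the isomorphism-based algorithm of the paper; the function names are hypothetical.

```python
from collections import deque
from itertools import permutations


def star_neighbors(node):
    """Neighbors of a permutation in the star graph: swap position 0 with position i."""
    result = []
    for i in range(1, len(node)):
        swapped = list(node)
        swapped[0], swapped[i] = swapped[i], swapped[0]
        result.append(tuple(swapped))
    return result


def flood_broadcast(k, source, faulty=frozenset()):
    """Breadth-first flood from source; returns {node: step at which it is informed}.

    Faulty nodes neither receive nor forward the message (an assumed fault model).
    """
    informed = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in star_neighbors(node):
            if nxt not in informed and nxt not in faulty:
                informed[nxt] = informed[node] + 1
                queue.append(nxt)
    return informed


if __name__ == "__main__":
    k = 4
    source = tuple(range(1, k + 1))
    steps = flood_broadcast(k, source)
    print(len(steps), "nodes informed in", max(steps.values()), "steps")
```

Passing a non-empty `faulty` set shows how many extra steps a naive flood needs, which gives a baseline against which a fault-tolerant scheme can be compared.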
Given a Cartesian product G = G_1 × ... × G_m (m ≥ 2) of nontrivial connected graphs G_i and the base-d, dimension-D de Bruijn graph B(d, D), it is investigated under which conditions G is (or is not) a subgraph of B(d, D). We present a complete solution of this problem for the case D ≥ 4. For D = 3, we give partial results, including a complete solution for the case that G is a torus, i.e., G is the Cartesian product of cycles.
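To make the host graph concrete, here is a minimal sketch of B(d, D). It assumes the undirected version without loops, in which the nodes are words of length D over {0, ..., d-1} and two distinct words are adjacent when one is a left or right shift of the other with a new symbol appended. The function names are hypothetical.

```python
from itertools import product


def de_bruijn_neighbors(word, d):
    """Neighbors of a word: shift left or right and fill in any symbol, excluding loops."""
    left = {word[1:] + (s,) for s in range(d)}
    right = {(s,) + word[:-1] for s in range(d)}
    return (left | right) - {word}


def de_bruijn_graph(d, D):
    """Adjacency sets of the undirected, loop-free de Bruijn graph B(d, D)."""
    return {w: de_bruijn_neighbors(w, d) for w in product(range(d), repeat=D)}


if __name__ == "__main__":
    graph = de_bruijn_graph(2, 3)
    for word in sorted(graph):
        print(word, sorted(graph[word]))
```

Every node has degree at most 2d in this graph, so a product G with a node of higher degree cannot be a subgraph; this is only a necessary condition, and the full characterization is the subject of the paper.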
In this paper we present an approach to parallelizing a program that computes axisymmetric forging processes. The parallel algorithm we have applied is based on the non-overlapping domain decomposition method. A mesh of elements is divided into layers assigned to different processes. The parallel program was written in C using PVM and was implemented on a Convex Exemplar SPP1000 and on networked IBM RS/6000-320 workstations. We have investigated how the performance of the parallel program depends on the number of processes and on the number of nodes in the mesh.
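The layer assignment described in this abstract can be sketched as follows. The example assumes the mesh is organized in rows of elements and that each process receives one contiguous, non-overlapping block of rows of nearly equal size; the paper's program is written in C with PVM, so this Python function and its name are purely illustrative.

```python
def split_into_layers(num_rows, num_procs):
    """Assign contiguous row ranges to processes, sizes differing by at most one."""
    base, extra = divmod(num_rows, num_procs)
    layers, start = [], 0
    for proc in range(num_procs):
        size = base + (1 if proc < extra else 0)  # first `extra` layers get one more row
        layers.append(range(start, start + size))
        start += size
    return layers


if __name__ == "__main__":
    for proc, rows in enumerate(split_into_layers(10, 3)):
        print("process", proc, "rows", list(rows))
```

With such a split, only the rows on the boundary between two neighboring layers need to be exchanged between processes, which keeps communication proportional to the layer interface rather than to the mesh size.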