This paper consists of two parts. In the first, two new algorithms for wormhole routing on the hypercube network are presented. These techniques are adaptive and are guaranteed to be deadlock- and livelock-free, using only a small number of resources in the routing node. The first algorithm is adaptive and nonminimal, and will be referred to as Nonminimal. In this technique, some moderate derouting is allowed in order to alleviate the potential congestion arising from highly structured communication patterns. The second algorithm, dubbed Subcubes, is adaptive and minimal, and is based on partitioning the hypercube into subcubes of smaller dimension. This technique requires only two virtual channels per physical link of the node. In the second part of the paper, a wide variety of techniques for wormhole routing in the hypercube are evaluated from an algorithmic point of view. Five partially adaptive algorithms are considered: the Hanging algorithm, the Zenith algorithm, the Hanging-order algorithm, the Nonminimal algorithm, and the Subcubes algorithm. One oblivious algorithm, the Dimension-Order, or E-Cube, routing algorithm, is also used. Finally, a Fully Adaptive Minimal algorithm is tried. A simple node model was designed and adapted to all the algorithms. For those algorithms that require fewer virtual channels per physical link, the extra logical channels are used as extra lanes. As a result, the storage and routing capabilities of the algorithms are equalized. For the empirical performance evaluation, several dynamic injection loads are used on a hypercube of 2^10 nodes.
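For orientation, the oblivious Dimension-Order (E-Cube) baseline named in this abstract can be sketched in a few lines. This is only an illustrative sketch, not the paper's node model: the function name `ecube_route` is hypothetical, and correcting the differing address bits from the lowest dimension upward is an assumed convention.

```python
def ecube_route(src: int, dst: int, dims: int) -> list[int]:
    """Return the nodes visited by dimension-order routing on a hypercube.

    Nodes are labelled with `dims`-bit integers; two nodes are neighbours
    when their labels differ in exactly one bit.
    """
    path = [src]
    node = src
    # Assumed convention: resolve differing bits from dimension 0 upward.
    for d in range(dims):
        if (node ^ dst) & (1 << d):
            node ^= 1 << d          # traverse the link in dimension d
            path.append(node)
    return path


if __name__ == "__main__":
    # Route from node 000 to node 101 in a 3-dimensional hypercube.
    print(ecube_route(0b000, 0b101, 3))
```

The path is minimal by construction: its hop count equals the number of bits in which source and destination differ, and every message between the same pair of nodes follows the same path, which is what makes the scheme oblivious.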
Molecular dynamics, which is a powerful method for studying both structural and dynamic properties of condensed matter, is employed as a "microscope for the motion of atoms". Information about the motion of individual atoms that is difficult to obtain experimentally can be obtained using this method. Parallel computers with teraflop speed and terabyte memory will make it possible to perform simulations of hundreds of millions of particles using empirical potentials. Moreover, the macroscopic phenomena analyzed by continuum theory can be obtained by coarse-graining the atomic-level information provided by this approach. When considering the problem of data analysis, examples in semiconductor physics such as the implantation process are provided. Copyright (C) 1998 Elsevier Science B.V.
A parallel numerical model was established for solving the Navier-Stokes equations by using the Sequential Regularization Method (SRM). The computational domain is decomposed into P sub-domains in which the difference formulae were obtained from the governing equations. The data were exchanged at the virtual boundary of sub-domains in parallel computation. The close-channel cavity flow was solved by the implicit method. The driven square cavity flow was solved by the explicit method. The results compared well with those given by Ghia.
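The decomposition and boundary exchange described in this abstract can be illustrated with a small serial sketch. It is an assumption-laden toy, not the paper's scheme: it uses a 1-D grid, one ghost ("virtual boundary") cell per side, assumes at least as many grid points as sub-domains, and the names `decompose` and `exchange_ghosts` are hypothetical.

```python
def decompose(values, p):
    """Split `values` into p contiguous chunks, each padded with one
    ghost cell (initially None) on either side. Assumes len(values) >= p."""
    base, extra = divmod(len(values), p)
    chunks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        chunks.append([None] + list(values[start:start + size]) + [None])
        start += size
    return chunks


def exchange_ghosts(chunks):
    """Copy each neighbour's outermost interior value into the local ghost cell.
    Only interior cells are read and only ghost cells are written."""
    for rank, chunk in enumerate(chunks):
        if rank > 0:
            chunk[0] = chunks[rank - 1][-2]   # left neighbour's last interior cell
        if rank < len(chunks) - 1:
            chunk[-1] = chunks[rank + 1][1]   # right neighbour's first interior cell


if __name__ == "__main__":
    parts = decompose(list(range(7)), 3)
    exchange_ghosts(parts)
    print(parts)
```

In a real parallel run each chunk would live on its own processor and the two copies would become messages between neighbouring ranks; the outermost ghost cells stay empty here because physical boundary conditions are outside the scope of the sketch.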
The partition method of Wang for tridiagonal equations is generalized to the arbitrary band case. A stability criterion is given. The algorithm is compared to Gaussian elimination and cyclic reduction.
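As a point of reference for the Gaussian elimination baseline mentioned in this abstract, here is a minimal sketch of sequential elimination for the tridiagonal case (commonly called the Thomas algorithm). It is not Wang's partition method or its band generalization; the function name and the array layout (`lower[0]` and `upper[-1]` unused) are assumptions, and no pivoting is performed, so it is only safe for well-conditioned systems such as diagonally dominant ones.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    All four sequences have length n. lower[i] multiplies x[i-1] in row i
    (lower[0] is ignored); upper[i] multiplies x[i+1] (upper[n-1] is ignored).
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Rows: [2 1 0], [1 2 1], [0 1 2]; right-hand side chosen so x = [1, 2, 3].
    print(solve_tridiagonal([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8]))
```

The forward sweep is inherently sequential, since each row depends on the previous one; that dependency is what partition-style methods are designed to break up.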
There are several benchmark programs available to measure the performance of MPI on parallel computers. The most commonly used MPI benchmark packages are SKaMPI, Pallas MPI Benchmark, MPBench, Mpptest and MPIBench. It is interesting to analyze the differences between these benchmarks, yet few comparisons between them have been done so far. Thus, in this paper we compare the techniques used and the functionality of each benchmark, and also compare their results for point-to-point communication on a distributed memory machine and a shared memory machine. All of the MPI benchmarks listed above are compared in this analysis. The results from different benchmarks would be expected to be similar; however, this analysis found substantial differences in the results for certain MPI communications, particularly on shared memory machines.
This paper consists of two parts. In the first part, two new algorithms for deadlock- and livelock-free wormhole routing in the torus network are presented. The first algorithm, called *-Channels, is for the n-dimensional torus network. This technique is fully-adaptive minimal, that is, all paths with a minimal number of hops from source to destination are available for routing, and needs only five virtual channels per bidirectional link, the lowest channel requirement known in the literature for fully-adaptive minimal wormhole routing. In addition, this result also yields the lowest buffer requirement known in the literature for packet-switched fully-adaptive minimal routing. The second algorithm, called 4-Classes, is for the bidimensional torus network. This technique is fully-adaptive minimal and requires only eight virtual channels per bidirectional link. Also, it allows for a highly parallel implementation of its associated routing node. In the second part of this paper, four wormhole routing techniques for the two-dimensional torus are experimentally evaluated using a dynamic message injection model and different traffic patterns and message lengths.
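To make "a minimal number of hops" concrete for a torus, the sketch below computes the signed per-dimension offsets of a shortest path. It says nothing about virtual channels or deadlock avoidance and is not the *-Channels or 4-Classes algorithm; the function names and the assumption of the same radix `k` in every dimension are illustrative only.

```python
def minimal_offsets(src, dst, k):
    """Signed hops per dimension along a shortest path on a torus with
    k nodes per dimension. Positive means the increasing direction."""
    offsets = []
    for s, t in zip(src, dst):
        forward = (t - s) % k
        # Take the wraparound link when going backward is strictly shorter.
        # When forward == k - forward both directions are minimal; the tie
        # is resolved here toward the increasing direction.
        offsets.append(forward if forward <= k - forward else forward - k)
    return offsets


def minimal_hops(src, dst, k):
    """Length of a shortest path between src and dst."""
    return sum(abs(o) for o in minimal_offsets(src, dst, k))


if __name__ == "__main__":
    # On an 8 x 8 torus, (0, 0) -> (7, 1) is two hops thanks to wraparound.
    print(minimal_offsets((0, 0), (7, 1), 8), minimal_hops((0, 0), (7, 1), 8))
```

A fully-adaptive minimal router may interleave the hops of different dimensions in any order, so every ordering of these offsets corresponds to one of the minimal paths the abstract refers to.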
Realized cellular automata may be operated by universal computer systems as programmable special-purpose processors for parallelizable problems. Because of their architecture (local neighbourhood, small storage size per cell), they are well suited for processing systolic algorithms. A cellular programming language, named CEPROL, is presented which offers means for programming and controlling cellular automata processing such algorithms.
The concept of the two-dimensional (2-D) parallel computer with square module arrays was first introduced by Unger. It is the purpose of this paper to discuss the relative merits of square and hexagonal module arrays, to propose an operational symbolism for the various basic hexagonal modular transformations which may be performed by these computers, to illustrate some logical circuit implementations, and to describe a few elementary applications.
A novel algorithm is presented which models protein-protein interactions using surface complementarity. The method is applied to antibody-antigen docking. A steric scoring scheme, based upon a soft potential, is used to assess complementarity, and a simple electrostatic model is then used to remove infeasible interactions. The soft potential allows for structural changes that occur during docking. Biochemical knowledge is necessary to reduce the number of docking orientations produced by the method to a manageable size. The information used includes the known epitope residues and a single loose distance constraint. The method is applied to all three crystallographically determined antibody-lysozyme complexes, HyHEL-10, D1.3 and HyHEL-5. For the first time, a predicted antibody structure (that of D1.3) is used as a docking target. In the four systems modelled, the method identifies between 15 and 40 possible docking orientations. The root-mean-square (r.m.s.) deviation between these orientations and the relevant crystallographic complex is measured in the interface region. For all four complexes an orientation is found with an r.m.s. deviation between 1.9 Å and 4.8 Å. The algorithm is implemented on a single instruction/multiple data stream (SIMD) architecture computer. The use of a parallel architecture computer ensures detailed coverage of the search space, whilst still maintaining a search time of two days.
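The r.m.s. deviation used as the quality measure in this abstract can be written down directly. The sketch below assumes the two coordinate sets are already superimposed and list the same atoms in the same order; how the paper selects interface atoms is not stated here, and the function name `rms_deviation` is hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) coordinates, in the same unit as the input (e.g. angstroms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    reference = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
    print(rms_deviation(predicted, reference))
```

In this toy input every atom is displaced by the vector (3, 4, 0), so each atom contributes a squared distance of 25 and the deviation equals the displacement length.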
The Benes binary network can realize any one-to-one mapping of its 2^n inlets onto its 2^n outlets. Several authors have proposed algorithms which compute control patterns for this network from any bijection assignment. However, these algorithms are both time-consuming and space-consuming. In order to meet the time constraints arising from the use of a Benes network as the alignm