Most of the work in scientific computing today is done with parallel algorithms, often via message-passing architectures such as the Message Passing Interface (MPI). Ruby is a newly emerging language that maintains a strict adherence to object-oriented principles and a clean, intuitive syntax. The author created MPI Ruby, a complete binding of MPI to Ruby. In this article, he introduces Ruby and MPI Ruby, describes some applications, and reports on the project's current status and availability.
Kohonen's self-organizing map algorithm provides computational neurobiology with a useful model of the primate cerebral cortex. However, simulations of only modestly sized maps quickly exceed the capacity of even very fast workstations. Here, we report that a parallel implementation of the algorithm on a Beowulf commodity-class computing cluster scales very favorably with the number of available nodes and greatly speeds the computation of medium-to-large-scale cortical maps. (C) 2002 Elsevier Science B.V. All rights reserved.
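As a serial point of reference for what such a simulation computes, a minimal self-organizing map training loop can be sketched in pure Python; the grid size, learning-rate schedule, and Gaussian neighborhood below are illustrative choices, not parameters taken from the paper:

```python
import math
import random

def train_som(data, grid_w, grid_h, dim, iters=300, lr0=0.5, sigma0=None):
    """Minimal Kohonen SOM: a grid_w x grid_h map of dim-dimensional weights."""
    random.seed(0)
    if sigma0 is None:
        sigma0 = max(grid_w, grid_h) / 2.0
    w = [[random.random() for _ in range(dim)] for _ in range(grid_w * grid_h)]
    for t in range(iters):
        x = random.choice(data)
        # best-matching unit: the node whose weight vector is closest to x
        bmu = min(range(len(w)),
                  key=lambda i: sum((w[i][d] - x[d]) ** 2 for d in range(dim)))
        bx, by = bmu % grid_w, bmu // grid_w
        lr = lr0 * math.exp(-t / iters)          # decaying learning rate
        sigma = sigma0 * math.exp(-t / iters)    # shrinking neighborhood radius
        for i in range(len(w)):
            ix, iy = i % grid_w, i // grid_w
            d2 = (ix - bx) ** 2 + (iy - by) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighborhood
            for d in range(dim):
                w[i][d] += lr * h * (x[d] - w[i][d])
    return w
```

A parallel version would partition the map nodes (and the best-matching-unit search) across cluster nodes; the per-node distance computations and weight updates are independent, which is what lets a Beowulf-style implementation scale with the number of nodes.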
In this paper, two problems on the class of k-trees, a subclass of the class of chordal graphs, are considered: the fast reordering problem and the isomorphism problem. An O(log² n) time parallel algorithm for the fast reordering problem is described that uses O(nk(n - k)/log n) processors on a CRCW PRAM, proving membership in the class NC for fixed k. An O(nk(k + 1)!) time sequential algorithm for the isomorphism problem is obtained, improving on the O(n²k(k + 1)!) algorithm of Sekharan (the second author) [10]. A parallel version of this sequential algorithm is presented that runs in O(log² n) time using O(nk((k + 1)! + n - k)/log n) processors, improving on a parallel algorithm of Sekharan for the isomorphism problem [10]. Both the sequential and parallel algorithms use a concept introduced in this paper called the kernel of a k-tree.
The PROUD module placement algorithm mainly uses a hierarchical decomposition technique and the solution of sparse linear systems based on a resistive-network analogy. It has been shown that the PROUD algorithm can achieve placements for very large circuits comparable to those of the best placement algorithms based on simulated annealing, while running several orders of magnitude faster. The modified PROUD algorithm, MPROUD, which perturbs the coefficient matrices, runs much faster than the original PROUD algorithm. Because MPROUD is unstable and its convergence is not guaranteed, we have proposed a new convergent and numerically stable PROUD algorithm, denoted IPROUD, with attractive computational costs; it solves the module placement problem using the SYMMLQ and MINRES methods based on the Lanczos process (Yang, 1997). We subsequently propose parallel versions of the improved PROUD algorithms. The parallel algorithm is derived so that all inner products and matrix-vector multiplications of a single iteration step are independent. The cost of global communication, which is the bottleneck of parallel performance on distributed-memory computers, can therefore be significantly reduced, yielding another order-of-magnitude improvement in runtime without loss of layout quality.
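To make the resistive-network analogy concrete: for a single row of movable modules connected in a chain between two fixed pads (a deliberately tiny example, not PROUD's hierarchical decomposition), the quadratic-placement objective leads to a tridiagonal linear system, solved here with the Thomas algorithm:

```python
def quadratic_placement_1d(n_movable, pad_left=0.0, pad_right=1.0):
    """1-D quadratic placement of a chain of modules between two fixed pads.

    Each net acts as a unit resistor; minimizing total squared wirelength
    gives a tridiagonal system  2*x[i] - x[i-1] - x[i+1] = 0  with the pad
    positions entering the right-hand side at the chain ends.
    """
    n = n_movable
    a = [-1.0] * n   # sub-diagonal
    b = [2.0] * n    # main diagonal
    c = [-1.0] * n   # super-diagonal
    d = [0.0] * n    # right-hand side
    d[0] += pad_left
    d[-1] += pad_right
    # forward elimination (Thomas algorithm)
    for i in range(1, n):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    # back substitution
    x = [0.0] * n
    x[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i]
    return x
```

For three movable modules between pads at 0 and 1 this yields the evenly spaced positions 0.25, 0.5, 0.75. PROUD applies the same idea hierarchically in two dimensions, where the systems are large and sparse and iterative Krylov solvers such as SYMMLQ and MINRES become attractive.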
An outerplanar graph is a planar graph that can be embedded in the plane in such a way that all vertices lie on the exterior face. An outerplanar graph is maximal if no edge can be added to the graph without violating outerplanarity. In this paper, an optimal parallel algorithm is proposed on the EREW PRAM for testing isomorphism of two maximal outerplanar graphs. The proposed algorithm takes O(log n) time using O(n) work. Besides being optimal, it is very simple. Moreover, it can be implemented optimally on the CRCW PRAM in O(1) time. (C) 2002 Elsevier Science (USA).
This paper introduces a new parallel algorithm for computing an N (= n!)-point Lagrange interpolation on an n-star (n > 2). The proposed algorithm exploits several communication techniques on stars in a novel way, which can be adapted for computing similar functions. It is optimal and consists of three phases: initialization, main, and final. While there is no computation in the initialization phase, the main phase is composed of n!/2 steps, each consisting of four multiplications, four subtractions, and one communication operation, plus an additional step involving one division and one multiplication. The final phase is carried out in (n - 1) subphases, each with O(log n) steps, where each step takes three communications and one addition. Results from a cost-performance comparative analysis reveal that for practical network sizes the new algorithm on the star exhibits superior performance over those proposed for common interconnection networks. (C) 2002 Elsevier Science (USA).
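Setting the star-network communication scheme aside, the arithmetic being distributed is ordinary Lagrange interpolation; a direct serial evaluation in Python (for illustration only, not the paper's phased algorithm) is:

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    n = len(xs)
    for i in range(n):
        # i-th basis polynomial: 1 at xs[i], 0 at every other node
        term = ys[i]
        for j in range(n):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total
```

The products and differences inside the double loop are exactly the multiplications and subtractions the abstract counts per step; the parallel algorithm's contribution is scheduling them, and the accompanying communication, across the n! nodes of the star.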
ISBN: (Print) 0769515126
The external selection problem is to select the record with the K-th smallest key from N given records that are distributed and stored evenly on the D disks of a parallel machine with D processors. Each processor has its own primary memory of size M records and one disk, where N/D > M. The processors are connected in a √D × √D mesh architecture. Based on a two-stage approach, this paper presents an efficient parallel external selection algorithm for distributed-memory parallel systems. First, all processors execute local external sorting in parallel, each sorting the N/D records on its own disk. Next, they execute parallel external selection over the D sorted subfiles on the D disks. The algorithm is asymptotically optimal and has a small constant factor in its time complexity.
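The two-stage structure can be simulated serially in Python, with lists standing in for the D disks; here heapq.merge plays the role of the second-stage selection, which the paper of course realizes differently on the mesh:

```python
import heapq

def external_select(partitions, k):
    """Return the k-th smallest (1-based) record across D partitions.

    Stage 1: sort each partition locally, simulating per-processor
             external sorting of the N/D records on its own disk.
    Stage 2: select the k-th smallest from the D sorted runs via a
             D-way merge (a stand-in for the parallel selection stage).
    """
    runs = [sorted(p) for p in partitions]
    for i, v in enumerate(heapq.merge(*runs), start=1):
        if i == k:
            return v
    raise ValueError("k exceeds the total number of records")
```

For example, with three "disks" holding [5, 1, 9], [2, 8, 3], and [7, 4, 6], the 4th smallest key overall is 4.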
ISBN: (Print) 0769517315
In the presented work, the authors compare the results of a parallel FDTD algorithm with computations obtained using the Quick Wave program published by QWED. The authors developed a parallel implementation of the standard FDTD algorithm based on the MPI communication library. The parallel algorithm was examined on a heterogeneous PC cluster.
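For reference, the serial update loop that such a parallel FDTD code decomposes can be sketched in one dimension; this is a normalized pure-Python toy with an illustrative Gaussian source, not the authors' implementation:

```python
import math

def fdtd_1d(steps, n=200, src=100):
    """1-D FDTD (Yee scheme) in normalized units, Courant number 0.5.

    E and H fields are staggered in space and time; each step updates
    H from neighboring E values, then E from neighboring H values.
    """
    ez = [0.0] * n
    hy = [0.0] * n
    for t in range(steps):
        for i in range(n - 1):
            hy[i] += 0.5 * (ez[i + 1] - ez[i])
        for i in range(1, n):
            ez[i] += 0.5 * (hy[i] - hy[i - 1])
        # soft Gaussian source injected at one cell
        ez[src] += math.exp(-((t - 30) / 10.0) ** 2)
    return ez
```

An MPI parallelization splits the grid into slabs, one per process; only the E and H values on slab boundaries must be exchanged each step, which keeps communication small relative to the local updates.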
ISBN: (Print) 0769517315
In this article, the authors describe a parallel implementation of the conjugate gradient method on a heterogeneous PC cluster and on a Hitachi SR-2201 supercomputer. The new version of the implementation differs from the one applied earlier [1] in that it uses a special storage scheme for sparse coefficient matrices: only non-zero elements are stored and used during computations, so the sparsity of the coefficient matrix is fully exploited. The article includes a comparison of the two versions. The speedup of the parallel algorithm has been examined for three different coefficient matrices arising from different physical problems. The authors have also investigated a preconditioning method that uses the inverse of the diagonal of the coefficient matrix as the preconditioning matrix.
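The sparse-storage idea combined with the inverse-diagonal preconditioner can be sketched as Jacobi-preconditioned conjugate gradients over a CSR-style matrix; this serial pure-Python illustration uses my own names and layout, not the authors' code:

```python
import math

def csr_matvec(vals, cols, rowptr, x):
    """Sparse matrix-vector product storing only non-zero entries (CSR layout)."""
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        for k in range(rowptr[i], rowptr[i + 1]):
            y[i] += vals[k] * x[cols[k]]
    return y

def pcg(vals, cols, rowptr, diag, b, tol=1e-10, maxit=200):
    """Conjugate gradients with Jacobi (inverse-diagonal) preconditioning."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # residual b - A*x, x = 0
    z = [r[i] / diag[i] for i in range(n)]    # apply M^-1 = diag(A)^-1
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(maxit):
        Ap = csr_matvec(vals, cols, rowptr, p)
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        for i in range(n):
            x[i] += alpha * p[i]
            r[i] -= alpha * Ap[i]
        if math.sqrt(sum(v * v for v in r)) < tol:
            break
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        rz = rz_new
        p = [z[i] + beta * p[i] for i in range(n)]
    return x
```

The inner products and the matrix-vector product are the only global operations per iteration, which is why their arrangement dominates communication cost in the distributed-memory versions the abstracts discuss.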
ISBN: (Print) 0769515126
This paper presents investigations of the parallel computation of non-ideal 3-D detonation wave propagation on a high-performance computer based on the CC-NUMA architecture. After analyzing and testing the previous serial program, the computations of curvature and of the first-order and second-order differences were identified as the main targets for parallelization. Several techniques were applied to convert the serial program into a parallel one, such as a divide-and-conquer strategy and load balancing. Numerical simulations with the parallel program show a large increase in the computing speed of the non-ideal 3-D detonation wave propagation.