The implementation of algorithms on distributed-memory multiprocessors requires regular exchange of certain intermediate results between the parallel processes. The less data that must be moved, the more efficient the parallelization. In this paper, concepts for the efficient implementation of multigrid methods with regular grid structure are presented, using the SUPRENUM supercomputer as an example. The main idea is the introduction of an optimized 'multicolor' relaxation scheme, combined with an adapted agglomeration technique. The speedup to be expected on SUPRENUM is discussed for the example of solving the Poisson equation in boundary-fitted coordinates.
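The multicolor idea can be illustrated with its simplest two-color case, red-black Gauss-Seidel relaxation for the Poisson equation on a regular grid: every grid point of one color depends only on neighbors of the other color, so all points of a color class can be updated simultaneously. The following serial sketch is my own illustration of that update rule, not the paper's SUPRENUM code:

```python
def red_black_relax(u, f, h, sweeps=1):
    """Red-black ('multicolor') Gauss-Seidel sweeps for -laplacian(u) = f
    on a uniform grid with spacing h. u and f are lists of rows; the
    boundary values of u are held fixed (Dirichlet conditions).
    Each color class touches only the other color, so in a parallel
    setting one color can be updated concurrently on all processors."""
    n, m = len(u), len(u[0])
    for _ in range(sweeps):
        for color in (0, 1):            # 0 = "red" points, 1 = "black"
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 == color:
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1]
                                          + h * h * f[i][j])
    return u
```

With boundary values fixed at 1 and zero right-hand side, the interior converges to the constant solution 1, which makes the routine easy to check.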
A parallel algorithm that makes use of the classical three-term recursion formula to construct an orthogonal family of polynomials with respect to a discrete inner product is proposed. The algorithm requires O(N log N) parallel arithmetic steps on a distributed-memory multiprocessor with N + 1 processors to construct the polynomials p_i(x) for 0 ≤ i ≤ N. If hypercube topology is assumed, the algorithm can be implemented with the additional overhead of O(N log N) routing steps. In this case the implementation is quite simple, requiring only scalar single-node broadcast and accumulation procedures together with a Gray code mapping. The limited-processor version of the algorithm requires O(N²/p + N log p) arithmetic and O(N log p) routing steps on a hypercube with p ≤ N + 1 nodes. We present some experimental results obtained on an Intel cube.
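The three-term recursion in question is p_{i+1}(x) = (x - a_i) p_i(x) - b_i p_{i-1}(x), with the coefficients a_i, b_i determined by the discrete inner product. The serial reference version below (the Stieltjes procedure) is my own illustration; the paper's contribution is distributing the inner-product sums over N + 1 processors:

```python
def orthogonal_polys(xs, ws, N):
    """Build p_0..p_N orthogonal w.r.t. the discrete inner product
    <f, g> = sum_k w_k f(x_k) g(x_k) via the three-term recursion
    p_{i+1}(x) = (x - a_i) p_i(x) - b_i p_{i-1}(x).
    Returns the recursion coefficients and the value tables p_i(x_k)."""
    m = len(xs)
    p_prev = [0.0] * m          # p_{-1} = 0
    p_curr = [1.0] * m          # p_0 = 1
    table = [p_curr[:]]
    alphas, betas = [], []
    for i in range(N):
        norm = sum(w * p * p for w, p in zip(ws, p_curr))
        a = sum(w * x * p * p for w, x, p in zip(ws, xs, p_curr)) / norm
        if i == 0:
            b = 0.0
        else:
            prev_norm = sum(w * p * p for w, p in zip(ws, p_prev))
            b = norm / prev_norm
        p_next = [(x - a) * pc - b * pp
                  for x, pc, pp in zip(xs, p_curr, p_prev)]
        alphas.append(a)
        betas.append(b)
        p_prev, p_curr = p_curr, p_next
        table.append(p_curr[:])
    return alphas, betas, table
```

Each step needs only two weighted sums, which is exactly the part a distributed implementation computes with accumulation (reduction) procedures.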
Ray tracing is a well-known technique for generating life-like images based on models of light shading, reflection, and refraction. The massive computation and memory demands of ray tracing complex scenes have long motivated researchers to use parallel processing to reduce the ray tracing time. This paper presents a study of the parallel implementation of a ray tracing algorithm on a distributed-memory parallel computer. The computational cost of rendering pixels and the patterns of data access cannot be predicted until runtime. To parallelize such an application efficiently, the issues of database partitioning, data management, and load balancing must be addressed. In this paper, we discuss ways of partitioning the database and propose a dynamic data management scheme which exploits image coherence to reduce data communication time. A global load balancing mechanism is presented to ensure a good load balance among processors during ray tracing. The success of our implementation depends crucially on a number of parameters which are experimentally evaluated. (C) 1997 Published by Elsevier Science B.V.
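Because per-pixel cost is unknown until runtime, a common generic remedy is demand-driven tiling: idle workers pull image tiles from a shared queue, so expensive regions do not pile up on one processor. The sketch below is my own thread-based illustration of that pattern (the function names and the `shade` callback are hypothetical), not the paper's global load balancing mechanism:

```python
import queue
import threading

def render_parallel(width, height, tile, shade, workers=4):
    """Demand-driven ('task farm') tiling: workers repeatedly take the
    next tile from a shared queue and fill in its pixels by calling
    shade(x, y). Costly tiles naturally balance across workers."""
    tasks = queue.Queue()
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            tasks.put((x0, y0))
    image = [[0.0] * width for _ in range(height)]

    def worker():
        while True:
            try:
                x0, y0 = tasks.get_nowait()
            except queue.Empty:
                return                      # no tiles left
            for y in range(y0, min(y0 + tile, height)):
                for x in range(x0, min(x0 + tile, width)):
                    image[y][x] = shade(x, y)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

Writes land in disjoint tiles, so the workers need no locking on the image itself.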
We present a parallel algorithm for an exact solution of an integer linear system of equations using the single modulus p-adic expansion technique. More specifically, we parallelize an algorithm of Dixon, and present our implementation results on a distributed-memory multiprocessor. The parallel algorithm presented here can be used together with the multiple moduli algorithms and parallel Chinese remainder algorithms for fast computation of the exact solution of a system of linear equations with integer entries. (C) 1997 Elsevier Science B.V.
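Dixon's method solves Ax = b by computing the solution digit by digit in base p: each step solves a residual system modulo p with a precomputed inverse, then divides the (exactly divisible) residual by p. The serial sketch below is my own illustration, and it assumes the exact solution has integer entries, so the rational-reconstruction step of the full algorithm is skipped:

```python
def mat_inv_mod(A, p):
    """Invert an n x n integer matrix modulo a prime p (Gauss-Jordan)."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, p)          # modular inverse of pivot
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def dixon_solve(A, b, p=1000003, iters=8):
    """Single-modulus p-adic expansion x = x_0 + x_1 p + ... mod p^iters.
    Assumes p does not divide det(A) and that the true solution is
    integral and small enough to lift from the symmetric residue."""
    n = len(A)
    Ainv = mat_inv_mod(A, p)
    digits, bi = [], list(b)
    for _ in range(iters):
        xi = [sum(Ainv[r][c] * bi[c] for c in range(n)) % p
              for r in range(n)]
        digits.append(xi)
        resid = [bi[r] - sum(A[r][c] * xi[c] for c in range(n))
                 for r in range(n)]
        assert all(v % p == 0 for v in resid)  # exact by construction
        bi = [v // p for v in resid]
    M = p ** iters
    x = [sum(d[r] * p ** i for i, d in enumerate(digits)) % M
         for r in range(n)]
    return [v - M if v > M // 2 else v for v in x]  # symmetric lift
```

All arithmetic stays on small residues except the final assembly, which is what makes the method attractive for exact solutions.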
We consider the problem of dynamic load balancing for multiprocessors, for which a typical application is a parallel finite element solution method using non-structured grids and adaptive grid refinement. This type of application requires communication between the subproblems, which arises from the interdependencies in the data. A load balancing algorithm should ideally not make any assumptions about the physical topology of the parallel machine. Further requirements are that the procedure should be both fast and accurate. A new multi-level algorithm is presented for solving the dynamic load balancing problem which has these properties and whose parallel complexity is logarithmic in the number of processors used in the computation.
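A classic example of a balancing step with logarithmic parallel complexity is the dimension-exchange scheme, in which each processor averages its load with one partner per round. This generic sketch is my own illustration of that complexity class only; it ignores the data interdependencies the paper's multi-level algorithm must also respect:

```python
def dimension_exchange(loads):
    """Hypercube-style pairwise balancing: in round d each processor i
    averages its load with partner i XOR d. After log2(p) rounds every
    processor holds the global average, so the parallel step count is
    logarithmic in the number of processors p."""
    p = len(loads)
    assert p & (p - 1) == 0, "p must be a power of two"
    loads = loads[:]               # do not mutate the caller's list
    d = 1
    while d < p:
        for i in range(p):
            j = i ^ d
            if i < j:              # each pair exchanges once per round
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
        d <<= 1
    return loads
```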
In this paper, we consider the problem of reducing the communication cost for the parallel factorization of a sparse symmetric positive definite matrix on a distributed-memory multiprocessor. We define a parallel communication cost function and show, with a contrived example, that simply minimizing the height of the elimination tree is ineffective for minimizing the communication cost, and that the discrepancy may grow without bound. We propose an algorithm to find an ordering such that the communication cost to complete the parallel Cholesky factorization is minimum among all equivalent reorderings. Our algorithm runs in O(n log n + m) time, where n is the number of nodes and m is the sum of all maximal clique sizes in the filled graph. (C) 1999 Elsevier Science B.V. All rights reserved.
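The elimination tree that such cost functions are defined over can be computed directly from the sparsity pattern: parent[j] is the smallest i > j with a nonzero L[i][j] in the Cholesky factor. The sketch below is the standard serial algorithm with path compression (an illustration of the underlying structure, not the paper's reordering algorithm):

```python
def elimination_tree(rows, n):
    """Elimination tree of a sparse symmetric matrix A.
    rows[i] lists the column indices k < i with A[i][k] != 0.
    parent[j] = smallest i > j with L[i][j] != 0; -1 marks a root.
    Path-compressed ancestor links keep the running time near-linear."""
    parent = [-1] * n
    anc = [-1] * n                  # compressed ancestor links
    for i in range(n):
        for k in rows[i]:
            r = k
            while anc[r] != -1 and anc[r] != i:
                nxt = anc[r]
                anc[r] = i          # compress the path toward i
                r = nxt
            if anc[r] == -1:        # r is a new child of i
                anc[r] = i
                parent[r] = i
    return parent
```

For a tridiagonal-plus-corner pattern the tree degenerates to a chain, while a star pattern (last row dense) yields a flat tree of height one; the paper's point is that neither height alone determines communication cost.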
In this paper, we consider the symbolic factorization step in computing the Cholesky factorization of a sparse symmetric positive definite matrix on distributed-memory multiprocessor systems. By exploiting the supernodal structure in the Cholesky factor, the performance of a previous parallel symbolic factorization algorithm is improved. Empirical tests demonstrate that there can be drastic reduction in the execution time required by the new algorithm on an Intel iPSC/2 hypercube.
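The basic column-wise symbolic factorization (without the supernodal refinement the paper adds) merges each child's column structure into its parent's along the elimination tree: struct(L_j) is the union of column j of A with the structures of j's children, restricted to rows below j. A serial sketch, my own illustration, with adj_lower[j] holding the below-diagonal row indices of column j of A and parent[] the elimination tree:

```python
def symbolic_factorization(adj_lower, parent, n):
    """Column structures of the Cholesky factor L:
    struct(j) = adj_lower[j]  union  { i in struct(c) : i > j }
    over all children c of j in the elimination tree. Children have
    smaller indices than parents, so one increasing pass suffices."""
    struct = [set(adj_lower[j]) for j in range(n)]
    children = [[] for _ in range(n)]
    for c in range(n):
        if parent[c] != -1:
            children[parent[c]].append(c)
    for j in range(n):
        for c in children[j]:
            struct[j] |= {i for i in struct[c] if i > j}
    return [sorted(s) for s in struct]
```

The fill-in entry L[3][1] in the test below arises because column 0 connects rows 1 and 3.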
As parallel implementation of complex applications is becoming popular, the need for a high-performance interprocessor communication system becomes imminent, especially in loosely coupled distributed-memory multiprocessor networks. An important factor in the efficiency of these networks is the effectiveness of the message-passing system which manages the data exchanges among the processors of the network. This paper presents the modeling and performance evaluation of a new Message-Passing System (MPS) for distributed multiprocessor networks without shared memory, in which the processors or Processing Elements (PEs) are connected to each other by point-to-point communication links. For maximum performance, the MPS manages the communication and the synchronization between the different tasks of an application by means of three approaches. The first is an asynchronous send/receive approach which efficiently handles server-like tasks; the second is a synchronous send/receive approach which efficiently handles the streaming communication mode; and the third is a virtual channel approach which minimizes the overhead of the synchronization mechanism, efficiently handling the burst mode of heavy communication between tasks. The developed models of the MPS approaches enable the determination of analytical expressions for the different performance measures, and a comparison between analytical and experimental results reveals that the models predict the MPS performance with high accuracy. The MPS, written in parallel ANSI C, is studied on a mesh topology network of 16 T800 transputers. The MPS performance for each approach is studied and presented in terms of communication latency, throughput, computation efficiency, and memory consumption.
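The difference between the asynchronous and synchronous send/receive approaches can be sketched with two toy channel types (a thread-based illustration of the semantics only, not the MPS implementation): an asynchronous send deposits the message and returns at once, while a synchronous send blocks until the receiver has taken the message, i.e. a rendezvous.

```python
import queue
import threading

class AsyncChannel:
    """Asynchronous send/receive: send puts the message into the
    receiver's mailbox and returns immediately (server-like tasks)."""
    def __init__(self):
        self._q = queue.Queue()
    def send(self, msg):
        self._q.put(msg)        # never blocks
    def recv(self):
        return self._q.get()

class SyncChannel:
    """Synchronous send/receive: send blocks until the receiver has
    actually taken the message (rendezvous, suited to streaming)."""
    def __init__(self):
        self._q = queue.Queue(maxsize=1)
    def send(self, msg):
        self._q.put(msg)
        self._q.join()          # wait until the matching recv completes
    def recv(self):
        msg = self._q.get()
        self._q.task_done()     # release the blocked sender
        return msg
```

In the asynchronous case a task can queue several messages and proceed; in the synchronous case sender and receiver advance in lock-step.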
In this paper we propose a new medium-grain parallel algorithm for computing a matrix inverse on a hypercube multiprocessor. The algorithm implements Gauss-Jordan inversion with column interchanges. The hypercube network is configured as a two-dimensional subcube-grid to support submatrix partitionings. For some algorithms on some types of hypercubes, submatrix partitionings are known to have communication advantages not shared by partitions limited to rows or columns. We show that such advantages can be extended to Gauss-Jordan inversion on an Intel iPSC/860, the most recent third-generation hypercube, and that little extra programming effort is needed to include it in the subcube-grid library used in various other matrix computations. An actual aggregate execution rate of 200 MFLOPS (million floating-point operations per second) is achieved when inverting a 2000 × 2000 matrix (in double-precision Fortran 77) using 64 iPSC/860 processors configured as an 8 × 8 subcube-grid.
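The numerical method itself can be sketched serially as follows (my own illustration of Gauss-Jordan inversion with column interchanges; the paper's contribution is its subcube-grid distribution): at step k the pivot is the largest entry of row k among the remaining columns, and since swapping columns of A amounts to inverting A·P, the column permutation is undone as a row permutation of the inverse at the end.

```python
def gauss_jordan_inverse(A):
    """Gauss-Jordan inversion with column interchanges. The working
    matrix is [A | I]; column swaps apply only to the left block and
    are recorded in cols, then undone as row swaps of the result
    (because (A P)^-1 = P^T A^-1)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    cols = list(range(n))
    for k in range(n):
        # pivot: largest entry in row k among columns k..n-1
        j = max(range(k, n), key=lambda c: abs(M[k][c]))
        if j != k:
            cols[k], cols[j] = cols[j], cols[k]
            for row in M:
                row[k], row[j] = row[j], row[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        for r in range(n):
            if r != k:
                f = M[r][k]
                if f:
                    M[r] = [v - f * w for v, w in zip(M[r], M[k])]
    inv = [None] * n
    for k in range(n):
        inv[cols[k]] = M[k][n:]     # undo the column permutation
    return inv
```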
A parallel algorithm is presented for the LU decomposition of a general sparse matrix on a distributed-memory MIMD multiprocessor with a square mesh communication network. In the algorithm, matrix elements are assigned to processors according to the grid distribution. Each processor represents the nonzero elements of its part of the matrix by a local, ordered, two-dimensional linked-list data structure. The complexity of important operations on this data structure and on several others is analysed. At each step of the algorithm, a parallel search for a set of m compatible pivot elements is performed. The Markowitz counts of the pivot elements are close to minimum, to preserve the sparsity of the matrix. The pivot elements also satisfy a threshold criterion, to ensure numerical stability. The compatibility of the m pivots enables the simultaneous elimination of m pivot rows and m pivot columns in a rank-m update of the reduced matrix. Experimental results on a network of 400 transputers are presented for a set of test matrices from the Harwell-Boeing sparse matrix collection.
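Markowitz-style pivot selection with a threshold test can be sketched as follows (a serial, dense-storage illustration of the selection criterion only; the paper searches for m compatible pivots in parallel over linked-list structures, and real sparse codes maintain the row and column counts incrementally):

```python
def markowitz_pivot(A, tau=0.1):
    """Return the position (i, j) of a pivot that minimizes the
    Markowitz count (r_i - 1) * (c_j - 1), where r_i and c_j are the
    nonzero counts of row i and column j, restricted to entries that
    pass the threshold test |a_ij| >= tau * max_k |a_ik| for stability."""
    n = len(A)
    r = [sum(1 for v in row if v) for row in A]
    c = [sum(1 for row in A if row[j]) for j in range(n)]
    best, best_count = None, None
    for i in range(n):
        row_max = max(abs(v) for v in A[i])
        for j in range(n):
            if A[i][j] and abs(A[i][j]) >= tau * row_max:
                count = (r[i] - 1) * (c[j] - 1)
                if best is None or count < best_count:
                    best, best_count = (i, j), count
    return best
```

A low Markowitz count bounds the fill-in a rank-1 update can create, while the threshold keeps the pivot numerically acceptable.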