Effective design of parallel matrix multiplication algorithms relies on the consideration of many interdependent issues based on the underlying parallel machine or network upon which such algorithms will be implemented, as well as the type of methodology utilized by an algorithm. In this paper, we determine the parallel complexity of multiplying two (not necessarily square) matrices on parallel distributed-memory machines and/or networks. In other words, we provide an achievable parallel run-time that cannot be beaten by any algorithm (known or unknown) for solving this problem. In addition, any algorithm that claims to be optimal must attain this run-time. In order to obtain results that are general and useful across a span of machines, we base our results on the well-known LogP model. Furthermore, three important criteria must be considered in order to determine the running time of a parallel algorithm, namely (i) local computational tasks, (ii) the initial data layout, and (iii) the communication schedule. We provide optimality results by first proving general lower bounds on parallel run-time. These lower bounds lead to significant insights on (i)-(iii) above. In particular, we present the types of data layouts and communication schedules that are needed in order to obtain optimal run-times. We prove that no single data layout can achieve optimal running times for all cases. Instead, optimal layouts depend on the dimensions of each matrix and on the number of processors. Lastly, optimal algorithms are provided.
ISBN: (print) 1892512459
In many cases, an elliptic system of partial differential equations (PDEs) has to be solved in order to compute a given flow problem. For domain decomposition, the multi-block grid approach is the one mainly used. Many flows are unsteady, so the calculation of path lines is a common way of exploring the flow field. However, computing path lines is more complicated if the underlying grid geometry changes over time. We make use of a fragmented multi-block dataset for a parallelization approach to compute path lines. We describe our enhancements to VTK, the underlying toolkit for scientific visualization, which supports neither multi-block nor time-dependent datasets. Our extensions include the handling of unsteady datasets as well as adaptive step-size control and time-position interpolation. Finally, we present the results of our efforts to speed up Computational Fluid Dynamics (CFD) explorations in Virtual Environments.
ISBN: (print) 0769520189
Parallel Monte Carlo methods are successful because particles are typically independent and easily distributed to multiple processors. For time-dependent Monte Carlo particle transport problems, however, the communication required at each time step for the scattering source attributes and meshes reduces parallel efficiency and limits the achievable parallel scale. We study the parallel computation of two types of time-dependent particle transport problems. Adaptive processor assignment in parallel computation and three parallel I/O models with low-cost communication are presented, and the optimized choice of processors is obtained. We propose a scheme based on the Monte Carlo layered sampling technique to handle communication of the scattering source. Parallel scalability is greatly improved, and larger speedups over the basic methods are obtained.
This paper proposes a new interconnection network, referred to as the arrangement-star network, which is constructed from the product of the star and arrangement networks. Studying this new network is motivated by the good qualities it exhibits over its constituent networks, the star and arrangement networks. The star network was a research focus for a long time, until algorithm development on it turned out to be cumbersome. The arrangement network, as a generalization of the star network, offers no solution in that direction. The arrangement-star network, on the other hand, makes it possible to efficiently embed grids, pipelines, and other computationally important topologies in a very natural manner. Furthermore, the fact that the product of the star and arrangement networks comes with little increase in network diameter and a better result on communication cost motivates further investigation of this new alternative. (C) 2003 Elsevier Science B.V. All rights reserved.
Author:
Trelea, IC
INA PG, UMR Genie & Microbiol Proc Alimentaires, F-78850 Thiverval Grignon, France
The particle swarm optimization algorithm is analyzed using standard results from dynamic system theory. Graphical parameter selection guidelines are derived. The exploration-exploitation tradeoff is discussed and illustrated. Examples are given of performance on benchmark functions that is superior to previously published results. (C) 2002 Elsevier Science B.V. All rights reserved.
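As a rough illustration of the kind of algorithm being analyzed, the sketch below implements a standard inertia-weight particle swarm in plain Python. It is not taken from the paper: the function name `pso`, the `sphere` benchmark, and the default parameter values (`w=0.729`, `c1=c2=1.494`, commonly quoted in the literature) are assumptions made for this example. Lowering `w` tends to favor exploitation, while raising it tends to favor exploration.

```python
import random


def pso(f, dim, bounds, n_particles=20, iters=200,
        w=0.729, c1=1.494, c2=1.494, seed=0):
    """Minimize f over a box with a basic inertia-weight particle swarm.

    w, c1, c2 are illustrative defaults (an assumption of this sketch).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple benchmark function with its minimum of 0 at the origin."""
    return sum(t * t for t in x)


if __name__ == "__main__":
    best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```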
ISBN: (print) 3540200541
We present a novel and powerful parallel algorithm for mining maximal frequent patterns, called Par-MinMax. It decomposes the search space by prefix-based equivalence classes, distributes work among the processors, and selectively duplicates databases in such a way that each processor can compute the maximal frequent patterns independently. It uses a multi-level backtrack pruning strategy and other novel pruning strategies, along with a vertical database format in which frequency is counted by a simple tid-list intersection operation. These techniques eliminate the need for synchronization and drastically cut down the I/O overhead. The analysis and experimental results demonstrate the efficiency of our approach in comparison with existing work.
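The vertical format and tid-list intersection mentioned above can be sketched as follows. This shows only the counting step, not Par-MinMax itself; the helper names `vertical_format` and `support` and the toy transactions are hypothetical and chosen for this example.

```python
def vertical_format(transactions):
    """Convert a list of transactions into item -> set of transaction ids."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support(itemset, tidlists):
    """Count the transactions containing every item of a non-empty itemset.

    The count is the size of the intersection of the items' tid-lists.
    """
    lists = [tidlists.get(item, set()) for item in itemset]
    if not lists:
        raise ValueError("itemset must not be empty")
    return len(set.intersection(*lists))


if __name__ == "__main__":
    # Toy database (illustrative data only).
    db = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    tl = vertical_format(db)
    print(support({"a", "c"}, tl))
    print(support({"a", "b"}, tl))
```

Because each tid-list already records where an item occurs, the support of a candidate pattern is obtained without rescanning the original transactions.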
Preconditioning techniques based on ILU decomposition, on Frobenius norm minimization, and on factorized sparse approximate inverses are considered. These algorithms are applied with conjugate gradient-type methods, namely Bi-CGSTAB, QMR, and TFQMR, for the solution of complex, large, sparse linear systems. The results of numerical experiments in a scalar environment with matrices arising from transport in porous media, quantum chemistry, structural dynamics, and electromagnetism are analysed. The preconditioner that appears most significant in a parallel environment (based on the factorized sparse approximate inverse) is then employed on a Cray T3E supercomputer. The experimental results show the satisfactory parallel performance of the proposed algorithm. Copyright (C) 2003 John Wiley & Sons, Ltd.
ISBN: (print) 158113861X
This paper describes a newly proposed simple and efficient parallel algorithm for the construction of the Delaunay triangulation (DT) in E² by randomized incremental insertion. The construction of the DT is one of the fundamental problems in computer graphics. The proposed algorithm is designed for parallel systems with shared memory and several processors. Such hardware (especially with two processors) has become available in the last few years thanks to low prices, yet there is still a lack of parallel algorithms that are simple to implement and efficient enough to be an attractive alternative to long-existing serial algorithms. The designed algorithm incorporates a new method for synchronization among processing elements (PEs) based on a simple geometric test (i.e., if no other points lie in the circumcircle of the accessed triangle, this triangle can be modified independently of other PEs). We implemented the algorithm in C++ and tested it on workstations with up to four processors, where we reached relatively good speed-up over our serial implementation. When only two processors were used, we even reached super-linear speed-up.
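The geometric test mentioned above can be sketched with the standard in-circle determinant. This is only an illustration of the predicate, written in Python rather than the paper's C++, and it says nothing about how the paper schedules or locks triangles; the function names `orient`, `in_circumcircle`, and `can_modify_independently` are hypothetical. Floating-point arithmetic is used here, so nearly cocircular inputs would need exact predicates in a robust implementation.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """Return True if point d lies strictly inside the circumcircle of abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    # 3x3 in-circle determinant, expanded along the column of squared lengths.
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign convention assumes counterclockwise order; flip it otherwise.
    if orient(a, b, c) < 0:
        det = -det
    return det > 0


def can_modify_independently(triangle, other_points):
    """True if none of the other points lies inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p) for p in other_points)


if __name__ == "__main__":
    tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    print(can_modify_independently(tri, [(2.0, 2.0)]))
    print(can_modify_independently(tri, [(0.5, 0.5)]))
```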
ISBN: (print) 1892512416
The class of complex modified Korteweg-de Vries (CMKdV) equations has many applications. One form of the CMKdV equation has been used to create models for the nonlinear evolution of plasma waves [5] and for the propagation of transverse waves in a molecular chain [3]. Another form of the CMKdV equation has been used for the traveling wave and for a double homoclinic orbit [4]. In this paper we introduce sequential and parallel split-step Fourier methods for numerical simulations of the above equation. The parallel methods are implemented on the Origin 2000 multiprocessor computer. Our numerical experiments have shown that these methods give considerable speedup.
A parallel algorithm for the solution of unsteady Euler equations on unstructured and moving meshes is developed. A cell-centered finite volume scheme is used. The temporal discretization involves an implicit time-integration scheme based on backward-Euler time differencing. The movement of the computational mesh is accomplished by means of a dynamically deforming mesh algorithm. The parallelization is based on decomposition of the domain into a series of subdomains with overlapped interfaces. The scheme is computationally efficient, time accurate, and stable for large time increments. Detailed descriptions of the solution algorithm are given, and computations for airflow around a NACA0012 airfoil and a missile configuration are presented to demonstrate the applications.
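To illustrate why backward-Euler time differencing stays stable for large time increments, the sketch below applies it to the scalar model problem du/dt = -λu and compares it with explicit forward Euler. The model problem, the function names, and the parameter values are assumptions chosen for this example; they are not the Euler-equation solver described in the abstract.

```python
def backward_euler_decay(u0, lam, dt, steps):
    """Integrate du/dt = -lam * u with implicit (backward) Euler.

    Each step solves u_new = u_old - dt * lam * u_new for u_new.
    """
    u = u0
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
    return u


def forward_euler_decay(u0, lam, dt, steps):
    """Integrate the same equation with explicit (forward) Euler."""
    u = u0
    for _ in range(steps):
        u = u * (1.0 - lam * dt)
    return u


if __name__ == "__main__":
    # A deliberately large step: lam * dt = 10, far beyond the explicit limit of 2.
    print(backward_euler_decay(1.0, lam=10.0, dt=1.0, steps=5))
    print(forward_euler_decay(1.0, lam=10.0, dt=1.0, steps=5))
```

With this step size the implicit update decays toward zero as the exact solution does, while the explicit update grows in magnitude and oscillates in sign.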