The increasing computational load required by most applications and the limits in hardware performance affecting scientific computing have contributed in recent decades to the development of parallel software and architectures. In fluid-structure interaction (FSI) for haemodynamic applications, parallelization and scalability are key issues (see [L. Formaggia, A. Quarteroni, and A. Veneziani, eds., Cardiovascular Mathematics: Modeling and Simulation of the Circulatory System, Modeling, Simulation and Applications 1, Springer, Milan, 2009]). In this work we introduce a class of parallel preconditioners for the FSI problem obtained by exploiting the block structure of the linear system. We stress the possibility of extending the approach to a general linear system with a block structure, then we provide a bound on the condition number of the preconditioned system in terms of the conditioning of the preconditioned diagonal blocks, and finally we show that the construction and evaluation of the devised preconditioner is modular. The preconditioners are tested on a benchmark three-dimensional (3D) geometry discretized with both a coarse and a fine mesh, as well as on two physiological aorta geometries. The simulations that we have performed show an advantage in using the block preconditioners introduced and confirm our theoretical results.
Given n points in the plane, the planar dominance counting problem is to determine for each point the number of points dominated by it. Point p is said to dominate point q if x(q) ≤ x(p) and y(q) ≤ y(p), where x(p) and y(p) are the x- and y-coordinates of p, respectively. We present two CREW PRAM parallel algorithms for the problem, one running in O(log n log log n) time and the other in O(log n log log n / log log log n) time, both using O(n) processors. Some applications are also given.
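To make the problem statement concrete, here is a minimal sequential sketch of dominance counting. It is not one of the CREW PRAM algorithms from the abstract; it is an ordinary O(n log n) sweep with a Fenwick tree, and it assumes the dominance relation uses "≤" in both coordinates and that a point is not counted as dominating itself. The function name `dominance_counts` is hypothetical.

```python
def dominance_counts(points):
    """For each point, count the other points q with x(q) <= x(p) and y(q) <= y(p)."""
    n = len(points)
    # Compress y-coordinates to ranks 1..m for the Fenwick tree.
    ys = sorted({y for _, y in points})
    rank = {y: i + 1 for i, y in enumerate(ys)}
    tree = [0] * (len(ys) + 1)

    def add(i):
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def prefix(i):
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    order = sorted(range(n), key=lambda k: points[k][0])
    counts = [0] * n
    start = 0
    while start < n:
        # Handle all points sharing the same x together, so that ties in x
        # are treated symmetrically.
        end = start
        x = points[order[start]][0]
        while end < n and points[order[end]][0] == x:
            end += 1
        for k in order[start:end]:
            add(rank[points[k][1]])
        for k in order[start:end]:
            # Subtract 1 so a point does not count itself.
            counts[k] = prefix(rank[points[k][1]]) - 1
        start = end
    return counts
```

For the points (1, 1), (2, 2), (3, 1), (2, 3), the sketch returns the counts 0, 1, 1, 2 in that order.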
The Multi-Splitting (MS) iterative method, designed exclusively for multiprocessor environments, is considered for the solution of large systems of linear equations. A general parallel algorithm is devised and implemented on a modular two-level parallel architecture, which uses systolic arrays as building blocks, to demonstrate the point iteration. A particular three-term member of the MS family is applied, for the parallel block iterative solution, to Poisson's equation discretized by the collocation method.
The solution of linear systems continues to play an important role in scientific computing. The problems to be solved are often very large, so that solving them requires large computer resources. To solve these problems, at least supercomputers with large shared memory or massively parallel computer systems with distributed memory are needed. This paper gives a survey of research on parallel implementation of various direct methods to solve dense linear systems. In particular, the following are considered: Gaussian elimination, Gauss-Jordan elimination and a variant due to Huard (1979), and an algorithm due to Enright (1978), designed in relation to solving (stiff) ODEs, such that stepsize and other method parameters can easily be varied. Some theoretical results are mentioned, including a new result on error analysis of Huard's algorithm. Moreover, practical considerations and results of experiments on supercomputers and on a distributed-memory computer system are presented.
We consider the problem of computing in parallel all pairs of shortest paths in a general large-scale directed network of N nodes. A hierarchical network decomposition algorithm is provided that yields for an important subclass of problems log N savings in computation time over the traditional parallel implementation of Dijkstra's algorithm. Error bounds are provided for the procedure and are illustrated numerically for a problem motivated by intelligent transportation systems.
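As a point of reference for the baseline mentioned above, the following is a minimal sketch of all-pairs shortest paths obtained by running Dijkstra's algorithm once per source. It does not implement the hierarchical decomposition from the abstract. It assumes non-negative edge weights and an adjacency-list dictionary; the names `dijkstra` and `all_pairs_shortest_paths` are hypothetical. The per-source runs are independent of one another, which is what makes this baseline straightforward to distribute across processors.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from source; adj maps node -> list of (neighbor, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def all_pairs_shortest_paths(adj, nodes):
    # Each source is handled independently, so this loop could be split
    # across workers without any shared state.
    return {s: dijkstra(adj, s) for s in nodes}
```

Unreachable nodes are simply absent from the per-source result dictionary.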
We examine parallel algorithms for molecular dynamics simulations involving long-range induction interactions. The algorithms are tested by performing molecular dynamics simulations of water with an intermolecular potential that explicitly includes contributions from pair, three-body and induction interactions. Both cyclic and balanced force decomposition methods are implemented to decompose the parallelizable components of induction, pair and three-body interactions using a message passing interface. We report that more than 90% of the induction calculation and 98% of the total calculation can be effectively parallelized. A reasonably good speedup of 15.7 times and an efficiency of 49.1% are obtained on 32 processors with the balanced force decomposition algorithm. (C) 2007 Elsevier B.V. All rights reserved.
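The reported figures can be checked with a short calculation. The sketch below assumes the usual definition of parallel efficiency (speedup divided by processor count) and, as an idealized comparison that is not taken from the abstract, evaluates Amdahl's law for a 98% parallel fraction. The function names are hypothetical.

```python
def efficiency(speedup, processors):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / processors

def amdahl_speedup(parallel_fraction, processors):
    """Idealized speedup bound from Amdahl's law (no communication cost)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Figures quoted in the abstract: speedup 15.7 on 32 processors.
reported_efficiency = efficiency(15.7, 32)   # about 0.491, i.e. 49.1%
# Idealized bound for 98% parallel work on 32 processors: about 19.75.
ideal_speedup = amdahl_speedup(0.98, 32)
```

Under these assumptions the quoted efficiency of 49.1% is consistent with the quoted speedup, and the measured speedup sits below the idealized bound of roughly 19.75, as one would expect once communication overhead is included.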
We present parallel algorithms for the following four operations on red-black trees: construction, search, insertion, and deletion. Our parallel algorithm for constructing a red-black tree from a sorted list of n items runs in O(1) time with n processors on the CRCW PRAM and runs in O(log log n) time with n / log log n processors on the EREW PRAM. Our construction algorithm does not require the assumptions that previous construction algorithms used. Each of our parallel algorithms for search, insertion, and deletion in red-black trees runs in O(log n + log k) time with k processors on the EREW PRAM, where k is the number of unsorted items to search for, insert, or delete and n is the number of nodes in a red-black tree. (C) 2001 Elsevier Science B.V. All rights reserved.
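For orientation, here is a minimal sequential sketch of building a valid red-black tree from a sorted list. It is not the parallel algorithm from the abstract. It uses a common midpoint construction: the tree is filled level by level except possibly the deepest level, and only nodes on that deepest, incomplete level are colored red. The `Node` class and `build_red_black` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_red_black(keys: Sequence[int]) -> Optional[Node]:
    """Build a red-black tree from keys that are already sorted."""
    n = len(keys)
    # Depth floor(log2(n + 1)) is the only level that can be incomplete;
    # nodes there become red, every other node is black.
    red_depth = (n + 1).bit_length() - 1

    def build(lo: int, hi: int, depth: int) -> Optional[Node]:
        if lo >= hi:
            return None
        mid = (lo + hi) // 2  # subtree sizes differ by at most one
        node = Node(keys[mid], red=(depth == red_depth))
        node.left = build(lo, mid, depth + 1)
        node.right = build(mid + 1, hi, depth + 1)
        return node

    return build(0, n, 0)
```

In this sketch the position and color of each node depend only on its index range and depth, so no rebalancing or recoloring pass is needed after construction.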
We consider the following communication problem: Each vertex of an undirected graph possesses a unique piece of information which must be sent to every other vertex in the graph. The mode of communication will be one-way, point-to-point communication (i.e., one-way mail) in which one vertex may tell another everything it knows in a single transmission. We describe nearly optimal parallel algorithms for disseminating the messages in certain prominent families of graphs (e.g., trees and hypercubes), and consider the complexity of the problem for general graphs.
In this paper a parallel algorithm is given that, given a graph G = (V, E), decides whether G is a series parallel graph, and, if so, builds a decomposition tree for G of series and parallel composition rules. The algorithm uses O(log |E| log* |E|) time and O(|E|) operations on an EREW PRAM, and O(log |E|) time and O(|E|) operations on a CRCW PRAM. The results hold for undirected as well as for directed graphs. Algorithms with the same resource bounds are described for the recognition of graphs of treewidth two, and for constructing tree decompositions of treewidth two. Hence efficient parallel algorithms can be found for a large number of graph problems on series parallel graphs and graphs with treewidth two. These include many well-known problems, such as all problems that can be stated in monadic second-order logic.
Finding common fixed points of a set of operators with some kind of contracting properties is a problem that has attracted much attention in recent decades. Probably the least demanding of such operators are the so-called paracontractions, for which it is only required that the image of a point under such an operator is not farther away from the set of fixed points of the operator than the original point. In what follows, we present two parallel algorithms for finding common fixed points of a finite set of paracontractions; in the first one, the complete set of operators is involved at each iteration step; the second algorithm has a block-iterative nature.
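The two iteration styles can be illustrated with a small sketch. This is not the pair of algorithms from the abstract; it is a generic averaged iteration, using orthogonal projections onto lines in the plane as example operators (such projections are paracontractions with respect to the Euclidean norm). The equal weights, the choice of lines, the block partition, and all function names are assumptions made for illustration.

```python
def project_onto_hyperplane(x, a, b):
    """Orthogonal projection of x onto the hyperplane {z : a . z = b}."""
    scale = (sum(ai * xi for ai, xi in zip(a, x)) - b) / sum(ai * ai for ai in a)
    return [xi - scale * ai for xi, ai in zip(x, a)]

def averaged_step(x, operators):
    """Apply every operator to x and return the equal-weight average of the images."""
    images = [T(x) for T in operators]
    return [sum(column) / len(images) for column in zip(*images)]

def simultaneous_iteration(x0, operators, steps=200):
    # Every operator is involved at each step; the images could be
    # computed in parallel before averaging.
    x = list(x0)
    for _ in range(steps):
        x = averaged_step(x, operators)
    return x

def block_iteration(x0, operators, blocks, steps=200):
    # Only one block of operators is used per step, cycling through the blocks.
    x = list(x0)
    for k in range(steps):
        block = blocks[k % len(blocks)]
        x = averaged_step(x, [operators[i] for i in block])
    return x

# Three lines in the plane that all pass through the point (1, 1).
lines = [((1.0, 1.0), 2.0), ((1.0, -1.0), 0.0), ((1.0, 0.0), 1.0)]
operators = [lambda x, a=a, b=b: project_onto_hyperplane(x, a, b) for a, b in lines]

x_sim = simultaneous_iteration([5.0, -3.0], operators)
x_blk = block_iteration([5.0, -3.0], operators, blocks=[[0, 1], [2]])
```

In this example both iterations approach the common fixed point (1, 1), the intersection of the three lines.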