Coupled-cluster (CC) methods are now widely used in quantum chemistry to calculate the electron correlation energy and many other properties of atoms and molecules. In this paper we outline the basics of the theory, d...
详细信息
Coupled-cluster (CC) methods are now widely used in quantum chemistry to calculate the electron correlation energy and many other properties of atoms and molecules. In this paper we outline the basics of the theory, discuss some computational aspects, and review work that has been done toward developing and implementing algorithms for CC methods on parallel computers. (C) 2000 Elsevier Science B.V. All rights reserved.
The detection of circles in images is an important task in many computer vision applications. When the three parameters (center coordinates and radius) of a circle are quantized into O(n) values each, a sequential alg...
详细信息
The detection of circles in images is an important task in many computer vision applications. When the three parameters (center coordinates and radius) of a circle are quantized into O(n) values each, a sequential algorithm using the Hough transform runs with a time complexity of O(n4), where n x n is the size of the image. When information about the gradient direction is also used, the complexity of the sequential algorithm reduces to O(n3). This paper proposes three parallel algorithms for circle detection on an n x n mesh of processing elements operating in the SIMD mode. The first two algorithms use the Hough transform and the third is based on a tracing algorithm. The first algorithm uses only the gradient magnitude and takes O(n3) time. The second uses both the gradient magnitude and gradient direction and runs in O(n2) time. The third method uses a midpoint circle scan conversion algorithm and runs with a complexity of O(n2). This algorithm is the most efficient of the three. It does not use the gradient direction and offers an improvement of O(n2) over its sequential counterpart that runs in O(n4) time. When implemented with a table look-up operation, this algorithm has a low proportionality constant and offers a significant improvement in computational speed.
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Re cent research has identified several prom...
详细信息
The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Re cent research has identified several promising algorithms for implementing this method on massively parallel computers. however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere: considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but we also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional fast Fourier transforms (FFTs) and other parallel transforms.
The solution of linear systems continues to play an important role in scientific computing. The problems to be solved often are of very large size, so that solving them requires large computer resources. To solve thes...
详细信息
The solution of linear systems continues to play an important role in scientific computing. The problems to be solved often are of very large size, so that solving them requires large computer resources. To solve these problems, at least supercomputers with large shared memory or massive parallel computer systems with distributed memory are needed. This paper gives a survey of research on parallel implementation of various direct methods to solve dense linear systems. In particular are considered: Gaussian elimination, Gauss-Jordan elimination and a variant due to Huard (1979), and an algorithm due to Enright (1978), designed in relation to solving (stiff) ODEs, such that stepsize and other method parameters can easily be varied. Some theoretical results are mentioned, including a new result on error analysis of Huard's algorithm. Moreover, practical considerations and results of experiments on supercomputers and on a distributed-memory computer system are presented.
The need to deal with imprecise and vague information in ontologies is rising in importance, as required by several real-world application domains. As a consequence, there is a growing interest in fuzzy ontologies, wh...
详细信息
The need to deal with imprecise and vague information in ontologies is rising in importance, as required by several real-world application domains. As a consequence, there is a growing interest in fuzzy ontologies, which combine ontologies and fuzzy logic theory. In fuzzy ontologies, some reasoning tasks usually become harder to solve, such as the concept subsumption problem and the computation of the Best Degree Bound (BDB) of an axiom. In fact, the current existing algorithms to solve these problems usually require performing some simpler tests several times. In this paper, we present a parallelization of these algorithms, implemented in the DeLorean reasoner, and discuss the encouraging results of an empirical evaluation.
The Multi-Splitting (MS) iterative method, designed exclusively for multiprocessor environments, is considered for the solution of large systems of linear equations. A general parallel algorithm is devised and impleme...
详细信息
The Multi-Splitting (MS) iterative method, designed exclusively for multiprocessor environments, is considered for the solution of large systems of linear equations. A general parallel algorithm is devised and implemented on a modular two-level parallel architecture, which utilizes the systolic arrays as building blocks, to demonstrate the point iteration. A particular three-term member of the MS family is applied, for the parallel block iterative solution, on the Poisson's equation discretized by the collocation method.
We present two parallel algorithms for finding all the roots of an N-degree polynomial equation on an efficient model of Optoelectronic Transpose Interconnection System (OTIS), called OTIS-2D torus. The parallel algor...
详细信息
We present two parallel algorithms for finding all the roots of an N-degree polynomial equation on an efficient model of Optoelectronic Transpose Interconnection System (OTIS), called OTIS-2D torus. The parallel algorithms are based on the iterative schemes of Durand-Kerner and Ehrlich methods. We show that the algorithm for the Durand-Kerner method requires (N (0.75)+0.5N (0.25)-1) electronic moves + 2(N (0.5)-1) OTIS moves using N processors. The parallel algorithm for Ehrlich method is shown to run in (N (0.75)+0.5N (0.25)-1) electronic moves + 2(N (0.5)-1) OTIS moves with the same number of processors. The algorithms have lower AT cost than the algorithms presented in Jana (parallel Comput 32:301-312, 2006). The scalability of the algorithms is also discussed.
In this paper, we introduce and evaluate the parallel implementations of two video sequences decorrelation algorithms having been developed based on the non-alternating three-dimensional wavelet transform (3D-WT) and ...
详细信息
In this paper, we introduce and evaluate the parallel implementations of two video sequences decorrelation algorithms having been developed based on the non-alternating three-dimensional wavelet transform (3D-WT) and the temporal-window method. The proposed algorithms have been proven to outperform the classic 3D-WT algorithm in terms of a better coding efficiency and lower computational requirements while enabling a lossless coding and a top-quality reconstruction: the two most highly relevant features to medical imaging applications. The parallel implementations of the algorithms are developed and tested on a shared memory system, a SGI Origin 3800 supercomputer, making use of a message-passing paradigm. We evaluate and analyze the performance of the implementations in terms of the response time and speed-up factor by varying the number of processors and various video coding parameters. The key point enabling the development of highly efficient implementations rely on a workload distribution strategy supplemented by the use of parallel I/O primitives, for better exploiting the inherent features of the application and computing platform. Two sets of I/O primitives are tested and evaluated: the ones provided by the C compiler and the ones belonging to the MPI/IO library.
The increasing computational load required by most applications and the limits in hardware performances affecting scientific computing contributed in the last decades to the development of parallel software and archit...
详细信息
The increasing computational load required by most applications and the limits in hardware performances affecting scientific computing contributed in the last decades to the development of parallel software and architectures. In fluid-structure interaction (FSI) for haemodynamic applications, parallelization and scalability are key issues (see [L. Formaggia, A. Quarteroni, and A. Veneziani, eds., Cardiovascular Mathematics: Modeling and Simulation of the Circulatory System, Modeling, Simulation and Applications 1, Springer, Milan, 2009]). In this work we introduce a class of parallel preconditioners for the FSI problem obtained by exploiting the block-structure of the linear system. We stress the possibility of extending the approach to a general linear system with a block-structure, then we provide a bound in the condition number of the preconditioned system in terms of the conditioning of the preconditioned diagonal blocks, and finally we show that the construction and evaluation of the devised preconditioner is modular. The preconditioners are tested on a benchmark three-dimensional (3D) geometry discretized in both a coarse and a fine mesh, as well as on two physiological aorta geometries. The simulations that we have performed show an advantage in using the block preconditioners introduced and confirm our theoretical results.
We present parallel algorithms for the following four operations on red-black trees: construction, search, insertion, and deletion. Our parallel algorithm for constructing a red-black tree from a sorted list of n item...
详细信息
We present parallel algorithms for the following four operations on red-black trees: construction, search, insertion, and deletion. Our parallel algorithm for constructing a red-black tree from a sorted list of n items runs in O(1) time with n processors on the CRCW PRAM and runs in O(loglogn) time with n/loglogn processors on the EREW PRAM. Our construction algorithm does not require the assumptions that previous construction algorithms used. Each of our parallel algorithms for search, insertion, and deletion in red-black trees runs in O(logn + logk) time with k processors on the EREW PRAM, where k is the number of unsorted items to search for, insert, or delete and n is the number of nodes in a red-black tree. (C) 2001 Elsevier Science B.V. All rights reserved.
暂无评论