We propose two parallel algorithms for the rounding exact evaluation of sums of products. The first method transforms products to sums and applies one of the known methods for rounding exact summation in time complexi...
详细信息
We propose two parallel algorithms for the rounding exact evaluation of sums of products. The first method transforms products to sums and applies one of the known methods for rounding exact summation in time complexity O( n 2 ) with n processors ( n denoting the “length” of the expression). The second method approximates the products as well as the sum and has average time complexity O( ld ( n )) for n /2 processors and has average time complexity O( n ) viewed as a sequential algorithm.
A maximum a posteriori (MAP) algorithm is presented for the estimation of spin-density and spin-spin decay distributions from frequency and phase-encoded magnetic resonance imaging data. Linear spatial localization gr...
详细信息
A maximum a posteriori (MAP) algorithm is presented for the estimation of spin-density and spin-spin decay distributions from frequency and phase-encoded magnetic resonance imaging data. Linear spatial localization gradients are assumed: the y-encode gradient applied during the phase preparation time of duration tau before measurement collection, and the x-encode gradient applied during the full data collection time t greater than or equal to 0, The MRT signal model developed in [22] is used in which a signal resulting from M phase encodes (rows) and N frequency encode dimensions (columns) is modeled as a superposition of MN sine-modulated exponentially decaying sinusoids with unknown spin-density and spin-spin decay parameters, The nonlinear least-squares MAP estimate of the spin density and spin-spin decay distributions solves for the 2MN spin-density and decay parameters minimizing the squared-error between the measured data and the sine-modulated exponentially decay signal model using an iterative expectation-maximization algorithm. A covariance diagonalizing transformation is derived which decouples the joint estimation of MN sinusoids into M separate N sinusoid optimizations, yielding an order of magnitude speed up in convergence, The MAP solutions are demonstrated to deliver a decrease in standard deviation of image parameter estimates on brain phantom data of greater than a factor of two over Fourier-based estimators of the spin density and spin-spin decay distributions. A parallel processor implementation is demonstrated which maps the N sinusoid coupled minimization to separate individual simple minimizations, one for each processor.
Computational methods based on the use of adaptively constructed nonuniform meshes reduce the amount of computation and storage necessary to perform many scientific calculations. The adaptive construction of such nonu...
详细信息
Computational methods based on the use of adaptively constructed nonuniform meshes reduce the amount of computation and storage necessary to perform many scientific calculations. The adaptive construction of such nonuniform meshes is an important part of these methods. In this paper, we present a parallel algorithm for adaptive mesh refinement that is suitable for implementation on distributed-memory parallel computers. Experimental results obtained on the Intel DELTA are presented to demonstrate that for scientific computations involving the finite element method, the algorithm exhibits scalable performance and has a small run time in comparison with other aspects of the scientific computations examined. It is also shown that the algorithm has a fast expected running time under the parallel random access machine (PRAM) computation model.
In this paper four parallel algorithms for the evaluation of finite series of orthogonal polynomials are introduced. The algorithms are based on the Forsythe and Clenshaw sequential algorithms. Several tests carried o...
详细信息
In this paper four parallel algorithms for the evaluation of finite series of orthogonal polynomials are introduced. The algorithms are based on the Forsythe and Clenshaw sequential algorithms. Several tests carried out on a Cray T3D are presented.
With the development of roof photovoltaic power (PV) generation technology and the increasingly urgent need to improve supply reliability levels in remote areas, islanded microgrid with photovoltaic and energy storage...
详细信息
With the development of roof photovoltaic power (PV) generation technology and the increasingly urgent need to improve supply reliability levels in remote areas, islanded microgrid with photovoltaic and energy storage systems (IMPE) is developing rapidly. The high costs of photovoltaic panel material and energy storage battery material have become the primary factors that hinder the development of IMPE. The advantages and disadvantages of different types of photovoltaic panel materials and energy storage battery materials are analyzed in this paper, and guidance is provided on material selection for IMPE planners. The time sequential simulation method is applied to optimize material demands of the IMPE. The model is solved by parallel algorithms that are provided by a commercial solver named CPLEX. Finally, to verify the model, an actual IMPE is selected as a case system. Simulation results on the case system indicate that the optimization model and corresponding algorithm is feasible. Guidance for material selection and quantity demand for IMPEs in remote areas is provided by this method. (C) 2016 Elsevier B.V. All rights reserved.
We give the first efficient parallel algorithms for solving the arrangement problem. We give a deterministic algorithm for the CREW PRAM which runs in nearly optimal bounds of O (log n log* n) time and n(2)/log n proc...
详细信息
We give the first efficient parallel algorithms for solving the arrangement problem. We give a deterministic algorithm for the CREW PRAM which runs in nearly optimal bounds of O (log n log* n) time and n(2)/log n processors. We generalize this to obtain an O (log n log* n)-time algorithm using n(d)/log n processors for solving the problem in d dimensions. We also give a randomized algorithm for the EREW PRAM that constructs an arrangement of n lines on-line, in which each insertion is done in optimal O (log n) time using n/log n processors. Our algorithms develop new parallel data structures and new methods for traversing an arrangement.
Three commonly used traversal methods for binary trees (forsets) are pre-order, in-order and post-order. It is well known that sequential algorithms for these traversals takes order O( N ) time where N is the total nu...
详细信息
Three commonly used traversal methods for binary trees (forsets) are pre-order, in-order and post-order. It is well known that sequential algorithms for these traversals takes order O( N ) time where N is the total number of nodes. This paper establishes a one-to-one correspondence between the set of nodes that possess right sibling and the set of leaf nodes for any forest. For the case of pre-order traversal, this result is shown to provide an alternate characterization that leads to a simple and elegant parallel algorithm of time complexity O(log N ) with or without read-conflicts on an N processor SIMD shared memory model, where N is the total number of nodes in a forest.
Random networks are widely used for modeling and analyzing complex processes. Many mathematical models have been proposed to capture diverse real-world networks. One of the most important aspects of these models is de...
详细信息
Random networks are widely used for modeling and analyzing complex processes. Many mathematical models have been proposed to capture diverse real-world networks. One of the most important aspects of these models is degree distribution. Chung-Lu (CL) model is a random network model, which can produce networks with any given arbitrary degree distribution. The complex systems we deal with nowadays are growing larger and more diverse than ever. Generating random networks with any given degree distribution consisting of billions of nodes and edges or more has become a necessity, which requires efficient and parallel algorithms. We present an MPI-based distributed memory parallel algorithm for generating massive random networks using CL model, which takes time with high probability and O(n) space per processor, where n, m, and P are the number of nodes, edges and processors, respectively. The time efficiency is achieved by using a novel load-balancing algorithm. Our algorithms scale very well to a large number of processors and can generate massive power-law networks with one billion nodes and 250 billion edges in one minute using 1024 processors.
In this work, we present the numerical results (using C++) obtained from seven different versions of the LU decomposition algorithms. Four of the algorithms use Crout-like reduction and three of the algorithms use Doo...
详细信息
In this work, we present the numerical results (using C++) obtained from seven different versions of the LU decomposition algorithms. Four of the algorithms use Crout-like reduction and three of the algorithms use Doolittle-like reduction. (C) 2004 Elsevier Inc. All rights reserved.
The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinational element has bound...
详细信息
The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinational element has bounded fan-in and fan-out and can be evaluated in constant time. This problem is easily solved on an ordinary serial computer in O(W + D) time, where W is the number of elements in the altered subcircuit and D is the subcircuit's embedded depth (its depth measured in the original circuit). In this paper we show how to solve the circuit value update problem efficiently on a P-processor parallel computer. We give a straightforward synchronous, parallel algorithm that runs in O(W/P + D1g P) expected time. Our main contribution, however, is an optimistic, asynchronous, parallel algorithm that runs in O(W/P + D + 1g W + 1g P) expected time, where W and D are the size and embedded depth, respectively, of the ''volatile'' subcircuit, the subcircuit of elements that have inputs which either change or glitch as a result of the update. To our knowledge, our analysis provides the first analytical bounds on the running time of an optimistic, asynchronous, parallel algorithm.
暂无评论