This paper describes the parallelisation of a state estimator with confidence limit analysis. State estimation involves the optimal fitting of an overdetermined set of measurements to the corresponding values calculated from the mathematical model of the system. The inaccuracies associated with measurements lead to discrepancies within the state estimate. Consequently, for the state estimation algorithm to be of practical use, it needs to quantify the effect of these discrepancies in the form of state confidence limits [2]. However, the quasi-quadratic numerical complexity of the state estimation algorithms suggests a need for a parallel implementation of the probabilistic state estimation, so that real-time performance can be maintained for large-scale systems as well. The algorithm is based on the idea of ‘tearing’ the original system into subsystems and then coordinating the resulting subsystem solutions. The algorithm has been tested in the context of water distribution systems state estimation.
We present two fundamentally different approaches to detect collisions between two point clouds and compare their performance on multiple datasets. A collision between points happens if they are closer to each other than a given threshold radius. One approach uses the main CPU with a k-d tree data structure to efficiently carry out fixed-range searches around points in 3D, while the other mainly executes on a GPU using a regular grid decomposition technique implemented in the CUDA framework. We show that massively parallel 3D range searches on a grid-based data structure on a GPU perform similarly well to a tree-based approach on the CPU with orders of magnitude less parallelization. We also show how each method scales with varying input sizes and how their performance depends on the spatial structure of the input data. (C) 2016, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.
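The grid-based range search described in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' CUDA implementation: the function name `grid_collisions`, the choice of a cell edge equal to the threshold radius, and the hash-map grid are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor


def grid_collisions(cloud_a, cloud_b, radius):
    """Return index pairs (i, j) with |cloud_a[i] - cloud_b[j]| < radius.

    Hypothetical helper: bins cloud_b into cubic cells of edge `radius`,
    so every candidate for a query point lies in its 3x3x3 cell block.
    """
    def cell(point):
        return tuple(floor(c / radius) for c in point)

    # Build the regular grid over the second cloud.
    grid = defaultdict(list)
    for j, q in enumerate(cloud_b):
        grid[cell(q)].append(j)

    r2 = radius * radius
    hits = []
    for i, p in enumerate(cloud_a):
        cx, cy, cz = cell(p)
        # Only the 27 neighbouring cells can hold points closer than radius.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                q = cloud_b[j]
                if sum((a - b) ** 2 for a, b in zip(p, q)) < r2:
                    hits.append((i, j))
    return hits


cloud_a = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
cloud_b = [(0.5, 0.0, 0.0), (10.0, 10.0, 10.0), (5.0, 5.0, 5.9)]
print(sorted(grid_collisions(cloud_a, cloud_b, 1.0)))
```

Each query point is processed independently of the others, which is the property that makes this kind of search a natural fit for one-thread-per-point execution on a GPU.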
In this paper we show that by using path following interior point methods with nonlogarithmic potential functions that vary inversely with the beta-th power of the distances from the hyperplane (with beta = O(log v)), it is possible to obtain an approximate bipartite matching with the number of edges within a factor of (1 - 1/p) of that in the optimal matching for arbitrarily specified p in O*(p) matrix inversions (O*(X) = O(X log^k n), i.e., we ignore logarithmic factors of n in stating most bounds in this paper). At present the best-known logarithmic time parallel algorithm for finding an approximate matching is that for finding a maximal matching that contains at least half of the edges in the optimal matching by Karp and Wigderson [J. ACM, 32 (1985), pp. 762-773]. By combining the approximate matching algorithm discussed in this paper with an augmenting path algorithm it is possible to derive the optimal matching in O*(v^(1/2)) time. The previous fastest parallel algorithms for general bipartite graphs are those by Vaidya [Proc. 22nd Ann. ACM Symp. Theory Computing, 1990, pp. 583-589], which runs in O*((ve)^(1/4)) time, and that by Goldberg, Plotkin, and Vaidya [Proc. 29th IEEE Symp. Foundations of Computer Science, 1990, pp. 175-185], which obtains solutions in O*(v^(2/3)) time.
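The statement that a maximal matching contains at least half of the edges of an optimal matching can be seen on a small example. The sketch below is a plain sequential greedy procedure, not the parallel algorithm of Karp and Wigderson and not the interior point method of the paper. The function name `greedy_maximal_matching` and the example graph are assumptions made for illustration.

```python
def greedy_maximal_matching(edges):
    """Scan edges once and keep every edge whose endpoints are both free.

    The result is maximal (no remaining edge can be added), which
    guarantees at least half the size of a maximum matching.
    """
    used_left, used_right = set(), set()
    matching = []
    for u, v in edges:
        if u not in used_left and v not in used_right:
            matching.append((u, v))
            used_left.add(u)
            used_right.add(v)
    return matching


# Bipartite graph with left vertices a1, a2 and right vertices b1, b2.
edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
maximal = greedy_maximal_matching(edges)
optimal = [("a1", "b2"), ("a2", "b1")]  # a maximum matching, found by inspection
print(len(maximal), len(optimal))
```

With this edge order the greedy scan keeps only ("a1", "b1"), exactly half the size of the optimal matching, so the factor of one half is attained on this instance.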
Efficient algorithms for computing triangular decompositions of Hermitian matrices with small displacement rank using hyperbolic Householder matrices are derived. These algorithms can be both vectorized and parallelized. Implementations along with performance results on an Alliant FX/80, Cray X-MP/48, and Cray-2 are discussed. The use of Householder-type transformations is shown to improve performance for problems with nontrivial displacement ranks. In special cases, the general algorithm reduces to the well-known Schur algorithm for factoring Toeplitz matrices and Elden’s algorithm for solving structured regularization problems. It gives a Householder formulation to the class of algorithms based on hyperbolic rotations studied by Kailath, Lev-Ari, Chun, and their colleagues for Hermitian matrices with small displacement structure. In addition, an extension to the efficient factorization of indefinite systems is described.
We solve an optimal control problem for controlled parabolic Ito equations by a stochastic quasigradient method. Because the numerical solution of such problems requires a large amount of computation time, we investigate the parallelization of the algorithm. We distribute the computations of space stages over several processor nodes of a parallel computer. We obtain an efficient algorithm with low communication cost by using a ring topology.
Developing parallel codes for computing the nonlinear flow in multiaquifer porous systems is an important task both for improving model efficiency and for performing large real-life simulations. Multiaquifer systems consist of alternating sandy and clayey layers. In this paper, highly compressible multiaquifer systems are considered, where some hydraulic parameters depend on the potential head, so that the flow inside some layers is governed by nonlinear equations. An effective procedure for solving these equations is developed, relying upon the partition of the solution procedure into layer-wise steps. By assigning to each processor the computation of the flow inside a suitable set of layers, the iterative solution procedure can be efficiently implemented on a parallel supercomputer. Using such a domain decomposition strategy, a satisfactory degree of parallelization is achieved when computing the flow in a realistic nonlinear multiaquifer system, employing a CRAY T3D massively parallel computer. In test simulations on real-life multiaquifer systems, the recorded speed-ups are as large as 1.89, 3.34, and 5.37 with 2, 4, and 8 processors, respectively. The importance of load balance and information exchange in determining the parallel performance of the code is also analyzed.
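The speed-ups quoted in this abstract can be converted into parallel efficiencies using the standard definition, efficiency = speed-up / number of processors. The short sketch below does that arithmetic. The function name `parallel_efficiency` is an assumption made for illustration, and the abstract itself does not report efficiencies.

```python
def parallel_efficiency(speedups):
    """Map each processor count p to the efficiency S(p) / p."""
    return {p: s / p for p, s in speedups.items()}


# Speed-ups quoted in the abstract for 2, 4 and 8 processors.
reported = {2: 1.89, 4: 3.34, 8: 5.37}
for p, e in parallel_efficiency(reported).items():
    print(f"{p} processors: efficiency {e:.3f}")
```

Under that definition the efficiencies are 0.945, 0.835 and about 0.671, a decline that is consistent with the abstract's remark on the role of load balance and information exchange.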
An efficient parallel approach for the computation of the eigenvalue of smallest absolute magnitude of sparse real and complex matrices is provided. The proposed strategy tries to improve the efficiency of the reverse power method. At each inverse power iteration the linear system is solved either by the conjugate gradient scheme (symmetric case) or by the Bi-CGSTAB method (nonsymmetric case). Both solvers are preconditioned employing the approximate inverse factorization and thus are easily parallelized. The satisfactory speed-ups obtained on the CRAY T3E supercomputer show the high degree of parallelization reached by the proposed algorithm.
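The combination of inverse power iteration with an inner conjugate gradient solve can be sketched for a small symmetric positive definite matrix. This is a sequential, unpreconditioned illustration under those assumptions, not the authors' parallel code: it omits the approximate inverse preconditioner and the Bi-CGSTAB branch, uses dense lists instead of sparse storage, and all function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def conjugate_gradient(A, b, tol=1e-12, max_iter=200):
    """Solve A x = b for symmetric positive definite A (no preconditioner)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def smallest_eigenvalue(A, iters=50):
    """Inverse power iteration: repeatedly solve A y = x and normalise y."""
    x = [1.0] + [0.0] * (len(A) - 1)   # assumed start vector
    for _ in range(iters):
        y = conjugate_gradient(A, x)
        norm = dot(y, y) ** 0.5
        x = [yi / norm for yi in y]
    return dot(x, matvec(A, x))        # Rayleigh quotient of the unit vector


A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
print(smallest_eigenvalue(A))
```

For this tridiagonal test matrix the eigenvalues are 2 - sqrt(2), 2 and 2 + sqrt(2), so the iteration should approach 2 - sqrt(2), roughly 0.5858. The matrix-vector products and dot products inside the solver are the operations a parallel implementation would distribute.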
This paper presents a framework of using resource metrics to characterize the various models of parallel computation. Our framework reflects the approach of recent models to abstract architectural details into several generic parameters, which we call resource metrics. We examine the different resource metrics chosen by different parallel models, categorizing the models into four classes: the basic synchronous models, and extensions of the basic models which more accurately reflect practical machines by incorporating notions of asynchrony, communication cost, and memory hierarchy. We then present a new parallel computation model, the LogP-HMM model, as an illustration of design principles based on the framework of resource metrics. The LogP-HMM model extends an existing parameterized network model (LogP) with a sequential hierarchical memory model (HMM) characterizing each processor. The result captures both network communication costs and the effects of multileveled memory such as local cache and I/O. More generally, the LogP-HMM is representative of a class of models formed by combining a network model with any of several existing hierarchical memory models. Along these lines we introduce a variant of the LogP-HMM model, the LogP-UMH, which combines the LogP with the Universal Memory Hierarchy (UMH) model. We examine the potential utility of both our models in the design of several near optimal FFT and sorting algorithms. We also examine the potential of the LogP-UMH to more accurately reflect parallel machines by matching the model to the CM-5 and IBM SP2.
A 4-subiteration parallel thinning algorithm, based on 3×3 operations, is proposed. It is shown that by taking into account bidirectional compression in each subiteration, pixels belonging to a pair of successive contours, a 4-contour and an 8-contour, are removed from the pattern in every iteration. Therefore, contour pixel removal proceeds towards the inner part of the pattern according to the octagonal metric. This provides a resulting medial line which is centered in the pattern in a quasi-Euclidean sense and is less sensitive to pattern rotation. The performance of the algorithm is discussed and compared with that of some well-known parallel algorithms.
Deadlock prevention for routing messages has a central role in communication networks, since it directly influences the correctness of parallel and distributed systems. In this paper, we extend some of the computational results presented in Second Colloquium on Structural Information and Communication Complexity (SIROCCO), Carleton University Press, 1995, pp. 1-12 on acyclic orientations for the determination of optimal deadlock-free routing schemes. In this context, minimizing the number of buffers needed to prevent deadlocks for a set of communication requests is related to finding an acyclic orientation of the network which minimizes the maximum number of changes of orientations on the dipaths realizing the communication requests. The corresponding value is called the rank of the set of dipaths. We first show that the problem of minimizing the rank is NP-hard if all shortest paths between the couples of nodes wishing to communicate have to be represented, and even not approximable if only one shortest path between each couple has to be represented. This last result holds even if we allow an error which is any sublinear function in the number of couples to be connected. We then improve some of the known lower and upper bounds on the rank of all possible shortest dipaths between any couple of vertices for particular topologies, such as grids and hypercubes, and we find tight results for tori. (C) 2002 Elsevier Science B.V. All rights reserved.