A parallel algorithm for Kalman filtering with contaminated observations is developed. The algorithm is suited to parallel computer implementation, so dynamic linear systems with a large number of state variables can be treated in a robust, recursive way. The implementation is based on the square-root version of the Kalman filter. It is a great improvement over serial implementations: it drastically reduces the computational cost of each state update and avoids numerical instability problems.
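To make "square-root version" concrete, here is a minimal sketch of one scalar measurement update in Potter's square-root form, where the covariance is carried as a factor S with P = S Sᵀ. The abstract does not say which square-root form the authors use, so Potter's algorithm is an assumption. The optional `huber_c` threshold, which inflates the noise variance r for observations with large normalized innovations, is a hypothetical way of handling contaminated data, not the paper's method. The time update and the parallel distribution of the work are not shown.

```python
import numpy as np


def potter_update(x, S, h, z, r, huber_c=None):
    """Scalar measurement update of a square-root Kalman filter (Potter's form).

    x: state estimate (n,), S: square-root factor with P = S @ S.T,
    h: measurement row (n,), z: scalar measurement, r: noise variance.
    huber_c: optional threshold on the normalized innovation; if given,
    r is inflated for large innovations (hypothetical robustification).
    """
    phi = S.T @ h                          # phi = S^T h^T
    innov = z - h @ x                      # innovation
    if huber_c is not None:
        e = abs(innov) / np.sqrt(phi @ phi + r)   # normalized innovation
        if e > huber_c:
            r = r * (e / huber_c)          # down-weight a suspected outlier
    alpha = 1.0 / (phi @ phi + r)
    gamma = alpha / (1.0 + np.sqrt(alpha * r))
    K = alpha * (S @ phi)                  # Kalman gain = P h^T / (h P h^T + r)
    x_new = x + K * innov
    S_new = S - gamma * np.outer(S @ phi, phi)   # S (I - gamma phi phi^T)
    return x_new, S_new
```

Because only S is updated, P = S Sᵀ stays symmetric and positive semidefinite by construction, which is the usual reason square-root filters avoid the numerical instability of the covariance form.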
For scientific codes to achieve good performance on computers with hierarchical memories, the ratio of memory references to arithmetic operations must be low. In this paper, we show that Level 3 BLAS linear algebra kernels can be used to satisfy this requirement and produce an efficient implementation of a parallel finite element solver on a shared-memory parallel computer with a fast cache memory.
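The reason Level 3 kernels help can be seen in a blocked matrix multiply. The sketch below is illustrative only (NumPy's `@` already calls an optimized BLAS); the tile size `bs` is a hypothetical tuning parameter that would be chosen so three tiles fit in cache. Each tile update performs about 2·bs³ flops on 3·bs² values, so the flop-to-memory-reference ratio grows like bs, whereas a Level 1 operation such as axpy does a constant amount of work per value loaded.

```python
import numpy as np


def blocked_matmul(A, B, bs=64):
    """C = A @ B computed tile by tile.

    Each update C_ij += A_ik @ B_kj is a small matrix-matrix (Level 3)
    product: about 2*bs**3 flops on 3*bs**2 values, so every value loaded
    into cache is reused roughly 2*bs/3 times.
    """
    m, kdim = A.shape
    kdim2, n = B.shape
    if kdim != kdim2:
        raise ValueError("inner dimensions do not match")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, bs):
        for j in range(0, n, bs):
            for k in range(0, kdim, bs):
                # NumPy slicing clips at the array edge, so ragged tiles are fine
                C[i:i + bs, j:j + bs] += A[i:i + bs, k:k + bs] @ B[k:k + bs, j:j + bs]
    return C
```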
Nowadays, many computing facilities consist of a network of general-purpose workstations. The aim of this paper is to show how to combine the available resources of such a network to deal efficiently with time-consuming image processing algorithms. It is shown how to distribute the processes using a specialized library, PVM (Parallel Virtual Machine). As an example, the LBG algorithm for codebook optimization is revisited so that the process can be distributed efficiently. A major point has been to minimize the communication bandwidth required between the processors. Some adaptations are proposed to better synchronize processors of different speeds (load balancing). An implementation that makes the process robust against failures is also described.
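One common way to keep communication low when distributing a codebook-optimization loop is sketched below, under assumptions: each worker holds a fixed block of training vectors, runs the nearest-codeword step locally, and returns only per-codeword sums, counts and distortion, so traffic per iteration is proportional to the codebook size rather than the data size. The function names are hypothetical, the workers are simulated by a list comprehension, and no PVM calls are shown; this is not the paper's implementation.

```python
import numpy as np


def local_stats(block, codebook):
    """Worker side: assign each local vector to its nearest codeword.

    Returns per-codeword sums and counts plus the local distortion; this
    is all that has to be sent back to the master.
    """
    d2 = ((block[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    K, dim = codebook.shape
    sums = np.zeros((K, dim))
    np.add.at(sums, nearest, block)          # accumulate vectors per cell
    counts = np.bincount(nearest, minlength=K)
    return sums, counts, d2[np.arange(len(block)), nearest].sum()


def distributed_lloyd_step(blocks, codebook):
    """Master side: combine worker statistics into updated codewords."""
    stats = [local_stats(b, codebook) for b in blocks]   # would run in parallel
    sums = sum(s for s, _, _ in stats)
    counts = sum(c for _, c, _ in stats)
    distortion = sum(dist for _, _, dist in stats)
    new = codebook.copy()
    nonempty = counts > 0                    # empty cells keep their old codeword
    new[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new, distortion
```

Because the combined sums and counts are exactly those of a single-machine pass, the distributed step gives the same codebook as the sequential one; only the split of the data changes.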
The reconfigurable mesh captures salient features from a variety of sources, including the CAAPP, the CHiP, the polymorphic-torus network and the bus automaton. It consists of an array of processors interconnected by a reconfigurable bus system, which can be used to dynamically obtain various interconnection patterns between the processors. In this paper, we present a fast algorithm for computing the histogram of an N × N image with h grey levels in O(min{√h + log*(N/h), N}) time on an N × N reconfigurable mesh, assuming each PE has a constant amount of local memory. This algorithm runs on the PARBUS and MRN/LRN models. In addition, histogram modification can be performed in O(√h) time on the same model. A variant of our algorithm runs in O(min{√h + log log(N/h), N}) time on an N × N RMESH in which each PE has constant storage. This result improves the known time and memory bounds for histogramming on the RMESH model.
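For reference, the sketch below is a plain sequential definition of what is being computed, not the mesh algorithm: the histogram counts pixels per grey level 0..h−1, and histogram equalization is used here as one common example of histogram modification (an assumption; the abstract does not say which modification is meant). The function names are hypothetical.

```python
import numpy as np


def histogram(image, h):
    """Number of pixels at each grey level 0..h-1."""
    return np.bincount(np.asarray(image).ravel(), minlength=h)


def equalization_map(hist):
    """Histogram equalization: grey level g -> round((h - 1) * CDF(g))."""
    h = len(hist)
    cdf = np.cumsum(hist) / hist.sum()
    return np.round((h - 1) * cdf).astype(int)
```

Applying the map is then a table lookup per pixel, `equalization_map(hist)[image]`; on the mesh, the costly part is forming the h counts and their prefix sums, which is where the √h term comes from.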
This paper presents efficient algorithms for updating moving octrees with real-time performance. The first algorithm handles octrees undergoing both translation and rotation; it works efficiently by compacting source octrees into a smaller set of cubes (not necessarily standard octree cubes) as a precomputation step, and by using a fast, exact cube/cube intersection test between source octree cubes and target octree cubes. A parallel version of the algorithm is also described. Finally, the paper presents an efficient algorithm for the more limited case of octree translation only. Experimental results show the efficiency of the algorithms compared with competing algorithms. In addition to being fast, the algorithms are space efficient in that they can produce target octrees in the linear octree representation.
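An exact intersection test between a rotated source cube and an axis-aligned target cube can be built from the separating axis theorem. The sketch below is that standard technique, not necessarily the test used in the paper: two convex boxes are disjoint exactly when their projections are disjoint on one of 15 candidate axes (3 face normals of each cube and 9 edge-edge cross products). The function name and argument layout are assumptions.

```python
import itertools
import numpy as np


def cubes_intersect(c1, R1, h1, c2, R2, h2, eps=1e-12):
    """Exact overlap test for two oriented cubes (separating axis theorem).

    c: centre (3,), R: 3x3 rotation whose columns are the cube's axes,
    h: half edge length. Returns True if the cubes overlap or touch.
    """
    d = np.asarray(c2, float) - np.asarray(c1, float)
    axes = [R1[:, i] for i in range(3)] + [R2[:, j] for j in range(3)]
    axes += [np.cross(R1[:, i], R2[:, j])
             for i, j in itertools.product(range(3), repeat=2)]
    for L in axes:
        if L @ L < eps:                     # parallel edges: degenerate cross product
            continue
        r1 = h1 * np.abs(R1.T @ L).sum()    # projected radius of cube 1
        r2 = h2 * np.abs(R2.T @ L).sum()    # projected radius of cube 2
        if abs(d @ L) > r1 + r2:
            return False                    # found a separating axis
    return True
```

For the translation-only case, both cubes are axis-aligned and the test reduces to the first three axes, i.e. an interval-overlap check per coordinate.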
A Viterbi algorithm is formally modified to select the k state sequences with the highest a posteriori probabilities, where k is a prespecified positive integer. A hypercube parallel algorithm is then developed, along with a performance evaluation.
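A serial sketch of one way to make such a modification (a "list Viterbi" recursion, assumed here rather than taken from the paper): each state keeps its k best partial paths instead of one. This is exact, because a path among the k best ending in state j must extend a prefix among the k best ending in its predecessor state. Since P(observations) is a constant, ranking by joint probability equals ranking by a posteriori probability. The hypercube distribution is not shown; the loop over target states j is independent per state and would be the natural unit to spread across processors.

```python
import heapq
import numpy as np


def list_viterbi(log_pi, log_A, log_B, obs, k):
    """Return the k most probable state sequences as (log-prob, path) pairs.

    log_pi: (S,) initial, log_A: (S, S) transitions i->j, log_B: (S, M)
    emissions, obs: sequence of observation indices.
    """
    S = len(log_pi)
    lists = [[(log_pi[s] + log_B[s, obs[0]], (s,))] for s in range(S)]
    for o in obs[1:]:
        new_lists = []
        for j in range(S):                  # independent per target state
            cands = [(score + log_A[i, j] + log_B[j, o], path + (j,))
                     for i in range(S) for score, path in lists[i]]
            new_lists.append(heapq.nlargest(k, cands))
        lists = new_lists
    return heapq.nlargest(k, [c for lst in lists for c in lst])
```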
This paper presents a fast algorithm for corner detection based on the observation that the total curvature of the grey-level image is proportional to the second-order directional derivative along the edge tangent (the direction perpendicular to the edge normal), and inversely proportional to the edge strength (the norm of the edge normal). The algorithm simply takes the difference between the second tangential derivative and the edge strength, where the first term is the cornerness measure and the second is called false corner suppression. A subpixel addressing mechanism (linear interpolation) is used for intermediate pixel addressing in the differentiation step, which improves the accuracy of corner localization and reduces computational complexity. The analysis of corner dislocation leads to a subpixel implementation. The corner finder is implemented on the hybrid parallel processor PARADOX with a performance of 14 frames/s for the vision algorithm Droid.
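A minimal sketch of the cornerness measure described above, under stated assumptions: derivatives come from NumPy central differences rather than the paper's interpolated subpixel addressing, the second tangential derivative uses the standard formula I_tt = (I_y² I_xx − 2 I_x I_y I_xy + I_x² I_yy) / (I_x² + I_y²), and the weight `s` on the edge-strength (false corner suppression) term is a hypothetical parameter. Along a straight edge I_tt vanishes, so the subtracted edge strength pushes the response negative there.

```python
import numpy as np


def cornerness(img, s=0.1, eps=1e-12):
    """|second derivative along the edge tangent| - s * edge strength.

    s is a hypothetical weight for the false-corner-suppression term;
    eps avoids division by zero in flat regions.
    """
    img = np.asarray(img, float)
    Iy, Ix = np.gradient(img)               # axis 0 = rows (y), axis 1 = cols (x)
    Ixx = np.gradient(Ix, axis=1)
    Iyy = np.gradient(Iy, axis=0)
    Ixy = np.gradient(Ix, axis=0)
    g2 = Ix**2 + Iy**2                      # squared edge strength
    Itt = (Iy**2 * Ixx - 2 * Ix * Iy * Ixy + Ix**2 * Iyy) / (g2 + eps)
    return np.abs(Itt) - s * np.sqrt(g2)
```

On a synthetic bright square, the maximum response falls at the square's corners, while points in the middle of its sides score below zero.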
The order statistics problem is considered in this paper. We present a parallel algorithm to find the kth smallest (or largest) element in a set of N totally ordered (but not sorted) data items. This algorithm runs in O(log² N) expected time on a reconfigurable linear array with N processors and a constant amount of memory in each processor. We also show that this algorithm can be generalized to solve the oversized order statistics problem efficiently.
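The "expected time" bound suggests randomized pivoting. The sketch below is a sequential randomized selection, shown only as an assumed illustration of the idea, not the paper's array algorithm: each round needs a pivot broadcast plus counts of elements below and equal to it, operations a processor array can perform in parallel, and then discards the side that cannot contain the answer. The kth largest element is `select_kth(data, len(data) - k + 1)`.

```python
import random


def select_kth(data, k, rng=random):
    """k-th smallest element (1-based) by randomized pivoting."""
    items = list(data)
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    while True:
        pivot = rng.choice(items)
        less = [x for x in items if x < pivot]          # parallel compare + count
        n_eq = sum(1 for x in items if x == pivot)
        if k <= len(less):
            items = less                                # answer is below the pivot
        elif k <= len(less) + n_eq:
            return pivot                                # answer equals the pivot
        else:
            k -= len(less) + n_eq                       # answer is above the pivot
            items = [x for x in items if x > pivot]
```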
Controversy over the issue of geometric stiffening as it arises in the context of multibody dynamics revolves primarily around the "correct" methodology for incorporating the stiffening effect into dynamics formulations. The main goal of this work is to present the different approaches that have been developed for this problem through an in-depth review of several publications. The contribution is a precise understanding of the existing methods and how they relate to each other. The paper also offers some novel insights and clarifying interpretations. It concludes with a general classification and a numerical comparison of the approaches for modeling geometric stiffening in flexible-body systems.
This paper considers the automatic restructuring of loops with conditional branching for parallel processing, especially a class of loops termed "conditional cyclic loops." A conditional cyclic loop has a dependence cycle caused by conditional branching across loop iterations, which makes it difficult to parallelize. In general, parallel execution of a conditional cyclic loop provides little benefit, because a full-order nonlinear Boolean recurrence relation must be solved. In practice, however, the Boolean recurrence often takes a simpler form, in which the number of possible predicate values of the conditional branches is drastically reduced compared with a general conditional cyclic loop. These simple forms of conditional cyclic loops found in practice can be parallelized to achieve O(p/log p) speedup with p processors.
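As an assumed example of a "simpler form", suppose the branch predicate of iteration i depends only on that of iteration i−1, p_i = f_i(p_{i−1}), with each f_i a map {0,1} → {0,1}. There are only four such maps and composition is associative, so all predicates can be obtained by a parallel prefix scan over compositions in ⌈log₂ n⌉ rounds instead of n sequential steps. This is a sketch of that general technique, not the paper's restructuring method; the function names are hypothetical and the rounds are simulated serially.

```python
def compose(g, f):
    """(g o f) for maps {0,1} -> {0,1} stored as (value at 0, value at 1)."""
    return (g[f[0]], g[f[1]])


def predicates_sequential(fs, p0):
    """Reference: evaluate p_i = f_i(p_{i-1}) one iteration at a time."""
    out, p = [], p0
    for f in fs:
        p = f[p]
        out.append(p)
    return out


def predicates_scan(fs, p0):
    """Same result via an inclusive prefix scan of function compositions.

    After the round with stride `step`, pref[i] is the composition of up
    to 2*step consecutive maps ending at f_i; every update within a round
    is independent (one per processor on a parallel machine).
    """
    pref = list(fs)
    step = 1
    while step < len(pref):
        pref = [compose(pref[i], pref[i - step]) if i >= step else pref[i]
                for i in range(len(pref))]
        step *= 2
    return [F[p0] for F in pref]
```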