The rationale for and suitability of implementing model fitting algorithms on vector and parallel computers are discussed. A particular maximum-likelihood-based algorithm for fitting two-dimensional "Gaussian" peaks was investigated in detail and adapted to a system of four transputers. Analysis of the algorithm shows it to be well suited to both vectorisation and parallelisation, a result that is applicable to other fitting methods. The transputer implementation ran 5-10 times faster than the host computer, allowing the model fitting procedure to approach acceptable response times.
Interval orders are partial orders defined by having interval representations. It is well known that a transitively oriented digraph G is an interval order iff its (undirected) complement Ḡ is chordal. We investigate parallel algorithms for the following scheduling problem: Given a system consisting of a set T of n tasks (each requiring unit execution time) and an interval order ≺ over T, and given m identical parallel processors, construct an optimal (i.e., minimal length) schedule for (T, ≺). Our algorithm is based on a subroutine for computing so-called scheduling distances, i.e., the minimal number of time steps needed to schedule all those tasks succeeding some given task t and preceding some other task t1. For a given interval order with n tasks, these scheduling distances can be computed using n^3 processors and O(log^2 n) time on a CREW-PRAM. We then give an incremental version of the scheduling distance algorithm, which can be used to compute the empty slots in an optimal schedule. From these, we derive the optimal schedule, using no more resources than for the initial scheduling distance computation and considerably improving on previous work by Sunder and ***. The algorithm can also be extended to handle task systems which, in addition to interval order precedence constraints, have individual deadlines and/or release times for the tasks. Our algorithm is the first NC-algorithm for this problem. As another application, it also provides NC-algorithms for some graph problems on interval graphs (which are NP-complete in general).
A preconditioned Krylov iterative algorithm based on domain decomposition for linear systems arising from implicit finite-difference or finite-element discretizations of partial differential equation problems requiring local mesh refinement is described. To keep data structures as simple as possible for parallel computing applications, the fundamental computational unit in the algorithm is defined as a subregion of the domain spanned by a locally uniform tensor-product grid, called a tile. In the tile-based domain decomposition approach, two levels of discretization are considered at each point of the domain: a global coarse grid defined by tile vertices only, and a local fine grid where the degree of resolution can vary from tile to tile. One global level and one local level provide the flexibility required to adaptively discretize a diverse collection of problems on irregular regions and solve them at convergence rates that deteriorate only logarithmically in the finest mesh parameter, with the coarse tessellation held fixed. A logarithmic departure from optimality seems to be a reasonable compromise for the simplicity of the composite grid data structure and concomitant regular data exchange patterns in a multiprocessor environment. Some experiments with up to 1024 tiles are reported, and the evolution of the algorithm is commented on and contrasted with optimal nonrefining two-level algorithms and optimal refining multilevel algorithms. Computational comparisons with some other popular methods are presented.
The performance of hash tables is analyzed in a parallel context. Assuming that a hash table of fixed size is allocated in the shared memory of a PRAM with n processors, a Ph-step is defined as a PRAM computation in which each processor searches or inserts a key in the table. It is shown that the maximum number of table probes needed for a single key in a Ph-step is Ω(log_{1/α} n) and O(log_{1/α′} n) with high probability, where α and α′ are the load factors before and after the execution of the Ph-step. However, a clever implementation of a Ph-step is proposed, which runs in time O((log_{1/α′} n)^{1/2}) with high probability. The algorithm exploits the fact that operations relative to different keys have different durations; hence, the processors in charge of shorter operations, once finished, are used to perform part of the longer ones.
This paper presents a complete and optimal framework for extending basic graph planning to operate in partitioned problem spaces. These spaces typically occur in systems that implement a hierarchy or contain data of various resolutions. An algorithm for the framework will be presented along with a proof of optimality. Finally, an example implementation for mobile robot path planning will be discussed.
The solutions to a scalar, homogeneous, constant-coefficient, linear recurrence are expressible in terms of the powers of a companion matrix. We show how to compute these powers efficiently via polynomial multiplication. The result is a simple expression for the solution, which does not involve the characteristic roots and which is valid for any module over any commutative ring. The formula yields the nth term of the solution to a kth order recurrence with O(μ(k)
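The idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's formula: it assumes the recurrence a_n = c_1 a_{n-1} + ... + c_k a_{n-k}, computes x^n modulo the characteristic polynomial by repeated squaring, and reads the nth term off the remainder. The function names `poly_mul_mod` and `nth_term` are hypothetical, and the schoolbook multiplication used here costs O(k^2) ring operations per product rather than whatever bound the truncated sentence states.

```python
def poly_mul_mod(a, b, coeffs):
    """Multiply polynomials a and b (length k, lowest degree first) and reduce
    modulo x^k - coeffs[0]*x^(k-1) - ... - coeffs[k-1]."""
    k = len(coeffs)
    prod = [0] * (2 * k - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    # Rewrite x^d (d >= k) as coeffs[0]*x^(d-1) + ... + coeffs[k-1]*x^(d-k),
    # working from the highest degree downwards.
    for d in range(2 * k - 2, k - 1, -1):
        top = prod[d]
        prod[d] = 0
        for j, c in enumerate(coeffs):
            prod[d - 1 - j] += top * c
    return prod[:k]


def nth_term(coeffs, init, n):
    """Return a_n given coeffs = [c_1, ..., c_k] and init = [a_0, ..., a_{k-1}].
    Only ring operations (+, *) are used; no characteristic roots are needed."""
    k = len(coeffs)
    result = [1] + [0] * (k - 1)  # the polynomial 1
    # The polynomial x, already reduced when k == 1.
    base = [coeffs[0]] if k == 1 else [0, 1] + [0] * (k - 2)
    while n > 0:
        if n & 1:
            result = poly_mul_mod(result, base, coeffs)
        base = poly_mul_mod(base, base, coeffs)
        n >>= 1
    # x^n = sum(r_i * x^i) modulo the characteristic polynomial,
    # hence a_n = sum(r_i * a_i).
    return sum(r * a for r, a in zip(result, init))


fib_10 = nth_term([1, 1], [0, 1], 10)  # Fibonacci numbers with a_0 = 0, a_1 = 1
```

Because the sketch uses only addition and multiplication of coefficients, the same code works unchanged for any Python type that supports those operations, for example `fractions.Fraction`.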
In this paper, we present an efficient parallel algorithm for computing the visibility region for a point in a plane among a set of non-intersecting segments. The algorithm is based on the cascading divide-and-conquer technique and uses merge path to distribute the workload evenly between processors. We implemented the algorithm on NVIDIA's CUDA platform, where it achieved a speedup of up to 76x with respect to the serial CPU version.
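The load-balancing step mentioned above can be sketched in isolation. The following is a generic merge-path partition of two sorted lists, not the visibility algorithm itself and not the authors' CUDA code: each worker binary-searches the point where its cross diagonal meets the merge path and then merges an equal-sized slice independently. The names `merge_path_split` and `partitioned_merge` are hypothetical, and the workers are simulated by a sequential loop.

```python
import heapq


def merge_path_split(a, b, diag):
    """Return (i, j) with i + j == diag such that a[:i] and b[:j] together
    hold the first diag elements of the stable merge of sorted lists a and b."""
    lo = max(0, diag - len(b))
    hi = min(diag, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = diag - i
        # a[i] must precede b[j-1] (ties go to a), so take more from a.
        if a[i] <= b[j - 1]:
            lo = i + 1
        else:
            hi = i
    return lo, diag - lo


def partitioned_merge(a, b, workers):
    """Merge a and b by cutting the output into `workers` near-equal slices.
    Each slice depends only on its own two split points, so on a parallel
    machine the loop body could run concurrently."""
    total = len(a) + len(b)
    bounds = [merge_path_split(a, b, total * w // workers)
              for w in range(workers + 1)]
    out = []
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        out.extend(heapq.merge(a[i0:i1], b[j0:j1]))
    return out


merged = partitioned_merge([1, 3, 5, 7, 9], [2, 4, 6, 8], workers=3)
```

Every slice produces the same number of output elements (up to rounding) regardless of how the inputs interleave, which is the property that makes the scheme attractive for evenly loading GPU threads.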
Accurate simulations of real-life electromagnetic problems with integral equations require the solution of dense matrix equations involving millions of unknowns. Solutions of these extremely large problems cannot be easily achieved, even when using the most powerful computers with state-of-the-art technology. Hence, many electromagnetic problems in the literature have been solved by resorting to various approximation techniques, without controllable error. In this paper, we present full-wave solutions of scattering problems discretized with hundreds of millions of unknowns by employing a parallel implementation of the Multilevel Fast Multipole Algorithm. Various examples involving canonical and complicated objects, including scatterers larger than 1000 lambda, are presented, in order to demonstrate the feasibility of accurately solving large-scale problems on relatively inexpensive computing platforms.
A tabu search based approach is studied as a method for solving the two-dimensional irregular cutting problem in parallel. We use and compare different variants of the method and various parallel computing systems. The systems used are based on the message-passing or the shared-memory paradigm. Parallel algorithms using both methods of communication are proposed. The efficiency of computer system utilization is discussed in the context of unpredictable time requirements of parallel tasks. We present results for different variants of the method together with efficiency measures for parallel implementations, where IBM SP2 and CRAY T3E systems, respectively, have been used.
In this paper a parallel implementation of an Adaptive Generalized Predictive Control (AGPC) algorithm is presented. Since the AGPC algorithm needs to be fed with knowledge of the plant transfer function, the parallelization of a standard Recursive Least Squares (RLS) estimator and a GPC predictor is discussed here. Also, since a matrix inversion operation is required in the GPC predictor algorithm, special attention is given to its parallelization. A small DSP network with up to 2 processors is used to investigate the performance of the parallel implementation. To exploit a heterogeneous architecture, the parallel algorithm is mapped onto a network built from transputers as communication elements and DSPs as computing elements. Furthermore, some heterogeneous topologies are compared. Execution times and efficiency results of the RLS and GPC steps are presented to show the performance of the parallel algorithm over different topologies.
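For readers unfamiliar with the estimator named above, here is a minimal serial sketch of one textbook RLS update with a forgetting factor. It is not the paper's parallel implementation; the function name `rls_update`, the parameter names, and the assumption that the covariance matrix `P` is symmetric are all illustrative choices. The matrix-vector product and the rank-one covariance update are the parts whose rows could be split across processors.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least squares step.
    theta: current parameter estimate (list of n floats)
    P:     current covariance matrix (n x n list of lists, assumed symmetric)
    phi:   regressor vector for this sample
    y:     measured output for this sample
    lam:   forgetting factor, 0 < lam <= 1
    """
    n = len(theta)
    # P * phi: each row is independent of the others.
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    # Prediction error with the old estimate.
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # Rank-one covariance update; uses symmetry of P (phi^T P == (P phi)^T).
    P_new = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta_new, P_new


# Identify y = 2*u1 + 3*u2 from noise-free samples (made-up data).
theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]
for phi in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]):
    y = 2.0 * phi[0] + 3.0 * phi[1]
    theta, P = rls_update(theta, P, phi, y)
```

The large initial `P` expresses low confidence in the starting estimate, so the first few samples dominate and `theta` moves close to the true coefficients.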