To solve systems of linear algebraic equations with block-tridiagonal matrices arising in geoelectrics problems, the parallel matrix sweep algorithm, the preconditioned conjugate gradient method, and the square root method are proposed and implemented numerically on multi-core Intel CPUs with NVIDIA graphics processors. The efficiency and optimization of the parallel algorithms are investigated on a problem with quasi-model data. Crown Copyright (C) 2012 Published by Elsevier B.V. All rights reserved.
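As a rough illustration of the sweep idea named in this abstract, the following is a minimal sequential sketch of the scalar sweep (Thomas algorithm), which is the special case of 1 x 1 blocks. It is not the authors' parallel block implementation; in a block version the scalar divisions would become small matrix solves. The function name `thomas_solve` and its argument layout are assumptions made for this example.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by a forward sweep and back substitution.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    No pivoting is done, so the matrix is assumed to be well behaved
    (for example, diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal row by row.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

For example, `thomas_solve([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4])` solves the 3 x 3 system with 2 on the diagonal and -1 on both off-diagonals.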
Salt transfer processes in fractal-structured media are modelled on the basis of fractional differential equations with Caputo-Gerasimov derivatives with respect to the space variables. The initial-boundary value problem is solved using a locally one-dimensional finite difference scheme. A procedure for approximating the fractional derivative is proposed to lower the computational complexity of the solution process. Parallel algorithms for distributed memory systems and GPUs are considered. An analysis of one-dimensional and red-black data partitioning schemes is presented, and a new parametric scheme that has better characteristics under certain conditions is proposed.
Many approaches have been proposed for deriving tests from finite state machine (FSM) specifications with respect to some established coverage criteria. A fundamental problem in FSM-based testing is the derivation of input sequences that can distinguish the states of an FSM specification, also known as distinguishing sequences. A major effort in the construction of these sequences is the derivation of a successor search tree labeled by sets of pairs of states of the given machine. We aim at reducing the time associated with constructing the successors of all state pairs of a given non-deterministic finite state machine through the use of state-of-the-art parallel technologies. Namely, we propose a parallel algorithm that we implement and evaluate on multicore CPUs and on many-core GPUs. We evaluate two alternative GPU implementations that use the CUDA and Thrust software platforms, and a solution based on a network of workstations. The latter uses workload partitioning based on Divisible Load Theory. A rigorous set of experiments highlights the differences between the proposed implementations in terms of execution time and speedup.
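To make the notion of "successors of a state pair" concrete, here is a small sequential sketch under one common definition, which is an assumption and not taken from the paper: under a given input, a successor of the pair {s1, s2} is any pair of next states that s1 and s2 can reach while producing the same output. The transition table layout and the names `pair_successors` and `all_pair_successors` are hypothetical.

```python
from itertools import combinations


def pair_successors(transitions, pair, inp):
    """Return the successor pairs of a state pair under one input.

    transitions maps (state, input) to a set of (output, next_state)
    tuples, which allows non-deterministic behaviour.
    """
    s1, s2 = pair
    succ = set()
    for o1, n1 in transitions.get((s1, inp), set()):
        for o2, n2 in transitions.get((s2, inp), set()):
            if o1 == o2:
                # Same output: the two states are not told apart yet.
                # If n1 == n2 the frozenset collapses to a single state.
                succ.add(frozenset((n1, n2)))
    return succ


def all_pair_successors(states, inputs, transitions):
    """Build the successor table for every unordered pair and input."""
    return {
        (frozenset(pair), inp): pair_successors(transitions, pair, inp)
        for pair in combinations(sorted(states), 2)
        for inp in inputs
    }
```

In this sketch each table entry depends only on its own pair and input, so the entries could in principle be computed independently on separate cores or GPU threads.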
A parallel approach to contour extraction and coding on an Exclusive Read Exclusive Write (EREW) Parallel Random Access Machine (PRAM) is presented and analyzed. The algorithm is intended for binary images. The labeled contours can be represented by lists of coordinates, and/or chain codes, and/or any other user-designed codes. Using O(n^2 / log n) processors, the algorithm runs in O(log n) time, where n × n is the size of the processed binary image.
This paper analyzes the performance of two parallel algorithms for solving the linear-quadratic optimal control problem arising in discrete-time periodic linear systems. The algorithms perform a sequence of orthogonal reordering transformations on formal matrix products associated with the periodic linear system and then employ the so-called matrix disk function to solve the resulting discrete-time periodic algebraic Riccati equations needed to determine the optimal periodic feedback. We parallelize these solvers using two different approaches, based on a coarse-grain and a medium-grain distribution of the computational load. The experimental results show the high performance and scalability of the parallel algorithms on a Beowulf cluster. (C) 2002 Elsevier Science (USA).
Binary addition and multiplication are very important because their running time dominates the computation time of any scientific or engineering problem. Simple algorithms are presented for these two problems that take only O(1) time and O(log n) time on a linear PARBS and an n × 2n PARBS, respectively, in which each processor has only a constant number of gates and registers. It is believed that these algorithms could provide an efficient design for implementing an adder and a multiplier circuit on a single VLSI chip.
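The sketch below is a sequential simulation of the generate/propagate/kill view of carries that is commonly used to explain fast addition on reconfigurable buses. Whether the paper uses exactly this formulation is an assumption, and the function name `add_bits` is hypothetical. A position with equal bits fixes its carry-out locally, while a position with unequal bits simply passes the incoming carry along, which is the step a reconfigurable bus could perform for a whole run of positions at once.

```python
def add_bits(a, b):
    """Add two equal-length bit lists, least significant bit first.

    Returns a list one bit longer than the inputs (the last bit is
    the final carry).
    """
    n = len(a)
    carry_in = [0] * (n + 1)  # carry_in[i] is the carry entering position i
    for i in range(n):
        if a[i] == b[i]:
            # Generate (1, 1) or kill (0, 0): carry-out is decided locally.
            carry_in[i + 1] = a[i]
        else:
            # Propagate: carry-out equals carry-in.
            carry_in[i + 1] = carry_in[i]
    total = [a[i] ^ b[i] ^ carry_in[i] for i in range(n)]
    return total + [carry_in[n]]
```

For instance, `add_bits([1, 0, 1, 0], [1, 1, 0, 0])` adds 5 and 3, both written least significant bit first.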
Pichat and Bohlender studied an algorithm for the rounding exact summation of floating point numbers which can be executed on any floating point arithmetic unit. We propose parallel versions of this algorithm, namely a pipeline version, an algorithm similar to the exchange methods for sorting, and a tree-like algorithm that associates a tree with the sum. For all these algorithms we discuss the properties that a multiprocessor architecture should have for an efficient implementation, without restricting ourselves to a particular architecture.
In this paper we show structural and algorithmic properties of the class of quasi-threshold graphs, or QT-graphs for short, and prove necessary and sufficient conditions for a QT-graph to be Hamiltonian. Based on these properties and conditions, we construct an efficient parallel algorithm for finding a Hamiltonian cycle in a QT-graph; for an input graph on n vertices and m edges, our algorithm takes O(log n) time and requires O(n + m) processors on the CREW PRAM model. In addition, we show that the problem of recognizing whether a QT-graph is a Hamiltonian graph and the problem of computing the Hamiltonian completion number of a non-Hamiltonian QT-graph can also be solved in O(log n) time with O(n + m) processors. Our algorithms rely on O(log n)-time parallel algorithms, which we develop here, for constructing tree representations of a QT-graph; we show that a QT-graph G has a unique tree representation, that is, a tree structure which meets the structural properties of G. We also present parallel algorithms for other optimization problems on QT-graphs which run in O(log n) time using a linear number of processors. (C) 2003 Elsevier Inc. All rights reserved.
This paper proposes a parallel algorithm for robot path planning on a linear array with a reconfigurable pipelined bus system (LARPBS) through the construction of a Voronoi diagram on a binary image of the workspace. The algorithm is based on a d_4 distance metric, and it does not incur any additional time or processor requirements compared with those of a previously reported proposal (Tzionas et al., 1997). This paper recommends the same model as the simpler VLSI architecture for the problem in question.
Processing of logical data always requires special software and hardware tools. This is attributable to the specific features of the mathematical apparatus of the algebra of logical functions. Organizing parallel logical computation on the basis of the symbolic mathematical apparatus leads to complex logic programs. A different approach, based on the matrix apparatus, is proposed in this article. Its use enables us to synthesize parallel and structurally homogeneous algorithms for the evaluation of directional logical derivatives of multivalued logic functions and to implement their evaluation using standard matrix algebra software or homogeneous computing systems. Homogeneous computing systems substantially increase processing speed and can be built using VLSI technology.
The operation graphs of the proposed algorithms have the same configuration as the graphs of fast algorithms used in digital signal processing. This result makes it possible to use well-tried standard procedures of digital signal processing, which involve mapping of algorithms into homogeneous computing structures and hardware-software architectures.