The report presents an approach to the simulation of acoustic fields in enclosed media. The method is based on Rayleigh's integral for the calculation of the secondary sources generated by a wave incident on the boundaries of the medium. The resulting algorithm is highly parallelizable: it consists of loosely coupled parallel branches with only a few points of inter-thread communication. On the other hand, its cost grows exponentially with the average number of reflections undergone by a single wave element emitted by a primary source, although for practical applications this number can be kept small enough to obtain accurate results with reasonable time and space consumption. The proposed algorithm rests on the superposition of acoustic fields and therefore gives adequate results as long as the governing equations of acoustics are linear. To capture the scattering properties of reflecting boundaries, the algorithm represents the geometric model of the propagation medium as a set of small flat vibrating pistons. Each wave element incident on such a piston causes it to radiate reflected sound in all directions. This makes it possible to construct an algorithm that accepts sets of sources and reflecting surfaces and yields the field distribution over specified points, such that each source, primary or secondary, can be associated with an element of parallel execution and managed via a list of polymorphic sources implementing a task list. The report covers a mathematical formulation of the problem, defines the object model used to implement the algorithm, and provides some analysis of the algorithm in its sequential and parallel forms.
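A minimal sketch of the underlying idea, not the paper's implementation: reflecting surfaces are discretised into small pistons, each incident wave element turns a piston into a secondary monopole source, and the field is the linear superposition of all sources. All positions, amplitudes, and the monopole Green's function used here are illustrative assumptions.

```python
# Illustrative sketch: a discretised Rayleigh-integral-style superposition.
import numpy as np

def field_at(point, sources, k):
    """Complex pressure at `point` from monopole `sources`.
    sources: iterable of (position, complex_amplitude); k: wavenumber."""
    p = 0j
    for pos, amp in sources:
        r = np.linalg.norm(point - pos)
        p += amp * np.exp(1j * k * r) / (4 * np.pi * r)  # spherical spreading
    return p

def secondary_sources(pistons, primary, k):
    """One reflection step: each piston re-radiates the incident field.
    pistons: iterable of (centre, area, reflection_coeff)."""
    out = []
    for centre, area, refl in pistons:
        incident = field_at(centre, primary, k)
        out.append((centre, refl * area * incident))  # piston as new monopole
    return out

# Usage: one primary source, one reflecting piston, field at a probe point.
k = 2 * np.pi * 1000 / 343.0                       # 1 kHz in air (assumed)
primary = [(np.array([0.0, 0.0, 0.0]), 1.0 + 0j)]
pistons = [(np.array([0.0, 0.0, 1.0]), 1e-4, 0.9)]
all_sources = primary + secondary_sources(pistons, primary, k)
print(field_at(np.array([0.5, 0.0, 0.5]), all_sources, k))
```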
Solving large linear systems is among the most important and most frequently encountered problems in computational mathematics and computer science. This paper presents efficient parallel Jacobi and Gauss-Seidel algorithms, in spite of the apparent inherent sequentiality of the latter, for the iterative solution of large linear systems on hypercube machines. To evaluate their performance, expressions for the speedup factor of the algorithms are derived. The results show that hypercubes are highly effective in solving large systems of dense linear algebraic equations. Finally, the suitability of hypercubes for solving sparse linear systems is discussed.
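For illustration, a minimal sequential sketch of the Jacobi update the paper parallelises: every component of the new iterate depends only on the previous iterate, so all components can be computed concurrently. The hypercube mapping itself is not shown, and the test system is an assumed example.

```python
import numpy as np

def jacobi(A, b, tol=1e-10, max_iter=10_000):
    d = np.diag(A)                  # diagonal of A
    R = A - np.diagflat(d)          # off-diagonal part
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / d     # all n components are independent
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# Diagonally dominant test system, so Jacobi is guaranteed to converge.
A = np.array([[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b), np.linalg.solve(A, b))
```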
This paper describes and compares three parallel algorithms for solving sparse triangular systems of equations. These methods involve some preprocessing overhead and are primarily of interest when solving many systems with the same coefficient matrix. The first approach is to use a fixed block size and form the inverses of the diagonal blocks. The second approach is to use a variable block size and reorder the unknowns so that the diagonal blocks are diagonal matrices. We call the latter technique level scheduling because of how it is represented in the adjacency graph, and we consider both row-wise and jagged-diagonal storage for the off-diagonal blocks. These techniques are analyzed for general parallel computers, and experiments are presented for the eight-processor Alliant FX/8.
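A sketch of the level-scheduling idea, assuming a simple dictionary representation of the sparse triangular factor (the paper's row-wise and jagged-diagonal storage schemes are not reproduced): unknown i depends only on unknowns j < i with a nonzero L[i][j], so all unknowns whose dependencies sit in earlier levels can be solved simultaneously.

```python
def build_levels(L):
    """L: dict mapping row i -> {col j: value}, j <= i, L[i][i] != 0."""
    n = len(L)
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j in L[i] if j != i]
        level[i] = 1 + max(deps, default=-1)
    buckets = {}
    for i, lv in enumerate(level):
        buckets.setdefault(lv, []).append(i)
    return [buckets[lv] for lv in sorted(buckets)]   # rows grouped by level

def solve_by_levels(L, b):
    x = [0.0] * len(b)
    for rows in build_levels(L):     # levels are sequential...
        for i in rows:               # ...rows within a level are independent
            s = sum(v * x[j] for j, v in L[i].items() if j != i)
            x[i] = (b[i] - s) / L[i][i]
    return x

# Example: a 4x4 lower-triangular system with two independent rows per level.
L = {0: {0: 2.0}, 1: {1: 3.0}, 2: {0: 1.0, 2: 4.0}, 3: {1: 1.0, 3: 5.0}}
print(solve_by_levels(L, [2.0, 3.0, 6.0, 8.0]))   # -> [1.0, 1.0, 1.25, 1.4]
```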
We propose a parallel algorithm for the generation of colour textures based upon the non-linear equations of the "multiple class random neural network model". A neuron is used to obtain the texture value of each pixel in the bit-map plane. Each neuron interacts with its immediate planar neighbours in order to obtain the texture for the whole plane. A model which uses at most 4(C² + C) parameters for the whole network, where C is the number of colours, is proposed. Numerical iterations of the non-linear field equations of the neural network model, starting with a randomly generated image, are shown to produce textures having different desirable features such as granularity, inclination and randomness. The experimental evaluation shows that the random network provides good results at a computational cost considerably lower than that of other approaches such as Markov random fields.
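The abstract does not reproduce the model's field equations, so the following sketch only illustrates the computational structure described: a per-pixel update driven by the four planar neighbours, iterated from a random image, with every pixel updatable in parallel. The update rule here is a placeholder assumption, not the multiple-class random neural network equations.

```python
import numpy as np

rng = np.random.default_rng(0)

def texture(shape=(64, 64), w=0.8, iters=50):
    x = rng.random(shape)                      # random initial image
    for _ in range(iters):
        # mean of the four immediate planar neighbours (wrap-around borders)
        nb = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
              np.roll(x, 1, 1) + np.roll(x, -1, 1)) / 4.0
        x = (1 - w) * x + w * nb / (1.0 + nb)  # placeholder non-linear update
    return x

print(texture().shape)
```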
Many physical systems may be described by large, sparse systems of linear equations with incidence-symmetric matrices (e.g. power systems). Their solution is often too slow simply because of the sheer size of the systems to be solved. Many parallel variants of the LU-factorisation algorithm have been proposed, but none seems to give a speed-up greater than about three, even when many processors are used. This paper proposes a recursively parallel method which alleviates this drawback by exploiting more of the parallelism inherent in the problem. The method uses the factorisation tree, clustering its nodes into meta-nodes which may be processed in parallel.
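A sketch of where the extra parallelism comes from, assuming the usual parent-array encoding of the factorisation (elimination) tree: nodes in disjoint subtrees have no data dependence, so grouping the tree by height yields sets of columns that can be processed concurrently. The paper's meta-node clustering heuristics are not shown.

```python
def tree_levels(parent):
    """parent[i] is the parent of node i in the factorisation tree, -1 for a
    root. Returns nodes grouped by height; each group can run in parallel."""
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for i, p in enumerate(parent):
        (roots if p < 0 else children[p]).append(i)

    height = [0] * n
    def compute(v):                    # height = 1 + max height of children
        for c in children[v]:
            compute(c)
        height[v] = 1 + max((height[c] for c in children[v]), default=-1)
    for r in roots:
        compute(r)

    groups = {}
    for i, h in enumerate(height):
        groups.setdefault(h, []).append(i)
    return [groups[h] for h in sorted(groups)]

# Example tree: 0 and 1 feed 4; 2 and 3 feed 5; 4 and 5 feed root 6.
print(tree_levels([4, 4, 5, 5, 6, 6, -1]))   # [[0, 1, 2, 3], [4, 5], [6]]
```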
Given an array of n real numbers A = (a_0, a_1, …, a_{n-1}), define MIN(i, j) = min{a_i, …, a_j}. The range minima problem consists of preprocessing array A such that queries MIN(i, j), for any 0 ≤ i ≤ j ≤ n-1, can be answered in constant time. Range minima is a basic problem that appears in many other important graph problems such as lowest common ancestor, Euler tour, etc. In this work we present a parallel algorithm under the CGM model (coarse-grained multicomputer) that solves the range minima problem in O(n/p) time and a constant number of communication rounds. The communication overhead involves the transmission of p numbers (independent of n). We show promising experimental results, with speedup curves approaching the optimum for large n.
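For reference, the classic sequential structure behind such preprocessing is the sparse table, which answers MIN(i, j) in constant time after O(n log n) work; the CGM distribution over p processors is not shown here.

```python
def build_sparse_table(a):
    n = len(a)
    K = max(1, n.bit_length())            # number of power-of-two row widths
    st = [a[:]] + [[0] * n for _ in range(K - 1)]
    for k in range(1, K):
        span = 1 << k
        for i in range(n - span + 1):     # min over a[i : i + 2**k]
            st[k][i] = min(st[k - 1][i], st[k - 1][i + (span >> 1)])
    return st

def range_min(st, i, j):                  # MIN(i, j), 0 <= i <= j <= n-1
    k = (j - i + 1).bit_length() - 1      # largest power of two <= length
    return min(st[k][i], st[k][j - (1 << k) + 1])   # two overlapping blocks

a = [5, 2, 7, 1, 9, 3]
st = build_sparse_table(a)
print(range_min(st, 1, 4))               # -> 1
```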
Let B be a set of n_b blue points and R a set of n_r red points in the plane, where n_b + n_r = n. A blue point b and a red point r can be matched if r dominates b, that is, if x(b) ≤ x(r) and y(b) ≤ y(r). We consider the problem of finding a maximum-cardinality matching between the points in B and the points in R. We give an adaptive parallel algorithm to solve this problem that runs in O(log² n) time using the CREW PRAM with O(n^{2+ε}/log n) processors for some ε, 0 < ε < 1. It follows that finding the minimum number of colors to color a trapezoid graph can be solved within these resource bounds.
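A simple sequential greedy for the same matching problem (the paper's contribution is the CREW PRAM algorithm, not this sweep): process red points by increasing x, so every blue point already swept stays feasible in x for all later reds, and match each red with the highest feasible blue y, which is optimal by a standard exchange argument.

```python
import bisect

def max_dominance_matching(blue, red):
    """blue, red: lists of (x, y). Returns the size of a maximum matching."""
    blue = sorted(blue)                       # by x
    red = sorted(red)
    ys, bi, matched = [], 0, 0
    for rx, ry in red:
        while bi < len(blue) and blue[bi][0] <= rx:
            bisect.insort(ys, blue[bi][1])    # blue ys now feasible in x
            bi += 1
        k = bisect.bisect_right(ys, ry)       # count of blues with y <= ry
        if k > 0:
            ys.pop(k - 1)                     # take the highest feasible y
            matched += 1
    return matched

print(max_dominance_matching([(0, 0), (1, 2)], [(2, 1), (3, 3)]))  # -> 2
```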
The paper discusses the problem of inducing a minimal nondeterministic finite automaton (NFA) consistent with a given set of examples and counterexamples. The main contribution of the paper is an efficient parallel algorithm transforming the NFA induction problem into a family of constraint satisfaction problems (CSPs). Two original techniques for fast CSP evaluation are proposed and discussed. The efficacy of the parallel algorithm is evaluated experimentally on selected benchmarks. The proposed algorithm solves all analyzed benchmark examples, the so-called Tomita languages, in under half a minute, an important achievement compared with our previous efforts, which required minutes or hours to solve some of the same benchmarks.
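A sketch of the consistency test at the core of such an induction, assuming a candidate NFA given as a nondeterministic transition dictionary; the CSP encoding and its parallel evaluation are not reproduced. A candidate automaton is a solution exactly when it accepts every example and rejects every counterexample.

```python
def accepts(delta, initial, accepting, word):
    """delta: dict (state, symbol) -> set of states; subset-construction run."""
    current = set(initial)
    for symbol in word:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & accepting)

def consistent(delta, initial, accepting, examples, counterexamples):
    return (all(accepts(delta, initial, accepting, w) for w in examples) and
            not any(accepts(delta, initial, accepting, w) for w in counterexamples))

# Two-state NFA for "strings over {a, b} ending in b" (a Tomita-style language).
delta = {(0, 'a'): {0}, (0, 'b'): {0, 1}}
print(consistent(delta, {0}, {1}, examples=['b', 'ab', 'aab'],
                 counterexamples=['', 'a', 'ba']))   # -> True
```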
In this paper we develop a parallel approach to computing the modularity clustering often used to identify and analyse communities in social networks. We show that modularity can be approximated by looking at the largest eigenpairs of the weighted graph adjacency matrix perturbed by a rank-one update. We also generalize this formulation to identify multiple clusters at once, and we develop a fast parallel implementation that takes advantage of the Lanczos eigenvalue solver and the k-means algorithm on the GPU. Finally, we highlight the performance and quality of our approach versus existing state-of-the-art techniques.
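A small dense sketch of the spectral formulation, assuming Newman's sign-based two-way split: the modularity matrix is the adjacency matrix perturbed by the rank-one update -kk^T/2m, and the sign pattern of its leading eigenvector splits the graph into two communities. The paper's Lanczos and k-means GPU machinery is not shown.

```python
import numpy as np

def spectral_split(A):
    k = A.sum(axis=1)                      # degrees
    two_m = k.sum()                        # 2 * number of edges
    B = A - np.outer(k, k) / two_m         # modularity matrix (rank-one update)
    vals, vecs = np.linalg.eigh(B)
    lead = vecs[:, np.argmax(vals)]        # eigenvector of largest eigenvalue
    return lead >= 0                       # boolean community labels

# Two triangles joined by a single edge: the split recovers the triangles.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
print(spectral_split(A))   # e.g. [ True  True  True False False False]
```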
This paper discusses the possibilities for parallel processing of the Full- and Limited-Memory BFGS training algorithms, two powerful second-order optimization techniques used to train multilayer perceptrons. The step-size and gradient calculations are identified as the critical components in both. The matrix calculations in the Full-Memory algorithm are also shown to be significant for larger problems. Various parallelisation strategies are considered, the best of which is implemented on PVM- and transputer-based architectures. The generation of a neural predictive model for a nonlinear chemical plant is used as a control case study to assess parallel performance in terms of achievable speed-up. The transputer implementation is found to give excellent speed-ups, but the size of problem that can be trained is limited by memory constraints. The speed-ups achievable with the PVM implementation, on the other hand, are much poorer because of inefficient communication, although memory does not pose a problem.
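A sketch of the Limited-Memory BFGS direction computation (the standard two-loop recursion); the PVM and transputer parallelisation of the step-size and gradient calculations is not shown, and the quadratic test function below is an assumed example.

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """grad: current gradient; s_hist/y_hist: recent (x_{k+1}-x_k, g_{k+1}-g_k)
    pairs, oldest first. Returns the search direction -H*grad."""
    q = grad.copy()
    alphas = []
    for s, y in reversed(list(zip(s_hist, y_hist))):   # first loop: new -> old
        rho = 1.0 / y.dot(s)
        a = rho * s.dot(q)
        alphas.append((rho, a, s, y))
        q -= a * y
    if y_hist:                                          # initial Hessian scaling
        s, y = s_hist[-1], y_hist[-1]
        q *= s.dot(y) / y.dot(y)
    for rho, a, s, y in reversed(alphas):               # second loop: old -> new
        b = rho * y.dot(q)
        q += (a - b) * s
    return -q

# On f(x) = 0.5 x^T diag(1, 10) x, one (s, y) pair already bends the stiff
# component of the steepest-descent direction toward the Newton direction.
D = np.array([1.0, 10.0])
x0, x1 = np.array([1.0, 1.0]), np.array([0.9, 0.5])
s, y = x1 - x0, D * x1 - D * x0        # y = grad(x1) - grad(x0)
print(lbfgs_direction(D * x1, [s], [y]))
```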