Embedding algorithms for nonlinear systems of equations construct a continuous family of systems, and solve the given system by tracking the continuous curve of solutions to the family. Solving nonlinear equations by a globally convergent embedding algorithm requires the evaluation and factoring of a Jacobian matrix at many points along the embedding curve. This paper describes how to optimize the evaluation of the Jacobian matrix on a hypercube. Several static and dynamic strategies for assigning components of the Jacobian to processors on the hypercube are investigated, and it is found that a static rectangular grid mapping is the preferred choice for inclusion in a robust parallel mathematical software package. The static linear mapping is a viable alternative when there are many common subexpressions in the component evaluation, while the dynamic assignment strategy should only be considered when there is large variation in the evaluation times for the components, leading to a load imbalance on the processors.
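As an illustration of the embedding idea described in this abstract, the following is a minimal sketch for a single scalar equation. It assumes the simple convex homotopy H(x, t) = t f(x) + (1 - t)(x - a), a fixed step in t, and a few Newton corrections per step; the function name `track_homotopy`, the test function, and the step counts are hypothetical choices for illustration, not the paper's algorithm. Robust packages track the curve by arc length rather than by t. In a system of equations the derivative `dh` becomes a Jacobian matrix, which is the quantity whose repeated evaluation the paper distributes over the hypercube.

```python
def track_homotopy(f, df, a, steps=50, newton_iters=5):
    """Follow the zero of H(x, t) = t*f(x) + (1 - t)*(x - a) from t=0 to t=1.

    At t=0 the solution is the known point x = a; at t=1 it is a root of f.
    This is an illustrative scalar sketch, not a production curve tracker.
    """
    x = a
    for k in range(1, steps + 1):
        t = k / steps
        # Newton corrector: pull x back onto the curve H(x, t) = 0.
        for _ in range(newton_iters):
            h = t * f(x) + (1.0 - t) * (x - a)
            dh = t * df(x) + (1.0 - t)  # scalar analogue of the Jacobian
            x -= h / dh
    return x


# Hypothetical test problem: f(x) = x^3 + x - 2, which has the root x = 1.
root = track_homotopy(lambda x: x**3 + x - 2.0,
                      lambda x: 3.0 * x**2 + 1.0,
                      a=0.0)
```

Each step along the curve needs one derivative evaluation per Newton iteration, so the cost of that evaluation dominates when the system is large.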
We present the first efficient parallel algorithms for recognizing some subclasses of circular arc graphs, including GAMMA circular arc graphs and proper interval graphs. These algorithms run in O(log^2 n) time with O(n^3) processors on a CRCW PRAM. An intersection representation can also be constructed within the same resource bounds. Furthermore, we propose some new characterizations of THETA circular arc graphs and proper interval graphs.
In this paper we give a parallel algorithm for constructing the Voronoi diagram of a polygonal scene, i.e., a set of line segments in the plane such that no two segments intersect except possibly at their endpoints. Our algorithm runs in O(log^2 n) time using O(n) processors in the CREW PRAM model.
A parallel algorithm for finding triconnected components on a CRCW PRAM is presented. The time complexity of the algorithm is O(log n), and the processor-time product is O((m + n) log log n), where n is the number of vertices and m is the number of edges of the input graph. The algorithm, like other parallel algorithms for this problem, is based on open ear decomposition, but it uses a new technique, local replacement, to improve the complexity. Only the need to use the subroutines for connected components and integer sorting, for which no optimal parallel algorithm that runs in O(log n) time is known, prevents the algorithm from achieving optimality.
For the numerical solution of mildly nonlinear elliptic boundary value problems in a rectangle, the author already proposed in [14,15] a technique (PDFS) based on a fast method applied to the linear systems induced by a modified Picard iteration. This paper describes the implementation of the PDFS algorithm on an M × N grid using any distributed-memory multiprocessor in which a ring of p < min(M, N) processors can be embedded. The parallel PDFS features an adjustable parameter that controls the granularity of the algorithm and can be tuned to a specific architecture. A theoretical analysis shows that PDFS achieves a speedup of p in the limit of large problem sizes. Experimental timings of an implementation on a ring of transputers confirm the theoretical time model.
A class of parallel implicit Runge-Kutta formulas is constructed for multiprocessor systems. A family of parallel implicit two-stage fourth-order Runge-Kutta formulas is given. For these formulas, convergence is proved and a stability analysis is given. The numerical examples demonstrate that these formulas can solve an extensive class of initial value problems for ordinary differential equations.
This paper presents performance comparisons of seven neural network models on traffic control problems in multistage interconnection networks. The decay term, three neuron models, and two heuristics were evaluated. The goal of the traffic control problems is to find conflict-free switching configurations with the maximum throughput. Our simulation results show that the hysteresis McCulloch-Pitts neuron model without the decay term and with two heuristics has the best performance.
This work was aimed at investigating the suitability of parallel processing for the class of alternating direction implicit techniques on a tightly coupled, shared-memory, parallel architecture. One such technique, that of Beam and Warning, was parallelized on a Sequent Symmetry S81 parallel computer. A significant impediment to achieving near-theoretical speedups with implicit techniques is the requirement for processor synchronization at one or more stages during a time integration. The need for synchronization was eliminated by the careful use of load balancing, phase splitting, and processor routing. This in turn enabled the factored implicit technique to achieve higher speedups than possible with standard parallel compiler implementations. The highest speedups resulted from more complex equations, larger grids, and more stringent convergence criteria. The principles presented in this work are general and should be applicable to other implicit techniques with minor technique-dependent alterations.
In this paper we propose an improved algorithm for the parallel LU decomposition of an (m + 1)-banded upper Hessenberg matrix on a shared-memory multiprocessor, which requires O(2nm^2/p) parallel operations, where n is the dimension of the matrix and p is the number of processors. We show that for the special case of tridiagonal matrices this algorithm has a lower operation count than those in the literature and yields the best existing algorithm for the solution of tridiagonal systems of equations.
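For orientation on the tridiagonal special case mentioned in this abstract, the sketch below shows the standard serial LU factorization of a tridiagonal matrix without pivoting, followed by forward and back substitution. This is the textbook sequential baseline, not the parallel algorithm the paper proposes; the function names and the storage of the matrix as three separate lists are assumptions made for illustration, and the code assumes no zero pivots arise (for example, a diagonally dominant matrix).

```python
def tridiagonal_lu(sub, diag, sup):
    """Factor a tridiagonal matrix A = L U without pivoting.

    sub, sup: the n-1 sub- and superdiagonal entries; diag: the n diagonal entries.
    Returns (l, u) where l holds the subdiagonal of the unit lower bidiagonal L
    and u holds the diagonal of U. The superdiagonal of U equals sup.
    """
    n = len(diag)
    l = [0.0] * (n - 1)
    u = [0.0] * n
    u[0] = diag[0]
    for i in range(1, n):
        l[i - 1] = sub[i - 1] / u[i - 1]          # elimination multiplier
        u[i] = diag[i] - l[i - 1] * sup[i - 1]    # updated pivot
    return l, u


def solve_tridiagonal(sub, diag, sup, rhs):
    """Solve A x = rhs using the factorization above."""
    n = len(diag)
    l, u = tridiagonal_lu(sub, diag, sup)
    # Forward substitution: L y = rhs.
    y = [0.0] * n
    y[0] = rhs[0]
    for i in range(1, n):
        y[i] = rhs[i] - l[i - 1] * y[i - 1]
    # Back substitution: U x = y.
    x = [0.0] * n
    x[n - 1] = y[n - 1] / u[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - sup[i] * x[i + 1]) / u[i]
    return x


# Hypothetical 3x3 example: A = [[2,-1,0],[-1,2,-1],[0,-1,2]], exact solution [1, 2, 3].
x = solve_tridiagonal([-1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [0.0, 0.0, 4.0])
```

Each pivot depends on the previous one, so this recurrence is inherently sequential; that dependence is what parallel formulations have to restructure.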
We present a fast new dominant point detection algorithm that computes adaptively to detect the dominant points. The algorithm starts by computing four dominant points and their regions of support. It then proceeds recursively to extract the other dominant points.