A class of Adams-type parallel hybrid multistep algorithms is constructed. An A-stable 3-step formula of order 3, an A(α)-stable 4-step formula of order 4 with α = 89.99°, and an A(α)-stable 5-step formula of order 5 with α = 84.92° are obtained. A numerical example shows that these methods are efficient for solving stiff ordinary differential equations.
In the area of parallelizing compilers, considerable research has been carried out on data dependency analysis, parallelism extraction, as well as program and data partitioning. However, designing a practical, low complexity scheduling algorithm without sacrificing performance remains a challenging problem. A variety of heuristics have been proposed to generate efficient solutions but they take prohibitively long execution times for moderate size or large problems. In this paper, we propose an algorithm called FASTEST (Fast Assignment and Scheduling of Tasks using an Efficient Search Technique) that has O(e) time complexity, where e is the number of edges in the task graph. The algorithm first generates an initial solution in a short time and then refines it by using a simple but robust random neighborhood search. We have also parallelized the search to further lower the time complexity. We are using the algorithm in a prototype automatic parallelization and scheduling tool which compiles sequential code and generates parallel code optimized with judicious scheduling. The proposed algorithm is evaluated with several application programs and outperforms a number of previous algorithms by generating parallelized code with shorter execution times, while taking dramatically shorter scheduling times. The FASTEST algorithm generates optimal solutions for a majority of the test cases and close-to-optimal solutions for the rest.
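The two-phase idea in this abstract (build an initial schedule quickly, then refine it with a random neighborhood search) can be illustrated with a small sketch. This is not the FASTEST algorithm itself and does not have its O(e) complexity. The cost model (a fixed communication delay `comm` between processors), the trivial initial solution, and all function names are assumptions made for illustration.

```python
import random

def schedule_length(order, assign, succ, cost, comm, n_procs):
    """Makespan when tasks run in topological `order` on processors `assign`."""
    pred = {t: [] for t in cost}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    finish = {}
    proc_free = [0.0] * n_procs
    for t in order:
        p = assign[t]
        ready = 0.0
        for u in pred[t]:
            # Assumed model: data from another processor arrives `comm` later.
            delay = 0.0 if assign[u] == p else comm
            ready = max(ready, finish[u] + delay)
        start = max(ready, proc_free[p])
        finish[t] = start + cost[t]
        proc_free[p] = finish[t]
    return max(finish.values())

def random_neighborhood_search(order, succ, cost, comm, n_procs, steps=200, seed=0):
    """Refine a trivial initial schedule by moving one random task at a time."""
    rng = random.Random(seed)
    assign = {t: 0 for t in order}  # initial solution: everything on processor 0
    best = schedule_length(order, assign, succ, cost, comm, n_procs)
    for _ in range(steps):
        t = rng.choice(order)
        old = assign[t]
        assign[t] = rng.randrange(n_procs)
        new = schedule_length(order, assign, succ, cost, comm, n_procs)
        if new <= best:
            best = new          # keep moves that do not lengthen the schedule
        else:
            assign[t] = old     # undo moves that make it worse
    return assign, best

# Hypothetical fork-join task graph: a -> {b, c} -> d
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cost = {"a": 1.0, "b": 3.0, "c": 3.0, "d": 1.0}
order = ["a", "b", "c", "d"]
assign, best = random_neighborhood_search(order, succ, cost, comm=1.0, n_procs=2)
```

Because a move is kept only when it does not lengthen the schedule, the result is never worse than the serial starting point.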
We present a new parallel implementation of a divide and conquer algorithm for computing the spectral decomposition of a symmetric tridiagonal matrix on distributed memory architectures. The implementation we develop differs from other implementations in that we use a two-dimensional block cyclic distribution of the data, we use the Löwner theorem approach to compute orthogonal eigenvectors, and we introduce permutations before the back transformation of each rank-one update in order to make good use of deflation. This algorithm yields the first scalable, portable, and numerically stable parallel divide and conquer eigensolver. Numerical results confirm the effectiveness of our algorithm. We compare performance of the algorithm with that of the QR algorithm and of bisection followed by inverse iteration on an IBM SP2 and a cluster of Pentium IIs.
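The rank-one update at the heart of divide and conquer can be shown with a single serial level of the recursion. The sketch below tears a symmetric tridiagonal matrix into two halves plus a rank-one correction and then recombines the halves' eigendecompositions. It is only an illustration: the inner rank-one problem is solved with `numpy.linalg.eigh` as a stand-in for a secular-equation solver, and the data distribution, Löwner-based eigenvectors, and deflation described in the abstract are not modeled. All function names are assumptions.

```python
import numpy as np

def tridiag(d, e):
    """Dense symmetric tridiagonal matrix with diagonal d and off-diagonal e."""
    return np.diag(d) + np.diag(e, 1) + np.diag(e, -1)

def tear(d, e, k):
    """Split T = blockdiag(T1, T2) + beta * v v^T, with v = e_k + e_{k+1}."""
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    beta = e[k - 1]
    d1 = d[:k].copy()
    d1[-1] -= beta              # last diagonal entry of T1 is corrected
    d2 = d[k:].copy()
    d2[0] -= beta               # first diagonal entry of T2 is corrected
    return (d1, e[:k - 1]), (d2, e[k:]), beta

def dc_eigh_one_level(d, e, k):
    """One divide and conquer step; returns eigenvalues w and eigenvectors V."""
    (d1, e1), (d2, e2), beta = tear(d, e, k)
    w1, Q1 = np.linalg.eigh(tridiag(d1, e1))   # independent subproblem 1
    w2, Q2 = np.linalg.eigh(tridiag(d2, e2))   # independent subproblem 2
    D = np.concatenate([w1, w2])
    # z = Q^T v: last row of Q1 followed by first row of Q2
    z = np.concatenate([Q1[-1, :], Q2[0, :]])
    # Stand-in for the secular-equation solver of the rank-one update
    w, U = np.linalg.eigh(np.diag(D) + beta * np.outer(z, z))
    n, n1 = len(D), len(d1)
    Q = np.zeros((n, n))
    Q[:n1, :n1] = Q1
    Q[n1:, n1:] = Q2
    return w, Q @ U             # back transformation

d = [2.0] * 6
e = [-1.0] * 5
w, V = dc_eigh_one_level(d, e, 3)
T = tridiag(np.array(d), np.array(e))
```

The two half-size eigenproblems are independent of each other, which is where the parallelism comes from.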
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines.
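Branch-and-bound pruning for graph matching can be sketched serially as follows. The abstract does not state the matching cost or the SIMD mapping, so this example assumes a simple cost (the number of vertex pairs whose adjacency disagrees under a vertex bijection) and shows only the pruning step. The function name and cost definition are hypothetical.

```python
def match_graphs(adj_a, adj_b):
    """Find a vertex bijection a -> b minimizing adjacency mismatches.

    adj_a, adj_b: square 0/1 adjacency matrices of equal size.
    Returns (best_cost, mapping) where mapping[i] is the image of vertex i.
    """
    n = len(adj_a)
    best = [float("inf"), None]

    def extend(mapping, used, cost):
        if cost >= best[0]:
            return  # bound: this partial mapping cannot beat the best found
        k = len(mapping)
        if k == n:
            best[0], best[1] = cost, tuple(mapping)
            return
        for v in range(n):
            if v in used:
                continue
            # Mismatches between new vertex k and already mapped vertices
            extra = sum(adj_a[k][u] != adj_b[v][mapping[u]] for u in range(k))
            mapping.append(v)
            used.add(v)
            extend(mapping, used, cost + extra)
            mapping.pop()
            used.remove(v)

    extend([], set(), 0)
    return best[0], best[1]

# Path 0-1-2 versus the same path relabeled with vertex 0 in the middle
path_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
cost_iso, mapping_iso = match_graphs(path_a, path_b)
cost_tri, _ = match_graphs(triangle, path_a)
```

In a parallel setting, the branches that survive pruning are the natural units of work to distribute across processors.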
Some testing results on DAWNING-1000, Paragon, and a workstation cluster are described in this paper. On the domestically built parallel system DAWNING-1000 with 32 computational processors, practical performance of 1.117 Gflops and 1.58 Gflops has been measured in solving a dense linear system and doing matrix multiplication, respectively. The scalability is also investigated. The importance of designing efficient parallel algorithms for evaluating parallel systems is emphasized.
A parallel algorithm for solving meeting schedule problems, which are NP-complete, is presented in this paper. The proposed system is composed of two maximum neural networks which interact with each other. One is an M x S neural network to assign meetings to available time slots on a timetable, where M and S are the number of meetings and the number of time slots, respectively. The other is an M x P neural network to assign persons to the meetings, where P is the number of persons. The simulation results show that the state of the system always converges to one of the solutions. Our empirical study shows that the solution quality of the proposed algorithm does not degrade with the problem size.
A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in the plane and that contains only AND and OR gates. A layered PMC is a PMC in which all input nodes are in the external face, and the gates can be assigned to layers in such a way that every wire goes between gates in successive layers. Goldschlager, Cook and Dymond, and others have developed NC² algorithms to evaluate a layered PMC when the output node is in the same face as the input nodes. These algorithms require a large number of processors (Ω(n⁶), where n is the size of the input circuit). In this paper we give an efficient parallel algorithm that evaluates a layered PMC of size n in O(log² n) time using only a linear number of processors on an EREW PRAM. Our parallel algorithm is the best possible to within a polylog factor, and is a substantial improvement over the earlier algorithms for the problem.
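To make the layered structure concrete, here is a sketch of the straightforward sequential baseline: evaluate the circuit one layer at a time, with each gate reading only from the layer before it. This is not the parallel algorithm of the paper, and it does not check planarity. The gate encoding `(op, i, j)` is an assumption for illustration.

```python
def eval_layered_monotone(inputs, layers):
    """Evaluate a layered monotone circuit layer by layer.

    inputs: list of bools (layer 0).
    layers: list of layers; each gate is (op, i, j) with op in {"AND", "OR"}
            and i, j indexing values of the previous layer only.
    Returns the list of values produced by the last layer.
    """
    values = list(inputs)
    for layer in layers:
        nxt = []
        for op, i, j in layer:
            if op == "AND":
                nxt.append(values[i] and values[j])
            elif op == "OR":
                nxt.append(values[i] or values[j])
            else:
                raise ValueError(f"non-monotone or unknown gate: {op}")
        values = nxt
    return values

# (x0 AND x1) OR (x1 OR x2) with x = (True, False, True)
circuit = [
    [("AND", 0, 1), ("OR", 1, 2)],
    [("OR", 0, 1)],
]
out = eval_layered_monotone([True, False, True], circuit)
```

This baseline takes time proportional to the number of layers, which can be linear in n; the abstract's contribution is bringing that down to O(log² n) parallel time.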
A parallel method for globally minimizing a linear program with an additional reverse convex constraint is proposed which combines the outer approximation technique and the cutting plane method. Basically, p (≤ n) processors are used for a problem with n variables, and a globally optimal solution is found effectively in a finite number of steps. Computational results are presented for test problems with up to 80 variables and 63 linear constraints (plus nonnegativity constraints). These results were obtained on a distributed-memory MIMD parallel computer, DELTA, by running both serial and parallel algorithms in double precision. Also, based on 40 randomly generated problems of the same size, with 16 variables and 32 linear constraints (plus x ≥ 0), numerical results for different numbers of processors are reported, including those of the serial algorithm. (C) 1997 Academic Press.
An efficient L₀-stable parallel algorithm is developed for the two-dimensional diffusion equation with non-local time-dependent boundary conditions. The algorithm is based on subdiagonal Padé approximation to the matrix exponentials arising from the use of the method of lines and may be implemented on a parallel architecture using two processors running concurrently, with each processor employing tridiagonal solvers at every time-step. The algorithm is tested on two model problems from the literature for which discontinuities between initial and boundary conditions exist. The CPU times together with the associated error estimates are compared.
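The abstract does not say which subdiagonal Padé approximant is used. As an assumed example, the scalar sketch below takes the (1,2) approximant to exp(-z) and splits it into two partial fractions, which is the kind of decomposition that lets two processors work concurrently. The function names are hypothetical, and the poles here are complex, which may differ from the paper's choice.

```python
import cmath
import math

def pade12(z):
    """(1,2) Pade approximant to exp(-z): (1 - z/3) / (1 + 2z/3 + z^2/6)."""
    return (1 - z / 3) / (1 + 2 * z / 3 + z * z / 6)

# Poles are the roots of z^2 + 4z + 6 = 0, i.e. -2 +/- i*sqrt(2)
P1 = complex(-2.0, math.sqrt(2.0))
P2 = P1.conjugate()
# pade12(z) = (6 - 2z) / ((z - P1)(z - P2)) = C1/(z - P1) + C2/(z - P2)
C1 = (6 - 2 * P1) / (P1 - P2)
C2 = (6 - 2 * P2) / (P2 - P1)

def pade12_partial_fractions(z):
    """Same approximant as a sum of two independent one-pole terms."""
    term1 = C1 / (z - P1)   # could be evaluated on one processor
    term2 = C2 / (z - P2)   # could be evaluated on the other
    return term1 + term2

small = abs(pade12(0.1) - math.exp(-0.1))   # accuracy near z = 0
tail = abs(pade12(1e6))                     # decay for large z
gap = abs(pade12_partial_fractions(0.7) - pade12(0.7))
```

In the matrix setting, each one-pole term corresponds to an independent linear solve with a shifted matrix, so the two solves can proceed at the same time and be summed afterwards. The decay of the approximant for large z is the scalar counterpart of the damping property that L₀-stability asks for.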