The pairs in the set {(i, j)\1 less-than-or-equal-to i < j less-than-or-equal-to n} can be distributed into (n-1) sets, such that each set contains exactly n/2 disjoint pairs. We refer to these (n-1) sets as comple...
详细信息
The pairs in the set {(i, j)\1 less-than-or-equal-to i < j less-than-or-equal-to n} can be distributed into (n-1) sets, such that each set contains exactly n/2 disjoint pairs. We refer to these (n-1) sets as complete Jacobi-sets. Gao and Thomas have [4] developed a recursivealgorithm for exchanges of elements on a hypercube configuration to generate complete Jacobi sets. Inspired by the algorithm we present a block recursive algorithm to generate these sets continuously and return to its initial state after the last Jacobi set has been generated. In the process of the derivation of the BR algorithm, we developed some interesting combinatorial properties on the hypercube topology related to the Jacobi-sets. We include those results in this paper. An important application of the Jacobi-sets lies in the parallel implementation for Jacobi-type methods. The order in which the pairs (i, j) are chosen is important in defining these algorithms. We present in this paper a study on the effect of different orderings on the convergence of Jacobi-type methods. Our tests indicate that the ordering defined by our block recursive algorithm is as good as the odd-even ordering.
In this paper, we present a framework for synthesizing I/O efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framewo...
详细信息
In this paper, we present a framework for synthesizing I/O efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation which is based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver's two-level memory model in which data can be distributed using various cyclic(B) distributions in contrast to the normally used physical track distribution cyclic(B-d), where B-d is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic data distributions of out-of-core data and also data access patterns to out-of-core data. We then present program generation techniques for tensor products and matrix transposition. We accurately represent the number of parallel I/O operations required for the synthesized programs for tensor products and matrix transposition as a function of tensor bases and data distributions. We introduce an algorithm to determine the data distribution which optimizes the performance of the synthesized programs. Further, we formalize the procedure of synthesizing efficient out-of-core programs for tensor product formulas with various block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of our approach through several examples. We show that the choice of an appropriate data distribution can reduce the number of passes to access out-of-core data by as large as eight times for a tensor product and the dynamic programming approach can largely reduce the number of passes to access out-of-core data for the overall tensor product formulas.
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms. We first express a computational problem in its matrix form. Next, we formulate...
详细信息
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms. We first express a computational problem in its matrix form. Next, we formulate a matrix equation for the matrix of the computational problem. Then, we try to find a solution of the matrix equation such that the solution is composed of simple matrices. Finally, we recursively factorize the subproblem to obtain a tensor product formula representing an algorithm for the given problem. In this methodology, the operations of a tensor product formula can be mapped to language constructs of high-level programming languages. That is, we can generate computer programs, including programs for parallel computers and distributed-memory multiprocessors, from tensor product formulas. In this paper, we use the parallel prefix problem and the discrete Fourier transform problem as examples to illustrate the methodology and derive various parallel prefix and fast Fourier transform algorithms.
EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronec...
详细信息
ISBN:
(纸本)0818666056
EXTENT is an EXpert system for TENsor product formula Translation. In this paper we present a programming environment for automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high performance programs on various architectures. In this programming methodology, block recursive algorithms such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as tensor product formulas involving tensor product and other matrix operations. A tensor product formula can be systematically translated to parallel and/or vector code for various parallel architectures. A prototype system which generates programs for the Cray Y-MP, Cray T3D, and Intel Paragon has been developed. Performance results for some generated programs are presented.
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms on various computer networks. In our previous works, we propose a programming me...
详细信息
ISBN:
(纸本)0769516807
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms on various computer networks. In our previous works, we propose a programming methodology for designing block recursive algorithms on shared-memory and distributed-memory multiprocessors without considering the interconnection of processors. We extend the work to consider the block recursive algorithms on direct networks and multistage interconnection networks. We use parallel prefix computation as an example to illustrate the methodology. First, we represent the prefix computation problem as a computational matrix which may not be suitable for deriving algorithms on specific computer networks. In this methodology, we add two steps to derive tensor product formulas of parallel prefix algorithms on computer networks: (1)decompose the computational matrix into two submatrices, and (2) construct an augmented matrix. The augmented matrix can be factorized so that each term is a tensor product formula and can fit into a specified network topology. With the augmented matrix, the input data is also extended. It means, in addition to the input data, an auxiliary vector as temporary storage is used. The content of temporary storage is relevant to the decomposition of the original computational matrix. We present the methodology to derive various parallel prefix algorithms on hypercube, omega, and baseline networks and veri,, correctness of the resulting tensor product formulas using induction.
We present an algebraic theory based on tensor products for modeling direct interconnection networks. This algebraic theory has been used for designing and implementing blockrecursive numerical algorithms on shared-m...
详细信息
We present a tensor product formulation for Hilbert space-filling curves. Both recursive and iterative formulas are expressed in the paper. We view a Hilbert space-filling curve as a permutation which maps two-dimensi...
详细信息
We present a tensor product formulation for Hilbert space-filling curves. Both recursive and iterative formulas are expressed in the paper. We view a Hilbert space-filling curve as a permutation which maps two-dimensional 2(n) x 2(n) data elements stored in the row major or column major order to the order of traversing a Hilbert curve. The tensor product formula of Hilbert space-filling curves uses several permutation operations: stride permutation, radix-2 Gray permutation, transposition, and anti-diagonal transposition. The iterative tensor product formula can be manipulated to obtain the inverse Hilbert permutation. Also, the formulas are directly translated into computer programs which can be used in various applications including image processing, VLSI component layout, and R-tree indexing, etc.
暂无评论