The aim of this paper is to discuss and compare two embedding mechanisms used in graph grammars: a connection relation mechanism (introduced in Janssens and Rozenberg, Inform. Sci. 20, 1980, 191–216) and a stencil mechanism (introduced in Culik and Lindenmayer, Proceedings, 8th Hawaii Conference on Systems Science, January 1975). In order to carry out such a comparison, the use of a connection relation is modified to cover parallel rewriting (and the generation of node- and edge-labeled directed graphs), and the use of stencils is modified to cover sequential rewriting (and the generation of node-labeled undirected graphs).
ISBN: 9783642143892 (print)
With the advent of multi-core processors, the problem of designing applications that can efficiently utilize their performance becomes more and more important. Moreover, developing programs for these processors requires additional, specific knowledge of the processor architecture from programmers. In multi-core systems, efficient program execution is the main issue. Switching from sequential to parallel computation can even lead to a decrease in performance. The paper gives a short description of SliCer, a hardware-independent tool that automatically parallelizes serial programs according to the number of available processing units by creating the proper number of threads, which can later be executed in parallel.
ISBN: 9783642143892 (print)
The contribution deals with the development of a 3-D finite-element package called GEM and its aspirations in demanding mathematical modelling and simulations arising in geosciences. Against the background of two complex applications from presently running projects, formulated as linear elasticity and thermo-elasticity problems, the most important characteristics, especially those of the solvers, are presented. The focus is on features related to high-performance computing, including parallel processing.
ISBN: 9783642143892 (print)
The paper describes results of a minimax tree searching algorithm implemented on the CUDA platform. The problem concerns move choice strategy in the game of Reversi. The parallelization scheme and performance aspects are discussed, focusing mainly on the warp divergence problem and data transfer size. Moreover, a method of minimizing warp divergence and performance degradation is described. The paper reports results of tests performed on both multiple CPUs and GPUs. Additionally, it discusses a parallel alpha-beta pruning implementation.
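For readers unfamiliar with the search technique named in this abstract, the following is a minimal sequential sketch of minimax with alpha-beta pruning. It is not the paper's CUDA implementation and has nothing Reversi-specific in it: the game tree is assumed to be given as nested Python lists with numeric leaf scores, and the function names `minimax` and `alphabeta` are illustrative only.

```python
import math

def minimax(node, maximizing):
    """Plain minimax over a tree of nested lists; leaves are numeric scores."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning; returns the same value as minimax."""
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizing parent will never pick this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizing parent already has a better option
            break
    return best

# Hypothetical two-ply tree: the root maximizes, its children minimize.
tree = [[3, 5], [2, 9], [0, 7]]
root_value = alphabeta(tree, True)
```

Pruning skips a different number of nodes on each branch, so threads that explore sibling subtrees in lockstep do unequal amounts of work. This is one plausible source of the warp divergence the abstract mentions.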
ISBN: 9783642143892 (print)
The aim of this paper is to show that a kind of boundary value problem for second-order ordinary differential equations, which reduces to solving a tridiagonal system of linear equations with almost Toeplitz structure, can be efficiently solved on modern multicore architectures using a parallel tiled algorithm based on the divide-and-conquer approach for solving linear recurrence systems with constant coefficients and on novel data formats for dense matrices.
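To illustrate the kind of system this abstract refers to, here is a minimal sketch that solves a purely Toeplitz tridiagonal system with the sequential Thomas algorithm. This is a swapped-in textbook method, not the paper's parallel tiled divide-and-conquer algorithm. The model problem (-u'' = 2 on (0, 1) with zero boundary values) and the name `solve_toeplitz_tridiagonal` are assumptions chosen for illustration.

```python
def solve_toeplitz_tridiagonal(a, b, c, rhs):
    """Solve T x = rhs, where T has constant sub-diagonal a, diagonal b,
    and super-diagonal c. Thomas algorithm without pivoting."""
    n = len(rhs)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c / b
    d_prime[0] = rhs[0] / b
    # Forward sweep: each step depends only on the previous one,
    # i.e. it is a first-order recurrence.
    for i in range(1, n):
        denom = b - a * c_prime[i - 1]
        c_prime[i] = c / denom
        d_prime[i] = (rhs[i] - a * d_prime[i - 1]) / denom
    # Back substitution, again a first-order recurrence.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

# Assumed model problem: -u'' = 2 on (0, 1), u(0) = u(1) = 0,
# discretized with central differences on n interior points.
n = 9
h = 1.0 / (n + 1)
rhs = [2.0 * h * h] * n
u = solve_toeplitz_tridiagonal(-1.0, 2.0, -1.0, rhs)
exact = [(i + 1) * h * (1.0 - (i + 1) * h) for i in range(n)]
```

Both sweeps are sequential recurrences, which is why a parallel solver needs a reformulation such as the divide-and-conquer approach the abstract names.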
ISBN: 9783642143892 (print)
In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. We successfully achieved a performance of over 401 GFlops on 256 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) for a 256^3-point FFT.
ISBN: 9783642143892 (print)
The eigenvalues and eigenvectors of a symmetric matrix are of interest in a myriad of applications. One of the fastest and most accurate numerical techniques for the eigendecomposition is the Algorithm of Multiple Relatively Robust Representations (MRRR), the first stable algorithm that computes the eigenvalues and eigenvectors of a tridiagonal symmetric matrix in O(n^2) arithmetic operations. In this paper we present a parallelization of the MRRR algorithm for data-parallel coprocessors using the CUDA programming environment. The results demonstrate the potential of data-parallel coprocessors for scientific computations: compared to routine sstemr, LAPACK's implementation of MRRR, our parallel algorithm provides 10-fold speedups.
ISBN: 9783642143892 (print)
The MPI and OpenMP implementations of the parallel simulated annealing algorithm solving the vehicle routing problem with time windows (VRPTW) are presented. The algorithm consists of a number of components which co-operate periodically by exchanging the best solutions they have found to date. The objective of the work is to explore the speedups and scalability of the two implementations. Selected VRPTW benchmarking tests are used for the comparisons.
ISBN: 9783642143892 (print)
Large-scale computing requires parallelization in order to arrive at a solution in reasonable time. Today parallelization is standard in the simulation of fluid problems. Adaptation, on the other hand, is a technique that allows dynamic modification of the mesh as the need for locally higher resolution arises. Adaptation used during parallel simulation leads to an unbalanced numerical load, which in turn decreases the efficiency of parallelization. Dynamic load balancing strategies should be applied in order to ensure proper parallelization efficiency. The paper presents the potential benefits of applying dynamic load balancing to adaptive flow problems simulated in parallel environments.
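The effect described in this abstract can be made concrete with a small sketch. It assumes a deliberately simplified model that is not taken from the paper: each process holds a number of unit-cost mesh cells, a parallel step takes as long as the most loaded process, and rebalancing uses a greedy longest-first assignment that ignores migration cost and mesh locality. The names `parallel_efficiency` and `greedy_rebalance` are hypothetical.

```python
def parallel_efficiency(loads):
    """Mean load divided by maximum load; 1.0 means perfectly balanced."""
    return (sum(loads) / len(loads)) / max(loads)

def greedy_rebalance(work_items, num_procs):
    """Assign items, largest first, to the currently least loaded process."""
    loads = [0.0] * num_procs
    for cost in sorted(work_items, reverse=True):
        target = loads.index(min(loads))
        loads[target] += cost
    return loads

# Four processes start with 4 unit-cost cells each. Local refinement then
# splits the cells on process 0 into 16 cells; the others are unchanged.
loads_after_adaptation = [16.0, 4.0, 4.0, 4.0]
eff_unbalanced = parallel_efficiency(loads_after_adaptation)

# Redistribute all 28 unit-cost cells over the four processes.
loads_rebalanced = greedy_rebalance([1.0] * 28, 4)
eff_balanced = parallel_efficiency(loads_rebalanced)
```

In this toy case refinement on a single process drops the efficiency below one half, and redistribution restores it. A real strategy would also have to weigh the cost of moving cells between processes.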
ISBN: 9783642143892 (print)
An approach is presented that permits extracting both affine and non-linear synchronization-free slices in program loops. It requires an exact dependence analysis. To describe and implement the approach, the dependence analysis by Pugh and Wonnacott was chosen, in which dependences are found in the form of tuple relations. The approach is based on operations on integer tuple relations and sets, and it has been implemented and verified by means of the Omega project software. Results of experiments with the UTDSP benchmark suite are discussed. The speed-up and efficiency of parallel code produced by means of the approach are studied.