Like time complexity models that have significantly contributed to the analysis and development of fast algorithms, energy complexity models for parallel algorithms are desired as crucial means to develop energy effic...
详细信息
ISBN:
(纸本)9781509044573
Like time complexity models that have significantly contributed to the analysis and development of fast algorithms, energy complexity models for parallel algorithms are desired as crucial means to develop energy efficient algorithms for ubiquitous multicore platforms. Ideal energy complexity models should be validated on real multicore platforms and applicable to a wide range of parallel algorithms. However, existing energy complexity models for parallel algorithms are either theoretical without model validation or algorithm-specific without ability to analyze energy complexity for a widerange of parallel algorithms. This paper presents a new general validated energy complexity model for parallel (multithreaded) *** new model abstracts away possible multicore platforms by their static and dynamic energy of computational operations and data access, and derives the energy complexity of a given algorithm from its work, span and I/O complexity. The new model is validated by different sparse matrix vector multiplication (SpMV) algorithms and dense matrix multiplication (matmul) algorithms running on high performance computing (HPC) platforms (e.g., Intel Xeon and Xeon Phi). The new energy complexity model is able to characterize and compare the energy consumption of SpMV and matmul kernels according to three aspects: different algorithms, different input matrix types and different platforms. The prediction of the new model regarding which algorithm consumes more energy with different inputs on different platforms, is confirmed by the experimental results. In order to improve the usability and accuracy of the new model for a wide range of platforms, the platform parameters of ICE model are provided for eleven platforms including HPC, accelerator and embedded platforms.
Fault-tree analysis is a useful analytic tool for the reliability and safety of complex system. However, fault tree is not suitable for repairable system. In this paper, we will propose a new method called TTF (time t...
详细信息
ISBN:
(纸本)9781509002504
Fault-tree analysis is a useful analytic tool for the reliability and safety of complex system. However, fault tree is not suitable for repairable system. In this paper, we will propose a new method called TTF (time to failure) and TTM (time to maintenance) to analyze repairable system. Nevertheless, Monte Carlo simulation may be time consuming. In order to reduce simulation time, a parallel algorithm based on Spark will be used in this paper. Spark-MapReduce is the latest parallel computation framework. In situations where the amount of data is prohibitively large, we will propose a parallel algorithm for repairable system analysis to quickly get the simulation result. In this article, we propose a parallel algorithm to speed up through the experiment, we prove that the parallel algorithm has a superior performance on large scale models, and under Spark-MapReduce framework, researchers can concentrate on algorithm itself. It has significant benefits on reliability or availability assessment issues because it can free researchers, who are non-computer professional researchers, from parallelization and computational frame.
Performance of the PLASMA dense symmetric Eigensolver is optimized for large shared memory computer systems using multiple Householder domains for dense to band reduction and a communication reducing kernel for bulge ...
详细信息
ISBN:
(纸本)9781509052226
Performance of the PLASMA dense symmetric Eigensolver is optimized for large shared memory computer systems using multiple Householder domains for dense to band reduction and a communication reducing kernel for bulge chasing. The mr(3)-smp code by Petschow and Bientinesi is used for the tridiagonal eigensolution and the eigenvector back-transformations employ a 1D parallel decomposition. The input matrix, Householder vectors and scalars, are distributed among the CPU sockets with interleaved memory pages but the banded matrix, the eigenvectors, and temporary memory buffers are allocated and processed locally. Other considerations and optimization techniques also are presented. Numerical examples show the PLASMA eigensolver can out-perform ELPA and EIGENEXA significantly, for solving all the eigenpairs, if the problem size is sufficiently large, and the 2-stage eigensolution is generally better than its 1-stage counterpart on the latest x86_64 EP-4S CPUs with AVX2.
This paper achieves the parallel algorithm of fast multipole expansion method for electric field integral equation which is aimed to solve the electromagnetic scattering problems of some large metallic structures. We ...
详细信息
This paper achieves the parallel algorithm of fast multipole expansion method for electric field integral equation which is aimed to solve the electromagnetic scattering problems of some large metallic structures. We give the example of a spherical scattering structure to show the result of radar cross section. At first, this paper offers the discrete parallel algorithm boundary electric field integral equation under the frame of Galerkin Method, which is to expand the Green Function by multipole expansion and get the discrete linear equation system. Then, for the system of linear equations resulting from scattering the electric field integral equation, this paper proposes a parallel iterative algorithm based on conjugate gradient method. Finally, this paper will give a numerical example and reduce the complexity to O(N 1.5 ).
We present a new approach to the minimum-cost integral flow problem for small values of the flow. It reduces the problem to the tests of simple multivariate polynomials over a finite field of characteristic two for no...
详细信息
We present a new approach to the minimum-cost integral flow problem for small values of the flow. It reduces the problem to the tests of simple multivariate polynomials over a finite field of characteristic two for non-identity with zero. In effect, we show that a minimum-cost flow of value k in a network with n vertices, a sink and a source, integral edge capacities and positive integral edge costs polynomially bounded in n can be found by a randomized PRAM, with errors of exponentially small probability in n, running in O(klog(kn)+log(2)(kn)) time and using 2 (k) (kn) (O(1)) processors. Thus, in particular, for the minimum-cost flow of value O(logn), we obtain an RNC2 algorithm, improving upon time complexity of earlier NC and RNC algorithms.
We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the...
详细信息
ISBN:
(纸本)9781509021413
We present a space and time efficient practical parallel algorithm for approximating the diameter of massive weighted undirected graphs on distributed platforms supporting a MapReduce-like abstraction. The core of the algorithm is a weighted graph decomposition strategy generating disjoint clusters of bounded weighted radius. Theoretically, our algorithm uses linear space and yields a polylogarithmic approximation guarantee, moreover, for important practical classes of graphs, it runs in a number of rounds asymptotically smaller than those required by the natural approximation provided by the state-of-the-art Δ-stepping SSSP algorithm, which is its only practical linear-space competitor in the aforementioned computational scenario. We complement ourtheoretical findings with an extensive experimental analysis on large benchmark graphs, which demonstrates that our algorithm attains substantial improvements on a number of key performance indicators with respect to the aforementioned competitor, while featuring a similar approximation ratio (a small constant less than 1.4, as opposed to the polylogarithmic theoretical bound).
In the real world, many problems on massive graphs can be mapped to an underlying critical problem of discovering top-k subgraphs. For massive graphs, subgraph queries may have enormous number of matches, and so it is...
详细信息
ISBN:
(纸本)9781467390064
In the real world, many problems on massive graphs can be mapped to an underlying critical problem of discovering top-k subgraphs. For massive graphs, subgraph queries may have enormous number of matches, and so it is inefficient to compute all matches when only top-k matches are desired. Meanwhile, parallel algorithm is urgent for the scalability of massive graph computing. In this paper, we address the challenges of top-k subgraph query in massive graph. Firstly, we present a new graph matching notion: “approximate graph simulation”. With approximate graph simulation, top-k subgraph query can be customized by appointing a weighted query graph, which provides good flexibility for different application scenarios. Secondly, we propose a parallel top-k subgraph query algorithm at the level of vertex. With such algorithm, each vertex in massive graph obtains its matching state separately without requiring global graph information. In the algorithm, we also design a filter mechanism to speed up the the computation and a aggregation mechanism to obtain top-k vertices for query focus. Using real-life datasets, we experimentally verify that our approach of parallel top-k subgraph query are efficient.
Library software implementing a parallel small-bulge multishift QR algorithm with Aggressive Early Deflation (AED) targeting distributed memory high-performance computing systems is presented. Starting from recent dev...
详细信息
Library software implementing a parallel small-bulge multishift QR algorithm with Aggressive Early Deflation (AED) targeting distributed memory high-performance computing systems is presented. Starting from recent developments of the parallel multishift QR algorithm [Granat et al., SIAM J. Sci. Comput. 32(4), 2010], we describe a number of algorithmic and implementation improvements. These include communication avoiding algorithms via data redistribution and a refined strategy for balancing between multishift QR sweeps and AED. Guidelines concerning several important tunable algorithmic parameters are provided. As a result of these improvements, a computational bottleneck within AED has been removed in the parallel multishift QR algorithm. A performance model is established to explain the scalability behavior of the new parallel multishift QR algorithm. Numerous computational experiments confirm that our new implementation significantly outperforms previous parallel implementations of the QR algorithm.
暂无评论