We present a hybrid OpenMP/Charm++ framework for solving the O(N) self-consistent-field eigenvalue problem with parallelism in the strong scaling regime, P >> N, where P is the number of cores and N is a measure of system size, i.e., the number of matrix rows/columns, basis functions, atoms, molecules, etc. This result is achieved with a nested approach to spectral projection and the sparse approximate matrix multiply [Bock and Challacombe, SIAM J. Sci. Comput., 35 (2013), pp. C72-C98], and involves a recursive, task-parallel algorithm, often employed by generalized N-Body solvers, for occlusion and culling of negligible products in the case of matrices with decay. Employing classic technologies associated with generalized N-Body solvers, including overdecomposition, recursive task parallelism, orderings that preserve locality, and persistence-based load balancing, we obtain scaling beyond hundreds of cores per molecule for small water clusters ((H2O)_N, N ∈ {30, 90, 150}, P/N ≈ {819, 273, 164}) and find support for an increasingly strong scalability with increasing system size N.
We present an optimized single-precision implementation of the sparse approximate matrix multiply (SpAMM) [M. Challacombe and N. Bock, arXiv 1011.3534, 2010], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 × 10^-8 is lower than that of the single-precision general matrix-matrix multiply (SGEMM) for dense quantum chemical matrices, while outperforming SGEMM with a crossover already for small matrices (n ~ 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library or AMD's Core Math Library, our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2x to 3x speedups.
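The recursive culling idea behind SpAMM can be illustrated with a minimal sketch. It is not the optimized implementation described in the abstract: it assumes a square matrix whose dimension is a power of two, uses the product of Frobenius norms as the culling test, and recomputes norms at every level instead of caching them in a quadtree. The names `frob`, `spamm`, `decay_matrix`, and the `leaf` parameter are hypothetical and chosen only for illustration.

```python
import math


def frob(M, r0, c0, n):
    """Frobenius norm of the n x n block of M with top-left corner (r0, c0)."""
    return math.sqrt(sum(M[r0 + i][c0 + j] ** 2
                         for i in range(n) for j in range(n)))


def spamm(A, B, C, ar, ac, br, bc, n, tau, leaf=2):
    """Accumulate A_block @ B_block into C and return the number of leaf
    products actually computed. Sub-products whose norm bound falls below
    tau are skipped (assumed culling criterion)."""
    # ||A_blk||_F * ||B_blk||_F is an upper bound on ||A_blk @ B_blk||_F.
    if frob(A, ar, ac, n) * frob(B, br, bc, n) < tau:
        return 0
    if n <= leaf:
        # Dense multiply on a small leaf block.
        for i in range(n):
            for k in range(n):
                a = A[ar + i][ac + k]
                for j in range(n):
                    C[ar + i][bc + j] += a * B[br + k][bc + j]
        return 1
    h = n // 2
    done = 0
    # Eight sub-products of the 2 x 2 block decomposition.
    for i in (0, h):
        for j in (0, h):
            for k in (0, h):
                done += spamm(A, B, C, ar + i, ac + k, br + k, bc + j,
                              h, tau, leaf)
    return done


def decay_matrix(n):
    """Toy matrix with exponential decay away from the diagonal."""
    return [[math.exp(-abs(i - j)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n = 16
    A = decay_matrix(n)
    C = [[0.0] * n for _ in range(n)]
    kept = spamm(A, A, C, 0, 0, 0, 0, n, tau=1e-6)
    print("leaf products computed:", kept, "of", (n // 2) ** 3)
```

With `tau=0.0` nothing is culled and the routine reduces to an ordinary blocked multiply. Raising `tau` skips products between blocks far from the diagonal, trading a bounded error for less work.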
Transmit diversity is an effective technique for combating fading in mobile wireless channels and improving the capacity of wireless *** the other hand, to cope with frequency-selective fading channels, OFDM systems can be applied to transform the channels into multiple flat fading channels and solve the multipath-induced ISI ***, combining OFDM with transmit diversity can provide reliable and high-rate transmission in wide-area cellular *** the decoding of space-time or space-frequency coded OFDM transmit diversity systems, efficient channel estimation methods have to be developed at the receiver to obtain the up-to-date channel state ***, channel estimation and tracking in OFDM transmit diversity systems suffer from high computational complexity due to the simultaneous estimation of channel parameters of all transmit *** this paper, we first investigate the issue of estimation bias in the reduced complexity algorithm [1]. Two bias removal schemes are then proposed by exploiting the temporal correlation of channel *** results show that the proposed methods can greatly reduce the estimation MSE of the reduced complexity algorithm for highly frequency-selective and fast fading channels.