Author affiliation: Univ Zagreb, Fac Sci, Dept Math, Zagreb 10000, Croatia
Publication: SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS (SIAM J. Matrix Anal. Appl.)
Year, volume, issue: 2021, Vol. 42, No. 2
Pages: 635-658
Core indexing:
Subject classification: 07 [Science]; 070104 [Science - Applied Mathematics]; 0701 [Science - Mathematics]
Funding: Croatian Science Foundation [IP-2019-04-6268]
Subject: Prony's method; parallel algorithm; efficient GPU-CPU implementation; numerical analysis
Abstract: Prony's method is a standard tool exploited for solving many imaging and data analysis problems that result in parameter identification in sparse exponential sums $f(k) = \sum_{j=1}^{M} c_j e^{-2\pi i \langle t_j, k \rangle}$, $k \in \mathbb{Z}^d$, where the parameters $\{t_j\}_{j=1}^{M} \subset [0,1)^d$ are pairwise different and the coefficients $\{c_j\}_{j=1}^{M} \subset \mathbb{C} \setminus \{0\}$ are nonzero. The focus of our investigation is on a variant of Prony's method based on a multivariate matrix pencil approach. The method constructs matrices $S_1, \ldots, S_d$ from the sampling values, and their simultaneous diagonalization yields the parameters $\{t_j\}_{j=1}^{M}$. The parameters $\{c_j\}_{j=1}^{M}$ are computed as the solution of a linear least squares problem, where the matrix of the problem is determined by $\{t_j\}_{j=1}^{M}$. Since the method involves independent generation and manipulation of a certain number of matrices, there is an intrinsic capacity for parallelization of the whole computational process on several levels. Hence, we propose a parallel version of Prony's method in order to increase its efficiency. The tasks concerning the generation of matrices are divided among the blocks of threads of the graphics processing unit (GPU) and the central processing unit (CPU), with the heavier load placed on the GPU. From the algorithmic point of view, the CPU is dedicated to the more complex tasks of computing the singular value decomposition, the eigendecomposition, and the solution of the least squares problem, while the GPU performs matrix-matrix multiplications and summations. With a careful choice of algorithms for the subtasks, the load between the CPU and the GPU is balanced. Besides the parallelization techniques, we are also concerned with some numerical issues, and we provide a detailed numerical analysis of the method in the case of noisy input data. Finally, we performed a set of numerical tests which confirm the superior efficiency of the parallel algorithm and the consistency of the forward error with the results of numeric
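Illustrative note: the exponential-sum model in the abstract can be made concrete with a small sketch. The Python code below is not the paper's algorithm. It evaluates the model for a general dimension d and then recovers the parameters in the simplest case, d = 1 and M = 2, with the classical univariate Prony recurrence on noise-free samples. The multivariate matrix pencil approach, the simultaneous diagonalization, and the GPU-CPU parallelization are not reproduced. The function names `exp_sum` and `prony_two_terms` and all test values are hypothetical, chosen only for this illustration.

```python
import cmath
import math


def exp_sum(k, nodes, coeffs):
    """Evaluate f(k) = sum_j c_j * exp(-2*pi*i*<t_j, k>) for an integer multi-index k."""
    total = 0j
    for t, c in zip(nodes, coeffs):
        inner = sum(ti * ki for ti, ki in zip(t, k))  # inner product <t_j, k>
        total += c * cmath.exp(-2j * math.pi * inner)
    return total


def prony_two_terms(f):
    """Recover (t_j, c_j), j = 1, 2, from noise-free samples f[0..3] with d = 1.

    Classical Prony (an assumption of this sketch, not the matrix pencil variant):
    with z_j = exp(-2*pi*i*t_j), the samples satisfy
    f[k+2] + a1*f[k+1] + a0*f[k] = 0, and z_1, z_2 are the roots of
    z^2 + a1*z + a0.
    """
    f0, f1, f2, f3 = f[:4]
    # Solve the 2x2 Hankel system [[f0, f1], [f1, f2]] [a0, a1]^T = -[f2, f3]^T
    # by Cramer's rule.
    det = f0 * f2 - f1 * f1
    a0 = (f1 * f3 - f2 * f2) / det
    a1 = (f1 * f2 - f0 * f3) / det
    # Roots of the Prony polynomial z^2 + a1*z + a0.
    disc = cmath.sqrt(a1 * a1 - 4 * a0)
    z1, z2 = (-a1 + disc) / 2, (-a1 - disc) / 2
    # Coefficients from the 2x2 Vandermonde system c1 + c2 = f0, c1*z1 + c2*z2 = f1.
    c2 = (f1 - f0 * z1) / (z2 - z1)
    c1 = f0 - c2

    def to_t(z):
        # Map z = exp(-2*pi*i*t) back to t in [0, 1).
        return (-cmath.phase(z) / (2 * math.pi)) % 1.0

    return sorted([(to_t(z1), c1), (to_t(z2), c2)], key=lambda p: p[0])


# Hypothetical test data: two nodes in [0, 1) with nonzero coefficients.
nodes = [(0.1,), (0.35,)]
coeffs = [2 + 1j, -0.5j]
samples = [exp_sum((k,), nodes, coeffs) for k in range(4)]
recovered = prony_two_terms(samples)
```

Here `recovered` is a list of `(t_j, c_j)` pairs sorted by `t_j`. With exact samples it should agree with `nodes` and `coeffs` up to rounding error. With noisy samples this four-sample recurrence is not a robust choice.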