Penalized spline smoothing is a well-established nonparametric regression method that is efficient for one and two covariates. Its extension to more than two covariates is straightforward but suffers from exponentially increasing memory demands and computational complexity, which brings the method to its numerical limit. Penalized spline smoothing with multiple covariates requires solving a large-scale, regularized least-squares problem in which the matrices involved do not fit into the memory of common computer systems. To overcome this restriction, we introduce a matrix-free implementation of the conjugate gradient method. We further present matrix-free implementations of a simple diagonal preconditioner as well as a more advanced geometric multigrid preconditioner to significantly speed up convergence of the conjugate gradient method. All algorithms require a negligible amount of memory and therefore allow for penalized spline smoothing with multiple covariates. Moreover, for arbitrary but fixed covariate dimension, we show grid-independent convergence of the multigrid preconditioner, which is fundamental to achieving algorithmic scalability.
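The paper's actual solver is not reproduced here, but the idea of a matrix-free preconditioned conjugate gradient method can be sketched: the solver only ever needs a routine that applies the system matrix to a vector, plus a cheap preconditioner such as the inverse diagonal. The basis B, penalty D, data y, and helper names below are hypothetical stand-ins, not the authors' code.

```python
import numpy as np

def pcg(apply_A, b, apply_Minv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient using only matrix-vector products.

    apply_A    : callable returning A @ x without ever forming A explicitly
    apply_Minv : callable applying the preconditioner M^{-1} (here 1/diag(A))
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Toy usage: solve (B^T B + lam * D^T D) coef = B^T y without assembling the matrix.
# B, D, y are small random stand-ins for the spline basis, penalty, and data.
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 50))
D = np.diff(np.eye(50), n=2, axis=0)            # second-order difference penalty
y = rng.standard_normal(200)
lam = 1.0

apply_A = lambda x: B.T @ (B @ x) + lam * (D.T @ (D @ x))
diag_A = (B * B).sum(axis=0) + lam * (D * D).sum(axis=0)
apply_Minv = lambda r: r / diag_A               # simple diagonal preconditioner

coef = pcg(apply_A, y @ B, apply_Minv)
```

In a genuinely matrix-free setting the products B @ x and D @ x would themselves be evaluated from the tensor-product structure of the spline basis rather than from stored dense matrices; the dense arrays above are only for the toy demonstration.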
A capacity to recognize speech offline eliminates privacy concerns and the need for an internet connection. Despite efforts to reduce the memory demands of speech recognition systems, these demands remain formidable, and thus popular tools such as Kaldi run best via cloud computing. The key bottleneck arises from the fact that a bedrock of such tools, the Viterbi algorithm, requires memory that grows linearly with utterance length even when contained via beam search. A recent recasting of the Viterbi algorithm, SIEVE, eliminates the path-length factor from the space complexity, but with a significant practical runtime overhead. In this paper, we develop a variant of SIEVE that lessens this runtime overhead via beam search, retains the decoding quality of standard beam search, and avoids its linearly growing memory bottleneck. This space-complexity reduction is orthogonal to decoding quality and complementary to memory savings in model representation and training.
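For context, the linearly growing memory the abstract refers to is the backpointer table of standard beam-search Viterbi decoding, sketched below (this is the baseline being improved upon, not SIEVE itself). The states, trans, and emit arguments are hypothetical placeholders for an acoustic/language model; only the structure matters: backpointers gains one entry per frame, so memory is O(beam_width * number_of_frames).

```python
import math

def viterbi_beam(frames, states, trans, emit, beam_width=8):
    """Standard beam-search Viterbi (baseline sketch).

    frames : list of observations
    states : iterable of state ids
    trans  : dict (prev_state, state) -> log transition probability
    emit   : callable (state, obs) -> log emission probability
    """
    beam = {s: 0.0 for s in states}    # log-prob of best path ending in each state
    backpointers = []                  # one dict per frame: the linearly growing table

    for obs in frames:
        scores, bp = {}, {}
        for s in states:
            best_prev, best_score = None, -math.inf
            for p, score in beam.items():
                cand = score + trans.get((p, s), -math.inf)
                if cand > best_score:
                    best_prev, best_score = p, cand
            if best_prev is not None:
                scores[s] = best_score + emit(s, obs)
                bp[s] = best_prev
        # prune to the beam_width best-scoring states
        beam = dict(sorted(scores.items(), key=lambda kv: -kv[1])[:beam_width])
        backpointers.append(bp)

    # backtrace through the stored table to recover the best path
    state = max(beam, key=beam.get)
    path = [state]
    for bp in reversed(backpointers):
        state = bp[state]
        path.append(state)
    return list(reversed(path[:-1])), max(beam.values())
```

SIEVE-style decoding replaces the stored backpointer table with a recomputation scheme, which is where the runtime overhead discussed in the abstract comes from.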
ISBN: (Print) 9781450369763
Higher order spectra (HOS) are a powerful tool in nonlinear time series analysis and have been extensively used as feature representations in the data mining, communications, and cosmology domains. However, HOS estimation suffers from high computational cost and memory consumption. Any algorithm for computing the k-th order spectrum on a dataset of size n needs Ω(n^(k-1)) time, since the output size is Ω(n^(k-1)) as well, which makes direct HOS analysis difficult for long time series and further prohibits its direct deployment to resource-limited and time-sensitive applications. Existing algorithms for computing HOS are either inefficient or have been implemented on obsolete architectures. Thus it is essential to develop efficient generic algorithms for HOS estimation. In this paper, we present a package of generic sequential and parallel algorithms for computationally and memory-efficient HOS estimation which can be employed on any parallel machine or platform. Our proposed algorithms largely reduce the computational cost and memory usage of HOS estimation in the spectrum multiplication and smoothing steps through carefully designed prefix sum operations. Moreover, we employ a matrix partitioning technique, design algorithms with optimal memory usage, and present the parallel approaches on the PRAM and mesh models. Furthermore, we implement our algorithms for both bispectrum and trispectrum estimation. We conduct extensive experiments and cross-compare the proposed algorithms' performance. Results show that our algorithms achieve state-of-the-art computational and memory efficiency, and our parallel algorithms achieve close to linear speedups.
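The paper's parallel algorithms are not reproduced here, but the role of prefix sums in the smoothing step can be illustrated with a minimal sketch: after a textbook direct bispectrum estimate, a 2-D prefix sum (summed-area table) turns each box-smoothed entry into an O(1) lookup instead of an O(w^2) sum. All names below are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

def raw_bispectrum(x):
    """Direct single-segment bispectrum estimate B(f1, f2) = X(f1) X(f2) X*(f1 + f2)."""
    n = len(x)
    X = np.fft.fft(x)
    f1, f2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return X[f1] * X[f2] * np.conj(X[(f1 + f2) % n])

def box_smooth_prefix(B, w):
    """Smooth B with a (2w+1) x (2w+1) box using a 2-D prefix sum (summed-area table),
    so each smoothed entry costs O(1) instead of O(w^2)."""
    n = B.shape[0]
    P = np.zeros((n + 1, n + 1), dtype=B.dtype)
    P[1:, 1:] = np.cumsum(np.cumsum(B, axis=0), axis=1)   # 2-D prefix sums
    out = np.empty_like(B)
    for i in range(n):
        for j in range(n):
            i0, i1 = max(i - w, 0), min(i + w + 1, n)
            j0, j1 = max(j - w, 0), min(j + w + 1, n)
            box = P[i1, j1] - P[i0, j1] - P[i1, j0] + P[i0, j0]
            out[i, j] = box / ((i1 - i0) * (j1 - j0))
    return out

# Toy usage on a short synthetic series.
rng = np.random.default_rng(1)
x = rng.standard_normal(128)
B_smooth = box_smooth_prefix(raw_bispectrum(x), w=2)
```

The double loop over (i, j) is written out for clarity; the same prefix-sum lookups parallelize trivially across output entries, which is the property the PRAM and mesh algorithms in the paper exploit.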