Non-parametric density estimation methods are more flexible than parametric methods because they do not assume any specific shape or structure for the data. Most non-parametric methods, such as kernel estimation, require parameter tuning to achieve good smoothing, a non-trivial task even in low dimensions. In higher dimensions, the sparsity of data in local neighborhoods becomes a challenge even for non-parametric methods. In this paper, we combine the copula transform with two efficient non-parametric methods to develop a new method for improved non-parametric density estimation in multivariate domains. After the marginal and joint densities are separated using the copula transform, a diffusion-based kernel estimator is employed to estimate the marginals. Next, Bayesian sequential partitioning (BSP) is used for the joint density estimation.
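The separation step described above can be illustrated with a minimal sketch. It assumes the common rank-based (empirical CDF) form of the copula transform, which may differ from the paper's exact construction; the function name `copula_transform` and the toy sample are hypothetical. Each coordinate is replaced by its scaled rank, so every marginal becomes approximately uniform on (0, 1) and only the dependence structure remains for the joint estimator.

```python
import random

def copula_transform(data):
    """Map each column to (0, 1) through its empirical CDF: u = rank / (n + 1).

    Assumption: rank-based transform with no ties in the data.
    """
    n = len(data)
    d = len(data[0])
    u = [[0.0] * d for _ in range(n)]
    for j in range(d):
        # Indices of the rows sorted by their j-th coordinate.
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            u[i][j] = rank / (n + 1)
    return u

# Toy sample (hypothetical): one Gaussian and one exponential coordinate.
random.seed(0)
sample = [(random.gauss(0.0, 1.0), random.expovariate(1.0)) for _ in range(200)]
u = copula_transform(sample)
```

By Sklar's theorem, the joint density then factors into the copula density evaluated at the transformed point multiplied by the product of the marginal densities, which is why the marginals and the joint part can be estimated by different methods.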
Multivariate density estimation methods typically work well in low dimensions, and extending them to data analytics in high-dimensional domains has proven challenging. For density estimation in high-dimensional big data domains, the non-parametric Bayesian sequential partitioning (BSP) algorithm provides an efficient way of partitioning the sample space based on Bayesian inference. In this paper, we present a detailed analysis of BSP and provide a computationally efficient copula-transformed data structure and algorithm for density estimation in high-dimensional data analytics. Using the copula-transformed data structure, we implement marginal density estimation with both BSP and kernel density estimation (KDE). The data structures and algorithm are designed to map efficiently onto the parallel processing paradigms of Open Multi-Processing (OpenMP®) and the Message Passing Interface (MPI).
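To make the idea of density estimation by partitioning concrete, the sketch below builds a piecewise-constant density on the unit cube using midpoint cuts. It is a greedy, likelihood-driven simplification and not the Bayesian sampling procedure of BSP itself: each step simply takes the cut with the largest log-likelihood gain. All names (`greedy_partition`, `density`, the toy `points`) are hypothetical, and the data are assumed to lie in [0, 1)^d, as they would after a copula transform.

```python
import math
import random

def volume(lo, hi):
    return math.prod(h - l for l, h in zip(lo, hi))

def loglik(regions, n):
    """Log-likelihood of a piecewise-constant density: count / (n * volume)."""
    total = 0.0
    for lo, hi, pts in regions:
        if pts:
            total += len(pts) * math.log(len(pts) / (n * volume(lo, hi)))
    return total

def split(region, dim):
    """Cut a region at its midpoint along dimension `dim`."""
    lo, hi, pts = region
    mid = 0.5 * (lo[dim] + hi[dim])
    left_hi = hi[:dim] + (mid,) + hi[dim + 1:]
    right_lo = lo[:dim] + (mid,) + lo[dim + 1:]
    left = (lo, left_hi, [p for p in pts if p[dim] < mid])
    right = (right_lo, hi, [p for p in pts if p[dim] >= mid])
    return left, right

def greedy_partition(points, n_cuts):
    """Greedy stand-in for BSP: take the best single cut at every step."""
    n, d = len(points), len(points[0])
    regions = [((0.0,) * d, (1.0,) * d, list(points))]
    for _ in range(n_cuts):
        best = None
        for idx, region in enumerate(regions):
            for dim in range(d):
                left, right = split(region, dim)
                gain = loglik([left, right], n) - loglik([region], n)
                if best is None or gain > best[0]:
                    best = (gain, idx, left, right)
        _, idx, left, right = best
        regions[idx:idx + 1] = [left, right]
    return regions

def density(regions, n, x):
    """Evaluate the piecewise-constant estimate at point x."""
    for lo, hi, pts in regions:
        if all(l <= xi < h for l, xi, h in zip(lo, x, hi)):
            return len(pts) / (n * volume(lo, hi))
    return 0.0

# Toy data (hypothetical): mass concentrated near 0 in the first coordinate.
random.seed(1)
points = [(random.random() ** 2, random.random()) for _ in range(500)]
regions = greedy_partition(points, n_cuts=8)
```

Because every region stores its bounds and its points independently, the per-region work in the inner loop has no shared state, which is the kind of structure that lends itself to the thread-level and message-passing parallelism mentioned above.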
ISBN: (print) 9781467368537
Density estimation is a fundamental part of statistical analysis and data mining. In high-dimensional domains, parametric methods and commonly used non-parametric methods such as histograms or kernel estimators fail to perform properly. In this paper, we present computationally efficient data structures for implementing Bayesian sequential partitioning (BSP) as a framework for density estimation in high-dimensional domains. Simulation results are presented to analyze its performance on large high-dimensional datasets.