Unsupervised learning is an important problem in statistics and machine learning with a wide range of applications. In this paper, we study clustering of high-dimensional Gaussian mixtures and propose a procedure, called CHIME, that is based on the EM algorithm and a direct estimation method for the sparse discriminant vector. Both theoretical and numerical properties of CHIME are investigated. We establish the optimal rate of convergence for the excess misclustering error and show that CHIME is minimax rate optimal. In addition, the optimality of the proposed estimator of the discriminant vector is also established. Simulation studies show that CHIME outperforms the existing methods under a variety of settings. The proposed CHIME procedure is also illustrated in an analysis of a glioblastoma gene expression data set and shown to have superior performance. Clustering of Gaussian mixtures in the conventional low-dimensional setting is also considered. The technical tools developed for the high-dimensional setting are used to establish the optimality of the clustering procedure that is based on the classical EM algorithm.
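For orientation, the classical low-dimensional EM clustering mentioned at the end of the abstract can be sketched as follows. This is a minimal illustration, not the CHIME procedure: it assumes two Gaussian components with a shared covariance matrix, and the function name `em_shared_cov`, the median-split initialisation, and the small ridge term are illustrative choices that do not come from the paper.

```python
import numpy as np

def em_shared_cov(X, n_iter=100, ridge=1e-6):
    """Classical EM for a two-component Gaussian mixture with shared covariance.

    Hypothetical helper for illustration only; not the CHIME estimator.
    """
    n, d = X.shape
    # Heuristic initialisation (an assumption): split on the median of the
    # first coordinate and use the two group means as starting centres.
    lo = X[:, 0] <= np.median(X[:, 0])
    mu = np.vstack([X[lo].mean(axis=0), X[~lo].mean(axis=0)])
    pi = np.array([0.5, 0.5])
    sigma = np.cov(X, rowvar=False).reshape(d, d) + ridge * np.eye(d)

    for _ in range(n_iter):
        # E-step: posterior membership probabilities. With a shared
        # covariance the Gaussian normalising constant cancels.
        prec = np.linalg.inv(sigma)
        diff = X[:, None, :] - mu[None, :, :]              # shape (n, 2, d)
        maha = np.einsum('nkd,de,nke->nk', diff, prec, diff)
        log_r = np.log(pi)[None, :] - 0.5 * maha
        log_r -= log_r.max(axis=1, keepdims=True)          # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: weighted updates of mixing weights, means, and covariance.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        diff = X[:, None, :] - mu[None, :, :]
        sigma = np.einsum('nk,nkd,nke->de', r, diff, diff) / n + ridge * np.eye(d)

    # r holds the responsibilities from the final E-step.
    return pi, mu, sigma, r
```

Cluster labels can be read off as `r.argmax(axis=1)`, and a classical (non-sparse) discriminant direction would be `np.linalg.solve(sigma, mu[1] - mu[0])`. In the high-dimensional setting the abstract targets, this plain inversion is exactly what a sparse, direct estimator is meant to replace.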
This paper presents a unified method for influence analysis to deal with random effects that appear in additive nonlinear regression models for repeated measurement data. The basic idea is to apply the Q-function, the conditional expectation of the complete-data log-likelihood function obtained from the EM algorithm, instead of the observed-data log-likelihood function used in standard influence analysis. Diagnostic measures are derived based on the case-deletion approach and the local influence approach. Two real examples and a simulation study are examined to illustrate our methodology.
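As a reminder of the object involved, the Q-function of the EM algorithm can be written as below. The notation is assumed here for illustration and is not taken from the paper: $y$ is the observed data, $b$ the unobserved random effects, $L_c$ the complete-data likelihood, and $\theta^{(t)}$ the current parameter estimate.

```latex
Q\bigl(\theta \mid \theta^{(t)}\bigr)
  = \mathbb{E}\bigl[\, \log L_c(\theta;\, y, b) \;\big|\; y,\ \theta^{(t)} \bigr]
```

Under this notation, a case-deletion diagnostic would compare the estimate obtained from the full data with the one obtained after deleting case $i$, measuring the change through $Q$ rather than through the observed-data log-likelihood.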
The multivariate Fay-Herriot model has been shown to be useful in various applications when there are multiple response variables. Therefore, several studies of the model have been pursued, especially on estimation techniques for the variance components. The two benchmark methods for variance component estimation are the profile maximum likelihood method and the residual maximum likelihood method. However, it has been shown in the literature that these methods can produce zero estimates of the variance components. This leads to unfavorable results, since the direct estimates then do not contribute to the EBLUPs. In this paper, we propose alternative estimation methods based on the EM algorithm for the variance components of the multivariate Fay-Herriot model. In our study, we illustrate the procedures of the EM algorithms for the profile likelihood and the residual likelihood. Moreover, we perform a Monte Carlo simulation to investigate the performance of the proposed methods compared with some existing methods. The simulation results suggest that the EM algorithm improves on those existing methods in certain cases, particularly when the number of areas is small, which is common in applications. Finally, we apply the proposed estimates to the average household income and average household expenditure in Thailand.
In this paper, we focus on studying the Mixed-Effects State-Space (MESS) models previously introduced by Liu et al. [Liu D, Lu T, Niu X-F, et al. Mixed-effects state-space models for analysis of longitudinal dynamic systems. Biometrics. 2011;67(2):476-485]. We propose an estimation method that combines the auxiliary particle learning and smoothing approach with the Expectation Maximization (EM) algorithm. First, we describe the technical details of the algorithm steps. Then, we evaluate their effectiveness and goodness of fit through a simulation study. Our method requires expressing the posterior distribution for the random effects using a sufficient statistic that can be updated recursively, thus enabling its application to various model formulations, including non-Gaussian and nonlinear cases. Finally, we demonstrate the usefulness of our method and its capability to handle the missing data problem through an application to a real dataset.
The EM algorithm can be used to compute maximum likelihood estimates of model parameters for skew-t mixture models. We show that the intractable expectations needed in the E-step can be written out analytically. These closed-form expressions bypass the need for numerical estimation procedures, such as Monte Carlo methods, leading to accurate calculation of maximum likelihood estimates. Our approach is illustrated on two real data sets. (c) 2012 Elsevier B.V. All rights reserved.
In this paper, a mobile-agent-based distributed EM (Expectation Maximization) algorithm is developed for density estimation and data clustering in sensor networks. It is assumed that sensor measurements can be statistically modeled by a common Gaussian mixture model. This algorithm not only executes the EM algorithm in a distributed manner, but also reduces the number of iterations of the EM algorithm and increases its convergence rate. The convergence of the proposed method is also studied analytically, and it is shown that the estimated parameters eventually converge to their true values. Finally, the proposed method is applied to synthetic data sets in order to show its promising performance.
This paper proposes a new step, called the P-step, to handle a linear or nonlinear equality constraint in addition to the conventional EM algorithm. This new step is easy to implement, first because only the first derivatives of the objective function and the constraint function are necessary, and second because the P-step is carried out after the conventional EM algorithm. The estimate sequence produced by our method enjoys a monotonic increase in the observed likelihood function. We apply the P-step in addition to the conventional EM algorithm to two illustrative examples. The first example has a linear constraint function. The second has a nonlinear constraint function. Finally, we show that there exists a Kuhn-Tucker vector at the limit point produced by our method.
Maximum likelihood estimation of item parameters in the marginal distribution, integrating over the distribution of ability, becomes practical when computing procedures based on an EM algorithm are used. By characterizing the ability distribution empirically, arbitrary assumptions about its form are avoided. The EM procedure is shown to apply to general item-response models lacking simple sufficient statistics for ability. This includes models with more than one latent dimension.
This article presents a robust identification approach for nonlinear errors-in-variables (EIV) systems contaminated with outliers. In this work, the measurement noise is modelled using the t-distribution, instead of the traditional Gaussian distribution, to mitigate the effect of the outliers. The heavier tails of the t-distribution, controlled through the adjustable degrees of freedom, are used to account for noise and outliers concomitantly. Further, to avoid the intricacies of direct nonlinear identification, we propose to approximate the nonlinear EIV dynamics using multiple local ARX models and to aggregate them using an exponential weighting strategy. The parameters of the local models and the weighting parameters are estimated using the expectation maximization (EM) algorithm, under the framework of maximum likelihood estimation (MLE). Studies with simulated numerical examples and an experiment on a multi-tank system demonstrate the superiority of the proposed method. (C) 2017 Elsevier Ltd. All rights reserved.
The Expectation-Maximization (EM) algorithm is a very popular technique for maximum likelihood estimation in incomplete data models. When the expectation step cannot be performed in closed form, a stochastic approximation of EM (SAEM) can be used. Under very general conditions, the authors have shown that the attractive stationary points of the SAEM algorithm correspond to the global and local maxima of the observed likelihood. In order to avoid convergence towards a local maximum, a simulated annealing version of SAEM is proposed. An illustrative application to the convolution model for estimating the coefficients of the filter is given.
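For reference, one iteration of the standard SAEM scheme can be summarised as below. The notation is assumed for illustration: $y$ is the observed data, $z$ the missing data, $f$ the complete-data density, and $\gamma_k$ a decreasing step-size sequence. The simulated annealing modification proposed in the paper is not reproduced here.

```latex
\begin{aligned}
\text{Simulation:}\quad    & z_k \sim p\bigl(z \mid y;\ \theta_{k-1}\bigr) \\
\text{Approximation:}\quad & Q_k(\theta) = Q_{k-1}(\theta)
      + \gamma_k \bigl[\log f(y, z_k;\ \theta) - Q_{k-1}(\theta)\bigr] \\
\text{Maximization:}\quad  & \theta_k = \arg\max_{\theta}\ Q_k(\theta)
\end{aligned}
```

A typical requirement on the step sizes is $\sum_k \gamma_k = \infty$ and $\sum_k \gamma_k^2 < \infty$, so that the simulation noise is averaged out while the iterates can still move.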