The Mixture of Gaussian Processes (MGP) is a powerful statistical model for characterizing multimodal data, but its conventional Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is computationally intractable because of its time complexity. To address this problem, several approximations to the conventional EM algorithm have been proposed. However, these approximate EM algorithms are ineffective or limited in some situations. To implement the EM algorithm more effectively, we approximate it with simulated samples of the latent variables drawn by Markov chain Monte Carlo (MCMC) sampling, and design an MCMC EM algorithm. Experiments on both synthetic and real-world data sets demonstrate that our MCMC EM algorithm is more effective than the state-of-the-art EM algorithms on classification and prediction problems. (C) 2018 Elsevier B.V. All rights reserved.
Constructing confidence intervals (CIs) for functions of cell probabilities (e.g., rate difference, rate ratio and odds ratio) is a standard procedure for categorical data analysis in clinical trials and medical studies. In the presence of incomplete data, existing methods can be problematic. For example, the inverse of the observed information matrix may not exist, in which case the asymptotic CIs based on delta methods are not available. Even when the inverse of the observed information matrix exists, the large-sample delta methods are generally not reliable in small-sample studies. In addition, the existing expectation-maximization (EM) algorithm via conventional data augmentation (DA) may suffer from slow convergence because it introduces too many latent variables. In this article, for r × c tables with incomplete data, we propose a novel DA scheme that requires fewer latent variables and consequently leads to a more efficient EM algorithm. We present two bootstrap-type CIs for parameters of interest via the new EM algorithm, with and without the normality assumption. For r × c tables with only one incomplete/supplementary margin, the improved EM algorithm converges in only one step, and the associated maximum likelihood estimates can hence be obtained in closed form. Theoretical and simulation results show that the proposed EM algorithm outperforms the existing EM algorithm. Three real data sets, from a neurological study, a rheumatoid arthritis study and a wheeze study, are used to illustrate the methodologies. (c) 2006 Elsevier B.V. All rights reserved.
Most researchers in applied fields use the EM algorithm to find estimators for the normal mixture distribution with unknown component-specific variances without knowing much about the properties of the estimators. It is unclear in which situations the EM algorithm provides "good" estimators, good in the sense of statistical properties like consistency, bias, or mean square error. A simulation study is designed to investigate this problem. The scope of this study is the mixture model of normal distributions with component-specific variances, with the number of components fixed. The asymptotic properties of the EM algorithm estimate are investigated in each situation. The results show that the EM algorithm estimate does provide good asymptotic properties, except in some situations in which the population means are quite close to each other and the variances of the component distributions differ widely. (C) 2002 Elsevier Science B.V. All rights reserved.
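The setting studied above can be illustrated with a minimal sketch of EM for a two-component univariate normal mixture with component-specific variances. This is not the study's code: the function names, the simulated data, the starting values, and the fixed iteration count are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_normals(data, weight, means, variances, n_iter=200):
    """Run EM for a two-component normal mixture with separate variances.

    weight is the mixing proportion of component 0. Returns
    (weight, means, variances, log_likelihoods), where each recorded
    log-likelihood is evaluated before that iteration's M-step.
    """
    m0, m1 = means
    v0, v1 = variances
    log_liks = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0.
        resp = []
        log_lik = 0.0
        for x in data:
            p0 = weight * normal_pdf(x, m0, v0)
            p1 = (1.0 - weight) * normal_pdf(x, m1, v1)
            resp.append(p0 / (p0 + p1))
            log_lik += math.log(p0 + p1)
        log_liks.append(log_lik)
        # M-step: responsibility-weighted proportion, means and variances.
        n0 = sum(resp)
        n1 = len(data) - n0
        weight = n0 / len(data)
        m0 = sum(r * x for r, x in zip(resp, data)) / n0
        m1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        v0 = sum(r * (x - m0) ** 2 for r, x in zip(resp, data)) / n0
        v1 = sum((1.0 - r) * (x - m1) ** 2 for r, x in zip(resp, data)) / n1
    return weight, (m0, m1), (v0, v1), log_liks


# Hypothetical simulated data: N(0, 1) and N(5, 2^2), 300 points each.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
sample += [rng.gauss(5.0, 2.0) for _ in range(300)]
w, mu, var, ll = em_two_normals(sample, 0.5, (-1.0, 6.0), (1.0, 1.0))
print(w, mu, var)
```

Moving the two simulated means closer together while keeping the standard deviations far apart is one way to probe the difficult regime the abstract describes. The sketch has no safeguard against a variance collapsing toward zero.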
We consider the change-point problem in a classical framework while assuming a probability distribution for the change-point. An EM algorithm is proposed to estimate the distribution of the change-point. A change-point model for multiple profiles is also proposed, and an EM algorithm is presented to estimate the model. Two examples, Illinois traffic data and the Dow Jones Industrial Average, are used to demonstrate the proposed methods.
Simple methods for choosing sensible starting values for the EM algorithm, used to obtain maximum likelihood parameter estimates in mixture models, are compared. They are based on random initialization, on a classification EM algorithm (CEM), on a stochastic EM algorithm (SEM), or on previous short runs of EM itself. These initializations are included in a search/run/select strategy which can be compounded by repeating the three steps. They are compared in the context of multivariate Gaussian mixtures on the basis of numerical experiments on both simulated and real data sets in a target number of iterations. The main conclusions of these numerical experiments are the following. Simple random initialization, which is probably the most widely used way of initiating EM, is often outperformed by strategies using CEM, SEM or short runs of EM before running EM. Also, it appears that compounding is generally profitable, since a single run of EM can often lead to suboptimal solutions. Otherwise, none of the experimental strategies can be regarded as the best one, and it is difficult to characterize situations where a particular strategy can be expected to outperform the others. However, the strategy of initiating EM with short runs of EM can be recommended. This strategy, which as far as we know was not used before the present study, has some advantages. It is simple, performs well in many situations, presupposes no particular form of the mixture to be fitted to the data, and seems relatively insensitive to noisy data. (C) 2002 Elsevier Science B.V. All rights reserved.
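The recommended search/run/select idea, short runs of EM from several random starts followed by a long run from the best candidate, can be sketched as follows. This is a simplified illustration, not the study's procedure: it uses a univariate two-component mixture rather than a multivariate one, stops each short run after a fixed number of iterations (an assumed stopping rule), and adds a small variance floor as a guard. All function names and defaults are hypothetical.

```python
import math
import random


def density(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, params):
    """Log-likelihood of a two-component normal mixture."""
    w, m0, m1, v0, v1 = params
    return sum(
        math.log(w * density(x, m0, v0) + (1.0 - w) * density(x, m1, v1))
        for x in data
    )


def em_run(data, params, n_iter):
    """Apply n_iter EM iterations starting from params."""
    w, m0, m1, v0, v1 = params
    n = len(data)
    for _ in range(n_iter):
        # E-step: responsibilities of component 0.
        resp = []
        for x in data:
            p0 = w * density(x, m0, v0)
            p1 = (1.0 - w) * density(x, m1, v1)
            resp.append(p0 / (p0 + p1))
        # M-step, with an assumed floor of 1e-6 on the variances.
        n0 = sum(resp)
        n1 = n - n0
        w = n0 / n
        m0 = sum(r * x for r, x in zip(resp, data)) / n0
        m1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        v0 = max(sum(r * (x - m0) ** 2 for r, x in zip(resp, data)) / n0, 1e-6)
        v1 = max(sum((1.0 - r) * (x - m1) ** 2 for r, x in zip(resp, data)) / n1, 1e-6)
    return (w, m0, m1, v0, v1)


def random_start(data, rng):
    """Random initialization: two data points as means, pooled variance."""
    m0, m1 = rng.sample(data, 2)
    mean = sum(data) / len(data)
    var = sum((x - mean) ** 2 for x in data) / len(data)
    return (0.5, m0, m1, var, var)


def short_runs_then_em(data, rng, n_starts=10, short_iter=5, long_iter=200):
    """Search with short EM runs, select the best, then run EM at length."""
    candidates = [
        em_run(data, random_start(data, rng), short_iter) for _ in range(n_starts)
    ]
    best = max(candidates, key=lambda p: log_likelihood(data, p))
    return em_run(data, best, long_iter)


# Hypothetical simulated data for the demonstration.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(4.0, 0.5) for _ in range(200)]
fitted = short_runs_then_em(data, rng)
print(fitted, log_likelihood(data, fitted))
```

Compounding, in the abstract's sense, would correspond to repeating `short_runs_then_em` several times and keeping the fit with the highest log-likelihood.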
In the recent work of Rodrigues et al. (2009), a flexible cure rate survival model was developed by assuming that the number of competing causes of the event of interest follows the Conway-Maxwell Poisson distribution. This model includes as special cases some of the well-known cure rate models discussed in the literature. As the data obtained from cancer clinical trials are often subject to right censoring, the expectation maximization (EM) algorithm can be used as a powerful and efficient tool for estimating the model parameters from right-censored data. In this paper, the cure rate model developed by Rodrigues et al. (2009) is considered and, assuming that the time-to-event follows the exponential distribution, exact likelihood inference is developed based on the EM algorithm. The inverse of the observed information matrix is used to compute the standard errors of the maximum likelihood estimates (MLEs). An extensive Monte Carlo simulation study is performed to illustrate the method of inference developed here. Finally, the proposed methodology is illustrated with real data on cutaneous melanoma.
The EM algorithm is a popular method for maximum likelihood estimation from incomplete data. This method may be viewed as a proximal point method for maximizing the log-likelihood function using an integral form of the Kullback-Leibler distance function. Motivated by this interpretation, we consider a proximal point method using an integral form of an entropy-like distance function. We give a convergence analysis of the resulting proximal point method in the case where the cluster points lie in the interior of the objective function domain. This result is applied to a normal/independent example and a Gaussian mixture example to establish convergence of the EM algorithm on these examples. Further convergence analysis of the method for maximization over an orthant is given in low dimensions. Sublinear convergence and schemes for accelerating convergence are also discussed.
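The proximal point reading of EM mentioned above can be written out as a short derivation. The notation is assumed here rather than taken from the paper: y is the observed data, x the complete data with density f(x; θ), g(y; θ) the observed-data density, and k(x | y; θ) = f(x; θ) / g(y; θ) the conditional density of the complete data given the observed data.

```latex
% E-step objective, split into the observed-data log-likelihood
% and a conditional expectation term:
\begin{aligned}
Q(\theta \mid \theta^{(t)})
  &= \mathbb{E}\!\left[\log f(x;\theta) \,\middle|\, y;\theta^{(t)}\right] \\
  &= \log g(y;\theta)
     + \mathbb{E}\!\left[\log k(x \mid y;\theta) \,\middle|\, y;\theta^{(t)}\right].
\end{aligned}

% Subtracting the theta-free constant
% E[ log k(x | y; theta^{(t)}) | y; theta^{(t)} ]
% leaves the maximizer unchanged, so the M-step is a proximal point step:
\theta^{(t+1)}
  = \arg\max_{\theta}\;
    \Bigl\{ \log g(y;\theta) - I\bigl(\theta^{(t)},\theta\bigr) \Bigr\},
\qquad
I\bigl(\theta^{(t)},\theta\bigr)
  = \int k\bigl(x \mid y;\theta^{(t)}\bigr)\,
    \log\frac{k\bigl(x \mid y;\theta^{(t)}\bigr)}{k(x \mid y;\theta)}\,dx
  \;\ge\; 0 .
```

Here I is the Kullback-Leibler divergence between the two conditional densities. It vanishes at θ = θ^(t), which gives log g(y; θ^(t+1)) ≥ log g(y; θ^(t)), the familiar monotonicity of EM.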
An EM algorithm (Dempster et al., 1977) is derived for estimating the parameters of the truncated bivariate Poisson distribution with zeros missing from both margins. The observed information matrix is obtained, and a numerical example is given in which the convergence of the EM algorithm is accelerated by the methods of Louis (1982) and conjugate gradients (Jamshidian and Jennrich, 1993).
The Photon Counting Histogram Expectation Maximization (PCH-EM) algorithm has recently been reported as a candidate method for the characterization of Deep Sub-Electron Read Noise (DSERN) image sensors. This work describes a comprehensive demonstration of the PCH-EM algorithm applied to a DSERN-capable quanta image sensor. The results show that PCH-EM is able to characterize DSERN pixels over a large span of quanta exposure and read noise values. The per-pixel characterization results of the sensor are combined with the proposed Photon Counting Distribution (PCD) model to demonstrate the ability of PCH-EM to predict the ensemble distribution of the device. The agreement between experimental observations and model predictions demonstrates both the applicability of the PCD model in the DSERN regime and the ability of the PCH-EM algorithm to accurately estimate the underlying model parameters.
This paper considers the parameter identification problem for images degraded by observation noise by applying the EM algorithm. It is assumed that the image is described by a semicausal model due to Jain. By applying the EM algorithm to each scalar subsystem derived from the state-space model via the discrete sine transform (DST), we obtain a scheme for estimating the AR parameters of the transformed image. A parameter identification algorithm for the original image model is also derived by using the least-squares (LS) method. As a by-product, the restored image is obtained in the course of parameter identification. Simulation studies are included to show the feasibility of the proposed algorithm.