The expectation-maximization (EM) algorithm is a well-known iterative algorithm for finding maximum likelihood estimates from incomplete data and is used in many statistical models with latent variables and missing data. The algorithm monotonically increases the likelihood function and automatically satisfies parameter constraints as it converges. The popularity of the EM algorithm can be attributed to its stable convergence, simple implementation, and flexibility in interpreting data incompleteness. Despite these computational advantages, the algorithm is only linearly convergent and can become very slow when a statistical model has many parameters and a high proportion of missing data. Various algorithms have been proposed to accelerate the convergence of the EM algorithm. We introduce acceleration of the EM algorithm using root-finding and vector extrapolation algorithms. The root-finding algorithms include Aitken's method and the Newton-Raphson, quasi-Newton, and conjugate gradient algorithms; their faster convergence rates allow the EM algorithm to be sped up. The vector extrapolation algorithms transform the sequence of estimates produced by the EM algorithm into a faster-converging sequence and can accelerate convergence without modifying the EM algorithm itself. We describe the derivation of these acceleration algorithms and apply them to two examples. This article is categorized under: Statistical and Graphical Methods of Data Analysis > EM algorithm
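As a concrete illustration of the acceleration idea, the sketch below applies Aitken's delta-squared (Steffensen-type) extrapolation to the parameter sequence produced by a simple scalar-parameter EM: estimating the mixing proportion of a two-component normal mixture from simulated data. The model, data, and function names are illustrative assumptions, not the examples analysed in the article.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Illustrative data: mixture of N(3, 1) and N(0, 1) with true mixing proportion 0.4.
x = np.concatenate([rng.normal(3.0, 1.0, 400), rng.normal(0.0, 1.0, 600)])

def em_update(pi):
    """One EM iteration for the mixing proportion (component means/variances known)."""
    # E-step: posterior probability that each observation comes from the N(3, 1) component.
    f1, f0 = norm.pdf(x, 3.0, 1.0), norm.pdf(x, 0.0, 1.0)
    r = pi * f1 / (pi * f1 + (1 - pi) * f0)
    # M-step: the updated mixing proportion is the average responsibility.
    return r.mean()

def em(pi0, n_iter):
    """Plain EM: repeatedly apply the fixed-point map."""
    pi = pi0
    for _ in range(n_iter):
        pi = em_update(pi)
    return pi

def aitken_em(pi0, n_iter, tol=1e-10):
    """EM accelerated by Aitken's delta-squared extrapolation of the estimate sequence."""
    pi = pi0
    for _ in range(n_iter):
        p1 = em_update(pi)
        p2 = em_update(p1)
        denom = p2 - 2.0 * p1 + pi
        # Fall back to a plain EM step when the second difference is (near) zero.
        pi_new = p1 if abs(denom) < 1e-12 else pi - (p1 - pi) ** 2 / denom
        if abs(pi_new - pi) < tol:
            return pi_new
        pi = pi_new
    return pi

print("plain EM, 20 iterations: ", em(0.9, 20))
print("Aitken EM, 20 iterations:", aitken_em(0.9, 20))
```

Because the EM map is only a black-box fixed-point iteration here, the same extrapolation wrapper can be reused for other models without touching the E-step or M-step, which is the appeal of the extrapolation-based accelerators described above.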
Background correction is an important preprocessing step for microarray data that attempts to adjust the data for the ambient intensity surrounding each feature. The "normexp" method models the observed pixel intensities as the sum of two random variables, one normally distributed and the other exponentially distributed, representing background noise and signal, respectively. Using a saddle-point approximation, Ritchie and others (2007) found normexp to be the best background correction method for two-color microarray data. This article develops the normexp method further by improving the estimation of its parameters. A complete mathematical development is given of the normexp model and the associated saddle-point approximation. Some subtle numerical programming issues are solved that caused the original normexp method to fail occasionally when applied to unusual data sets. A practical and reliable algorithm is developed for exact maximum likelihood estimation (MLE), using high-quality optimization software with the saddle-point estimates as starting values. MLE is shown to outperform the heuristic estimators proposed by other authors, both in estimation accuracy and in performance on real data, while the saddle-point approximation remains an adequate replacement in most practical situations. The performance of normexp for assessing differential expression is further improved by adding a small offset to the corrected intensities.
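A minimal sketch of the normexp model is given below, assuming simulated intensities rather than real microarray data: the exact log-likelihood of the normal-plus-exponential convolution is maximized with a general-purpose optimizer, and the fitted parameters are used to compute the posterior-mean (background-corrected) signal, which is always strictly positive. The starting values here are crude moment-based guesses standing in for the saddle-point estimates mentioned above; this is not the implementation used in the article or in limma.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Illustrative data: normal background N(mu, sigma^2) plus exponential signal with mean alpha.
mu_true, sigma_true, alpha_true = 100.0, 15.0, 200.0
x = rng.normal(mu_true, sigma_true, 5000) + rng.exponential(alpha_true, 5000)

def normexp_negloglik(params, x):
    """Exact negative log-likelihood of X = background + signal.
    sigma and alpha are parameterized on the log scale to keep them positive."""
    mu, log_sigma, log_alpha = params
    sigma, alpha = np.exp(log_sigma), np.exp(log_alpha)
    # Density of the convolution:
    #   f(x) = (1/alpha) * exp((mu - x)/alpha + sigma^2/(2 alpha^2))
    #          * Phi((x - mu - sigma^2/alpha) / sigma)
    z = (x - mu - sigma**2 / alpha) / sigma
    logf = -np.log(alpha) + (mu - x) / alpha + sigma**2 / (2 * alpha**2) + norm.logcdf(z)
    return -logf.sum()

# Crude moment-based starting values (a stand-in for the saddle-point estimates).
start = np.array([x.min(), np.log(x.std() / 2), np.log(x.mean() - x.min())])
fit = minimize(normexp_negloglik, start, args=(x,),
               method="Nelder-Mead", options={"maxiter": 5000})
mu_hat, sigma_hat, alpha_hat = fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2])

# Background-corrected intensity: posterior mean of the signal given the observation,
# E[S | X = x] = m + sigma * phi(m/sigma) / Phi(m/sigma), with m = x - mu - sigma^2/alpha.
m = x - mu_hat - sigma_hat**2 / alpha_hat
corrected = m + sigma_hat * np.exp(norm.logpdf(m / sigma_hat) - norm.logcdf(m / sigma_hat))

print(f"mu={mu_hat:.1f}  sigma={sigma_hat:.1f}  alpha={alpha_hat:.1f}")
print("corrected intensities all positive:", bool((corrected > 0).all()))
```

Working with the log of the normal CDF (rather than the CDF itself) is one way to avoid the kind of underflow that can otherwise make likelihood evaluation fail on unusual data sets.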