The objective of this paper is to present a method that can accommodate certain types of missing data by using the quasi-likelihood function for the complete data. This method can be useful when we can make first- and second-moment assumptions only; in addition, it can be helpful when the EM algorithm applied to the actual likelihood becomes overly complicated. First, we derive a loss function for the observed data using an exponential family density that has the same mean and variance structure as the complete data. This loss function is the counterpart of the quasi-deviance for the observed data. Then the loss function is minimized using the EM algorithm. The use of the EM algorithm guarantees a decrease in the loss function at every iteration. When the observed data can be expressed as a deterministic linear transformation of the complete data, or when data are missing completely at random, the proposed method yields consistent estimators. Examples are given for overdispersed polytomous data, linear random effects models, and linear regression with missing covariates. Simulation results for the linear regression model with missing covariates show that the proposed estimates are more efficient than estimates based on completely observed units, even when outcomes are bimodal or skewed.
When using gamma-ray coded-mask cameras, one does not get a direct image as in classical optical cameras but rather the correlation of the mask response with the source. Therefore, the data must be processed mathematically in order to reconstruct the original sky sources. Generally, this reconstruction is based on linear methods, such as correlating the detector plane with a reconstruction array, or on non-linear ones, such as iterative or maximization methods (e.g., the EM algorithm). The latter perform better, but they increase the computational cost and take a long time to reconstruct an image. Here we present a method for speeding up this kind of algorithm by making use of a neural network with a back-propagation learning rule.
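To make the iterative EM reconstruction concrete, the following is a minimal sketch of the classical EM update for Poisson counts (often called MLEM or Richardson-Lucy). It is not the method of the abstract above, which uses a neural network to speed up such algorithms. The 1-D cyclic mask, the response matrix `A`, the toy sky `s_true`, and the function names are all assumptions made for illustration.

```python
import math

def mlem_step(A, d, s):
    """One EM (MLEM) update for counts d[i] ~ Poisson(sum_j A[i][j] * s[j])."""
    n_det, n_sky = len(A), len(A[0])
    # Forward projection: expected counts in each detector pixel.
    pred = [sum(A[i][j] * s[j] for j in range(n_sky)) for i in range(n_det)]
    # Ratio of measured to predicted counts (0 where nothing is predicted).
    ratio = [d[i] / pred[i] if pred[i] > 0 else 0.0 for i in range(n_det)]
    # Back-project the ratio and normalize by each sky pixel's sensitivity.
    new_s = []
    for j in range(n_sky):
        sens = sum(A[i][j] for i in range(n_det))
        back = sum(A[i][j] * ratio[i] for i in range(n_det))
        new_s.append(s[j] * back / sens)
    return new_s

def poisson_loglik(A, d, s):
    """Poisson log-likelihood up to a constant that does not depend on s."""
    total = 0.0
    for i, row in enumerate(A):
        pred = sum(a * sj for a, sj in zip(row, s))
        total -= pred
        if d[i] > 0:
            total += d[i] * math.log(pred)
    return total

# Hypothetical toy setup: cyclic 1-D mask pattern, noise-free detector counts.
mask = [1, 1, 0, 1, 0, 0, 0]
n = len(mask)
A = [[float(mask[(i + j) % n]) for j in range(n)] for i in range(n)]
s_true = [0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0]
d = [sum(A[i][j] * s_true[j] for j in range(n)) for i in range(n)]

s = [1.0] * n  # flat, strictly positive starting image
for _ in range(200):
    s = mlem_step(A, d, s)
```

Each update costs one forward projection and one back-projection, and many iterations are typically needed, which is the computational burden the abstract refers to.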
The Expectation-Maximization (EM) algorithm is widely used in latent variable model inference. However, when data are distributed across various locations, directly applying the EM algorithm can often be impractical due to communication expenses and privacy considerations. To address these challenges, a communication-efficient distributed EM algorithm is proposed. Under mild conditions, the proposed estimator achieves the same mean squared error bound as the centralized estimator. Furthermore, the proposed method requires only one extra round of communication compared to the Average estimator. Numerical simulations and a real data example demonstrate that the proposed estimator significantly outperforms the Average estimator in terms of mean squared errors.
Rubin and Thayer (Psychometrika, 47:69-76, 1982) proposed the EM algorithm for exploratory and confirmatory maximum likelihood factor analysis. In this paper, we prove the following fact: the EM algorithm always gives a proper solution, with positive unique variances and factor correlations whose absolute values do not exceed one, when the covariance matrix to be analyzed and the initial matrices, including unique variances and inter-factor correlations, are positive definite. We further demonstrate numerically that the EM algorithm yields proper solutions for data that lead the prevailing gradient algorithms for factor analysis to produce improper solutions. The numerical studies also show that, in real computations with limited numerical precision, Rubin and Thayer's (Psychometrika, 47:69-76, 1982) original formulas for confirmatory factor analysis can make factor correlation matrices asymmetric, so that the EM algorithm fails to converge. However, this problem can be overcome by using an EM algorithm in which the original formulas are replaced by those guaranteeing the symmetry of factor correlation matrices, or by the formulas used to prove the above fact.
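The positivity claim can be illustrated with a small sketch. It uses the standard textbook EM updates for an exploratory model with a single factor, which is a simplifying assumption and not necessarily the exact formulas of Rubin and Thayer. The covariance matrix `S`, the starting values, and the function names are hypothetical.

```python
import math

def fa1_em_step(S, lam, psi):
    """One EM step for the one-factor model Sigma = lam lam^T + diag(psi)."""
    p = len(lam)
    c = sum(l * l / ps for l, ps in zip(lam, psi))  # lam^T Psi^{-1} lam
    # beta = Sigma^{-1} lam, computed with the Sherman-Morrison identity.
    beta = [(l / ps) / (1.0 + c) for l, ps in zip(lam, psi)]
    Sb = [sum(S[i][j] * beta[j] for j in range(p)) for i in range(p)]
    # Expected squared factor score: (1 - beta.lam) + beta^T S beta.
    ezz = 1.0 / (1.0 + c) + sum(b * s for b, s in zip(beta, Sb))
    lam_new = [s / ezz for s in Sb]
    psi_new = [S[i][i] - lam_new[i] * Sb[i] for i in range(p)]
    return lam_new, psi_new

def fa1_discrepancy(S, lam, psi):
    """log det(Sigma) + tr(Sigma^{-1} S); EM should never increase it."""
    p = len(lam)
    c = sum(l * l / ps for l, ps in zip(lam, psi))
    u = [l / ps for l, ps in zip(lam, psi)]
    logdet = sum(math.log(ps) for ps in psi) + math.log(1.0 + c)
    uSu = sum(u[i] * S[i][j] * u[j] for i in range(p) for j in range(p))
    trace = sum(S[i][i] / psi[i] for i in range(p)) - uSu / (1.0 + c)
    return logdet + trace

# Hypothetical positive definite (diagonally dominant) covariance matrix.
S = [[1.0, 0.3, 0.3, 0.3],
     [0.3, 1.0, 0.2, 0.2],
     [0.3, 0.2, 1.0, 0.1],
     [0.3, 0.2, 0.1, 1.0]]
lam, psi = [0.5] * 4, [0.5] * 4  # positive starting unique variances
for _ in range(200):
    lam, psi = fa1_em_step(S, lam, psi)
```

In this sketch the unique variances in `psi` stay positive at every step without any explicit constraint, which is the behavior the abstract proves in general for positive definite inputs.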
Iterative reweighting (IR) is a popular method for computing M-estimates of location and scatter in multivariate robust estimation. When the objective function comes from a scale mixture of normal distributions, the iterative reweighting algorithm can be identified as an EM algorithm. The purpose of this paper is to show that in the special case of the multivariate t-distribution, substantial improvements to the convergence rate can be obtained by modifying the EM algorithm.
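The link between iterative reweighting and EM can be sketched as follows for the t-distribution with known degrees of freedom. This shows only the standard, unmodified EM step. The modification that the abstract refers to is not described there and is not reproduced here. The restriction to two dimensions, the toy data `X`, the choice `nu=3.0`, and the function name are assumptions for illustration.

```python
def t_em_step(X, mu, S, nu):
    """One standard EM step for a bivariate t-distribution with known nu."""
    p, n = 2, len(X)
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    # E-step: each weight shrinks as the Mahalanobis distance grows.
    w = []
    for x in X:
        r0, r1 = x[0] - mu[0], x[1] - mu[1]
        d2 = (r0 * (inv[0][0] * r0 + inv[0][1] * r1)
              + r1 * (inv[1][0] * r0 + inv[1][1] * r1))
        w.append((nu + p) / (nu + d2))
    # M-step: weighted mean, then weighted scatter divided by n.
    sw = sum(w)
    mu_new = [sum(wi * x[k] for wi, x in zip(w, X)) / sw for k in range(p)]
    S_new = [[sum(wi * (x[a] - mu_new[a]) * (x[b] - mu_new[b])
                  for wi, x in zip(w, X)) / n
              for b in range(p)] for a in range(p)]
    return mu_new, S_new, w

# Hypothetical data: seven points near the origin plus one gross outlier.
X = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0),
     (0.0, 0.0), (0.5, -0.5), (-0.5, 0.5), (50.0, 50.0)]
n = len(X)
mu = [sum(x[k] for x in X) / n for k in range(2)]  # start at the sample mean
S = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    mu, S, w = t_em_step(X, mu, S, nu=3.0)
```

The E-step weights are exactly the reweighting factors of an M-estimator: the outlier receives the smallest weight, so the location estimate settles near the bulk of the data instead of the sample mean.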
The established general results on convergence properties of the EM algorithm require the sequence of EM parameter estimates to fall in the interior of the parameter space over which the likelihood is being maximized. This paper presents convergence properties of the EM sequence of likelihood values and parameter estimates in constrained parameter spaces for which the sequence of EM parameter estimates may converge to the boundary of the constrained parameter space contained in the interior of the unconstrained parameter space. Examples of the behavior of the EM algorithm applied to such parameter spaces are presented.
In epidemics of infectious diseases such as influenza, an individual may have one of four possible final states: prior immune, escaped from infection, infected with symptoms, and infected asymptomatically. The exact state is often not observed. In addition, the unobserved transmission times of asymptomatic infections further complicate analysis. Under the assumption of missing at random, data-augmentation techniques can be used to integrate out such uncertainties. We adapt an importance-sampling-based Monte Carlo Expectation-Maximization (MCEM) algorithm to the setting of an infectious disease transmitted in close contact groups. Assuming independence between close contact groups, we propose a hybrid EM-MCEM algorithm that applies the MCEM or the traditional EM algorithm to each close contact group depending on the dimension of missing data in that group, and we discuss variance estimation for this practice. In addition, we propose a bootstrap approach to assess the total Monte Carlo error and factor that error into the variance estimation. The proposed methods are evaluated using simulation studies. We use the hybrid EM-MCEM algorithm to analyze two influenza epidemics in the late 1970s to assess the effects of age and preseason antibody levels on the transmissibility and pathogenicity of the viruses.
This study investigated the performance of multiple imputation with the Expectation-Maximization (EM) algorithm and the Markov chain Monte Carlo (MCMC) method in missing data imputation. We compared the accuracy of imputation based on some real data, set up two extreme scenarios, and conducted both empirical and simulation studies to examine the effects of missing data rates and the number of items used for imputation. In the empirical study, the scenario represented the item with the highest missing rate from the domain with the fewest items. In the simulation study, we selected the domain with the most items, and the imputed item had the lowest missing rate. In the empirical study, the results showed no significant difference between the EM algorithm and the MCMC method for item imputation, and the number of items used for imputation had little impact. Compared with the actual observed values, the middle responses of 3 and 4 were over-imputed, and the extreme responses of 1, 2 and 5 were under-represented. Similar patterns occurred for domain imputation: there was no significant difference between the EM algorithm and the MCMC method, and the number of items used for imputation had little impact. In the simulation study, we chose the environmental domain to examine the effect of the following variables: imputation method (EM algorithm or MCMC method), missing data rates, and number of items used for imputation. Again, there was no significant difference between the EM algorithm and the MCMC method. The accuracy rates did not decrease significantly as the proportion of missing data increased. The number of items used for imputation made some contribution to the accuracy of imputation, but not as much as expected.
Background: Pooling is a cost-effective way to collect data for genetic association studies, particularly for rare genetic variants. It is of interest to estimate the haplotype frequencies, which contain more information than single locus statistics. By viewing the pooled genotype data as incomplete data, the expectation-maximization (EM) algorithm is the natural algorithm to use, but it is computationally intensive. A recent proposal to reduce the computational burden is to make use of database information to form a list of frequently occurring haplotypes, and to restrict the haplotypes to come from this list only in implementing the EM algorithm. There is, however, the danger of using an incorrect list, and there may not be enough database information to form a list externally in some applications. Results: We investigate the possibility of creating an internal list from the data at hand. One way to form such a list is to collapse the observed total minor allele frequencies to "zero" or "at least one", which is shown to have the desirable effect of amplifying the haplotype frequencies. To improve coverage, we propose ways to add and remove haplotypes from the list, and a benchmarking method to determine the frequency threshold for removing haplotypes. Simulation results show that the EM estimates based on a suitably augmented and trimmed collapsed data list (ATCDL) perform satisfactorily. In two scenarios involving 25 and 32 loci respectively, the EM-ATCDL estimates outperform the EM estimates based on other lists as well as the collapsed data maximum likelihood estimates. Conclusions: The proposed augmented and trimmed CD list is a useful list on which to base the EM algorithm when estimating the haplotype distributions of rare variants. It can handle more markers and larger pool sizes than existing methods, and the resulting EM-ATCDL estimates are more efficient than the EM estimates based on other lists.
Beyond the expectation-maximization (EM) algorithm for vector parameters, the EM for an unknown distribution function is often used in mixture models, density estimation, and signal recovery problems. We prove the convergence of the EM in functional spaces and show that the EM likelihoods in this space converge to the global maximum. © 2014 Elsevier B.V. All rights reserved.
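A finite-dimensional sketch may help picture the EM for an unknown distribution function. Here the unknown mixing distribution is restricted to weights on a fixed grid of support points with a unit-variance normal kernel. Both are simplifying assumptions and not the functional-space setting of the abstract. The data `xs`, the grid, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def em_mixing_weights(xs, grid, n_iter=200):
    """EM for the weights of a mixing distribution on fixed grid points."""
    L = [[normal_pdf(x, t) for t in grid] for x in xs]  # kernel matrix
    w = [1.0 / len(grid)] * len(grid)                   # uniform start
    history = []                                        # log-likelihood trace
    for _ in range(n_iter):
        # Mixture density at every observation under the current weights.
        mix = [sum(wj * lij for wj, lij in zip(w, row)) for row in L]
        history.append(sum(math.log(m) for m in mix))
        # Multiplicative update: new w_j is the mean posterior mass of point j.
        w = [wj * sum(row[j] / m for row, m in zip(L, mix)) / len(xs)
             for j, wj in enumerate(w)]
    return w, history

# Hypothetical data clustered near -2 and +2.
xs = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0, 2.1]
grid = [-3.0 + 0.5 * k for k in range(13)]
weights, loglik_trace = em_mixing_weights(xs, grid)
```

On a fixed grid the log-likelihood is concave in the weights, so this simplified version has no spurious local maxima. The abstract's result concerns the harder case where the distribution function itself is the unknown.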