The study of compositional microbiome data is critical for exploring the functional roles of microbial communities in human health and disease. Recent advances have shifted from traditional log-ratio transformations of compositional covariates to imposing a zero-sum constraint on the corresponding coefficients. Various approaches, including penalized regression and Markov chain Monte Carlo (MCMC) algorithms, have been extended to enforce this sum-to-zero constraint. However, these methods have limitations. Penalized regression yields only point estimates, which limits uncertainty assessment, while MCMC methods, although reliable, are computationally intensive, particularly in high-dimensional settings. To address these challenges, we propose Bayesian generalized linear models (GLMs) for analyzing compositional and sub-compositional microbiome data. Our model places a spike-and-slab double-exponential prior on the microbiome coefficients. This prior induces weak shrinkage on large coefficients and strong shrinkage on irrelevant ones, which makes it well suited to high-dimensional microbiome data. The sum-to-zero constraint is handled through soft centering, by placing a prior distribution on the sum of the compositional or sub-compositional coefficients. To reduce the computational burden, we have developed a fast and stable algorithm that incorporates expectation-maximization (EM) steps into the routine iteratively weighted least squares (IWLS) algorithm for fitting GLMs. The performance of the proposed method was assessed in extensive simulation studies. These show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to identify microorganisms linked to inflammatory bowel disease (IBD). The methods have been implemented in the freely available R package BhGLM.
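The following minimal sketch shows the soft-centering idea in the simplest Gaussian setting. It is not BhGLM's implementation. It assumes a linear model with known noise variance σ², a flat prior on the individual coefficients, and a N(0, s²) prior on their sum. It also omits the spike-and-slab prior and the EM-IWLS loop. Under these assumptions, the posterior mode solves (ZᵀZ + (σ²/s²)·11ᵀ)β = Zᵀy, so a small s pulls the coefficient sum toward zero without imposing it exactly. The helper names `soft_sum_to_zero_fit` and `log_composition` and the toy data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def soft_sum_to_zero_fit(z, y, prior_sd, noise_var=1.0):
    """Posterior mode of beta for y ~ N(z @ beta, noise_var), with a
    N(0, prior_sd**2) prior on sum(beta) and a flat prior otherwise.

    Normal equations: (Z'Z + (noise_var / prior_sd**2) * 1 1') beta = Z'y.
    """
    p = len(z[0])
    w = noise_var / prior_sd ** 2  # precision of the soft constraint
    a = [[sum(row[j] * row[k] for row in z) + w for k in range(p)]
         for j in range(p)]
    b = [sum(row[j] * yi for row, yi in zip(z, y)) for j in range(p)]
    return solve(a, b)


def log_composition(raw):
    """Close raw abundances to proportions, then take logs."""
    total = sum(raw)
    return [math.log(v / total) for v in raw]


# Toy data: log relative abundances of 3 taxa in 50 samples (hypothetical).
rng = random.Random(1)
beta_true = [1.0, -0.5, -0.5]  # coefficients that sum to zero
z = [log_composition([rng.uniform(1, 10) for _ in range(3)]) for _ in range(50)]
y_exact = [sum(b * v for b, v in zip(beta_true, row)) for row in z]
y_noisy = [v + rng.gauss(0, 0.05) for v in y_exact]

for s in (1e3, 1e-3):
    beta_hat = soft_sum_to_zero_fit(z, y_noisy, prior_sd=s)
    print(s, [round(b, 3) for b in beta_hat], round(sum(beta_hat), 6))
```

By the Sherman-Morrison formula, the fitted sum equals the unconstrained one divided by 1 + (σ²/s²)·1ᵀ(ZᵀZ)⁻¹1. It therefore shrinks monotonically as s decreases. Because each IWLS iteration solves a weighted least-squares problem, a term of the same form could in principle be added to each step's normal equations.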
Mixture models are frequently used for modeling complex data. An extension of the EM algorithm, here called ECME, is proposed to compute the maximum likelihood estimates of the parameters of the symmetric α-stable mixture model (SαSMM). Comprehensive simulation studies are performed to evaluate the performance of the proposed ECME algorithm. The robustness of the SαSMM is investigated through simulations in which it is used to model data generated from mixtures of exponential power and t distributions. Both the proposed ECME algorithm and a Bayesian approach are applied to three real data sets, and the ECME algorithm outperforms the Bayesian paradigm on all three. The SαSMM is also compared with mixtures of normal, skew-normal, t, and skew-t distributions for modeling four real data sets. It performs as well as or better than these models, which indicates the capability of the SαSMM for robust mixture modeling.
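Symmetric α-stable densities have no closed form except in the Cauchy (α = 1) and normal (α = 2) cases, which is part of what makes fitting the SαSMM nontrivial. Simulating from the model is straightforward, however. The sketch below is not the ECME algorithm. It only shows one way data from an SαS mixture might be generated for a simulation study, using the Chambers-Mallows-Stuck method. The function names are hypothetical, and so is the parameterization, under which α = 2 with unit scale gives N(0, 2).

```python
import math
import random


def sample_sas(alpha, scale=1.0, loc=0.0, rng=random):
    """One symmetric alpha-stable draw via the Chambers-Mallows-Stuck method.

    Skewness is zero; alpha=1 gives a Cauchy variate and alpha=2 gives
    N(loc, 2 * scale**2) under this parameterization.
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit exponential
    if alpha == 1:
        x = math.tan(v)
    else:
        x = (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
             * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))
    return loc + scale * x


def sample_sas_mixture(n, weights, alphas, scales, locs, rng):
    """Draw n points from a finite mixture of SaS components."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    return [sample_sas(alphas[j], scales[j], locs[j], rng) for j in labels]


rng = random.Random(0)
data = sample_sas_mixture(1000, [0.6, 0.4], [1.5, 1.8], [1.0, 0.5], [-3.0, 4.0], rng)
```
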
Rank-based correlation is widely used to measure dependence between variables when their marginal distributions are skewed. Estimation of such correlation is challenged both by the presence of missing data and by the need to adjust for confounding factors. In this paper, we consider a unified framework of Gaussian copula regression that enables us to estimate either Pearson correlation or rank-based correlation (e.g., Kendall's tau or Spearman's rho), depending on the types of marginal distributions. To adjust for confounding covariates, we use marginal regression models with univariate location-scale family distributions. We develop an EM algorithm for estimating both the correlation and the regression parameters in the presence of missing values. For implementation, we propose an effective peeling procedure to carry out the iterations required by the EM algorithm. Through simulation studies, we compare the performance of the EM algorithm with that of the traditional multiple imputation approach. For structured correlations, such as exchangeable or first-order autoregressive (AR-1) correlation, the EM algorithm outperforms the multiple imputation approach in terms of both estimation bias and efficiency. (C) 2016 Elsevier B.V. All rights reserved.
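Under a bivariate Gaussian copula with latent correlation ρ, the rank-based measures have closed-form links to ρ: Kendall's τ = (2/π) arcsin ρ and Spearman's ρ_S = (6/π) arcsin(ρ/2). These identities are one reason a single Gaussian copula model can be summarized by either Pearson or rank-based correlation. The sketch below encodes them. It then checks the Kendall link on simulated data with skewed (exponentiated) margins. The function names and simulation settings are hypothetical.

```python
import math
import random


def kendall_tau_from_rho(rho):
    return 2 / math.pi * math.asin(rho)


def spearman_rho_from_rho(rho):
    return 6 / math.pi * math.asin(rho / 2)


def rho_from_kendall_tau(tau):
    return math.sin(math.pi * tau / 2)


def rho_from_spearman_rho(rho_s):
    return 2 * math.sin(math.pi * rho_s / 6)


def sample_kendall_tau(x, y):
    """Kendall's tau-a; O(n^2), adequate for small samples without ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = (x[i] - x[j]) * (y[i] - y[j])
            s += (d > 0) - (d < 0)  # +1 concordant, -1 discordant
    return 2 * s / (n * (n - 1))


# Gaussian copula with rho = 0.6 and log-normal (skewed) margins.
rng = random.Random(7)
rho = 0.6
x, y = [], []
for _ in range(1000):
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x.append(math.exp(g1))
    y.append(math.exp(rho * g1 + math.sqrt(1 - rho ** 2) * g2))

tau_hat = sample_kendall_tau(x, y)
print(tau_hat, kendall_tau_from_rho(rho), rho_from_kendall_tau(tau_hat))
```

Because ranks are invariant to monotone transformations of the margins, the skewed margins leave Kendall's τ unchanged.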
A new expectation-maximization (EM) algorithm is proposed to estimate the parameters of the truncated multinormal distribution with linear restrictions on the variables. The simulation results show that, compared with generalized method of moments (GMM) estimation and maximum likelihood estimation (MLE) for the truncated multivariate normal distribution, the EM algorithm offers fast computation and high accuracy. Using real data from the National College Entrance Examination (NCEE), we estimate the score distribution of the 2003 Anhui examinees who were admitted to the University of Science and Technology of China (USTC). Based on this analysis, we also give the proportion truncated by USTC's 2003 NCEE admission line in Anhui.
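EM-type algorithms for truncated normal models typically need the moments of a normal distribution restricted to a region, and in one dimension these have closed forms. They also connect to linear restrictions. If X ~ N(μ, Σ) is truncated to cᵀX ≥ d, then cᵀX is a univariate N(cᵀμ, cᵀΣc) variable truncated at d. The sketch below computes the univariate truncated mean and variance. It is a building block only, not the algorithm proposed in the paper, and the name `truncated_normal_moments` is hypothetical.

```python
import math


def _phi(x):
    """Standard normal density, with phi(+/-inf) = 0."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def _Phi(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def truncated_normal_moments(mu, sigma, a=-math.inf, b=math.inf):
    """Mean and variance of N(mu, sigma**2) truncated to the interval [a, b]."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    z = _Phi(beta) - _Phi(alpha)
    pa, pb = _phi(alpha), _phi(beta)
    # x * phi(x) -> 0 as |x| -> inf, so infinite bounds contribute nothing
    apa = 0.0 if math.isinf(alpha) else alpha * pa
    bpb = 0.0 if math.isinf(beta) else beta * pb
    ratio = (pa - pb) / z
    mean = mu + sigma * ratio
    var = sigma ** 2 * (1 + (apa - bpb) / z - ratio ** 2)
    return mean, var


# Scalar projection c'X of a bivariate normal under the restriction c'X >= d
mu, cov, c, d = [1.0, 0.5], [[1.0, 0.3], [0.3, 2.0]], [1.0, 1.0], 2.0
m = sum(ci * mi for ci, mi in zip(c, mu))
s = math.sqrt(sum(c[i] * cov[i][j] * c[j] for i in range(2) for j in range(2)))
print(truncated_normal_moments(m, s, a=d))
```
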
The EM algorithm is a powerful technique for determining maximum likelihood estimates (MLEs) from binary data, since the maximum likelihood estimators of the parameters cannot be expressed in closed form. In this paper, we consider one-shot devices, which can be used only once and are destroyed after use. What is actually observed is therefore the condition of each device under test rather than its actual lifetime. Here, we develop the EM algorithm for such data under the exponential distribution for the lifetimes. Owing to advances in manufacturing design and technology, products have become highly reliable with long lifetimes. For this reason, accelerated life tests are performed to collect useful information on the parameters of the lifetime distribution. For such a test, a Bayesian approach with a normal prior was recently proposed by Fan et al. (2009). Through a simulation study, we show that the EM algorithm and this Bayesian approach are both useful techniques for analyzing binary data arising from one-shot device testing. We then compare their performance and show that the Bayesian approach works well for highly reliable products, whereas the EM algorithm performs well in moderate- and low-reliability situations. (C) 2011 Elsevier B.V. All rights reserved.
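Here is a minimal sketch of the EM idea for one-shot device data. It makes simplifying assumptions that are not the paper's full accelerated-life-test model: a single stress level, a single inspection time τ, and exponential lifetimes with rate λ. The unobserved lifetimes are the missing data. The E-step replaces each lifetime by its conditional expectation given the observed status:

- E[T | T ≤ τ] = 1/λ - τe^(-λτ)/(1 - e^(-λτ)) for failed units.
- E[T | T > τ] = τ + 1/λ for working ones.

The M-step is the complete-data exponential MLE, λ = n / ΣE[T]. In this special case the MLE also has the closed form λ̂ = -log(1 - k/n)/τ, which makes the sketch easy to check. The function name is hypothetical.

```python
import math


def em_one_shot_exponential(tau, n, k, lam=1.0, tol=1e-12, max_iter=10_000):
    """EM estimate of an exponential failure rate from one-shot device data.

    n devices are inspected once at time tau; k were found failed
    (lifetime <= tau) and n - k still worked (lifetime > tau).
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n for a finite, positive MLE")
    for _ in range(max_iter):
        surv = math.exp(-lam * tau)
        # E-step: conditional expected lifetimes given each observed status
        e_failed = 1 / lam - tau * surv / (1 - surv)  # E[T | T <= tau]
        e_working = tau + 1 / lam                     # E[T | T > tau]
        # M-step: complete-data MLE of an exponential rate, n / sum of lifetimes
        new_lam = n / (k * e_failed + (n - k) * e_working)
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


lam_em = em_one_shot_exponential(tau=2.0, n=100, k=40)
lam_closed = -math.log(1 - 40 / 100) / 2.0
print(lam_em, lam_closed)
```
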
The two-parameter Burr XII distribution has been widely used in practical applications such as business, chemical engineering, quality control, medical research, and reliability engineering. In this paper, we present maximum likelihood estimation (MLE) via the expectation-maximization (EM) algorithm to estimate the Burr XII parameters from multiply censored data. We also provide a method for constructing confidence intervals for the parameters. It computes the asymptotic variances and covariance of the MLEs from the complete and missing information matrices. A simulation study is conducted to compare the performance of MLE via the EM algorithm with that of the Newton-Raphson (NR) algorithm. The simulation results show that the EM algorithm outperforms the NR algorithm in most cases in terms of bias and root mean square error. A numerical example is also used to demonstrate the performance of the proposed method. Copyright (C) 2009 John Wiley & Sons, Ltd.
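One way to see why EM is natural here: if X follows a Burr XII distribution with survival function S(x) = (1 + x^c)^(-k), then Y = log(1 + X^c) is exponential with rate k. By memorylessness, a right-censored observation X > x therefore has E[Y | Y > y] = y + 1/k. The sketch below uses this fact to run EM for k alone, holding c fixed, which simplifies the two-parameter problem treated in the paper. With c fixed, the MLE is k̂ = r / Σ log(1 + x_i^c), where r is the number of observed failures, so the EM result can be checked directly. The data and function name are hypothetical.

```python
import math


def burr_xii_em_k(times, failed, c, k=1.0, tol=1e-12, max_iter=10_000):
    """EM estimate of the Burr XII shape k, with the other shape c held fixed.

    times:  observed times; failed[i] is True for a failure, False if
            right-censored (the true lifetime exceeds times[i]).
    """
    if not any(failed):
        raise ValueError("need at least one observed failure")
    y = [math.log1p(t ** c) for t in times]  # Y = log(1 + X**c) ~ Exp(k)
    n = len(y)
    for _ in range(max_iter):
        # E-step: censored units contribute E[Y | Y > y] = y + 1/k
        expected = sum(yi if d else yi + 1 / k for yi, d in zip(y, failed))
        # M-step: exponential-rate MLE from the completed data
        new_k = n / expected
        if abs(new_k - k) < tol:
            return new_k
        k = new_k
    return k


times = [0.5, 1.2, 0.8, 2.0, 1.5, 0.3, 2.5, 1.1]
failed = [True, True, False, True, False, True, False, True]
c = 2.0
k_em = burr_xii_em_k(times, failed, c)
k_closed = sum(failed) / sum(math.log1p(t ** c) for t in times)
print(k_em, k_closed)
```
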
The two-part model and Heckman's sample selection model are often used in economic studies of demand that involve limited dependent variables. This study proposed a simultaneous equation model (SEM) and used the expectation-maximization (EM) algorithm to obtain its maximum likelihood estimates. We then conducted a simulation to compare estimates of price elasticity from the SEM with those from the two-part model and the sample selection model. The simulation shows that the SEM estimates of price elasticity are more precise than those from the sample selection model and the two-part model when the model includes limited independent variables. Finally, we analyzed a real example of cigarette consumption as an application. We found that an increase in cigarette price is associated with a decrease in both the propensity to consume cigarettes and the amount actually consumed.
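The finding that a price increase lowers both the propensity to consume and the amount consumed matches the usual decomposition of price elasticity in a two-part structure. Since E[y] = P(y > 0) · E[y | y > 0], the elasticity of E[y] is the sum of a participation elasticity and a conditional-quantity elasticity. The sketch below illustrates only this decomposition, not the SEM or its EM estimation. The logit participation model, log-linear conditional mean, parameter values, and function names are all hypothetical.

```python
import math


def participation_prob(log_price, a, b):
    """Hypothetical logit model for P(y > 0)."""
    return 1 / (1 + math.exp(-(a + b * log_price)))


def conditional_mean(log_price, c, d):
    """Hypothetical log-linear model for E[y | y > 0]; d is its price elasticity."""
    return math.exp(c + d * log_price)


def elasticity_parts(price, a, b, c, d, h=1e-6):
    """Return (participation, conditional, total) price elasticities of E[y].

    The total is computed numerically as d log E[y] / d log price so it can be
    compared with the sum of the two analytic parts.
    """
    lp = math.log(price)
    participation = b * (1 - participation_prob(lp, a, b))  # d log P / d log price
    conditional = d                                          # d log E[y|y>0] / d log price

    def log_expected(x):
        return math.log(participation_prob(x, a, b) * conditional_mean(x, c, d))

    total = (log_expected(lp + h) - log_expected(lp - h)) / (2 * h)
    return participation, conditional, total


parts = elasticity_parts(price=5.0, a=1.0, b=-0.8, c=2.0, d=-0.4)
print(parts)
```
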
This paper extends the work of Balakrishnan and Ling by introducing a competing risks model into one-shot device testing analysis under an accelerated life test (ALT) setting. An expectation-maximization (EM) algorithm is then developed for estimating the model parameters. An extensive Monte Carlo simulation study is carried out to assess the performance of the proposed method, and the EM algorithm is also compared with the Fisher scoring method. Finally, the proposed EM algorithm is applied to modified Class-B insulation data to illustrate the results developed here.
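To make the data structure concrete, consider a deliberately simplified version of the model. This sketch assumes a single stress level, a single inspection time τ, and two independent exponential failure causes with rates λ1 and λ2, which is not the paper's full ALT model. Each device is recorded as still working, failed from cause 1, or failed from cause 2:

- P(working) = e^(-(λ1+λ2)τ).
- P(failed from cause j) = (λj/(λ1+λ2))(1 - e^(-(λ1+λ2)τ)).

In this special case the likelihood is multinomial and its maximizer has a closed form, which is handy for checking an iterative estimator such as EM or Fisher scoring. With several stress levels linked through a regression model, such closed forms are typically unavailable and iterative methods are used. The function names are hypothetical.

```python
import math


def cell_probs(lam1, lam2, tau):
    """P(working), P(failed from cause 1), P(failed from cause 2) at time tau."""
    total = lam1 + lam2
    p_working = math.exp(-total * tau)
    p_failed = 1 - p_working
    return p_working, lam1 / total * p_failed, lam2 / total * p_failed


def log_likelihood(lam1, lam2, tau, n0, n1, n2):
    """Multinomial log-likelihood (up to a constant) for n0 working devices and
    n1, n2 devices failed from causes 1 and 2."""
    p0, p1, p2 = cell_probs(lam1, lam2, tau)
    return n0 * math.log(p0) + n1 * math.log(p1) + n2 * math.log(p2)


def closed_form_mle(tau, n0, n1, n2):
    """MLE of (lam1, lam2) for one stress level and one tau; needs n0, n1, n2 > 0."""
    n = n0 + n1 + n2
    total = -math.log(n0 / n) / tau  # matches P(working) to n0 / n
    return total * n1 / (n1 + n2), total * n2 / (n1 + n2)


tau, n0, n1, n2 = 1.5, 50, 30, 20
lam1_hat, lam2_hat = closed_form_mle(tau, n0, n1, n2)
print(lam1_hat, lam2_hat, log_likelihood(lam1_hat, lam2_hat, tau, n0, n1, n2))
```
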
This paper extends the work of Balakrishnan and Ling [1] by introducing a competing risks model into one-shot device testing analysis under an accelerated life test setting. An expectation-maximization (EM) algorithm is then developed for estimating the model parameters. An extensive Monte Carlo simulation study is carried out to assess the performance of the EM algorithm. Its results are then compared with the initial estimates obtained by the inequality constrained least squares (ICLS) method of estimation. Finally, we apply the EM algorithm to a clinical data set, ED01, to illustrate the method of inference developed here. (C) 2015 Elsevier Ltd. All rights reserved.
Grouped data are frequently used in many fields of study. In this work, we use the expectation-maximization (EM) algorithm to fit the skew-normal (SN) mixture model to grouped data. Implementing the EM algorithm requires computing one-dimensional integrals for each group or class. Our simulation study and real data analyses reveal that the EM algorithm always converges. They also show that it runs in just a few seconds even when the number of components is large, in contrast to the computationally expensive Bayesian paradigm. The accuracy of the EM algorithm and the superiority of the SN mixture model over the traditional normal mixture model in modelling grouped data are demonstrated through the simulation and three real data illustrations. To implement the EM algorithm, we use the package ForestFit, developed for the R environment and available at .
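For grouped data, the likelihood contribution of a class (l, u] under a mixture component is that component's probability of the interval, which is a one-dimensional integral of its density. The E-step then allocates each class to components in proportion to weight times interval probability. The sketch below computes skew-normal interval probabilities, with density (2/ω)φ((x-ξ)/ω)Φ(λ(x-ξ)/ω), by composite Simpson's rule and forms those class-level membership probabilities. The M-step is omitted. The sketch is written in Python for illustration and is not the ForestFit implementation. Its function names and example parameters are hypothetical.

```python
import math


def sn_pdf_std(z, shape):
    """Standard skew-normal density 2 * phi(z) * Phi(shape * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * math.erfc(-shape * z / math.sqrt(2))
    return 2 * phi * cdf


def sn_interval_prob(lo, hi, loc, scale, shape, n_sub=2000, cut=12.0):
    """P(lo < X <= hi) for X ~ SN(loc, scale, shape), by composite Simpson's rule.

    Limits are standardized and clipped to [-cut, cut], which drops a
    negligible tail mass; n_sub must be even.
    """
    a = max((lo - loc) / scale, -cut)
    b = min((hi - loc) / scale, cut)
    if b <= a:
        return 0.0
    h = (b - a) / n_sub
    total = sn_pdf_std(a, shape) + sn_pdf_std(b, shape)
    for i in range(1, n_sub):
        total += (4 if i % 2 else 2) * sn_pdf_std(a + i * h, shape)
    return total * h / 3


def class_memberships(edges, weights, components):
    """E-step for grouped data: row j gives P(component k | observation in class j).

    edges are the class boundaries; components are (loc, scale, shape) tuples.
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        joint = [w * sn_interval_prob(lo, hi, *comp)
                 for w, comp in zip(weights, components)]
        norm = sum(joint)
        if norm == 0:
            raise ValueError(f"class ({lo}, {hi}] has zero probability under the model")
        rows.append([v / norm for v in joint])
    return rows


edges = [-math.inf, -1.0, 0.0, 1.0, 2.0, math.inf]
weights = [0.3, 0.7]
components = [(-1.0, 1.0, 3.0), (1.5, 2.0, -2.0)]
print(class_memberships(edges, weights, components))
```

A useful check: for the standard SN with shape λ, the CDF at 0 equals 1/2 - arctan(λ)/π, so the integrator can be validated against a known value.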