The existing maximum likelihood theory and its computer software in structural equation modeling are established on the basis of linear relationships among latent variables with fully observed data. However, in the social and behavioral sciences, nonlinear relationships among the latent variables are important for establishing more meaningful models, and missing data are very common. In this article, an EM-type algorithm is developed for maximum likelihood estimation of a general nonlinear structural equation model with missing data that are missing at random under an ignorable mechanism. To avoid computing the complicated multiple integrals involved in the conditional expectations, the E-step is completed by a hybrid algorithm that combines the Gibbs sampler and the Metropolis-Hastings algorithm, while the M-step is completed efficiently by conditional maximization. Standard errors of the maximum likelihood estimates are obtained via Louis's formula. The methodology is illustrated with results obtained from a simulation study and a real data set with rather complicated missing patterns and a large number of missing entries.
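The division of labor described above (a sampling-based E-step followed by a maximization step) can be illustrated on a much simpler model. The following is a minimal sketch, not the article's algorithm: it assumes a toy model with latent z_i ~ N(theta, 1) and observed y_i | z_i ~ N(z_i, 1), uses a plain random-walk Metropolis-Hastings sampler instead of the Gibbs/Metropolis-Hastings hybrid, and has a closed-form M-step rather than conditional maximization. All function names and tuning values are hypothetical.

```python
import math
import random


def log_joint(z, y, theta):
    # log p(y, z | theta) up to a constant, for the assumed toy model:
    # z ~ N(theta, 1) and y | z ~ N(z, 1).
    return -0.5 * (z - theta) ** 2 - 0.5 * (y - z) ** 2


def mh_draws(y, theta, n_draws, burn_in, step, rng):
    # Random-walk Metropolis-Hastings targeting p(z | y, theta).
    z = y  # start the chain at the observation
    draws = []
    for t in range(burn_in + n_draws):
        proposal = z + rng.gauss(0.0, step)
        log_ratio = log_joint(proposal, y, theta) - log_joint(z, y, theta)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            z = proposal
        if t >= burn_in:
            draws.append(z)
    return draws


def mcem(ys, theta0=0.0, n_iter=30, n_draws=200, burn_in=50, step=1.0, seed=0):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_iter):
        # Monte Carlo E-step: approximate E[z_i | y_i, theta] by a chain average.
        z_means = []
        for y in ys:
            draws = mh_draws(y, theta, n_draws, burn_in, step, rng)
            z_means.append(sum(draws) / len(draws))
        # M-step: for this toy model the maximizer is the mean of those averages.
        theta = sum(z_means) / len(z_means)
    return theta
```

In this toy model the maximum likelihood estimate of theta is the sample mean of the y_i, so the output of `mcem` can be checked against it; the two agree only up to Monte Carlo error, which shrinks as `n_draws` grows.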
Interval-censored failure time data occur in many areas, and their analysis has recently attracted a great deal of attention. However, most existing methods for such data can handle only time-independent covariates. In practice, covariates may be time-dependent and may also be subject to measurement error. In this situation, one approach is a joint analysis, for which many methods have been developed in the literature under various frameworks. One drawback of these methods is that they usually assume that there are no further measurements on the covariates after the failure time, which may not be true. In this paper, a new joint analysis approach is proposed that can take the extra observations into account. In particular, for estimation, a Monte Carlo EM (MCEM) algorithm is developed that is much more stable and converges much faster than the existing algorithms. An extensive simulation study conducted to assess the finite sample performance of the proposed method suggests that it works well in practical situations. The method is also applied to the AIDS study that motivated this investigation. (C) 2019 Elsevier B.V. All rights reserved.
Background: Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. The accuracy and lengths of their reads, however, are yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliable and accurate detection of the order of nucleotides in short DNA fragments from the acquired data. Results: In this paper, we consider Illumina's sequencing-by-synthesis platform, which relies on reversible terminator chemistry, and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina's Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best performing unsupervised method currently available, while achieving the same accuracy. Conclusions: The proposed ParticleCall provides more accurate calls than Illumina's base calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, rendering it more feasible for high-throughput sequencing data analysis. Improvement of base calling accuracy will have immediate beneficial effects on the performance of downstream applications such as SNP and genotype calling. ParticleCall is freely available at https://***/projects/particlecall.
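The combination of a Hidden Markov Model with sequential Monte Carlo can be sketched with a generic bootstrap particle filter. This is an illustrative toy, not ParticleCall: it assumes a discrete hidden state with a known transition matrix and Gaussian emissions, whereas the signal model in the paper is more involved. The function name, parameters, and the small weight floor are all assumptions made for this example.

```python
import math
import random


def particle_filter(obs, trans, means, sigma, n_particles=500, seed=0):
    """Bootstrap particle filter for a discrete-state HMM with Gaussian emissions.

    Returns, for each observation, the state with the largest estimated
    filtering probability.
    """
    rng = random.Random(seed)
    n_states = len(means)
    states = range(n_states)
    particles = [rng.randrange(n_states) for _ in range(n_particles)]
    calls = []
    for y in obs:
        # Propagate each particle through the transition matrix.
        particles = [rng.choices(states, weights=trans[s])[0] for s in particles]
        # Weight by the emission likelihood; the tiny floor avoids all-zero weights.
        weights = [math.exp(-0.5 * ((y - means[s]) / sigma) ** 2) + 1e-300
                   for s in particles]
        # Estimate the filtering distribution from the weighted particles.
        mass = [0.0] * n_states
        for s, w in zip(particles, weights):
            mass[s] += w
        calls.append(max(states, key=lambda s: mass[s]))
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return calls
```

With well-separated emission means the filter should recover most hidden states; when the emissions overlap, the per-step calls become uncertain and the estimated state probabilities are more informative than the hard calls.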
In this paper, an improved Monte Carlo EM (MCEM) acceleration algorithm is proposed to address the slow convergence and difficult integral calculations of the traditional MCEM algorithm when dealing with complex models. The article first reviews the basic concepts and application background of the EM and MCEM algorithms, and then points out the limitations of MCEM algorithms with high-dimensional data and complex models. To overcome these challenges, the authors introduce a new algorithm that approximates the integrals in the Newton-Raphson (N-R) step by Monte Carlo simulation, which improves the convergence speed of the algorithm while maintaining the quadratic convergence property. Specifically, the improved MCEM acceleration algorithm consists of the following key steps: random sampling in the E1 step, computation of the expectation in the E2 step, maximization of the objective function in the M step, and approximation of the integral using the Monte Carlo method in the improved N-R step. Through numerical examples, the authors demonstrate the advantages of the improved algorithm over the original MCEM algorithm and the MCEM accelerated algorithm in terms of accuracy of parameter estimation and speed of convergence. In addition, the article discusses the effect of the number of random draws on the performance of the algorithm and provides a method for choosing an appropriate number. In conclusion, the improved MCEM acceleration algorithm improves computational efficiency and the accuracy of the results when dealing with complex data models, which has practical value for statistical analysis in the era of big data.
Spatial models have been widely used in public health settings. In the case of continuous outcomes, the traditional approaches to modeling spatial data are based on the Gaussian distribution. This assumption might be overly restrictive: real data can be highly non-Gaussian and may show features such as heavy tails and/or skewness. In spatial data modeling, it is also commonly assumed that the covariates are observed without error, but for various reasons, such as the measurement techniques or instruments used, uncertainty is inherent in spatial (especially geostatistical) data, so these data are susceptible to measurement errors in the covariates of interest. In this paper, we introduce a general class of spatial models with covariate measurement error that can account for heavy tails, skewness, and uncertainty of the covariates. A likelihood method, which leads to the maximum likelihood estimation approach, is used for inference through the Monte Carlo expectation-maximization algorithm. The predictive distribution at nonsampled sites is approximated using a Markov chain Monte Carlo algorithm. The proposed approach is evaluated through a simulation study and a real application (a particulate matter data set).
It is often assumed that events cannot occur simultaneously when modelling data with point processes. This raises a problem as real-world data often contains synchronous observations due to aggregation or rounding, resulting from limitations on recording capabilities and the expense of storing high volumes of precise data. In order to gain a better understanding of the relationships between processes, we consider modelling the aggregated event data using multivariate Hawkes processes, which offer a description of mutually-exciting behaviour and have found wide applications in areas including seismology and finance. Here we generalise existing methodology on parameter estimation of univariate aggregated Hawkes processes to the multivariate case using a Monte Carlo expectation-maximization (MC-EM) algorithm and through a simulation study illustrate that alternative approaches to this problem can be severely biased, with the multivariate MC-EM method outperforming them in terms of MSE in all considered cases.
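To make the two ingredients above concrete, the sketch below computes the conditional intensity of a multivariate Hawkes process and aggregates event times into bin counts, which is the form in which rounded or aggregated data are observed. It is an illustration under stated assumptions, not the paper's estimator: the exponential kernel, the shared decay rate `beta`, and the function names are choices made for this example, since the abstract does not specify a kernel.

```python
import math


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity of each component at time t.

    events[j] holds the event times of component j, mu[i] is the baseline
    rate of component i, and alpha[i][j] scales the excitation of component
    i by past events of component j (exponential kernel, assumed).
    """
    lam = list(mu)
    for i in range(len(mu)):
        for j, times in enumerate(events):
            for s in times:
                if s < t:  # only strictly earlier events excite the process
                    lam[i] += alpha[i][j] * beta * math.exp(-beta * (t - s))
    return lam


def aggregate(times, bin_width, n_bins):
    """Count events per bin of width bin_width, starting at time 0."""
    counts = [0] * n_bins
    for s in times:
        k = int(s // bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts
```

For example, `aggregate([0.2, 0.7, 1.4], 1.0, 2)` gives the counts `[2, 1]`; the exact event times within each bin are lost, and these are the latent quantities that a Monte Carlo E-step would need to sample.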
Effective surveillance of the long-term public health impact of war and terrorist attacks remains limited. Such health issues are commonly under-reported, specifically for a large group of individuals. For this purpose, efficient estimation of the size or undercount of the population at risk of physical and mental health hazards is of utmost necessity. A novel trivariate Bernoulli model is developed allowing heterogeneity among the individuals and dependence between the sources of information, and an estimation methodology using a Monte Carlo-based EM algorithm is proposed. Simulation results show the superiority of the performance of the proposed method over existing competitors and robustness under model mis-specifications. The method is applied to analyse two real case studies on monitoring amyotrophic lateral sclerosis (ALS) cases for the Gulf War veterans and the 9/11 terrorist attack survivors at the World Trade Center, USA. The average annual cumulative incidence rate for ALS disease increases by 33% and 16% for deployed and non-deployed military personnel, respectively, after adjusting for the undercount. The number of individuals exposed to the risk of physical and mental health effects due to WTC terrorist attacks increased by 42%