B-splines are considered for the deconvolution problem of estimating a probability density function when the sample observations are contaminated with random noise. In the logspline method of density estimation, the logarithm of the unknown density function is approximated by a polynomial spline, the unknown parameters of which are estimated by maximum likelihood. Based on the logspline method, a fully automated procedure involving the EM algorithm, stepwise knot deletion and BIC has been developed for deconvolution. Numerical examples using simulated data are given to show the performance of the B-spline deconvolution.
We consider the asymptotic behaviour of various parametric multiple imputation procedures which include but are not restricted to the 'proper' imputation procedures proposed by Rubin (1978). The asymptotic variance structure of the resulting estimators is provided. This result is used to compare the relative efficiencies of different imputation procedures. It also provides a basis to understand the behaviour of two Monte Carlo iterative estimators, stochastic EM (Celeux & Diebolt, 1985; Wei & Tanner, 1990) and simulated EM (Ruud, 1991). We further develop properties of these estimators when they stop at iteration K with imputation size m. An application to a measurement error problem is used to illustrate the results.
In this paper the additive genetic gamma frailty model is defined. Individual frailties are correlated as a result of an additive genetic model. An algorithm to construct additive genetic gamma frailties for any pedigree is given so that the variance-covariance structure among individual frailties equals the numerator relationship matrix times a variance. The EM algorithm can be used to estimate the parameters in the model. The calculations are similar to those of the EM algorithm in the shared frailty model; however, the E step is not correspondingly simple. This is illustrated by re-analysing data from the Danish adoptive register, previously analysed with the shared frailty model in Nielsen et al. (1992). Goodness of fit of the additive genetic gamma frailty model can be tested after analysing data with the correlated frailty model. Doing so revealed a "defect" in the often used and otherwise well-behaved likelihood.
Many meteorological datasets are mixtures in which components correspond to particular physical phenomena, the accurate identification of which is important from a meteorological standpoint. In particular, rainfall is generated by at least two processes, convection and frontal systems, each characterised by its own distribution of rain rates and durations. The breakpoint data format, in which the timings of rain-rate changes and the steady rates between changes are recorded, captures the information required to parameterise these phenomena. Rainfall data has only recently become available in breakpoint format, which is both more compact and more informative than older sources such as the commonly used fixed-amount and fixed-interval representations. Techniques such as the EM algorithm can be used to decompose the breakpoint data into its components. However, the quality of the currently available breakpoint data is poor for low rates and short durations, so these portions of the data need to be discarded, or screened out, and the EM algorithm modified. In this paper, the EM algorithm is extended to deal with datasets in which data screening has taken place. The unified approach adopted appears new and, although tailored to a particular and important application, the method should have much wider application. Furthermore, in this paper the extension is applied to a large-scale breakpoint dataset of about 56,000 observations, with univariate and bivariate normal mixtures being fitted after censoring or truncation below a point or line respectively. The procedure was also applied to simulated breakpoint data, which showed that the procedure was relatively robust and gave excellent results in the majority of cases. For the actual data, the results at low truncation agreed with applications of the EM algorithm to nontruncated data, but a different picture arose at moderate truncation. An analysis of the dry times between periods of precipitation is al
The World Health Organization (WHO) diagnostic criteria for diabetes mellitus were determined in part by evidence that in some populations the plasma glucose level 2 h after an oral glucose load is a mixture of two distinct distributions. We present a finite mixture model that allows the two component densities to be generalized linear models and the mixture probability to be a logistic regression model. The model allows us to estimate the prevalence of diabetes and the sensitivity and specificity of the diagnostic criteria as a function of covariates, and to estimate them in the absence of an external standard. Sensitivity is the probability that a test indicates disease conditional on disease being present. Specificity is the probability that a test indicates no disease conditional on no disease being present. We obtained maximum likelihood estimates via the EM algorithm and derived the standard errors from the information matrix and by the bootstrap. In the application to data from the Diabetes in Egypt Project, a two-component mixture model fits well and the two components are interpreted as normal and diabetes. The means and variances are similar to results found in other populations. The minimum misclassification cutpoints decrease with age, are lower in urban areas and are higher in rural areas than the 200 mg/dl cutpoint recommended by the WHO. These differences are modest and our results generally support the WHO criterion. Our methods allow the direct inclusion of concomitant data whereas past analyses were based on partitioning the data.
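The mixture idea in this abstract can be illustrated with a deliberately simplified sketch. It assumes two plain normal components and a constant mixing probability, with no covariates, so it is not the generalized linear mixture model the authors fit. The function names (`em_two_normals`, `sensitivity_specificity`) and the median-split starting values are illustrative choices, not taken from the paper.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def em_two_normals(xs, n_iter=200):
    """Fit (1-p)*N(mu[0], sigma[0]) + p*N(mu[1], sigma[1]) by EM.

    Component 1 is read as the 'diseased' (higher glucose) group.
    Returns (p, mu, sigma).
    """
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    # Crude starting values (an assumption of this sketch): median split.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    sigma = [sd, sd]
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        r = []
        for x in xs:
            a = (1.0 - p) * normal_pdf(x, mu[0], sigma[0])
            b = p * normal_pdf(x, mu[1], sigma[1])
            r.append(b / (a + b))
        # M-step: weighted proportion, means and standard deviations.
        w1 = sum(r)
        w0 = n - w1
        p = w1 / n
        mu[0] = sum((1.0 - ri) * x for ri, x in zip(r, xs)) / w0
        mu[1] = sum(ri * x for ri, x in zip(r, xs)) / w1
        sigma[0] = math.sqrt(
            sum((1.0 - ri) * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / w0)
        sigma[1] = math.sqrt(
            sum(ri * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / w1)
    return p, mu, sigma


def sensitivity_specificity(cut, mu, sigma):
    """Model-based sensitivity and specificity of the rule 'x > cut'."""
    sens = 1.0 - normal_cdf(cut, mu[1], sigma[1])  # P(test + | diseased)
    spec = normal_cdf(cut, mu[0], sigma[0])        # P(test - | not diseased)
    return sens, spec
```

Under these assumptions, the fitted `p` plays the role of prevalence, and `sensitivity_specificity(200.0, mu, sigma)` evaluates a 200 mg/dl cutpoint without any external standard, which is the point the abstract makes.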
In this paper we discuss maximum likelihood estimation when some observations are missing in mixed graphical interaction models assuming a conditional Gaussian distribution as introduced by Lauritzen & Wermuth (1989). The approach via the EM algorithm of Little & Schluchter (1985) for the saturated case is expanded to cover the special restrictions in graphical models. A more efficient way to compute the E-step is indicated. The main purpose of the paper is to show that for certain missing patterns the computational effort can be considerably reduced.
We consider the problem of detecting features, such as minefields or seismic faults, in spatial point processes when there is substantial clutter. We use model-based clustering based on a mixture model for the process, in which features are assumed to generate points according to highly linear multivariate normal densities, and the clutter arises according to a spatial Poisson process. Nonlinear features are represented by several densities, giving a piecewise linear representation. Hierarchical model-based clustering provides a first estimate of the features, and this is then refined using the EM algorithm. The number of features is estimated from an approximation to its posterior distribution. The method gives good results for the minefield and seismic fault problems. Software to implement it is available on the World Wide Web.
Distorted segregation has been repeatedly observed in various plant species in molecular-marker linkage mapping where distant crosses were made. It may be caused by a partial lethal factor acting in the filial generations. A method is presented for estimating the recombination values between a partial lethal-factor locus and a linked molecular marker, and the relative viability or fertilization ability of zygotes or gametes, respectively, affected by the partial lethal factor in backcross (BC) and doubled-haploid (DH) populations, using the maximum-likelihood method associated with the expectation-maximization (EM) algorithm. The method was applied to segregation data of molecular markers for a population of 150 DH lines developed from the 'Steptoe' × 'Morex' cross in barley. The presence of a partial lethal-factor locus, located on chromosome 4, causing partial selection was suggested. This locus was tightly linked to the ABG500B marker, and the chance of fertilization of female gametes possessing the partial lethal factor was, on average, 59.8% that of a normal one. Two additional partial lethal factors were found on chromosome 5.
We propose a new stochastic approximation (SA) algorithm for maximum-likelihood estimation (MLE) in the incomplete-data setting. This algorithm is most useful for problems in which the EM algorithm cannot be applied because of an intractable E-step or M-step. Compared to other algorithms that have been proposed for intractable EM problems, such as the MCEM algorithm of Wei and Tanner (1990), our proposed algorithm appears more generally applicable and efficient. The approach we adopt is inspired by the Robbins-Monro (1951) stochastic approximation procedure, and we show that the proposed algorithm can be used to solve some of the long-standing problems in computing an MLE with incomplete data. We prove that in general O(n) simulation steps are required in computing the MLE with the SA algorithm and O(n log n) simulation steps are required in computing the MLE using the MCEM and/or the MCNR algorithm, where n is the sample size of the observations. Examples include computing the MLE in the nonlinear error-in-variable model and nonlinear regression model with random effects.
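As a rough illustration of the Robbins-Monro idea in an incomplete-data setting, the sketch below estimates the mean of a N(mu, 1) sample that is right-censored at a known point c. It is a toy example under stated assumptions, not the algorithm proposed in the paper: the model, the step sizes 1/k, and the names `draw_above` and `sa_censored_mean` are all hypothetical choices made for this illustration. Each iteration imputes the censored values once from their conditional distribution at the current estimate and then takes a decreasing step along the average complete-data score.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def draw_above(mu, c, rng):
    """Draw X ~ N(mu, 1) conditional on X > c by inverse-CDF sampling."""
    f = STD_NORMAL.cdf(c - mu)
    u = f + (1.0 - f) * rng.random()
    # Keep u strictly inside (0, 1) so that inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mu + STD_NORMAL.inv_cdf(u)


def sa_censored_mean(obs, censored, c, n_iter=300, seed=1):
    """Robbins-Monro style estimate of mu from right-censored N(mu, 1) data.

    obs[i] is the recorded value (equal to c when censored[i] is True).
    """
    rng = random.Random(seed)
    n = len(obs)
    mu = sum(obs) / n  # naive start; biased downwards by the censoring
    for k in range(1, n_iter + 1):
        total = 0.0
        for x, is_cens in zip(obs, censored):
            # Simulation step: impute each censored value at the current mu.
            x_full = draw_above(mu, c, rng) if is_cens else x
            # Complete-data score contribution for N(mu, 1) is (x - mu).
            total += x_full - mu
        # Stochastic approximation update with gain 1/k.
        mu += (1.0 / k) * (total / n)
    return mu
```

The expected value of the simulated score equals the observed-data score, so the recursion targets the same root as the likelihood equation while needing only one imputation per censored observation per step.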
We show how the concept of a hidden Markov model may be accommodated in a setting involving multiple sequences of observations. The resulting class of models allows for both interrelationships between different sequences and serial dependence within sequences. Missing values in the observation sequences may be handled in a straightforward manner. We also examine a group of methods, based upon the observed Fisher information matrix, for estimating the covariance matrix of the parameter estimates. We illustrate the methods with both real and simulated data sets.