Data augmentation, sometimes known as the method of auxiliary variables, is a powerful tool for constructing optimisation and simulation algorithms. In the context of optimisation, Meng & van Dyk (1997, 1998) reported several successes of the 'working parameter' approach for constructing efficient data-augmentation schemes for fast and simple EM-type algorithms. This paper investigates the use of working parameters in the context of Markov chain Monte Carlo, in particular in the context of Tanner & Wong's (1987) data augmentation algorithm, via a theoretical study of two working-parameter approaches, the conditional augmentation approach and the marginal augmentation approach. Posterior sampling under the univariate t model is used as a running example, which particularly illustrates how the marginal augmentation approach obtains a fast-mixing positive recurrent Markov chain by first constructing a nonpositive recurrent Markov chain in a larger space.
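To fix ideas, here is a minimal sketch of the standard data-augmentation (Gibbs) sampler for the univariate t model that serves as the paper's running example, assuming known degrees of freedom nu and a flat prior p(mu, sigma^2) proportional to 1/sigma^2. The latent precisions q_i are the auxiliary variables; the marginal augmentation scheme studied in the paper would further rescale them by an unidentified working parameter, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def t_gibbs(y, nu=4.0, n_iter=2000):
    """Data-augmentation sampler for y_i ~ t_nu(mu, sigma^2).

    Augmentation: y_i | q_i ~ N(mu, sigma^2 / q_i), q_i ~ Gamma(nu/2, rate=nu/2).
    Flat prior p(mu, sigma^2) ~ 1/sigma^2 is assumed for illustration.
    """
    n = len(y)
    mu, sig2 = np.mean(y), np.var(y)
    draws = np.empty((n_iter, 2))
    for it in range(n_iter):
        # Draw latent weights q_i given (mu, sigma^2).
        q = rng.gamma(shape=(nu + 1) / 2,
                      scale=2.0 / (nu + (y - mu) ** 2 / sig2))
        # Draw mu given (q, sigma^2): normal with precision sum(q)/sigma^2.
        w = q.sum()
        mu = rng.normal((q * y).sum() / w, np.sqrt(sig2 / w))
        # Draw sigma^2 given (q, mu): inverse gamma via a gamma draw.
        sig2 = (q * (y - mu) ** 2).sum() / 2 / rng.gamma(n / 2)
        draws[it] = mu, sig2
    return draws

y = rng.standard_t(df=4, size=200) * 1.5 + 3.0   # synthetic t data
samples = t_gibbs(y)
print(samples[500:].mean(axis=0))                # posterior means of (mu, sigma^2)
```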
In likelihood-based approaches to robustifying state space models, Gaussian error distributions are replaced by non-normal alternatives with heavier tails. Robustified observation models are appropriate for time series with additive outliers, while state or transition equations with heavy-tailed error distributions lead to filters and smoothers that can cope with structural changes in trend or slope caused by innovations outliers. As a consequence, however, conditional filtering and smoothing densities become analytically intractable. Various attempts have been made to deal with this problem, ranging from approximate conditional mean type estimation to fully Bayesian analysis using MCMC simulation. In this article we consider penalized likelihood smoothers, that is, estimators which maximize penalized likelihoods or, equivalently, posterior densities. Filtering and smoothing for additive and innovations outlier models can be carried out by computationally efficient Fisher scoring steps or iterative Kalman-type filters. Special emphasis is on the Student family, for which EM-type algorithms to estimate unknown hyperparameters are developed. Operational behaviour is illustrated by simulation experiments and by real data applications.
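As a rough illustration of the iterative Kalman-type filters mentioned above, the sketch below implements an iteratively reweighted RTS smoother for a local level model with Student-t observation errors (the additive-outlier case). The hyperparameters sigma2, q and nu are treated as known here, whereas the paper estimates them with EM-type algorithms; the weight formula is the standard one for the t family.

```python
import numpy as np

def kalman_smooth(y, obs_var, q):
    """RTS smoother for the local level model x_t = x_{t-1} + eta_t (var q),
    y_t = x_t + eps_t, with per-observation noise variances obs_var[t]."""
    n = len(y)
    a = np.empty(n); p = np.empty(n)          # filtered means / variances
    x, v = 0.0, 1e7                           # diffuse initial state
    for t in range(n):
        xp, vp = x, v + q                     # predict
        k = vp / (vp + obs_var[t])            # Kalman gain
        x, v = xp + k * (y[t] - xp), (1 - k) * vp
        a[t], p[t] = x, v
    s = a.copy()
    for t in range(n - 2, -1, -1):            # backward (RTS) pass
        g = p[t] / (p[t] + q)
        s[t] = a[t] + g * (s[t + 1] - a[t])
    return s

def robust_smooth(y, sigma2=1.0, q=0.1, nu=4.0, n_iter=20):
    """Iteratively reweighted smoothing for t_nu observation errors:
    downweight observations with large residuals, re-smooth, repeat."""
    s = np.asarray(y, float).copy()
    for _ in range(n_iter):
        r = y - s                                        # current residuals
        w = (nu + 1) / (nu + r ** 2 / sigma2)            # t-family weights
        s = kalman_smooth(y, obs_var=sigma2 / w, q=q)    # reweighted pass
    return s
```

On the first pass the residuals are zero and the weights are constant, so the procedure starts from an ordinary Kalman smooth and then progressively discounts outlying observations.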
Airborne laser scanner data collected over forests provide a canopy height. To obtain tree heights from airborne laser scanner data one needs a recovery model. Two such models, one (A) assuming that observations are sampled with probability proportional to displayed crown area, and the other (B) derived from the probability that a laser beam penetrates to a given canopy depth, were developed and applied to laser scanner data obtained over stands of Douglas-fir. Model estimates of recovered arithmetic mean tree heights and quantiles (75%, 85%, and 95%) were not significantly (P > 0.24) different from ground-based equivalents. An overall mean bias of -3 m in the laser canopy heights was eliminated by both methods. The median absolute differences between observed and predicted plot means and quantiles were reduced by 40 to 60%. Three alternative recovery procedures are presented for model B. For a single plot, the predictions varied significantly among the models and estimation procedures with no consistent pattern. Predictions of arithmetic mean heights were best for plots with no understory, while predictions of upper quantiles were consistent in all plots.
We show that under reasonable conditions the nonparametric maximum likelihood estimate (NPMLE) of the distribution function from left-truncated and case 1 interval-censored data is inconsistent, in contrast to the consistency properties of the NPMLE from only left-truncated data or only interval-censored data. However, the conditional NPMLE is shown to be consistent. Numerical examples are provided to illustrate their finite sample properties.
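For context, with case 1 interval-censored (current status) data and no truncation, the NPMLE of F at the ordered inspection times is the isotonic regression of the censoring indicators, computable by the pool-adjacent-violators algorithm; a minimal sketch follows. The conditional NPMLE that the paper shows to be consistent additionally conditions on the truncation event, which this sketch does not handle.

```python
import numpy as np

def current_status_npmle(c, delta):
    """NPMLE of F from current status data (c_i, delta_i), where
    delta_i = 1{T_i <= c_i}: isotonic regression of delta on c via
    pool-adjacent-violators."""
    order = np.argsort(c)
    d = np.asarray(delta, float)[order]
    vals, wts = [], []
    for x in d:
        vals.append(x); wts.append(1.0)
        # Merge adjacent blocks until block means are nondecreasing.
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            v = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            vals[-2:] = [v]; wts[-2:] = [w]
    f = np.repeat(vals, np.array(wts, dtype=int))
    return np.sort(c), f          # F-hat evaluated at the ordered inspection times
```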
Mixed-distribution modeling was proposed in order to analyze heterogeneity of costs and lengths of stay within Diagnosis Related Groups (DRGs). A mixed-distribution model based on Weibull distributions was applied to 791 discharge abstracts of French DRG no. 450 (Health Care Financing Administration 3 DRG no. 316, "Renal failure") from a national database. Three subgroups of cost and length of stay were identified. Except for age, clinical criteria significantly linked with the long-stay subgroup were the same as those associated with the high-cost subgroup: acute renal failure, intensive care, infectious complications, and vascular investigations. The identification of factors associated with high costs, based on the proposed model, will allow physicians to understand more accurately how their choice of specific procedures influences hospital costs. J Clin Epidemiol 52(3):251-258, 1999. © 1999 Elsevier Science Inc.
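The abstract does not spell out the fitting method, but a finite mixture of Weibulls of the kind described is typically fitted by EM; the sketch below, with three components and numerically optimized M-steps, is one plausible reading rather than the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def weibull_mixture_em(x, k=3, n_iter=100, seed=0):
    """EM for a k-component Weibull mixture, e.g. for length-of-stay data."""
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1 / k)
    shape = rng.uniform(0.8, 2.0, k)
    scale = np.quantile(x, (np.arange(k) + 1) / (k + 1))   # spread initial scales
    for _ in range(n_iter):
        # E-step: posterior probability that x_i came from component j.
        dens = np.stack([weibull_min.pdf(x, shape[j], scale=scale[j])
                         for j in range(k)], axis=1)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: mixing weights in closed form, Weibull parameters numerically.
        pi = r.mean(axis=0)
        for j in range(k):
            def nll(theta, w=r[:, j]):
                a, b = np.exp(theta)          # positivity via log-parameters
                return -(w * weibull_min.logpdf(x, a, scale=b)).sum()
            res = minimize(nll, x0=np.log([shape[j], scale[j]]),
                           method="Nelder-Mead")
            shape[j], scale[j] = np.exp(res.x)
    return pi, shape, scale
```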
This paper reviews estimation problems with missing, or hidden data. We formulate this problem in the context of Markov models and consider two interrelated issues, namely, the estimation of a state given measured data and model parameters, and the estimation of model parameters given the measured data alone. We also consider situations where the measured data is, itself, incomplete in some sense. We deal with various combinations of discrete and continuous states and observations.
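The first of the two interrelated issues, estimating the state given measured data and model parameters, is solved for a discrete hidden Markov model by the forward-backward recursions, which also form the E-step of the EM (Baum-Welch) estimator addressing the second issue. A minimal scaled implementation, as a sketch:

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Smoothed state posteriors P(x_t | y_1..T) for a discrete HMM.

    A: transition matrix, B[state, symbol]: emission probabilities,
    pi: initial distribution. Scaling keeps the recursions numerically stable.
    """
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S)); beta = np.ones((T, S)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                     # forward pass (scaled)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):            # backward pass (same scaling)
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                      # smoothed posteriors
    return gamma / gamma.sum(axis=1, keepdims=True)
```

The scaling constants c[t] also yield the log-likelihood, sum(log c), which is what an outer EM loop over the model parameters would monitor.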
Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
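A compact sketch of the EM iteration for this latent variable model (probabilistic PCA): x = W z + mu + eps with z ~ N(0, I) and isotropic noise eps ~ N(0, sigma^2 I), whose maximum likelihood solution for W spans the principal subspace. The updates are the standard closed-form ones; initialization and convergence checking are simplified here.

```python
import numpy as np

def ppca_em(X, q=2, n_iter=200, seed=0):
    """EM for probabilistic PCA. Returns W (d x q) and the noise variance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    W = rng.standard_normal((d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables z_n.
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                    # E[z_n], stacked as n x q
        Ezz = n * sigma2 * Minv + Ez.T @ Ez   # sum_n E[z_n z_n^T]
        # M-step: update W and sigma^2 in closed form.
        W = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2 * np.sum((Xc @ W) * Ez)
                  + np.trace(Ezz @ W.T @ W)) / (n * d)
    return W, sigma2
```

At convergence the columns of W span the principal subspace; an orthonormal basis can be recovered from the SVD of W.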
We consider the problem of modelling the failure-time distribution, where failure is due to two distinct causes. One approach is to adopt a two-component mixture model where the components correspond to the two different causes of failure. However, routine application of this approach with typical parametric forms for the component densities proves to be inadequate in modelling the time to a re-replacement operation or death after the initial replacement of the aortic valve in the heart by a prosthesis, such as a xenograft valve. Hence we consider modifications to the usual mixture model approach to handle situations where there exists a strong dependency between the failure times of the distinct causes. With these modifications, a suitable model can be provided for the distribution of the time to a re-replacement operation conditional on the age of the patient at the time of the initial replacement operation. The estimate so obtained of the probability that a patient of a given age will undergo a re-replacement operation provides a useful guide to heart surgeons on the type of valve to be used in view of the patient's age. Copyright © 1999 John Wiley & Sons, Ltd.
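For reference, the unmodified two-component mixture baseline that the paper starts from can be fitted by EM when failure times are right-censored, as they typically are in valve data; the sketch below uses Weibull components as an illustrative choice, not the paper's final dependence-adjusted model, and ignores the conditioning on patient age. The fitted mixing proportion pi plays the role of the probability that a patient eventually undergoes re-replacement (cause 1) rather than dies first (cause 2).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def two_cause_mixture_em(t, event, n_iter=50):
    """EM for f(t) = pi * f1(t) + (1 - pi) * f2(t) with right-censoring.

    t: observed times; event: 1 if a failure was observed, 0 if censored.
    Weibull components (log-parameterized) are an illustrative choice.
    """
    params = [np.log([1.5, np.median(t)]), np.log([1.5, 2 * np.median(t)])]
    pi = 0.5
    for _ in range(n_iter):
        # E-step: responsibilities use densities for events and
        # survival functions for censored observations.
        lik = []
        for th in params:
            a, b = np.exp(th)
            lik.append(np.where(event,
                                weibull_min.pdf(t, a, scale=b),
                                weibull_min.sf(t, a, scale=b)))
        r1 = pi * lik[0] / (pi * lik[0] + (1 - pi) * lik[1])
        # M-step: mixing weight in closed form, components numerically.
        pi = r1.mean()
        for j, w in enumerate([r1, 1 - r1]):
            def nll(th, w=w):
                a, b = np.exp(th)
                ll = np.where(event,
                              weibull_min.logpdf(t, a, scale=b),
                              weibull_min.logsf(t, a, scale=b))
                return -(w * ll).sum()
            params[j] = minimize(nll, params[j], method="Nelder-Mead").x
    return pi, [np.exp(p) for p in params]
```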
Background In 1994 a small cluster of hepatitis-C cases in Rhesus-negative women in Ireland prompted a nationwide screening programme for hepatitis-C antibodies in all anti-D recipients. A total of 55 386 women presented for screening, and a history of exposure to anti-D was sought from all those testing positive and a sample of those testing negative. The resulting data comprised 620 antibody-positive and 1708 antibody-negative women with known exposure history, and interest was focused on using these data to estimate the infectivity of anti-D in the period 1970-1993. Methods Any exposure to anti-D provides an opportunity for infection, but the infection status at each exposure time is not observed. Instead, the available data from antibody testing only indicate whether at least one of the exposures resulted in infection. Using a simple Bernoulli model to describe the risk of infection in each year, the absence of information regarding which exposure(s) led to infection fits neatly into the framework of 'incomplete data'. Hence the expectation-maximization (EM) algorithm provides estimates of the infectiousness of anti-D in each of the 24 years studied. Results The analysis highlighted the 1977 anti-D as a source of infection, a fact which was confirmed by laboratory investigation. Other suspect batches were also identified, helping to direct the efforts of laboratory investigators. Conclusions We have presented a method to estimate the risk of infection at each exposure time from multiple exposure data. The method can also be used to estimate transmission rates and the risk associated with different sources of infection in a range of infectious disease applications.
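A minimal sketch of the EM scheme as described: each exposure in year t carries an independent Bernoulli(p_t) infection risk, a woman tests positive iff at least one exposure infected her, and the unobserved per-exposure infection indicators are the missing data. The array names are mine, and the sketch ignores the fact that the antibody-negative histories came from a sample rather than a census.

```python
import numpy as np

def anti_d_em(exposure, positive, n_iter=500):
    """EM for per-year infection probabilities from multiple-exposure data.

    exposure: (n_women, n_years) 0/1 matrix of anti-D exposures.
    positive: length-n_women 0/1 antibody test results.
    """
    n, T = exposure.shape
    p = np.full(T, 0.1)
    for _ in range(n_iter):
        # P(never infected) for each woman under the current p.
        q = np.prod(np.where(exposure == 1, 1 - p, 1.0), axis=1)
        # E-step: expected infection indicator for each (woman, year):
        # p_t / (1 - q_i) for positive women, 0 for negative women.
        z = exposure * p / (1 - q)[:, None]
        z *= positive[:, None]
        # M-step: expected infections in year t / exposures in year t
        # (guarding against years with no recorded exposures).
        p = z.sum(axis=0) / np.maximum(exposure.sum(axis=0), 1)
    return p
```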
The performance of an automatic speech recognizer degrades when there exists an acoustic mismatch between the training and the testing conditions in the data. Though it is certain that the mismatch is nonlinear, its exact form is unknown. Tackling the problem of nonlinear mismatches is a difficult task that has not been adequately addressed before. In this paper, we develop an approach that uses nonlinear transformations in the stochastic matching framework to compensate for acoustic mismatches. The functional form of the nonlinear transformation is modeled by neural networks. We develop a new technique to train neural networks using the generalized EM algorithm. This technique eliminates the need for stereo databases, which are difficult to obtain in practical applications. The new technique is data-driven and hence can be used under a wide variety of conditions without a priori knowledge of the environment. Using this technique, we show that we can provide improvement under various types of acoustic mismatch; in some cases a 72% reduction in word error rate is achieved.
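A toy sketch of the idea, assuming a diagonal-covariance Gaussian mixture as the clean-speech model rather than the recognizer's full HMMs: a small residual network transforms the mismatched features, the E-step computes component posteriors for the transformed features, and the M-step takes a few gradient ascent steps on the EM auxiliary function Q rather than maximizing it exactly, which is what makes the procedure a generalized EM. All network details (one tanh hidden layer, learning rate, and so on) are my illustrative choices, not the paper's.

```python
import numpy as np

def gem_feature_transform(X, means, var, n_hidden=8, n_em=20,
                          n_grad=50, lr=1e-2, seed=0):
    """Generalized-EM training of a residual net f(x) = x + g(x) so that
    transformed features fit a fixed diagonal GMM (means: K x d, var: K x d)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.standard_normal((d, n_hidden)) * 0.1; b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((n_hidden, d)) * 0.1; b2 = np.zeros(d)

    def forward(Z):
        H = np.tanh(Z @ W1 + b1)
        return H, Z + H @ W2 + b2            # residual transform

    for _ in range(n_em):
        # E-step: component posteriors (equal mixture weights assumed).
        _, F = forward(X)
        logp = -0.5 * (((F[:, None, :] - means) ** 2 / var).sum(-1)
                       + np.log(var).sum(-1))
        gamma = np.exp(logp - logp.max(1, keepdims=True))
        gamma /= gamma.sum(1, keepdims=True)
        # Partial M-step: a few gradient ascent steps on Q (generalized EM).
        for _ in range(n_grad):
            H, F = forward(X)
            # dQ/dF pulls each f(x_n) toward the components, weighted by gamma.
            dF = np.einsum('nk,nkd->nd', gamma, (means - F[:, None, :]) / var)
            dW2 = H.T @ dF; db2 = dF.sum(0)
            dH = dF @ W2.T * (1 - H ** 2)    # backprop through tanh layer
            dW1 = X.T @ dH; db1 = dH.sum(0)
            W1 += lr * dW1 / n; b1 += lr * db1 / n
            W2 += lr * dW2 / n; b2 += lr * db2 / n
    return lambda Xnew: forward(Xnew)[1]     # the trained transform
```

No paired clean/noisy (stereo) data enters anywhere: the network is trained purely to raise the likelihood of the transformed features under the clean-speech model, which mirrors the stereo-free property claimed above.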