Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
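As a concrete illustration of the kind of algorithm this abstract describes, the following is a minimal sketch of EM for a univariate two-component t mixture with the degrees of freedom held fixed. It is not the paper's ECM procedure (which also updates the degrees of freedom and handles the multivariate case); the latent scale weights u in the E-step are what gives the fit its robustness to atypical observations.

```python
import numpy as np
from scipy.stats import t as t_dist

def fit_t_mixture(x, K=2, nu=4.0, n_iter=200):
    """EM for a univariate mixture of K t components with fixed df nu.
    Sketch only: the article's ECM algorithm also estimates nu and
    covers multivariate data."""
    n = len(x)
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread-out starting means
    sig = np.full(K, x.std())
    for _ in range(n_iter):
        # E-step: responsibilities tau and latent precision weights u;
        # u downweights points far from a component's centre.
        dens = np.stack([pi[k] * t_dist.pdf(x, nu, loc=mu[k], scale=sig[k])
                         for k in range(K)])
        tau = dens / dens.sum(axis=0)
        delta = np.stack([((x - mu[k]) / sig[k]) ** 2 for k in range(K)])
        u = (nu + 1.0) / (nu + delta)
        # M-step: weighted updates of mixing proportions, means, scales.
        pi = tau.mean(axis=1)
        mu = (tau * u * x).sum(axis=1) / (tau * u).sum(axis=1)
        sig = np.sqrt((tau * u * (x - mu[:, None]) ** 2).sum(axis=1)
                      / tau.sum(axis=1))
    return pi, mu, sig
```

With well-separated components the recovered means converge to the component locations even though the t(4) data throw occasional extreme observations.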
The well-known EM algorithm is an optimization transfer algorithm that depends on the notion of incomplete or missing data. By invoking convexity arguments, one can construct a variety of other optimization transfer algorithms that do not involve missing data. These algorithms all rely on a majorizing or minorizing function that serves as a surrogate for the objective function. Optimizing the surrogate function drives the objective function in the correct direction. This article illustrates this general principle by a number of specific examples drawn from the statistical literature. Because optimization transfer algorithms often exhibit the slow convergence of EM algorithms, two methods of accelerating optimization transfer are discussed and evaluated in the context of specific problems.
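A textbook instance of the optimization transfer principle (not one of the article's own examples) is computing a median by majorizing each |x_i - theta| with a quadratic at the current iterate; minimizing the surrogate is then a weighted least-squares step, and each step drives the L1 objective downhill:

```python
import numpy as np

def mm_median(x, n_iter=100, eps=1e-8):
    """Optimization transfer (MM) for the median: at the current iterate,
    majorize |x_i - theta| by (x_i - theta)^2 / (2|x_i - theta_m|) + const,
    then minimize the quadratic surrogate in closed form.
    eps guards the weights against division by zero."""
    theta = x.mean()                          # any starting value
    for _ in range(n_iter):
        w = 1.0 / (np.abs(x - theta) + eps)   # surrogate curvature weights
        theta = np.sum(w * x) / np.sum(w)     # exact minimizer of surrogate
    return theta
```

Each iteration is a weighted mean, so the surrogate is trivial to optimize even though the original objective is nonsmooth; this is exactly the trade the article describes, and also why such schemes can inherit EM's slow convergence near the solution.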
In the context of structural equation modeling, a general interaction model with multiple latent interaction effects is introduced. A stochastic analysis represents the nonnormal distribution of the joint indicator vector as a finite mixture of normal distributions. The Latent Moderated Structural Equations (LMS) approach is a new method developed for the analysis of the general interaction model that utilizes the mixture distribution and provides ML estimation of model parameters by adapting the EM algorithm. The finite sample properties and the robustness of LMS are discussed. Finally, the applicability of the new method is illustrated by an empirical example.
In this paper we discuss a general model framework within which manifest variables with different distributions in the exponential family can be analyzed with a latent trait model. A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented. We discuss in addition the scoring of individuals on the latent dimensions. The general framework presented allows not only the analysis of manifest variables all of one type, but also the simultaneous analysis of a collection of variables with different distributions. The approach used analyzes the data as they are by making assumptions about the distribution of the manifest variables directly.
Author: Dunson, DB (NIEHS, Biostat Branch, Res Triangle Pk, NC 27709, USA)
In cancer studies that use transgenic or knockout mice, skin tumour counts are recorded over time to measure tumorigenicity. In these studies cancer biologists are interested in the effect of endogenous and/or exogenous factors on papilloma onset, multiplicity and regression. In this paper an analysis of data from a study conducted by the National Institute of Environmental Health Sciences on the effect of genetic factors on skin tumorigenesis is presented. Papilloma multiplicity and regression are modelled by using Bernoulli, Poisson and binomial latent variables, each of which can depend on covariates and previous outcomes. An EM algorithm is proposed for parameter estimation, and generalized estimating equations adjust for extra dependence between outcomes within individual animals. A Cox proportional hazards model is used to describe covariate effects on the onset of tumours.
Suppose that when a unit operates in a certain environment, its lifetime has distribution G, and when the unit operates in another environment, its lifetime has a different distribution, say F. Moreover, suppose the unit is operated for a certain period of time in the first environment and is then transferred to the second environment. Thus we observe a censored lifetime in the first environment and a failure time of a "used" unit in the second environment. We propose an EM algorithm approach for obtaining a self-consistent estimator of F using observations from both environments. The case where failure times are subject to right censoring is considered as well. We also establish the maximum likelihood estimator of F when the unit is repairable. Application and simulation studies are presented to illustrate the methods derived.
Selective genotyping is a cost-saving strategy in mapping quantitative trait loci (QTLs). When the proportion of individuals selected for genotyping is low, the majority of the individuals are not genotyped, but their phenotypic values, if available, are still included in the data analysis to correct the bias in parameter estimation. These ungenotyped individuals do not contribute much information about linkage analysis and their inclusion can substantially increase the computational burden. For multiple trait analysis, ungenotyped individuals may not have a full array of phenotypic measurements. In this case, unbiased estimation of QTL effects using current methods seems to be impossible. In this study, we develop a maximum likelihood method of QTL mapping under selective genotyping using only the phenotypic values of genotyped individuals. Compared with the full data analysis (using all phenotypic values), the proposed method performs well. We derive an expectation-maximization (EM) algorithm that appears to be a simple modification of the existing EM algorithm for standard interval mapping. The new method can be readily incorporated into standard QTL mapping software, e.g. MAPMAKER. A general recommendation is that whenever full data analysis is possible, the full maximum likelihood analysis should be performed. If it is impossible to analyse the full data, e.g. sample sizes are too large, phenotypic values of ungenotyped individuals are missing or composite interval mapping is to be performed, the proposed method can be applied.
A problem arising from the study of the spread of a viral infection among potato plants by aphids appears to involve a mixture of two linear regressions on a single predictor variable. The plant scientists studying the problem were particularly interested in obtaining a 95% confidence upper bound for the infection rate. We discuss briefly the procedure for fitting mixtures of regression models by means of maximum likelihood, effected via the EM algorithm. We give general expressions for the implementation of the M-step and then address the issue of conducting statistical inference in this context. A technique due to T. A. Louis may be used to estimate the covariance matrix of the parameter estimates by calculating the observed Fisher information matrix. We develop general expressions for the entries of this information matrix. Having the complete covariance matrix permits the calculation of confidence and prediction bands for the fitted model. We also investigate the testing of hypotheses concerning the number of components in the mixture via parametric and 'semiparametric' bootstrapping. Finally, we develop a method of producing diagnostic plots of the residuals from a mixture of linear regressions.
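The fitting procedure this abstract summarizes can be sketched for the two-component case as follows. This is an illustrative implementation, not the authors' code: the E-step computes posterior component probabilities and the M-step is a pair of weighted least-squares fits, with the starting split taken from the sign of the residuals of an overall fit (an assumed, simple initialization).

```python
import numpy as np

def em_mix_reg(x, y, n_iter=200):
    """EM for a two-component mixture of simple linear regressions.
    Sketch under assumed initialization: split by residual sign of a
    single overall least-squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    z = resid > 0                              # crude initial split
    beta = np.stack([np.linalg.lstsq(X[m], y[m], rcond=None)[0]
                     for m in (z, ~z)])
    sig2 = np.array([1.0, 1.0])
    lam = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability tau that each point belongs
        # to each regression line.
        r = y[:, None] - X @ beta.T
        dens = lam * np.exp(-0.5 * r**2 / sig2) / np.sqrt(2 * np.pi * sig2)
        tau = dens / dens.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then weighted least squares per component.
        lam = tau.mean(axis=0)
        for k in range(2):
            w = np.sqrt(tau[:, k])
            beta[k] = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
            sig2[k] = np.sum(tau[:, k] * (y - X @ beta[k])**2) / tau[:, k].sum()
    return lam, beta, sig2
```

The per-component weighted least-squares step is the closed-form M-step whose general expressions the article derives; the responsibilities tau are also the quantities entering Louis's method for the observed information matrix.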
Many papers (including most of the papers in this issue of Computational Statistics) deal with Markov Chain Monte Carlo (MCMC) methods. This paper will give an introduction to the augmented Gibbs sampler (a special case of MCMC), illustrated using the random intercept model. A 'nonstandard' application of the augmented Gibbs sampler will be discussed to give an illustration of the power of MCMC methods. Furthermore, it will be illustrated that the posterior sample resulting from an application of MCMC can be used for more than determination of convergence and the computation of simple estimators like the a posteriori expectation and standard deviation. Posterior samples give access to many other inferential possibilities. Using a simulation study, the frequency properties of some of these possibilities will be evaluated.
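For the random intercept model used as the running illustration, an augmented Gibbs sampler can be sketched as below. The prior choices (flat prior on the grand mean, weak inverse-gamma priors on both variances) are assumptions for this sketch, not taken from the paper; the key point is that the random intercepts b_i are sampled as augmented data alongside the parameters, and the retained draws form the posterior sample whose further uses the paper discusses.

```python
import numpy as np

def gibbs_random_intercept(y, n_iter=2000, burn=500, seed=0):
    """Augmented Gibbs sampler for y[i, j] = mu + b_i + e_ij with
    b_i ~ N(0, tau2) and e_ij ~ N(0, sig2), for a balanced (m x n)
    data array. Flat prior on mu; InvGamma(0.01, 0.01) on sig2, tau2.
    Illustrative sketch, not the paper's implementation."""
    rng = np.random.default_rng(seed)
    m, n = y.shape
    mu, sig2, tau2 = y.mean(), y.var(), 1.0
    b = np.zeros(m)
    a0 = b0 = 0.01
    draws = []
    for it in range(n_iter):
        # b_i | rest: normal, precision n/sig2 + 1/tau2 (data augmentation)
        prec = n / sig2 + 1.0 / tau2
        mean = (y - mu).mean(axis=1) * (n / sig2) / prec
        b = rng.normal(mean, np.sqrt(1.0 / prec))
        # mu | rest: normal around the grand mean of y - b
        mu = rng.normal((y - b[:, None]).mean(), np.sqrt(sig2 / (m * n)))
        # sig2, tau2 | rest: inverse-gamma full conditionals
        resid = y - mu - b[:, None]
        sig2 = 1.0 / rng.gamma(a0 + m * n / 2,
                               1.0 / (b0 + 0.5 * (resid**2).sum()))
        tau2 = 1.0 / rng.gamma(a0 + m / 2,
                               1.0 / (b0 + 0.5 * (b**2).sum()))
        if it >= burn:
            draws.append((mu, sig2, tau2))
    return np.array(draws)          # columns: mu, sig2, tau2
```

Because the whole posterior sample is returned, one can go beyond posterior means and standard deviations, e.g. compute intervals for arbitrary functions of (mu, sig2, tau2), which is the broader inferential use the paper emphasizes.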
ISBN (print): 0819437646
Recently the authors introduced a general Bayesian statistical method for modeling and analysis in linear inverse problems involving certain types of count data. Emission-based tomography in medical imaging is a particularly important and common example of this type of problem. In this paper we provide an overview of the methodology and illustrate its application to problems in emission tomography through a series of simulated and real-data examples. The framework rests on the special manner in which a multiscale representation of recursive dyadic partitions (essentially an unnormalized Haar analysis) interacts with the statistical likelihood of data with Poisson noise characteristics. In particular, the likelihood function permits a factorization, with respect to location-scale indexing, analogous to the manner in which, say, an arbitrary signal allows a wavelet transform. Recovery of an object from tomographic data is then posed as a problem involving the statistical estimation of a multiscale parameter vector. A type of statistical shrinkage estimation is used, induced by careful choice of a Bayesian prior probability structure for the parameters. Finally, the ill-posedness of the tomographic imaging problem is accounted for by embedding the above-described framework within a larger, but simpler statistical estimation problem, via the so-called Expectation-Maximization (EM) approach. The resulting image reconstruction algorithm is iterative in nature, entailing the calculation of two closed-form algebraic expressions at each iteration. Convergence of the algorithm to a unique solution, under appropriate choice of Bayesian prior, can be assured.
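The EM framework this abstract embeds its multiscale estimator in reduces, without the Bayesian prior, to the classical ML-EM (Shepp-Vardi / Richardson-Lucy) iteration for Poisson linear inverse problems. The sketch below shows only that unpenalized baseline, to make the "two closed-form expressions per iteration" concrete; it is not the paper's multiscale algorithm.

```python
import numpy as np

def mlem(A, y, n_iter=100):
    """Classical ML-EM iteration for y ~ Poisson(A @ lam), lam >= 0:
        lam <- lam * (A.T @ (y / (A @ lam))) / A.sum(axis=0)
    Each iteration is two closed-form algebraic expressions: a forward
    projection ratio and a multiplicative back-projected update."""
    lam = np.full(A.shape[1], y.sum() / A.shape[1])  # flat positive start
    s = A.sum(axis=0)                                # per-pixel sensitivity
    for _ in range(n_iter):
        ratio = y / np.maximum(A @ lam, 1e-12)       # guard zero projections
        lam = lam * (A.T @ ratio) / s
    return lam
```

The multiplicative form keeps the estimate nonnegative and monotonically increases the Poisson likelihood; the paper's contribution is the multiscale Bayesian shrinkage layered onto this EM structure.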