In some applications of censored regression models, the distribution of the error terms departs significantly from normality, for instance, in the presence of heavy tails, skewness and/or atypical observations. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture of Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and also skewness depending on the structure of the mixture components. We develop an analytically tractable and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of truncated Student-t distributions. The efficacy of the method is verified through the analysis of simulated and real datasets. The proposed algorithm and methods are implemented in the new R package CensMixReg.
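As a rough illustration of the kind of likelihood involved, the sketch below evaluates a left-censored log-likelihood when the errors follow a finite mixture of Student-t components: an observed response contributes a mixture density, and a censored one contributes the mixture CDF at the detection limit. This is not the CensMixReg implementation. The function names, the `offsets` parameterization of the component locations, and the Simpson's-rule CDF are all assumptions made for the example.

```python
import math

def t_pdf(z, nu):
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    return c * (1 + z * z / nu) ** (-(nu + 1) / 2)

def t_cdf(z, nu, steps=2000):
    """Standard Student-t CDF via Simpson's rule on [0, |z|] plus symmetry.

    Illustrative only: accuracy degrades far in the tails. `steps` must be even.
    """
    a = abs(z)
    if a == 0.0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if z > 0 else 0.5 - area

def censored_mixture_loglik(ys, fitted, censored, weights, offsets, sigmas, nus):
    """Log-likelihood under left censoring with t-mixture errors (hypothetical helper).

    ys[i] is the response, or the detection limit when censored[i] is True.
    fitted[i] is the linear predictor x_i' beta. Each mixture component has
    weight, location offset, scale, and degrees of freedom.
    """
    total = 0.0
    for y, m, cens in zip(ys, fitted, censored):
        contrib = 0.0
        for w, d, s, nu in zip(weights, offsets, sigmas, nus):
            z = (y - m - d) / s
            # Censored: P(response <= limit). Observed: density at the response.
            contrib += w * (t_cdf(z, nu) if cens else t_pdf(z, nu) / s)
        total += math.log(contrib)
    return total
```

A maximizer (EM-type or generic) would then search over the regression coefficients and the mixture parameters; the closed-form E-step mentioned in the abstract replaces the numerical integration used here.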
Finite mixture models have been widely used for the modeling and analysis of data from a heterogeneous population. Moreover, data of this kind can be subject to upper and/or lower detection limits because of the limitations of the experimental apparatus. Another complication arises when measures of each population depart significantly from normality, for instance, in the presence of heavy tails or atypical observations. For such data structures, we propose a robust model for censored data based on finite mixtures of multivariate Student-t distributions. This approach allows us to model data with great flexibility, accommodating multimodality, heavy tails and also skewness depending on the structure of the mixture components. We develop an analytically simple, yet efficient, EM-type algorithm for conducting maximum likelihood estimation of the parameters. The algorithm has closed-form expressions at the E-step that rely on formulas for the mean and variance of the multivariate truncated Student-t distributions. Further, a general information-based method for approximating the asymptotic covariance matrix of the estimators is also presented. Results obtained from the analysis of both simulated and real datasets are reported to demonstrate the effectiveness of the proposed methodology. The proposed algorithm and methods are implemented in the new R package CensMixReg. (C) 2017 Elsevier Inc. All rights reserved.
A framework that uses t mixture models with fourteen eigen-decomposed covariance structures for the unsupervised learning of heterogeneous multivariate data with possible missing values is designed and implemented. Computationally flexible EM-type algorithms are developed for parameter estimation of these models under a missing at random (MAR) mechanism. For ease of computation and theoretical development, two auxiliary indicator matrices are incorporated into the estimating procedure to extract exactly the locations of the observed and missing components of each observation. Computational aspects related to the specification of starting values, convergence assessment and model choice are also discussed. The practical usefulness of the proposed methodology is illustrated with real data examples and a simulation study with varying proportions of missing values. (C) 2013 Elsevier B.V. All rights reserved.
This paper presents a robust probabilistic mixture model based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student's t distribution that is better able to model data whose distribution deviates seriously from normality. The proposed model includes mixtures of normal, t and skew-normal distributions as special cases and provides a flexible alternative to recently proposed skew t mixtures. We develop two analytically tractable EM-type algorithms for computing maximum likelihood estimates of model parameters in which the skewness parameters and degrees of freedom are asymptotically uncorrelated. Standard errors for the parameter estimates can be obtained via a general information-based method. We also present a procedure for merging mixture components to automatically identify the number of clusters by fitting piecewise linear regression to the rescaled entropy plot. The effectiveness and performance of the proposed methodology are illustrated by two real-life examples.
When the Newton-Raphson algorithm or the Fisher scoring algorithm does not work and EM-type algorithms are not available, the quadratic lower-bound (QLB) algorithm may be a useful optimization tool. However, like all EM-type algorithms, the QLB algorithm may also suffer from slow convergence, which can be viewed as the cost of having the ascent property. This paper proposes a novel 'shrinkage parameter' approach to accelerate the QLB algorithm while maintaining its simplicity and stability (i.e., monotonic increase in log-likelihood). The strategy is first to construct a class of quadratic surrogate functions $Q_r(\theta \mid \theta^{(t)})$ that induces a class of QLB algorithms indexed by a 'shrinkage parameter' $r$ ($r \in R$) and then to optimize $r$ over $R$ under some criterion of convergence. For three commonly used criteria (i.e., the smallest eigenvalue, the trace and the determinant), we derive a uniformly optimal shrinkage parameter and find an optimal QLB algorithm. Some theoretical justifications are also presented. Next, we generalize the optimal QLB algorithm to problems with a penalizing function and then investigate the associated convergence properties. The optimal QLB algorithm is applied to fit a logistic regression model and a Cox proportional hazards model. Two real datasets are analyzed to illustrate the proposed methods. (C) 2011 Elsevier B.V. All rights reserved.
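To make the ascent property concrete, here is a minimal sketch of the plain (unaccelerated) QLB iteration for logistic regression with an intercept and one covariate. It does not implement the shrinkage-parameter acceleration described in the abstract. It assumes the standard curvature bound for the logistic log-likelihood, $B = X^\top X / 4$, and the update $\theta^{(t+1)} = \theta^{(t)} + B^{-1} \nabla \ell(\theta^{(t)})$; the function names and the toy data are hypothetical.

```python
import math

def log_lik(beta, xs, ys):
    """Logistic log-likelihood, with log(1 + exp(eta)) computed stably."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        total += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def qlb_logistic(xs, ys, iters=200):
    """Plain QLB iterations; returns the estimate and the log-likelihood path."""
    n = len(xs)
    # B = X^T X / 4 for design rows (1, x); it bounds the negative Hessian
    # because each logistic weight p(1 - p) is at most 1/4.
    b00 = n / 4
    b01 = sum(xs) / 4
    b11 = sum(x * x for x in xs) / 4
    det = b00 * b11 - b01 * b01
    beta = [0.0, 0.0]
    history = [log_lik(beta, xs, ys)]
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))
            g0 += y - p
            g1 += (y - p) * x
        # beta <- beta + B^{-1} gradient, using the explicit 2x2 inverse.
        beta = [beta[0] + (b11 * g0 - b01 * g1) / det,
                beta[1] + (-b01 * g0 + b00 * g1) / det]
        history.append(log_lik(beta, xs, ys))
    return beta, history

# Toy, non-separable data (made up for illustration).
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat, path = qlb_logistic(xs, ys)
```

Because $B$ is fixed, it is inverted once rather than at every step as in Newton-Raphson. The price is a linear convergence rate, which is the slowness the shrinkage-parameter approach aims to reduce.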
A finite mixture model using the Student's t distribution has been recognized as a robust extension of normal mixtures. Recently, a mixture of skew normal distributions has been found to be effective in the treatment of heterogeneous data involving asymmetric behaviors across subclasses. In this article, we propose a robust mixture framework based on the skew t distribution to efficiently deal with heavy-tailedness, extra skewness and multimodality in a wide range of settings. Statistical mixture modeling based on normal, Student's t and skew normal distributions can be viewed as special cases of the skew t mixture model. We present analytically simple EM-type algorithms for iteratively computing maximum likelihood estimates. The proposed methodology is illustrated by analyzing a real data example.
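The robustness of t mixtures relative to normal mixtures can be seen in a single EM-type step. The sketch below covers only the symmetric Student-t special case, not the skew t model proposed in the article, and it assumes a univariate mixture with a common, fixed degrees-of-freedom value `nu`. In the E-step each observation receives a component responsibility and a latent scale weight $u = (\nu + 1)/(\nu + \delta^2)$, where $\delta$ is the standardized residual; outlying points get small $u$ and are down-weighted when the means are updated. All function names are hypothetical.

```python
import math

def t_logpdf(y, mu, sigma, nu):
    """Log-density of a location-scale Student-t distribution."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def e_step(ys, weights, mus, sigmas, nu):
    """Return responsibilities resp[i][k] and latent scale weights scale[i][k]."""
    resp, scale = [], []
    for y in ys:
        logs = [math.log(w) + t_logpdf(y, m, s, nu)
                for w, m, s in zip(weights, mus, sigmas)]
        top = max(logs)  # subtract the maximum for numerical stability
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        resp.append([v / total for v in unnorm])
        scale.append([(nu + 1) / (nu + ((y - m) / s) ** 2)
                      for m, s in zip(mus, sigmas)])
    return resp, scale

def update_means(ys, resp, scale):
    """Mean update: averages weighted by responsibility times scale weight."""
    means = []
    for k in range(len(resp[0])):
        num = sum(r[k] * u[k] * y for y, r, u in zip(ys, resp, scale))
        den = sum(r[k] * u[k] for r, u in zip(resp, scale))
        means.append(num / den)
    return means

# Toy data with one gross outlier (made up for illustration).
ys = [-0.2, 0.1, 0.0, 5.1, 4.9, 40.0]
resp, scale = e_step(ys, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0], nu=4)
new_means = update_means(ys, resp, scale)
```

In this toy run the point at 40.0 is assigned mostly to the second component, yet its scale weight is so small that the updated mean stays close to 5. A normal mixture has no such weight and would be pulled toward the outlier.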
In recent years, there has been an avalanche of new data in observational high-energy astrophysics. Recently launched or soon-to-be launched space-based telescopes that are designed to detect and map ultra-violet, X-ray, and gamma-ray electromagnetic emission are opening a whole new window to study the cosmos. Because the production of high-energy electromagnetic emission requires temperatures of millions of degrees and is an indication of the release of vast quantities of stored energy, these instruments give a completely new perspective on the hot and turbulent regions of the universe. The new instrumentation allows for very high resolution imaging, spectral analysis, and time series analysis; the Chandra X-ray Observatory, for example, produces images at least thirty times sharper than any previous X-ray telescope. The complexity of the instruments, of the astronomical sources, and of the scientific questions leads to a subtle inference problem that requires sophisticated statistical tools. For example, data are subject to non-uniform stochastic censoring, heteroscedastic errors in measurement, and background contamination. Astronomical sources exhibit complex and irregular spatial structure. Scientists wish to draw conclusions as to the physical environment and structure of the source, the processes and laws which govern the birth and death of planets, stars, and galaxies, and ultimately the structure and evolution of the universe. The California-Harvard Astrostatistics Collaboration is a group of astrophysicists and statisticians working together to develop statistical methods, computational techniques, and freely available software to address outstanding inferential problems in high-energy astrophysics. We emphasize fully model-based statistical inference; we explicitly model the complexities of both astronomical sources and the data generation mechanisms inherent in new high-tech instruments, and fully utilize the resulting highly structured models in learning a