Analysing data from educational tests allows governments to make decisions for improving the quality of life of individuals in a society. One of the key responsibilities of statisticians is to develop models that prov...
Traditional factor analysis, which rests on the assumption of multivariate normality, has been extended by considering the restricted multivariate skew-t (rMST) distribution for the unobserved factors and errors jointly....
Mixtures of t factor analyzers (MtFA) are well recognized as a prominent tool for modeling and clustering multivariate data contaminated with heterogeneity and outliers. In certain practical situations, however, data are likely to be censored, so that the standard methodology becomes computationally complicated or even infeasible. This paper presents an extended framework of MtFA that can accommodate censored data, referred to as MtFAC for short. For maximum likelihood estimation, we construct an alternating expectation conditional maximization (AECM) algorithm in which the E-step relies on the first two moments of truncated multivariate-t distributions and the CM-steps offer tractable solutions for the updated estimators. Asymptotic standard errors of the mixing proportions and component mean vectors are derived by means of the missing information principle, also known as Louis' method. Several numerical experiments are conducted to examine the finite-sample properties of the estimators and the ability of the proposed model to downweight the impact of censoring and outlying effects. Further, the efficacy and usefulness of the proposed method are demonstrated by analyzing a real dataset with genuine censored observations.
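The downweighting of outliers mentioned in this abstract can be illustrated with the standard E-step weight of an ordinary (uncensored) univariate Student-t model. This is only a sketch under that simplifying assumption, not the MtFAC E-step itself, which the abstract says uses moments of truncated multivariate-t distributions. The function names `t_weight` and `weighted_mean` are hypothetical.

```python
def t_weight(x, mu, sigma2, nu):
    """Conditional expectation of the latent scale weight for a univariate
    Student-t observation (uncensored case, assumed here for illustration)."""
    d2 = (x - mu) ** 2 / sigma2      # squared standardized distance from the centre
    return (nu + 1.0) / (nu + d2)    # large distances give small weights


def weighted_mean(xs, mu, sigma2, nu):
    """One weighted update of the location: outliers contribute little."""
    ws = [t_weight(x, mu, sigma2, nu) for x in xs]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)
```

With a small number of degrees of freedom `nu`, an observation far from `mu` receives a weight close to zero, so it barely moves the updated location, whereas a Gaussian model would weight every point equally.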
Satellite remote sensing can provide indicative measures of environmental variables that are crucial to understanding the environment. The spatial and temporal coverage of satellite images allows scientists to investigate changes in environmental variables on an unprecedented scale. However, identifying spatiotemporal patterns from such images is challenging due to the complexity of the data, which can be large in volume yet sparse within individual images. This paper proposes a new approach, state space functional principal components analysis (SS-FPCA), to identify the spatiotemporal patterns in processed satellite retrievals and simultaneously reduce the dimensionality of the data through the use of functional principal components. Furthermore, our approach can be used to produce interpolations over the sparse areas. An algorithm based on the alternating expectation-conditional maximisation framework is proposed to estimate the model. The uncertainty of the estimated parameters is investigated through a parametric bootstrap procedure. Lake chlorophyll-a data hold key information on water quality status. Such information is usually available only from limited in situ sampling locations, or not at all for remote, inaccessible lakes. In this paper, SS-FPCA is used to investigate the spatiotemporal patterns in chlorophyll-a data of Taruo Lake on the Tibetan Plateau, observed by the European Space Agency MEdium Resolution Imaging Spectrometer.
Censored data arise frequently in diverse applications in which the observations to be measured are subject to upper and lower detection limits, imposed by the restrictions of the experimental apparatus, so that they are not exactly quantifiable. Mixtures of factor analyzers with censored data (MFAC) have recently been proposed for model-based density estimation and clustering of high-dimensional data in the presence of censored observations. In this paper, we extend MFAC by incorporating regression equations that describe the relationship between covariates and multiply censored dependent variables. Two analytically feasible EM-type algorithms are developed for computing maximum likelihood estimates of model parameters with closed-form expressions. Moreover, we provide an information-based method to compute asymptotic standard errors of the mixing proportions and regression coefficients. The utility and performance of the proposed methodology are illustrated through a simulation study and two real data examples.
Principal component analysis (PCA) is one of the most popular tools in multivariate exploratory data analysis. Its probabilistic version (PPCA), based on the maximum likelihood procedure, provides a probabilistic way to implement dimension reduction. Recently, the bilinear PPCA (BPPCA) model, which assumes that the noise terms follow matrix variate Gaussian distributions, has been introduced to deal directly with two-dimensional (2-D) data such as images, preserving their matrix structure and avoiding the curse of dimensionality. However, the Gaussian assumption does not always hold in real-life applications, where data sets may contain outliers. To make BPPCA robust to outliers, in this paper we propose a robust BPPCA model under the assumption of matrix variate t distributions for the noise terms. The alternating expectation conditional maximization (AECM) algorithm is used to estimate the model parameters. Numerical examples on several synthetic and publicly available data sets are presented to demonstrate the superiority of our proposed model in feature extraction, classification and outlier detection.
Mixtures of factor analyzers (MFA) provide a promising tool for modeling and clustering high-dimensional data that contain an overwhelmingly large number of attributes measured on individuals arising from a heterogeneous population. Due to the restrictions of the experimental apparatus, measurements can be limited to some lower and/or upper detection bounds, and thus the data are possibly censored. In this paper, we extend the MFA to accommodate censored data, and the new model is called the MFA with censoring (MFAC). A computationally feasible alternating expectation conditional maximization (AECM) algorithm is developed to carry out maximum likelihood estimation of the MFAC model. Practical issues related to model-based clustering and recovery of censored data are also discussed. Simulation studies are conducted to examine the effect of censoring on classification, estimation and cluster validation. We also present an application of the proposed approach to two real data examples in which a certain number of left-censored observations are present. (C) 2019 Elsevier B.V. All rights reserved.
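To make the idea of a lower detection bound concrete, the sketch below computes a log-likelihood for left-censored data under a single univariate normal distribution. This is a deliberately simplified assumption (the MFAC model in the abstract is multivariate and mixture-based); the function names are hypothetical. An exactly observed value contributes its density, while a value recorded at or below the bound contributes the probability of falling below that bound.

```python
import math


def norm_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_loglik(values, lower, mu, sigma):
    """Log-likelihood where values at or below `lower` are left-censored."""
    total = 0.0
    for v in values:
        if v <= lower:
            # Censored: only the event {X <= lower} is known.
            total += math.log(norm_cdf(lower, mu, sigma))
        else:
            # Fully observed: use the density.
            total += norm_logpdf(v, mu, sigma)
    return total
```

Treating censored readings as if they were exact values at the bound would replace the distribution-function term with a density term and bias the estimates, which is the kind of effect the simulation studies in the abstract examine.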
Model-based clustering of longitudinal data has attracted increasing attention over the past two decades. Finite mixtures of Student's-t linear mixed-effects (FM-tLME) models have been considered for this task, especially when the data contain extreme observations. This paper presents an extended finite mixture of Student's-t linear mixed-effects (EFM-tLME) model, in which the categorical component labels are assumed to be influenced by the observed covariates. In contrast to naive methods that assume the mixing proportions to be fixed but unknown, the proposed EFM-tLME model uses a logistic function to link the prior classification probabilities to the covariates of interest. To carry out maximum likelihood estimation, an alternating expectation conditional maximization (AECM) algorithm is developed under several model reduction schemes. A technique for extracting information-based standard errors of the parameter estimates is also investigated. The proposed method is illustrated using simulation experiments and real data from an AIDS clinical study. (C) 2020 Elsevier B.V. All rights reserved.
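The logistic link between covariates and prior classification probabilities can be sketched as a multinomial-logit (softmax) function. The parameterization below, with the last component as the reference class and the name `mixing_proportions`, is an assumption for illustration; the abstract does not state the exact form used in the EFM-tLME model.

```python
import math


def mixing_proportions(x, gammas):
    """Covariate-dependent mixing proportions via a multinomial logit.

    x      : covariate vector for one subject (include 1.0 for an intercept)
    gammas : coefficient vectors for components 1..G-1; the last component
             is taken as the reference class with all-zero coefficients
             (an assumed identifiability constraint).
    """
    etas = [sum(g_j * x_j for g_j, x_j in zip(g, x)) for g in gammas] + [0.0]
    m = max(etas)                           # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]
```

When every coefficient vector is zero, the proportions reduce to equal constants, which corresponds to the covariate-free case that the abstract calls the naive approach (up to estimating those constants).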
The assessment of pollution exposure is based on the analysis of a multivariate time series that includes the concentrations of several pollutants as well as the measurements of multiple atmospheric variables. It typically requires methods of dimensionality reduction that are capable of identifying potentially dangerous combinations of pollutants and simultaneously segmenting exposure periods according to air quality conditions. When the data are high-dimensional, however, efficient dimensionality reduction is challenging because of the formidable structure of cross-correlations that arise from the dynamic interaction between weather conditions and natural/anthropogenic pollution sources. In order to assess pollution exposure in an urban area while taking the above-mentioned difficulties into account, we have developed a class of parsimonious hidden Markov models. In a multivariate time series setting, this approach performs temporal segmentation and dimensionality reduction simultaneously. We specifically approximate the distribution of multiple pollutant concentrations by mixtures of factor analysis models, whose parameters evolve according to a latent Markov chain. Covariates are included as predictors of the chain transition probabilities. Parameter constraints on the factorial component of the model are exploited to tune the flexibility of dimensionality reduction. In order to estimate the model parameters efficiently, we have proposed a novel three-step alternating expectation conditional maximization (AECM) algorithm, which is also assessed in a simulation study. In the case study, the proposed methods could (1) describe the exposure to pollution in terms of a few latent regimes, (2) associate these regimes with specific combinations of pollutant concentration levels as well as distinct correlation structures between concentrations, and (3) capture the influence of weather conditions on transitions between regimes.
The mixture of factor analyzers model, which has been used successfully for the model-based clustering of high-dimensional data, is extended to generalized hyperbolic mixtures. The development of a mixture of generalized hyperbolic factor analyzers is outlined, drawing upon the relationship with the generalized inverse Gaussian distribution. An alternating expectation-conditional maximization algorithm is used for parameter estimation, and the Bayesian information criterion is used to select the number of factors as well as the number of components. The performance of our generalized hyperbolic factor analyzers model is illustrated on real and simulated data, where it performs favourably compared to its Gaussian analogue and other approaches.
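Selecting the number of components and factors with the Bayesian information criterion amounts to fitting each candidate model and comparing penalized log-likelihoods. The sketch below assumes the convention BIC = -2 log L + k log n (smaller is better); some authors use the opposite sign and maximize instead. The helper names and the `fits` dictionary layout are hypothetical, and the maximized log-likelihoods and free-parameter counts must come from the fitted models.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC under the 'smaller is better' convention."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_by_bic(fits, n_obs):
    """Return the (components, factors) pair with the smallest BIC.

    fits : dict mapping (G, q) -> (maximized log-likelihood, free parameters)
    """
    return min(fits, key=lambda key: bic(fits[key][0], fits[key][1], n_obs))
```

Because the penalty grows with the parameter count, a richer model is preferred only when its gain in log-likelihood outweighs the extra parameters.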