Search results - Inner Mongolia University Library

Mixtures of t factor analysers with censored responses and external covariates: An application to educational data from Peru

British Journal of Mathematical and Statistical Psychology, 2024, vol. 77, no. 2, pp. 316-336

Authors: Wang, Wan-Lun; Castro, Luis M.; Li, Huei-Jyun; Lin, Tsung-I. Affiliations: Department of Statistics and Institute of Data Science, National Cheng Kung University, Tainan, Taiwan; Department of Statistics, Pontificia Universidad Católica de Chile, Santiago, Chile; Center for the Discovery of Structures in Complex Data, Santiago, Chile; Institute of Statistics, National Chung Hsing University, Taichung, Taiwan; Department of Public Health, China Medical University, Taichung, Taiwan

Analysing data from educational tests allows governments to make decisions for improving the quality of life of individuals in a society. One of the key responsibilities of statisticians is to develop models that provide decision-makers with pertinent information about the latent process that educational tests seek to represent. Mixtures of t factor analysers (MtFA) have emerged as a powerful device for model-based clustering and classification of high-dimensional data containing one or several groups of observations with fatter tails or anomalous outliers. This paper considers an extension of MtFA for robust clustering of censored data, referred to as the MtFAC model, by incorporating external covariates. The enhanced flexibility of including covariates in MtFAC enables cluster-specific multivariate regression analysis of dependent variables with censored responses arising from upper and/or lower detection limits of experimental equipment. An alternating expectation conditional maximization (AECM) algorithm is developed for maximum likelihood estimation of the proposed model. Two simulation experiments are conducted to examine the effectiveness of the techniques presented. Furthermore, the proposed methodology is applied to Peruvian data from the 2007 Early Grade Reading Assessment, and the results obtained from the analysis provide new insights regarding the reading skills of Peruvian students. © 2023 British Psychological Society.

Keywords: AECM algorithm; censored data; factor analysis; outliers; truncated multivariate t distribution

A robust factor analysis model based on the canonical fundamental skew-t distribution

Statistical Papers, 2023, vol. 64, no. 2, pp. 367-393

Authors: Lin, Tsung-I; Chen, I-An; Wang, Wan-Lun. Affiliations: Department of Statistics and Institute of Data Science, National Cheng Kung University, Tainan 701, Taiwan; Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan; Department of Public Health, China Medical University, Taichung 404, Taiwan

The traditional factor analysis rested on the assumption of multivariate normality has been extended by considering the restricted multivariate skew-t (rMST) distribution for the unobserved factors and errors jointly. However, the rMST distribution has limited use for characterising skewness that concentrates in a single direction. This paper is devoted to introducing a more flexible robust factor analysis model based on the broader canonical fundamental skew-t (CFUST) distribution, called the CFUSTFA model. The proposed new model can account for more complex features of skewness toward multiple directions. An efficient alternating expectation conditional maximization algorithm fabricated under several reduced complete-data spaces is developed to estimate parameters under the maximum likelihood (ML) perspective. To assess the variability of parameter estimates, we present an information-based approach to approximating the asymptotic covariance matrix of the ML estimators. The effectiveness and applicability of the proposed techniques are demonstrated through the analysis of simulated and real datasets. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Keywords: AECM algorithm; canonical fundamental skew-t distribution; factor scores; truncated multivariate t distribution; unrestricted multivariate skew-t distribution

Robust clustering of multiply censored data via mixtures of t factor analyzers

TEST, 2022, vol. 31, no. 1, pp. 22-53

Authors: Wang, Wan-Lun; Lin, Tsung-I. Affiliations: Feng Chia Univ, Grad Inst Stat & Actuarial Sci, Dept Stat, Taichung, Taiwan; Natl Chung Hsing Univ, Inst Stat, Taichung, Taiwan; China Med Univ, Dept Publ Hlth, Taichung, Taiwan

Mixtures of t factor analyzers (MtFA) have been well recognized as a prominent tool in modeling and clustering multivariate data contaminated with heterogeneity and outliers. In certain practical situations, however, data are likely to be censored such that the standard methodology becomes computationally complicated or even infeasible. This paper presents an extended framework of MtFA that can accommodate censored data, referred to as MtFAC in short. For maximum likelihood estimation, we construct an alternating expectation conditional maximization algorithm in which the E-step relies on the first-two moments of truncated multivariate-t distributions and CM-steps offer tractable solutions of updated estimators. Asymptotic standard errors of mixing proportions and component mean vectors are derived by means of missing information principle, or the so-called Louis' method. Several numerical experiments are conducted to examine the finite-sample properties of estimators and the ability of the proposed model to downweight the impact of censoring and outlying effects. Further, the efficacy and usefulness of the proposed method are also demonstrated by analyzing a real dataset with genuine censored observations.

Keywords: AECM algorithm; censored data; factor analysis; maximum likelihood estimation; missing information principle; truncated multivariate t distribution

State space functional principal component analysis to identify spatiotemporal patterns in remote sensing lake water quality

Stochastic Environmental Research and Risk Assessment, 2021, vol. 35, no. 12, pp. 2521-2536

Authors: Gong, Mengyi; Miller, Claire; Scott, Marian; O'Donnell, Ruth; Simis, Stefan; Groom, Steve; Tyler, Andrew; Hunter, Peter; Spyrakos, Evangelos. Affiliations: Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YF, England; Univ Glasgow, Sch Math & Stat, Glasgow G12 8QQ, Lanark, Scotland; Plymouth Marine Lab, Plymouth PL1 3JH, Devon, England; Univ Stirling, Sch Biol & Environm Sci, Stirling FK9 4LA, Scotland

Satellite remote sensing can provide indicative measures of environmental variables that are crucial to understanding the environment. The spatial and temporal coverage of satellite images allows scientists to investigate the changes in environmental variables on an unprecedented scale. However, identifying spatiotemporal patterns from such images is challenging due to the complexity of the data, which can be large in volume yet sparse within individual images. This paper proposes a new approach, state space functional principal components analysis (SS-FPCA), to identify the spatiotemporal patterns in processed satellite retrievals and simultaneously reduce the dimensionality of the data, through the use of functional principal components. Furthermore, our approach can be used to produce interpolations over the sparse areas. An algorithm based on the alternating expectation-conditional maximisation framework is proposed to estimate the model. The uncertainty of the estimated parameters is investigated through a parametric bootstrap procedure. Lake chlorophyll-a data hold key information on water quality status. Such information is usually only available from limited in situ sampling locations or not at all for remote inaccessible lakes. In this paper, the SS-FPCA is used to investigate the spatiotemporal patterns in chlorophyll-a data of Taruo Lake on the Tibetan Plateau, observed by the European Space Agency MEdium Resolution Imaging Spectrometer.

Keywords: functional principal component analysis; state space model; AECM algorithm; remote sensing images; lake chlorophyll-a

Mixtures of factor analyzers with covariates for modeling multiply censored dependent variables

Statistical Papers, 2021, vol. 62, no. 5, pp. 2119-2145

Authors: Wang, Wan-Lun; Castro, Luis M.; Hsieh, Wan-Chen; Lin, Tsung-I. Affiliations: Feng Chia Univ, Dept Stat, Grad Inst Stat & Actuarial Sci, Taichung 40724, Taiwan; Pontificia Univ Catolica Chile, Dept Stat, Casilla 306, Correo 22, Santiago, Chile; Millennium Nucleus Ctr Discovery Struct Complex D, Santiago, Chile; Pontificia Univ Catolica Chile, Ctr Riesgos & Seguros UC, Santiago, Chile; Natl Chung Hsing Univ, Inst Stat, Taichung 402, Taiwan; China Med Univ, Dept Publ Hlth, Taichung 404, Taiwan

Censored data arise frequently in diverse applications in which observations to be measured may be subject to some upper and lower detection limits due to the restriction of experimental apparatus such that they are not exactly quantifiable. Mixtures of factor analyzers with censored data (MFAC) have been recently proposed for model-based density estimation and clustering of high-dimensional data in the presence of censored observations. In this paper, we consider an extended version of MFAC by considering regression equations to describe the relationship between covariates and multiply censored dependent variables. Two analytically feasible EM-type algorithms are developed for computing maximum likelihood estimates of model parameters with closed-form expressions. Moreover, we provide an information-based method to compute asymptotic standard errors of mixing proportions and regression coefficients. The utility and performance of our proposed methodology are illustrated through a simulation study and two real data examples.

Keywords: AECM algorithm; censored data; detection limit; factor analysis; ML estimation; truncated multivariate normal distribution

Robust Bilinear Probabilistic Principal Component Analysis

Algorithms, 2021, vol. 14, no. 11, p. 322

Authors: Lu, Yaohang; Teng, Zhongming. Affiliation: Fujian Agr & Forestry Univ, Coll Comp & Informat Sci, Fuzhou 350002, Peoples R China

Principal component analysis (PCA) is one of the most popular tools in multivariate exploratory data analysis. Its probabilistic version (PPCA) based on the maximum likelihood procedure provides a probabilistic way to implement dimension reduction. Recently, the bilinear PPCA (BPPCA) model, which assumes that the noise terms follow matrix variate Gaussian distributions, has been introduced to directly deal with two-dimensional (2-D) data for preserving the matrix structure of 2-D data, such as images, and avoiding the curse of dimensionality. However, Gaussian distributions are not always available in real-life applications which may contain outliers within data sets. In order to make BPPCA robust to outliers, in this paper, we propose a robust BPPCA model under the assumption of matrix variate t distributions for the noise terms. The alternating expectation conditional maximization (AECM) algorithm is used to estimate the model parameters. Numerical examples on several synthetic and publicly available data sets are presented to demonstrate the superiority of our proposed model in feature extraction, classification and outlier detection.

Keywords: 2-D data; probabilistic principal component analysis; AECM algorithm; matrix variate Gaussian distributions; matrix variate t distributions; outliers

Model-based clustering of censored data via mixtures of factor analyzers

Computational Statistics & Data Analysis, 2019, vol. 140, pp. 104-121

Authors: Wang, Wan-Lun; Castro, Luis M.; Lachos, Victor H.; Lin, Tsung-I. Affiliations: Feng Chia Univ, Grad Inst Stat & Actuarial Sci, Dept Stat, Taichung 40724, Taiwan; Pontificia Univ Catolica Chile, Dept Stat, Casilla 306, Correo 22, Santiago, Chile; Millennium Nucleus Ctr Discovery Struct Complex D, Santiago, Chile; Pontificia Univ Catolica Chile, Ctr Riesgos & Seguros UC, Santiago, Chile; Univ Connecticut, Dept Stat, Storrs, CT 06269, USA; Natl Chung Hsing Univ, Inst Stat, Taichung 402, Taiwan; China Med Univ, Dept Publ Hlth, Taichung 404, Taiwan

Mixtures of factor analyzers (MFA) provide a promising tool for modeling and clustering high-dimensional data that contain an overwhelmingly large number of attributes measured on individuals arising from a heterogeneous population. Due to the restriction of experimental apparatus, measurements can be limited to some lower and/or upper detection bounds and thus the data are possibly censored. In this paper, we extend the MFA to accommodate censored data, and the new model is called the MFA with censoring (MFAC). A computationally feasible alternating expectation conditional maximization (AECM) algorithm is developed to carry out maximum likelihood estimation of the MFAC model. Practical issues related to model-based clustering and recovery of censored data are also discussed. Simulation studies are conducted to examine the effect of censoring in classification, estimation and cluster validation. We also present an application of the proposed approach to two real data examples in which a certain number of left-censored observations are present. © 2019 Elsevier B.V. All rights reserved.

Keywords: AECM algorithm; censored data; detection limit; outright clustering; truncated multivariate normal distribution

Extending finite mixtures of t linear mixed-effects models with concomitant covariates

Computational Statistics & Data Analysis, 2020, vol. 148, article 106961

Authors: Yang, Yu-Chen; Lin, Tsung-I; Castro, Luis M.; Wang, Wan-Lun. Affiliations: Natl Chung Hsing Univ, Dept Appl Math, Taichung 402, Taiwan; Natl Chung Hsing Univ, Inst Stat, Taichung 402, Taiwan; China Med Univ, Dept Publ Hlth, Taichung 404, Taiwan; Pontificia Univ Catolica Chile, Dept Stat, Casilla 306, Correo 22, Santiago, Chile; Chilean Govt, Millennium Nucleus Ctr Discovery Struct Complex D, Santiago, Chile; Pontificia Univ Catolica Chile, Ctr Riesgos & Seguros UC, Santiago, Chile; Feng Chia Univ, Grad Inst Stat & Actuarial Sci, Dept Stat, Taichung 40724, Taiwan

The issue of model-based clustering of longitudinal data has attracted increasing attention in the past two decades. Finite mixtures of Student's-t linear mixed-effects (FM-tLME) models have been considered for implementing this task especially when data contain extreme observations. This paper presents an extended finite mixtures of Student's-t linear mixed-effects (EFM-tLME) model, where the categorical component labels are assumed to be influenced by the observed covariates. As compared with the naive methods assuming the mixing proportions to be fixed but unknown, the proposed EFM-tLME model exploits a logistic function to link the relationship between the prior classification probabilities and the covariates of interest. To carry out maximum likelihood estimation, an alternating expectation conditional maximization (AECM) algorithm is developed under several model reduction schemes. The technique for extracting the information-based standard errors of parameter estimates is also investigated. The proposed method is illustrated using simulation experiments and real data from an AIDS clinical study. © 2020 Elsevier B.V. All rights reserved.

Keywords: AECM algorithm; heavy-tailed behavior; longitudinal data; model-based clustering; multivariate Student's-t distribution

Dynamic mixtures of factor analyzers to characterize multivariate air pollutant exposures

Annals of Applied Statistics, 2017, vol. 11, no. 3, pp. 1617-1648

Authors: Maruotti, Antonello; Bulla, Jan; Lagona, Francesco; Picone, Marco; Martella, Francesca. Affiliations: Univ Southampton, Ctr Innovat & Leadership Hlth Sci, Univ Rd, Southampton SO17 1BJ, Hants, England; Libera Univ Maria Ss Assunta, Dipartimento Sci Econ Polit & Lingue Moderne, Via Pompeo Magno 22, I-00192 Rome, Italy; Univ Bergen, Dept Math, Allegaten 41, N-5007 Bergen, Norway; Univ Rome Tre, Dipartimento Sci Polit, Via Gabriello Chiabrera 199, I-00145 Rome, Italy; Inst Environm Protect & Res ISPRA, Dipartimento Tutela Acque Interne & Marine, Via Brancati 48, I-00144 Rome, Italy; Sapienza Univ Roma, Dipartimento Sci Stat, Piazzale Aldo Moro 5, I-00185 Rome, Italy

The assessment of pollution exposure is based on the analysis of a multivariate time series that includes the concentrations of several pollutants as well as the measurements of multiple atmospheric variables. It typically requires methods of dimensionality reduction that are capable of identifying potentially dangerous combinations of pollutants and simultaneously segmenting exposure periods according to air quality conditions. When the data are high-dimensional, however, efficient methods of dimensionality reduction are challenging because of the formidable structure of cross-correlations that arise from the dynamic interaction between weather conditions and natural/anthropogenic pollution sources. In order to assess pollution exposure in an urban area while taking the above-mentioned difficulties into account, we have developed a class of parsimonious hidden Markov models. In a multivariate time series setting, this approach simultaneously allows for the performance of temporal segmentation and dimensionality reduction. We specifically approximate the distribution of multiple pollutant concentrations by mixtures of factor analysis models, whose parameters evolve according to a latent Markov chain. Covariates are included as predictors of the chain transition probabilities. Parameter constraints on the factorial component of the model are exploited to tune the flexibility of dimensionality reduction. In order to estimate the model parameters efficiently, we have proposed a novel three-step Alternating Expected Conditional Maximization (AECM) algorithm, which is also assessed in a simulation study. In the case study, the proposed methods could (1) describe the exposure to pollution in terms of a few latent regimes, (2) associate these regimes with specific combinations of pollutant concentration levels as well as distinct correlation structures between concentrations, and (3) capture the influence of weather conditions on transitions between regimes.

Keywords: hidden Markov models; AECM algorithm; dimensionality reduction; three-step algorithm

A mixture of generalized hyperbolic factor analyzers

Advances in Data Analysis and Classification, 2016, vol. 10, no. 4, pp. 423-440

Authors: Tortora, Cristina; McNicholas, Paul D.; Browne, Ryan P. Affiliation: McMaster Univ, Dept Math & Stat, Hamilton, ON, Canada

The mixture of factor analyzers model, which has been used successfully for the model-based clustering of high-dimensional data, is extended to generalized hyperbolic mixtures. The development of a mixture of generalized hyperbolic factor analyzers is outlined, drawing upon the relationship with the generalized inverse Gaussian distribution. An alternating expectation-conditional maximization algorithm is used for parameter estimation, and the Bayesian information criterion is used to select the number of factors as well as the number of components. The performance of our generalized hyperbolic factor analyzers model is illustrated on real and simulated data, where it performs favourably compared to its Gaussian analogue and other approaches.

Keywords: clustering; generalized hyperbolic distribution; mixture of factor analyzers; AECM algorithm
