Search results - Inner Mongolia University Library

Model-based clustering via linear cluster-weighted models

COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, Vol. 71, pp. 159-182

Authors: Ingrassia, Salvatore; Minotti, Simona C.; Punzo, Antonio. Affiliations: Univ Catania, Dept Econ & Business, I-95129 Catania, Italy; Univ Milano Bicocca, Dept Stat & Quantitat Methods, I-20126 Milan, Italy

A novel family of twelve mixture models with random covariates, nested in the linear t cluster-weighted model (CWM), is introduced for model-based clustering. The linear t CWM was recently presented as a robust alternative to the better known linear Gaussian CWM. The proposed family of models provides a unified framework that also includes the linear Gaussian CWM as a special case. Maximum likelihood parameter estimation is carried out within the EM framework, and both the BIC and the ICL are used for model selection. A simple and effective hierarchical random initialization is also proposed for the EM algorithm. The novel model-based clustering technique is illustrated in some applications to real data. Finally, a simulation study for evaluating the performance of the BIC and the ICL is presented. (C) 2013 Elsevier B.V. All rights reserved.

Keywords: Cluster-weighted model; Mixture models with random covariates; Model-based clustering; Multivariate t distribution
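
The entry above names the BIC and the ICL as its model-selection criteria. As a rough illustration only (not the authors' code), the sketch below computes both from a fitted mixture's maximized log-likelihood. It assumes the "larger is better" convention BIC = 2 log L - k log n and the common approximation ICL = BIC - 2 x (entropy of the posterior membership probabilities). The function names and arguments are hypothetical.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention (an assumption of this sketch)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def icl(loglik, n_params, n_obs, posteriors):
    """Approximate ICL: BIC penalized by the entropy of the soft assignments.

    posteriors is a list of rows, one per observation, each row holding the
    posterior probabilities of membership in every mixture component.
    """
    entropy = -sum(p * math.log(p) for row in posteriors for p in row if p > 0.0)
    return bic(loglik, n_params, n_obs) - 2.0 * entropy


if __name__ == "__main__":
    crisp = [[1.0, 0.0], [0.0, 1.0]]   # well-separated clusters
    fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # heavily overlapping clusters
    print(bic(-100.0, 5, 100))
    print(icl(-100.0, 5, 100, crisp))  # equals the BIC: zero entropy
    print(icl(-100.0, 5, 100, fuzzy))  # lower than the BIC
```

Under this convention the ICL never exceeds the BIC, and the gap grows as the clusters overlap more, which is why the two criteria can select different numbers of components.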

Model-based clustering of high-dimensional data: Variable selection versus facet determination

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2013, Vol. 54, No. 1, pp. 196-215

Authors: Poon, Leonard K. M.; Zhang, Nevin L.; Liu, Tengfei; Liu, April H. Affiliation: Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China

Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case the effort to find one subset of attributes that presumably gives the "best" clustering may be misguided. It makes more sense to identify various facets of a data set (each being based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of the Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and cluster data along each of those facets simultaneously. We present empirical results to show that facet determination usually leads to better clustering results than variable selection. (C) 2012 Elsevier Inc. All rights reserved.

Keywords: Model-based clustering; Facet determination; Variable selection; Latent tree models; Gaussian mixture models

MODEL-BASED CLUSTERING WITH DATA CORRECTION FOR REMOVING ARTIFACTS IN GENE EXPRESSION DATA

ANNALS OF APPLIED STATISTICS, 2017, Vol. 11, No. 4, pp. 1998-2026

Authors: Young, William Chad; Raftery, Adrian E.; Yeung, Ka Yee. Affiliations: Univ Washington, Dept Stat, Box 354322, Seattle, WA 98195, USA; Univ Washington, Inst Technol, Campus Box 358426, 1900 Commerce St, Tacoma, WA 98402, USA

The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.

Keywords: Model-based clustering; MCDC; gene regulatory network; LINCS

Model-based clustering and analysis of life history data

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, 2020, Vol. 183, No. 3, pp. 1231-1251

Authors: Scott, Marc A.; Mohan, Kaushik; Gauthier, Jacques-Antoine. Affiliations: NYU, 3rd Floor, 246 Greene St, New York, NY 10003, USA; Univ Lausanne, Lausanne, Switzerland

Methods and models for longitudinal data with categorical, multi-dimensional outcomes are quite limited, but they are essential to the study of life histories. For example, in the Swiss Household Panel, information on the co-residence and professional status of several thousand individuals is available through to age 45 years. Interest centres on the time and order of life course events such as having children and working full or part time and the duration of the phases that they delineate. With data of this type, optimal matching and clustering algorithms relying on a distance metric or parametric models of duration in a competing risks framework are used; the appropriateness of each derives from competing goals and orientation. We prefer model-based approaches when certain goals are paramount: simulation of individual trajectories; adjusting for time-dependent covariates; handling multistate trajectories and missing outcomes. Several of these goals are particularly challenging when the number of states is of moderate size, and many transitions are infrequent and/or time inhomogeneous. Using the Swiss Household Panel, we demonstrate the appropriateness of latent class growth curve models for analysing sequence data. In particular, models including heterogeneous dependence structure provide new techniques for assessing goodness of fit as well as yield insights into social processes.

Keywords: Categorical data; Life course studies; Longitudinal data; Model-based clustering; Sequence analysis; Swiss Household Panel

Model-based clustering of high-dimensional data: A review

COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, Vol. 71, pp. 52-78

Authors: Bouveyron, Charles; Brunet-Saumard, Camille. Affiliations: Univ Paris 01, Lab SAMM, EA 4543, F-75231 Paris 05, France; Univ Angers, Lab LAREMA, UMR CNRS 6093, F-49045 Angers, France

Model-based clustering is a popular tool which is renowned for its probabilistic foundations and its flexibility. However, high-dimensional data are nowadays more and more frequent and, unfortunately, classical model-based clustering techniques show a disappointing behavior in high-dimensional spaces. This is mainly due to the fact that model-based clustering methods are dramatically over-parametrized in this case. However, high-dimensional spaces have specific characteristics which are useful for clustering and recent techniques exploit those characteristics. After having recalled the bases of model-based clustering, dimension reduction approaches, regularization-based techniques, parsimonious modeling, subspace clustering methods and clustering methods based on variable selection are reviewed. Existing software for model-based clustering of high-dimensional data will also be reviewed and its practical use will be illustrated on real-world data sets. (C) 2012 Elsevier B.V. All rights reserved.

Keywords: Model-based clustering; High-dimensional data; Dimension reduction; Regularization; Parsimonious models; Subspace clustering; Variable selection; Software; R package

Model-Based Clustering of Categorical Time Series

BAYESIAN ANALYSIS, 2010, Vol. 5, No. 2, pp. 345-368

Authors: Pamminger, Christoph; Fruehwirth-Schnatter, Sylvia. Affiliation: Johannes Kepler Univ Linz, Dept Appl Stat, Linz, Austria

Two approaches for model-based clustering of categorical time series based on time-homogeneous first-order Markov chains are discussed. For Markov chain clustering the individual transition probabilities are fixed to a group-specific transition matrix. In a new approach called Dirichlet multinomial clustering the rows of the individual transition matrices deviate from the group mean and follow a Dirichlet distribution with unknown group-specific hyperparameters. Estimation is carried out through Markov chain Monte Carlo. Various well-known clustering criteria are applied to select the number of groups. An application to a panel of Austrian wage mobility data leads to an interesting segmentation of the Austrian labor market.

Keywords: Markov chain Monte Carlo; model-based clustering; panel data; transition matrices; labor market; wage mobility
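
The entry above describes clustering categorical time series with a group-specific first-order transition matrix. The paper estimates its models by Markov chain Monte Carlo. The sketch below is a much simpler stand-in that only illustrates what a group-specific transition matrix is: it pools the sequences already assigned to one group and takes row-normalized transition counts (a plain maximum-likelihood estimate). The function name, the integer state coding, and the uniform fallback for unseen states are assumptions of this sketch.

```python
def transition_matrix(sequences, n_states):
    """Row-normalized first-order transition counts pooled over one group.

    sequences: list of lists of integer states in range(n_states).
    Rows for states that never occur as an origin fall back to a uniform
    distribution (an arbitrary choice made for this illustration).
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Count each consecutive pair (state at t, state at t + 1).
        for origin, destination in zip(seq, seq[1:]):
            counts[origin][destination] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 / n_states] * n_states)
    return matrix


if __name__ == "__main__":
    group = [[0, 0, 1], [0, 1, 1]]  # two short sequences over states {0, 1}
    for row in transition_matrix(group, 2):
        print(row)
```

In the Dirichlet multinomial variant mentioned in the abstract, each individual's rows would instead be allowed to deviate from such a group-level matrix.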

Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions

JOURNAL OF CLASSIFICATION, 2023, Vol. 40, No. 1, pp. 145-167

Authors: Dang, Utkarsh J.; Gallaugher, Michael P. B.; Browne, Ryan P.; McNicholas, Paul D. Affiliations: Carleton Univ, Dept Hlth Sci, Ottawa, ON, Canada; Baylor Univ, Dept Stat Sci, Waco, TX, USA; Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON, Canada; McMaster Univ, Dept Math & Stat, Hamilton, ON, Canada

Families of mixtures of multivariate power exponential (MPE) distributions have already been introduced and shown to be competitive for cluster analysis in comparison to other mixtures of elliptical distributions, including mixtures of Gaussian distributions. A family of mixtures of multivariate skewed power exponential distributions is proposed that combines the flexibility of the MPE distribution with the ability to model skewness. These mixtures are more robust to variations from normality and can account for skewness, varying tail weight, and peakedness of data. A generalized expectation-maximization approach, which combines minorization-maximization and optimization based on accelerated line search algorithms on the Stiefel manifold, is used for parameter estimation. These mixtures are implemented both in the unsupervised and semi-supervised classification frameworks. Both simulated and real data are used for illustration and comparison to other mixture families.

Keywords: Generalized expectation-maximization algorithm; Mixture models; Model-based classification; Model-based clustering; Multivariate skewed power exponential distribution

Bayesian regularization for normal mixture estimation and model-based clustering

JOURNAL OF CLASSIFICATION, 2007, Vol. 24, No. 2, pp. 155-181

Authors: Fraley, Chris; Raftery, Adrian E. Affiliation: Univ Washington, Dept Stat, Seattle, WA 98195, USA

Normal mixture models are widely used for statistical modeling of data, including cluster analysis. However, maximum likelihood estimation (MLE) for normal mixtures using the EM algorithm may fail as the result of singularities or degeneracies. To avoid this, we propose replacing the MLE by a maximum a posteriori (MAP) estimator, also found by the EM algorithm. For choosing the number of components and the model parameterization, we propose a modified version of BIC, where the likelihood is evaluated at the MAP instead of the MLE. We use a highly dispersed proper conjugate prior, containing a small fraction of one observation's worth of information. The resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE, EM and BIC.

Keywords: BIC; EM algorithm; mixture models; model-based clustering; conjugate prior; posterior mode

Model-based clustering

JOURNAL OF CLASSIFICATION, 2016, Vol. 33, No. 3, pp. 331-373

Authors: McNicholas, Paul D. Affiliation: McMaster Univ, Hamilton, ON L8S 4L8, Canada

The notion of defining a cluster as a component in a mixture model was put forth by Tiedeman in 1955; since then, the use of mixture models for clustering has grown into an important subfield of classification. Considering the volume of work within this field over the past decade, which seems equal to all of that which went before, a review of work to date is timely. First, the definition of a cluster is discussed and some historical context for model-based clustering is provided. Then, starting with Gaussian mixtures, the evolution of model-based clustering is traced, from the famous paper by Wolfe in 1965 to work that is currently available only in preprint form. This review ends with a look ahead to the next decade or so.

Keywords: Cluster; Cluster analysis; Mixture models; Model-based clustering

Model-based clustering with sparse covariance matrices

STATISTICS AND COMPUTING, 2019, Vol. 29, No. 4, pp. 791-819

Authors: Fop, Michael; Murphy, Thomas Brendan; Scrucca, Luca. Affiliations: Univ Coll Dublin, Sch Math & Stat, Dublin, Ireland; Univ Coll Dublin, Insight Res Ctr, Dublin, Ireland; Univ Perugia, Dept Econ, Perugia, Italy

Finite Gaussian mixture models are widely used for model-based clustering of continuous data. Nevertheless, since the number of model parameters scales quadratically with the number of variables, these models can be easily over-parameterized. For this reason, parsimonious models have been developed via covariance matrix decompositions or assuming local independence. However, these remedies do not allow for direct estimation of sparse covariance matrices nor do they take into account that the structure of association among the variables can vary from one cluster to the other. To this end, we introduce mixtures of Gaussian covariance graph models for model-based clustering with sparse covariance matrices. A penalized likelihood approach is employed for estimation and a general penalty term on the graph configurations can be used to induce different levels of sparsity and incorporate prior knowledge. Model estimation is carried out using a structural-EM algorithm for parameters and graph structure estimation, where two alternative strategies based on a genetic algorithm and an efficient stepwise search are proposed for inference. With this approach, sparse component covariance matrices are directly obtained. The framework results in a parsimonious model-based clustering of the data via a flexible model for the within-group joint distribution of the variables. Extensive simulated data experiments and application to illustrative datasets show that the method attains good classification performance and model quality. The general methodology for model-based clustering with sparse covariance matrices is implemented in the R package mixggm, available on CRAN.

Keywords: Finite Gaussian mixture models; Gaussian graphical models; Genetic algorithm; Model-based clustering; Penalized likelihood; Sparse covariance matrices; Stepwise search; Structural-EM algorithm
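
The entry above notes that the number of parameters in a finite Gaussian mixture scales quadratically with the number of variables. A small sketch of that count, assuming an unconstrained model with a full covariance matrix per component (the function name is hypothetical and unrelated to the mixggm package):

```python
def gmm_free_parameters(n_components, n_variables):
    """Free parameters of a Gaussian mixture with full covariance matrices."""
    weights = n_components - 1                 # mixing proportions sum to one
    means = n_components * n_variables         # one mean vector per component
    # Each symmetric covariance matrix has d * (d + 1) / 2 distinct entries.
    covariances = n_components * n_variables * (n_variables + 1) // 2
    return weights + means + covariances


if __name__ == "__main__":
    for d in (2, 10, 100):
        print(d, gmm_free_parameters(3, d))
```

With three components the count grows from 17 parameters at 2 variables to 197 at 10 and 15452 at 100, and nearly all of that growth comes from the covariance term that sparse or parsimonious models aim to shrink.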
