A novel model-based classification technique is introduced based on mixtures of multivariate t-distributions. A family of four mixture models is defined by constraining, or not, the covariance matrices and the degrees of freedom to be equal across mixture components. Parameters for each of the resulting four models are estimated using a multicycle expectation-conditional maximization algorithm, where convergence is determined using a criterion based on the Aitken acceleration. A straightforward, but very effective, technique for the initialization of the unknown component memberships is introduced and compared with a popular, more sophisticated, initialization procedure. This novel four-member family is applied to real and simulated data, where it gives good classification performance, even when compared with more established techniques. (C) 2010 Elsevier B.V. All rights reserved.
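The Aitken-based stopping rule mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one common form of the criterion: the asymptotic log-likelihood is estimated from three successive values, and iteration stops once that estimate is within a tolerance of the latest value. The abstract does not state which variant or tolerance the authors use, and the names `aitken_limit` and `has_converged` are hypothetical.

```python
import math


def aitken_limit(l_prev, l_curr, l_next):
    """Estimate the asymptotic log-likelihood from three successive values."""
    step_old = l_curr - l_prev
    step_new = l_next - l_curr
    if step_old == 0.0:
        return l_next  # degenerate case: no previous change to compare against
    a = step_new / step_old  # Aitken acceleration factor
    if a >= 1.0:
        return math.inf  # steps are not shrinking, so no finite estimate
    return l_curr + step_new / (1.0 - a)


def has_converged(logliks, tol=1e-5):
    """Assumed rule: stop when the estimated limit is within tol of the last value."""
    if len(logliks) < 3:
        return False
    l_inf = aitken_limit(logliks[-3], logliks[-2], logliks[-1])
    return abs(l_inf - logliks[-1]) < tol


# Toy log-likelihood trace that approaches -100 geometrically.
trace = [-100.0 - 50.0 * 0.5 ** k for k in range(30)]
print(aitken_limit(trace[0], trace[1], trace[2]))
print(has_converged(trace[:3]), has_converged(trace))
```

For a geometrically converging trace, the estimate recovers the limit from the first three values, which is why this rule can detect a slowly creeping log-likelihood that a simple "lack of progress" check would stop too early.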
A Bayesian approach is presented for model-based classification of images, with application to synthetic-aperture radar. Posterior probabilities are computed for candidate hypotheses using physical features estimated from sensor data along with features predicted from these hypotheses. The likelihood scoring allows propagation of uncertainty arising in both the sensor data and object models. The Bayesian classification, including the determination of a correspondence between unordered random features, is shown to be tractable, yielding a classification algorithm, a method for estimating error rates, and a tool for evaluating performance sensitivity. The radar image features used for classification are point locations with an associated vector of physical attributes; the attributed features are adopted from a parametric model of high-frequency radar scattering. With the emergence of wideband sensor technology, these physical features expand interpretation of radar imagery to access the frequency- and aspect-dependent scattering information carried in the image phase.
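The step of turning likelihood scores into posterior probabilities over candidate hypotheses can be sketched generically with Bayes' rule. This is not the radar feature-matching likelihood described in the abstract: the log-likelihood values are assumed to be given, the priors are assumed strictly positive, and the name `posterior_probabilities` is hypothetical.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Bayes' rule over a finite set of hypotheses, computed in log space."""
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    shift = max(log_joint)  # subtract the maximum to avoid underflow in exp
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Three hypothetical target classes with very small likelihoods.
post = posterior_probabilities([-1000.0, -1001.0, -1005.0], [1 / 3, 1 / 3, 1 / 3])
print(post)
```

Working in log space matters here because likelihoods built from many features are typically far too small to exponentiate directly.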
A model-based classification technique is developed, based on mixtures of multivariate t-factor analyzers. Specifically, two related mixture models are developed and their classification efficacy studied. An AECM algorithm is used for parameter estimation, and convergence of the algorithm is determined using Aitken's acceleration. Two different techniques are proposed for model selection: the BIC and the ICL. Our classification technique is applied to data on red wine samples from Italy and to fatty acid measurements on Italian olive oils. These results are discussed and compared to more established classification techniques; under this comparison, our mixture models give excellent classification performance.
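The two model selection criteria named above can be sketched as follows. The sketch assumes the "larger is better" sign convention, BIC = 2 log L - p log n, and the common approximation of the ICL as the BIC plus twice the sum of the log posterior probabilities of the MAP assignments; the abstract does not say which convention the authors use. The function names and the matrix `z` of posterior membership probabilities are hypothetical.

```python
import math


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention (assumed)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def icl(loglik, n_params, n_obs, z):
    """Approximate ICL: BIC penalised by the uncertainty of the MAP assignments.

    z[i][g] is the posterior probability that observation i belongs to group g.
    """
    map_log_prob = sum(math.log(max(row)) for row in z)  # always <= 0
    return bic(loglik, n_params, n_obs) + 2.0 * map_log_prob


z_soft = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
print(bic(-120.0, 7, 3), icl(-120.0, 7, 3, z_soft))
```

When every observation is assigned with probability one the two criteria coincide; the more the groups overlap, the further the ICL falls below the BIC.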
A novel model-based classification technique is introduced based on parsimonious Gaussian mixture models (PGMMs). PGMMs, which were introduced recently as a model-based clustering technique, arise from a generalization of the mixtures of factor analyzers model and are based on a latent Gaussian mixture model. In this paper, this mixture modelling structure is used for model-based classification and the particular area of application is food authenticity. Model-based classification is performed by jointly modelling data with known and unknown group memberships within a likelihood framework and then estimating parameters, including the unknown group memberships, within an alternating expectation-conditional maximization framework. Model selection is carried out using the Bayesian information criterion and the quality of the maximum a posteriori classifications is summarized using the misclassification rate and the adjusted Rand index. This new model-based classification technique gives excellent classification performance when applied to real food authenticity data on the chemical properties of olive oils from nine areas of Italy. (C) 2009 Elsevier B.V. All rights reserved.
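The two summary measures named above, the misclassification rate and the adjusted Rand index, can be computed as in the following minimal sketch. It uses the standard pair-counting formula for the adjusted Rand index and assumes at least two observations; the function names are hypothetical and the toy labels are not taken from the olive oil data.

```python
from collections import Counter
from math import comb


def misclassification_rate(true_labels, predicted_labels):
    """Fraction of observations whose predicted label differs from the truth."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting adjusted Rand index between two partitions (n >= 2)."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical in structure
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
guess = [0, 0, 1, 1, 1, 1]
print(misclassification_rate(truth, guess), adjusted_rand_index(truth, guess))
```

Unlike the misclassification rate, the adjusted Rand index does not depend on how the groups are numbered, so it also applies when predicted labels are arbitrary cluster indices.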
In a standard classification framework a set of trustworthy learning data are employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Therefore, unreliable labelled observations, namely outliers and data with incorrect labels, can strongly undermine the classifier performance, especially if the training size is small. The present work introduces a robust modification to the model-based classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles noise presence in both response and exploratory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
Classification methods can be used to classify samples of unknown type into known types. Many classification methods have been proposed in the chemometrics, statistical and computer science literature. Model-based classification methods have been developed from a statistical modelling viewpoint. This approach allows for uncertainty in the classification procedure to be quantified using probabilities. Linear discriminant analysis and quadratic discriminant analysis are particular model-based classification methods. Partial least squares discriminant analysis is commonly used in food authentication studies based on spectroscopic data. This method uses partial least squares regression with a binary outcome variable for two-group classification problems. In this paper, model-based classification is compared to partial least squares discriminant analysis for its ability to correctly classify pure and adulterated honey samples when the honey has been extended by three different adulterants. Two model selection criteria are examined: the Bayesian Information Criterion and 5-fold cross validation. The methods are compared using the classification performance and the interpretability of the results. In addition, since the percentage of adulterated samples in any given sample set is unlikely to be known in a real-life setting, the ability of updating procedures within model-based clustering to accurately predict the adulterated samples, even when the proportion of pure to adulterated samples in the training data is grossly unrepresentative of the true situation, is studied in detail. The performance of both model-based and partial least squares discriminant analysis is found to be robust to the composition of the training data and to model selection method. The Bayesian Information Criterion is shown to be more robust than 5-fold cross validation as a model selection method, especially when the training data set is very small and unrepresentative of the entire data set. (c) 2
Families of mixtures of multivariate power exponential (MPE) distributions have already been introduced and shown to be competitive for cluster analysis in comparison to other mixtures of elliptical distributions, including mixtures of Gaussian distributions. A family of mixtures of multivariate skewed power exponential distributions is proposed that combines the flexibility of the MPE distribution with the ability to model skewness. These mixtures are more robust to variations from normality and can account for skewness, varying tail weight, and peakedness of data. A generalized expectation-maximization approach, which combines minorization-maximization and optimization based on accelerated line search algorithms on the Stiefel manifold, is used for parameter estimation. These mixtures are implemented both in the unsupervised and semi-supervised classification frameworks. Both simulated and real data are used for illustration and comparison to other mixture families.
Supervised learning in the presence of multiple sets of noisy labels is a challenging task that is receiving increasing interest in the ever-evolving landscape of healthcare analytics. Such an issue arises when multiple annotators are tasked to manually label the same training samples, potentially giving rise to discrepancies in class assignments among the supplied labels with respect to the ground truth. Commonly, the labeling process is entrusted to a small group of domain experts, and different levels of experience and subjectivity may result in noisy training labels. To solve the classification task by leveraging the availability of multiple data annotators, we introduce a novel ensemble methodology constructed by combining model-based classifiers separately trained on single sets of noisy labels. Eigenvalue Decomposition Discriminant Analysis is employed for the definition of the base learners, and six distinct averaging strategies are proposed to combine them. Two solutions require a priori information, such as partial knowledge of the ground truth labels or the annotators' level of expertise. By contrast, the remaining four approaches are entirely data-driven. A simulation study and an application on real data showcase the improved predictive performance of our proposal, while also demonstrating the ability to automatically infer annotators' expertise levels as a by-product of the learning process.
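The general idea of combining classifiers trained on different annotators' labels can be sketched as a weighted average of their posterior class probabilities. The abstract does not describe the six averaging strategies, so this is a generic illustration and not the authors' method; the function names, the weights, and the toy probabilities are all hypothetical.

```python
def average_posteriors(posteriors, weights=None):
    """Weighted average of class probabilities across base classifiers.

    posteriors[m][i][g] is classifier m's probability that observation i
    belongs to class g. weights[m] reflects trust in classifier m
    (equal weights if omitted).
    """
    n_models = len(posteriors)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise weights to sum to one
    n_obs = len(posteriors[0])
    n_classes = len(posteriors[0][0])
    return [
        [sum(norm[m] * posteriors[m][i][g] for m in range(n_models))
         for g in range(n_classes)]
        for i in range(n_obs)
    ]


def predict(combined):
    """Assign each observation to the class with the highest averaged probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in combined]


# Two hypothetical base classifiers scoring one observation over two classes.
model_a = [[0.9, 0.1]]
model_b = [[0.2, 0.8]]
print(predict(average_posteriors([model_a, model_b])))
print(predict(average_posteriors([model_a, model_b], weights=[1.0, 3.0])))
```

With equal weights the first classifier's confident vote wins; giving the second classifier three times the weight flips the decision, which is the kind of effect that prior knowledge of annotator expertise would have.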
This work introduces a refinement of the Parsimonious model for fitting a Gaussian Mixture. The improvement is based on the consideration of clusters of the involved covariance matrices according to a criterion, such as sharing Principal Directions. This and other similarity criteria that arise from the spectral decomposition of a matrix are the bases of the Parsimonious model. We show that such groupings of covariance matrices can be achieved through simple modifications of the CEM (Classification Expectation Maximization) algorithm. Our approach leads us to propose Gaussian Mixture models for model-based clustering and discriminant analysis, in which covariance matrices are clustered according to a parsimonious criterion, creating intermediate steps between the fourteen widely known parsimonious models. The added versatility not only allows us to obtain models with fewer parameters for fitting the data, but also provides greater interpretability. We show its usefulness for model-based clustering and discriminant analysis, providing algorithms to find approximate solutions verifying suitable size, shape and orientation constraints, and applying them to both simulated and real data examples.
In this work, a family of generative Gaussian models designed for the supervised classification of high-dimensional data is presented as well as the associated classification method called High-Dimensional Discriminant Analysis (HDDA). The features of these Gaussian models are as follows: i) the representation of the input density model is smooth; ii) the data of each class are modeled in a specific subspace of low dimensionality; iii) each class may have its own covariance structure; iv) model regularization is coupled to the classification criterion to avoid data over-fitting. To illustrate the abilities of the method, HDDA is applied on complex high-dimensional multi-class classification problems in mid-infrared and near-infrared spectroscopy and compared to state-of-the-art methods. Copyright (C) 2010 John Wiley & Sons, Ltd.