The Self-Organizing Feature Maps (SOFM; Kohonen 1984) algorithm is a well-known example of unsupervised learning in connectionism and is a clustering method closely related to k-means. Generally the data set is available before running the algorithm, and the clustering problem can be approached by optimizing an inertia criterion. In this paper we consider the probabilistic approach to this problem. We propose a new algorithm based on the Expectation Maximization principle (EM; Dempster, Laird, and Rubin 1977). The new method can be viewed as a Kohonen type of EM and gives better insight into the SOFM with respect to constrained clustering. We perform numerical experiments and compare our results with the standard Kohonen approach.
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.
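As a generic illustration of the mixture maximum likelihood idea mentioned above, the following is a minimal sketch of EM for a two-component univariate normal mixture. It is not the location-model method of this paper: the function names (`normal_pdf`, `em_two_gaussians`), the starting values, the fixed iteration count, and the variance floor are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(data)
    # Crude starting values (an assumption of this sketch):
    # place the two means at the extremes of the data.
    means = [min(data), max(data)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])

        # M-step: weighted maximum likelihood updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # floor to avoid a degenerate zero-variance component
            )
    return weights, means, variances


if __name__ == "__main__":
    # Synthetic data: two well-separated normal populations.
    random.seed(0)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    sample += [random.gauss(6.0, 1.0) for _ in range(200)]
    print(em_two_gaussians(sample))
```

Each observation can then be assigned to the component with the larger posterior probability, which is how a fitted mixture yields a clustering. Handling mixed categorical and continuous variables, as the abstract describes, would require a different component density.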
We discuss ways of analysing panel data when the response is binary and there is attrition or drop-out. In general, informative or non-ignorable drop-out models are non-identifiable and arbitrary constraints on the drop-out model must be imposed before carrying out a statistical analysis. The problem is particularly acute when predictors as well as response variables are lost by attrition. We describe a likelihood-based method for dealing with the drop-out process in this difficult case and show how the effect of non-identifiability can be reduced by importing additional data from a cross-sectional survey of the same population. The methods are primarily motivated by data from the 1987-92 British Election Panel Study and the 1992 British Election Study.
An EM algorithm to obtain maximum a posteriori estimates for incomplete categorical data under informative general censoring is presented. It is an alternative to the Bayesian approach described by Paulino and Pereira that allows more general prior specifications.
An EM algorithm is developed for computing the maximum likelihood estimates, along with their standard errors, of the accuracy rates of a new medical diagnostic test and of a reference test (not necessarily a perfect gold standard). The estimates are based on the outcomes of the tests when both are applied simultaneously to individuals with unknown disease state, sampled from an arbitrary number of populations in which the prevalence rate of the disease in question is also unknown. This algorithm is heuristically appealing in that it also estimates the prevalence rate in each population and aids the perception of the effects of numerical constraints imposed on some of the rate parameters. Several illustrative examples are provided.
Item response theory (IRT) models posit latent variables to account for regularities in students' performances on test items. Wilson's "Saltus" model extends the ideas of IRT to development that occurs in stages, where expected changes can be discontinuous, show different patterns for different types of items, or even exhibit reversals in probabilities of success on certain tasks. Examples include Piagetian stages of psychological development and Siegler's rule-based learning. This paper derives marginal maximum likelihood (MML) estimation equations for the structural parameters of the Saltus model and suggests a computing approximation based on the EM algorithm. For individual examinees, empirical Bayes probabilities of learning-stage membership are given, along with proficiency parameter estimates conditional on stage membership. The MML solution is illustrated with simulated data and an example from the domain of mixed number subtraction.
This paper introduces dynamic latent-class models for the analysis and interpretation of stability and change in recurrent choice data. These latent-class models provide a nonparametric representation of individual taste differences. Changes in preferences are modeled by allowing for individual-level transitions from one latent class to another over time. The most general model facilitates a saturated representation of class membership changes. Several special cases are presented to obtain a parsimonious description of latent change mechanisms. An easy-to-implement EM algorithm is derived for parameter estimation. The approach is illustrated by a detailed analysis of a purchase incidence data set.
A multidimensional scaling methodology (STUNMIX) for the analysis of subjects' preference/choice of stimuli is presented. It sets out to integrate the previous work in this area into a single framework, as well as to provide a variety of new options and models. Locations of the stimuli and the ideal points of derived segments of subjects on latent dimensions are estimated simultaneously. The methodology is formulated in the framework of the exponential family of distributions, whereby a wide range of different data types can be analyzed. Possible reparameterizations of stimulus coordinates by stimulus characteristics, as well as of probabilities of segment membership by subject background variables, are permitted. The models are estimated in a maximum likelihood framework. The performance of the models is demonstrated on synthetic data, and robustness is investigated. An empirical application is provided, concerning intentions to buy portable telephones.
Nelson and Plosser (1982) argued that the trend in many macroeconomic time series changes frequently. Perron (1989), Rappoport and Reichlin (1989), and Balke and Fomby (1991), on the other hand, presented models that say changes in trend occur only relatively rarely. This paper provides a model that can account for evidence like that presented by Perron, Rappoport and Reichlin, and Balke and Fomby while being consistent with Nelson and Plosser's hypothesis of frequent shifts in trend. The paper gives methods for estimating the model, and it shows how estimates of the model's parameters can be used to construct parametric bootstrap forecast intervals. These methods are applied to the extended Nelson-Plosser data set used by Schotman and van Dijk (1991).
A probabilistic multidimensional scaling model is proposed. The model assumes that the coordinates of each stimulus are normally distributed with variance $\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_R^2)$. The advantage of this model is that axes are determined uniquely. The distribution of the distance between two stimuli is obtained by polar coordinates transformation. The method of maximum likelihood estimation for means and variances using the EM algorithm is discussed. Further, simulated annealing is suggested as a means of obtaining initial values in order to avoid local maxima. A simulation study shows that the estimates are accurate, and a numerical example concerning the location of Japanese cities shows that natural axes can be obtained without introducing individual parameters.