This paper introduces a finite mixture of canonical fundamental skew t (CFUST) distributions for a model-based approach to clustering where the clusters are asymmetric and possibly long-tailed (in: Lee and McLachlan, arXiv:1401.8182 [stat.ME], 2014b). The family of CFUST distributions includes the restricted multivariate skew t and unrestricted multivariate skew t distributions as special cases. In recent years, a few versions of the multivariate skew t (MST) mixture model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.
Accurate estimation of the IP traffic matrix (TM) remains a challenging task with wide applications in network management, load balancing, traffic detection, and so on. In this paper, we propose an accurate method, the Moore-Penrose inverse based neural network approach for the estimation of the IP network traffic matrix with extended input and expectation-maximization iteration, termed MNETME for short. First, MNETME adopts the extended input component, i.e., the product of the routing matrix's Moore-Penrose inverse and the link load vector, as the input to the neural network. Second, the EM algorithm is incorporated into its architecture to process the output data of the neural network. As a result, MNETME needs less input data yet estimates more accurately. We theoretically analyze the algorithm and then study its performance using real data from the Abilene Network. The simulation results show that MNETME yields more accurate estimates than previous methods, is more robust, and tracks traffic fluctuations well. We finally extend MNETME to random routing networks by proposing a new model of random routing that overcomes three fatal deficiencies of the existing model and is simpler, more practical, and more precise. (C) 2015 Elsevier Ltd. All rights reserved.
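The extended input described in this abstract, the product of the routing matrix's Moore-Penrose inverse and the link load vector, can be illustrated with a minimal sketch. The toy routing matrix, the traffic volumes, and the names `A`, `x_true`, `y`, and `x_ext` are all assumptions made for illustration; the neural network and the EM stage of MNETME are not reproduced here.

```python
import numpy as np

# Hypothetical toy network: 3 links, 4 origin-destination (OD) flows.
# A[i, j] = 1 if OD flow j traverses link i (the routing matrix).
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# Made-up OD traffic volumes, used only to generate link loads.
x_true = np.array([3.0, 1.0, 4.0, 2.0])
y = A @ x_true  # link load vector that would be measured on the links

# Extended input: Moore-Penrose inverse of the routing matrix times the link loads.
A_pinv = np.linalg.pinv(A)
x_ext = A_pinv @ y

# x_ext is the minimum-norm vector consistent with the link loads; it generally
# differs from x_true because the system is underdetermined.
print(x_ext)
```

Because there are fewer links than OD flows, the link loads alone do not determine the traffic matrix, which is presumably why the abstract feeds this product to a learned model rather than using it directly.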
Accurate estimation and prediction of urban link travel times are important for urban traffic operations and management. This paper develops a Bayesian mixture model to estimate short-term average urban link travel times using large-scale trip-based data with partial information. Unlike typical GPS trajectory data, trip-based data from taxis or other sources provide limited trip-level information, which contains only the trip origin and destination locations, trip travel times and distances, etc. The focus of this study is to develop a robust probabilistic short-term average link travel time estimation model and demonstrate the feasibility of estimating network conditions using large-scale trip-level information. In the model, the path taken by each trip is considered as latent and modeled using a multinomial logit distribution. The observed trip data given the possible path set and the mean and variance of the average link travel times can thus be characterized using a finite mixture distribution. A transition model is also introduced to serve as an informative prior that captures the temporal and spatial dependencies of link travel times. A solution approach based on the expectation-maximization (EM) algorithm is proposed to solve the problem. The model is tested on estimating the mean and variance of the average link travel times for 30 min time intervals using a large-scale taxi trip dataset from New York City. More robust estimation results are obtained owing to the adoption of the Bayesian framework. (C) 2015 Elsevier B.V. All rights reserved.
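The latent-path idea in this abstract can be sketched as a single E-step computation: a multinomial logit prior over candidate paths is combined with the likelihood of the observed trip time under each path. This is only a sketch under stated assumptions. The function name `path_posteriors`, the utilities, and in particular the choice of independent Gaussian link travel times (so that path means and variances are sums over links) are assumptions for illustration, not the authors' specification.

```python
import numpy as np

def path_posteriors(trip_time, path_links, link_mean, link_var, path_utility):
    """Posterior probability of each candidate path given one observed trip time.

    Assumptions (not from the abstract): link travel times are independent
    Gaussians, so a path's travel time is Gaussian with summed mean and variance.
    """
    link_mean = np.asarray(link_mean, dtype=float)
    link_var = np.asarray(link_var, dtype=float)

    # Multinomial logit prior over paths (softmax of the path utilities).
    u = np.asarray(path_utility, dtype=float)
    prior = np.exp(u - u.max())
    prior /= prior.sum()

    # Gaussian likelihood of the observed trip time under each path.
    mean = np.array([link_mean[links].sum() for links in path_links])
    var = np.array([link_var[links].sum() for links in path_links])
    lik = np.exp(-0.5 * (trip_time - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

    # Mixture responsibilities: prior times likelihood, normalized.
    post = prior * lik
    return post / post.sum()

# Hypothetical example: four links, two candidate paths, equal utilities.
post = path_posteriors(
    trip_time=185.0,
    path_links=[[0, 1], [2, 3]],
    link_mean=[60.0, 120.0, 90.0, 200.0],
    link_var=[100.0, 400.0, 225.0, 900.0],
    path_utility=[0.0, 0.0],
)
print(post)
```

In a full EM scheme these responsibilities would weight each trip's contribution when the link means and variances are re-estimated in the M-step.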
The problem of temporal data clustering is addressed using a dynamic Gaussian mixture model. In addition to the missing clusters used in the classical Gaussian mixture model, the proposed approach assumes that the means of the Gaussian densities are latent variables distributed according to random walks. The parameters of the proposed algorithm are estimated by the maximum likelihood approach. However, the EM algorithm cannot be applied directly due to the complex structure of the model, and some approximations are required. Using a variational approximation, an algorithm called VEM-DyMix is proposed to estimate the parameters of the proposed model. Using simulated data, the ability of the proposed approach to accurately estimate the parameters is demonstrated. VEM-DyMix outperforms other state-of-the-art algorithms in terms of clustering and estimation accuracy. The experiments performed on real-world data from two fields of application (railway condition monitoring and object tracking from videos) show the strong potential of the proposed algorithms. (C) 2016 Elsevier B.V. All rights reserved.
Within the educational context, a key goal is to assess students' acquired skills and to cluster students according to their ability level. In this regard, a relevant element to be accounted for is the possible effect of the school that students come from. To this end, we provide a methodological tool which takes into account the multilevel structure of the data (i.e., students in schools) and allows us to cluster both students and schools into homogeneous classes of ability and effectiveness, and to assess the effect of certain student and school characteristics on the probability of belonging to such classes. The proposed approach relies on an extended class of multidimensional latent class IRT models characterised by: (i) latent traits defined at student and school level, (ii) latent traits represented through random vectors with a discrete distribution, (iii) the inclusion of covariates at student and school level, and (iv) a two-parameter logistic parametrisation for the conditional probability of a correct response given the ability. The approach is applied for the analysis of data collected by two national tests administered in Italy to middle school students in June 2009: the INVALSI Language Test and the Mathematics Test.
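For reference, the standard two-parameter logistic form mentioned in point (iv) can be written as follows. The notation is an assumption for illustration and is shown for a single latent trait: $Y_{ij}$ is the response of student $i$ to item $j$, $\theta_i$ the ability, $\gamma_j$ the item discrimination, and $\beta_j$ the item difficulty. The paper's multilevel, multidimensional parametrisation may differ.

```latex
P(Y_{ij} = 1 \mid \theta_i)
  = \frac{\exp\{\gamma_j(\theta_i - \beta_j)\}}
         {1 + \exp\{\gamma_j(\theta_i - \beta_j)\}}
```

In a latent class setting such as the one described, $\theta_i$ would take one of a finite number of values, one per ability class.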
A test for ordered categorical variables is of considerable importance, because they are frequently encountered in biomedical studies. This paper introduces a simple ordering test approach for two-way r x c contingency tables with incomplete counts by developing six test statistics, i.e., the likelihood ratio test statistic, score test statistic, global score test statistic, Hausman-Wald test statistic, Wald test statistic and distance-based test statistic. Bootstrap resampling methods are also presented. The performance of the proposed tests is evaluated with respect to their empirical type I error rates and empirical powers. The results show that the likelihood ratio test statistic based on the bootstrap resampling methods performs satisfactorily for small to large sample sizes. A real example from a wheeze study in six cities is used to illustrate the proposed methodologies. (C) 2016 Elsevier B.V. All rights reserved.
In this letter, we propose a hybrid maximum likelihood (HML) classifier for continuous phase modulation (CPM). To the best of our knowledge, the proposed likelihood function is the first one for CPM signals that is based on two of its main features: nonlinear waveform, which is represented with its principal components, and signal memory, which is modeled as a Markov mapping symbol sequence. Unknown channel parameters are estimated through the expectation-maximization (EM) algorithm. An approximation method is further proposed to ensure that the proposed classifier improves classification performance at the cost of only a moderate increase in calculations. Numerical results demonstrate the superiority of the proposed approach over the classical HML classifier and feature-based classifier in terms of classifying CPM and linear modulation.
A latent Gaussian mixture model to classify ordinal data is proposed. The observed categorical variables are considered as a discretization of an underlying finite mixture of Gaussians. The model is estimated within the expectation-maximization (EM) framework by maximizing a pairwise likelihood. This allows us to overcome the computational problems arising in the full maximum likelihood approach due to the evaluation of multidimensional integrals that cannot be written in closed form. Moreover, a method to cluster the observations on the basis of the posterior probabilities output by the pairwise EM algorithm is suggested. The effectiveness of the proposal is shown by comparing the pairwise likelihood approach with the full maximum likelihood and the maximum likelihood for continuous data ignoring the ordinal nature of the variables. The comparison is made by means of a simulation study; applications to real data are provided.
Follow-up studies on a group of units are commonly carried out to explore the possibility that a response distribution has changed at unobservable time points that are different for different units. Often, in practice, there will be many potential covariates, which may be associated not only with the response distribution but also with the distribution of the unobservable change-points. Here, the covariates are allowed to enter the change-point distribution through a proportional odds model whose baseline odds is assumed to be piecewise constant as a function of time. The large number of putative regression coefficients in the response distributions and in the change-point distribution alone leads to a challenging simultaneous variable selection and estimation problem. Moreover, selection and estimation of the parameters that determine the coarseness of the baseline odds function add a further level of complexity. Using penalized likelihood methods, we are able to simultaneously perform variable selection and estimation and determine the coarseness of the baseline odds function. Our approach is computationally efficient and shown to be consistent in variable selection and parameter estimation. We assess its performance through simulations, and demonstrate its usage in fitting a model for cognitive decline in subjects with Alzheimer's disease. (C) 2016 Elsevier Inc. All rights reserved.
ISBN: 9781509041183 (print)
An important challenge in speech processing involves extracting non-linguistic information from a fundamental frequency (F_0) contour of speech. We propose a fast algorithm for estimating the model parameters of the Fujisaki model, namely, the timings and magnitudes of the phrase and accent commands. Although a powerful parameter estimation framework based on a stochastic counterpart of the Fujisaki model has recently been proposed, it still had room for improvement in terms of both computational efficiency and parameter estimation accuracy. This paper describes our two contributions. First, we propose a hard expectation-maximization (EM) algorithm for parameter inference where the E-step of the conventional EM algorithm is replaced with a point estimation procedure to accelerate the estimation process. Second, to improve the parameter estimation accuracy, we add a generative process of a spectral feature sequence to the generative model. This makes it possible to use linguistic or phonological information as an additional clue to estimate the timings of the accent commands. The experiments confirmed that the present algorithm was approximately 16 times faster and estimated parameters about 3% more accurately than the conventional algorithm.
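The general idea of hard EM, replacing the E-step's posterior distribution with a point estimate of the latent variables, can be sketched on a much simpler model than the one in this abstract. The example below uses a one-dimensional Gaussian mixture rather than the Fujisaki model; the function name `hard_em_gmm_1d`, the initial values, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def hard_em_gmm_1d(x, mu_init, n_iter=50):
    """Hard EM for a 1-D Gaussian mixture with len(mu_init) components."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu_init, dtype=float).copy()
    k = mu.size
    var = np.full(k, x.var())   # start every component at the overall variance
    pi = np.full(k, 1.0 / k)    # start with equal mixing weights
    z = np.zeros(x.size, dtype=int)
    for _ in range(n_iter):
        # Hard E-step: pick the single most probable component for each point
        # instead of computing soft posterior responsibilities.
        log_post = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                    - 0.5 * (x[:, None] - mu) ** 2 / var)
        z = log_post.argmax(axis=1)
        # M-step: re-estimate each component from the points assigned to it.
        for j in range(k):
            members = x[z == j]
            if members.size == 0:
                continue  # leave an empty component's parameters unchanged
            mu[j] = members.mean()
            var[j] = max(members.var(), 1e-6)  # floor to avoid zero variance
            pi[j] = members.size / x.size
    return mu, var, pi, z

# Synthetic data: two well-separated clusters (made-up values).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
mu, var, pi, z = hard_em_gmm_1d(x, mu_init=[-1.0, 1.0])
print(mu, var, pi)
```

The hard E-step costs an argmax per data point rather than a full posterior computation, which is the kind of saving the abstract attributes to its point estimation procedure; the trade-off is that uncertainty about the latent variables is discarded.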