We develop a novel estimation algorithm for a dynamic factor model (DFM) applied to panel data with a short time dimension and a large cross-sectional dimension. Current DFMs usually require panels with a minimum of 20 years of quarterly data (80 time observations per panel). In contrast, the application we consider includes panels with a median of 8 annual observations. As a result, the time dimension in our paper is substantially shorter than in previous work in the DFM literature. This difference increases the computational challenges of the estimation process, which we address by developing the "Two-Cycle Conditional Expectation-Maximization" (2CCEM) algorithm, a variant of the EM algorithm and its extensions. We analyze the conditions under which our model is identified and provide simulation results demonstrating consistency of our 2CCEM estimator. We apply the DFM to a dataset of 802 water and sanitation utilities from 43 countries and use the 2CCEM algorithm to estimate dynamic performance trajectories for each utility.
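The 2CCEM algorithm itself is specific to this paper, but the flavor of an EM iteration for a factor model can be conveyed with a minimal sketch. Below is a standard EM for a static one-factor model with diagonal noise, written as a hypothetical simplification (the function name `em_one_factor` and the data layout are assumptions, and this is not the authors' 2CCEM):

```python
import numpy as np

def em_one_factor(X, n_iter=200):
    """Standard EM for X[i] = lam * f_i + eps_i, f_i ~ N(0, 1),
    eps_i ~ N(0, diag(psi)). A toy illustration, not 2CCEM."""
    n, p = X.shape
    lam = np.random.default_rng(0).normal(size=p)      # factor loadings
    psi = np.var(X, axis=0)                            # idiosyncratic variances
    for _ in range(n_iter):
        # E-step: posterior of the scalar factor given each row of X
        post_var = 1.0 / (1.0 + lam @ (lam / psi))      # Var(f_i | x_i)
        post_mean = post_var * (X @ (lam / psi))        # E(f_i | x_i), shape (n,)
        second_moment = post_mean**2 + post_var         # E(f_i^2 | x_i)
        # M-step: closed-form updates for loadings and noise variances
        lam = (X.T @ post_mean) / second_moment.sum()
        psi = np.mean(X**2 - 2 * X * np.outer(post_mean, lam)
                      + np.outer(second_moment, lam**2), axis=0)
    return lam, psi

# toy check on simulated data (loadings are identified only up to sign)
rng = np.random.default_rng(1)
true_lam = rng.normal(size=10)
f = rng.normal(size=500)
X = np.outer(f, true_lam) + rng.normal(scale=0.5, size=(500, 10))
lam_hat, psi_hat = em_one_factor(X)
```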
In this paper, we propose a penalized likelihood method to simultaneously select covariates and mixing components and to estimate parameters in localized mixture-of-experts models. We develop an expectation-maximization (EM) algorithm to solve the proposed penalized likelihood procedure, and introduce a data-driven procedure to select the tuning parameters. Extensive numerical studies are carried out to compare the finite-sample performance of our proposed method with that of other existing methods. Finally, we apply the proposed methodology to analyze the Boston housing price data set and the baseball salaries data set.
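The abstract does not give the penalty or the gating structure, but the general mechanics (responsibilities in the E-step, a penalized weighted fit in the M-step) can be sketched for a much simpler two-component mixture of lasso-penalized linear regressions with constant mixing weights. Everything below (the function name, the lasso penalty, the `sample_weight` trick) is an assumed stand-in, not the authors' method:

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

def penalized_mixture_em(X, y, lam=0.05, n_iter=50, seed=0):
    """EM for a two-component mixture of lasso-penalized linear regressions.
    A simplified stand-in for a penalized localized mixture-of-experts fit."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.array([0.5, 0.5])                       # mixing weights
    beta = rng.normal(size=(2, p)) * 0.1            # expert coefficients
    sigma = np.array([y.std(), y.std()])
    for _ in range(n_iter):
        # E-step: responsibility of each expert for each observation
        dens = np.stack([pi[c] * norm.pdf(y, X @ beta[c], sigma[c]) for c in range(2)])
        dens = np.maximum(dens, 1e-300)
        resp = dens / dens.sum(axis=0, keepdims=True)          # shape (2, n)
        # M-step: weighted lasso per expert; the L1 penalty zeroes out weak covariates
        for c in range(2):
            model = Lasso(alpha=lam, fit_intercept=False).fit(X, y, sample_weight=resp[c])
            beta[c] = model.coef_
            resid = y - X @ beta[c]
            sigma[c] = max(np.sqrt(np.sum(resp[c] * resid**2) / resp[c].sum()), 1e-3)
        pi = resp.sum(axis=1) / n
    return pi, beta, sigma

# toy data: sparse coefficients that differ between the two latent components
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 20))
labels = rng.integers(0, 2, size=600)
true_beta = np.zeros((2, 20)); true_beta[0, :2] = [3.0, -2.0]; true_beta[1, 2:4] = [2.5, 2.0]
y = np.einsum("ij,ij->i", X, true_beta[labels]) + rng.normal(scale=0.5, size=600)
pi, beta, sigma = penalized_mixture_em(X, y)
```

In practice the penalty level `lam` would be chosen over a grid by a data-driven criterion (for example a BIC-type score), in the spirit of the tuning-parameter selection the abstract describes.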
We present a two-step approach to classification problems in the large P, small N setting, where the number of predictors may be larger than the sample size. We assume that the association between the predictors and the class variable has an approximate linear-logistic form, but we allow the class boundaries to be nonlinear. We further assume that the number of true predictors is relatively small. In the first step, we use a binomial generalized linear model to identify which predictors are associated with each class; we then restrict the data set to these predictors and run a nonlinear classifier, such as a random forest or a support vector machine. We show that, without the variable-screening step, the classification performance of both the random forest and the support vector machine is degraded when many of the P predictors are unrelated to the class.
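As a rough sketch of the two-step idea for a binary class variable (with a per-predictor logistic screen and a fixed p-value cutoff standing in for the paper's exact binomial-GLM screening rule), one might write:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic "large P, small N" data: 500 predictors, 120 samples, 10 informative
X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: screen predictors with a univariate binomial GLM (logistic regression)
pvals = []
for j in range(X_tr.shape[1]):
    fit = sm.Logit(y_tr, sm.add_constant(X_tr[:, [j]])).fit(disp=0)
    pvals.append(fit.pvalues[1])
keep = np.array(pvals) < 0.01          # simple cutoff; the paper's rule may differ

# Step 2: nonlinear classifier on the screened predictors only
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_tr[:, keep], y_tr)
print("screened predictors:", keep.sum(),
      "test accuracy:", rf.score(X_te[:, keep], y_te))
```

A support vector machine (e.g. `sklearn.svm.SVC`) could be dropped into Step 2 in place of the random forest with no other changes.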
Two dice are rolled repeatedly, and only their sum is registered. Have the two dice been "shaved," so that two of the six sides appear more frequently? Pavlides and Perlman discussed this somewhat complicated type of situation through curved exponential families. Here, we contrast their approach by regarding the data as incomplete data from a simple exponential family. The latter, supplementary approach is in some respects simpler; it provides additional insight about the relationships among the likelihood equation, the Fisher information, and the EM algorithm, and it illustrates the information content of ancillary statistics.
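To make the incomplete-data view concrete, here is a small sketch of the EM iteration for this setting, under the simplifying assumption that both dice share the same (possibly shaved) face probabilities; the latent variable is the unobserved pair of faces behind each observed sum. The setup and function name are illustrative, not taken from the paper:

```python
import numpy as np

def em_dice(sums, n_iter=500):
    """EM for face probabilities when only the sum of two i.i.d. dice is seen."""
    sums = np.asarray(sums)
    counts = np.array([np.sum(sums == s) for s in range(2, 13)])  # counts of sums 2..12
    p = np.full(6, 1 / 6)                                         # initial face probabilities
    pairs = [(i, j) for i in range(1, 7) for j in range(1, 7)]
    for _ in range(n_iter):
        face_counts = np.zeros(6)
        for s, n_s in zip(range(2, 13), counts):
            if n_s == 0:
                continue
            # E-step: posterior over the latent pair (i, j) given the observed sum s
            w = np.array([p[i - 1] * p[j - 1] if i + j == s else 0.0 for i, j in pairs])
            w /= w.sum()
            # expected number of times each face occurred among these n_s rolls
            for (i, j), wk in zip(pairs, w):
                face_counts[i - 1] += n_s * wk
                face_counts[j - 1] += n_s * wk
        # M-step: multinomial MLE from the expected complete-data counts
        p = face_counts / face_counts.sum()
    return p

# simulate shaved dice where faces 1 and 6 are more likely
rng = np.random.default_rng(0)
true_p = np.array([0.22, 0.14, 0.14, 0.14, 0.14, 0.22])
rolls = rng.choice(6, size=(5000, 2), p=true_p) + 1
print(em_dice(rolls.sum(axis=1)).round(3))
```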
We consider a spatial logistic regression. The spatial dependence is captured through a hidden Gaussian process after the logit transformation of the Bernoulli success probabilities. In a hierarchical framework, likelihood-based estimation requires an EM algorithm. However, the expectations in the E-step are not available in closed-form expressions. We propose a variational approximation of the complete likelihood that has a Gaussian form, and we then obtain the desired approximations of the expectations. We conduct a simulation study to compare our approach with the Laplace approximation.
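The abstract does not spell out the variational construction. As an illustration of how a Gaussian variational approximation can replace intractable E-step expectations in a logit-Gaussian model, the sketch below uses the classical Jaakkola-Jordan quadratic bound; this is an assumption about the general technique, not necessarily the authors' exact approximation:

```python
import numpy as np

def lam(xi):
    """Jaakkola-Jordan coefficient lambda(xi) = tanh(xi/2) / (4*xi)."""
    xi = np.maximum(np.abs(xi), 1e-8)
    return np.tanh(xi / 2.0) / (4.0 * xi)

def variational_gaussian(y, mu0, Sigma0, n_iter=50):
    """Gaussian approximation q(z) = N(m, S) to p(z | y) for
    z ~ N(mu0, Sigma0) and y_i ~ Bernoulli(sigmoid(z_i))."""
    n = len(y)
    xi = np.ones(n)                        # one variational parameter per site
    Sigma0_inv = np.linalg.inv(Sigma0)
    for _ in range(n_iter):
        # the bound makes the likelihood quadratic in z, so q(z) is Gaussian
        S = np.linalg.inv(Sigma0_inv + 2.0 * np.diag(lam(xi)))
        m = S @ (Sigma0_inv @ mu0 + (y - 0.5))
        # update xi from the second moments of q(z)
        xi = np.sqrt(np.diag(S) + m**2)
    return m, S                            # approximate E[z | y] and Cov[z | y]

# toy example with an exponential spatial covariance on a 1-D grid
rng = np.random.default_rng(0)
coords = np.arange(30.0)
Sigma0 = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 5.0)
z = rng.multivariate_normal(np.zeros(30), Sigma0)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-z))).astype(float)
m, S = variational_gaussian(y, np.zeros(30), Sigma0)
```

The returned mean and covariance then supply the E-step expectations that the exact posterior does not give in closed form.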
In this article, a general approach to latent variable models based on an underlying generalized linear model (GLM) with a factor-analysis observation process is introduced. We call these models Generalized Linear Factor Models (GLFM). The observations are produced from a general model framework that involves observed and latent variables assumed to be distributed in the exponential family. More specifically, we concentrate on situations where the observed variables are both discretely measured (e.g., binomial, Poisson) and continuously distributed (e.g., gamma). The common latent factors are assumed to be independent with a standard multivariate normal distribution. Practical details of training such models with a new local expectation-maximization (EM) algorithm, which can be considered a generalized EM-type algorithm, are also discussed. In conjunction with an approximated version of the Fisher score algorithm (FSA), we show how to calculate maximum likelihood estimates of the model parameters and how to draw inferences about the unobservable path of the common factors. The methodology is illustrated by an extensive Monte Carlo simulation study, and the results show promising performance.
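The abstract describes the model structure rather than giving formulas. As a small illustration of what a GLFM-style data-generating process looks like (an assumed toy parameterization, not the authors' estimator), one can simulate mixed binomial, Poisson, and gamma outcomes driven by common standard-normal factors through their link functions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 1000, 2                                   # observations and latent factors

# common factors: independent standard normal, as in the GLFM assumption
F = rng.normal(size=(n, q))

# loadings and intercepts for three observed variables of different types
L_binom, b_binom = np.array([0.8, -0.5]), 0.2    # binomial outcome, logit link
L_pois,  b_pois  = np.array([0.4,  0.6]), 1.0    # Poisson outcome, log link
L_gam,   b_gam   = np.array([-0.3, 0.7]), 0.5    # gamma outcome, log link (shape fixed at 2)

eta_b = b_binom + F @ L_binom
eta_p = b_pois  + F @ L_pois
eta_g = b_gam   + F @ L_gam

y_binom = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_b)))     # discrete
y_pois  = rng.poisson(np.exp(eta_p))                        # discrete (count)
y_gam   = rng.gamma(shape=2.0, scale=np.exp(eta_g) / 2.0)   # continuous, mean exp(eta_g)

Y = np.column_stack([y_binom, y_pois, y_gam])               # mixed-type data matrix
```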
Finite mixture models are a popular approach for unsupervised machine learning tasks. Mixtures of factor analyzers assume a latent variable structure, thereby modelling the data in a lower dimensional space. Herein, we augment the traditional alternating expectation-conditional maximization algorithm by incorporating the nonparametric bootstrap during the parameter estimation process. This augmentation is shown to improve discovery of both the true number of groups and the true latent dimensionality through simulations, while also showing superior clustering performance on benchmark data sets.
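The abstract gives no implementation details. As a loose sketch of how a nonparametric bootstrap can be folded into model selection for a latent-variable mixture, the snippet below resamples the data, refits a mixture over candidate numbers of groups, and tabulates the BIC-chosen value; scikit-learn's `GaussianMixture` is used purely as a stand-in for a mixture of factor analyzers fitted by AECM:

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=400, centers=3, n_features=6, random_state=0)
rng = np.random.default_rng(0)
candidate_G = list(range(1, 7))
chosen = []

for b in range(50):                               # nonparametric bootstrap replicates
    idx = rng.integers(0, len(X), size=len(X))    # resample rows with replacement
    Xb = X[idx]
    bics = [GaussianMixture(n_components=G, random_state=0).fit(Xb).bic(Xb)
            for G in candidate_G]
    chosen.append(candidate_G[int(np.argmin(bics))])

print(Counter(chosen))   # distribution of the selected number of groups across resamples
```

The same loop could additionally track a selected latent dimensionality when the fitted model exposes one, which is closer to what the abstract describes.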
ISBN (Print): 9781538662205
In this paper, we present two comparative studies. The first compares two stationary hidden Markov models used in image segmentation: the Hidden Markov Chain with Independent Noise (HMC-IN) and the Pairwise Markov Chain (PMC). The second compares three parameter estimators: the EM (Expectation-Maximization) algorithm, the ICE (Iterative Conditional Estimation) algorithm, and the SEM (Stochastic Expectation-Maximization) algorithm. To estimate the final configuration of X, we use the MPM (Marginal Posterior Mode) algorithm. From these comparisons, we can confirm that PMC provides better segmentation results than HMC-IN. Moreover, EM, ICE, and SEM give the same results under both HMC-IN and PMC.
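As a compact illustration of MPM restoration under HMC-IN with known parameters (a 1-D toy with Gaussian noise; the EM/ICE/SEM parameter-estimation step compared in the paper is not shown), one can compute the posterior marginals with the forward-backward recursions and take their argmax at each site:

```python
import numpy as np
from scipy.stats import norm

def mpm_hmc_in(y, pi0, A, means, sds):
    """MPM restoration for a hidden Markov chain with independent Gaussian noise:
    returns argmax_k P(X_t = k | y) computed by scaled forward-backward."""
    T, K = len(y), len(pi0)
    B = norm.pdf(y[:, None], means[None, :], sds[None, :])   # emission densities, (T, K)
    alpha = np.zeros((T, K)); beta = np.ones((T, K)); c = np.zeros(T)
    alpha[0] = pi0 * B[0]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                                    # forward pass (scaled)
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):                           # backward pass (scaled)
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)                # posterior marginals
    return gamma.argmax(axis=1)                              # MPM estimate of X

# two-class toy chain observed through Gaussian noise
rng = np.random.default_rng(0)
A = np.array([[0.95, 0.05], [0.05, 0.95]]); pi0 = np.array([0.5, 0.5])
x = [0]
for _ in range(999):
    x.append(rng.choice(2, p=A[x[-1]]))
x = np.array(x)
y = rng.normal(loc=np.where(x == 0, 0.0, 2.0), scale=1.0)
x_hat = mpm_hmc_in(y, pi0, A, np.array([0.0, 2.0]), np.array([1.0, 1.0]))
print("error rate:", np.mean(x_hat != x))
```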
Phase Contrast and Differential Interference Contrast (DIC) microscopy are two popular noninvasive techniques for monitoring live cells. Each of these two imaging modalities has its own advantages and disadvantages for visualizing specimens, so biologists need the two complementary modalities together to analyze specimens. In this paper, we propose a novel data-driven learning method capable of transferring microscopy images from one imaging modality to the other, reflecting the characteristics of specimens from different perspectives. For example, given a Phase Contrast microscope, we can transfer its images to the corresponding DIC images without using a DIC microscope, and vice versa. The preliminary experiments demonstrate that the image transfer approach can provide biologists with a computational way to switch between microscopy imaging modalities, so biologists can combine the advantages of different imaging modalities to better visualize and analyze specimens over time, without purchasing all types of microscopy imaging modalities or switching back and forth between imaging systems during time-lapse experiments.
The gradient is an important property of an image. According to the characteristics of the image gradient histogram (the distribution of gradient magnitudes), the Gamma distribution is close to the actual distribution, so a Gamma mixture model is used to fit the gradient distribution of natural images. First, an image can be divided into an edge region and a non-edge region based on the gradient; the authors assume that each region obeys a Gamma distribution with different parameters. Then, the expectation-maximisation (EM) algorithm is used to estimate the parameters of each part. Finally, the accuracy of the fitted gradient distribution is verified by the correlation coefficient, and the validity of the estimated gradient magnitude distributions of the non-edge and edge regions is verified by edge-detection experiments with different thresholds. This work can select the high threshold of the Canny edge detector adaptively, which improves the automation level of the algorithm.
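As a rough sketch of fitting a two-component Gamma mixture to gradient magnitudes with EM (using a weighted moment-matching update for the Gamma parameters in the M-step, which is a simplification of the full maximum-likelihood update and an assumption rather than the paper's exact procedure), one might write:

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

def em_gamma_mixture(x, n_iter=200):
    """EM for a two-component Gamma mixture; the M-step uses weighted
    moment matching (shape = mean^2/var, scale = var/mean)."""
    x = np.asarray(x, dtype=float)
    w = np.array([0.5, 0.5])                               # mixing weights
    shape = np.array([1.0, 4.0]); scale = np.array([x.mean(), x.mean()])
    for _ in range(n_iter):
        # E-step: responsibilities of the non-edge and edge components
        dens = np.stack([w[c] * gamma_dist.pdf(x, a=shape[c], scale=scale[c])
                         for c in range(2)])
        dens = np.maximum(dens, 1e-300)
        resp = dens / dens.sum(axis=0, keepdims=True)
        # M-step: weighted moments per component
        for c in range(2):
            r = resp[c]
            mean_c = np.sum(r * x) / r.sum()
            var_c = np.sum(r * (x - mean_c) ** 2) / r.sum()
            shape[c] = mean_c**2 / var_c
            scale[c] = var_c / mean_c
        w = resp.sum(axis=1) / len(x)
    return w, shape, scale

# synthetic gradient magnitudes: many small (non-edge) and fewer large (edge) values
rng = np.random.default_rng(0)
grads = np.concatenate([rng.gamma(1.5, 2.0, size=8000),    # non-edge region
                        rng.gamma(6.0, 5.0, size=2000)])   # edge region
w, shape, scale = em_gamma_mixture(grads)
print(w.round(2), shape.round(2), scale.round(2))
```

An adaptive Canny high threshold could then be read off the fitted components (for example, from where the two weighted densities cross), although the abstract does not state the exact rule the authors use.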