We consider a mixture model with latent Bayesian network (MLBN) for a set of random vectors X^(t) ∈ R^(d_t), t = 1, ..., T. Each X^(t) is associated with a latent state s^(t), given which X^(t) is conditionally independent of the other variables. The joint distribution of the states is governed by a Bayes net. Although specific types of MLBN have been used in diverse areas such as biomedical research and image analysis, the exact expectation-maximization (EM) algorithm for estimating these models can involve visiting all combinations of states, yielding complexity exponential in the network size. A prominent exception is the Baum-Welch algorithm for the hidden Markov model, where the underlying graph is a chain. We develop a new Baum-Welch algorithm on directed acyclic graphs (BW-DAG) for the general MLBN and prove that it is an exact EM algorithm. BW-DAG provides insight into the achievable complexity of EM. For a tree graph, the complexity of BW-DAG is much lower than that of brute-force EM. Copyright (c) 2017 John Wiley & Sons, Ltd.
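BW-DAG itself is not reproduced here. As a hedged illustration of the chain special case mentioned above, the sketch below implements the standard scaled forward-backward recursion (the E-step of Baum-Welch) for a discrete-state hidden Markov model and checks it against brute-force enumeration of all K^T state combinations, which is the exponential cost that exact EM on a general MLBN can incur. The function names and the toy parameters are illustrative assumptions.

```python
import itertools
import numpy as np

def forward_backward(pi, A, B):
    """Posterior state marginals P(s_t = k | x_1..x_T) for a chain (HMM).

    pi: (K,) initial probabilities; A: (K, K) transitions, A[i, j] = P(s_{t+1}=j | s_t=i)
    B:  (T, K) emission likelihoods, B[t, k] = p(x_t | s_t = k).
    Cost is O(T K^2) instead of the O(K^T) of enumeration.
    """
    T, K = B.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    scale = np.zeros(T)
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):                      # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    # The product of the scale factors equals the observed-data likelihood.
    return gamma / gamma.sum(axis=1, keepdims=True), np.log(scale).sum()

def brute_force_marginals(pi, A, B):
    """Same quantities by summing over every state sequence (exponential in T)."""
    T, K = B.shape
    gamma = np.zeros((T, K))
    total = 0.0
    for states in itertools.product(range(K), repeat=T):
        p = pi[states[0]] * B[0, states[0]]
        for t in range(1, T):
            p *= A[states[t - 1], states[t]] * B[t, states[t]]
        total += p
        for t, k in enumerate(states):
            gamma[t, k] += p
    return gamma / total, np.log(total)
```

On small problems both functions return the same marginals and log-likelihood; only the forward-backward version remains feasible as T grows.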
In this paper, we develop a bivariate discrete generalized exponential distribution whose marginals are discrete generalized exponential distributions as proposed by Nekoukhou, Alamatsaz and Bidram [Discrete generalized exponential distribution of a second type. Statistics. 2013;47:876-887]. The proposed bivariate distribution is very flexible, and the bivariate geometric distribution is obtained as a special case. It can be seen as a natural discrete analogue of the bivariate generalized exponential distribution proposed by Kundu and Gupta [Bivariate generalized exponential distribution. J Multivariate Anal. 2009;100:581-593]. We study different properties of this distribution and explore its dependence structure. We propose a new EM algorithm, which can be implemented very efficiently, to compute the maximum likelihood estimators of the unknown parameters, and we also discuss some inferential issues. One data set is analyzed to show the effectiveness of the proposed model. Finally, we pose some open problems and conclude the paper.
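A minimal sketch of the ingredients, under stated assumptions: the marginal DGE pmf is taken as the discretization X = floor(Y) of a generalized exponential variate Y with CDF (1 - e^(-λy))^α and p = e^(-λ), and the bivariate sampler assumes the construction mirrors Kundu and Gupta's maximum form, X1 = max(U1, U3), X2 = max(U2, U3). The sampler and its name are hypothetical and are not claimed to be the paper's definition. Setting α = 1 recovers the geometric pmf p^x (1 - p), consistent with the geometric special case above.

```python
import numpy as np

def dge_pmf(x, alpha, p):
    """P(X = x) = (1 - p**(x+1))**alpha - (1 - p**x)**alpha, x = 0, 1, 2, ...
    (assumed form: X = floor(Y), Y generalized exponential, p = exp(-lam))."""
    x = np.asarray(x, dtype=float)
    return (1.0 - p ** (x + 1)) ** alpha - (1.0 - p ** x) ** alpha

def sample_bivariate_dge(n, a1, a2, a3, p, rng):
    """Hypothetical sampler: X1 = max(U1, U3), X2 = max(U2, U3), Ui ~ DGE(ai, p).
    Each marginal is then DGE(ai + a3, p), since maxima multiply the CDFs."""
    lam = -np.log(p)

    def draw(a):
        # Inverse CDF of the continuous GE variate, then discretize by flooring.
        u = rng.uniform(size=n)
        y = -np.log(1.0 - u ** (1.0 / a)) / lam
        return np.floor(y).astype(int)

    u1, u2, u3 = draw(a1), draw(a2), draw(a3)
    return np.maximum(u1, u3), np.maximum(u2, u3)
```

The shared component U3 is what induces positive dependence between X1 and X2 in this construction.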
It is well known that the widely popular mean regression model can be inadequate if the distribution of the observed responses is not symmetric. In this situation, quantile regression turns out to be a more robust alternative for accommodating outliers and misspecification of the error distribution, because it characterizes the entire conditional distribution of the outcome variable. This paper presents a likelihood-based approach for estimating regression quantiles based on a new family of skewed distributions. This family includes skewed versions of the normal, Student-t, Laplace, contaminated normal and slash distributions, all with the zero-quantile property for the error term and with a convenient and novel stochastic representation that facilitates the implementation of the expectation-maximization algorithm for maximum likelihood estimation of the p-th quantile regression parameters. We evaluate the performance of the proposed expectation-maximization algorithm and the asymptotic properties of the maximum likelihood estimates through empirical experiments and an application to a real-life dataset. The algorithm is implemented in the R package lqr, providing full estimation and inference for the parameters as well as simulation envelope plots useful for assessing goodness of fit. Copyright (C) 2017 John Wiley & Sons, Ltd.
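To make the "zero-quantile property" concrete, the sketch below uses the skewed (asymmetric) Laplace member of such a family, with density p(1 - p)/σ · exp(-ρ_p((y - μ)/σ)) built from the check function ρ_p(u) = u(p - 1{u < 0}); its CDF equals p at y = μ, so μ is the p-th quantile. It also shows that minimizing the summed check loss over a constant recovers a sample p-th quantile. This is an illustrative sketch, not the lqr implementation, and the function names are hypothetical.

```python
import numpy as np

def check_loss(u, p):
    """Quantile check function rho_p(u) = u * (p - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (p - (u < 0))

def ald_cdf(y, mu, sigma, p):
    """CDF of the skewed Laplace density p(1-p)/sigma * exp(-rho_p((y-mu)/sigma)).
    Evaluated at y = mu it equals p: the location is the p-th quantile."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    return np.where(z < 0, p * np.exp((1 - p) * z), 1 - (1 - p) * np.exp(-p * z))

def fit_intercept_quantile(y, p):
    """Minimize sum_i rho_p(y_i - b) over b; a minimizer is attained at a data point."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - b, p).sum() for b in y]
    return y[int(np.argmin(losses))]
```

For y = 1, ..., 10 and p = 0.25 the check-loss minimizer is 3, the sample 0.25-quantile; maximizing the skewed Laplace likelihood in μ is equivalent to this minimization.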
This article proposes a new approach for Bayesian and maximum likelihood parameter estimation for stationary Gaussian processes observed on a large lattice with missing values. We propose a Markov chain Monte Carlo approach for Bayesian inference, and a Monte Carlo expectation-maximization algorithm for maximum likelihood inference. Our approach uses data augmentation and circulant embedding of the covariance matrix, and provides likelihood-based inference for the parameters and the missing data. Using simulated data and an application to satellite sea surface temperatures in the Pacific Ocean, we show that our method provides accurate inference on lattices of sizes up to 512 x 512, and is competitive with two popular methods: composite likelihood and spectral approximations.
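The computational benefit of circulant embedding can be seen in one dimension: the n × n Toeplitz covariance of a stationary process on a regular grid sits in the top-left block of a symmetric circulant matrix of size 2(n - 1), and circulant matrices are diagonalized by the discrete Fourier transform, so products cost O(m log m) rather than O(m^2). The sketch below assumes an exponential covariance purely for illustration; on a 2-D lattice such as the ones described above, the same idea uses block-circulant matrices and a 2-D FFT. It is not the article's code.

```python
import numpy as np

def circulant_base(cov, n):
    """First row (= first column) of the 2(n-1) x 2(n-1) symmetric circulant
    matrix embedding the n x n Toeplitz covariance [cov(|i - j|)]."""
    c = cov(np.arange(n))
    return np.concatenate([c, c[-2:0:-1]])

def circulant_matvec(base, v):
    """Multiply the circulant matrix with first column `base` by v via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(base) * np.fft.fft(v)))

def circulant_eigenvalues(base):
    """Eigenvalues of the embedding; all must be >= 0 for it to be a valid covariance."""
    return np.real(np.fft.fft(base))
```

In a data-augmentation scheme, the missing sites plus the padding added by the embedding are treated as latent, so the augmented field has this cheaply factorized circulant covariance.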
Data extracted from air quality monitoring can require spatiotemporal clustering techniques. Many recent clustering techniques are based on mixture models; however, there is a shortage of model-based approaches for spatiotemporal data. A new mixture model for clustering spatiotemporal data, named STM, is introduced, and its generic identifiability is proved. The model defines each mixture component as a mixture of autoregressive polynomial regressions in which the weights depend on spatial and temporal information through logistic links. Under the maximum likelihood framework, parameters are estimated via an expectation-maximization algorithm, and classical information criteria can be used for model selection. The proposed model is applied to air quality monitoring data from the periphery of Paris, focusing on one of the critical pollutants, nitrogen dioxide, at different times of the day. The STM model is implemented in the R package SpaTimeClust.
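As a rough illustration of how logistic-link weights enter an EM algorithm, the sketch below computes the E-step responsibilities for a generic mixture of Gaussian regressions whose mixing weights are a softmax of covariates. It is a simplified, hypothetical stand-in for STM: the gating design W (for example, coordinates and time) and the regression design X (for example, polynomial terms in time, possibly with a lagged value for the autoregressive part) are assumptions, and STM's nested structure is not reproduced.

```python
import numpy as np

def softmax_weights(W, gamma):
    """Logistic (softmax) mixing weights pi_k(w_i); W: (n, q), gamma: (q, K)."""
    eta = W @ gamma
    eta -= eta.max(axis=1, keepdims=True)      # subtract row max for stability
    e = np.exp(eta)
    return e / e.sum(axis=1, keepdims=True)

def e_step(y, X, W, beta, sigma, gamma):
    """Responsibilities tau_ik proportional to pi_k(w_i) * N(y_i; X_i beta_k, sigma_k^2).

    X: (n, d) regression design; beta: (d, K); sigma: (K,) component std. devs.
    """
    pi = softmax_weights(W, gamma)
    mean = X @ beta                                            # (n, K)
    logdens = (-0.5 * np.log(2 * np.pi * sigma ** 2)
               - 0.5 * ((y[:, None] - mean) / sigma) ** 2)
    logpost = np.log(pi) + logdens
    logpost -= logpost.max(axis=1, keepdims=True)              # log-sum-exp trick
    tau = np.exp(logpost)
    return tau / tau.sum(axis=1, keepdims=True)
```

The M-step would then refit each regression by weighted least squares with weights tau[:, k] and update gamma by a weighted multinomial logistic regression.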
In this paper, the destructive negative binomial (DNB) cure rate model with a latent activation scheme [V. Cancho, D. Bandyopadhyay, F. Louzada, and B. Yiqi, The DNB cure rate model with a latent activation scheme, Statistical Methodology 13 (2013b), pp. 48-68] is extended to the case where the observations are grouped into clusters. Parameter estimation is performed based on the restricted maximum likelihood approach and on a Bayesian approach based on Dirichlet process priors. An application to a real data set related to a sealant study in a dentistry experiment is considered to illustrate the performance of the proposed model.
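The destructive mechanism can be sketched under explicit assumptions: the initial number of competing causes M is negative binomial with mean η and dispersion φ (probability generating function (1 + φη(1 - s))^(-1/φ)), each cause independently survives destruction with probability p, and the event occurs at the first activation among the surviving causes, each with promotion-time CDF F. Binomial thinning keeps the count negative binomial with mean ηp, giving population survival (1 + φηpF(t))^(-1/φ) and cure fraction (1 + φηp)^(-1/φ). The first-activation scheme and the exponential F are assumptions for illustration; the clustered extension and its estimation are not shown.

```python
import numpy as np

def nb_pgf(s, eta, phi):
    """PGF of a negative binomial count with mean eta and dispersion phi > 0."""
    return (1.0 + phi * eta * (1.0 - s)) ** (-1.0 / phi)

def dnb_population_survival(t, eta, phi, p, cdf):
    """S_pop(t) = E[S(t)**D], with D ~ NB(mean eta * p, phi) undestroyed causes
    (first-activation scheme assumed; cdf is the promotion-time CDF F)."""
    return nb_pgf(1.0 - cdf(t), eta * p, phi)

def dnb_cure_fraction(eta, phi, p):
    """Probability that no cause survives destruction: P(D = 0)."""
    return nb_pgf(0.0, eta * p, phi)
```

A quick Monte Carlo check simulates M with NumPy's negative binomial (n = 1/φ, success probability 1/(1 + φη)), thins it binomially, and compares the proportion of zero counts with the closed-form cure fraction.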
In this paper, we study the estimation of the parameters of a generalized inverted exponential distribution based on a progressively first-failure type-II right-censored sample. An expectation-maximization (EM) algorithm is developed to obtain maximum likelihood estimates of the unknown parameters as well as of the reliability and hazard functions. Using the missing value principle, the Fisher information matrix is obtained for constructing asymptotic confidence intervals. An exact interval and an exact confidence region for the parameters are also constructed. Bayesian procedures based on Markov chain Monte Carlo methods are developed to approximate the posterior distribution of the parameters of interest and to derive the corresponding credible intervals. The performances of the maximum likelihood and Bayes estimators are compared in terms of their mean-squared errors through a simulation study. Furthermore, Bayes two-sample point and interval predictors are obtained when the future sample consists of ordinary order statistics. The squared error, linear-exponential and general entropy loss functions are considered for obtaining the Bayes estimators and predictors. To illustrate the discussed procedures, a real data set is analyzed.
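Two pieces of the machinery above can be written down directly. The generalized inverted exponential distribution has CDF 1 - (1 - e^(-λ/x))^α, so its reliability is R(t) = (1 - e^(-λ/t))^α and its hazard is f(t)/R(t). Given posterior draws (for example from MCMC), the Bayes estimators under the three losses are the posterior mean (squared error), -(1/c) log E[e^(-cθ)] (linear-exponential with shape c), and (E[θ^(-q)])^(-1/q) (general entropy with shape q). The sketch assumes the draws are already available; the sampler, the censoring scheme and the EM steps are not shown, and the function names are hypothetical.

```python
import numpy as np

def gied_reliability(t, alpha, lam):
    """R(t) = (1 - exp(-lam / t))**alpha, t > 0."""
    return (1.0 - np.exp(-lam / t)) ** alpha

def gied_hazard(t, alpha, lam):
    """h(t) = f(t) / R(t), f(t) = alpha*lam/t**2 * exp(-lam/t) * (1 - exp(-lam/t))**(alpha-1)."""
    pdf = alpha * lam / t ** 2 * np.exp(-lam / t) * (1.0 - np.exp(-lam / t)) ** (alpha - 1)
    return pdf / gied_reliability(t, alpha, lam)

def bayes_estimates(draws, c=1.0, q=1.0):
    """Bayes point estimates of a positive parameter from posterior draws under
    squared error, LINEX (shape c != 0) and general entropy (shape q) losses."""
    draws = np.asarray(draws, dtype=float)
    se = draws.mean()
    linex = -np.log(np.mean(np.exp(-c * draws))) / c
    ge = np.mean(draws ** (-q)) ** (-1.0 / q)
    return se, linex, ge
```

With c > 0 the LINEX estimate never exceeds the posterior mean (overestimation is penalized more heavily), and with q = 1 the general entropy estimate is the posterior harmonic mean.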
For professional basketball, finding valuable and suitable players is the key to building a winning team. To meet this challenge, basketball managers, scouts and coaches are increasingly turning to analytics. Objective evaluation of players and teams has always been the top goal of basketball analytics. Typical statistical analytics focuses mainly on the box score and has produced various metrics. Despite increasingly advanced methods, metrics built on box score statistics provide limited information about how players interact with each other: two players with similar box scores may contribute very differently to team play. Thus, professional basketball scouts have to watch real games to evaluate players. Live scouting is effective but inefficient and subjective. In this paper, we go beyond the static box score and model basketball games as dynamic networks. The proposed continuous-time stochastic block model clusters players according to their playing style and performance. The model provides cluster-specific estimates of the effectiveness of players at scoring, rebounding, stealing, etc., and also captures player interaction patterns within and between clusters. By clustering similar players together, the model can help basketball scouts narrow down the search space. Moreover, the model is able to reveal subtle differences in the offensive strategies of different teams. An application to NBA basketball games illustrates the performance of the model.
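A simplified view of how a continuous-time stochastic block model scores interactions: if each ordered pair of players (i, j) generates events (for example, passes) as a homogeneous Poisson process whose rate depends only on their clusters, then, for known cluster labels, the maximum likelihood rate for clusters (k, l) is the event count divided by the number of ordered pairs times the observation length. The homogeneous-rate assumption, the known labels and the event format are all illustrative; the paper's model estimates the clusters and may use richer intensities. Cluster-specific action rates (scoring, rebounding, and so on) could be estimated the same way from per-player counts.

```python
import numpy as np

def block_rate_mle(events, z, K, T):
    """MLE of constant block interaction rates, assuming each ordered pair (i, j),
    i != j, emits events as a Poisson process with rate lam[z[i], z[j]] on [0, T].

    events: iterable of (sender, receiver, time); z: cluster label per player.
    Returns a K x K array: count / (number of ordered pairs * T).
    """
    z = np.asarray(z)
    counts = np.zeros((K, K))
    for i, j, _ in events:
        counts[z[i], z[j]] += 1
    sizes = np.bincount(z, minlength=K)
    pairs = np.outer(sizes, sizes) - np.diag(sizes)   # exclude self-pairs
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(pairs > 0, counts / (pairs * T), 0.0)
```

For example, with z = [0, 0, 1, 1, 1] and T = 4, two passes within cluster 0 give a rate of 2 / (2 × 4) = 0.25 per ordered pair per unit time.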
The real stress field in an area associated with earthquake generation cannot be directly observed. To address this, we apply hidden semi-Markov models (HSMMs) to strong earthquake occurrence in the North and South Aegean Sea, treating the stress field as the hidden process. The advantage of HSMMs over hidden Markov models (HMMs) is that they allow an arbitrary distribution for the sojourn times. Poisson, Logarithmic and Negative Binomial sojourn distributions, as well as different model dimensions, are tested. Parameter estimation is carried out via the EM algorithm. For decoding, a new Viterbi algorithm with a simple form is applied to detect precursory phases (hidden stress variations) and to warn of anticipated earthquake occurrences. The optimal HSMM provides an alarm period for 70 out of 88 events. HMMs are also studied and give poorer results than HSMMs. Bootstrap standard errors and confidence intervals for the parameters are evaluated, and the forecasting ability of the Poisson models is examined.
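The difference between HMMs and HSMMs lies in the sojourn-time distribution. In an HMM, the time spent in a state with self-transition probability a is geometric, P(D = d) = (1 - a)a^(d-1), whose mode is always d = 1; an HSMM can instead use, for example, Poisson, logarithmic or negative binomial sojourns. The sketch below writes these pmfs on d = 1, 2, ...; shifting the Poisson and negative binomial by one so that every visit lasts at least one step is an assumption here, as the parameterizations used in the study are not stated above.

```python
import math

def geometric_sojourn(d, a):
    """Implicit HMM sojourn: P(D = d) = (1 - a) * a**(d - 1), d >= 1."""
    return (1 - a) * a ** (d - 1)

def poisson_sojourn(d, lam):
    """Shifted Poisson: D - 1 ~ Poisson(lam), d >= 1 (computed on the log scale)."""
    return math.exp(-lam + (d - 1) * math.log(lam) - math.lgamma(d))

def logarithmic_sojourn(d, theta):
    """Logarithmic series: P(D = d) = -theta**d / (d * log(1 - theta)), d >= 1."""
    return -theta ** d / (d * math.log(1 - theta))

def negbin_sojourn(d, r, p):
    """Shifted negative binomial: D - 1 = number of failures before the r-th success."""
    k = d - 1
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef) * p ** r * (1 - p) ** k
```

For instance, a shifted Poisson sojourn with lam = 4 puts far more mass on d = 5 than on d = 1, a shape that no geometric sojourn can reproduce.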
We consider methods for estimating the treatment effect and/or the covariate-by-treatment interaction effect in a randomized clinical trial with noncompliance and a time-to-event outcome. Following Cuzick et al. (2007), we assume that the patient population consists of three (possibly latent) subgroups based on treatment preference (the ambivalent group, the insisters, and the refusers), and we estimate the effects within the ambivalent group. The parameters have causal interpretations under standard assumptions. The article makes two main contributions. First, we propose a weighted per-protocol (Wtd PP) estimator that incorporates time-varying weights in a proportional hazards model. Second, under the model considered in Cuzick et al. (2007), we propose an EM algorithm to maximize the full likelihood (FL) as well as the pseudo-likelihood (PL) considered in Cuzick et al. (2007). The E-step of the algorithm involves computing the conditional expectation of a linear function of the latent membership, and the main advantage of the EM algorithm is that the risk parameters can be updated by fitting a weighted Cox model using standard software, while the baseline hazard can be updated in closed form. Simulations show that the EM algorithm is computationally much more efficient than directly maximizing the observed likelihood. The main advantage of the Wtd PP approach is that it is more robust to model misspecification among the insisters and refusers, since the outcome model imposes no distributional assumptions on these two groups.
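A hedged sketch of the kind of E-step described above, assuming the usual preference-based compliance structure: ambivalent patients take the treatment they are assigned, insisters always take treatment, and refusers never do. A patient's observed pair (assigned arm R, treatment received A) then rules some groups out, and in the two ambiguous cells the posterior membership probability is proportional to the prior group proportion times that group's outcome likelihood contribution f(t)^δ S(t)^(1-δ). How the likelihood contributions are computed is left abstract, and the function name is hypothetical.

```python
import numpy as np

def membership_weights(R, A, lik_amb, lik_ins, lik_ref, pi):
    """Posterior probabilities of (ambivalent, insister, refuser) membership.

    R, A: assigned arm and treatment actually received (0/1 arrays).
    lik_*: each patient's outcome likelihood contribution, f(t)**delta * S(t)**(1 - delta),
           under that group's model and the treatment actually received.
    pi: prior group proportions (pi_amb, pi_ins, pi_ref).
    """
    R, A = np.asarray(R), np.asarray(A)
    # A group is compatible with (R, A) only if its behaviour can produce A.
    compat_amb = A == R       # ambivalent patients comply with assignment
    compat_ins = A == 1       # insisters always take treatment
    compat_ref = A == 0       # refusers never take treatment
    num = np.column_stack([
        pi[0] * lik_amb * compat_amb,
        pi[1] * lik_ins * compat_ins,
        pi[2] * lik_ref * compat_ref,
    ])
    return num / num.sum(axis=1, keepdims=True)
```

Patients assigned to treatment who refuse it, or assigned to control who obtain it, receive degenerate weights; the resulting first column could then serve as case weights in a weighted Cox fit during the M-step.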