The receiver operating characteristic (ROC) curve is widely applied in measuring the performance of diagnostic tests. Many direct and indirect approaches have been proposed for modelling the ROC curve and, because of its tractability, the Gaussian distribution has typically been used to model both diseased and non-diseased populations. Using a Gaussian mixture model leads to a more flexible approach that better accounts for atypical data. The Monte Carlo method can be used to circumvent the absence of a closed-form expression for the ROC curve. The proposed method, in which a Gaussian mixture is used in conjunction with the Monte Carlo method, performs favourably when compared to the crude binormal curve and the semi-parametric frequentist binormal ROC using the well-known LABROC procedure. (C) 2015 Elsevier B.V. All rights reserved.
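As a rough illustration of the idea in this abstract, the sketch below approximates an ROC curve by Monte Carlo when both populations are Gaussian mixtures. It is not the authors' procedure: the mixture weights, means, and standard deviations are made-up values, the helper names `sample_mixture` and `monte_carlo_roc` are hypothetical, and the mixtures are taken as known rather than fitted to data.

```python
import numpy as np

def sample_mixture(rng, weights, means, sds, n):
    """Draw n values from a univariate Gaussian mixture."""
    comp = rng.choice(len(weights), size=n, p=weights)  # component labels
    return rng.normal(np.asarray(means)[comp], np.asarray(sds)[comp])

def monte_carlo_roc(healthy, diseased, fpr_grid):
    """Empirical ROC from simulated scores.

    For each target false positive rate, the threshold is the matching
    upper quantile of the non-diseased scores; the true positive rate is
    the share of diseased scores above that threshold.
    """
    thresholds = np.quantile(healthy, 1.0 - fpr_grid)
    return np.array([(diseased > c).mean() for c in thresholds])

rng = np.random.default_rng(0)
# Illustrative (assumed) mixture parameters for the two populations.
healthy = sample_mixture(rng, [0.7, 0.3], [0.0, 1.5], [1.0, 0.5], 100_000)
diseased = sample_mixture(rng, [0.5, 0.5], [2.0, 4.0], [1.0, 1.0], 100_000)

fpr_grid = np.linspace(0.0, 1.0, 101)
tpr = monte_carlo_roc(healthy, diseased, fpr_grid)

# Area under the curve by the trapezoidal rule.
auc = float(np.sum(np.diff(fpr_grid) * (tpr[1:] + tpr[:-1]) / 2))
print(f"Monte Carlo AUC estimate: {auc:.3f}")
```

Increasing the number of simulated draws reduces the Monte Carlo error of the curve, at the cost of more computation.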
When a sample is obtained from a two-stage cluster sampling scheme with unequal selection probabilities, the sample distribution can differ from that of the population and the sampling design can be informative. In this case, making valid inference under generalized linear mixed models can be quite challenging. We propose a novel approach for parameter estimation using an EM algorithm based on the approximate predictive distribution of the random effect. In the approximate predictive distribution, instead of using the intractable sample likelihood function, we use a normal approximation of the sampling distribution of the profile pseudo maximum likelihood estimator of the random effects in the level-one model. Two limited simulation studies show that the proposed method using the normal approximation performs well for modest cluster sizes. The proposed method is applied to real data from the 2011 Private Education Expenditures Survey (PEES) in Korea. The Canadian Journal of Statistics 45: 479-497; 2017. (c) 2017 Statistical Society of Canada
This article proposes the elliptical multivariate leptokurtic-normal (MLN) distribution to fit data with excess kurtosis. The MLN distribution is a multivariate Gram-Charlier expansion of the multivariate normal (MN) distribution and has a closed-form representation characterized by one additional parameter denoting the excess kurtosis. It is obtained from the elliptical representation of the MN distribution by reshaping its generating variate with the associated orthogonal polynomials. The strength of this approach lies in its general applicability: it can be applied to any multivariate elliptical law to obtain a suitable distribution to fit data. Maximum likelihood is discussed as a parameter estimation technique for the MLN distribution. Mixtures of MLN distributions are also proposed for robust model-based clustering. An EM algorithm is presented to obtain estimates of the mixture parameters. Benchmark real data are used to show the usefulness of mixtures of MLN distributions. (C) 2016 Statistical Society of Canada
The diagonal method (DM) is an innovative technique to obtain trustworthy survey data on an arbitrary categorical sensitive characteristic Y* (e.g., income classes, number of tax evasions). The estimation of the unconditional distribution of Y* from DM data has already been shown. Now, a covariate extension of the DM, that is, a set of methods to investigate the dependence of Y* on nonsensitive covariates, is sought. For instance, the dependence of income on gender and profession may be under study. The covariate extensions of privacy-protecting survey designs are broadened by the covariate DM, especially because existing methods focus on binary Y*. LR-DM estimation and stratum-wise estimation are described, where the former is based on a logistic regression model, leads to a generalized linear model, and requires computer-intensive methods. The existence of a certain regression estimate is investigated. Moreover, the connection between the efficiency of the LR-DM estimation and the degree of privacy protection is studied, and appropriate model parameters of the DM are sought. This problem of finding suitable model parameters is rarely addressed for privacy-protecting survey methods for multicategorical Y*. Finally, the LR-DM estimation is compared with the stratum-wise estimation. MATLAB programs that conduct the presented estimations are provided as supplemental material (see Appendix E). (C) 2016 Elsevier B.V. All rights reserved.
In this paper, the destructive negative binomial (DNB) cure rate model with a latent activation scheme [V. Cancho, D. Bandyopadhyay, F. Louzada, and B. Yiqi, The DNB cure rate model with a latent activation scheme, Statistical Methodology 13 (2013b), pp. 48-68] is extended to the case where the observations are grouped into clusters. Parameter estimation is performed based on the restricted maximum likelihood approach and on a Bayesian approach based on Dirichlet process priors. An application to a real data set related to a sealant study in a dentistry experiment is considered to illustrate the performance of the proposed model.
We consider a discrete latent variable model for two-way data arrays, which allows one to simultaneously produce clusters along one of the data dimensions (e.g., exchangeable observational units or features) and contiguous groups, or segments, along the other (e.g., consecutively ordered times or locations). The model relies on a hidden Markov structure but, given its complexity, cannot be estimated by full maximum likelihood. Therefore, we introduce a composite likelihood methodology based on considering different subsets of the data. The proposed approach is illustrated by simulation, and with an application to genomic data.
In this paper, we develop a bivariate discrete generalized exponential distribution, whose marginals are discrete generalized exponential distributions as proposed by Nekoukhou, Alamatsaz and Bidram [Discrete generalized exponential distribution of a second type. Statistics. 2013;47:876-887]. It is observed that the proposed bivariate distribution is very flexible and that the bivariate geometric distribution can be obtained as a special case. The proposed distribution can be seen as a natural discrete analogue of the bivariate generalized exponential distribution proposed by Kundu and Gupta [Bivariate generalized exponential distribution. J Multivariate Anal. 2009;100:581-593]. We study different properties of this distribution and explore its dependence structures. We propose a new EM algorithm, which can be implemented very efficiently, to compute the maximum-likelihood estimators of the unknown parameters, and we also discuss some inferential issues. The analysis of one data set has been performed to show the effectiveness of the proposed model. Finally, we propose some open problems and conclude the paper.
Data extracted from air quality monitoring can require spatiotemporal clustering techniques. Of late, many clustering techniques are based on mixture models; however, there is a shortage of model-based approaches for spatiotemporal data. A new mixture to cluster spatiotemporal data, named STM, is introduced, and generic identifiability is proved. The resulting model defines each mixture component as a mixture of autoregressive polynomial regressions in which the weights consider the spatial and temporal information with logistic links. Under the maximum likelihood framework, parameter estimation is carried out via an expectation-maximization algorithm while classical information criteria can be used for model selection. The proposed model is applied to air quality monitoring data from the periphery of Paris considering one of the critical pollutants, nitrogen dioxide, at different times during the day. The STM model is implemented in the R package SpaTimeClust.
In this article, we consider a lifetime distribution, the Weibull-Logarithmic distribution introduced by [6]. We investigate some new statistical characterizations and properties. We develop maximum likelihood inference using the EM algorithm. Asymptotic properties of the MLEs are obtained and extensive simulations are conducted to assess the performance of parameter estimation. A numerical example is used to illustrate the application.
We consider a mixture model with latent Bayesian network (MLBN) for a set of random vectors $X^{(t)} \in \mathbb{R}^{d_t}$, $t = 1, \ldots, T$. Each $X^{(t)}$ is associated with a latent state $s^{(t)}$, given which $X^{(t)}$ is conditionally independent of the other variables. The joint distribution of the states is governed by a Bayes net. Although specific types of MLBN have been used in diverse areas such as biomedical research and image analysis, the exact expectation-maximization (EM) algorithm for estimating the models can involve visiting all the combinations of states, yielding exponential complexity in the network size. A prominent exception is the Baum-Welch algorithm for the hidden Markov model, where the underlying graph topology is a chain. We hereby develop a new Baum-Welch algorithm on directed acyclic graph (BW-DAG) for the general MLBN and prove that it is an exact EM algorithm. BW-DAG provides insight on the achievable complexity of EM. For a tree graph, the complexity of BW-DAG is much lower than that of the brute-force EM. Copyright (c) 2017 John Wiley & Sons, Ltd.
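To make the complexity remark concrete, the sketch below shows the E-step of the classical Baum-Welch algorithm for the chain case only (an ordinary hidden Markov model). It is not BW-DAG. The function name `forward_backward` and the toy parameters are assumptions for illustration. The recursion costs on the order of T·K² operations for T time steps and K states, whereas enumerating every state sequence costs on the order of K^T.

```python
import numpy as np

def forward_backward(pi, A, B):
    """Posterior state marginals for a chain HMM (scaled recursions).

    pi : (K,)   initial state probabilities
    A  : (K, K) transitions, A[i, j] = P(s_{t+1} = j | s_t = i)
    B  : (T, K) emission likelihoods, B[t, k] = p(x_t | s_t = k)
    Returns gamma (T, K) with gamma[t, k] = P(s_t = k | x_1..x_T),
    and the log-likelihood of the observations.
    """
    T, K = B.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    scale = np.zeros(T)

    # Forward pass, normalising each step to avoid numerical underflow.
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling constants.
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta
    return gamma, float(np.log(scale).sum())

# Toy two-state example with assumed values.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.2],
              [0.1, 0.7],
              [0.8, 0.3]])
gamma, loglik = forward_backward(pi, A, B)
print(gamma)
print(loglik)
```

In a full Baum-Welch iteration these posteriors, together with the pairwise posteriors of consecutive states, would feed the M-step updates of the initial, transition, and emission parameters.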