Often, the functional form of covariate effects in an additive model varies across groups defined by levels of a categorical variable. This structure represents a factor-by-curve interaction. This article presents penalized spline models that incorporate factor-by-curve interactions into additive models. A mixed-model formulation for penalized splines allows for straightforward model fitting and smoothing parameter selection. We illustrate the proposed model by applying it to ragweed pollen data in which seasonal trends vary by year.
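A minimal sketch of the kind of model described, in my own notation rather than the article's: with a truncated-line spline basis and knots $\kappa_1 < \dots < \kappa_K$, a factor-by-curve interaction lets the smooth term depend on the group $g_i$ of observation $i$,

$$
y_i = f_{g_i}(x_i) + \varepsilon_i, \qquad
f_g(x) = \beta_{0g} + \beta_{1g}\,x + \sum_{k=1}^{K} u_{gk}\,(x - \kappa_k)_+,
$$

with $u_{gk} \sim N(0, \sigma^2_{u,g})$ and $\varepsilon_i \sim N(0, \sigma^2_\varepsilon)$. Treating the spline coefficients as random effects turns each group's smoothing parameter into the variance ratio $\sigma^2_\varepsilon / \sigma^2_{u,g}$, so smoothing parameter selection reduces to (RE)ML estimation in standard mixed-model software.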
We explore a hierarchical generalized latent factor model for discrete and bounded response variables, in particular binomial responses. Specifically, we develop a novel two-step estimation procedure, and the corresponding statistical inference, that is computationally efficient and scalable to high dimensions in terms of both the number of subjects and the number of features per subject. We also establish the validity of the estimation procedure, particularly the asymptotic properties of the estimated effect size and the latent structure, as well as of the estimated number of latent factors. The results are corroborated by a simulation study, and, for illustration, the proposed methodology is applied to analyze a dataset from a gene-environment association study.
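One concrete form such a model can take (an assumed specification for illustration, not necessarily the authors' exact one): for subject $i$ and feature $j$,

$$
Y_{ij} \sim \mathrm{Binomial}(n_{ij}, \pi_{ij}), \qquad
\mathrm{logit}(\pi_{ij}) = x_i^\top \beta_j + u_i^\top v_j,
$$

where $\beta_j$ collects the feature-specific effect sizes, $u_i \in \mathbb{R}^K$ is the subject's latent factor vector, $v_j$ is the corresponding loading, and the number of factors $K$ is itself to be estimated.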
A multilevel model for ordinal data in the generalized linear mixed models (GLMM) framework is developed to account for the inherent dependencies among observations within clusters. Motivated by a data set from the British Social Attitudes Panel Survey (BSAPS), random district effects and respondent effects are incorporated into the linear predictor to accommodate the nested clustering. The fixed (random) effects are estimated (predicted) by maximizing the penalized quasi-likelihood (PQL) function, whereas the variance component parameters are obtained via the restricted maximum likelihood (REML) estimation method. The model is employed to analyze the BSAPS data. Simulation studies are conducted to assess the performance of the estimators. (C) 2015 Elsevier B.V. All rights reserved.
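A cumulative-logit sketch of such a model, in my own notation: for respondent $j$ nested in district $i$, with the response at occasion $k$ falling in ordered category $c$,

$$
\mathrm{logit}\, P(Y_{ijk} \le c) = \theta_c - \big(x_{ijk}^\top \beta + u_i + v_{ij}\big), \qquad
u_i \sim N(0, \sigma_u^2), \quad v_{ij} \sim N(0, \sigma_v^2),
$$

with ordered thresholds $\theta_1 < \dots < \theta_{C-1}$. PQL maximizes a Laplace-type approximation to the integrated likelihood jointly in $\beta$ and the random effects, and REML is then applied to the working linear mixed model to estimate $\sigma_u^2$ and $\sigma_v^2$.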
In the Bayesian stochastic search variable selection framework, a common prior distribution for the regression coefficients is the g-prior of Zellner. However, there are two standard cases where the associated covariance matrix does not exist and the conventional prior of Zellner cannot be used: if the number of observations is lower than the number of variables (the large p, small n paradigm), or if some variables are linear combinations of others. In such situations, a prior distribution derived from the prior of Zellner can be considered by introducing a ridge parameter. This prior is a flexible and simple adaptation of the g-prior, and its influence on the selection of variables is studied. A simple way to choose the associated hyper-parameters is proposed. The method is valid for any generalized linear mixed model, and particular attention is paid to the study of probit mixed models when some variables are linear combinations of others. The method is applied to both simulated and real datasets obtained from Affymetrix microarray experiments. Results are compared to those obtained with the Bayesian Lasso. (c) 2011 Elsevier B.V. All rights reserved.
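A sketch of the ridge-type adaptation being described, under my own parameterization: for a candidate model $\gamma$ with design matrix $X_\gamma$, Zellner's g-prior $\beta_\gamma \sim N\big(0,\; g\,\sigma^2 (X_\gamma^\top X_\gamma)^{-1}\big)$ is unavailable when $X_\gamma^\top X_\gamma$ is singular, and one can instead use

$$
\beta_\gamma \sim N\!\big(0,\; g\,\sigma^2 (X_\gamma^\top X_\gamma + \lambda I)^{-1}\big), \qquad \lambda > 0,
$$

so that the covariance matrix always exists; the conventional g-prior is recovered as $\lambda \to 0$ whenever the inverse is defined.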
A method for modeling survival data with multilevel clustering is described. The Cox partial likelihood is incorporated into the generalized linear mixed model (GLMM) methodology. Parameter estimation is achieved by maximizing a log likelihood analogous to the likelihood associated with the best linear unbiased prediction (BLUP) at the initial step of estimation and is extended to obtain residual maximum likelihood (REML) estimators of the variance component. Estimating equations for a three-level hierarchical survival model are developed in detail, and such a model is applied to analyze a set of chronic granulomatous disease (CGD) data on recurrent infections as an illustration with both hospital and patient effects being considered as random. Only the latter gives a significant contribution. A simulation study is carried out to evaluate the performance of the REML estimators. Further extension of the estimation procedure to models with an arbitrary number of levels is also discussed.
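A sketch of the three-level proportional hazards structure, in my own notation: for a recurrence time of patient $j$ in hospital $i$,

$$
h_{ij}(t) = h_0(t)\,\exp\!\big(x_{ij}^\top \beta + u_i + v_{ij}\big), \qquad
u_i \sim N(0, \sigma_u^2), \quad v_{ij} \sim N(0, \sigma_v^2),
$$

where $u_i$ is the hospital-level and $v_{ij}$ the patient-level random effect (log-frailty). The BLUP-type objective at the initial estimation step is then the Cox partial log likelihood penalized by the normal log densities of the random effects, maximized jointly over $\beta$, $u$, and $v$.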
In a 1992 Technometrics paper, Lambert (1992, 34, 1-14) described zero-inflated Poisson (ZIP) regression, a class of models for count data with excess zeros. In a ZIP model, a count response variable is assumed to be distributed as a mixture of a Poisson(lambda) distribution and a distribution with point mass of one at zero, with mixing probability p. Both p and lambda are allowed to depend on covariates through canonical-link generalized linear models. In this paper, we adapt Lambert's methodology to an upper-bounded count situation, thereby obtaining a zero-inflated binomial (ZIB) model. In addition, we add to the flexibility of these fixed-effects models by incorporating random effects so that, e.g., the within-subject correlation and between-subject heterogeneity typical of repeated measures data can be accommodated. We motivate, develop, and illustrate the methods described here with an example from horticulture, where both upper-bounded count (binomial-type) and unbounded count (Poisson-type) data with excess zeros were collected in a repeated measures designed experiment.
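To make the ZIB mixture concrete, here is a minimal fixed-effects sketch of its likelihood fitted by direct optimization (my own illustration, not the authors' code; the random-effects extension is omitted and all variable names are hypothetical):

```python
# Minimal fixed-effects zero-inflated binomial (ZIB) sketch -- illustrative only.
# logit(pi) = X beta governs the binomial counts; logit(p) = Z gamma governs the
# probability of a structural zero.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def zib_negloglik(theta, y, n, X, Z):
    """Negative log-likelihood of a ZIB model with parameters theta = (beta, gamma)."""
    k = X.shape[1]
    beta, gamma = theta[:k], theta[k:]
    pi = np.clip(expit(X @ beta), 1e-10, 1 - 1e-10)   # binomial success probability
    p = np.clip(expit(Z @ gamma), 1e-10, 1 - 1e-10)   # structural-zero probability
    log_binom = (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
                 + y * np.log(pi) + (n - y) * np.log1p(-pi))
    ll = np.where(
        y == 0,
        np.log(p + (1 - p) * (1 - pi) ** n),  # zeros arise from either component
        np.log1p(-p) + log_binom,             # positive counts come from the binomial
    )
    return -ll.sum()

# Toy usage with simulated data.
rng = np.random.default_rng(0)
m, n_trials = 500, 10
X = np.column_stack([np.ones(m), rng.normal(size=m)])
Z = X.copy()
pi_true = expit(X @ np.array([-0.5, 1.0]))
p_true = expit(Z @ np.array([-1.0, 0.5]))
y = np.where(rng.random(m) < p_true, 0, rng.binomial(n_trials, pi_true))
n = np.full(m, n_trials)
fit = minimize(zib_negloglik, x0=np.zeros(4), args=(y, n, X, Z), method="BFGS")
print(np.round(fit.x, 2))  # estimates of (beta0, beta1, gamma0, gamma1)
```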
In epidemiological research, outcomes are frequently non-normal, sample sizes may be large, and effect sizes are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. I focus on binary outcomes, with the risk surface a smooth function of space, but the development herein is relevant for non-normal data in general. I compare penalized likelihood (PL) models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis (SB) representation of the spatial surface via the Fourier basis provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being reasonably computationally efficient. One of the contributions of this work is further development of this underused representation. The SB model outperforms the PL methods, which are prone to overfitting, but is slower to fit and not as easily implemented. A Bayesian Markov random field model performs less well statistically than the SB model, but is very computationally efficient. We illustrate the methods on a real data set of cancer cases in Taiwan. The success of the SB with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models. (c) 2006 Elsevier B.V. All rights reserved.
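One way to write the spectral-basis construction being advocated, as I understand it (a sketch, not the paper's exact notation): for a binary outcome at location $s_i$,

$$
\mathrm{logit}\, P(Y_i = 1) = x_i^\top \beta + g(s_i), \qquad
g(s) = \sum_{m} u_m \psi_m(s),
$$

where the $\psi_m$ are two-dimensional Fourier basis functions on a grid covering the study region and the coefficients $u_m$ have independent mean-zero normal priors with variances taken from the spectral density of a smooth covariance function, so the prior on $g$ approximates a stationary Gaussian process while FFT-based computations keep the MCMC updates fast.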
Disease mapping is an important area of statistical research. Contributions to the area over the last twenty years have been instrumental in helping to pinpoint potential causes of mortality and to provide a strategy for effective allocation of health funding. Because of the complexity of spatial analyses, new developments in methodology have not generally found application at Vital Statistics agencies. Inference for spatio-temporal analyses remains computationally prohibitive for the routine preparation of mortality atlases. This paper considers whether approximate methods of inference are reliable for mapping studies, especially in terms of providing accurate estimates of relative risks, ranks of regions, and standard errors of risks. These approximate methods lie in the broader realm of approximate inference for generalized linear mixed models. Penalized quasi-likelihood is specifically considered here. The main focus is on assessing how close the penalized quasi-likelihood estimates are to target values, by comparison with the more rigorous and widespread Bayesian Markov chain Monte Carlo methods. No previous studies have compared these two methods. The quantities of prime interest are small-area relative risks and the estimated ranks of the risks, which are often used for ordering the regions. It will be shown that penalized quasi-likelihood is a reasonably accurate method of inference and can be recommended as a simple yet quite precise method for initial exploratory studies. (C) 2005 Elsevier B.V. All rights reserved.
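For orientation, the generic small-area model that both PQL and MCMC are used to fit in this setting typically has the form (standard notation, not quoted from the paper)

$$
O_i \sim \mathrm{Poisson}(E_i\,\rho_i), \qquad
\log \rho_i = \mu + u_i + v_i,
$$

where $O_i$ and $E_i$ are the observed and expected counts in region $i$, $\rho_i$ is the relative risk, $v_i$ is an unstructured heterogeneity effect, and $u_i$ is a spatially structured (e.g. CAR) effect. PQL avoids integrating over $(u, v)$ by iterating a working linear mixed model, which is what makes it attractive for routine atlas production.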
The conditional autoregressive (CAR) model is widely used to describe the geographical distribution of a specific disease risk in lattice mapping. Successful developments based on frequentist and Bayesian procedures have been extensively applied to obtain two-stage disease risk predictions at the subregional level. Bayesian procedures are preferred for making inferences, as the posterior standard errors (SE) of the two-stage prediction account for the variability in the variance component estimates; however, some recent work based on frequentist procedures and the use of bootstrap adjustments for the SE has been undertaken. In this article we investigate the suitability of an analytical adjustment for disease risk inference that provides accurate interval predictions by using the penalized quasi-likelihood (PQL) technique to obtain model parameter estimates. The method is a first-order approximation of the naive SE based on a Taylor expansion and is interpreted as a conditional measure of variability, providing conditional calibrated prediction intervals given the data. We conduct a simulation study to demonstrate how the method can be used to estimate the specific subregion risk by interval. We evaluate the proposed methodology by analyzing the commonly used example data set of lip cancer incidence in the 56 counties of Scotland for the period 1975-1980. This evaluation reveals a close similarity between the solutions provided by the method proposed here and those of its fully Bayesian counterpart.
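For reference, a common (intrinsic) form of the CAR prior in lattice disease mapping is, in standard notation,

$$
u_i \mid u_{-i} \sim N\!\Big(\frac{1}{m_i}\sum_{j \sim i} u_j,\ \frac{\sigma_u^2}{m_i}\Big),
$$

where $j \sim i$ runs over the $m_i$ neighbouring counties of county $i$ and $u_i$ enters the log relative risk of county $i$. The analytical adjustment discussed above concerns the SE attached to the predicted $u_i$, inflating the naive value to reflect the fact that $\sigma_u^2$ itself has been estimated.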
Count data are usually modeled using the Poisson generalized linear model. The Poisson model requires that the variance be a deterministic function of the mean. This assumption may not be met for a particular data set, that is, the model may not adequately capture the variability in the data. The extra variability in the data may be accommodated using overdispersion models, such as the negative binomial distribution. In addition to the overdispersion, outliers may be present in the data, as indicated by the model residuals or some functions of the model residuals. A variance shift outlier model (VSOM) for count data is introduced. The model is used to detect potential outliers in the data and to down-weight them in the analysis if desired. In this model the overdispersion is modeled using an observation-specific random effect. The status of a given observation as an outlier is indicated by the size of the associated shift in variance for that observation. The model is then extended to longitudinal count data for the detection of outliers at the subject level. We illustrate the methodology using a real data set taken from the literature. Extensions of the VSOM for count data to other non-normal responses are discussed. (C) 2014 Elsevier B.V. All rights reserved.
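A sketch of the variance-shift idea for counts, in my own notation: for observation $i$,

$$
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = x_i^\top \beta + b_i, \qquad
b_i \sim N(0, \omega_i\,\sigma_b^2),
$$

where the observation-specific random effect $b_i$ captures the overdispersion and $\omega_i \ge 1$ is the variance-shift factor for observation $i$. An estimated $\omega_i$ well above 1 flags the observation as a potential outlier, and conditioning on the fitted shift down-weights its influence on $\beta$.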