Many research fields involve count data with zero inflation. A commonly chosen model for analysing a relationship between predictors and a response variable in these scenarios is a zero-inflated generalized linear model (GLM). This model is a mixture of a count-based GLM and a zero-inflation component, with a mixing proportion that determines the amount of excess zeroes. As the use of zero-inflated count models is rising, it is important to be able to conduct a power analysis to properly design studies with such models. In this paper, we propose a flexible method for power analysis with zero-inflated count models using Monte Carlo simulation. We have created the R package ZIPowerAnalysis, which can be used to easily conduct a power analysis for any designed study that will incorporate a zero-inflated count GLM.
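The idea of simulation-based power analysis for a zero-inflated count model can be illustrated with a minimal sketch. It is not the ZIPowerAnalysis package and does not reproduce its API: the function names (`simulate_zip`, `zip_negloglik`, `lrt_pvalue`, `estimate_power`), the single-predictor zero-inflated Poisson model with a log link, and the likelihood-ratio test are all assumptions made here for illustration.

```python
import numpy as np
from scipy import optimize, stats
from scipy.special import expit, gammaln


def simulate_zip(x, beta0, beta1, pi_zero, rng):
    """Draw zero-inflated Poisson counts with a log link on the count mean."""
    counts = rng.poisson(np.exp(beta0 + beta1 * x))
    structural_zero = rng.random(x.shape[0]) < pi_zero  # excess zeros
    return np.where(structural_zero, 0, counts)


def zip_negloglik(params, x, y, with_slope):
    """Negative ZIP log-likelihood; params = (beta0, [beta1], logit of pi)."""
    beta1 = params[1] if with_slope else 0.0
    # Clipping keeps the optimizer away from overflow during line searches.
    eta = np.clip(params[0] + beta1 * x, -30.0, 30.0)
    pi_zero = expit(np.clip(params[-1], -30.0, 30.0))
    lam = np.exp(eta)
    ll_zero = np.log(pi_zero + (1.0 - pi_zero) * np.exp(-lam))
    ll_count = np.log1p(-pi_zero) + y * eta - lam - gammaln(y + 1.0)
    return -np.sum(np.where(y == 0, ll_zero, ll_count))


def lrt_pvalue(x, y):
    """Likelihood-ratio test of beta1 = 0 in the count component."""
    null = optimize.minimize(zip_negloglik, np.zeros(2),
                             args=(x, y, False), method="BFGS")
    # Start the full fit at the null solution so it cannot fit worse.
    start = np.array([null.x[0], 0.0, null.x[1]])
    full = optimize.minimize(zip_negloglik, start,
                             args=(x, y, True), method="BFGS")
    stat = max(2.0 * (null.fun - full.fun), 0.0)
    return stats.chi2.sf(stat, df=1)


def estimate_power(n, beta0, beta1, pi_zero, n_sims=200, alpha=0.05, seed=0):
    """Monte Carlo power: share of simulated studies that reject beta1 = 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        y = simulate_zip(x, beta0, beta1, pi_zero, rng)
        rejections += lrt_pvalue(x, y) < alpha
    return rejections / n_sims
```

A call such as `estimate_power(n=200, beta0=0.5, beta1=0.5, pi_zero=0.3)` returns the estimated rejection rate for that design; repeating it over a grid of sample sizes gives a rough power curve.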
Bayesian modeling provides a principled approach to quantifying uncertainty and has seen a surge of applications in recent years. Within the context of a Bayesian workflow, we are concerned with model selection for the purpose of finding models that best explain the data or underlying data generating process. Since insight into the true process is rare, what remains is incomplete causal knowledge and model predictions of the data. This leads to the important question of when the use of prediction as a proxy for explanation for the purpose of model selection is valid. We approach this question by means of large-scale simulations of Bayesian generalized linear models where we investigate various causal and statistical misspecifications. Our results indicate that the use of prediction as a proxy for explanation is valid and safe if the models under consideration are sufficiently consistent with the underlying causal structure of the true data generating process.
Consider the following generalized linear model (GLM) $y_i = h(x_i^T \beta) + e_i$, $i = 1, 2, \ldots, n$, where $h(\cdot)$ is a continuously differentiable function and $\{e_i\}$ are independent identically distributed (i.i.d.) random variables with zero mean and known variance $\sigma^2$. Based on the penalized Lq-likelihood method of linear regression models, we apply the method to the GLM, and also investigate the oracle properties of the penalized Lq-likelihood estimator (PLqE). In order to show the robustness of the PLqE, we discuss the influence function of the PLqE. Simulation results support the validity of our approach. Furthermore, it is shown that the PLqE is robust, while the penalized maximum likelihood estimator is not.
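For orientation, the Lq-likelihood replaces the logarithm in the log-likelihood with a power-type function. The block below gives the usual definition and one plausible form of the penalized estimator. The error density $f$ and the penalty $p_\lambda$ are assumptions introduced here; the abstract specifies neither, and the paper's exact objective may differ.

```latex
% Lq function: reduces to the logarithm in the limit q -> 1
L_q(u) =
\begin{cases}
  \dfrac{u^{1-q} - 1}{1 - q}, & q \neq 1, \\[6pt]
  \log u, & q = 1 .
\end{cases}

% Assumed form of the penalized Lq-likelihood estimator (PLqE),
% with f the density of e_i and p_\lambda a sparsity penalty
\hat{\beta} = \arg\max_{\beta}
  \left\{ \sum_{i=1}^{n} L_q\!\left( f\!\left( y_i - h(x_i^T \beta) \right) \right)
  - n \sum_{j} p_\lambda\!\left( |\beta_j| \right) \right\}
```

With $q = 1$ the first term is the ordinary log-likelihood, so the objective reduces to penalized maximum likelihood.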
In order to break the constraints and barriers caused by limited computing power in processing massive datasets, we propose an outcome-dependent subsampling divide-and-conquer strategy in this paper. The proposed strategy can process data on multiple blocks in parallel and concentrate the computing resources of each block on regions with the most information. We develop a distributed statistical inference method and propose a computation-efficient algorithm in the generalized linear models for massive data. The proposed method only needs to preserve some summary statistics from each data block and then uses them to directly construct the proposed estimator. The asymptotic properties of the proposed method are established. Simulation studies and real data analysis are conducted to illustrate the merits of the proposed method.
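The summary-statistics idea can be sketched for the simplest case, a linear model (the Gaussian GLM with identity link). This sketch is a deliberate simplification: it omits the outcome-dependent subsampling step entirely, and the function names `block_summaries` and `combine_blocks` are hypothetical. Each block keeps only a p-by-p matrix and a p-vector, and the combined estimate matches ordinary least squares on the pooled data.

```python
import numpy as np


def block_summaries(X_block, y_block):
    """Per-block summary statistics: Gram matrix X'X and moment vector X'y."""
    return X_block.T @ X_block, X_block.T @ y_block


def combine_blocks(summaries):
    """Aggregate block summaries and solve the pooled normal equations."""
    gram = sum(g for g, _ in summaries)
    moment = sum(m for _, m in summaries)
    return np.linalg.solve(gram, moment)


def divide_and_conquer_ols(X, y, n_blocks):
    """Split the rows into blocks, summarize each block, then combine."""
    pieces = zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks))
    return combine_blocks([block_summaries(Xb, yb) for Xb, yb in pieces])
```

For non-Gaussian GLMs the block summaries are no longer exact sufficient statistics, which is where the approximation and the asymptotic theory come in.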
This paper examines the extent to which inappropriate source use - verbatim source use and patchwriting - can be predicted by scores of other textual features that are commonly evaluated in second/foreign language (L2) integrated writing assessment. 246 advanced-level English as a Foreign Language (EFL) test-takers enrolled in a Chinese higher education institution provided integrated essays that required both summary and argumentation. All the collected essays were rated by two experienced raters and checked for interrater reliability by way of generalizability theory. Then, a series of generalized linear models was compared to identify the best-fitting model that explained the relationship between the independent variables and inappropriate source use. Results indicated that the zero-inflated beta-binomial provided the best fit to the data, with approximately 43.67% of the data estimated to be an extra zero. Next, parameter estimates of this model included (1) non-significant effects of language use and source comprehension on inappropriate source use and (2) a significantly negative effect of organizational features on the dependent variable. This suggests that focusing on organizational features, operationalized herein as organization, coherence, development of ideas, and authorial voice, can help L2 test-takers reduce reliance on inappropriate source use. Implications for research and practice are discussed.
In this paper, we study the problem of estimating smooth generalized linear models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Unlike its classical setting, our model allows the server to access additional public but unlabeled data. In the first part of the paper, we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is $O(p\alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$, sample complexities of GLMs with no public data. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is $O(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, under some mild assumptions and if $\alpha$ is not too small (i.e., $\alpha \geq \Omega(1/\sqrt{p})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLMs and non-linear regressions in the NLDP model with unlabeled public
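The role of Stein's lemma can be illustrated without any privacy mechanism. For zero-mean Gaussian features with covariance $\Sigma$ and a smooth link $h$, the lemma gives $E[x\,y] = E[h'(x^T\beta)]\,\Sigma\beta$, so $\Sigma^{-1}E[x\,y]$ points in the direction of $\beta$. The sketch below is a non-private illustration under those assumptions and is not the paper's algorithm: no noise is added, and the function name `stein_direction` is hypothetical. It estimates $\Sigma$ from unlabeled public features and the cross-moment from labeled data.

```python
import numpy as np


def stein_direction(X_public, X_private, y_private):
    """Estimate a vector proportional to beta via Stein's lemma.

    X_public: unlabeled features, used only for the covariance estimate.
    X_private, y_private: labeled data, used only for the cross-moment.
    No privacy noise is added; this shows the identity, not an NLDP method.
    """
    cov = X_public.T @ X_public / X_public.shape[0]       # estimate of Sigma
    cross = X_private.T @ y_private / X_private.shape[0]  # estimate of E[x y]
    return np.linalg.solve(cov, cross)


def cosine(u, v):
    """Cosine similarity, used to compare directions up to scale."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

The estimate recovers $\beta$ only up to the scalar $E[h'(x^T\beta)]$, which is why the comparison with the true coefficient vector is made through the cosine of the angle between them.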
In many situations information from a sample of individuals can be supplemented by population level information on the relationship between a dependent variable and explanatory variables. Inclusion of the population level information can reduce bias and increase the efficiency of the parameter estimates. Population level information can be incorporated via constraints on functions of the model parameters. In general the constraints are non-linear, making the task of maximum likelihood estimation more difficult. We develop an alternative approach exploiting the notion of an empirical likelihood. It is shown that, within the framework of generalized linear models, the population level information corresponds to linear constraints, which are comparatively easy to handle. We provide a two-step algorithm that produces parameter estimates by using only unconstrained estimation. We also provide computable expressions for the standard errors. We give an application to demographic hazard modelling by combining panel survey data with birth registration data to estimate annual birth probabilities by parity.
Author:
James, GM (Univ So Calif, Informat & Operat Management Dept, Marshall Sch Business, Los Angeles, CA 90089, USA)
We present a technique for extending generalized linear models to the situation where some of the predictor variables are observations from a curve or function. The technique is particularly useful when only fragments of each curve have been observed. We demonstrate, on both simulated and real data sets, how this approach can be used to perform linear, logistic and censored regression with functional predictors. In addition, we show how functional principal components can be used to gain insight into the relationship between the response and functional predictors. Finally, we extend the methodology to apply generalized linear models and principal components to standard missing data problems.
Several approaches have been proposed for optimizing both the mean and variation of a process simultaneously. This paper reviews some of these methods and studies ways in which generalized linear models can be adapted for use with them. Specifically, a generalized linear model with gamma error distribution and log link function is used to model variation as (1) part of a screening method for variance control factors and (2) part of an algorithm for simultaneous maximum likelihood estimation of mean and variance parameters. The advantages and disadvantages of these two approaches are examined in detail and compared to other current methods.
The authors review the applications of generalized linear models to actuarial problems. This rich class of statistical model has been successfully applied in recent years to a wide range of problems, involving mortality, multiple-state models, lapses, premium rating and reserving. Selective examples of these applications are presented.