In this paper, the asymptotic properties of maximum quasi-likelihood estimators (MQLEs) are developed for generalized linear models (GLMs) with a diverging number of covariates under some regularity conditions. The existence, weak convergence, and rate of convergence of the MQLEs, the asymptotic normality of linear combinations of MQLEs, and the asymptotic distribution of the test statistics for a single linear hypothesis are presented. The results are illustrated by Monte Carlo simulations.
We propose Dirichlet process mixtures of generalized linear models (DP-GLM), a new class of methods for nonparametric regression. Given a data set of input-response pairs, the DP-GLM produces a global model of the joint distribution through a mixture of local generalized linear models. DP-GLMs allow both continuous and categorical inputs, and can model the same class of responses that can be modeled with a generalized linear model. We study the properties of the DP-GLM, and show why it provides better predictions and density estimates than existing Dirichlet process mixture regression models. We give conditions for weak consistency of the joint distribution and pointwise consistency of the regression estimate.
The use of parametric link transformation families in generalized linear models (GLMs) has been shown to substantially improve the fit of standard analyses using a fixed link in some data sets (see Czado, 1993, for example). When link and regression parameters are globally orthogonal (Cox and Reid, 1987), the variance inflation of the regression parameter estimates due to the additional estimation of the link is asymptotically zero. Parameter orthogonality also induces numerical stability, which is seen in the reduction of computation time required for the calculation of parameter estimates. This stability remains a desirable property even for inferences which are conditional on a fixed link value. Czado and Santner (1992b), for binomial error, and Czado (1992), for GLMs, have shown that only local orthogonality can be achieved in general. This paper provides conditions on the link family to extend the notion of local orthogonality at a point to orthogonality in a neighborhood asymptotically and shows that the resulting links are location and scale invariant. General concepts for the construction of such links are given, and it is shown how they relate to link families proposed in the literature. The ideas are illustrated by two examples.
In this paper, our goal is to enhance the interpretability of generalized linear models by identifying the most relevant interactions between categorical predictors. Searching for interaction effects can quickly become a highly combinatorial, and thus computationally costly, problem when we have many categorical predictors or even a few of them but with many categories. Moreover, the estimation of coefficients requires large training samples with enough observations for each interaction between categories. To address these bottlenecks, we propose to find a reduced representation for each categorical predictor as a binary predictor, where categories are clustered based on a dissimilarity. We provide a collection of binarized representations for each categorical predictor, where the dissimilarity takes into account information from the main effects and the interactions. The choice of the binarized predictors representing the categorical predictors is made with a novel heuristic procedure that is guided by the accuracy of the so-called binarized model. We test our methodology on both real-world and simulated data, illustrating that, without damaging the out-of-sample accuracy, our approach trains sparse models including only the most relevant interactions between categorical predictors.
Situations in which the observations are not normally distributed arise frequently in the quality engineering field. The standard approach to the analysis of such responses is to transform the response into a new quantity that behaves more like a normal random variable. An alternative approach is to use an analysis procedure based on the generalized linear model (GLM), where a nonnormal error distribution and a function that links the predictor to the response may be specified. We present an introduction to the GLM, and show how such models may be fit. We present the GLM as an analog to the normal theory linear model. The usefulness of this approach is illustrated with examples.
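To make the idea of fitting a GLM concrete, the following is a minimal sketch of iteratively reweighted least squares (IRLS) for a Poisson response with a log link. It is not taken from the paper: the function name `fit_poisson_glm`, the choice of a Poisson model, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=25, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS (illustrative sketch)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # mean under the log link
        z = eta + (y - mu) / mu        # working response
        XtW = X.T * mu                 # X^T W with IRLS weights W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical simulated data: intercept plus one covariate.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
beta_hat = fit_poisson_glm(X, y)
print(beta_hat)
```

Each iteration is an ordinary weighted least-squares fit, which is one way to see the GLM as an analog of the normal theory linear model.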
A generalized case-control (GCC) study, like the standard case-control study, leverages outcome-dependent sampling (ODS) to extend to nonbinary responses. We develop a novel, unifying approach for analyzing GCC study data using the recently developed semiparametric extension of the generalized linear model (GLM), which is substantially more robust to model misspecification than existing approaches based on parametric GLMs. For valid estimation and inference, we use a conditional likelihood to account for the biased sampling design. We describe analysis procedures for estimation and inference for the semiparametric GLM under a conditional likelihood, and we discuss problems with estimation and inference under a conditional likelihood when the response distribution is misspecified. We demonstrate the flexibility of our approach over existing ones through extensive simulation studies, and we apply the methodology to an analysis of the Asset and Health Dynamics Among the Oldest Old study, which motivates our research. The proposed approach yields a simple yet versatile solution for handling ODS in a wide variety of possible response distributions and sampling schemes encountered in practice.
It is known that collinearity among the explanatory variables in generalized linear models (GLMs) inflates the variance of maximum likelihood estimators. To overcome multicollinearity in GLMs, the ordinary ridge estimator and the restricted estimator have been proposed. In this study, a restricted ridge estimator is introduced by unifying the ordinary ridge estimator and the restricted estimator in GLMs, and its mean squared error (MSE) properties are discussed. The MSE comparisons are done in the context of first-order approximated estimators. The results are illustrated by a numerical example, and two simulation studies are conducted with Poisson and binomial responses.
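As a rough illustration of the ordinary ridge idea in a GLM, the sketch below shrinks a maximum likelihood estimate using the weighted information matrix X'WX. The form (X'WX + kI)^{-1} X'WX β̂ is assumed here as a common first-order ridge estimator; it is not the restricted ridge estimator introduced in the paper, and the function name `glm_ridge_first_order` and the inputs are hypothetical.

```python
import numpy as np

def glm_ridge_first_order(X, w, beta_ml, k):
    """Shrink an ML estimate: (X'WX + kI)^{-1} X'WX beta_ml (assumed form).

    X       : (n, p) design matrix
    w       : (n,) positive IRLS weights evaluated at the ML estimate
    beta_ml : (p,) maximum likelihood estimate
    k       : ridge parameter, k >= 0
    """
    info = (X.T * w) @ X                       # weighted information X'WX
    p = X.shape[1]
    return np.linalg.solve(info + k * np.eye(p), info @ beta_ml)

# Hypothetical inputs with two nearly collinear columns.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=100)])
w = np.ones(100)
beta_ml = np.array([2.0, -1.0])
print(glm_ridge_first_order(X, w, beta_ml, k=1.0))
```

With k = 0 the estimate is returned unchanged, and larger k pulls it toward zero, most strongly along directions where X'WX has small eigenvalues.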
In this article, we present a compressive-sensing-based framework for generalized linear model regression that employs a two-component noise model and convex optimization techniques to simultaneously detect outliers and determine optimally sparse representations of noisy data from arbitrary sets of basis functions. We then extend our model to include model order reduction capabilities that can uncover inherent sparsity in regression coefficients and achieve simple, superior fits. Second, we use the mixed ℓ2/ℓ1 norm to develop another model that can efficiently uncover block-sparsity in regression coefficients. By performing model order reduction over all independent variables and basis functions, our algorithms successfully deemphasize the effect of independent variables that become uncorrelated with dependent variables. This desirable property has various applications in real-time anomaly detection, such as faulty sensor detection and sensor jamming in wireless sensor networks. After developing our framework and inheriting a stable recovery theorem from compressive sensing theory, we present two simulation studies on sparse or block-sparse problems that demonstrate the superior performance of our algorithms with respect to (1) classic outlier-invariant regression techniques like least absolute value and iteratively reweighted least squares and (2) classic sparse-regularized regression techniques like LASSO.
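The mixed-norm penalty mentioned above can be sketched as follows. The layout (one coefficient block per row of a matrix `B`) and the group soft-thresholding step are assumptions for illustration, showing the standard proximal operator of this norm rather than the authors' algorithm.

```python
import numpy as np

def mixed_l2_l1_norm(B):
    """Sum of the Euclidean norms of the rows (blocks) of B."""
    return float(np.sum(np.linalg.norm(B, axis=1)))

def group_soft_threshold(B, t):
    """Proximal operator of t * mixed norm: shrink each row's norm by t."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Rows with norm <= t are zeroed as a whole, which yields block-sparsity.
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return B * scale

# Hypothetical coefficient blocks: one strong block, one weak block.
B = np.array([[3.0, 4.0], [0.1, 0.0]])
print(mixed_l2_l1_norm(B))
print(group_soft_threshold(B, 1.0))
```

Because the threshold acts on whole rows, a weak block is removed entirely instead of being thinned coefficient by coefficient, which is the behavior that distinguishes this penalty from a plain ℓ1 penalty.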
The problem of detecting multicollinearity in generalized linear models is discussed. For this class of models, the Belsley, Kuh, and Welsch (1980) multicollinearity diagnostic for the linear model is applied by performing the singular value decomposition on the scaled observed information matrix at the final solution, I_s(β̂). The performance of this adapted diagnostic in detecting collinearity is examined in detail for this class of models, in particular the discrete response model as exemplified by the binary logistic and proportional odds regression models. The effects of centering the independent variables on the estimation of parameters and on the sensitivity of the proposed diagnostic in the presence of collinearity are also investigated.
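A minimal sketch of such a diagnostic is given below, under stated assumptions: the information matrix is scaled to unit diagonal, and condition indices are taken as the square root of the ratio of the largest singular value to each singular value. The function name `condition_indices` and the example data are hypothetical, and the exact scaling used in the paper may differ.

```python
import numpy as np

def condition_indices(info):
    """Condition indices of an information matrix scaled to unit diagonal."""
    d = np.sqrt(np.diag(info))
    scaled = info / np.outer(d, d)               # unit-diagonal scaling
    s = np.linalg.svd(scaled, compute_uv=False)  # singular values, descending
    # Square root because the information matrix is quadratic in the design.
    return np.sqrt(s[0] / s)

# Hypothetical design with two nearly collinear covariates and unit weights.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-3 * rng.normal(size=200)
X = np.column_stack([np.ones(200), x1, x2])
info = X.T @ X
print(condition_indices(info))
```

A large index (values above roughly 30 are a common rule of thumb for the linear model) flags a near dependency among the columns.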
Under "measurement constraints," responses are expensive to measure and initially unavailable on most records in the dataset, but the covariates are available for the entire dataset. Our goal is to sample a relatively small portion of the dataset where the expensive responses will be measured and the resultant sampling estimator is statistically efficient. Measurement constraints require that the sampling probabilities depend only on a very small set of the responses. A sampling procedure that uses responses at most only on a small pilot sample will be called "response-free." We propose a response-free sampling procedure, optimal sampling under measurement constraints (OSUMC), for generalized linear models. Using the A-optimality criterion, that is, the trace of the asymptotic variance, the resultant estimator is statistically efficient within a class of sampling estimators. We establish the unconditional asymptotic distribution of a general class of response-free sampling estimators. This result is novel compared with the existing conditional results obtained by conditioning on both covariates and responses. Under our unconditional framework, the subsamples are no longer independent and new martingale techniques are developed for our asymptotic theory. We further derive the A-optimal response-free sampling distribution. Since this distribution depends on population level quantities, we propose the OSUMC algorithm to approximate the theoretical optimal sampling. Finally, we conduct an intensive empirical study to demonstrate the advantages of the OSUMC algorithm over existing methods from both statistical and computational perspectives. We find that OSUMC's performance is comparable to that of sampling algorithms that use complete responses. This shows that, provided an efficient algorithm such as OSUMC is used, there is little or no loss in accuracy due to the unavailability of responses because of measurement constraints. Supplementary materials for this article are ava