This article is concerned with the problem of variable selection and estimation for high-dimensional generalized linear models. In this article, we introduce a general iteratively reweighted adaptive ridge regression method (GAR). We show that the GAR estimator possesses the oracle property and the grouping effect. A data-driven parameter gamma is introduced in the GAR method to adapt to different cases of the true model. This adaptive parameter gamma is then taken into account to establish a gamma-dependent sufficient condition that guarantees the oracle property and the grouping effect. Furthermore, to apply the GAR method more efficiently, a coordinate-wise Newton algorithm is employed, which avoids matrix inversion and the numerical instability caused by iteration. Extensive numerical simulations show that the GAR method outperforms commonly used methods, and the GAR method is further illustrated on a gastric cancer dataset.
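The abstract does not give GAR's weight-update rule, so the following is only a minimal sketch of the general idea under stated assumptions: an iteratively reweighted ridge penalty for logistic regression, minimized by coordinate-wise Newton steps so that no matrix is ever inverted. The weight rule `w_j = (beta_j**2 + eps) ** (-gamma / 2)`, the function name `adaptive_ridge_logistic`, and the parameters `lam`, `eps`, `outer_iters`, `inner_sweeps` are hypothetical choices, not the paper's; `gamma` only loosely mirrors the role of the data-driven gamma. At a fixed point the term `w_j * beta_j**2` behaves roughly like `|beta_j| ** (2 - gamma)`, so `gamma = 1` is Lasso-like and `gamma = 2` approximates best-subset selection.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def adaptive_ridge_logistic(X, y, lam=1.0, gamma=2.0, eps=1e-4,
                            outer_iters=30, inner_sweeps=5):
    """Iteratively reweighted adaptive ridge for logistic regression (sketch).

    Minimizes  -loglik(beta) + (lam / 2) * sum_j w_j * beta_j**2,
    refreshing the hypothetical weights w_j = (beta_j**2 + eps)**(-gamma / 2)
    after each outer pass. No intercept: X is assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    eta = [0.0] * n      # linear predictor X @ beta, kept in sync incrementally
    w = [1.0] * p        # first outer pass is an ordinary ridge fit
    for _ in range(outer_iters):
        for _ in range(inner_sweeps):
            for j in range(p):
                # first and second derivatives of the objective in beta_j alone
                grad = lam * w[j] * beta[j]
                hess = lam * w[j]
                for i in range(n):
                    mu = sigmoid(eta[i])
                    grad -= X[i][j] * (y[i] - mu)
                    hess += X[i][j] ** 2 * mu * (1.0 - mu)
                step = grad / hess   # scalar Newton step: no matrix inverse
                beta[j] -= step
                for i in range(n):
                    eta[i] -= X[i][j] * step
        # small coefficients get large weights and are shrunk harder next pass
        w = [(b * b + eps) ** (-gamma / 2.0) for b in beta]
    return beta
```

Each coordinate update needs only a scalar gradient and curvature, which is the property the abstract attributes to the coordinate-wise Newton algorithm; everything else here is illustrative.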
In this paper, we study the asymptotic properties of adaptive Lasso estimators in high-dimensional generalized linear models. The consistency of the adaptive Lasso estimator is established. We show that, if a reasonable initial estimator is available, then under appropriate conditions the adaptive Lasso correctly selects the covariates with nonzero coefficients with probability converging to one, and the estimators of the nonzero coefficients have the same asymptotic distribution they would have if the zero coefficients were known in advance. Thus, the adaptive Lasso has the oracle property. The results are examined through simulations and a real data example.
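A minimal two-stage sketch of the adaptive Lasso, with hypothetical function names. For brevity it uses the Gaussian family with identity link (the simplest GLM), so each stage is a coordinate-descent least-squares fit; a logistic or Poisson version would wrap the same weighted-Lasso step inside an iteratively reweighted least squares loop. The initial estimator here is unpenalized least squares and the weights are `1 / |b_init_j| ** gamma`, a common choice that the abstract does not pin down.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, sweeps=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # correlation of column j with the partial residual (coordinate j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    p = len(X[0])
    # Stage 1: all weights zero gives plain least squares as the initial estimator
    b_init = weighted_lasso(X, y, lam=0.0, weights=[0.0] * p)
    # Stage 2: small initial coefficients receive large penalty weights
    weights = [1.0 / max(abs(b), 1e-12) ** gamma for b in b_init]
    return weighted_lasso(X, y, lam, weights)
```

Because noise coefficients start small, their weights are large and soft-thresholding sets them exactly to zero, while large true coefficients are barely shrunk; this is the mechanism behind the oracle property described above.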
A cost-effective sampling design is desirable in large cohort studies with a limited budget, due to the high cost of measuring primary exposure *** outcome-dependent sampling (ODS) designs enrich the observed sample by oversampling the regions of the underlying population that convey the most information about the exposure-response *** generalized linear models (GLMs) are widely used in many fields; however, much less development has been done on GLMs for data from the ODS *** study how to fit GLMs to data obtained by the original ODS design and the two-phase ODS design, *** asymptotic properties of the proposed estimators are derived. A series of simulations is conducted to assess the finite-sample performance of the proposed *** to a Wilms tumor study and an air quality study demonstrate the practicability of the proposed methods.
Vector generalized linear models (VGLMs), as implemented in the VGAM R package, permit multiple parameters to depend (via inverse link functions) on linear predictors. However, it is often the case that one wishes different parameters to be related to each other in some way (i.e., to jointly satisfy certain constraints). Prominent and important examples of such cases include the normal or Gaussian family, where one wishes to model the variance as a function of the mean, e.g., variance proportional to the mean raised to some power. Another example is the negative binomial family, whose variance is approximately proportional to the mean raised to some power. It is shown that such constraints can be implemented in a straightforward manner via reduced rank regression (RRR) and easily used via the rrvglm() function. To this end, RRR is briefly described and applied to impose parameter constraints in VGLMs with two parameters. The result is a rank-1 RR-VGLM. Numerous examples of the use of this technique are given, some of them new. The implication is that RRR offers hitherto undiscovered potential usefulness for many statistical distributions. (C) 2013 Elsevier B.V. All rights reserved.
In this article, for generalized linear models (GLMs) with a "working" covariance matrix and adaptive designs, we develop the asymptotic properties of maximum quasi-likelihood estimators (MQLEs) under some mild regularity conditions. The existence of MQLEs as solutions of the quasi-likelihood equation, their rate of convergence, and their asymptotic normality are presented. The results are illustrated by Monte Carlo simulations.
Authors: Gorosito, Irene L.; Marziali Bermudez, Mariano; Busch, Maria
Univ Buenos Aires, Fac Ciencias Exactas & Nat, Dept Ecol Genet & Evolut, Buenos Aires, DF, Argentina
Univ Buenos Aires, Consejo Nacl Invest Cient & Tecn, Inst Ecol Genet & Evolut Buenos Aires, Intendente Guiraldes 2160, Ciudad Univ, C1428EGA Buenos Aires, DF, Argentina
Univ Buenos Aires, Fac Ciencias Exactas & Nat, Dept Fis, Buenos Aires, DF, Argentina
Univ Buenos Aires, Consejo Nacl Invest Cient & Tecn, Inst Fis Buenos Aires, Intendente Guiraldes 2160, Ciudad Univ, C1428EGA Buenos Aires, DF, Argentina
Models of habitat variables can be used to find indicators for a quantitative prediction of the likelihood of species occurrence or abundance. Methodological bias due to variable detectability can be critical to properly determining habitat use and, thus, to understanding species ecology, distribution, and requirements for survival. In spite of recent advances in dealing with imperfect detection through detailed modeling, this approach requires large amounts of data and usually leads to larger standard errors in parameter estimates. In this work, we explore the advantages of combining generalized linear models (GLMs) and occupancy models (OMs) to detect variables that may be used as indicators of habitat suitability for rodent species. As a case study, we analyzed live-trapping data of three rodent species that inhabit agroecosystems, at micro- and macrohabitat scales. Both methods provided complementary information: while OMs revealed that some habitat features believed to be selected by the studied species actually affected detectability, some effects could only be detected by GLMs. Moreover, for some covariates that apparently affected habitat selection at both scales, comparing results between scales allowed us to determine at which scale the covariate was actually relevant, rather than merely reflecting its effect at the other scale. Therefore, we advise applying complementary modeling approaches at multiple scales in habitat selection studies. A variety of outcomes and their implications are thoroughly discussed and may guide other researchers facing similar situations.
In this article, we consider the quasi-likelihood equation $\sum_{i=1}^{n} X_i \, (y_i - \mu(X_i'\beta)) = 0$ for generalized linear models (GLMs). Under some mild conditions, including the convergent system $\{e_i = y_i - \mu(X_i'\beta_0),\ i \ge 1\}$ defined by Lai et al. (1979), we obtain the asymptotic existence of a solution $\hat{\beta}_n$ to the above equation and show that $\hat{\beta}_n - \beta_0 = O\big(\bar{\lambda}_n^{1/2} (\log \bar{\lambda}_n)^{\delta/2} / \underline{\lambda}_n\big)$ a.s., where $\beta_0$ is the true value of the parameter $\beta$ and $\underline{\lambda}_n$ ($\bar{\lambda}_n$) denotes the smallest (largest) eigenvalue of $\sum_{i=1}^{n} X_i X_i'$, satisfying $\bar{\lambda}_n^{1/2} (\log \bar{\lambda}_n)^{\delta/2} / \underline{\lambda}_n \to 0$ as $n \to \infty$ for a given $\delta > 1$. We also present the asymptotic normality of $\hat{\beta}_n$ for univariate GLMs, based on which "studentized" large-sample confidence intervals for $\beta_0$ are constructed. Simulation results and related remarks are given.
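Numerically, a root of the quasi-likelihood equation `sum_i X_i (y_i - mu(X_i' beta)) = 0` can be found by Newton's method, since the Jacobian of the left-hand side is `-sum_i X_i X_i' mu'(X_i' beta)`. The following is a minimal sketch assuming a log link (`mu = exp`, a Poisson-type mean), for which the equation coincides with the Poisson score equation; the function names are hypothetical, and the small Gaussian-elimination helper stands in for a linear-algebra library only to keep the sketch dependency-free. It illustrates the estimating equation, not the paper's proofs.

```python
import math


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def solve_quasi_likelihood(X, y, mu=math.exp, dmu=math.exp, tol=1e-10, max_iter=50):
    """Newton iterations for sum_i X_i (y_i - mu(X_i' beta)) = 0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # left-hand side of the equation
        info = [[0.0] * p for _ in range(p)]  # sum_i X_i X_i' mu'(eta_i), minus the Jacobian
        for xi, yi in zip(X, y):
            eta = sum(a * b for a, b in zip(xi, beta))
            r, d = yi - mu(eta), dmu(eta)
            for j in range(p):
                score[j] += xi[j] * r
                for k in range(p):
                    info[j][k] += xi[j] * xi[k] * d
        step = solve_linear(info, score)      # Newton: beta += info^{-1} score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

Passing a different `mu`/`dmu` pair changes the mean function; for a non-canonical link the same iteration still targets this particular estimating equation, whose weighting matches the form written above.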
Collaborative filtering (CF) is a data analysis task appearing in many challenging applications, in particular data mining on the Internet and in e-commerce. CF can often be formulated as identifying patterns in a large and mostly empty rating matrix. In this paper, we focus on predicting unobserved ratings, a task that is often part of a recommendation procedure. We propose a new CF approach called interlaced generalized linear models (GLM); it is based on a factorization of the rating matrix and uses probabilistic modeling to represent uncertainty in the ratings. The advantage of this approach is that different configurations, encoding different intuitions about the rating process, can easily be tested while keeping the same learning procedure. The GLM formulation is the keystone for deriving an efficient learning procedure applicable to large datasets. We illustrate the technique on three public-domain datasets. (c) 2008 Elsevier B.V. All rights reserved.
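The abstract does not spell out the interlaced GLM updates, so the following is only a hypothetical sketch of the general idea it builds on: factorize the rating matrix and alternately fit one side's parameters while the other side is held fixed. It uses the simplest possible case, a rank-1 model `r_ui ≈ u[uid] * v[i]` with Gaussian noise and identity link, for which each alternating step is a one-dimensional ridge regression over the observed ratings only. The names `rank1_als`, `lam`, and `iters` are illustrative.

```python
def rank1_als(ratings, n_users, n_items, lam=1e-6, iters=500):
    """Alternating ridge fits for r_ui ~ u[uid] * v[i], using observed entries only.

    ratings: dict mapping (user_index, item_index) -> observed rating.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for (uid, i), r in ratings.items():
        by_user[uid].append((i, r))
        by_item[i].append((uid, r))
    for _ in range(iters):
        # v fixed: each user factor is a 1-D ridge regression on that user's ratings
        u = [sum(v[i] * r for i, r in obs) / (sum(v[i] ** 2 for i, _ in obs) + lam)
             for obs in by_user]
        # u fixed: each item factor is a 1-D ridge regression on that item's ratings
        v = [sum(u[uid] * r for uid, r in obs) / (sum(u[uid] ** 2 for uid, _ in obs) + lam)
             for obs in by_item]
    return u, v
```

An unobserved rating for user `a` and item `i` is then predicted as `u[a] * v[i]`. Swapping the squared-error ridge step for a logistic or Poisson fit would give a GLM-flavored variant while keeping the same alternating structure.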
Under the assumption that in the generalized linear model (GLM) the expectation of the response variable is correctly specified, together with some other smoothness conditions, it is shown that, with probability one, the quasi-likelihood equation for the GLM has a solution when the sample size n is sufficiently large. The rate at which this solution tends to the true value is determined. In an important special case, this rate is the same as that specified in the law of the iterated logarithm (LIL) for i.i.d. partial sums and thus cannot be improved.
Structured sparsity has recently become a very popular technique for dealing with high-dimensional data. In this paper, we mainly focus on theoretical problems for the overlapping group structure of generalized linear models (GLMs). Although the overlapping group Lasso method for GLMs has been widely applied, its theoretical properties are still unknown. Under some general conditions, we present oracle inequalities for the estimation and prediction errors of the overlapping group Lasso method in the generalized linear model setting. We then apply these results to logistic and Poisson regression models. It is shown that the results of the Lasso and group Lasso procedures for GLMs can be recovered by specifying the group structures in our proposed method. The effect of overlap and the variable selection performance of the proposed method are both studied by numerical simulations. Finally, we apply the proposed method to two gene expression datasets: the p53 data and the lung cancer data.
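To make the group structure concrete, here is a minimal sketch that evaluates (but does not optimize) an overlapping group Lasso objective for logistic regression: average logistic loss plus `lam * sum_g w_g * ||beta_g||_2` over index groups that may share coefficients. It assumes the plain sum-of-group-norms form; the latent-variable ("group Lasso with overlap") formulation is a different penalty, and which variant the paper analyzes is not stated here. The default weights `sqrt(|g|)` and all function names are assumptions.

```python
import math


def group_penalty(beta, groups, weights=None):
    """sum_g w_g * ||beta_g||_2 over possibly overlapping index groups."""
    if weights is None:
        weights = [math.sqrt(len(g)) for g in groups]  # common sqrt(group size) default
    return sum(w * math.sqrt(sum(beta[j] ** 2 for j in g))
               for w, g in zip(weights, groups))


def logistic_loss(X, y, beta):
    """Average negative log-likelihood of logistic regression, y in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))
        # log(1 + exp(eta)) - y * eta, computed without overflow
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    return total / len(y)


def objective(X, y, beta, groups, lam, weights=None):
    """Penalized loss minimized by an overlapping group Lasso estimator."""
    return logistic_loss(X, y, beta) + lam * group_penalty(beta, groups, weights)
```

With `beta = [3.0, -4.0, 0.0, 1.0]`, singleton groups `[[0], [1], [2], [3]]` give the Lasso penalty `3 + 4 + 0 + 1 = 8.0`, and disjoint groups `[[0, 1], [2, 3]]` give the group Lasso penalty `6 * sqrt(2)` (about 8.485). This mirrors the abstract's remark that both procedures are recovered by choosing the group structure.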