Search results - Inner Mongolia University Library

Goodness-of-fit testing in high dimensional generalized linear models

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2020, Vol. 82, No. 3, pp. 773-795

Authors: Jankova, Jana; Shah, Rajen D.; Buhlmann, Peter; Samworth, Richard J. Affiliations: Univ Cambridge, Cambridge, England; Eidgenoss Tech Hsch Zurich, Zurich, Switzerland

We propose a family of tests to assess the goodness of fit of a high dimensional generalized linear model. Our framework is flexible and may be used to construct an omnibus test or directed against testing specific non-linearities and interaction effects, or for testing the significance of groups of variables. The methodology is based on extracting left-over signal in the residuals from an initial fit of a generalized linear model. This can be achieved by predicting this signal from the residuals by using modern powerful regression or machine learning methods such as random forests or boosted trees. Under the null hypothesis that the generalized linear model is correct, no signal is left in the residuals and our test statistic has a Gaussian limiting distribution, translating to asymptotic control of type I error. Under a local alternative, we establish a guarantee on the power of the test. We illustrate the effectiveness of the methodology on simulated and real data examples by testing goodness of fit in logistic regression models. Software implementing the methodology is available in the R package GRPtests.

Keywords: Debiasing; Generalized linear models; Goodness-of-fit testing; Group testing; High dimensional data; Residual prediction


The existence of maximum likelihood estimate in high-dimensional binary response generalized linear models


ELECTRONIC JOURNAL OF STATISTICS, 2020, Vol. 14, No. 2, pp. 4028-4053

Authors: Tang, Wenpin; Ye, Yuting. Affiliations: Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027, USA; Univ Calif Berkeley, Div Biostat, Berkeley, CA, USA

Motivated by recent works on the high-dimensional logistic regression, we establish that the existence of the maximum likelihood estimate exhibits a phase transition for a wide range of generalized linear models with binary outcome and elliptical covariates. This extends a previous result of Candes and Sur who proved the phase transition for the logistic regression with Gaussian covariates. Our result reveals a rich structure in the phase transition phenomenon, which is simply overlooked by Gaussianity. The main tools for deriving the result are data separation, convex geometry and stochastic approximation. We also conduct simulation studies to corroborate our theoretical findings, and explore other features of the problem.

Keywords: Elliptical distribution; generalized linear models; maximum likelihood estimate; phase transition
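The link between data separation and the non-existence of the maximum likelihood estimate, which the abstract above relies on, can be illustrated with a tiny sketch. The one-covariate, no-intercept logistic model and the toy data below are assumptions chosen for illustration; they are not taken from the paper. On perfectly separated data the log-likelihood keeps increasing as the coefficient grows, so no finite maximizer exists; on overlapping data it eventually decreases.

```python
import math

def log_sigmoid(s):
    """Numerically stable log(1 / (1 + exp(-s)))."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    return s - math.log1p(math.exp(s))

def log_likelihood(beta, xs, ys):
    """Log-likelihood of the model P(y = 1 | x) = sigmoid(beta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        # y = 1 contributes log sigmoid(z); y = 0 contributes log sigmoid(-z).
        total += log_sigmoid(z if y == 1 else -z)
    return total

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
separated_ys = [0, 0, 0, 1, 1, 1]   # the sign of x predicts y perfectly
overlap_ys = [0, 0, 1, 0, 1, 1]     # two points fall on the "wrong" side

betas = [1.0, 10.0, 100.0]
separated_ll = [log_likelihood(b, xs, separated_ys) for b in betas]
overlap_ll = [log_likelihood(b, xs, overlap_ys) for b in betas]

print("separated:", separated_ll)   # climbs toward 0 without attaining a maximum
print("overlap:  ", overlap_ll)     # falls once beta is too large
```

In the separated case every increase in beta improves the fit, which is why fitting routines either diverge or stop at an arbitrarily large coefficient.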


Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates


COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, Vol. 157, Art. 107154

Authors: Zhou, Ping; Yu, Zhen; Ma, Jingyi; Tian, Maozai; Fan, Ye. Affiliations: Beijing Informat Sci & Technol Univ, Sch Appl Sci, Beijing, Peoples R China; Renmin Univ China, Sch Stat, Beijing, Peoples R China; Cent Univ Finance & Econ, Sch Stat & Math, Beijing, Peoples R China

It has become increasingly common to store large-scale data sets in a distributed fashion across a large number of clients. The aim of the study is to develop a distributed estimator for generalized linear models (GLMs) in the "large n, diverging p(n)" framework with a weak assumption on the number of clients. When the dimension diverges at the rate of o(√n), the asymptotic efficiency of the global maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation (AEE) estimator for GLMs is established. A novel distributed estimator is then proposed with two rounds of communication. It has the same asymptotic efficiency as the global MLE under p(n) = o(√n). The assumption on the number of clients is more relaxed than that of the AEE estimator and the proposed method is thus more practical for real-world applications. Simulations and a case study demonstrate the satisfactory finite-sample performance of the proposed estimator. (C) 2020 Elsevier B.V. All rights reserved.

Keywords: Generalized linear models; Large-scale distributed data; Asymptotic efficiency; One-step MLE; Diverging p
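To make the idea of a two-round distributed estimator concrete, here is a minimal sketch of a generic scheme: clients first send local MLEs, the server averages them, and a second round collects local gradients and information at the average for one Newton step. This is a textbook-style one-step update under assumed settings (a single covariate, no intercept, logistic link, simulated data); the abstract does not give the paper's exact construction, so this should not be read as the authors' estimator.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_and_information(beta, data):
    """Gradient of the log-likelihood and observed information at beta."""
    grad, info = 0.0, 0.0
    for x, y in data:
        p = sigmoid(beta * x)
        grad += x * (y - p)
        info += x * x * p * (1.0 - p)
    return grad, info

def fit_mle(data, iterations=25):
    """Newton-Raphson for the one-parameter logistic model."""
    beta = 0.0
    for _ in range(iterations):
        grad, info = score_and_information(beta, data)
        beta += grad / info
    return beta

# Simulated data split across 5 hypothetical clients (assumed setup).
rng = random.Random(0)
TRUE_BETA = 1.0
clients = []
for _ in range(5):
    local = []
    for _ in range(400):
        x = rng.gauss(0.0, 1.0)
        y = 1 if rng.random() < sigmoid(TRUE_BETA * x) else 0
        local.append((x, y))
    clients.append(local)

# Round 1: each client sends its local MLE; the server averages them.
averaged = sum(fit_mle(c) for c in clients) / len(clients)

# Round 2: each client sends gradient and information evaluated at the
# average; the server sums them and takes a single Newton step.
stats = [score_and_information(averaged, c) for c in clients]
total_grad = sum(g for g, _ in stats)
total_info = sum(h for _, h in stats)
one_step = averaged + total_grad / total_info

# Reference: the MLE computed on the pooled data (needs all data in one place).
global_mle = fit_mle([row for c in clients for row in c])
print(averaged, one_step, global_mle)
```

Only two scalars per client travel in each round here, which is the sense in which such schemes are communication-efficient.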


Tuning-free ridge estimators for high-dimensional generalized linear models


COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, Vol. 159, Art. 107205

Authors: Huang, Shih-Ting; Xie, Fang; Lederer, Johannes. Affiliation: Ruhr Univ Bochum, Dept Math, D-44801 Bochum, Germany

Ridge estimators regularize the squared Euclidean lengths of parameters. Such estimators are mathematically and computationally attractive but involve tuning parameters that need to be calibrated. It is shown that ridge estimators can be modified such that tuning parameters can be avoided altogether, and the resulting estimator can improve on the prediction accuracies of standard ridge estimators combined with cross-validation. (C) 2021 Elsevier B.V. All rights reserved.

Keywords: Generalized linear models; High-dimensional estimation; Ridge estimator
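The role of the tuning parameter that the abstract above seeks to avoid can be seen in the standard ridge estimator. The sketch below assumes the simplest possible case, a linear model with one predictor and no intercept, where minimizing sum((y - x*beta)^2) + lam * beta^2 gives beta = sum(x*y) / (sum(x^2) + lam). It shows the usual ridge with leave-one-out cross-validation over an arbitrary grid, not the tuning-free modification proposed in the paper; the data and grid are made up for illustration.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for y ~ beta * x with penalty lam * beta**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def loo_cv_error(xs, ys, lam):
    """Leave-one-out mean squared prediction error for a given lam."""
    total = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        beta = ridge_1d(train_x, train_y, lam)
        total += (ys[i] - beta * xs[i]) ** 2
    return total / len(xs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]

grid = [0.0, 10.0, 100.0]                      # hypothetical tuning grid
estimates = [ridge_1d(xs, ys, lam) for lam in grid]
cv_errors = [loo_cv_error(xs, ys, lam) for lam in grid]
best_lam = grid[cv_errors.index(min(cv_errors))]

print("estimates:", estimates)   # shrink toward zero as lam grows
print("chosen lam:", best_lam)
```

With lam = 0 the estimate is ordinary least squares; larger values shrink it toward zero, and the cross-validation loop is the calibration step that a tuning-free estimator would make unnecessary.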


ClusterBootstrap: An R package for the analysis of hierarchical data using generalized linear models with the cluster bootstrap


BEHAVIOR RESEARCH METHODS, 2020, Vol. 52, No. 2, pp. 572-590

Authors: Deen, Mathijs; de Rooij, Mark. Affiliation: Leiden Univ, Inst Psychol, Methodol & Stat Unit, Wassenaarsewegm 52, NL-2333 AK Leiden, Netherlands

In the analysis of clustered or hierarchical data, a variety of statistical techniques can be applied. Most of these techniques have assumptions that are crucial to the validity of their outcome. Mixed models rely on the correct specification of the random effects structure. Generalized estimating equations are most efficient when the working correlation form is chosen correctly and are not feasible when the within-subject variable is non-factorial. Assumptions and limitations of another common approach, ANOVA for repeated measurements, are even more worrisome: listwise deletion when data are missing, the sphericity assumption, inability to model an unevenly spaced time variable and time-varying covariates, and the limitation to normally distributed dependent variables. This paper introduces ClusterBootstrap, an R package for the analysis of hierarchical data using generalized linear models with the cluster bootstrap (GLMCB). Being a bootstrap method, the technique is relatively assumption-free, and it has already been shown to be comparable, if not superior, to GEE in its performance. The paper has three goals. First, GLMCB will be introduced. Second, there will be an empirical example, using the ClusterBootstrap package for a Gaussian and a dichotomous dependent variable. Third, GLMCB will be compared to mixed models in a Monte Carlo experiment. Although GLMCB can be applied to a multitude of hierarchical data forms, this paper discusses it in the context of the analysis of repeated measurements or longitudinal data. It will become clear that the GLMCB is a promising alternative to mixed models and the ClusterBootstrap package an easy-to-use R implementation of the technique.

Keywords: Clustered data; Hierarchical data; Generalized linear models; Cluster bootstrap
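The core of the cluster bootstrap described above is to resample whole clusters (for example, subjects) with replacement rather than individual observations, and to refit the model on each resample. The sketch below is a Python illustration of that idea under assumed settings: simulated repeated measurements, a Gaussian response, and a pooled least-squares slope as the fitted quantity. It does not reproduce the interface of the ClusterBootstrap R package, and all function names are hypothetical.

```python
import random

def ols_slope(rows):
    """Least-squares slope of y on time for a list of (time, y) pairs."""
    n = len(rows)
    mean_t = sum(t for t, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in rows)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in rows)
    return sxy / sxx

def cluster_bootstrap_ci(clusters, statistic, n_boot=500, seed=0):
    """Percentile 95% interval obtained by resampling whole clusters."""
    rng = random.Random(seed)
    ids = list(clusters)
    replicates = []
    for _ in range(n_boot):
        # Draw as many clusters as in the data, with replacement,
        # and keep every observation of each drawn cluster together.
        sampled = [rng.choice(ids) for _ in ids]
        rows = [row for cid in sampled for row in clusters[cid]]
        replicates.append(statistic(rows))
    replicates.sort()
    return replicates[int(0.025 * n_boot)], replicates[int(0.975 * n_boot) - 1]

# Simulated longitudinal data: 20 subjects, 4 time points each,
# a subject-specific random intercept and a common slope of 0.5.
data_rng = random.Random(1)
clusters = {}
for subject in range(20):
    intercept = data_rng.gauss(0.0, 1.0)
    clusters[subject] = [
        (float(t), intercept + 0.5 * t + data_rng.gauss(0.0, 0.3))
        for t in range(4)
    ]

point = ols_slope([row for rows in clusters.values() for row in rows])
lower, upper = cluster_bootstrap_ci(clusters, ols_slope)
print(point, (lower, upper))
```

Because observations from one subject always enter or leave a resample together, the within-subject dependence is carried into the bootstrap distribution without specifying a random-effects or working-correlation structure.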


Landscape Complexity for the Empirical Risk of Generalized Linear Models


1st Mathematical and Scientific Machine Learning Conference, MSML 2020

Authors: Maillard, Antoine; Arous, Gérard Ben; Biroli, Giulio. Affiliations: Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université Paris-Diderot, Sorbonne Paris Cité, Paris, France; Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, United States

We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method since it allows one to analyze the critical points of high dimensional non-Gaussian random functions. We obtain a rigorous explicit variational formula for the annealed complexity, which is the logarithm of the average number of critical points at fixed value of the empirical risk. This result is simplified, and extended, using the non-rigorous Kac-Rice replicated method from theoretical physics. In this way we find an explicit variational formula for the quenched complexity, which is generally different from its annealed counterpart, and allows one to obtain the number of critical points for typical instances up to exponential accuracy. © 2020 A. Maillard, G.B. Arous & G. Biroli.

Keywords: Empirical risk landscape; Generalized linear models; Kac-Rice; Landscape complexity


Restricted ridge estimator in generalized linear models: Monte Carlo simulation studies on Poisson and binomial distributed responses


COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2019, Vol. 48, No. 4, pp. 1191-1218

Authors: Kurtoglu, Fikriye; Ozkale, M. Revan. Affiliation: Cukurova Univ, Dept Stat, Fac Sci & Letters, TR-01330 Adana, Turkey

It is known that collinearity among the explanatory variables in generalized linear models (GLMs) inflates the variance of maximum likelihood estimators. To overcome multicollinearity in GLMs, the ordinary ridge estimator and the restricted estimator were proposed. In this study, a restricted ridge estimator is introduced by unifying the ordinary ridge estimator and the restricted estimator in GLMs, and its mean squared error (MSE) properties are discussed. The MSE comparisons are done in the context of first-order approximated estimators. The results are illustrated by a numerical example, and two simulation studies are conducted with Poisson and binomial responses.

Keywords: Generalized linear models; Mean squared error; Multicollinearity; Poisson distribution; Restricted ridge estimation


Bayesian model selection for generalized linear models using non-local priors


COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2019, Vol. 133, pp. 285-296

Authors: Shi, Guiling; Lim, Chae Young; Maiti, Tapabrata. Affiliations: Michigan State Univ, Dept Stat & Probabil, E Lansing, MI 48824, USA; Seoul Natl Univ, Dept Stat, Seoul, South Korea

Variable selection is currently an important research topic under both frequentist and Bayesian frameworks. While most developments in the Bayesian model selection literature are based on a local prior on regression parameters, a nonlocal prior for model selection can also be used. In this article, we extend the nonlocal prior approach to logistic regression and to generalized linear models. Laplace approximation is used in implementation to avoid integration in the likelihood. A convergence rate is derived under some regularity conditions. The selection based on a nonlocal prior eliminates unnecessary variables and recommends a simple model. The method is validated by a simulation study and illustrated by a real data example. (C) 2018 Elsevier B.V. All rights reserved.

Keywords: Bayesian variable selection; Generalized linear models; Logistic regression; Non-local priors


Overlapping group lasso for high-dimensional generalized linear models


COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2019, Vol. 48, No. 19, pp. 4903-4917

Authors: Zhou, Shengbin; Zhou, Jingke; Zhang, Bo. Affiliations: Harbin Normal Univ, Dept Stat, Harbin, Heilongjiang, Peoples R China; Lingnan Normal Univ, Sch Informat Engn, 29 Cunjin Rd, Zhanjiang, Guangdong, Peoples R China

Structured sparsity has recently become a very popular technique for dealing with high-dimensional data. In this paper, we mainly focus on the theoretical problems for the overlapping group structure of generalized linear models (GLMs). Although the overlapping group lasso method for GLMs has been widely applied, its theoretical properties are still unknown. Under some general conditions, we present oracle inequalities for the estimation and prediction error of the overlapping group Lasso method in the generalized linear model setting. Then, we apply these results to the logistic and Poisson regression models. It is shown that the results of the Lasso and group Lasso procedures for GLMs can be recovered by specifying the group structures in our proposed method. The effect of overlap and the variable selection performance of our proposed method are both studied by numerical simulations. Finally, we apply our proposed method to two gene expression data sets: the p53 data and the lung cancer data.

Keywords: Generalized linear models; overlap; sparsity; Oracle inequalities; high dimension


Maximum likelihood estimation of generalized linear models for adaptive designs: Applications and asymptotics


BIOMETRICAL JOURNAL, 2019, Vol. 61, No. 3, pp. 630-651

Authors: Selvaratnam, Selvakkadunko; Yi, Yanqing; Oyet, Alwell. Affiliations: Univ Alberta, Dept Math & Stat Sci, Edmonton, AB, Canada; Mem Univ, Dept Community Hlth & Humanities, St John, NF, Canada; Mem Univ, Dept Math & Stat, St John, NF ALC 5S7, Canada

Due to increasing discoveries of biomarkers and observed diversity among patients, there is growing interest in personalized medicine for the purpose of increasing the well-being of patients (ethics) and extending human life. In fact, these biomarkers and observed heterogeneity among patients are useful covariates that can be used to achieve the ethical goals of clinical trials and improve the efficiency of statistical inference. Covariate-adjusted response-adaptive (CARA) design was developed to use information in such covariates in randomization to maximize the well-being of participating patients as well as increase the efficiency of statistical inference at the end of a clinical trial. In this paper, we establish conditions for consistency and asymptotic normality of maximum likelihood (ML) estimators of generalized linear models (GLM) for a general class of adaptive designs. We prove that the ML estimators are consistent and asymptotically follow a multivariate Gaussian distribution. The efficiency of the estimators and the performance of response-adaptive (RA), CARA, and completely randomized (CR) designs are examined based on the well-being of patients under a logit model with categorical covariates. Results from our simulation studies and application to data from a clinical trial on stroke prevention in atrial fibrillation (SPAF) show that RA designs lead to ethically desirable outcomes as well as higher statistical efficiency compared to CARA designs if there is no treatment by covariate interaction in an ideal model. CARA designs were, however, more ethical than RA designs when there was significant interaction.

Keywords: adaptive designs; clinical trials; consistency; generalized linear models; maximum likelihood estimation
