Search Results - Inner Mongolia University Library

Generalized linear models in Non-interactive Local Differential Privacy with Public Data

JOURNAL OF MACHINE LEARNING RESEARCH, 2023, Vol. 24, No. 1, pp. 1-57

Authors: Wang, Di; Hu, Lijie; Zhang, Huanyu; Gaboardi, Marco; Xu, Jinhui. Affiliations: King Abdullah Univ Sci & Technol CEMSE, Thuwal, Saudi Arabia; Meta, New York, NY, USA; Boston Univ Dept Comp Sci, Boston, MA 02215, USA; SUNY Buffalo Dept Comp Sci & Engn, Buffalo, NY 14260, USA

In this paper, we study the problem of estimating smooth generalized linear models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Unlike its classical setting, our model allows the server to access additional public but unlabeled data. In the first part of the paper, we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexities of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) are $O(p\alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$, respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known sample complexities of GLMs with no public data, which are exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs whose sample complexities of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ are $O(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$, respectively, under some mild assumptions and provided that $\alpha$ is not too small (i.e., $\alpha \geq \Omega(1/\sqrt{p})$). In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show results similar to those for GLMs in both the multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLMs and non-linear regressions in the NLDP model with unlabeled public data.
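
A minimal, non-authoritative sketch of the Stein's-lemma idea in Python with NumPy. It assumes zero-mean Gaussian features $x \sim N(0, \Sigma)$ and a logistic link, so that Stein's lemma gives $E[xy] = c\,\Sigma\beta$ for a scalar $c > 0$; the public unlabeled sample estimates $\Sigma$ and the private sample estimates $E[xy]$. The per-user Gaussian perturbation and its `noise_scale` parameter are hypothetical and are NOT calibrated to any $(\epsilon, \delta)$; the name `stein_direction_estimate` is illustrative, and the step that recovers the scale of $\beta$ is omitted, so only the direction of $\beta$ is estimated.

```python
import numpy as np

def stein_direction_estimate(x_public, x_private, y_private, noise_scale, rng):
    """Estimate the direction of beta via Stein's lemma (illustrative sketch)."""
    n, p = x_private.shape
    # Each user releases a noisy copy of x_i * y_i.
    # Hypothetical noise: not calibrated to any (epsilon, delta) privacy level.
    released = x_private * y_private[:, None] + rng.normal(0.0, noise_scale, size=(n, p))
    # The server averages the noisy reports to estimate E[x y].
    mean_xy = released.mean(axis=0)
    # Public unlabeled data estimate Sigma (features assumed zero-mean).
    sigma_hat = x_public.T @ x_public / x_public.shape[0]
    # Stein's lemma: E[x y] = c * Sigma @ beta, so Sigma^{-1} E[x y] is parallel to beta.
    direction = np.linalg.solve(sigma_hat, mean_xy)
    return direction / np.linalg.norm(direction)

rng = np.random.default_rng(0)
p = 5
beta = np.array([1.0, -0.5, 0.25, 0.0, 0.8])
idx = np.arange(p)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # AR(1)-style correlated features
x_pub = rng.multivariate_normal(np.zeros(p), sigma, size=50_000)
x_priv = rng.multivariate_normal(np.zeros(p), sigma, size=200_000)
prob = 1.0 / (1.0 + np.exp(-(x_priv @ beta)))  # logistic link
y_priv = rng.binomial(1, prob).astype(float)

est = stein_direction_estimate(x_pub, x_priv, y_priv, noise_scale=1.0, rng=rng)
cosine = float(est @ beta / np.linalg.norm(beta))
print(round(cosine, 3))
```

The cosine similarity between the estimate and the true $\beta$ should be close to 1; the correlated $\Sigma$ is chosen so that the correction by $\Sigma^{-1}$ estimated from public data actually matters.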

Keywords: Differential Privacy; generalized linear models; Local Differential Privacy

Generalized linear models for massive data via doubly-sketching

STATISTICS AND COMPUTING, 2023, Vol. 33, No. 5, pp. 1-19

Authors: Hou-Liu, Jason; Browne, Ryan P. Affiliation: Univ Waterloo Dept Stat & Actuarial Sci, 200 Univ Ave West, Waterloo, ON N2L 3G1, Canada

Generalized linear models are a popular analytics tool with interpretable results and broad applicability, but they require iterative estimation procedures whose data transfer and computational costs can be problematic under some infrastructure constraints. We propose a doubly-sketched approximation of the iteratively re-weighted least squares algorithm to estimate generalized linear model parameters using a sequence of surrogate datasets. The procedure sketches once to reduce data transfer costs and sketches again to reduce computation costs, yielding wall-clock time savings. The procedure produces regression coefficients and standard errors, which are compared against methods from the literature. Asymptotic properties of the proposed procedure are established, and empirical results are reported for simulated and real-world datasets. The efficacy of the proposed method is investigated across a variety of commodity computational infrastructure configurations accessible to practitioners. A highlight of the present work is the estimation of a Poisson-log generalized linear model on almost 1.7 billion observations on a personal computer in 25 minutes.
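
For reference, the iteratively re-weighted least squares (IRLS) algorithm that the paper approximates can be sketched in Python with NumPy for the Poisson-log case. This is plain IRLS on whatever rows it is given; the uniform row subsample in the usage part is only a crude, hypothetical stand-in for a sketch and is not the authors' doubly-sketched procedure. The names `irls_poisson`, `max_iter`, and `tol` are illustrative.

```python
import numpy as np

def irls_poisson(X, y, max_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS; return coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # Canonical log link: working weights are mu,
        # working response is eta + (y - mu) / mu.
        z = eta + (y - mu) / mu
        xtw = X.T * mu  # X^T W with W = diag(mu), without forming W explicitly
        beta_new = np.linalg.solve(xtw @ X, xtw @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse Fisher information at the fit
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 100_000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta_full, se_full = irls_poisson(X, y)

# Hypothetical stand-in for sketching: fit on a uniform 10% row subsample.
rows = rng.choice(n, size=n // 10, replace=False)
beta_sub, se_sub = irls_poisson(X[rows], y[rows])
print(beta_full, se_full)
print(beta_sub, se_sub)
```

The subsample fit is cheaper but has larger standard errors; the point of a more careful sketch, as in the paper, is to reduce cost while controlling that loss.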

Keywords: generalized linear models; Stochastic approximation; Subsampling; Sketching; Database systems

A binarization approach to model interactions between categorical predictors in generalized linear models

APPLIED INTELLIGENCE, 2024, Vol. 54, No. 17-18, pp. 7969-7981

Authors: Carrizosa, Emilio; Restrepo, Marcela Galvis; Morales, Dolores Romero. Affiliations: Univ Seville, Inst Matemat, Seville, Spain; Copenhagen Metropolitan Area, DEAS Grp, Frederiksberg, Denmark; Copenhagen Business Sch, Frederiksberg, Denmark

In this paper, our goal is to enhance the interpretability of generalized linear models by identifying the most relevant interactions between categorical predictors. Searching for interaction effects can quickly become a highly combinatorial, and thus computationally costly, problem when we have many categorical predictors or even a few of them but with many categories. Moreover, the estimation of coefficients requires large training samples with enough observations for each interaction between categories. To address these bottlenecks, we propose to find a reduced representation for each categorical predictor as a binary predictor, where categories are clustered based on a dissimilarity. We provide a collection of binarized representations for each categorical predictor, where the dissimilarity takes into account information from the main effects and the interactions. The choice of the binarized predictors representing the categorical predictors is made with a novel heuristic procedure that is guided by the accuracy of the so-called binarized model. We test our methodology on both real-world and simulated data, illustrating that, without damaging the out-of-sample accuracy, our approach trains sparse models including only the most relevant interactions between categorical predictors.
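
As a hedged illustration of clustering the categories of one predictor into a binary predictor, the Python sketch below splits categories into two groups by their estimated main effects, choosing the split that minimizes the within-group sum of squared deviations (in one dimension the optimal two-group partition is contiguous in sorted order, so checking every cut point suffices). This is a simplification: the paper's dissimilarity also uses interaction information, and its selection heuristic is guided by the accuracy of the binarized model, neither of which is modeled here. `binarize_categories` and the example effects are hypothetical.

```python
def binarize_categories(effects):
    """Map each category to 0/1 by an optimal two-group split of its effect estimate."""
    if len(effects) < 2:
        raise ValueError("need at least two categories to binarize")
    items = sorted(effects.items(), key=lambda kv: kv[1])
    values = [v for _, v in items]
    best_cut, best_cost = 1, float("inf")
    # Try every contiguous split of the sorted effects.
    for cut in range(1, len(values)):
        cost = 0.0
        for group in (values[:cut], values[cut:]):
            mean = sum(group) / len(group)
            cost += sum((v - mean) ** 2 for v in group)
        if cost < best_cost:
            best_cut, best_cost = cut, cost
    # Categories below the cut get 0, the rest get 1.
    return {name: int(i >= best_cut) for i, (name, _) in enumerate(items)}

print(binarize_categories({"a": 0.1, "b": 0.2, "c": 1.5, "d": 1.7}))
# {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

A binarized predictor of this kind has a single dummy variable, so an interaction between two such predictors needs only one coefficient instead of one per pair of categories.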

Keywords: generalized linear models; Interpretability; Categorical predictors; Interactions; Clustering of categories

Power analysis for zero-inflated Poisson and negative binomial generalized linear models using Monte Carlo simulation

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2025, Vol. 95, No. 4, pp. 868-885

Authors: Dennis, Trent; Rubin-McGregor, Jordan; O'Connell, Michael J. Affiliations: Miami Univ Dept Stat, 105 Tallawanda Rd, Oxford, OH 45056, USA; Miami Univ Dept Psychol, Oxford, OH, USA

Many research fields involve count data with zero inflation. A commonly chosen model for analysing a relationship between predictors and a response variable in these scenarios is a zero-inflated generalized linear model (GLM). This model is a mixture of a count-based GLM and a zero-inflation component, with a mixing proportion that determines the amount of excess zeroes. As the use of zero-inflated count models is rising, it is important to be able to conduct a power analysis to properly design studies with such models. In this paper, we propose a flexible method for power analysis with zero-inflated count models using Monte Carlo simulation. We have created the R package ZIPowerAnalysis, which can be used to easily conduct a power analysis for any designed study that will incorporate a zero-inflated count GLM.
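
For the Poisson case, the mixture described above has the standard zero-inflated Poisson probability mass function, with mixing proportion $\pi_i$ and Poisson mean $\lambda_i$. The logit and log links for the two components are a common choice assumed here rather than taken from the paper, and the covariate vectors $z_i$, $x_i$ and coefficients $\gamma$, $\beta$ are illustrative notation.

```latex
\begin{aligned}
P(Y_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\lambda_i}, \\
P(Y_i = k) &= (1 - \pi_i)\, \frac{e^{-\lambda_i} \lambda_i^{k}}{k!}, \qquad k = 1, 2, \ldots, \\
\operatorname{logit}(\pi_i) &= z_i^\top \gamma, \qquad \log \lambda_i = x_i^\top \beta.
\end{aligned}
```

A Monte Carlo power analysis repeatedly simulates data from such a model at the planned sample size and effect sizes, fits the model to each simulated dataset, and reports the fraction of replicates in which the effect of interest is declared significant.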

Keywords: Power analysis; zero-inflated count models; generalized linear models; simulation

Prediction can be safely used as a proxy for explanation in causally consistent Bayesian generalized linear models

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2025, Vol. 95, No. 6, pp. 1226-1249

Authors: Scholz, Maximilian; Buerkner, Paul-Christian. Affiliation: Univ Stuttgart Cluster Excellence SimTech, Univ Str 32, D-70569 Stuttgart, Germany

Bayesian modeling provides a principled approach to quantifying uncertainty and has seen a surge of applications in recent years. Within the context of a Bayesian workflow, we are concerned with model selection for the purpose of finding models that best explain the data or underlying data generating process. Since insight into the true process is rare, what remains is incomplete causal knowledge and model predictions of the data. This leads to the important question of when the use of prediction as a proxy for explanation for the purpose of model selection is valid. We approach this question by means of large-scale simulations of Bayesian generalized linear models where we investigate various causal and statistical misspecifications. Our results indicate that the use of prediction as proxy for explanation is valid and safe if the models under consideration are sufficiently consistent with the underlying causal structure of the true data generating process.

Keywords: Bayesian workflow; causal inference; explanation; prediction; generalized linear models; simulation study

Distributed variable screening for generalized linear models

COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2025, Vol. 211

Authors: Diao, Tianbo; Li, Bo; Qu, Lianqiang; Sun, Liuquan. Affiliations: Cent China Normal Univ Sch Math & Stat, Wuhan 430079, Hubei, Peoples R China; Nanyang Inst Technol Sch Math & Sci, Nanyang 473004, Henan, Peoples R China; Chinese Acad Sci Acad Math & Syst Sci, Beijing 100190, Peoples R China; Univ Chinese Acad Sci Sch Math Sci, Beijing 100190, Peoples R China

In this article, we develop a distributed variable screening method for generalized linear models, designed for situations where both the sample size and the number of covariates are large. Specifically, the proposed method selects relevant covariates using a sparsity-restricted surrogate likelihood estimator. It accounts for the joint effects of the covariates rather than only their marginal effects, which makes the screening results more reliable. We establish the sure screening property of the proposed method, which ensures that, with high probability, the true model is included in the selected model. Simulation studies evaluate the finite-sample performance of the proposed method, and an application to a real dataset showcases its practical utility.

Keywords: Distributed learning; generalized linear models; Massive data; Variable screening

Penalized Lq-likelihood estimator and its influence function in generalized linear models

METRIKA, 2025, Vol. 88, No. 1, pp. 1-18

Authors: Hu, Hongchang; Liu, Mingqiu; Zeng, Zhen. Affiliations: Hubei Normal Univ Sch Math & Stat, Huangshi 435002, Peoples R China; Nanjing Univ Finance & Econ Dept Appl Math, Nanjing 210023, Peoples R China

Consider the generalized linear model (GLM) $y_i = h(x_i^\top \beta) + e_i$, $i = 1, 2, \ldots, n$, where $h(\cdot)$ is a continuously differentiable function and $\{e_i\}$ are independent and identically distributed (i.i.d.) random variables with zero mean and known variance $\sigma^2$. We extend the penalized Lq-likelihood method for linear regression models to the GLM and investigate the oracle properties of the penalized Lq-likelihood estimator (PLqE). To show the robustness of the PLqE, we discuss its influence function. Simulation results support the validity of our approach. Furthermore, it is shown that the PLqE is robust, whereas the penalized maximum likelihood estimator is not.
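
For context, the Lq-likelihood replaces the logarithm in the log-likelihood with the function $L_q$ below, which recovers $\log u$ as $q \to 1$. The penalized objective is a hedged sketch that assumes an error density $f$ (for example, normal with variance $\sigma^2$) and a generic penalty $p_\lambda$ on the $p$ coefficients; the exact form used in the paper may differ.

```latex
L_q(u) =
\begin{cases}
\dfrac{u^{1-q} - 1}{1 - q}, & q \neq 1, \\[4pt]
\log u, & q = 1,
\end{cases}
\qquad
\hat{\beta} = \arg\max_{\beta} \Big\{ \sum_{i=1}^{n} L_q\big(f(y_i - h(x_i^\top \beta))\big)
  - n \sum_{j=1}^{p} p_\lambda(|\beta_j|) \Big\}.
```

Differentiating gives $\partial L_q(f)/\partial\beta = f^{1-q}\,\partial \log f/\partial\beta$, so for $q < 1$ each observation's score is weighted by $f^{1-q}$ and points with small density (outliers) are down-weighted; this is the usual intuition for why an Lq-type estimator can have a bounded influence function while maximum likelihood ($q = 1$, all weights equal to 1) does not.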

Keywords: generalized linear models; Penalized Lq-likelihood estimator; Oracle property; Influence function

STATISTICS & PROBABILITY LETTERS, 2024, Vol. 211

Authors: Geng, Shuli; Zhang, Lixin. Affiliation: Zhejiang Univ Sch Math Sci, Hangzhou 31000, ZJ, Peoples R China

This paper focuses on decorrelated empirical likelihood-based inference for longitudinal data with ultrahigh-dimensional covariates. The primary issues we aim to address are parameter estimation and hypothesis testing for a low-dimensional parameter of interest. Under the framework of the generalized linear model, we first account for the within-subject correlation by linearizing the precision matrix with certain known matrices, which retains optimality even if the working correlation structure is misspecified. Using the decorrelated matrix, we then eliminate the influence of nuisance parameters on the estimation procedure. The proposed approach not only yields more efficient estimators than generalized decorrelated estimating equations but also shares the same asymptotic variance as methods based on quadratic decorrelated inference functions. Furthermore, we define the decorrelated empirical log-likelihood ratio test statistic to assess the significance of regression coefficients. Finally, to evaluate the performance of the proposed procedure, we conduct simulation studies and apply it to a real data example.

Keywords: generalized linear models; Empirical likelihood; Decorrelated matrix; Quadratic inference functions; High-dimensional longitudinal data

Outcome dependent subsampling divide and conquer in generalized linear models for massive data

JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2025, Vol. 237

Authors: Yin, Jie; Ding, Jieli; Yang, Changming. Affiliations: Wuhan Univ Sch Math & Stat, Wuhan 430072, Hubei, Peoples R China; BeiGene Beijing Co Ltd Wuhan Branch, Wuhan 430000, Hubei, Peoples R China

To overcome the constraints imposed by limited computing power when processing massive datasets, we propose an outcome-dependent subsampling divide-and-conquer strategy. The proposed strategy processes data blocks in parallel and concentrates the computing resources of each block on the regions with the most information. We develop a distributed statistical inference method and propose a computationally efficient algorithm for generalized linear models with massive data. The proposed method only needs to retain some summary statistics from each data block, which are then used to construct the proposed estimator directly. The asymptotic properties of the proposed method are established. Simulation studies and a real data analysis illustrate the merits of the proposed method.
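
To illustrate the general pattern of keeping only summary statistics per block, the Python sketch below fits a logistic GLM separately on each block and combines the blocks through an information-weighted average, using only each block's Fisher information $I_k$ and the product $I_k\hat\beta_k$. This is a generic divide-and-conquer aggregator, not the authors' outcome-dependent subsampling estimator (which relies on biased subsampling and semiparametric empirical likelihood); the function names are hypothetical.

```python
import numpy as np

def fit_logistic_block(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson logistic fit on one block; returns (I_k, I_k @ beta_k)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - mu)
        info = (X.T * (mu * (1.0 - mu))) @ X  # Fisher information X^T W X
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = (X.T * (mu * (1.0 - mu))) @ X
    # Only these two small summaries leave the block.
    return info, info @ beta

def combine_blocks(summaries):
    """Information-weighted average of block estimates, built from summaries only."""
    total_info = sum(s[0] for s in summaries)
    total_weighted = sum(s[1] for s in summaries)
    return np.linalg.solve(total_info, total_weighted)

rng = np.random.default_rng(2)
n, n_blocks = 200_000, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 1.0, -0.75])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

summaries = [fit_logistic_block(Xb, yb)
             for Xb, yb in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks))]
beta_dc = combine_blocks(summaries)

# Full-data fit for comparison (feasible here only because the example is small).
info_full, weighted_full = fit_logistic_block(X, y)
beta_full = np.linalg.solve(info_full, weighted_full)
print(beta_dc, beta_full)
```

Each block sends only a $p \times p$ matrix and a length-$p$ vector, so communication does not grow with the block size, and the combined estimate should be very close to the full-data fit.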

Keywords: Divide and conquer; Biased subsampling; Massive data; Semiparametric empirical likelihood; generalized linear models

Generalized case-control sampling under generalized linear models

BIOMETRICS, 2023, Vol. 79, No. 1, pp. 332-343

Authors: Maronge, Jacob M.; Tao, Ran; Schildcrout, Jonathan S.; Rathouz, Paul J. Affiliations: Univ Texas MD Anderson Canc Ctr Dept Biostat, Houston, TX 77030, USA; Vanderbilt Univ Med Ctr Dept Biostat, Nashville, TN, USA; Vanderbilt Univ Vanderbilt Genet Inst Med Ctr, Nashville, TN, USA; Univ Texas Austin Med Sch Dept Populat Hlth, Austin, TX 78712, USA

A generalized case-control (GCC) study, like the standard case-control study, leverages outcome-dependent sampling (ODS), but extends it to nonbinary responses. We develop a novel, unifying approach for analyzing GCC study data using the recently developed semiparametric extension of the generalized linear model (GLM), which is substantially more robust to model misspecification than existing approaches based on parametric GLMs. For valid estimation and inference, we use a conditional likelihood to account for the biased sampling design. We describe procedures for estimation and inference for the semiparametric GLM under a conditional likelihood, and we discuss problems with estimation and inference under a conditional likelihood when the response distribution is misspecified. We demonstrate the flexibility of our approach relative to existing ones through extensive simulation studies, and we apply the methodology to an analysis of the Asset and Health Dynamics Among the Oldest Old study, which motivates our research. The proposed approach yields a simple yet versatile solution for handling ODS across a wide variety of response distributions and sampling schemes encountered in practice.

Keywords: conditional likelihood; efficiency; generalized case-control studies; generalized linear models; outcome-dependent sampling
