In generalized linear models (GLMs), measures of lack of fit are typically defined as the deviance between two nested models, and a deviance-based R² is commonly used to evaluate the fit. In this paper, we extend deviance measures to mixtures of GLMs, whose parameters are estimated by maximum likelihood (ML) via the EM algorithm. Such measures are defined both locally, i.e., at the cluster level, and globally, i.e., with reference to the whole sample. At the cluster level, we propose a normalized two-term decomposition of the local deviance into explained and unexplained local deviances. At the sample level, we introduce an additive normalized decomposition of the total deviance into three terms, where each evaluates a different aspect of the fitted model: (1) the cluster separation on the dependent variable, (2) the proportion of the total deviance explained by the fitted model, and (3) the proportion of the total deviance which remains unexplained. We use both local and global decompositions to define, respectively, local and overall deviance R² measures for mixtures of GLMs, which we illustrate, for Gaussian, Poisson, and binomial responses, by means of a simulation study. The proposed fit measures are then used to assess and interpret clusters of COVID-19 spread in Italy at two time points.
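As a concrete (and much simplified) illustration of the decomposition, the following sketch computes local and overall deviance R² terms for a two-component mixture of Gaussian regressions, where the deviance reduces to weighted sums of squares. The synthetic data, hard responsibilities, and variable names are ours, not the paper's; a real application would take the responsibilities from an EM fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
lab = rng.integers(0, 2, size=n)                   # latent cluster labels
y = np.where(lab == 0, 1 + 2 * x, -1 - 2 * x) + rng.normal(scale=0.5, size=n)

# Hard responsibilities (a real fit would use EM posteriors).
z = np.column_stack([lab == 0, lab == 1]).astype(float)
X = np.column_stack([np.ones(n), x])

r2_local, dev_expl, dev_unexpl = [], 0.0, 0.0
for k in range(2):
    w = z[:, k]
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    ybar_k = np.average(y, weights=w)
    d_tot = np.sum(w * (y - ybar_k) ** 2)          # local total deviance
    d_res = np.sum(w * (y - X @ beta) ** 2)        # local unexplained deviance
    r2_local.append(1 - d_res / d_tot)             # local deviance R^2
    dev_expl += d_tot - d_res
    dev_unexpl += d_res

# Sample-level: total deviance = separation + explained + unexplained.
d_total = np.sum((y - y.mean()) ** 2)
sep = d_total - dev_expl - dev_unexpl              # between-cluster term
print("local R^2:", np.round(r2_local, 3))
print("shares -> separation: %.3f, explained: %.3f, unexplained: %.3f"
      % (sep / d_total, dev_expl / d_total, dev_unexpl / d_total))
```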
Predictive classification considered in this paper concerns the problem of identifying subgroups based on a continuous biomarker through estimation of an unknown cutpoint, and assessing whether these subgroups differ in treatment effect relative to some clinical outcome. The problem is considered under a generalized linear model framework for clinical outcomes and formulated as testing the significance of the interaction between the treatment and the subgroup indicator. When the main effect of the subgroup indicator does not exist, the cutpoint is non-identifiable under the null. Existing procedures are not adaptive to the identifiability issue and do not work well when the main effect is small. In this work, we propose profile score-type and Wald-type test statistics, and further m-out-of-n bootstrap techniques to obtain their critical values. The proposed procedures do not rely on knowledge about the model's identifiability, and we establish their asymptotic size validity and study the power under local alternatives in both cases. Further, we show that the standard bootstrap is inconsistent in the non-identifiable case. Simulation results corroborate our theory, and the proposed method is applied to a dataset from a clinical trial on advanced colorectal cancer.
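To make the testing idea concrete, here is a toy sketch of a profile (sup-over-cutpoints) Wald-type statistic with an m-out-of-n bootstrap critical value. It is our simplification of the procedure, using a Gaussian outcome, an ad hoc cutpoint grid, and m = n^0.8; the paper's statistics and bootstrap calibration are more refined.

```python
import numpy as np
import statsmodels.api as sm

def sup_wald(y, trt, x, grid):
    """Profile Wald statistic: maximize over candidate cutpoints."""
    stats = []
    for c in grid:
        g = (x > c).astype(float)
        design = sm.add_constant(np.column_stack([trt, g, trt * g]))
        fit = sm.OLS(y, design).fit()
        stats.append((fit.params[3] / fit.bse[3]) ** 2)   # interaction Wald
    return max(stats)

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(size=n)
trt = rng.integers(0, 2, size=n).astype(float)
y = 0.5 * trt + rng.normal(size=n)   # null: no subgroup, cutpoint non-identifiable

grid = np.quantile(x, np.linspace(0.1, 0.9, 17))
T_obs = sup_wald(y, trt, x, grid)

m = int(n ** 0.8)                    # subsample size m = o(n)
boot = []
for _ in range(200):
    idx = rng.choice(n, size=m, replace=True)             # m-out-of-n resample
    boot.append(sup_wald(y[idx], trt[idx], x[idx], grid))
print("sup-Wald: %.2f, m-out-of-n 95%% critical value: %.2f"
      % (T_obs, np.quantile(boot, 0.95)))
```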
ISBN:
(Print) 9783031686276; 9783031686283
This paper provides a comprehensive analysis of three distinct generalized linear models (GLMs): traditional linear regression, the Poisson GLM, and the Poisson-Inverse Gaussian GLM. The study applies these models to product demand modeling in Supply Chain Management. To evaluate goodness of fit, we compare the models' performance via their associated deviance functions. Our findings indicate that the Poisson-Inverse Gaussian GLM outperforms both the Poisson GLM and the linear regression model in terms of goodness of fit.
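A minimal sketch of deviance-based comparison with statsmodels, on invented demand data. Note that statsmodels has no built-in Poisson-Inverse Gaussian family, so only the Gaussian and Poisson fits are shown; each deviance is measured against its own family's saturated model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))       # e.g., price and promotion
demand = rng.poisson(np.exp(0.5 + X[:, 1] - 0.5 * X[:, 2]))

gauss = sm.GLM(demand, X, family=sm.families.Gaussian()).fit()
pois = sm.GLM(demand, X, family=sm.families.Poisson()).fit()

# Each deviance measures lack of fit against that family's saturated model;
# smaller is better within a family.
print("Gaussian deviance:", round(gauss.deviance, 1))
print("Poisson deviance: ", round(pois.deviance, 1))
# A Poisson-Inverse Gaussian fit would require a dedicated package or a
# custom likelihood; its deviance would be compared in the same way.
```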
Generalized additive models (GAMs) are a leading model class for interpretable machine learning. GAMs were originally defined with smooth shape functions of the predictor variables and trained using smoothing splines. Recently, tree-based GAMs, where shape functions are gradient-boosted ensembles of bagged trees, were proposed, leaving the door open for the estimation of a broader class of shape functions (e.g., the Explainable Boosting Machine (EBM)). In this paper, we introduce a competing three-step GAM learning approach that combines (i) the knowledge of how to split the covariate space brought by an additive tree model (ATM), (ii) an ensemble of predictive linear scores derived from generalized linear models (GLMs) using a binning strategy based on the ATM, and (iii) a final GLM that ensures the prediction model is auto-calibrated. Numerical experiments illustrate the competitive performance of our approach on several datasets compared to GAMs with splines, EBM, and GLMs with binarsity penalization. A case study in trade credit insurance is also provided.
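The following is our rough reading of the three steps, compressed into a runnable sketch: per-covariate tree splits stand in for the ATM, a Poisson GLM on one-hot bins provides the additive scores, and a final one-dimensional GLM on the score approximates the auto-calibration step. Data, depths, and penalties are invented; requires scikit-learn >= 1.2 for the sparse_output flag.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(3)
n = 2000
X = rng.uniform(-2, 2, size=(n, 3))
y = rng.poisson(np.exp(0.3 * X[:, 0] ** 2 - 0.5 * np.abs(X[:, 1])))

# (i) per-covariate split points from shallow trees (ATM stand-in).
leaf_ids = [DecisionTreeRegressor(max_leaf_nodes=5, random_state=0)
            .fit(X[:, [j]], y).apply(X[:, [j]]) for j in range(X.shape[1])]
B = np.column_stack(leaf_ids)

# (ii) GLM on one-hot bins -> additive piecewise-constant shape functions.
Z = OneHotEncoder(sparse_output=False).fit_transform(B)
glm = PoissonRegressor(alpha=1e-4, max_iter=1000).fit(Z, y)
score = glm.predict(Z)

# (iii) final one-dimensional GLM on the log-score for auto-calibration.
cal = PoissonRegressor(alpha=0.0).fit(np.log(score).reshape(-1, 1), y)
pred = cal.predict(np.log(score).reshape(-1, 1))
print("mean prediction %.3f vs mean response %.3f" % (pred.mean(), y.mean()))
```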
For massive data, subsampling techniques are popular to mitigate the computational burden by reducing the data size. In a subsampling approach, subsampling probabilities for each data point are specified to obtain an informative sub-dataset, and estimates based on the sub-data are then used to approximate the full-data estimates. Assigning subsampling probabilities by minimizing the asymptotic mean squared error of the estimator from a general subsample (the A-optimality criterion) is a popular approach; however, calculating the probabilities under this setting is still computationally demanding. To efficiently approximate the A-optimal subsampling probabilities for generalized linear models, randomized algorithms are proposed. To develop the algorithms, the Johnson-Lindenstrauss Transform and the Subsampled Randomized Hadamard Transform are used. Additionally, optimal subsampling probabilities are derived for the Gaussian linear model in the case where both the regression coefficients and the dispersion parameter are of interest, and algorithms are developed to approximate them. Simulation studies indicate that estimators based on the developed algorithms perform excellently for statistical inference and yield substantial savings in computing time compared to direct calculation of the A-optimal subsampling probabilities.
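As a sketch of the main idea, the snippet below approximates A-optimal subsampling probabilities for logistic regression, pi_i proportional to |y_i - p_i| * ||M^{-1} x_i||, and uses a Johnson-Lindenstrauss sketch to avoid computing M^{-1} x_i exactly for every point. The pilot estimate, sketch size, and constants are placeholders, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, r = 100_000, 20, 5
X = rng.normal(size=(n, p))
beta_pilot = rng.normal(scale=0.1, size=p)          # pilot estimate (assumed given)
prob = 1 / (1 + np.exp(-X @ beta_pilot))
y = rng.binomial(1, prob)

w = prob * (1 - prob)
M = (X * w[:, None]).T @ X / n                      # observed information
Minv = np.linalg.inv(M)

# JL step: replace ||Minv @ x_i|| with ||(S @ Minv) @ x_i|| for a small
# r x p Gaussian sketch S, cutting the per-point cost from O(p^2) to O(rp).
S = rng.normal(size=(r, p)) / np.sqrt(r)
G = S @ Minv
norms = np.linalg.norm(X @ G.T, axis=1)             # approx ||Minv x_i||

pi = np.abs(y - prob) * norms                       # A-optimality recipe
pi /= pi.sum()
idx = rng.choice(n, size=2000, replace=True, p=pi)  # informative subsample
print("distinct points in subsample:", len(np.unique(idx)))
```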
ISBN:
(Print) 9798400703836
We consider the sparsification of sums $F : \mathbb{R}^n \to \mathbb{R}_+$ where $F(x) = f_1(\langle a_1, x\rangle) + \cdots + f_m(\langle a_m, x\rangle)$ for vectors $a_1, \ldots, a_m \in \mathbb{R}^n$ and functions $f_1, \ldots, f_m : \mathbb{R} \to \mathbb{R}_+$. We show that $(1+\epsilon)$-approximate sparsifiers of $F$ with support size $\frac{n}{\epsilon^2} (\log \frac{n}{\epsilon})^{O(1)}$ exist whenever the functions $f_1, \ldots, f_m$ are symmetric, monotone, and satisfy natural growth bounds. Additionally, we give efficient algorithms to compute such a sparsifier assuming each $f_i$ can be evaluated efficiently. Our results generalize the classical case of $\ell_p$ sparsification, where $f_i(z) = |z|^p$ for $p \in (0, 2]$, and give the first near-linear size sparsifiers in the well-studied setting of the Huber loss function and its generalizations, e.g., $f_i(z) = \min\{|z|^p, |z|^2\}$ for $0 < p \le 2$. Our sparsification algorithm can be applied to give near-optimal reductions for optimizing a variety of generalized linear models, including $\ell_p$ regression for $p \in (1, 2]$ to high accuracy, via solving $(\log n)^{O(1)}$ sparse regression instances with $m \le n (\log n)^{O(1)}$, plus runtime proportional to the number of nonzero entries in the vectors $a_1, \ldots, a_m$.
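For intuition, here is a toy importance-sampling sparsifier in the classical $\ell_2$ special case $f_i(z) = z^2$, where the right sampling weights are the leverage scores; the paper's contribution is extending such guarantees to far more general $f_i$. The support-size formula used below is a heuristic stand-in for the paper's bound.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, eps = 5000, 30, 0.25
A = rng.normal(size=(m, n))

# Leverage scores tau_i = a_i^T (A^T A)^{-1} a_i bound each term's maximal
# share of F, so sampling proportionally preserves F(x) for all x w.h.p.
Q, _ = np.linalg.qr(A)
tau = np.sum(Q ** 2, axis=1)
s = int(np.ceil(n * np.log(n) / eps ** 2))          # heuristic support size
p = np.minimum(1.0, s * tau / tau.sum())
keep = rng.uniform(size=m) < p
weights = 1.0 / p[keep]                             # inverse-probability reweighting

x = rng.normal(size=n)
F_full = np.sum((A @ x) ** 2)
F_sparse = np.sum(weights * (A[keep] @ x) ** 2)
print("kept %d of %d terms, relative error %.4f"
      % (keep.sum(), m, abs(F_sparse - F_full) / F_full))
```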
We study the problem of recovering an unknown signal $x$ given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $\hat{x}^{\rm L}$ and a spectral estimator $\hat{x}^{\rm s}$. The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $\hat{x}^{\rm L}$ and $\hat{x}^{\rm s}$. At the heart of our analysis is the exact characterization of the empirical joint distribution of $(x, \hat{x}^{\rm L}, \hat{x}^{\rm s})$ in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $\hat{x}^{\rm L}$ and $\hat{x}^{\rm s}$, given the limiting distribution of the signal $x$. When the distribution of the signal is Gaussian, the Bayes-optimal combination has the form $\theta\,\hat{x}^{\rm L} + \hat{x}^{\rm s}$, and we derive the optimal combination coefficient $\theta$. In order to establish the limiting distribution of $(x, \hat{x}^{\rm L}, \hat{x}^{\rm s})$, we design and analyze an approximate message passing algorithm whose iterates give $\hat{x}^{\rm L}$ and approach $\hat{x}^{\rm s}$. Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.
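The sketch below illustrates the two estimators and their combination on ReLU-type measurements $y_i = \max(\langle a_i, x\rangle, 0)$, a GLM where both estimators carry signal. The paper derives the Bayes-optimal combination coefficient analytically; the oracle grid search here is only to show that a combination can beat either estimator alone.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 200, 1600
x_true = rng.normal(size=n)
x_true /= np.linalg.norm(x_true)
A = rng.normal(size=(m, n))
y = np.maximum(A @ x_true, 0.0)                     # ReLU link, no extra noise

x_lin = A.T @ y / m                                 # linear estimator

D = (A * y[:, None]).T @ A / m                      # spectral matrix, T(y) = y
x_spec = np.linalg.eigh(D)[1][:, -1]                # principal eigenvector
x_spec *= np.sign(x_spec @ x_lin)                   # resolve global sign

def overlap(v):
    return abs(v @ x_true) / np.linalg.norm(v)

x_lin_u = x_lin / np.linalg.norm(x_lin)
best = max((overlap(t * x_lin_u + x_spec), t) for t in np.linspace(0, 3, 61))
print("overlap -> linear: %.3f, spectral: %.3f, combined: %.3f (theta = %.2f)"
      % (overlap(x_lin), overlap(x_spec), best[0], best[1]))
```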
In this paper, we study the problem of estimating smooth generalized linear models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. Unlike its classical setting, our model allows the server to access additional public but unlabeled data. In the first part of the paper, we focus on GLMs. Specifically, we first consider the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein's lemma, we present an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Moreover, the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) is $O(p\alpha^{-2})$ and $\tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$, respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known sample complexities of GLMs with no public data, which are exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$. Then we consider a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein's lemma, we propose an $(\epsilon, \delta)$-NLDP algorithm for GLMs whose sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is $O(p^2\alpha^{-2})$ and $\tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$, respectively, under some mild assumptions and provided that $\alpha$ is not too small, i.e., $\alpha \ge \Omega(1/\sqrt{p})$. In the second part of the paper, we extend our idea to the problem of estimating non-linear regressions and show similar results as in GLMs for both the multivariate Gaussian and sub-Gaussian cases. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLMs and non-linear regressions in the NLDP model with unlabeled public data.
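A highly simplified sketch of the Gaussian-case idea: by Stein's lemma, $E[yx]$ is proportional to $w^*$ when $x$ is standard Gaussian, so each user can release a clipped, Gaussian-perturbed $y_i x_i$ once (non-interactively) and the server simply averages. The clipping bound and noise calibration below are ad hoc, and the role of public unlabeled data (covariance estimation and normalization) is omitted.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, eps, delta = 200_000, 10, 1.0, 1e-5
w_star = rng.normal(size=p)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, p))                          # private features
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ w_star)))).astype(float)

# Each user clips y_i * x_i and adds Gaussian-mechanism noise, once.
C = 8.0                                              # ad hoc clip (||x|| ~ sqrt(p))
Z = X * y[:, None]
row_norms = np.maximum(np.linalg.norm(Z, axis=1), 1e-12)
Z *= np.minimum(1.0, C / row_norms)[:, None]
sigma = C * np.sqrt(2 * np.log(1.25 / delta)) / eps
reports = Z + rng.normal(scale=sigma, size=Z.shape)  # one noisy report per user

# Stein's lemma: E[y x] = c * w_star for standard Gaussian x, so the
# average of the reports estimates the direction of w_star.
w_hat = reports.mean(axis=0)
w_hat /= np.linalg.norm(w_hat)
print("cosine(w_hat, w_star) = %.3f" % (w_hat @ w_star))
```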
Post-selection inference has been an active research topic recently, with much work providing ways to solve practical problems in fields such as medicine and finance. In particular, post-selection inference under the linear model is widely discussed. We extend it to generalized linear models and present new approaches to post-selection inference for the penalized least squares method. The core of this framework is the distribution function of the post-selection estimator conditioned on the selection event. Lasso and elastic net are then used to select models and construct valid confidence intervals for the selected coefficients. Theoretical results and numerical comparisons show that our methods outperform existing ones. Finally, the proposed methods are applied to the analysis of real data sets.
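To make "conditioning on the selection event" concrete, here is a deliberately naive Monte Carlo illustration for the lasso on a Gaussian linear model: the null distribution of a coefficient statistic is formed only from resampled datasets in which the lasso selects the same variable again. The papers work with exact conditional distribution functions instead of this crude rejection sampler.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p, sigma = 100, 10, 1.0
X = rng.normal(size=(n, p))
y = rng.normal(scale=sigma, size=n)                  # global null: beta = 0

def selected(y_vec):
    return set(np.flatnonzero(Lasso(alpha=0.1).fit(X, y_vec).coef_))

sel = selected(y)
if sel:
    j = min(sel)
    xj = X[:, j]
    stat = xj @ y / (xj @ xj)                        # marginal coefficient for X_j
    # Conditional null law given {j selected}: rejection-sample null data
    # sets, keeping only those where the lasso selects j again.
    null_stats = []
    while len(null_stats) < 300:
        y0 = rng.normal(scale=sigma, size=n)
        if j in selected(y0):
            null_stats.append(xj @ y0 / (xj @ xj))
    pval = np.mean(np.abs(null_stats) >= abs(stat))
    print("selected:", sorted(sel), "| conditional p-value for X_%d: %.3f" % (j, pval))
else:
    print("lasso selected no variables; nothing to test")
```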
We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$, satisfying $\Pr[\xi = 0] \ge o(1)$, and $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Our goal is to accurately recover a function $g(w \cdot x)$ with arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles the problem in its most general distribution-independent setting, where the solution may not be identifiable. The algorithm is designed to return the solution if it is identifiable, and otherwise return a small list of candidates, one of which is close to the true solution. Furthermore, we characterize a necessary and sufficient condition for identifiability, which holds in broad settings. The problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first result for GLM regression which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression with oblivious noise, giving algorithms under more restrictive assumptions.
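A toy sketch of the identifiable case: when the quantile $q$ at which $\xi + \epsilon = 0$ is known, $w^*$ can be recovered by pinball (quantile) regression of $y$ on $g(w \cdot x)$, even with more than half the samples corrupted. The link, noise model, and optimizer below are our choices; the paper's algorithm also handles the unknown-quantile, list-decoding regime.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
n, p = 5000, 3
g = np.tanh                                          # known link function
w_star = np.array([1.0, -1.5, 0.5])
X = rng.normal(size=(n, p))
# Oblivious noise: zero with prob 0.4, otherwise a large positive shift,
# so 60% of the samples are corrupted.
xi = np.where(rng.uniform(size=n) < 0.4, 0.0, 5.0 * rng.exponential(size=n))
y = g(X @ w_star) + xi + rng.normal(scale=0.1, size=n)

q = 0.2                                              # known: P(xi + eps <= 0)
def pinball(w):
    r = y - g(X @ w)
    return np.mean(np.maximum(q * r, (q - 1) * r))

w_hat = minimize(pinball, np.zeros(p), method="Nelder-Mead").x
print("w_star:", w_star, "| w_hat:", np.round(w_hat, 2))
```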