Symbolic data analysis has provided several advances in regression models concerning the type of symbolic variable. Owing to the advantages of symbolic polygonal data, this paper introduces a linear regression approach for polygonal data based on generalized linear model theory, which provides a unified method for a broad range of modeling problems with different response types, such as asymmetric continuous and discrete responses. Ordinary polygonal residuals and a way of detecting model inadequacies are presented. Moreover, a goodness-of-fit measure for polygons is also proposed. Experimental results illustrate the usefulness of the proposed approach on synthetic and real polygonal data.
In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLMs and derive its ℓ1/ℓ2-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and sources are sufficiently close to each other, these bounds can be improved over those of the classical penalized estimator using only target data, under mild conditions. When it is unknown which sources to transfer, an algorithm-free transferable source detection approach is introduced to detect informative sources, and its detection consistency is proved under the high-dimensional GLM transfer learning setting. We also propose an algorithm to construct confidence intervals for each coefficient component, with corresponding theory. Extensive simulations and a real-data experiment verify the effectiveness of our algorithms. We implement the proposed GLM transfer learning algorithms in a new R package, glmtrans, which is available on CRAN. Supplementary materials for this article are available online.
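The two-step transfer idea in this abstract can be sketched numerically. The toy below is our own low-dimensional, unpenalized simplification, not the paper's high-dimensional estimator: fit a rough estimate on the pooled source + target data, then estimate a shrunken correction on the target alone; all names, the ridge-shrunken contrast, and the simulated similarity between source and target are assumptions for illustration.

```python
import numpy as np

def lstsq(X, y):
    """Least-squares fit; stands in for a penalized GLM fit."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    """Ridge fit, used to shrink the target-only correction toward zero."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def transfer_fit(X_src, y_src, X_tgt, y_tgt, lam=50.0):
    # Step 1: pooled fit borrows strength from the (similar) source data.
    w = lstsq(np.vstack([X_src, X_tgt]), np.concatenate([y_src, y_tgt]))
    # Step 2: shrink the target-only contrast, reflecting the assumption
    # that source and target coefficients are close.
    delta = ridge(X_tgt, y_tgt - X_tgt @ w, lam)
    return w + delta

rng = np.random.default_rng(0)
beta_tgt = np.array([1.0, -2.0, 0.5])
X_src = rng.normal(size=(500, 3))                      # large, slightly biased source
y_src = X_src @ (beta_tgt + 0.05) + rng.normal(scale=0.1, size=500)
X_tgt = rng.normal(size=(50, 3))                       # small target sample
y_tgt = X_tgt @ beta_tgt + rng.normal(scale=0.1, size=50)

beta_hat = transfer_fit(X_src, y_src, X_tgt, y_tgt)
print(np.round(beta_hat, 2))
```

Without the shrinkage in step 2 the correction would undo the pooling entirely and reduce to a target-only fit; the paper's sparsity penalty on the contrast plays the analogous role in high dimensions.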
This article is concerned with variable selection and estimation for high-dimensional generalized linear models. We introduce a general iteratively reweighted adaptive ridge regression method (GAR) and show that the GAR estimator possesses the oracle property and the grouping effect. A data-driven parameter gamma is introduced in the GAR method to adapt to different structures of the true model, and this adaptive parameter is taken into account to establish a gamma-dependent sufficient condition guaranteeing the oracle property and the grouping effect. Furthermore, to apply the GAR method more efficiently, a coordinate-wise Newton algorithm is employed that avoids both the inverse matrix operation and the numerical instability caused by iteration. Extensive numerical simulation results show that the GAR method outperforms commonly used methods, and the GAR method is further illustrated on a gastric cancer dataset.
Adaptive lasso penalized generalized linear models (GLMs) are a powerful tool for analyzing high-dimensional sparse data where the classical linear or normal assumption is not met. In non-distributed environments, the estimation problem of adaptive lasso penalized GLMs is often solved by the coordinate-descent-based algorithm developed in Friedman, Hastie, and Tibshirani (2010), which is well implemented in the R package glmnet. However, when applied to distributed big data, this algorithm is usually inflexible or even infeasible due to its non-parallel implementation, especially when the communication costs between the central and local machines are expensive, or the storage and computing capabilities of the central machine are insufficient. In this paper, we propose a new method, QAGLM-alasso, for the adaptive lasso penalized GLM problem in distributed big data by applying the quadratic approximation representation of GLMs, and further develop a path-following algorithm for its estimation based on Least Angle Regression (LARS). Theoretical analyses show that, under mild regularity conditions, QAGLM-alasso enjoys the oracle property, and the obtained estimator is asymptotically equivalent to the original adaptive lasso. Simulation studies demonstrate that the new algorithm achieves estimation accuracy similar to that of glmnet but is significantly faster in distributed environments. We further illustrate the practical performance of the proposed method by analyzing a supersymmetric (SUSY) benchmark data set.
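The adaptive lasso mentioned above can be sketched in its simplest (Gaussian-response) form: compute data-driven weights from an initial fit, then run coordinate descent with soft-thresholding. This is a minimal illustration of the general technique, not the paper's distributed QAGLM-alasso algorithm; the simulated design, the penalty level, and the OLS initializer are our own choices.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the building block of lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso(X, y, lam, n_iter=200):
    n, p = X.shape
    # Adaptive weights: large penalty on coefficients whose initial estimate is small.
    beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / (np.abs(beta_init) + 1e-8)
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lam * w[j]) / col_ss[j]
    return beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.5])
y = X @ beta_true + rng.normal(scale=0.5, size=200)

beta_hat = adaptive_lasso(X, y, lam=0.05)
print(np.round(beta_hat, 2))
```

The heavily weighted penalty sets the truly zero coefficients exactly to zero while leaving the large coefficients nearly unshrunk, which is the mechanism behind the oracle property the abstract refers to.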
This article provides methods for flexibly capturing unobservable heterogeneity from longitudinal data in the context of an exponential family of distributions. The group memberships of individual units are left unspecified, and their heterogeneity is influenced by group-specific unobservable factor structures. The model includes, as special cases, probit, logit, and Poisson regressions with interactive fixed effects along with unknown group membership. We discuss a computationally efficient estimation method and derive the corresponding asymptotic theory. Uniform consistency of the estimated group membership is established. To test heterogeneous regression coefficients within groups, we propose a Swamy-type test that allows for unobserved heterogeneity. We apply the proposed method to the study of market structure of the taxi industry in New York City. Our method unveils interesting and important insights from large-scale longitudinal data that consist of over 450 million data points.
In this paper, we consider the application of penalized empirical likelihood to high-dimensional generalized linear models with longitudinal data. Under regularity conditions, it is shown that the penalized empirical likelihood has the oracle property: with probability converging to one, it identifies the true model and estimates the nonzero coefficients as efficiently as if the sparsity of the true model were known in advance. We also show that the asymptotic distribution of the penalized empirical likelihood ratio test statistic is chi-squared. Simulations and a real data analysis illustrate the proposed method.
In this paper we develop an online statistical inference approach for high-dimensional generalized linear models with streaming data, for real-time estimation and inference. We propose an online debiased lasso method that aligns with the data collection scheme of streaming data. Online debiased lasso differs from offline debiased lasso in two important aspects. First, it updates component-wise confidence intervals of regression coefficients using only summary statistics of the historical data. Second, it adds an additional term to correct the approximation errors accumulated throughout the online updating procedure. We show that the proposed online debiased estimators in generalized linear models are asymptotically normal. This result provides a theoretical basis for carrying out real-time interim statistical inference with streaming data. Extensive numerical experiments are conducted to evaluate the performance of the proposed online debiased lasso method; they demonstrate the effectiveness of our algorithm and support the theoretical results. Furthermore, we illustrate the application of our method with a high-dimensional text dataset.
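The "summary statistics of the historical data" idea can be illustrated in its simplest setting, least squares, where the sufficient statistics X'X and X'y are accumulated batch by batch so the estimate is updated without revisiting raw historical data. This toy omits the debiasing correction and the lasso penalty that are the paper's actual contributions; the class name and simulated stream are illustrative.

```python
import numpy as np

class OnlineLS:
    """Running least squares from accumulated summary statistics only."""

    def __init__(self, p):
        self.XtX = np.zeros((p, p))
        self.Xty = np.zeros(p)

    def update(self, X, y):
        # Each streaming batch contributes only its summary statistics;
        # the raw (X, y) can then be discarded.
        self.XtX += X.T @ X
        self.Xty += X.T @ y

    def estimate(self):
        return np.linalg.solve(self.XtX, self.Xty)

rng = np.random.default_rng(3)
beta = np.array([1.0, -1.0])
model = OnlineLS(2)
for _ in range(10):                       # ten streaming batches of 100 rows
    X = rng.normal(size=(100, 2))
    model.update(X, X @ beta + rng.normal(scale=0.1, size=100))

print(np.round(model.estimate(), 2))
```

After all ten batches the estimate matches the full-data fit exactly, which is what makes summary-statistic updating attractive when historical data cannot be stored.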
Outcome-dependent sampling (ODS) is a commonly used class of sampling designs to increase estimation efficiency in settings where response information (and possibly adjuster covariates) is available, but the exposure is expensive and/or cumbersome to collect. We focus on ODS within the context of a two-phase study, where in Phase One the response and adjuster covariate information is collected on a large cohort that is representative of the target population, but the expensive exposure variable is not yet measured. In Phase Two, using response information from Phase One, we selectively oversample a subset of informative subjects in whom we collect expensive exposure information. Importantly, the Phase Two sample is no longer representative, and we must use ascertainment-correcting analysis procedures for valid inferences. In this paper, we focus on likelihood-based analysis procedures, particularly a conditional-likelihood approach and a full-likelihood approach. Whereas the full-likelihood retains incomplete Phase One data for subjects not selected into Phase Two, the conditional-likelihood explicitly conditions on Phase Two sample selection (i.e., it is a "complete case" analysis procedure). These designs and analysis procedures are typically implemented assuming a known, parametric model for the response distribution. However, in this paper, we approach analyses implementing a novel semi-parametric extension to generalized linear models (SPGLM) to develop likelihood-based procedures with improved robustness to misspecification of distributional assumptions. We specifically focus on the common setting where standard GLM distributional assumptions are not satisfied (e.g., a misspecified mean/variance relationship). We aim to provide practical design guidance and flexible tools for practitioners in these settings.
Change point detection for high-dimensional data is an important yet challenging problem for many applications. In this article, we consider multiple change point detection in the context of high-dimensional generalized linear models, allowing the covariate dimension p to grow exponentially with the sample size n. The model considered is general and flexible in the sense that it covers various specific models as special cases. It can automatically account for the underlying data generation mechanism without specifying any prior knowledge about the number of change points. Based on dynamic programming and binary segmentation techniques, two algorithms are proposed to detect multiple change points, allowing the number of change points to grow with n. To further improve the computational efficiency, a more efficient algorithm designed for the case of a single change point is proposed. We present theoretical properties of our proposed algorithms, including estimation consistency for the number and locations of change points as well as consistency and asymptotic distributions for the underlying regression coefficients. Finally, extensive simulation studies and application to the Alzheimer's Disease Neuroimaging Initiative data further demonstrate the competitive performance of our proposed methods.
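A single-change-point search of the kind this abstract builds on can be sketched with the classical CUSUM statistic for a shift in mean: scan all candidate split points and keep the one maximizing the scaled difference of segment means. This is a toy stand-in for the paper's GLM-based dynamic-programming and binary-segmentation algorithms; the Gaussian mean-shift setting and all names are our own simplification.

```python
import numpy as np

def cusum_change_point(y):
    """Return the split point maximizing the CUSUM statistic, and the statistic."""
    n = len(y)
    best_t, best_stat = None, -np.inf
    for t in range(1, n):
        left, right = y[:t], y[t:]
        # Scaled difference of segment means; large values indicate a change at t.
        stat = np.sqrt(t * (n - t) / n) * abs(left.mean() - right.mean())
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(2)
# Mean shifts from 0 to 2 at index 100.
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])
t_hat, stat = cusum_change_point(y)
print(t_hat)
```

Binary segmentation applies this search recursively to the two resulting segments, which is how the multiple-change-point algorithms in the abstract extend the single-change-point case.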
In a clinical trial, the responses to the new treatment may vary among patient subsets with different characteristics in a biomarker. It is often necessary to examine whether there is a cutpoint for the biomarker that divides the patients into two subsets of those with more favourable and less favourable responses. More generally, we approach this problem as a test of homogeneity in the effects of a set of covariates in generalized linear regression models. The unknown cutpoint results in a model with nonidentifiability and a nonsmooth likelihood function to which the ordinary likelihood methods do not apply. We first use a smooth continuous function to approximate the indicator function defining the patient subsets. We then propose a penalized likelihood ratio test to overcome the model irregularities. Under the null hypothesis, we prove that the asymptotic distribution of the proposed test statistic is a mixture of chi-squared distributions. Our method is based on established asymptotic theory, is simple to use, and works in a general framework that includes logistic, Poisson, and linear regression models. In extensive simulation studies, we find that the proposed test works well in terms of size and power. We further demonstrate the use of the proposed method by applying it to clinical trial data from the Digitalis Investigation Group (DIG) on heart failure.
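The smoothing step described above can be illustrated directly: replace the indicator 1{x > c} defining the patient subsets with a sigmoid of bandwidth h. Away from the cutpoint the approximation error vanishes as h shrinks; the cutpoint c = 1.0, the grid, and the bandwidths below are illustrative choices, and the sigmoid is one of several smooth approximations one could use.

```python
import numpy as np

def indicator(x, c):
    """Hard subset membership 1{x > c}."""
    return (x > c).astype(float)

def smooth_indicator(x, c, h):
    """Sigmoid approximation to 1{x > c} with bandwidth h."""
    z = np.clip((x - c) / h, -60.0, 60.0)   # clip to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-2.0, 2.0, 401)
c = 1.0
mask = np.abs(x - c) >= 0.25                # measure error away from the cutpoint
errs = [np.max(np.abs(smooth_indicator(x, c, h) - indicator(x, c))[mask])
        for h in (0.5, 0.1, 0.02)]
print([round(float(e), 4) for e in errs])
```

The approximation is smooth in c, which is what restores a differentiable likelihood and lets standard asymptotic machinery apply, at the price of the penalization the abstract introduces to control the remaining irregularity.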