There is an increasing prevalence of streaming data generation in diverse fields like healthcare, finance, social media, and weather forecasting. In order to acquire helpful insights from these massive datasets, timely analysis is essential. In this article, we assume that the streaming data are analysed in batches. Traditional offline methods, which involve storing and analysing all individual records, can be repeatedly applied to the cumulative data, but encounter significant challenges in storage and computing costs. Existing online methods offer faster approximations, but most neglect model uncertainty, causing overconfidence and instability. To bridge this gap, we propose novel online Bayesian approaches for generalized linear models (GLMs) that incorporate model uncertainty within a Bayesian model averaging (BMA) framework. We develop computationally efficient methods to update the posterior using individual records from the latest batch of data and summary statistics from previous batches. We demonstrate using simulation studies and real data that our methods can offer much faster analysis compared to traditional methods, with no substantial drop in accuracy.
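The batch-update idea above can be illustrated with a toy analogue: for a Gaussian linear model with known noise variance, the exact posterior depends on the data only through the summary statistics Σx² and Σxy, so accumulating those per-batch totals reproduces the offline posterior without storing individual records. This is a hedged sketch, not the paper's BMA method for GLMs; the prior, noise variance, and all names are illustrative.

```python
# Toy analogue of batch-wise posterior updating via summary statistics.
# Model: y = beta * x + noise, known noise variance, Gaussian prior on beta.
def offline_posterior(xs, ys, prior_prec=1.0, noise_var=1.0):
    """Posterior (mean, precision) for beta from all raw records at once."""
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    post_prec = prior_prec + s_xx / noise_var
    post_mean = (s_xy / noise_var) / post_prec
    return post_mean, post_prec

class OnlineGaussianPosterior:
    """Accumulate batch summary statistics instead of storing raw records."""
    def __init__(self, prior_prec=1.0, noise_var=1.0):
        self.prior_prec = prior_prec
        self.noise_var = noise_var
        self.s_xx = 0.0   # running sum of x^2 across all batches seen so far
        self.s_xy = 0.0   # running sum of x*y across all batches seen so far
    def update(self, xs, ys):
        self.s_xx += sum(x * x for x in xs)
        self.s_xy += sum(x * y for x, y in zip(xs, ys))
    def posterior(self):
        post_prec = self.prior_prec + self.s_xx / self.noise_var
        return (self.s_xy / self.noise_var) / post_prec, post_prec
```

Feeding the batches one at a time gives the same posterior as the offline fit on the pooled data, which is the storage/computation saving the abstract describes.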
Generalized linear model (GLM) applications have become very popular in recent years. However, when there is a high degree of correlation among the independent variables, the problem of multicollinearity arises in these models. In this paper, we introduce new first-order approximated (FOA) estimators for the case of gamma-distributed response variables in GLMs. We also generalize some estimation methods for the ridge and Liu parameters in gamma regression models (GRMs). The superiority of these estimators is assessed by the estimated mean squared error (EMSE) in a Monte Carlo simulation study in which the response follows a gamma distribution with the log link function. We finally consider a real data application, and the proposed estimators are compared and interpreted.
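At IRLS convergence, ridge estimators for GLMs share the algebraic form β(k) = (X′WX + kI)⁻¹X′Wz. As a hedged illustration only, the sketch below uses W = I (ordinary ridge, not the paper's FOA gamma estimators) and made-up numbers to show how the shrinkage parameter k stabilizes coefficients under multicollinearity:

```python
# Ordinary ridge estimator for two predictors, solved with an explicit
# 2x2 inverse; data and the choice k = 1.0 are purely illustrative.
def xtx_xty(X, y):
    """Return the entries of X'X (a, b, d) and of X'y (t0, t1)."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    t0 = sum(r[0] * yi for r, yi in zip(X, y))
    t1 = sum(r[1] * yi for r, yi in zip(X, y))
    return (a, b, d), (t0, t1)

def ridge_2d(X, y, k):
    """beta(k) = (X'X + kI)^{-1} X'y; k = 0 recovers OLS."""
    (a, b, d), (t0, t1) = xtx_xty(X, y)
    a, d = a + k, d + k                      # add k to the diagonal
    det = a * d - b * b
    return ((d * t0 - b * t1) / det, (a * t1 - b * t0) / det)
```

On two nearly collinear predictors the OLS solution has wildly inflated coefficients of opposite sign, while a small k > 0 shrinks the coefficient norm sharply, which is the EMSE advantage the abstract evaluates.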
A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others focused on how to integrate homogeneous (or only mildly heterogeneous) sets of data. More recently, as data are becoming more accessible, the question of whether data sets from different sources should be integrated is becoming more relevant. The current literature treats this as a question with only two answers: integrate or don't. Here we take a different approach, motivated by information-sharing principles coming from the shrinkage estimation literature. In particular, we deviate from the do/don't perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend, for example, on the informativeness of the different data sources as measured by Fisher information. In the context of generalized linear models, this more nuanced data integration framework leads to relatively simple parameter estimates and valid tests/confidence intervals. Moreover, we demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation compared to other binary data integration schemes.
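For a single scalar parameter, the dial idea can be sketched as an information-weighted combination in which α = 0 ignores the second source and α = 1 fully pools the two. The specific formula below is an illustrative inverse-variance analogue, not the paper's estimator:

```python
def dial_estimate(theta1, info1, theta2, info2, alpha):
    """Combine two estimates; alpha in [0, 1] dials how much source 2 is used.

    info1/info2 play the role of Fisher information (illustrative only):
    alpha = 0 returns theta1 unchanged; alpha = 1 is full inverse-variance
    pooling; intermediate alpha interpolates between the two regimes.
    """
    return (info1 * theta1 + alpha * info2 * theta2) / (info1 + alpha * info2)
```

Note that the more informative source (larger `info`) dominates the combination for any fixed α, mirroring the abstract's point that the right dial setting depends on the relative informativeness of the sources.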
Sample multiplexing enables pooled analysis during single-cell RNA sequencing workflows, thereby increasing throughput and reducing batch effects. A challenge for all multiplexing techniques is to link sample-specific barcodes with cell-specific barcodes, then demultiplex sample identity post-sequencing. However, existing demultiplexing tools fail under many real-world conditions where barcode cross-contamination is an issue. We therefore developed deMULTIplex2, an algorithm inspired by a mechanistic model of barcode cross-contamination. deMULTIplex2 employs generalized linear models and expectation-maximization to probabilistically determine the sample identity of each cell. Benchmarking reveals superior performance across various experimental conditions, particularly on large or noisy datasets with unbalanced sample compositions.
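The expectation-maximization alternation behind such demultiplexing can be sketched generically. The toy below fits a two-component Gaussian mixture to (say) log-transformed barcode counts and returns per-cell responsibilities; deMULTIplex2's actual model (GLM-based and contamination-aware) is more elaborate, and every name and number here is illustrative.

```python
import math

def em_two_gaussians(data, iters=50):
    """Toy EM for a two-component Gaussian mixture (illustrative only)."""
    xs = sorted(data)
    mu = [xs[len(xs) // 4], xs[3 * len(xs) // 4]]   # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability that each point came from each component
        resp = []
        for x in data:
            dens = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = dens[0] + dens[1]
            resp.append([d / s for d in dens] if s > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, variances from responsibilities
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-9)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return mu, var, pi, resp
```

The returned responsibilities play the role of probabilistic sample-identity calls: a cell is assigned to the component with the larger responsibility, or flagged as ambiguous when the two are close.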
This paper presents a nonparametric bootstrap method for estimating the proportions of inliers and outliers in robust regression models. Our approach is based on the concept of stability, providing robustness against distributional assumptions and eliminating the need for pre-specified confidence levels. Through numerical experiments, we demonstrate that this method yields more accurate and stable estimates than existing alternatives. Additionally, the generated instability paths offer a valuable graphical tool for understanding the inlier and outlier distributions within the data. The method naturally extends to generalized linear models, where we find that variance-stabilizing transformations produce residuals that are well-suited for outlier detection. Applications to two real-world datasets further illustrate the practical utility of our approach in identifying outliers.
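A minimal sketch of the bootstrap idea, assuming a location-only model with MAD-scaled residuals rather than the paper's regression setting or its stability criterion; the threshold c and all data are illustrative:

```python
import random
import statistics

def outlier_fraction(data, c=3.0):
    """Share of points whose MAD-scaled residual from the median exceeds c."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data) or 1e-9
    # 1.4826 rescales the MAD to be consistent with a normal std deviation
    return sum(abs(x - med) > c * 1.4826 * mad for x in data) / len(data)

def bootstrap_fraction(data, n_boot=200, c=3.0, seed=0):
    """Nonparametric bootstrap distribution of the outlier proportion."""
    rng = random.Random(seed)
    fracs = [outlier_fraction([rng.choice(data) for _ in data], c)
             for _ in range(n_boot)]
    return statistics.mean(fracs), statistics.stdev(fracs)
```

Plotting the bootstrap fractions across a range of thresholds gives a crude analogue of the instability paths the abstract describes: proportions that stay flat across resamples indicate a stable inlier/outlier split.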
We propose two novel one-sample Mendelian randomization (MR) approaches to causal inference from count-type health outcomes, tailored to both equidispersion and overdispersion conditions. Selecting valid single-nucleotide polymorphisms (SNPs) as instrumental variables (IVs) poses a key challenge for MR approaches, as it requires meeting the necessary IV assumptions. To bolster the proposed approaches by addressing violations of IV assumptions, we incorporate a process for removing invalid SNPs that violate the assumptions. In simulations, our proposed approaches demonstrate robustness to the violations, delivering valid estimates as well as interpretable type-I error rates and statistical power. This increases the practical applicability of the models. We applied the proposed approaches to evaluate the causal effect of fetal hemoglobin (HbF) on vaso-occlusive crisis and acute chest syndrome (ACS) events in patients with sickle cell disease (SCD) and revealed the causal relation between HbF and ACS events in these patients. We also developed a user-friendly Shiny web application to facilitate researchers' exploration of causal relations.
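At its core, one-sample MR with a single valid SNP reduces to the Wald ratio cov(G, Y)/cov(G, X). The sketch below shows this for a continuous outcome (the paper's methods handle count outcomes with equi/overdispersion, which this toy ignores); the data are constructed so a hidden confounder biases naive regression of Y on X but not the IV estimate.

```python
def wald_ratio(g, x, y):
    """IV (Wald ratio) estimate of the effect of x on y: cov(g,y)/cov(g,x)."""
    n = len(g)
    mg, mx, my = sum(g) / n, sum(x) / n, sum(y) / n
    cov_gy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    cov_gx = sum((gi - mg) * (xi - mx) for gi, xi in zip(g, x))
    return cov_gy / cov_gx
```

In the test data, genotype g takes values in {0, 1, 2}, a confounder u affects both exposure and outcome, and the true causal effect of x on y is 2.0; because u is orthogonal to g, the Wald ratio recovers the causal effect exactly.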
Model selection techniques have existed for many years; however, to date, simple, clear and effective methods of visualising the model building process are scarce. This article describes graphical methods that assist in the selection of models and the comparison of many different selection criteria. Specifically, we describe, for logistic regression, how to visualise measures of description loss and of model complexity to address the model selection dilemma. We advocate the use of the bootstrap to assess the stability of selected models and to enhance our graphical tools. We demonstrate which variables are important using variable inclusion plots and show that these can be invaluable for the model building process. We show with two case studies how these proposed tools are useful for learning more about important variables in the data and how they can assist the understanding of the model building process. Copyright (c) 2013 John Wiley & Sons, Ltd.
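The bootstrap variable inclusion idea can be sketched as: refit a selection rule on many resamples and record how often each variable is chosen. The stand-in rule below (absolute correlation with the response above a cutoff, rather than a full logistic-regression selection criterion) and all names and parameters are illustrative.

```python
import random

def select_vars(X, y, cutoff=0.3):
    """Stand-in selection rule: keep variables whose |corr with y| > cutoff."""
    n, p = len(y), len(X[0])
    my = sum(y) / n
    sy = (sum((yi - my) ** 2 for yi in y)) ** 0.5 or 1e-12
    chosen = []
    for j in range(p):
        col = [row[j] for row in X]
        mj = sum(col) / n
        sj = (sum((c - mj) ** 2 for c in col)) ** 0.5 or 1e-12
        r = sum((c - mj) * (yi - my) for c, yi in zip(col, y)) / (sj * sy)
        if abs(r) > cutoff:
            chosen.append(j)
    return chosen

def inclusion_proportions(X, y, n_boot=200, seed=1):
    """How often each variable is selected across bootstrap resamples."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    for _ in range(n_boot):
        idx = [rng.randrange(len(y)) for _ in y]
        for j in select_vars([X[i] for i in idx], [y[i] for i in idx]):
            counts[j] += 1
    return [c / n_boot for c in counts]
```

Plotting these proportions against variable names gives a crude variable inclusion plot: variables selected in nearly all resamples are stable, while those selected sporadically owe their inclusion to sampling noise.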
Recursive partitioning algorithms separate a feature space into a set of disjoint rectangles. Then, usually, a constant is fitted in every partition. While this is a simple and intuitive approach, it may still lack interpretability as to how a specific relationship between dependent and independent variables may look. Or a certain model may be assumed or of interest, and there are a number of candidate variables that may non-linearly give rise to different model parameter values. We present an approach that combines generalized linear models (GLMs) with recursive partitioning, offering enhanced interpretability over classical trees as well as an explorative way to assess a candidate variable's influence on a parametric model. This method conducts recursive partitioning of a GLM by (1) fitting the model to the data set, (2) testing for parameter instability over a set of partitioning variables, and (3) splitting the data set with respect to the variable associated with the highest instability. The outcome is a tree in which each terminal node is associated with a GLM. We show the method's versatility and its suitability for gaining additional insight into the relationship between dependent and independent variables with two examples, modelling voting behaviour and a failure model for debt amortization, and compare it to alternative approaches.
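A one-split sketch of the partitioning loop above, simplified to a Gaussian (identity-link) linear model in each node and an exhaustive residual-sum-of-squares search in place of the parameter-instability test; variable names and data are illustrative:

```python
# One level of model-based partitioning: fit y = a + b*x in each candidate
# half, choose the cut on partitioning variable z that most reduces the SSE.
def fit_line(xs, ys):
    """OLS fit of y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) or 1e-12
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def best_split(xs, ys, zs, min_leaf=3):
    """Return (cut, sse): best split z <= cut, or (None, base_sse) if none."""
    base = fit_line(xs, ys)[2]
    best = (None, base)
    for cut in sorted(set(zs))[:-1]:
        left = [i for i, z in enumerate(zs) if z <= cut]
        right = [i for i in range(len(zs)) if i not in left]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        sse = (fit_line([xs[i] for i in left], [ys[i] for i in left])[2]
               + fit_line([xs[i] for i in right], [ys[i] for i in right])[2])
        if sse < best[1]:
            best = (cut, sse)
    return best
```

Applied recursively to each half, this yields a tree whose terminal nodes each carry their own fitted model, which is the structure the abstract describes; the real method's instability tests avoid the selection bias of a raw SSE search.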
At-haulback mortality of blue shark (Prionace glauca) captured by the Portuguese pelagic longline fishery targeting swordfish in the Atlantic was modeled. Data were collected by onboard fishery observers who monitored 762 fishing sets (1 005 486 hooks) and recorded information on 26 383 blue sharks. The size distribution of the sample ranged from 40 to 305 cm fork length, with 13.3% of the specimens captured dead at-haulback. Data modeling was carried out with generalized linear models (GLMs) and generalized estimating equations (GEEs), given the fishery-dependent source of the data. The explanatory variables influencing blue shark mortality rates were year, specimen size, fishing location, sex, season and branch line material. Model diagnostics and validation were performed with residual analysis, the Hosmer-Lemeshow test, a receiver operating characteristic (ROC) curve, and a 10-fold cross-validation procedure. One important conclusion of this study was that blue shark sizes are important predictors for estimating at-haulback mortality rates, with the probability of dying at-haulback decreasing with increasing specimen size. The effects in terms of odds ratios are non-linear: the odds of surviving change most rapidly for smaller sharks (as sharks grow in size) and then stabilize as sharks reach larger sizes. The models presented in this study seem valid for predicting blue shark at-haulback mortality in this fishery, and can be used by fisheries management organizations for assessing the efficacy of management and conservation initiatives for the species in the future. (C) 2013 Elsevier B.V. All rights reserved.
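The core mortality model here is a binomial GLM, logit P(dead at haulback) = β₀ + β₁·size; the study additionally uses GEEs to handle within-set correlation, which this sketch omits. A hedged, self-contained Newton-Raphson fit on made-up data (sizes in metres, outcomes invented):

```python
import math

def fit_logistic(sizes, died, iters=60):
    """Damped Newton-Raphson MLE for logit P(died) = a + b * size."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(sizes, died):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            ga += y - p              # gradient of the log-likelihood
            gb += (y - p) * x
            haa += w                 # (negative) Hessian entries
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        step = max(1.0, abs(da), abs(db))   # damp large steps for stability
        a += da / step
        b += db / step
    return a, b
```

A negative fitted slope b corresponds to the abstract's finding: the probability of dying at-haulback decreases with size, and the per-unit odds ratio exp(b) implies odds of survival that change fastest for small sharks.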
An effective methodology for dealing with data extracted from clinical surveys on heart failure linked to the Public Health Database is proposed. A model for recurrent events is used for modelling the occurrence of hospital readmissions in time, thus deriving a suitable way to compute individual cumulative hazard functions. Estimated cumulative hazard trajectories are then treated as functional data, and they are used as covariates along with clinical survey data within the framework of generalized linear models with functional covariates.
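One standard way to obtain the individual cumulative hazard trajectories mentioned above is the Nelson-Aalen estimator, H(t) = Σ_{t_i ≤ t} d_i/n_i; evaluating H on a common time grid then yields a functional covariate. A hedged toy version (no censoring, which real readmission data would require handling):

```python
def nelson_aalen(event_times, n_at_risk_start):
    """Cumulative hazard at each distinct event time (toy: no censoring)."""
    out = []
    h = 0.0
    at_risk = n_at_risk_start
    for t in sorted(set(event_times)):
        d = event_times.count(t)      # events observed at time t
        h += d / at_risk              # Nelson-Aalen increment d_i / n_i
        out.append((t, h))
        at_risk -= d                  # those who had the event leave the risk set
    return out
```

Each patient's estimated trajectory, sampled on a shared grid, can then enter a GLM with functional covariates alongside the scalar clinical-survey variables.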