Penalized-likelihood variable selection methods rely heavily on fixed thresholding functions to carry out static variable selection, and as a result, weakly significant variables (i.e., variables that are deemed important but whose regression coefficients are small in absolute value) are often excluded entirely. In addition, the tuning parameters of these methods are usually selected by cross-validation (CV), which only uses the average information of partial data. In this article, based on an MM algorithm, we propose a dynamic threshold function for variable selection that uses the information of the complete dataset and can retain important variables with weak signals. The methodology is applied to panel data with random effects, and a two-step estimation procedure is proposed. We show that the new majorizing function has the same convergence property as the original one, and the performance of the two functions is compared numerically. Numerical studies show that when error distributions are heavy-tailed or skewed, our methods work better than existing variable selection techniques, especially in keeping important variables with weak signals.
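To make the MM-with-adaptive-threshold idea concrete, here is a minimal sketch in which a concave (SCAD) penalty is majorized by a weighted-L1 surrogate at each iteration, so the per-coordinate threshold changes with the current iterate rather than staying fixed. This illustrates the general MM/local-linear-approximation mechanism only; it is not the dynamic threshold function or the panel-data procedure proposed in the article, and all names are hypothetical.

```python
# Sketch: MM (local linear approximation) for SCAD-penalized least squares.
# At each iteration the concave penalty is majorized by a weighted L1 term,
# so the soft-threshold applied to each coordinate adapts to the current beta.
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty, used as the adaptive per-coordinate weight."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def mm_adaptive_threshold(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the loss gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)               # gradient of the least-squares loss
        z = beta - step * grad
        thresh = step * scad_derivative(beta, lam)  # dynamic, iterate-dependent threshold
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta
```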
Biclustering is an important tool in exploratory statistical analysis that can be used to detect latent row and column groups with different response patterns. However, few studies incorporate covariate data directly into their biclustering models to explain these variations. A novel biclustering framework that considers both stochastic block structures and covariate effects is proposed to address this modeling problem. Fast approximate estimation algorithms are also developed to deal with a large number of latent variables and covariate coefficients. These algorithms are derived from the variational generalized expectation-maximization (EM) framework, where the goal is to increase, rather than maximize, the likelihood lower bound in both the E and M steps. The utility of the proposed biclustering framework is demonstrated through two block-modeling applications in model-based collaborative filtering and microarray analysis.
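The "increase, rather than maximize" step can be illustrated with a toy generalized-EM update. The snippet below uses a simple two-component univariate Gaussian mixture (not the proposed biclustering model) and takes a single gradient step on the expected complete-data log-likelihood in place of a full M-step; all names are illustrative.

```python
# Sketch: one generalized-EM iteration for a toy two-component Gaussian mixture.
# The M-step only increases the lower bound (single gradient step on the means)
# instead of maximizing it, which is the essence of generalized EM.
import numpy as np

def gem_step(x, means, weights, sigma=1.0, lr=0.1):
    # E-step: responsibilities under the current parameters (exact here)
    dens = np.exp(-0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2)
    resp = weights * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # Generalized M-step: a single gradient-ascent step on the expected
    # complete-data log-likelihood with respect to the component means
    grad = (resp * (x[:, None] - means[None, :])).sum(axis=0) / sigma ** 2
    means = means + lr * grad / len(x)
    weights = resp.mean(axis=0)                   # mixing proportions have a closed form
    return means, weights
```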
A class of random graph models is considered, combining features of exponential-family models and latent structure models, with the goal of retaining the strengths of both while reducing the weaknesses of each. An open problem is how to estimate such models from large networks. A novel approach to large-scale estimation is proposed, taking advantage of the local structure of such models for the purpose of local computing. The main idea is that random graphs with local dependence can be decomposed into subgraphs, which enables parallel computing on subgraphs and suggests a two-step estimation approach. The first step estimates the local structure underlying random graphs. The second step estimates parameters given the estimated local structure of random graphs. Both steps can be implemented in parallel, which enables large-scale estimation. The advantages of the two-step estimation approach are demonstrated by simulation studies with up to 10,000 nodes and an application to a large Amazon product recommendation network with more than 10,000 products.
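A structural sketch of the two-step, subgraph-parallel idea follows, assuming an undirected (symmetric) adjacency matrix. A crude spectral partition stands in for the first (local-structure) step, and a placeholder per-block estimator is fitted to each induced subgraph in parallel; neither step is the estimator proposed in the article.

```python
# Sketch: two-step estimation with parallel computing on induced subgraphs.
# Step 1 estimates a node partition; step 2 fits each block independently,
# which is what makes the computation embarrassingly parallel.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def fit_block(adj_block):
    """Placeholder per-subgraph estimator: within-block edge probability."""
    n = adj_block.shape[0]
    if n < 2:
        return 0.0
    return adj_block.sum() / (n * (n - 1))

def two_step_estimate(adj, n_blocks):
    # Step 1: crude spectral partition (stand-in for local-structure estimation)
    from sklearn.cluster import KMeans
    eigvals, eigvecs = np.linalg.eigh(adj.astype(float))
    labels = KMeans(n_clusters=n_blocks, n_init=10).fit_predict(eigvecs[:, -n_blocks:])
    # Step 2: estimate parameters on each induced subgraph in parallel
    blocks = [adj[np.ix_(labels == k, labels == k)] for k in range(n_blocks)]
    with ProcessPoolExecutor() as pool:
        params = list(pool.map(fit_block, blocks))
    return labels, params
```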
We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data sets than those seen elsewhere in the literature. The more flexible framework is achieved by introducing novel parameterizations of the model, giving varying degrees of parsimony, and using exponential-family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms are based on variational generalized EM algorithms, where the E-steps are augmented by a minorization-maximization (MM) idea. The bootstrapped standard error estimates are based on an efficient Monte Carlo network simulation idea. Last, we demonstrate the usefulness of the model-based clustering framework by applying it to a discrete-valued network with more than 131,000 nodes and 17 billion edge variables.
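The parametric-bootstrap standard errors can be sketched generically as follows; `simulate_network` and `fit_model` are assumed user-supplied routines, not functions from the paper's software.

```python
# Sketch: parametric-bootstrap standard errors for a fitted network model.
# Networks are simulated from the fitted parameters, the model is refit on each
# replicate, and the spread of the refitted estimates gives the standard errors.
import numpy as np

def bootstrap_se(theta_hat, simulate_network, fit_model, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        y_star = simulate_network(theta_hat, rng)   # Monte Carlo network draw
        estimates.append(fit_model(y_star))         # refit on the simulated network
    return np.std(np.asarray(estimates), axis=0, ddof=1)
```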
Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror descent, iterative thresholding, DC programming, and many others as particular instances. The recharacterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems with a composite objective, under some regularity conditions, the obtained estimators, as fixed points of the surrogate, though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum-based accelerations without assuming convexity or smoothness by carefully controlling step-size and relaxation parameters.
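As an illustration of one of the instances named above, the sketch below implements mirror descent with the entropy Bregman function over the probability simplex; `grad_f` is an assumed user-supplied gradient oracle, and this is not the accelerated scheme studied in the paper.

```python
# Sketch: mirror descent with the entropy Bregman divergence on the simplex.
# Each update is an exponentiated-gradient (multiplicative-weights) step
# followed by renormalization, i.e. the Bregman projection onto the simplex.
import numpy as np

def mirror_descent_simplex(grad_f, dim, n_iter=200, step=0.1):
    x = np.full(dim, 1.0 / dim)            # start at the uniform distribution
    for _ in range(n_iter):
        x = x * np.exp(-step * grad_f(x))  # exponentiated-gradient update
        x /= x.sum()                       # project back onto the simplex
    return x
```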
Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and L1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC L1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
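To indicate what an L1-regularized Cox fit looks like at the level of a single update, here is a hedged PyTorch sketch (not the article's implementation) of one proximal-gradient step on the negative partial likelihood, ignoring ties; the same code runs on a CPU or a GPU simply by placing the tensors on the desired device.

```python
# Sketch: one proximal-gradient step for L1-regularized Cox regression.
# The risk sets are formed by sorting times in descending order, so the
# log-sum-exp over each risk set is a cumulative logsumexp (Breslow, no ties).
import torch

def neg_partial_loglik(beta, X, time, event):
    order = torch.argsort(time, descending=True)
    eta = X[order] @ beta
    log_risk = torch.logcumsumexp(eta, dim=0)       # log sum over each risk set
    d = event[order].bool()
    return -(eta[d] - log_risk[d]).sum()

def prox_step(beta, X, time, event, lam, lr=1e-3):
    beta = beta.clone().requires_grad_(True)
    loss = neg_partial_loglik(beta, X, time, event)
    loss.backward()
    with torch.no_grad():
        z = beta - lr * beta.grad                                        # gradient step
        return torch.sign(z) * torch.clamp(z.abs() - lr * lam, min=0.0)  # soft-threshold
```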
Paired binary data often appear in studies of subjects with two sites, such as eyes, ears, lungs, kidneys, and feet. Three popular models [i.e., Rosner's R model (Biometrics 38:105-114, 1982), Dallal's model (Biometrics 44:253-257, 1988), and Donner's model (Biometrics 45:605-661, 1989)] were proposed to fit such twin data by considering the intra-person correlation. However, Rosner's R model can only fit twin data with an increasing correlation coefficient, Dallal's model may incur the problem of over-fitting, while Donner's model can only fit twin data with a constant correlation. This paper proposes a new bivariate Bernoulli model with flexible beta kernel correlation (denoted by Bernoulli^2_bk) for fitting paired binary data with a wide range of group-specific disease probabilities. The correlation coefficient of the Bernoulli^2_bk model can be increasing, decreasing, unimodal, or convex with respect to the disease probability of one eye. To obtain the maximum likelihood estimates (MLEs) of the parameters, we develop a series of minorization-maximization (MM) algorithms by constructing four surrogate functions with closed-form expressions at each iteration. Simulation studies are conducted, and two real datasets are analyzed to illustrate the proposed model and methods.
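The ascent mechanism behind such MM algorithms can be written as a short generic skeleton; `surrogate_argmax` below is a hypothetical stand-in for the paper's closed-form surrogate maximizers, and the stopping rule exploits the monotone-ascent property of MM.

```python
# Sketch: a generic minorization-maximization (MM) loop. Each iteration
# maximizes, in closed form, a surrogate that minorizes the log-likelihood at
# the current iterate; MM guarantees the log-likelihood never decreases.
import numpy as np

def mm_estimate(theta0, loglik, surrogate_argmax, tol=1e-8, max_iter=1000):
    theta = np.asarray(theta0, dtype=float)
    ll = loglik(theta)
    for _ in range(max_iter):
        theta_new = surrogate_argmax(theta)   # closed-form maximizer of the surrogate
        ll_new = loglik(theta_new)
        if ll_new - ll < tol:                 # ascent property: ll_new >= ll
            break
        theta, ll = theta_new, ll_new
    return theta
```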
The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.
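A compact sketch of an EM-like algorithm of this kind, with weighted kernel density estimates refreshed after each posterior update under conditional independence, is given below; it is written in the spirit of the nonparametric-EM literature and is not the exact algorithm or bandwidth selection rule described in the article.

```python
# Sketch: EM-like algorithm for a nonparametric multivariate mixture under
# conditional independence. Component densities are weighted kernel density
# estimates (one per coordinate), refreshed after each posterior update.
import numpy as np

def gaussian_kde_at(x_eval, x_data, w, h):
    """Weighted Gaussian KDE (1-d) evaluated at x_eval."""
    z = (x_eval[:, None] - x_data[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (k * w[None, :]).sum(axis=1) / (w.sum() * h)

def np_mixture_em(X, n_comp, h=0.5, n_iter=50, seed=0):
    n, p = X.shape
    rng = np.random.default_rng(seed)
    post = rng.dirichlet(np.ones(n_comp), size=n)      # soft initialization
    for _ in range(n_iter):
        lam = post.mean(axis=0)                        # mixing proportions
        dens = np.ones((n, n_comp))
        for j in range(n_comp):
            for d in range(p):                         # conditional independence:
                dens[:, j] *= gaussian_kde_at(X[:, d], X[:, d], post[:, j], h)
        post = lam * dens                              # posterior (E-type) update
        post /= post.sum(axis=1, keepdims=True)
    return lam, post
```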