Penalized-likelihood variable selection methods rely heavily on fixed thresholding functions to carry out static variable selection, and as a result, weakly significant variables (i.e., variables that are deemed important but whose regression coefficients are small in absolute value) are often excluded entirely. In addition, the tuning parameters of these methods are usually selected by cross-validation (CV), which only uses the average information of partial data. In this article, based on an MM algorithm, we propose a dynamic threshold function for variable selection that uses the information of the complete dataset and can retain important variables with weak signals. The methodology is applied to panel data with random effects, and a two-step estimation procedure is proposed. We show that the new majorizing function has the same convergence property as the original one, and the performance of the two functions is compared numerically. Numerical studies show that when error distributions are heavy-tailed or skewed, our methods work better than existing variable selection techniques, especially in keeping important variables with weak signals.
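To make the MM-with-adaptive-threshold idea concrete, here is a minimal sketch in which a concave (SCAD) penalty is majorized by a weighted-L1 surrogate at each iteration, so the per-coordinate threshold changes with the current iterate rather than staying fixed. This illustrates the general MM/local-linear-approximation mechanism only; it is not the dynamic threshold function or the panel-data procedure proposed in the article, and all names are hypothetical.

```python
# Sketch: MM (local linear approximation) for SCAD-penalized least squares.
# At each iteration the concave penalty is majorized by a weighted L1 term,
# so the soft-threshold applied to each coordinate adapts to the current beta.
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty, used as the adaptive per-coordinate weight."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def mm_adaptive_threshold(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the loss gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)               # gradient of the least-squares loss
        z = beta - step * grad
        thresh = step * scad_derivative(beta, lam)  # dynamic, iterate-dependent threshold
        beta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return beta
```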
Biclustering is an important tool in exploratory statistical analysis that can be used to detect latent row and column groups with different response patterns. However, few studies incorporate covariate data directly into their biclustering models to explain these variations. A novel biclustering framework that considers both stochastic block structures and covariate effects is proposed to address this modeling problem. Fast approximate estimation algorithms are also developed to deal with a large number of latent variables and covariate coefficients. These algorithms are derived from the variational generalized expectation-maximization (EM) framework, where the goal is to increase, rather than maximize, the likelihood lower bound in both the E and M steps. The utility of the proposed biclustering framework is demonstrated through two block-modeling applications in model-based collaborative filtering and microarray analysis.
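The "increase, rather than maximize" step can be illustrated with a toy generalized-EM update. The snippet below uses a simple two-component univariate Gaussian mixture (not the proposed biclustering model) and takes a single gradient step on the expected complete-data log-likelihood in place of a full M-step; all names are illustrative.

```python
# Sketch: one generalized-EM iteration for a toy two-component Gaussian mixture.
# The M-step only increases the lower bound (single gradient step on the means)
# instead of maximizing it, which is the essence of generalized EM.
import numpy as np

def gem_step(x, means, weights, sigma=1.0, lr=0.1):
    # E-step: responsibilities under the current parameters (exact here)
    dens = np.exp(-0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2)
    resp = weights * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # Generalized M-step: a single gradient-ascent step on the expected
    # complete-data log-likelihood with respect to the component means
    grad = (resp * (x[:, None] - means[None, :])).sum(axis=0) / sigma ** 2
    means = means + lr * grad / len(x)
    weights = resp.mean(axis=0)                   # mixing proportions have a closed form
    return means, weights
```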
A class of random graph models is considered, combining features of exponential-family models and latent structure models, with the goal of retaining the strengths of both while reducing the weaknesses of each. An open problem is how to estimate such models from large networks. A novel approach to large-scale estimation is proposed, taking advantage of the local structure of such models for the purpose of local computing. The main idea is that random graphs with local dependence can be decomposed into subgraphs, which enables parallel computing on subgraphs and suggests a two-step estimation approach. The first step estimates the local structure underlying random graphs. The second step estimates parameters given the estimated local structure of random graphs. Both steps can be implemented in parallel, which enables large-scale estimation. The advantages of the two-step estimation approach are demonstrated by simulation studies with up to 10,000 nodes and an application to a large Amazon product recommendation network with more than 10,000 products.
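A structural sketch of the two-step, subgraph-parallel idea follows, assuming an undirected (symmetric) adjacency matrix. A crude spectral partition stands in for the first (local-structure) step, and a placeholder per-block estimator is fitted to each induced subgraph in parallel; neither step is the estimator proposed in the article.

```python
# Sketch: two-step estimation with parallel computing on induced subgraphs.
# Step 1 estimates a node partition; step 2 fits each block independently,
# which is what makes the computation embarrassingly parallel.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def fit_block(adj_block):
    """Placeholder per-subgraph estimator: within-block edge probability."""
    n = adj_block.shape[0]
    if n < 2:
        return 0.0
    return adj_block.sum() / (n * (n - 1))

def two_step_estimate(adj, n_blocks):
    # Step 1: crude spectral partition (stand-in for local-structure estimation)
    from sklearn.cluster import KMeans
    eigvals, eigvecs = np.linalg.eigh(adj.astype(float))
    labels = KMeans(n_clusters=n_blocks, n_init=10).fit_predict(eigvecs[:, -n_blocks:])
    # Step 2: estimate parameters on each induced subgraph in parallel
    blocks = [adj[np.ix_(labels == k, labels == k)] for k in range(n_blocks)]
    with ProcessPoolExecutor() as pool:
        params = list(pool.map(fit_block, blocks))
    return labels, params
```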
We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data sets than those seen elsewhere in the literature. The more flexible framework is achieved by introducing novel parameterizations of the model, giving varying degrees of parsimony, and using exponential-family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms are based on variational generalized EM algorithms, where the E-steps are augmented by a minorization-maximization (MM) idea. The bootstrapped standard error estimates are based on an efficient Monte Carlo network simulation idea. Last, we demonstrate the usefulness of the model-based clustering framework by applying it to a discrete-valued network with more than 131,000 nodes and 17 billion edge variables.
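The parametric-bootstrap standard errors can be sketched generically as follows; `simulate_network` and `fit_model` are assumed user-supplied routines, not functions from the paper's software.

```python
# Sketch: parametric-bootstrap standard errors for a fitted network model.
# Networks are simulated from the fitted parameters, the model is refit on each
# replicate, and the spread of the refitted estimates gives the standard errors.
import numpy as np

def bootstrap_se(theta_hat, simulate_network, fit_model, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        y_star = simulate_network(theta_hat, rng)   # Monte Carlo network draw
        estimates.append(fit_model(y_star))         # refit on the simulated network
    return np.std(np.asarray(estimates), axis=0, ddof=1)
```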
Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror descent, iterative thresholding, DC programming, and many others as particular instances. The recharacterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems with a composite objective, under some regularity conditions, the obtained estimators, as fixed points of the surrogate, though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum-based accelerations without assuming convexity or smoothness by carefully controlling step-size and relaxation parameters.
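As an illustration of one of the instances named above, the sketch below implements mirror descent with the entropy Bregman function over the probability simplex; `grad_f` is an assumed user-supplied gradient oracle, and this is not the accelerated scheme studied in the paper.

```python
# Sketch: mirror descent with the entropy Bregman divergence on the simplex.
# Each update is an exponentiated-gradient (multiplicative-weights) step
# followed by renormalization, i.e. the Bregman projection onto the simplex.
import numpy as np

def mirror_descent_simplex(grad_f, dim, n_iter=200, step=0.1):
    x = np.full(dim, 1.0 / dim)            # start at the uniform distribution
    for _ in range(n_iter):
        x = x * np.exp(-step * grad_f(x))  # exponentiated-gradient update
        x /= x.sum()                       # project back onto the simplex
    return x
```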
Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere, from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and L1-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type-2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC L1-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
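To indicate what an L1-regularized Cox fit looks like at the level of a single update, here is a hedged PyTorch sketch (not the article's implementation) of one proximal-gradient step on the negative partial likelihood, ignoring ties; the same code runs on a CPU or a GPU simply by placing the tensors on the desired device.

```python
# Sketch: one proximal-gradient step for L1-regularized Cox regression.
# The risk sets are formed by sorting times in descending order, so the
# log-sum-exp over each risk set is a cumulative logsumexp (Breslow, no ties).
import torch

def neg_partial_loglik(beta, X, time, event):
    order = torch.argsort(time, descending=True)
    eta = X[order] @ beta
    log_risk = torch.logcumsumexp(eta, dim=0)       # log sum over each risk set
    d = event[order].bool()
    return -(eta[d] - log_risk[d]).sum()

def prox_step(beta, X, time, event, lam, lr=1e-3):
    beta = beta.clone().requires_grad_(True)
    loss = neg_partial_loglik(beta, X, time, event)
    loss.backward()
    with torch.no_grad():
        z = beta - lr * beta.grad                                        # gradient step
        return torch.sign(z) * torch.clamp(z.abs() - lr * lam, min=0.0)  # soft-threshold
```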
Paired binary data often appear in studies of subjects with two sites, such as eyes, ears, lungs, kidneys, and feet. Three popular models [i.e., Rosner's R model (Biometrics 38:105-114, 1982), Dallal's model (Biometrics 44:253-257, 1988), and Donner's model (Biometrics 45:605-661, 1989)] were proposed to fit such twin data by considering the intra-person correlation. However, Rosner's R model can only fit twin data with an increasing correlation coefficient, Dallal's model may incur the problem of over-fitting, while Donner's model can only fit twin data with a constant correlation. This paper proposes a new bivariate Bernoulli model with flexible beta kernel correlation (denoted by Bernoulli^2_bk) for fitting paired binary data with a wide range of group-specific disease probabilities. The correlation coefficient of the Bernoulli^2_bk model can be increasing, decreasing, unimodal, or convex with respect to the disease probability of one eye. To obtain the maximum likelihood estimates (MLEs) of the parameters, we develop a series of minorization-maximization (MM) algorithms by constructing four surrogate functions with closed-form expressions at each iteration. Simulation studies are conducted, and two real datasets are analyzed to illustrate the proposed model and methods.
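The ascent mechanism behind such MM algorithms can be written as a short generic skeleton; `surrogate_argmax` below is a hypothetical stand-in for the paper's closed-form surrogate maximizers, and the stopping rule exploits the monotone-ascent property of MM.

```python
# Sketch: a generic minorization-maximization (MM) loop. Each iteration
# maximizes, in closed form, a surrogate that minorizes the log-likelihood at
# the current iterate; MM guarantees the log-likelihood never decreases.
import numpy as np

def mm_estimate(theta0, loglik, surrogate_argmax, tol=1e-8, max_iter=1000):
    theta = np.asarray(theta0, dtype=float)
    ll = loglik(theta)
    for _ in range(max_iter):
        theta_new = surrogate_argmax(theta)   # closed-form maximizer of the surrogate
        ll_new = loglik(theta_new)
        if ll_new - ll < tol:                 # ascent property: ll_new >= ll
            break
        theta, ll = theta_new, ll_new
    return theta
```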
The conditional independence assumption for nonparametric multivariate finite mixture models, a weaker form of the well-known conditional independence assumption for random effects models for longitudinal data, is the subject of an increasing number of theoretical and algorithmic developments in the statistical literature. After presenting a survey of this literature, including an in-depth discussion of the all-important identifiability results, this article describes and extends an algorithm for estimation of the parameters in these models. The algorithm works for any number of components in three or more dimensions. It possesses a descent property and can be easily adapted to situations where the data are grouped in blocks of conditionally independent variables. We discuss how to adapt this algorithm to various location-scale models that link component densities, and we even adapt it to a particular class of univariate mixture problems in which the components are assumed symmetric. We give a bandwidth selection procedure for our algorithm. Finally, we demonstrate the effectiveness of our algorithm using a simulation study and two psychometric datasets.
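A compact sketch of an EM-like algorithm of this kind, with weighted kernel density estimates refreshed after each posterior update under conditional independence, is given below; it is written in the spirit of the nonparametric-EM literature and is not the exact algorithm or bandwidth selection rule described in the article.

```python
# Sketch: EM-like algorithm for a nonparametric multivariate mixture under
# conditional independence. Component densities are weighted kernel density
# estimates (one per coordinate), refreshed after each posterior update.
import numpy as np

def gaussian_kde_at(x_eval, x_data, w, h):
    """Weighted Gaussian KDE (1-d) evaluated at x_eval."""
    z = (x_eval[:, None] - x_data[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return (k * w[None, :]).sum(axis=1) / (w.sum() * h)

def np_mixture_em(X, n_comp, h=0.5, n_iter=50, seed=0):
    n, p = X.shape
    rng = np.random.default_rng(seed)
    post = rng.dirichlet(np.ones(n_comp), size=n)      # soft initialization
    for _ in range(n_iter):
        lam = post.mean(axis=0)                        # mixing proportions
        dens = np.ones((n, n_comp))
        for j in range(n_comp):
            for d in range(p):                         # conditional independence:
                dens[:, j] *= gaussian_kde_at(X[:, d], X[:, d], post[:, j], h)
        post = lam * dens                              # posterior (E-type) update
        post /= post.sum(axis=1, keepdims=True)
    return lam, post
```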