In this paper, we focus on feature extraction and variable selection for massive data that are divided and stored across different linked computers. Specifically, we study distributed model selection with the Smoothly Clipped Absolute Deviation (SCAD) penalty. Based on the Alternating Direction Method of Multipliers (ADMM) algorithm, we propose a distributed SCAD algorithm and prove its convergence. The variable selection results of the distributed approach are the same as those of the non-distributed approach. Numerical studies show that our method is both effective and efficient and performs well in distributed data analysis.
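For readers unfamiliar with the SCAD penalty named in this abstract, a minimal sketch of its standard textbook form may help. It assumes the usual shape parameter `a = 3.7` and shows the closed-form thresholding rule for the unit-weight scalar problem; the function names are illustrative, and this is not the distributed algorithm proposed in the paper.

```python
import math

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty value for one coefficient (assumes lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients are not penalized further.
    return lam * lam * (a + 1) / 2

def scad_threshold(z, lam, a=3.7):
    """Minimizer over theta of 0.5 * (z - theta)**2 + scad_penalty(theta, lam, a)."""
    t = abs(z)
    if t <= 2 * lam:
        # Soft thresholding for small inputs.
        return math.copysign(max(t - lam, 0.0), z)
    if t <= a * lam:
        # Linear interpolation between soft thresholding and the identity.
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    # Large inputs pass through unshrunk.
    return z
```

Inside an ADMM iteration, a rule of this kind would typically serve as the coordinate-wise update of the penalized variable, with the thresholds rescaled by the augmented Lagrangian parameter.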
Customized personal rate offering is of growing importance in the insurance industry. To achieve this, an important step is to identify subgroups of insureds from the corresponding heterogeneous claim frequency data. In this paper, a penalized Poisson regression approach for subgroup analysis in claim frequency data is proposed. Subjects are assumed to follow a zero-inflated Poisson regression model with group-specific intercepts, which capture group characteristics of claim frequency. A penalized likelihood function is derived and optimized to identify the group-specific intercepts and effects of individual covariates. To handle the challenges arising from the optimization of the penalized likelihood function, an alternating direction method of multipliers algorithm is developed and its convergence is established. Simulation studies and real applications are provided for illustrations. (C) 2019 Elsevier B.V. All rights reserved.
In this article, we study time-varying graphical models based on data measured over a temporal grid. Such models are motivated by the need to describe and understand evolving interactions among a set of random variables in many real applications, for instance, the study of how stock prices interact with each other and how such interactions change over time. We propose a new model, LOcal Group Graphical Lasso Estimation (loggle), under the assumption that the graph topology changes gradually over time. Specifically, loggle uses a novel local group-lasso type penalty to efficiently incorporate information from neighboring time points and to impose structural smoothness of the graphs. We implement an ADMM-based algorithm to fit the loggle model. This algorithm utilizes blockwise fast computation and pseudo-likelihood approximation to improve computational efficiency. An R package loggle has also been developed and is available at . We evaluate the performance of loggle by simulation experiments. We also apply loggle to S&P 500 stock price data and demonstrate that loggle is able to reveal the interacting relationships among stock prices and among industrial sectors in a time period that covers the recent global financial crisis. The supplemental materials for this article are available online.
This paper explores a novel high-dimensional sparse multiplicative model, which deals with data with positive responses, particularly in economic and biomedical research. The proposed regularized method is based on the least product relative error (LPRE) and can be applied with various penalties, including adaptive Lasso, SCAD, and MCP. An adjusted ADMM algorithm is adopted to obtain the estimators based on the LPRE loss. Additionally, we prove the consistency and derive the convergence rates of the estimator. To validate the effectiveness of the proposed method, we conduct extensive numerical studies and real data analyses on the well-known Boston housing and gold price datasets, which yield practical insights and applications.
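As a rough illustration of the LPRE criterion mentioned in this abstract, the sketch below assumes the commonly used form for the multiplicative model y_i = exp(x_i' beta) * eps_i, namely the sum over i of y_i * exp(-x_i' beta) + exp(x_i' beta) / y_i - 2. The function name and the unpenalized form are assumptions for illustration; the paper's exact objective, including its penalty term, is not given in the abstract.

```python
import math

def lpre_loss(X, y, beta):
    """Unpenalized LPRE loss for positive responses y (assumed standard form)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        # Linear predictor x_i' beta on the log scale.
        eta = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        # Ratio of the observed response to the fitted value exp(eta).
        ratio = y_i * math.exp(-eta)
        # ratio + 1/ratio - 2 is zero only when the fit is exact.
        total += ratio + 1.0 / ratio - 2.0
    return total
```

The loss treats over- and under-prediction symmetrically on the multiplicative scale: predicting twice the true value costs the same as predicting half of it.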
In this paper, we propose a new semiparametric method to simultaneously select important variables, identify the model structure, and estimate covariate effects in the additive AFT model, for which the dimension of covariates is allowed to increase with sample size. Instead of directly approximating the non-parametric effects as in most existing studies, we take a linear effect out to weaken the condition required for model identifiability. To compute the proposed estimates numerically, we use an alternating direction method of multipliers algorithm, which is easy to implement and achieves a fast convergence rate. Our method is proved to be selection consistent and to possess an asymptotic oracle property. The performance of the proposed method is illustrated through simulations and a real data analysis.
The emerging field of precision medicine is transforming statistical analysis from the classical paradigm of population-average treatment effects into that of personal treatment effects. This new scientific mission has called for adequate statistical methods to assess heterogeneous covariate effects in regression analysis. This paper focuses on a subgroup analysis that consists of two primary analytic tasks: identification of treatment effect subgroups and individual group memberships, and statistical inference on treatment effects by subgroup. We propose an approach that combines supervised clustering analysis via the alternating direction method of multipliers (ADMM) algorithm with statistical inference on subgroup effects via the expectation-maximization (EM) algorithm. Our proposed procedure, termed hybrid operation for subgroup analysis (HOSA), offers computational speed and numerical stability together with interpretability and reproducibility. We establish key theoretical properties for both the proposed clustering and inference procedures. Numerical illustrations include extensive simulation studies and analyses of motivating data from two randomized clinical trials to learn subgroup treatment effects.
In this article, a new method is proposed for clustering longitudinal curves. In the proposed method, clusters of mean functions are identified through a weighted concave pairwise fusion method. The EM algorithm and the alternating direction method of multipliers algorithm are combined to estimate the group structure, mean functions, and principal components simultaneously. The proposed method also allows prior neighborhood information to be incorporated, by adding pairwise weights to the pairwise penalties, to obtain more meaningful groups. In the simulation study, the performance of the proposed method is compared with that of some existing clustering methods in terms of accuracy in estimating the number of subgroups and the mean functions. The results suggest that ignoring the covariance structure has a great effect on the estimation of the number of groups and on estimation accuracy. The effect of including pairwise weights is also explored in a spatial lattice setting to take the spatial information into account. The results show that incorporating spatial weights improves performance. A real example is used to illustrate the proposed method.
Large-scale blackouts after natural disasters have attracted growing concern. When a blackout occurs in a power transmission system, the whole system is decomposed into several sub-systems and takes much longer to recover. As the capacity of renewable energy continues to increase, the difficulty of black start and restoration for a power transmission system also keeps growing. To this end, this paper proposes a partitioned black-start and restoration method for power transmission systems that uses renewable energy sources to support and improve the recovery process. We first establish the power system black-start and restoration model and linearize the power flow constraints and unit generation constraints. Considering that the power system can be decomposed into several sub-systems with high-capacity renewable energy sources, we propose a system partitioning method and a two-level parallel restoration model. The model realizes the independent restoration of each sub-system and is solved by the alternating direction method of multipliers. Case studies on a modified New England 39-bus power system and a practical power system verify that the proposed method offers high feasibility and effectiveness for power transmission systems with high renewable energy penetration when facing a large-scale blackout.
Hyperspectral images contain a huge amount of spatial and spectral information, so almost any type of Earth feature can be discriminated from any other feature. For this classification to be possible, however, the captured data must contain as little noise as possible. Unfortunately, noise is unavoidable, and most hyperspectral images need denoising before they can be processed for classification. In this paper, we present a new approach for denoising hyperspectral images based on Least Square Regularization. The hyperspectral data are then classified using a Basis Pursuit classifier, which is a constrained L1 minimization problem. To reduce the time required for classification, an Alternating Direction Method of Multipliers (ADMM) solver is used instead of the CVX (convex optimization) solver. The proposed method is compared with existing denoising methods such as Legendre-Fenchel (LF), wavelet thresholding, and Total Variation (TV). The proposed Least Square (LS) denoising method is observed to improve classification accuracy more than the other denoising techniques. Even with fewer training sets, the proposed denoising technique yields better classification accuracy, which shows least square denoising to be a powerful denoising technique.
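To make the basis pursuit step in this abstract more concrete, here is a minimal sketch of the textbook ADMM iteration for minimizing ||x||_1 subject to a linear equality constraint. As a simplifying assumption, it handles only a single constraint row, a'x = b, so that the projection step has a closed form without a linear algebra library. The function names and defaults are illustrative, and this is not the classifier or solver used in the paper.

```python
def soft_threshold(v, kappa):
    """Scalar soft-thresholding operator with threshold kappa >= 0."""
    return max(v - kappa, 0.0) - max(-v - kappa, 0.0)

def basis_pursuit_single_row(a, b, rho=1.0, iters=200):
    """ADMM sketch for: minimize ||x||_1 subject to sum(a[i] * x[i]) == b."""
    n = len(a)
    aa = sum(ai * ai for ai in a)
    z = [0.0] * n  # sparse copy of the variable
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: Euclidean projection of (z - u) onto the constraint set.
        v = [zi - ui for zi, ui in zip(z, u)]
        c = (b - sum(ai * vi for ai, vi in zip(a, v))) / aa
        x = [vi + ai * c for vi, ai in zip(v, a)]
        # z-update: soft thresholding, the proximal operator of the L1 norm.
        z = [soft_threshold(xi + ui, 1.0 / rho) for xi, ui in zip(x, u)]
        # Dual update on the consensus gap x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z
```

For example, `basis_pursuit_single_row([1.0, 2.0, 4.0], 4.0)` concentrates the solution on the coordinate with the largest coefficient. With several constraint rows, the projection in the x-update would require solving a linear system instead.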
In this paper, we propose a parallel algorithm for a fund of fund (FOF) optimization model. Based on the structure of the objective function, we construct an augmented Lagrangian function and separate the quadratic term from the nonlinear term by the alternating direction method of multipliers (ADMM), which creates two new subproblems that are much easier to solve. To accelerate the convergence of the proposed algorithm, we use an adaptive step size method that adjusts the step parameter according to the residual of the dual problem at every iteration. We show how the proposed algorithm can be parallelized and implement it on CUDA with block storage for the structured matrix; the implementation is shown to be up to two orders of magnitude faster than the CPU implementation on large-scale problems.
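The abstract does not spell out its adaptive step size rule. As one possible illustration only, the sketch below shows the widely used residual-balancing heuristic for the ADMM penalty parameter, which compares the primal and dual residual norms. The function name and the defaults `mu = 10` and `tau = 2` are assumptions taken from common practice, not from the paper.

```python
def balance_rho(rho, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual-balancing update for the ADMM penalty parameter (heuristic sketch)."""
    if primal_res > mu * dual_res:
        # Primal residual dominates: increase rho to enforce feasibility harder.
        return rho * tau
    if dual_res > mu * primal_res:
        # Dual residual dominates: decrease rho.
        return rho / tau
    # Residuals are within a factor mu of each other: keep rho unchanged.
    return rho
```

When the scaled form of ADMM is used, the scaled dual variable should also be multiplied by the ratio of the old to the new parameter whenever the parameter changes.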