Technological advances have enabled researchers to collect large amounts of data with many covariates. Because of the high-volume (large n) and high-variety (large p) properties of such big data, model estimation poses great challenges for statisticians. In this paper, we focus on the algorithmic aspect of these challenges. We propose a numerical procedure for solving large-scale regression estimation problems involving a structured l0-norm penalty function. This procedure blends the ideas of randomization, blockwise coordinate descent, and a closed-form representation of the proximal operator of the structured l0-norm penalty. In particular, it adopts an "attention" mechanism that exploits the iteration errors to build a sampling distribution for selecting which regression coefficients to update. A simulation study shows that the proposed procedure is competitive with other algorithms for sparse estimation in terms of runtime and statistical accuracy when both the sample size and the number of covariates are large.
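To make the ingredients concrete, the sketch below combines (i) a closed-form proximal step for an l0 penalty (hard thresholding), (ii) coordinate-wise (blockwise with blocks of size one) descent on a least-squares loss, and (iii) a randomized, error-weighted sampling rule standing in for the "attention" mechanism. It is a minimal illustration under these simplifying assumptions, not the authors' exact algorithm: the paper's structured l0 penalty, block structure, and precise sampling distribution are not reproduced here, and all function and variable names are illustrative.

```python
# Minimal sketch: randomized coordinate descent for least squares with an
# (unstructured) l0 penalty, using an error-weighted sampling distribution.
# This is an assumed simplification of the procedure described in the abstract.
import numpy as np

def hard_threshold(z, lam):
    """Closed-form proximal operator of lam * ||.||_0 (hard thresholding)."""
    return z if z * z > 2.0 * lam else 0.0

def randomized_cd_l0(X, y, lam, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    resid = y - X @ beta                 # current residual y - X beta
    weights = np.ones(p)                 # "attention" scores, one per coordinate

    for _ in range(n_iter):
        # Sample a coordinate with probability proportional to its score.
        probs = weights / weights.sum()
        j = rng.choice(p, p=probs)

        # Exact coordinate-wise minimizer of the smooth part, then the l0
        # proximal step (threshold rescaled by the column norm).
        xj = X[:, j]
        norm2 = xj @ xj
        zj = beta[j] + (xj @ resid) / norm2
        new_bj = hard_threshold(zj, lam / norm2)

        # Update the residual and use the size of the change as the new score,
        # so coordinates with large recent iteration errors are revisited more.
        delta = new_bj - beta[j]
        resid -= delta * xj
        beta[j] = new_bj
        weights[j] = abs(delta) + 1e-8   # keep every coordinate reachable

    return beta

# Tiny usage example on synthetic sparse data (illustrative only).
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    y = X @ beta_true + 0.1 * rng.standard_normal(n)
    print(np.nonzero(randomized_cd_l0(X, y, lam=0.5))[0])
```

The sampling weights here are simply the magnitude of each coordinate's most recent update; any monotone function of the iteration errors would serve the same purpose of concentrating computation on coefficients that are still changing.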