In the big data setting, working data sets are often distributed across multiple machines, whereas classical statistical methods are typically developed for estimation or inference on a single data set. We employ a novel parallel quasi-likelihood method in generalized linear models that makes the variances of the different sub-estimators relatively similar. Estimates are obtained from projection subsets of the data and then combined with suitably chosen unknown weights. We also show that the proposed method achieves better asymptotic efficiency than the simple average. Furthermore, simulation examples show that the proposed method can significantly improve statistical inference.
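As an illustrative sketch (not the authors' method, whose weights are estimated from the data), the idea of improving on the simple average by weighting sub-estimators can be seen with inverse-variance weights; all quantities below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# K sub-estimators of a common parameter, with unequal variances across
# data subsets (the situation the weighted combination is designed for).
K = 5
true_theta = 2.0
variances = np.array([0.1, 0.4, 0.2, 0.8, 0.3])   # hypothetical per-subset variances
estimates = true_theta + rng.normal(0.0, np.sqrt(variances))

# (a) simple average of the sub-estimators
simple_avg = estimates.mean()

# (b) inverse-variance weights, which minimize the combined variance
w = (1.0 / variances) / (1.0 / variances).sum()
weighted = np.dot(w, estimates)

# Var(simple average) = mean(variances) / K; Var(weighted) = 1 / sum(1/variances)
var_simple = variances.mean() / K
var_weighted = 1.0 / (1.0 / variances).sum()
```

The weighted combination is never less efficient than the simple average, and the gap grows with the heterogeneity of the subset variances.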
Personal credit has always been a hot topic in society. In particular, the evaluation of default risk receives close attention, since robust estimation based on personal information can both help needy individuals obtain loans and help financial institutions avoid losses. So far there have been no good solutions due to limited data, especially default information. With the advent of the era of big data, it is possible to improve the effectiveness of estimates by using auxiliary information from external studies or public domains. However, individual-level data cannot be obtained directly because of the emphasis on data privacy; that is, only summary statistics carrying auxiliary information are allowed to be shared. To effectively utilize external aggregated auxiliary information to improve the accuracy of default-risk estimation, this paper introduces a unified auxiliary-information framework, referred to as the enhanced GEE method, which incorporates various external summary results by employing the generalized estimating equations (GEE) approach and augmenting the GEE function with a weighted logarithm of a confidence density. We establish asymptotic properties for the new method and prove that it gains statistical efficiency over the study-specific estimator that uses no auxiliary information. In addition, a low-cost MapReduce procedure for distributed statistical inference with the enhanced GEE method in big data is developed that achieves the same efficiency as the oracle enhanced GEE approach under mild conditions. The method is demonstrated in an application predicting the loan default risk of bank customers in Shanghai and shown to be more effective and reliable than the method based on the bank's own data only. Furthermore, the advantages of our approach, especially the construction of tighter confidence intervals, are illustrated with extensive simulation studies and a real personal default-risk case.
Electronic health records (EHRs) offer great promise for advancing precision medicine and, at the same time, present significant analytical challenges. In particular, it is often the case that patient-level data in EHRs cannot be shared across institutions (data sources) due to government regulations and/or institutional policies. As a result, there is growing interest in distributed learning over multiple EHR databases without sharing patient-level data. To tackle such challenges, we propose a novel communication-efficient method that aggregates the optimal estimates of external sites by turning the problem into a missing-data problem. In addition, we propose incorporating posterior samples from remote sites, which can provide partial information on the missing quantities and improve the efficiency of parameter estimates while having the differential-privacy property, thus reducing the risk of information leakage. The proposed approach allows for proper statistical inference without sharing raw patient-level data. We provide a theoretical investigation of the asymptotic properties of the proposed method for statistical inference as well as its differential privacy, and evaluate its performance in simulations and real data analyses in comparison with several recently developed methods.
With the rapid advancement of information technology, data analysis has become increasingly vital in various fields. Balancing the utility of data with the protection of individual privacy has become a hot topic for both academic research and practical applications. As a technology that can provide strict privacy guarantees, differential privacy has attracted widespread attention in recent years. In this paper, we study statistical inference for differentially private data based on empirical likelihood. Specifically, we develop two novel privacy-preserving statistical inference methods: differentially private distributed empirical likelihood and balanced augmented differentially private distributed empirical likelihood. Under mild conditions, the asymptotic properties of the proposed methods are derived. We also illustrate the finite-sample performance of the proposed approaches via simulation studies and real data analysis.
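The empirical-likelihood constructions above are more involved, but the underlying privacy primitive can be sketched with the standard Laplace mechanism for releasing a differentially private mean; the function name and parameters here are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(1)

def private_mean(x, lower, upper, epsilon, rng):
    """Release an epsilon-differentially private mean via the Laplace mechanism.

    After clipping each value to [lower, upper], one record changes the mean
    by at most (upper - lower) / n, so Laplace noise with scale
    sensitivity / epsilon suffices for epsilon-differential privacy.
    """
    x = np.clip(x, lower, upper)
    sensitivity = (upper - lower) / len(x)
    return x.mean() + rng.laplace(0.0, sensitivity / epsilon)

# Hypothetical data: 10,000 observations with true mean 5.0.
x = rng.normal(5.0, 1.0, size=10_000)
est = private_mean(x, lower=0.0, upper=10.0, epsilon=1.0, rng=rng)
```

For large samples the added noise (scale 0.001 here) is negligible next to the sampling error, which is why privacy and accurate inference can coexist.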
We consider the problem of sparse normal-means estimation in a distributed setting with communication constraints. We assume there are M machines, each holding d-dimensional observations of a K-sparse vector mu corrupted by additive Gaussian noise. The M machines are connected in a star topology to a fusion center, whose goal is to estimate the vector mu with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the l_2 risk, the total communication must be high, at least linear in the dimension d. This phenomenon occurs, however, only at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high, but not high enough for recovery by any individual machine, the support of mu can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension d. Furthermore, the communication decreases exponentially as a function of signal strength. If, in addition, KM << d/log d, then with an additional round of sublinear communication our algorithms achieve the centralized rate for the l_2 risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.
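A minimal sketch of the thresholding-and-voting flavor of such schemes (an assumed form, not the paper's exact algorithms): each machine transmits only the indices whose observations exceed a threshold, and the fusion center keeps indices reported by a majority, so communication scales with K rather than d:

```python
import numpy as np

rng = np.random.default_rng(2)

d, K, M = 1000, 10, 20                    # dimension, sparsity, machines
support = rng.choice(d, size=K, replace=False)
mu = np.zeros(d)
mu[support] = 6.0                         # signal well above the noise level

tau = np.sqrt(2 * np.log(d))              # per-machine threshold
votes = np.zeros(d)
for _ in range(M):
    y = mu + rng.normal(size=d)           # one machine's noisy observation
    votes[np.abs(y) > tau] += 1           # transmit only the exceeding indices

# Fusion center: keep indices reported by more than half the machines.
recovered = np.flatnonzero(votes > M / 2)
```

Each machine sends roughly K indices on average, so the total communication is about M*K*log(d) bits, sublinear in d when K is small.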
Market beta is a measure of the volatility, or systematic risk, of a security or portfolio compared to the market as a whole. This paper considers the distributed estimation of market beta in the case of massive data and establishes the consistency and asymptotic normality of the estimator. Furthermore, simulations illustrate the finite-sample properties of the estimator.
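As a sketch of one assumed setup (not necessarily the paper's estimator): beta is the slope of the CAPM regression r_asset = alpha + beta * r_market + noise, and it can be computed over partitioned data by exchanging only five sufficient statistics per machine:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated returns with true beta = 1.3 (all parameters illustrative).
true_beta = 1.3
n = 100_000
r_market = rng.normal(0.0, 0.02, size=n)
r_asset = 0.001 + true_beta * r_market + rng.normal(0.0, 0.01, size=n)

# Split the sample across 10 "machines"; each reports
# (count, sum_x, sum_y, sum_xy, sum_xx) rather than raw returns.
chunks = zip(np.array_split(r_market, 10), np.array_split(r_asset, 10))
stats = np.array([(len(x), x.sum(), y.sum(), (x * y).sum(), (x * x).sum())
                  for x, y in chunks])

# The center assembles the pooled OLS slope from the summed statistics.
cnt, sx, sy, sxy, sxx = stats.sum(axis=0)
beta_hat = (sxy - sx * sy / cnt) / (sxx - sx * sx / cnt)
```

Because the per-machine sums add up to the global sums, this distributed estimator is numerically identical to the centralized OLS slope, so no efficiency is lost.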
Many data are sensitive in areas such as finance, economics, and other social sciences. We propose an ER (encryption and recovery) algorithm that allows a central administrator to carry out statistical inference based on the encrypted data while still preserving each party's privacy, even against a colluding majority in the presence of cyberattacks. We demonstrate the applications of our algorithm to linear regression, logistic regression, maximum likelihood estimation, the method of moments, and estimation of empirical distributions. Moreover, our algorithm can help to address another practically significant issue: privacy preservation for distributed statistical inference when data are allocated to different parties who are unwilling to share their own data with others. Finally, we provide two extensions of the applications of our algorithm: the combination of our algorithm with Fourier transforms, and the development of a modified root-finding method for recovering quantiles with privacy preservation.
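The general flavor of such schemes can be sketched with additive secret sharing over a prime field; this is an assumption about the style of "encryption and recovery", not the ER algorithm itself:

```python
import secrets

P = 2**61 - 1  # a Mersenne prime used as the field modulus

def share(value, n_parties):
    """Split an integer into n_parties random shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three parties hold private values; each splits its value into shares
# and distributes one share to each party (including itself).
values = [42, 17, 99]
all_shares = [share(v, 3) for v in values]

# Each party sends the administrator only the sum of the shares it holds;
# individual shares are uniformly random, so no single value is revealed.
party_sums = [sum(col) % P for col in zip(*all_shares)]
total = sum(party_sums) % P   # the administrator recovers only the total
```

The administrator learns the aggregate needed for inference (here a sum, the building block of moment and likelihood computations) without ever seeing any party's raw value.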
Aggregated inference on distributed data is becoming more and more important due to the growing size of data collected in different industries. Modeling and inference are needed when data cannot be gathered at a central location; aggregated statistical inference is a major tool for solving such problems. In the literature, problems under the setting of regression models (more generally, M-estimators) are extensively studied. There are at least two popular techniques for distributed estimation: (a) averaging estimators computed at local locations and (b) the one-step approach, which combines the simple averaging estimator with a classical Newton's method (using the local Hessian matrices) to generate a "one-step" estimator. It has been proved that under certain assumptions, these estimators enjoy the same asymptotic properties as the centralized estimator, which is obtained as if all data were available at a central location. We review these two major estimation schemes. It can be seen that, in big-data problems, dividing the data across multiple machines and then using aggregation techniques to solve the estimation problem in parallel can speed up the computation with little compromise in the quality of the estimators. We discuss potential extensions to other models, such as support vector machines, principal component analysis, and so on. Numerical examples are omitted due to space limitations; they can easily be found in the literature. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical Models > Fitting Models; Statistical and Graphical Methods of Data Analysis > Modeling Methods and Algorithms.
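The two techniques, (a) simple averaging and (b) the one-step refinement, can be sketched for logistic regression as a representative M-estimator; the data, sample sizes, and helper names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_fit(X, y, iters=30):
    """Local logistic MLE by Newton's method on one machine's data."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ beta)
        g = X.T @ (y - p)                          # gradient of log-likelihood
        H = X.T @ (X * (p * (1 - p))[:, None])     # observed information
        beta += np.linalg.solve(H, g)
    return beta

# Simulated data split across 10 machines.
true_beta = np.array([0.5, -1.0])
X = rng.normal(size=(20_000, 2))
y = (rng.random(20_000) < sigmoid(X @ true_beta)).astype(float)
parts = list(zip(np.array_split(X, 10), np.array_split(y, 10)))

# (a) averaging: each machine fits locally, the center averages.
avg = np.mean([local_fit(Xk, yk) for Xk, yk in parts], axis=0)

# (b) one-step: one Newton update of the average, pooling the local
# gradients and Hessians evaluated at the averaged estimator.
p_parts = [(Xk, yk, sigmoid(Xk @ avg)) for Xk, yk in parts]
g = sum(Xk.T @ (yk - p) for Xk, yk, p in p_parts)
H = sum(Xk.T @ (Xk * (p * (1 - p))[:, None]) for Xk, yk, p in p_parts)
one_step = avg + np.linalg.solve(H, g)
```

The averaging step needs one round of communication (each machine sends its local estimate), and the one-step refinement needs one more (each machine sends a gradient and a Hessian), which is how both methods stay communication-efficient.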