Estimating individualized treatment rules, particularly in the context of right-censored outcomes, is challenging because the treatment effect heterogeneity of interest is often small and thus difficult to detect. While this motivates the use of very large datasets, such as those from multiple health systems or centres, data privacy may be a concern, with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection, used in combination with dynamic weighted survival modelling (DWSurv), to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom's Clinical Practice Research Datalink.
Precision medicine is a rapidly expanding area of health research wherein patient-level information is used to inform treatment decisions. A statistical framework helps to formalize the individualization of treatment decisions that characterize personalized management plans. Numerous methods have been proposed to estimate individualized treatment rules that optimize expected patient outcomes, many of which have desirable properties such as robustness to model misspecification. However, while individual data are essential in this context, there may be concerns about data confidentiality, particularly in multi-center studies where data are shared externally. To address this issue, we compared two approaches to privacy preservation: (i) data pooling, which is a covariate microaggregation technique, and (ii) distributed regression. These approaches were combined with the doubly robust yet user-friendly method of dynamic weighted ordinary least squares to estimate individualized treatment rules. In simulations, we extensively evaluated the performance of the methods in estimating the parameters of the decision rule under different assumptions. The results demonstrate that double robustness is not maintained in the data pooling setting and that this can result in bias, whereas distributed regression provides good performance. We illustrate the methods via an analysis of optimal Warfarin dosing using data from the International Warfarin Consortium.
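The idea behind (weighted) distributed regression can be illustrated with a minimal sketch. It assumes a linear model fitted by weighted least squares, with each site sharing only the aggregate matrices X'WX and X'Wy rather than patient rows. The function names (`site_summaries`, `combine`, `solve`) are hypothetical, the weights are taken as given, and this is not the authors' implementation.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_summaries(X: Matrix, y: Vector, w: Vector) -> Tuple[Matrix, Vector]:
    """Run locally at one site: return (X'WX, X'Wy). No rows leave the site."""
    p = len(X[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for row, yi, wi in zip(X, y, w):
        for j in range(p):
            xtwy[j] += wi * row[j] * yi
            for k in range(p):
                xtwx[j][k] += wi * row[j] * row[k]
    return xtwx, xtwy


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - tail) / M[i][i]
    return beta


def combine(summaries: List[Tuple[Matrix, Vector]]) -> Vector:
    """Run at the coordinating centre: sum the site aggregates, then solve."""
    p = len(summaries[0][1])
    A = [[sum(s[0][j][k] for s in summaries) for k in range(p)] for j in range(p)]
    b = [sum(s[1][j] for s in summaries) for j in range(p)]
    return solve(A, b)


# Two illustrative sites; columns are (intercept, covariate), data are made up.
site_a = ([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.0, 5.0], [1.0, 0.5, 2.0])
site_b = ([[1.0, 3.0], [1.0, 4.0]], [7.0, 9.0], [1.5, 1.0])
beta = combine([site_summaries(*site_a), site_summaries(*site_b)])
```

Because the summed aggregates equal those computed on the stacked data, the combined estimate matches the one a pooled weighted least squares fit would give, which is why individual-level data need not be exchanged in this sketch.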
We consider the problem of function estimation by a multi-agent system consisting of two agents and a fusion center. Each agent receives data comprising samples of an independent variable (input) and the corresponding values of the dependent variable (output). The data remain local and are not shared with other members of the system. The objective of the system is to collaboratively estimate the function from the input to the output. To this end, we present an iterative distributed algorithm for this function estimation problem. Each agent solves a local estimation problem in a Reproducing Kernel Hilbert Space (RKHS) and uploads the function to the fusion center. At the fusion center, the functions are fused by first estimating the data points that would have generated the uploaded functions and then solving a least squares estimation problem using the estimated data from both functions. The fused function is downloaded by the agents and is used for estimation at the next iteration along with incoming data. This procedure is executed sequentially and stopped when the difference between consecutively estimated functions becomes small enough. With respect to the algorithm, we prove the existence of basis functions for a suitable representation of the estimated functions and present closed-form solutions to the estimation problems at the agents and the fusion center.
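The local estimation step at a single agent can be sketched as regularized least squares in an RKHS (kernel ridge regression). This is only an illustration under assumptions not stated in the abstract: a Gaussian kernel, scalar inputs, and a ridge penalty `lam`. The fusion step and the iteration are specific to the paper and are not reproduced here; `fit_local` and `solve` are hypothetical names.

```python
import math
from typing import Callable, List


def gaussian_kernel(a: float, b: float, gamma: float = 1.0) -> float:
    """Assumed kernel choice; the abstract does not name one."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_local(xs: List[float], ys: List[float],
              lam: float = 1e-3, gamma: float = 1.0) -> Callable[[float], float]:
    """Return f(x) = sum_i alpha_i k(x_i, x) with alpha = (K + lam I)^{-1} y."""
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, ys)

    def f(x: float) -> float:
        return sum(a * gaussian_kernel(xi, x, gamma) for a, xi in zip(alpha, xs))

    return f


# Illustrative local data for one agent (made up).
f_hat = fit_local([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], lam=1e-8)
```

The estimate is a finite combination of kernel functions centred at the agent's own inputs, so only the coefficients and centres would need to be uploaded in a scheme of this kind.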
ISBN: (digital) 9781665485470
ISBN: (print) 9781665485470
Ensemble methods achieve state-of-the-art performance in many real-world regression problems while being structurally compatible with modern decentralized computing architectures. However, implementing ensemble regression on distributed systems may compromise this performance because of computing and communication reliability issues. This paper introduces robust ensemble combining techniques designed to integrate multiple noisy predictions into a single reliable prediction. Experiments conducted with synthetic and real-world datasets in various noise regimes illustrate the superiority of our robust methods over their non-robust counterparts.
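To make the notion of robust combining concrete, the sketch below contrasts a plain average with two generic robust combiners, the median and a trimmed mean. These are standard stand-ins chosen for illustration and are not the techniques proposed in the paper; the prediction values are made up, and `trimmed_mean` is a hypothetical helper.

```python
from statistics import mean, median
from typing import List


def trimmed_mean(preds: List[float], trim: int = 1) -> float:
    """Average after dropping the `trim` smallest and `trim` largest predictions."""
    if 2 * trim >= len(preds):
        raise ValueError("trim removes every prediction")
    ordered = sorted(preds)
    return mean(ordered[trim:len(ordered) - trim])


# Four base learners agree; a fifth prediction was corrupted in transit.
predictions = [2.0, 2.1, 1.9, 2.0, 50.0]

plain = mean(predictions)             # pulled far away by the corrupted value
robust_median = median(predictions)   # unaffected by a single outlier
robust_trimmed = trimmed_mean(predictions, trim=1)
```

A single corrupted prediction shifts the plain average substantially, while the median and trimmed mean stay close to the values the reliable learners agree on.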
Given the ubiquity of large-scale data in today's real-world applications, including learning on cross-media data, we propose a semi-supervised learning method, named Multiple Binary Subspace Regression, for cross-media data concept detection. To mine the features common to data with multiple modalities, we project the original cross-media data onto the same subspace-level representation simultaneously by mapping to the corresponding subspaces for dimensionality reduction. All the subspaces are set to be binary, so the subsequent computation involves only additions and no multiplications, owing to the properties of binary values. The dimensionality reduction to a binary subspace and the concept detection on this subspace are also optimized simultaneously, leading to a semi-supervised model. To handle large-scale data, our learning method is easily implemented on a MapReduce-based Hadoop system. Empirical studies demonstrate its competitive performance on convergence, efficiency, and scalability in comparison with the state-of-the-art literature.
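The remark that binary subspaces need only additions can be illustrated as follows. The sketch assumes the projection basis has entries in {0, 1}; the abstract does not say which binary encoding is used, and `binary_project` is a hypothetical helper rather than part of the method.

```python
from typing import List


def binary_project(x: List[float], basis: List[List[int]]) -> List[float]:
    """Project x onto each 0/1 basis row by summing the selected features.

    A 1 in the row means "add this feature", a 0 means "skip it", so no
    multiplication is needed to form the inner product.
    """
    return [sum(xi for xi, bit in zip(x, row) if bit) for row in basis]


# Made-up feature vector and a two-dimensional binary subspace.
features = [0.5, 1.5, 2.0]
basis = [[1, 0, 1],
         [0, 1, 1]]
projected = binary_project(features, basis)
```

Under this assumed encoding, each projected coordinate is just the sum of the features selected by the corresponding basis row.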
ISBN: (print) 9781479947614
Given the ubiquity of large-scale data in today's real-world applications, including learning on cross-media data, we propose a semi-supervised learning method named Multiple Binary Subspace Regression (MBSR) for cross-media data classification. To mine the features common to data with multiple modalities, we project the original cross-media data into the same low-rank representation simultaneously by mapping to the corresponding subspaces for dimension reduction. All the subspaces are set to be binary, so the subsequent computation involves only additions and no multiplications, owing to the properties of binary values. The dimension reduction to a binary subspace and the classification on this subspace are also optimized simultaneously, leading to a semi-supervised model. To handle large-scale data, our learning method is easily implemented on a MapReduce-based Hadoop system. Empirical studies demonstrate its competitive performance on convergence, efficiency, and scalability in comparison with the state-of-the-art literature.