ISBN: (print) 9781728190549
Recent years have witnessed increasing privacy concerns about machine learning. To protect privacy, federated learning has been proposed as a decentralized privacy-preserving framework in which clients upload model parameters rather than private data. However, training a fair federated learning model in heterogeneous environments is still challenging. First, heterogeneous data distributions prevent the global model from achieving high accuracy on all distributions. Second, the federated training process exposes and exacerbates potential biases in heterogeneous training data. Third, the local bias of each client can be propagated through parameter sharing, biasing the global model. In this work, we propose a two-stage fairness-aware federated learning framework (HeteroFair) to achieve fairness under heterogeneous data distributions. In the first stage, we add a fairness constraint to the loss function and propose a local adaptive weighting algorithm that adjusts the proportion of the fairness constraint, achieving fair training in heterogeneous environments. In the second stage, we present a fairness-aware aggregation reweighting algorithm that reduces the mismatch between local and global fairness to achieve fair federated learning. Extensive evaluation results demonstrate the effectiveness of the proposed framework in achieving fairness and high accuracy under heterogeneous data distributions.
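The abstract does not give HeteroFair's formulas, so the following is only an illustrative sketch of the two general ideas it names: a fairness term added to a local loss, and aggregation weights that account for each client's fairness. The demographic-parity gap, the fixed weight `lam`, the exponential down-weighting, and every function name here are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rate between groups 0 and 1.

    Assumption: demographic parity stands in for the unspecified fairness metric.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())

def penalized_loss(task_loss, fairness_gap, lam):
    """Local objective: task loss plus a weighted fairness penalty.

    Assumption: lam is passed in as a constant; the paper adapts it locally.
    """
    return task_loss + lam * fairness_gap

def fairness_aware_weights(sample_counts, fairness_gaps, temperature=1.0):
    """Aggregation weights that shrink as a client's fairness gap grows.

    Assumption: FedAvg-style sample-count weights scaled by exp(-gap / T).
    """
    counts = np.asarray(sample_counts, dtype=float)
    gaps = np.asarray(fairness_gaps, dtype=float)
    raw = counts * np.exp(-gaps / temperature)
    return raw / raw.sum()

def aggregate(client_params, weights):
    """Weighted average of client parameter vectors of equal shape."""
    stacked = np.stack([np.asarray(p, dtype=float) for p in client_params])
    return np.tensordot(weights, stacked, axes=1)
```

With equal sample counts, a client reporting a larger fairness gap receives a smaller share of the global update, which is one simple way to limit how far local bias propagates through parameter sharing.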
Biological data often have heterogeneous, discontinuous, non-normal distributions. Non-parametric statistical tests, such as the Mann-Whitney U-test or its extension to more than two samples, the Kruskal-Wallis test, are often used in these cases, although they assume certain preconditions that are often ignored. We developed a permutation test procedure that uses the ratio of the interquartile distances and the median differences of the original non-classified data to assess the properties of the real distribution more appropriately than the classical methods. We applied this test to a heterogeneous, skewed biological data set on invertebrate dispersal and showed how differently the Kruskal-Wallis test and the permutation approach react. We then evaluated the new testing procedure with reproducible data generated from the normal distribution, testing the influence of four different experimental trials on the new procedure in comparison to the Kruskal-Wallis test. These trials showed the impact of data that varied in terms of (a) a negative correlation between the variances and means of the samples, (b) changing variances that were not correlated with the means of the samples, and (c) constant variances and means but different sample sizes; in trial (d) we evaluated the testing power of the new procedure. Due to the different test statistics, the permutation test reacted more sensitively to the data presented in trials (a) and (c) and non-uniformly in trial (b). In the evaluation of the testing power, no significant differences between the Kruskal-Wallis test and the new permutation testing procedure could be detected. We consider this test to be an alternative for working with heterogeneous data where the preconditions of the classical non-parametric tests are not met. (c) 2006 Elsevier Masson SAS. All rights reserved.
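The abstract does not define the test statistic precisely, so the sketch below only shows the general shape of such a permutation test. The statistic used here (mean absolute pairwise median difference divided by mean interquartile range) is an assumed stand-in for the authors' ratio, and the function names, `n_perm`, and `seed` are hypothetical.

```python
import numpy as np
from itertools import combinations

def spread_location_stat(groups):
    """Assumed statistic: mean |median difference| over mean interquartile range."""
    medians = [np.median(g) for g in groups]
    iqrs = [np.subtract(*np.percentile(g, [75, 25])) for g in groups]
    med_diff = np.mean([abs(a - b) for a, b in combinations(medians, 2)])
    mean_iqr = np.mean(iqrs)
    if mean_iqr == 0:
        # Degenerate case: no spread in any group.
        return 0.0 if med_diff == 0 else np.inf
    return med_diff / mean_iqr

def permutation_test(groups, n_perm=2000, seed=0):
    """Return (observed statistic, permutation p-value) for two or more samples."""
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in groups]
    observed = spread_location_stat(groups)
    pooled = np.concatenate(groups)
    # Indices at which the shuffled pool is cut back into the original group sizes.
    split_points = np.cumsum([len(g) for g in groups])[:-1]
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        if spread_location_stat(np.split(shuffled, split_points)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Because group labels are reshuffled over the pooled observations, the null distribution is built from the data themselves rather than from a rank-based approximation, which is the property that makes this family of tests attractive when the preconditions of classical tests are in doubt.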