
Heterogeneity-Aware Clustering and Intra-Cluster Uniform Data Sampling for Federated Learning

Authors: Chen, Jian; Zhang, Peifeng; Chen, Jiahui; Lau, Terry Shue Chien

Affiliations: Guangdong University of Technology, School of Computer Science and Technology, Guangzhou 510006, China; Multimedia University, Faculty of Computing and Informatics, Cyberjaya 63100, Malaysia

Publication: IEEE Transactions on Emerging Topics in Computational Intelligence (IEEE Trans. Emerging Topics Comp. Intell.)

Year, volume, issue: 2025, Vol. 9, No. 3

Pages: 2545-2556


Funding: Foundation of the State Key Laboratory of Public Big Data [PBD2022-01]; Guangdong Special Support Plan for Science and Technology Innovation; Guangdong Provincial Key Laboratory of Power System Network Security [GPKLPSNS-2023-KF-04]

Subjects: Training; Data models; Vectors; Federated learning; Servers; Computational modeling; Privacy; Computational intelligence; Speech recognition; Optimization; quantity imbalance; category distribution heterogeneity

Abstract: Federated learning (FL) is an innovative privacy-preserving machine learning paradigm that enables clients to train a global model without sharing their local data. However, category distribution heterogeneity and quantity imbalance frequently coexist in real-world FL scenarios. On the one hand, because of category distribution heterogeneity, local models are optimized toward distinct local objectives, which results in divergent optimization directions. On the other hand, under the widely used uniform client sampling in FL, quantity imbalance may hinder the active participation of clients with larger datasets in model training and can lead the model to suboptimal performance. To tackle these issues, we propose a framework that combines heterogeneity-aware clustering with intra-cluster uniform data sampling. Specifically, we first perform heterogeneity-aware clustering, which groups clients according to their category distribution vectors. We then apply intra-cluster uniform data sampling, in which local data from each client within a cluster is randomly selected with a predetermined probability. Furthermore, to address privacy concerns, we use homomorphic encryption to protect clients' category distribution vectors and sample sizes. Finally, experimental results on multiple benchmark datasets validate the superiority of the proposed framework.
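The abstract does not specify the clustering algorithm, the distance measure, or how the sampling probability is chosen, so the following Python sketch is only a hypothetical illustration of the two-stage idea, not the authors' method. It assumes that a client's category distribution vector is its normalized label histogram, groups clients with plain k-means (an assumed choice, with deterministic farthest-point initialization), and then, inside each cluster, draws a fraction `q` of the pooled data uniformly at random so that every local sample in the cluster is equally likely to be selected. The helper names (`category_vector`, `kmeans`, `intra_cluster_uniform_sample`), the clients `A` to `D`, and the value `q=0.2` are invented for illustration; the homomorphic-encryption step is omitted.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def category_vector(labels: Sequence[int], num_classes: int) -> List[float]:
    """Hypothetical category distribution vector: the normalized label histogram."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(num_classes)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector farthest from all centroids chosen so far.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each client to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def intra_cluster_uniform_sample(sizes: Dict[str, int], clusters: Dict[str, int],
                                 q: float, seed: int = 0) -> Dict[str, List[int]]:
    """In each cluster, draw round(q * N_cluster) samples uniformly from the pooled data.

    Every local sample in a cluster is equally likely to be chosen, so a client's
    expected share grows with its dataset size (unlike uniform client sampling).
    """
    rng = random.Random(seed)
    plan: Dict[str, List[int]] = {c: [] for c in sizes}
    for cid in sorted(set(clusters.values())):
        # Pool (client, local index) pairs of all clients in this cluster.
        pool = [(c, i) for c in sizes if clusters[c] == cid for i in range(sizes[c])]
        for c, i in rng.sample(pool, round(q * len(pool))):
            plan[c].append(i)
    return {c: sorted(idx) for c, idx in plan.items()}


# Toy example: 4 hypothetical clients with 3 classes and unequal dataset sizes.
labels = {
    "A": [0] * 90 + [1] * 10,
    "B": [0] * 40 + [1] * 10,
    "C": [2] * 20 + [1] * 5,
    "D": [2] * 180 + [1] * 20,
}
names = list(labels)
vecs = [category_vector(labels[n], 3) for n in names]
clusters = dict(zip(names, kmeans(vecs, k=2)))
print(clusters)  # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
plan = intra_cluster_uniform_sample({n: len(labels[n]) for n in names}, clusters, q=0.2)
print({n: len(idx) for n, idx in plan.items()})
```

Because selection is uniform over data points within a cluster rather than over clients, client D (200 samples) is expected to contribute far more training samples than client C (25 samples), which is the effect the abstract contrasts with uniform client sampling.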
