Author affiliations: Guangdong University of Technology, School of Computer Science & Technology, Guangzhou 510006, People's Republic of China; Multimedia University, Faculty of Computing & Informatics, Cyberjaya 63100, Malaysia
Publication: IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE (IEEE Trans. Emerging Topics Comp. Intell.)
Year/Volume/Issue: 2025, Vol. 9, No. 3
Pages: 2545-2556
Core index coverage:
Funding: Foundation of the State Key Laboratory of Public Big Data [PBD2022-01]; Guangdong Special Support Plan for Science and Technology Innovation; Guangdong Provincial Key Laboratory of Power System Network Security [GPKLPSNS-2023-KF-04]
Keywords: Training; Data models; Vectors; Federated learning; Servers; Computational modeling; Privacy; Computational intelligence; Speech recognition; Optimization; quantity imbalance; category distribution heterogeneity
Abstract: Federated learning (FL) is an innovative privacy-preserving machine learning paradigm that enables clients to train a global model without sharing their local data. However, category distribution heterogeneity and quantity imbalance frequently coexist in real-world FL scenarios. On the one hand, under category distribution heterogeneity, local models are optimized toward distinct local objectives, resulting in divergent optimization directions. On the other hand, under quantity imbalance, the uniform client sampling widely used in FL may hinder the active participation of clients with larger datasets in model training, potentially leading to suboptimal model performance. To tackle this, we propose a framework that incorporates heterogeneity-aware clustering and intra-cluster uniform data sampling. More precisely, we first perform heterogeneity-aware clustering, which groups clients according to their category distribution vectors. Then, we apply intra-cluster uniform data sampling, where local data from each client within a cluster is randomly selected with a predetermined probability. Furthermore, to address privacy concerns, we incorporate homomorphic encryption to protect clients' category distribution vectors and sample sizes. Finally, experimental results on multiple benchmark datasets demonstrate the superiority of the proposed framework.
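The two sampling steps described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the toy k-means used for "heterogeneity-aware clustering", and the per-sample Bernoulli selection are all our own assumptions, and the homomorphic-encryption layer is omitted entirely.

```python
import random
from collections import defaultdict

def cluster_by_distribution(dist_vectors, num_clusters, rounds=10, seed=0):
    """Group clients by their category distribution vectors.

    dist_vectors: {client_id: [p_cat0, p_cat1, ...]} per-client label histograms.
    Uses a toy k-means as a stand-in for the paper's clustering method.
    """
    rng = random.Random(seed)
    clients = list(dist_vectors)
    centroids = [list(dist_vectors[c]) for c in rng.sample(clients, num_clusters)]
    assign = {}
    for _ in range(rounds):
        # Assign each client to the nearest centroid (squared Euclidean distance).
        for c in clients:
            v = dist_vectors[c]
            assign[c] = min(
                range(num_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])),
            )
        # Recompute each centroid as the mean of its members' vectors.
        for k in range(num_clusters):
            members = [dist_vectors[c] for c in clients if assign[c] == k]
            if members:
                centroids[k] = [sum(x) / len(members) for x in zip(*members)]
    groups = defaultdict(list)
    for c, k in assign.items():
        groups[k].append(c)
    return dict(groups)

def intra_cluster_uniform_sample(local_data, cluster, p, seed=0):
    """Keep each local sample of every client in `cluster` with probability p."""
    rng = random.Random(seed)
    return {c: [x for x in local_data[c] if rng.random() < p] for c in cluster}
```

Because every sample inside a cluster is selected with the same probability p, a client's expected contribution scales with its dataset size, which is how intra-cluster uniform data sampling counteracts the quantity-imbalance problem described above.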