Author affiliations: Concordia Univ, CIISE, Montreal, PQ H3G 1M8, Canada; Ericsson Res, S-22362 Lund, Sweden
Publication: IEEE TRANSACTIONS ON MACHINE LEARNING IN COMMUNICATIONS AND NETWORKING (IEEE. Trans. Mach. Learn. Commun. Netw.)
Year/Volume: 2025, Vol. 3
Pages: 176-194
Subjects: Microservice architectures; Anomaly detection; Real-time systems; Computational modeling; Machine learning; Data models; Computer architecture; Image edge detection; Federated learning; Servers; distributed data; microservice trace
Abstract: The complexity and dynamicity of microservice architectures in cloud environments present substantial challenges to the reliability and availability of the services built on these architectures. Therefore, effective anomaly detection is crucial to prevent impending failures and resolve them promptly. Distributed data analysis techniques based on machine learning (ML) have recently gained attention in detecting anomalies in microservice systems. ML-based anomaly detection techniques mostly require centralized data collection and processing, which may raise scalability and computational issues in practice. In this paper, we propose an Asynchronous Real-Time Federated Learning (ART-FL) approach for anomaly detection in cloud-based microservice systems. In our approach, edge clients perform real-time learning with continuous streaming local data. At the edge clients, we model intra-service behaviors and inter-service dependencies in multi-source distributed data based on a Span Causal Graph (SCG) representation and train a model through a combination of Graph Neural Network (GNN) and Positive and Unlabeled (PU) learning. Our FL approach updates the global model in an asynchronous manner to achieve accurate and efficient anomaly detection, addressing computational overhead across diverse edge clients, including those that experience delays. Our trace-driven evaluations indicate that the proposed method outperforms the state-of-the-art anomaly detection methods by 4% in terms of F1-score while meeting the given time efficiency and scalability requirements.
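The abstract does not spell out how the global model is updated asynchronously. As a rough illustration only, the sketch below shows one common way such an update can work: the server mixes in each client model as it arrives and gives stale updates less weight. The class `AsyncServer`, the parameter `base_mix`, and the decay rule `base_mix / (1 + staleness)` are all assumptions made for this example; they are not taken from the paper, and the SCG, GNN, and PU-learning components are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsyncServer:
    """Hypothetical server that merges client updates one at a time."""
    weights: List[float]
    base_mix: float = 0.5  # assumed mixing rate, not a value from the paper
    version: int = 0       # incremented on every accepted update

    def apply_update(self, client_weights: List[float], client_version: int) -> float:
        """Merge one client model; return the mixing weight that was used."""
        if len(client_weights) != len(self.weights):
            raise ValueError("client and global models differ in size")
        # Staleness: global updates applied since the client pulled the model.
        staleness = self.version - client_version
        if staleness < 0:
            raise ValueError("client version is newer than the server version")
        # Assumed decay rule: delayed clients contribute less.
        mix = self.base_mix / (1.0 + staleness)
        self.weights = [
            (1.0 - mix) * g + mix * c
            for g, c in zip(self.weights, client_weights)
        ]
        self.version += 1
        return mix


server = AsyncServer(weights=[0.0, 0.0])
fresh_mix = server.apply_update([1.0, 1.0], client_version=0)  # no delay
stale_mix = server.apply_update([1.0, 1.0], client_version=0)  # one update behind
```

In this sketch the server never waits for a full round of clients, so a delayed edge client slows nothing down; its update is simply discounted when it finally arrives.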